The Packed Rewriting System (Transfer)

Copyright © 1999-2001 by the Xerox Corporation and Copyright © 2002-2007 by the Palo Alto Research Center. All rights reserved.

Introduction

The Packed Rewriting System (PRS)--usually called the Transfer System--applies rewrite rules to a set of packed input terms/facts to produce a set of packed output terms/facts. The system was originally designed to implement a transfer component in a machine translation system, although packed term rewriting has found application in a number of areas besides transfer. For transfer, the input is a set of packed facts representing the f-structures obtained by parsing a sentence. The output is a set of packed facts representing the f-structures from which sentences in the target language are to be generated. While this documentation primarily focuses on f-structure to f-structure transfer, it should be borne in mind that the transfer system is not restricted to applications involving f-structure input or output.

Packed rewriting is significant, since an ambiguous sentence gives rise to more than one structure and it is possible for the number of structures to be in the hundreds or even thousands. However, the set of structures for a given sentence is packed into a single representation making it possible for common parts to appear only once. Thanks to the fact that the structures are packed, common parts of alternative structures can often be operated on by the transfer system as a unit, the results being reflected in each of the alternatives.

The transfer system operates on a source f-structure, represented as a set of (transfer) facts, to transform it incrementally into a target structure. The operation is controlled by a transfer grammar, which consists of a list of rules whose order is important, because each rule has the potential of changing the situation that the subsequent rules will encounter. In this respect, the rules are like phonological rules of the Chomsky/Halle variety. In particular, rules can prevent later rules from applying by removing material that they would otherwise have applied to (bleeding), or they can enable the application of later rules by introducing material that they need (feeding). In this respect, this system differs from other systems that have been proposed for transfer, which build new structures based on observation, but not modification, of existing ones. In this system, as the process continues, the initial source-language structure takes on more and more of the properties of a target-language structure. Source facts that no rule in the sequence applies to simply become part of the target structure. Notice that it follows from what we have said that the transfer process never fails. Even if not a single rule applied, the output would simply be identical to the input. (See note).

The Transfer Process

To introduce the notation for writing transfer rules, we will consider a fairly simple example. For an even simpler example, see the XLE transfer walkthrough. More advanced transfer rule constructs are discussed in a later section. (Note: The rule notation was changed in June 2004 to remove some of the irritations of the original prolog syntax. The system can still use rules in the old prolog syntax, which is documented here.)

Suppose that we wanted to translate the English sentence Mary sleeps into the French Marie dort. We would first parse the English sentence to get an English f-structure. Since transfer rules do not operate directly on f-structures, this must first be converted to a set of transfer facts, which are then provided as input to the transfer rules. The input facts are then converted to output transfer facts by the transfer system. The output transfer facts are then converted into a (French) f-structure, which can be fed into the French generator to produce a sentence.

Transfer Facts

The first thing to consider is the nature of transfer facts, and then how to convert f-structures to transfer facts. A transfer fact typically consists of a predicate, an opening parenthesis, a comma-separated list of arguments, and a closing parenthesis, e.g.

predicate(argument1, argument2, argument3)

It is also possible to have atomic facts with no arguments, e.g.

predicate

Predicates should be atomic symbols, containing no commas, periods or parentheses. For reasons that will become clear below, predicates should not begin with the characters +, -, @, *, or %. Arguments can be either atomic or non-atomic (i.e. embedded predicates), and may begin with +, -, @, or *. Atoms should not contain whitespace or any of the following characters:

. , ; | "
Should you need to include such characters in an atom, you can escape them by preceding them with a backquote (`). To include a backquote in an atom, escape it as well (i.e. ``). For example, the atom a,b would be written as a`,b, the atom etc. as etc`., and an atom containing a literal backquote, such as a`b, as a``b.

Converting F-Structures to Transfer Facts

Let us now consider how the following f-structure can be represented as a set of transfer facts.
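In outline, and reconstructed here from the facts listed below (the original graphical display is not reproduced), the f-structure is:

19:[ PRED        'sleep<[2:Mary]>'
     SUBJ      2:[ PRED  'Mary'
                   ANIM  +
                   CASE  nom
                   GEND  fem
                   NUM   sg
                   PERS  3
                   NTYPE 4:[ PROPER name ] ]
     TNS-ASP   3:[ MOOD  indicative
                   PERF  -
                   PROG  -
                   TENSE pres ]
     LAYOUT-TYPE unspec
     PASSIVE     -
     STMT-TYPE   declarative
     VTYPE       main ]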

The following is the corresponding list of transfer facts:

PRED(var(19),sleep),                The outermost structure
SUBJ(var(19),var(2)),
arg(var(19),1,var(2)),
lex_id(var(19),3),
LAYOUT-TYPE(var(19),unspec),
PASSIVE(var(19),-),
STMT-TYPE(var(19),declarative),
VTYPE(var(19),main),
TNS-ASP(var(19),var(3)),

PRED(var(2),Mary),
lex_id(var(2),1),
ANIM(var(2),+),                    <SUBJ>
CASE(var(2),nom),
GEND(var(2),fem),
NUM(var(2),sg),
PERS(var(2),3),
NTYPE(var(2),var(4)),
 
MOOD(var(3),indicative),            <TNS-ASP>
PERF(var(3),-),
PROG(var(3),-),
TENSE(var(3),pres),

PROPER(var(4),name)                <SUBJ, NTYPE>

Node Indices and Attribute-Value Pairs

To see how these facts correspond to the f-structure, it is first necessary to grasp the convention lying behind the use of var(n) arguments. These are to be interpreted as standing for f-structure nodes / indices. Thus the outermost node/index labeled 19 in the f-structure is represented by var(19), and the value of the SUBJ attribute labeled as 2 is represented by var(2). There are two other f-structure nodes that do not receive explicit labels in the graphical representation, namely the values of the TNS-ASP and NTYPE attributes. These are assigned the indices var(3) and var(4) in the transfer facts. (Note: Unfortunately, in practice the numeric labels assigned to the graphical f-structure representations rarely if ever correspond so directly to the numeric values assigned to the indices in the transfer facts. Thus label 19 might map onto a transfer index var(0) and label 2 might map onto an index var(1). This mismatch is the result of a similar labeling mismatch between graphical and prolog representations of f-structures. For ease of exposition, in this example we assume that 19 maps onto var(19), and so on.)

Looking at all the facts with var(19) as their first argument, we can see that most of them correspond directly to attribute value pairs in the f-structure. The fact that f-structure 2 is the value of the SUBJ attribute of 19 in the f-structure is represented as

 SUBJ(var(19), var(2))

The fact that the value of the PASSIVE attribute of 19 is - is represented as

PASSIVE(var(19), -)

The fact that the value of the TNS-ASP attribute of 19 is a complex structure (indexed var(3)) is represented as

TNS-ASP(var(19), var(3))

Representing Semantic Forms

The PRED, arg and lex_id facts do not each correspond to an attribute-value pair. Instead, they decompose the single PRED-SemanticForm attribute-value pair into individual components. A semantic form consists of four elements: the predicate (e.g. sleep); a unique semform identifier (not explicitly shown in the graphical f-structure representation, but present nonetheless); an ordered sequence of thematic arguments; and a (possibly empty) sequence of non-thematic arguments. The attribute-value pair for 19 is

PRED 'sleep<[2:Mary]>'

In the prolog f-structure file that the parser writes and which provides input to the transfer system, this would be written as

eq(attr(var(19),'PRED'), semform(sleep, 3, [var(2)], []))

where the four arguments to the semform are the predicate, the semform id, the list of thematic arguments, and the (empty) list of non-thematic arguments. This gets broken down into three transfer facts,

PRED(var(19), sleep)

says that the PRED FN of 19 is sleep.

lex_id(var(19), 3)

says that the semform id of 19's semform is 3.

arg(var(19), 1, var(2))

says that the first thematic argument of 19's semform is 2. Note that this fact has three arguments, rather than the usual two, since thematic arguments are numbered: there may be more than one for a given semform.

Two more complicated examples of semantic forms and their corresponding transfer facts follow. The f-structure of the sentence "Place the scanner on the table" contains the semantic form

PRED 'place<[9:pro], [35:scanner], [58:on]>'

In the prolog f-structure file that the parser writes, this appears as (assuming no renumbering of indices):

cf(1,eq(attr(var(0),'PRED'), semform(place,1,[var(9),var(35),var(58)],[])))

In the transfer system, this is represented as follows:

PRED(var(0),place),
arg(var(0),1,var(9)),
arg(var(0),2,var(35)),
arg(var(0),3,var(58)),
lex_id(var(0),1)

The three thematic arguments each give rise to a separate arg fact, with the numbers 1, 2 and 3 indicating which are the first, second and third argument.

The main clause of the sentence "To replace the print head, you need to perform maintenance tasks." has one thematic argument and one non-thematic argument. In its semantic form (the graphical display is not reproduced here), the non-thematic argument appears outside the angle brackets: the PRED value has the shape 'need<[163:...]>[76:...]'. The relevant transfer facts are the following:
PRED(var(0),need),
arg(var(0),1,var(163)),
lex_id(var(0),15),
nonarg(var(0),1,var(76))

Here, nonarg is used to indicate the (first and only) non-thematic argument. Semantic forms without any thematic and/or non-thematic arguments are indicated by the absence of any arg and/or nonarg facts.

Sets

The simple example of Mary sleeps does not illustrate the representation of set-valued attributes. Consider the sentence Mary saw the big black dog, which is assigned the following structure:

[f-structure display not reproduced: the OBJ "dog" carries an ADJUNCT set containing the structures for "big" and "black"]

Notice that the ADJUNCT of the OBJ of the sentence is a structure enclosed in curly brackets, the standard representation for sets. In the transfer system, sets are represented by the in_set predicate. For the above example, assuming that the OBJ is indexed 49 and that its ADJUNCT set is given the index var(5), the facts would be

ADJUNCT(var(49), var(5)),
in_set(var(59), var(5)),
in_set(var(68), var(5))

That is, the value of 49's ADJUNCT attribute is a set, which we have indexed as var(5), and there are two items in the set, namely var(59) and var(68).

LFG contains a special device for representing scope relations that would otherwise be lost, or reconstructable only as the result of a more or less complex computation based on the c-structure of the sentence. The f-structure for this sentence contains an example. The f-structures for "big" and "black" occur together as members of a set and are unordered, except for an attribute shown as ">s" in the structure for "big", which has the structure of "black" as its value. In this case, we say that "big" outscopes "black" because "big" precedes "black" in the English string. If we continue to assume that var(59) and var(68) are the indices of "big" and "black" in the representation used in the transfer component, the representation for the scope relation is

scopes(var(59), var(68))

Basic Transfer Rules

We have seen the transfer facts that are derived from the f-structure for the sentence Mary sleeps. For reference in the following example, once again these facts are

PRED(var(19),sleep),
SUBJ(var(19),var(2)),
arg(var(19),1,var(2)),
lex_id(var(19),3),
LAYOUT-TYPE(var(19),unspec),
PASSIVE(var(19),-),
STMT-TYPE(var(19),declarative),
VTYPE(var(19),main),
TNS-ASP(var(19),var(3)),
PRED(var(2),Mary),
lex_id(var(2),1),
ANIM(var(2),+),   
CASE(var(2),nom),
GEND(var(2),fem),
NUM(var(2),sg),
PERS(var(2),3),
NTYPE(var(2),var(4)),
MOOD(var(3),indicative),
PERF(var(3),-),
PROG(var(3),-),
TENSE(var(3),pres),
PROPER(var(4),name)

We now consider rules for rewriting these facts. In this section we only discuss the basic constructs of transfer rules. More advanced topics are discussed later.

Let us jump in, and look at the following (somewhat contrived) transfer grammar

" PRS (1.0) "
" Rule file must begin with this line in order to pick up current
  definitions of the transfer rule syntax"
" Comments are enclosed between double quotes"
  
"Give the transfer rule set a name:"
  
ruleset = simple_example.
  
  
"----------------------------------------------------
 Rule 1: Rewrite the verbal pred sleep to dormir
----------------------------------------------------"
  
PRED(%X, sleep), +VTYPE(%X, main) ==> PRED(%X, dormir). 
  
  
  
"----------------------------------------------------------
 Rule 2: Delete the progressive attribute if present tense
----------------------------------------------------------"
  
+TENSE(%X, pres), PROG(%X, %%) ==> 0.
  
  
  
"-------------------------------------------------------------
 Rule 3: Remove the indicative mood attribute for declarative
         statements, and replace declarative by decl
-------------------------------------------------------------"
  
STMT-TYPE(%X,declarative), +TNS-ASP(%X,%TA), MOOD(%TA,indicative) 
  ==> STMT-TYPE(%X,decl).
  
  
"-----------------------------------------------------------
 Rules 4 and 5: Rewrite Mary to either Marie or Maria
-----------------------------------------------------------"
  
PRED(%X,Mary) ?=> PRED(%X, Marie).
  
PRED(%X,Mary) ==> PRED(%X, Maria).

The first thing that a grammar must do is declare which rule syntax it is assuming. This is specified by making the first non-blank line in the rule file the comment "PRS (1.0)", which stands for Packed Rewrite Syntax, Version 1.0. If the first non-blank line does not specify which version of the syntax to use, the system will assume that the old style prolog syntax is being used.

Once the version of the syntax has been specified, the grammar must be given a name. This allows the system to have multiple sets of rules, with different grammar names, loaded at one time. The rules are named at the top of the file by the statement

ruleset = <name>.

It is also possible to use the older form of the declaration

grammar = <name>.

(The names fs_triples, sem_triples and kr_triples are reserved to identify system-defined rules that map f-structures, semantic representations and KR representations, respectively, to a triples format used in running regression tests.  You should not use these names unless you intend to alter the behavior of test suite comparisons). Note that this statement must be terminated by a period, as must the rules that follow it.

Variables in Patterns

The first rule rewrites the English semantic predicate "sleep" as the French predicate "dormir". There are a number of things to notice about this rule. The first is that it contains variables. A variable is written as an atom that starts with a percent sign (%). Thus %X, %TA and %% from the rules provided above are all variables. Variables are used to match arguments, or parts of arguments, in transfer facts. Thus the rule pattern PRED(%X, sleep) matches the transfer fact PRED(var(19), sleep), setting the variable %X to the value var(19). All other occurrences of the variable %X in the same rule will get instantiated to the value var(19) as a result of the match. (The scope of a variable is limited to a single rule: occurrences of the same variable in different rules are not linked, and instantiating a variable in one rule will not affect any of the variables with the same name in other rules.)

The rule itself consists of a lefthand side, a rewrite arrow, and a righthand side. The lefthand side is a comma-separated sequence of patterns that are intended to be matched against individual transfer facts. The rewrite arrow can be either ==> (obligatory rewrite) or ?=> (optional rewrite). For an obligatory rule, if all the patterns on the lefthand side can be matched against facts, then the (instantiated) patterns on the righthand side must be added to the set of transfer facts. For an optional rule, there is a choice: either apply the rule or not. This choice has the effect of forking the transfer rewriting process along two separate and independent paths. We will only consider the effects of optional rules when we discuss rules 4 and 5 from the example above.

To see how rule 1 operates, let us match the first pattern on the left hand side with the transfer fact PRED(var(19), sleep). This has the effect of instantiating the rest of the rule as shown (with the already matched pattern shown italicized)

PRED(%X, sleep), +VTYPE(%X, main) ==> PRED(%X,dormir)

matches

PRED(var(19), sleep)

to give

PRED(var(19), sleep), +VTYPE(var(19), main)
  ==> PRED(var(19), dormir)

Note how all occurrences of %X are instantiated to var(19).

Consumption of Transfer Facts

The second pattern on the lefthand side of rule 1 is now an exact match with the input fact VTYPE(var(19), main). The significance of the + sign preceding the pattern is as follows. Normally, when a fact matches a pattern on the lefthand side of a rule, that fact is "consumed" in the sense that it is removed from the set of transfer facts. (Or rather, it is removed once all the other patterns on the lefthand side have been successfully matched and the rule is applied.) Thus by matching the fact PRED(var(19), sleep) against the pattern in the rule, this fact is taken out of the input set of transfer facts. The + sign preceding a pattern means match against a fact, but don't consume it. That is, retain it in the set of transfer facts. So, after matching the second pattern, we have a complete match of the lefthand side, and know that application of the rule will remove PRED(var(19), sleep) but keep VTYPE(var(19), main) in the set of facts.

The righthand side of the matched rule now serves as an instruction to add a new transfer fact to the set of facts, PRED(var(19), dormir). Thus, if we look at the input facts affected by this rule

PRED(var(19), sleep), VTYPE(var(19), main)

we can see that after applying the rule the relevant output facts are

PRED(var(19), dormir), VTYPE(var(19), main)

That is, we have removed PRED(var(19), sleep) and replaced it by PRED(var(19),dormir).

Anonymous variables and empty RHSs

The modified set of transfer facts obtained from running Rule 1 serves as input to Rule 2 (repeated below for convenience):

+TENSE(%X, pres), PROG(%X, %%) ==> 0.

The first pattern matches the fact TENSE(var(3), pres). The second pattern, now instantiated to PROG(var(3), %%), matches the fact PROG(var(3), -). The %% is an anonymous variable. This matches in the same way as an ordinary variable, but does not lead to any instantiation of the variable. Thus multiple occurrences of %% in a rule can match with different items. The effect of the pattern in this rule is to find the progressive attribute for var(3) without caring about the value of the attribute represented as %%.

The use of anonymous variables to represent "don't care" values is strongly encouraged. When the rule compiler finds an instance of a normal variable that has just a single occurrence in a rule, it issues a warning message naming the offending singleton variable. If you are consistent about the use of anonymous variables to represent "don't care" values, these singleton variable warning messages are a useful way of detecting possible typos in your rules. A very common mistake is mistyping variable names (e.g. %Subj and %SUBJ) when they are both intended to refer to the same item. Such typos usually result in at least one of the variables being a singleton, and the warning message will alert you to this. However, if you consistently use singleton non-anonymous variables for "don't care" values, messages that might alert you to the presence of typos will get drowned out in a flood of innocuous warnings. A variable name beginning with a double percent (e.g. %%temp) is a non-anonymous singleton variable. The rule compiler will not complain about singleton occurrences of such non-anonymous singletons. But multiple occurrences of these variables within a rule are treated as linked.

The righthand side of the second rule, 0, is how we say that no new facts are to be added as the result of applying the rule. Thus, applying the rule means that we match and keep

TENSE(var(3),pres)
but match and discard

PROG(var(3), -)

In other words, the rule removes the single fact PROG(var(3), -) from the list of transfer facts, and passes the updated set of transfer facts on as input to the next rule.

Rule 3 illustrates the use of an intermediate variable, %TA, to link structures.

STMT-TYPE(%X, declarative), +TNS-ASP(%X, %TA), MOOD(%TA, indicative) 
  ==> STMT-TYPE(%X, decl).

The rule is meant to apply to declarative statements with indicative mood, as determined by checking whether %X has a STMT-TYPE attribute with the value 'declarative' and a TNS-ASP attribute that in turn has a MOOD attribute with the value 'indicative'. The variable %TA is used to link the mood to the statement type via the intermediate TNS-ASP structure. The effect of the rule is two-fold: it removes the MOOD attribute, if indicative, from the TNS-ASP of any declarative statement (while keeping the rest of TNS-ASP in place thanks to the + preceding the pattern), and it rewrites "declarative" as "decl".

This is one of those cases where misspelling %TA would be a problem. If we had accidentally broken the link between variables by writing

STMT-TYPE(%X, declarative), +TNS-ASP(%X, %TA), MOOD(%T_A, indicative) 
  ==> STMT-TYPE(%X, decl)

the rule would depart from the original intention of affecting only declarative statements with indicative mood. Instead, it would remove all indicative mood attributes, regardless of whether they belong to declarative statements, so long as there is a declarative statement type somewhere in the sentence (which would be rewritten as 'decl'). However, this version of the rule would cause a warning about %TA and %T_A being singleton variables, which would alert us to the link between variables being broken by a misspelling.

Optional Rules

Rules 4 and 5 operate as a pair, stating that "Mary" can translate as either "Marie" or "Maria":

PRED(%X,Mary) ?=> PRED(%X, Marie).
PRED(%X,Mary) ==> PRED(%X, Maria).

The first rule is optional, and says that we have a choice of replacing "Mary" by "Marie" (when we apply the rule), or leaving "Mary" as it is (when we don't apply the rule). The second rule is obligatory, and says that we must replace "Mary" by "Maria". However, the second rule will only match when we opt not to apply the first, optional, rule. This is because applying the first rule will remove the fact PRED(var(2),Mary) from the list of facts passed on as input to the next rule. And with this fact removed, the second rule no longer matches anything. But if we do not apply the first rule, then the fact PRED(var(2),Mary) remains in the list of facts, and the next rule obligatorily rewrites "Mary" to "Maria".

If we wanted to have three options for translating "Mary": "Marie", "Maria" or leave it as "Mary", then we would use the following

PRED(%X, Mary) ?=> PRED(%X, Marie).
PRED(%X, Mary) ?=> PRED(%X, Maria).

The effect of making the second rule optional is to further split the possibility space. The first rule splits the possibilities in two: either leave "Mary" in place or replace it with "Marie". Under the first option, where "Mary" is left in place, the second rule provides a second split: either continue to leave "Mary" in place or replace it with "Maria". This gives three possibilities in total.

          Rule 4            Rule 5
Mary  --> Marie
      --> Mary    -->  Maria
                  -->  Mary

Contexted Rewriting

(Note: You don't need to understand this section on contexted rewriting in order to write transfer rules. But it will probably help, and it will also assist you in interpreting the output of the transfer system, especially when it is being run in debug mode.)

Optional rules, as we have just seen, have the effect of splitting up the space of transfer alternatives. The transfer system is able to handle these splits efficiently (so that the alternatives do not multiply out) by making use of contexted rewriting. Up to now, it has deliberately not been mentioned that transfer facts are contexted. This is because in our simple example, all the input facts existed in a single, true context. However, with the application of optional rules, we now have some facts that only exist in the context where the optional rule is applied, some facts that only exist in the context where the rule is not applied, and other facts that exist in either context. At the start of applying rule 4, we have the fact PRED(var(2),Mary) existing in the single true context, alongside all the other input facts. We can represent this as follows (where 1 stands for the true context):

cf(1, PRED(var(2),Mary))
cf(1, GEND(var(2), fem))
....

After matching this fact against the lefthand side of rule 4, the single true context gets split into two disjoint sub-contexts (call them A1 and A2). In context A1 the rule is applied, and the fact PRED(var(2),Mary) is replaced by PRED(var(2),Marie). In context A2 the rule is not applied, so we keep PRED(var(2),Mary) but do not add PRED(var(2),Marie).  However, all the other facts are unaffected, and continue on in the true context, which covers both A1 and A2. The transfer system represents this situation in two parts, as follows. First, it records that the optional rule has affected the space of possible choices, so that the single unitary choice has now been split into two disjoint parts. This corresponds to an equivalence

1 <-> xor(A1, A2)
which says that the true context is logically equivalent to the disjunction of the two pairwise disjoint subcontexts, A1 and A2.   Secondly, the system records which facts hold under which contexts, thus
cf(A1, PRED(var(2),Marie))
cf(A2, PRED(var(2),Mary))
cf(1,  GEND(var(2), fem))
....

These contexted facts provide input to the next rule. Let us assume that this is the optional version of the rule that rewrites "Mary" to "Maria".  We can see that the lefthand side of the rule only matches with a fact that holds in the A2 context. This means that under context A2, and A2 only, we have the option of applying the rule or not. This splits context A2 into two disjoint parts (call them B1 and B2). In context B1 the second rule is applied, and "Mary" is replaced by "Maria". In context B2 the rule is not applied, and "Mary" remains in place. The resulting choice space and set of contexted facts is now
1  <-> xor(A1, A2)
A2 <-> xor(B1, B2)

cf(A1, PRED(var(2),Marie))
cf(B1, PRED(var(2),Maria))
cf(B2, PRED(var(2),Mary))
cf(1, GEND(var(2), fem))
....

Hopefully, this brief description of contexted rewriting will help in understanding the effects of optional rules. However, optional rules do not conceptually depend on having contexted rewriting. You can also think about them in uncontexted terms. Thus whenever you match an optional rule, the output facts are split into two separate and complete copies, one where the rule is applied and the other where it is not. Independent, non-interacting rewriting processes apply to each copy and continue in parallel, perhaps splitting further as other optional rules are encountered. Thus all transfer rules should be explicable in terms of what they do when applied to a single, uncontexted set of facts. Put another way, whether the inputs to your rules are going to be contexted or uncontexted should make no difference to the way that you write these rules.

This last point is important when thinking about how to write rules that can manipulate packed f-structure input. It says that you should not be thinking about this in the first place! Just write the rules so that they apply correctly to any single, unambiguous f-structure that can be unpacked from the f-structure chart. The transfer system implementation in terms of contexted rewriting will then take care that these rules do the right thing when applied to the packed chart. This also means that it is safe (and a lot easier!) to develop and debug rules by applying them only to single selected (i.e., unpacked) f-structures.

Templates

There are two devices for notational abbreviation that can make transfer rules clearer and easier to write: templates and macros. Templates are parameterized abbreviations for entire rules, or even sequences of rules. Macros are parameterized abbreviations for commonly occurring sequences of patterns.

To illustrate the use of templates, consider the following (simplified) rule for translating a pair of nominal preds:

PRED(%X, man), +NTYPE(%X, %%) ==> PRED(%X,homme).

The above transfer rule checks that the "man" PRED has an NTYPE attribute, so that it doesn't accidentally apply to instances of the verb "to man" (e.g., Man the barricades!). It then replaces "man" with "homme". This is a commonly occurring pattern; since many English nouns are homonymous with verbs, it is necessary to do an NTYPE check when translating them. However, it would be tedious and error prone to write different copies of the above rule over and over again for different noun-noun pairs. So instead, we can define a template as follows:

noun_noun(%English, %French) ::
  PRED(%X, %English), +NTYPE(%X, %%) ==> PRED(%X, %French).

We can then write a series of highly abbreviated rules by calling this template with different values for the %English and %French parameters:

@noun_noun(man,   homme).
@noun_noun(woman, femme).
@noun_noun(girl,  fille).
....

When the templates are called during rule compilation, these rules will expand out to

PRED(%X, man),   +NTYPE(%X,%%) ==> PRED(%X, homme).
PRED(%X, woman), +NTYPE(%X,%%) ==> PRED(%X, femme).
PRED(%X, girl),  +NTYPE(%X,%%) ==> PRED(%X, fille).

A significant advantage of using templates is that if you realize that you need to alter the way you handle noun-noun translation, it is only necessary to edit a single template definition, and not multiple instances of the rules it generates.

Templates can also define sequences of rules. We could, for example, define a template called 'intrans_refl' for verbs in English that translate into French as ordinary transitives when they occur with a direct object in English but as reflexives when they occur without an object.

intrans_refl(%English, %French) ::

PRED(%X, %English), +OBJ(%X, %%) ==> PRED(%X, %French);

PRED(%X, %English) ==> REFLEXIVE(%X, +), PRED(%X, %French).

An example template call would be

@intrans_refl(stop, arrêter).

This single template call expands out into two rules:
PRED(%X, stop), +OBJ(%X, %%) ==> PRED(%X, arrêter).

PRED(%X, stop) ==> REFLEXIVE(%X, +), PRED(%X, arrêter).

The first rule will consume any instances of "stop" where an object is present, and replace it by "arrêter". Since the fact PRED(%X, stop) is consumed by applying the first rule, the second rule cannot apply if the first rule matches. Thus only if the verb "stop" occurs without an object will the second rule replace it by the reflexive version of "arrêter". Note how the arguments to the template, %English and %French, are shared between the rule instances, unlike the variable %X, which is separately scoped within the individual rules.

The definition of a template must occur in the grammar somewhere preceding its first use, or call. The definition takes the following form (the final period being of crucial importance):

template :: Rules.

If the definition involves a sequence of more than one rule, the rules are separated by semicolons. Calls to templates can (optionally) be marked by preceding the template name with an at-sign (@).

Macros

Macros provide a shorthand for sets of patterns (or patterns and other macro calls) rather than sequences of rules. The following are examples of macro definitions:

pronoun(%X, %Person, %Number, %Case) :=
  PRED(%X, pro), PERS(%X, %Person), NUMBER(%X, %Number), CASE(%X, %Case).

verb_subj(%X, %Verb, %Subject) :=
  PRED(%X, %Verb), SUBJ(%X, %Subject).

verb_subj_obj(%X, %Verb, %Subject, %Object) :=
  verb_subj(%X, %Verb, %Subject), OBJ(%X, %Object).

Using these definitions, we might write the following (not quite correct) rules to enable us to translate intransitive uses of the English verb "know" (as in "I know") as a French transitive verb with a third person singular dummy object--i.e., "Je le sais".

@verb_subj_obj(%X, know, %Subj, %Obj) ==>
   @verb_subj_obj(%X, savoir, %Subj,%Obj).

@verb_subj(%X, know, %Subj) ==>
   @verb_subj_obj(%X, savoir, %Subj, %Obj), @pronoun(%Obj, 3, sing, acc).

The first rule picks up any transitive uses of the verb "know" and translates it as "savoir", preserving both the subject and the object. Given the ordering of the rules, the second rule will only fire if there is a use of "know" without an object (since the first rule will have consumed all occurrences of "know" with an object). In this case, we create a transitive version of "savoir", with a third person singular pronoun as an object. The second rule expands out to the following

PRED(%X, know), SUBJ(%X, %Subj) ==>
   PRED(%X, savoir), SUBJ(%X, %Subj), OBJ(%X, %Obj),
   PRED(%Obj,pro), PERS(%Obj, 3), NUMBER(%Obj, sing), CASE(%Obj, acc).

As we will see directly below, the above rule for creating a transitive version of "savoir" is not quite complete. But to see why, we must look at how transfer facts get converted back to f-structures.

Converting Transfer Facts to F-Structures

When converting f-structures to transfer facts, nearly all f-structure attribute-value pairs correspond to binary f-structure facts, e.g.

19:[SUBJ    2:[...]]
     <->
SUBJ(var(19), var(2))

The same is true when converting transfer facts back into f-structures: most binary facts correspond directly to attribute-value pairs. The exception, in both directions, comes up in the case of semantic forms. As noted before, the semantic form values of the PRED attribute are decomposed into PRED, lex_id, arg and nonarg facts. Thus when converting back to f-structures, these facts must be re-assembled to construct a PRED-SemanticForm attribute-value pair. In most cases, this is just a simple inverse of the conversion from f-structures to transfer facts. But sometimes transfer rules alter the basic argument structure of semantic forms, and in these cases the conversion back can be more involved. We have just seen one example of how a transfer rule can alter argument structure, when adding a pronominal object in translating intransitive "know" into transitive "savoir". Arguments can also be removed, with passivization being a common case.

Adding Semantic Form Arguments

The most common mistake in rules adding a new argument to a semantic form is to neglect the arg and nonarg transfer facts. The rule we previously gave for converting the English intransitive verb know into the French transitive verb savoir commits this error (repeated below for convenience):

PRED(%X, know), SUBJ(%X, %Subj) ==>
  PRED(%X, savoir), SUBJ(%X, %Subj), OBJ(%X, %Obj),
  PRED(%Obj, pro), PERS(%Obj, 3), NUMBER(%Obj, sing), CASE(%Obj, acc).

The rule creates a new object for the verb, but it does not include the new object in the list of the verb's thematic arguments. Grammar writers may have an implicit obliqueness hierarchy in mind, so that e.g. objects always correspond to the second thematic argument in semantic forms. But no such hierarchy is hard-wired into the conversion from transfer facts back to f-structures; it must therefore be made explicit. Thus the correct rule should be

PRED(%X, know), SUBJ(%X, %Subj) ==>
  PRED(%X, savoir), SUBJ(%X, %Subj), OBJ(%X, %Obj),
  PRED(%Obj, pro), PERS(%Obj, 3), NUMBER(%Obj, sing), CASE(%Obj, acc),
  arg(%X, 2, %Obj).

where the new object has been explicitly included as the second thematic argument by adding the last line arg(%X, 2, %Obj). It is not necessary to say anything about the subject being the first thematic argument. If nothing is done to consume it, the input fact arg(%X, 1, %Subj) will be passed through transfer to preserve the required output fact. The same is true of the input lex_id fact; provided nothing consumes or alters the fact, the French semantic form will have the same semform id as the English semantic form.

If the additional arg pattern is not included in the rule's righthand side, a semantic form will still be created for the French output. However, it will be a semantic form like

PRED 'savoir<[2:je]>'

where the presence of the object grammatical function is not reflected in the semantic form. Strictly speaking, there is nothing wrong with semantic forms like this except that the XLE will not be able to generate from it because the OBJ is ungoverned.

Semantic Forms Added by Transfer Rules

The know-savoir rule does not only introduce an additional thematic argument; it creates a whole new node and semantic form for the pronominal object. What is the name of this new f-structure node, what is the semform id of its semantic form, and what are its thematic and non-thematic argument lists? The rule does not appear to say anything about this.

First, what are the thematic and non-thematic arguments of the new f-structure node? Because the rule does not add any arg or nonarg facts for the object, these lists are taken to be empty when converting back to f-structure. To create non-empty argument lists we would have to explicitly add arg and nonarg facts.

Second, what is the name of the new f-structure node? Note how the rule contains a variable, %Obj, on the righthand side that does not occur on the lefthand side. Whenever a new variable is introduced on the righthand side of a rule, it will be instantiated to a brand new constant of the form var(n), where n is an integer that does not clash with any previously encountered f-structure node number. This instantiation, unlike the next, is performed by the transfer system when the rule is applied.

Third, what is the new semform's id? When composing PRED-SemanticForm attribute-value pairs, the transfer fact to f-structure conversion first looks for any facts of the form PRED(%X, P) and collects all the lex_id, arg and nonarg facts pertaining to %X. If there is no lex_id fact for %X, then the conversion process will create one, using a brand new numerical identifier that does not clash with any previously encountered semform ids.
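Putting these three answers together: assuming the verb node is var(0), and that the system chooses the fresh node var(20) and the fresh semform id 16 (all three values are hypothetical), the output facts for the corrected know-savoir rule would look roughly like

PRED(var(0), savoir),
SUBJ(var(0), var(2)),
OBJ(var(0), var(20)),
arg(var(0), 1, var(2)),          passed through from the input
arg(var(0), 2, var(20)),
PRED(var(20), pro),
lex_id(var(20), 16),             created during conversion
PERS(var(20), 3),
NUMBER(var(20), sing),
CASE(var(20), acc)

and the recomposed PRED would have the form 'savoir<[2:je], [20:pro]>'.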

Deleted Semantic Form Arguments

Deleting an argument generally requires removing all facts pertaining to the argument. Given the way that semantic forms are recomposed, it is not strictly necessary to remove the lex_id, arg and nonarg facts that constitute the removed argument's semantic form, provided that its PRED fact is removed. However, you should take care to remove arg and nonarg facts from any higher level semantic forms that included the deleted item as an argument. This removal can be problematic if the deleted item was not the last argument in the list. This is because the numbers of all the arguments following the deleted item should be reduced by 1 to take account of the deletion. This can be very cumbersome to express with the current transfer rule formalism.

One special case has been taken care of, however. Passive verb phrases without an agentive by-phrase usually give rise to a semantic form where NULL is the first thematic argument, and the subject is the second argument. A passivization transfer rule might thus delete all facts to do with the active subject, including the fact that it was the first thematic argument to the active verb. Rather than renumbering subsequent arg facts, or including an explicit arg(%X, 1, 'NULL') fact, it is sufficient just to delete the arg fact for the active subject.

When recomposing semantic forms, whenever there is a (non)arg(X, N+1, Arg) fact but no (non)arg(X, N, Arg) fact, a (non)arg(X, N, 'NULL') fact will automatically be created. So, by just deleting an active subject, we will get the required NULL first argument in the semantic form. However, NULL values will also be given to all arguments missing from the middle of a list, which may not be what you want.
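To make this concrete, reusing the earlier "place" facts (the scenario is illustrative): if a passivization rule consumes arg(var(0), 1, var(9)) together with the other facts about the active subject var(9), the surviving facts

PRED(var(0), place),
arg(var(0), 2, var(35)),
arg(var(0), 3, var(58)),
lex_id(var(0), 1)

recompose to a semantic form of the shape 'place<NULL, [35:scanner], [58:on]>', with the NULL first argument supplied automatically.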

Unconvertible Facts

It is possible to write rules that produce transfer output that cannot be converted back to f-structures. Apart from arg and nonarg facts, only binary facts can be converted to attribute value pairs. For unary and n-ary facts (n>2), rather than have conversion fail, dummy attribute-value pairs of the form

eq(attr(null, '$unconvertible_attribute'), Fact)

will be included in the f-structure, where Fact is just the offending transfer fact. To generate from such f-structures, you will need to remove or otherwise manipulate these dummy pairs.

Advanced Transfer Rule Constructions

This section discusses some more advanced constructions in transfer rules, in approximate order of relative usefulness and non-obscurity. Two of these, negated patterns and built-in macros for handling variable predicates, are constructs that you will very likely have cause to use, but they need to be approached with care.

Splitting Rules Across Files

For large rule sets, it is often convenient to split rules across multiple files. This can be done by means of include statements, e.g.
include(subfile1).
include(subdir/subfile2).

The path names of the included rule files are taken relative to the location of the file that is including them. The included files should not specify their own grammar names.


Negated Patterns

It is possible to include negated patterns on the lefthand sides of rules. These say that the rule can apply only if there are no facts that match the negated pattern. For example, the rule
STMT-TYPE(%X,declarative), +TNS-ASP(%X,%TA), -MOOD(%TA,%%) ==> STMT-TYPE(%X,decl).
says that statement type "declarative" must be rewritten as statement type "decl", but only if its TNS-ASP structure has no MOOD attribute. The minus sign prefixing a pattern makes it a negated pattern. As so often with negation, it all turns out to be much more complex than it seems at first sight, and caution is required.

To see why, suppose for the sake of argument that we are trying to match the above rule against the following contexted facts
1 <-> xor(A1, A2)
A1 <-> xor(B1,B2)

cf(A1, STMT-TYPE(var(19),declarative)),
cf(A2, STMT-TYPE(var(19),imperative)),
cf(1,  TNS-ASP(var(19),var(3))),
cf(B1, MOOD(var(3), indicative)),
cf(1,  STMT-TYPE(var(7),declarative))
cf(1,  TNS-ASP(var(7), var(8))),
cf(1,  MOOD(var(8), indicative))

These facts correspond to a structure having at least two clauses with declarative statement types (var(19) and var(7)). Moreover, var(19) is (a) ambiguous between being declarative or imperative, and (b) when declarative is further ambiguous between having a mood attribute or not.

Where negation is concerned, the order in which patterns are matched makes a significant difference. If we were to try to match -MOOD(%TA,%%) before anything else, we would find that there are two positive matches, namely cf(B1, MOOD(var(3), indicative)) and cf(1,  MOOD(var(8), indicative)). This would suggest that the negative match only succeeds in the context not(1 or B1) = 0. That is, the negative pattern fails, and hence the rule cannot be applied. By matching the negative pattern when it is completely uninstantiated, we are effectively checking that there are no mood attributes of any kind, anywhere in the sentence.  But this is probably not what we intended the rule to mean.

Suppose instead that we first matched the other two (positive) patterns on the lefthand side. There are two distinct ways in which they can match. In the first, we get a partial match
STMT-TYPE(var(7),declarative), +TNS-ASP(var(7),var(8)), -MOOD(var(8),%%) ==> STMT-TYPE(var(7),decl).
Since the two matching facts are in context 1, this partial match takes place in context 1. The negative pattern now checks that there is no mood attribute for var(8). However, there is one in context 1, so the negative pattern only matches in the false context 0. Thus, with these instantiations the rule only matches in context and(1, 0): i.e. it fails.

But there are other facts that can match the first two patterns in the rule
STMT-TYPE(var(19),declarative), +TNS-ASP(var(19),var(3)), -MOOD(var(3),%%) ==> STMT-TYPE(var(19),decl).
This partial match holds in context and(A1, 1) = A1  (since the statement type in context A2 does not match the first pattern). The negative pattern now asks us to check that there is no mood attribute for var(3). However, in context B1 there is such an attribute. Thus the negative pattern only matches in context not(B1). This means that overall the lefthand side matches, with the instantiations shown, in context and(A1, not(B1)).  We can simplify the description of the matching context somewhat.  Context A1 is split into two parts, B1 and B2. The only way we can be in A1 but not in B1 is if we are in B2. Thus and(A1, not(B1)) = B2 .

From this example, we can see two of the difficulties raised by negation.  First, it makes a difference when you try to match a negative pattern. The grammar writer has no way of controlling this, and should be aware that the system always matches negative patterns as late as possible, when they are maximally instantiated by matching positive patterns.

We can explain the consequences of this late evaluation of negation more logically as follows. There is a significant difference between the logical formulas
forall X. not( P(X) )      i.e. no X is a P
not( forall X. P(X) )     i.e. not every X is a P, or some X is a not-P
Matching a negative pattern early, before its variables are instantiated by other positive patterns, is like treating the variables as being bound by a quantifier within the scope of the negation, as in the second formula.   Matching a negative pattern late, where all the variables that can be instantiated by positive patterns have been instantiated, puts the variables' quantifiers outside the scope of the negation. In looking at rules involving negation, we therefore have to be clear about the intended scope of quantified variables with respect to negation. And the intended scope is that
  1. All variables that have even one occurrence in a positive pattern on the lefthand side  of the rule are bound by a universal quantifier with wide scope over the entire rule
  2. Any variables that occur only inside a negative pattern are treated as bound by a universal quantifier that only has scope within the negation. This means you will gain no linkage by using the same variable in two different negative patterns unless the variable also occurs in a positive pattern.
  3. Variables that occur only in the righthand side of the rule are treated as bound by existential quantifiers that have scope only over the rule's right hand side. This means that new variables introduced on the right will get replaced by arbitrary new (skolem) constants.  These constants always have the form var(N), where N is an integer that does not clash with any previous constant or f-structure node.
These rules of interpretation only become important to understand once your rules include negative patterns.  This is the first motivation for being sparing in your use of negative patterns.
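For instance, in the following illustrative rule (not from the original text), the two occurrences of %Y are not linked, since %Y never occurs in a positive pattern; each negated pattern independently checks for the absence of any matching fact:

PRED(%X, %%), -SUBJ(%X, %Y), -OBJ(%X, %Y) ==> bare_pred(%X).

The rule fires only for a pred %X that has no subject at all and no object at all; it does not test for a single %Y that is both the subject and the object.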

The second difficulty raised by negation, and motivation for its avoidance,  is that it can be quite expensive to compute.  To match a negative pattern, you have to find all the positive matches, disjoin the contexts of the positive matches, and negate the result. This negative context then needs to be conjoined with the context in which the remainder of the rule match holds.  While there are more efficient ways of doing this than naively following the preceding description, it is still computationally expensive.

But having got this far, let us complete the example. We have seen that the rule with a negative pattern finds an overall match in context B2. Since the first pattern in the rule matched the fact cf(A1, STMT-TYPE(var(19),declarative)), this means that this fact gets consumed, but only in context B2.  Since A1 is split into B1 and B2, this means that in context B1 the fact remains in place, while in context B2 it is removed and replaced by the right hand side of the rule. This means that after matching and applying the rule, we are left with the following facts
1 <-> xor(A1, A2)
A1 <-> xor(B1,B2)

cf(B1, STMT-TYPE(var(19),declarative)),
cf(B2, STMT-TYPE(var(19),decl)),
cf(A2, STMT-TYPE(var(19),imperative)),
cf(1,  TNS-ASP(var(19),var(3))),
cf(B1, MOOD(var(3), indicative)),
cf(1,  STMT-TYPE(var(7),declarative))
cf(1,  TNS-ASP(var(7), var(8))),
cf(1,  MOOD(var(8), indicative))

Avoiding Negated Patterns

There are circumstances where negated patterns are unavoidable. But by making use of rule ordering and the way that rule applications consume facts, negated patterns can often be avoided. Let us recall our macros and rules for translating transitive and intransitive "know" into transitive "savoir". By making use of negation, we could define macros for transitive and intransitive verbs as follows
verb_intrans(%X, %Verb, %Subject) :=
   PRED(%X, %Verb), SUBJ(%X, %Subject), -OBJ(%X,%%).
verb_trans(%X, %Verb, %Subject, %Object) :=
   PRED(%X, %Verb), SUBJ(%X, %Subject), OBJ(%X,%Object).
The first macro says that an intransitive verb is one that has a subject but lacks an object, while the second says that a transitive verb is one that has both a subject and an object. Using these macros, it does not matter what order we write the know-savoir rules in:
@verb_intrans(%X, know, %Subj) ==>
   @verb_trans(%X, savoir, %Subj, %Obj),
   arg(%X, 2, %Obj),
   @pronoun(%Obj, 3, sing, acc).

@verb_trans(%X, know, %Subj, %Obj) ==>
   @verb_trans(%X, savoir, %Subj, %Obj).
The first rule, for intransitive "know", will be blocked if the verb has an object, and does not rely on having consumed all transitive instances of the verb before the rule is matched.

This approach is not without problems. First, we should really be a little more careful about stating the negative restrictions on transitive and intransitive verbs. An intransitive verb must lack not only an object, but also an indirect object, an xcomp, a comp, an oblique, and so forth. Likewise, a transitive verb should have both a subject and an object, but lack an indirect object, an oblique, etc. To make the rules and macros truly order independent, we need to state a lot of negative conditions. Second, while it is safe to use the verb_trans macro on the righthand side of a rule, it would make no sense to use the verb_intrans macro there. This is because it contains a negative pattern, and negative patterns are only possible on the lefthand sides of rules.

The original formulation of the rules avoided the use of negation.  Instead we defined macros for verbs that have subjects, and for verbs that have subjects and objects (where the former includes the latter).   We then ordered the rules so that they consumed all instances of verbs with both subjects and objects before matching against any remaining verbs with subjects.  To repeat the definitions:
verb_subj(%X, %Verb, %Subject) :=
   PRED(%X, %Verb), SUBJ(%X, %Subject).
verb_subj_obj(%X, %Verb, %Subject, %Object) :=
    @verb_subj(%X, %Verb, %Subject), OBJ(%X, %Object).
@verb_subj_obj(%X, know, %Subj, %Obj) ==>
   @verb_subj_obj(%X, savoir, %Subj,%Obj).

@verb_subj(%X, know, %Subj) ==>
   @verb_subj_obj(%X, savoir, %Subj, %Obj),
   arg(%X,2,%Obj),
   @pronoun(%Obj, 3, sing, acc).

Of course, this way of using rule ordering and consumption of  facts only works for obligatory rules. If the transitive translation were optional, then the subsequent intransitive rule would match transitive verbs to which the first rule had not been applied.

Built-in Macros for Parameterized Predicates

The current version of the transfer system comes with two macros already defined: complex_term and gf. These macros are indisputably useful, but the formalism does not give users the flexibility that would be required to define them themselves, which is why they are built in.

The macro call @complex_term(A, A1 ... Ak) is equivalent to A(A1 ... Ak); however, it provides partial relief from the requirement that functors must always be written as literal atoms. A user can write a variable name as the first argument of complex_term, provided it is known that that variable will have an atomic constant as its value at the time when the rule calling the macro comes to be compiled. Thus, for example, one may define a macro such as the x_arg macro sketched below

and then use it to switch subjects and objects, thus reusing one x_arg macro where separate ones would otherwise be required for the SUBJ and OBJ cases (see the sketch below). A shorter form is provided for the case where the predicate with the variable functor will have exactly two arguments. It is called gf, and the sketch below also shows it used in an alternative formulation of the same example. The difficulty with these macros lies in guaranteeing that the variable predicate is instantiated to an atom when the rule is compiled. Normally, these built-in macros are used in the body of a template definition, where one of the arguments to the template provides the predicate.
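An illustrative sketch (the macro definition and the subject/object-switching rules are reconstructions, not the original examples):

x_arg(%GF, %X, %Y) :=
   @complex_term(%GF, %X, %Y).

"Switch subjects and objects, via x_arg:"
@x_arg(SUBJ, %X, %S), @x_arg(OBJ, %X, %O) ==>
   @x_arg(OBJ, %X, %S), @x_arg(SUBJ, %X, %O).

"The same rule using the two-argument shorthand gf:"
@gf(SUBJ, %X, %S), @gf(OBJ, %X, %O) ==>
   @gf(OBJ, %X, %S), @gf(SUBJ, %X, %O).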

Variables over Predicates

The gf and complex_term macros give the illusion of  being able to have variables over predicates as well as over arguments.  But it is only an illusion, since the actual values of the predicates must be known at rule compilation time. The macros allow one to parameterize rule definitions, but they do not provide run-time variables over predicates.

To express run-time quantification over predicates the rules can make use of the qp construction (for "quantification over predicates").  For example, to convert all attribute values "yes" to "+", we could write
qp(%P, [%X, yes]) ==> qp(%P, [%X, +]).
Here, %P ranges over predicates (such as SUBJ, PROG, PRED).  The arguments to the predicate are enclosed between square brackets.  In this case, we are looking only at predicates that take two arguments.

Quantification over predicates provides a significant degree of expressivity in the formalism. The price paid for this is that in matching qp's against transfer facts, the system has to inspect each input fact for a possible match. Thus, although
qp(SUBJ, [%X, %Y])
SUBJ(%X, %Y)

both match exactly the same facts, the system will inspect every single input fact in the first case, but only the SUBJ facts in the second case.

Splicing Lists of Facts into the RHS

There are circumstances where it is useful to generate a list of new facts on the lefthand side of a rule, and splice that list onto the righthand side. For example, assuming some procedural call that is known to return a list of new facts, we might have
lhs_condition, {get_new_facts(%List)} ==> rhs_new_fact(1), $splice(%List).
The special (rhs only) operation $splice([F1, ... Fn]) will splice its list of facts into the righthand side of the rule. In the example above, the procedural call must return a list of facts enclosed between square brackets. Assuming that %List gets instantiated to [f1, f2, f3], the righthand side resulting from the rule application will be
rhs_new_fact(1), f1, f2, f3
It is also possible to use $and(f1,f2,f3) instead of $splice([f1,f2,f3]).

Left- and Right-Handed Macros

Macro definitions that contain patterns prefixed by a + or a - cannot validly be called from the righthand sides of rules. It is therefore possible to provide alternative expansions of a macro, one for use on the lefthand side and one for use on the righthand side:
<macro> :=
   <left-handed definition> * <right-handed definition>.
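As an illustrative sketch (not from the original text), the verb_intrans macro of the earlier negation example could be given a right-handed expansion that simply drops the negated pattern:

verb_intrans(%X, %Verb, %Subject) :=
   PRED(%X, %Verb), SUBJ(%X, %Subject), -OBJ(%X, %%)
   * PRED(%X, %Verb), SUBJ(%X, %Subject).

Called on the lefthand side of a rule, this checks that the verb lacks an object; called on the righthand side, it simply produces the PRED and SUBJ facts.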

Procedural Attachments

The transfer formalism allows for procedural attachments on the lefthand sides of rules, in the same way that definite clause grammars in prolog do. A procedural attachment allows one to attach an arbitrary piece of prolog code to a rule to impose more complex matching conditions. This might sound like an extremely powerful and useful tool. In fact, it provides sufficient rope to hang not only yourself, but also everyone in your near vicinity. For this reason, the way in which you can declare prolog predicates to be accessible procedural attachments has deliberately not been documented.

One of the few cases where procedural attachments can be useful is in calling on prolog predicates to test for identity. Suppose that you wished to test for set-valued attributes containing two or more distinct elements.  We could define a macro
non_singleton(%Set) :=
   +in_set(%M1, %Set), +in_set(%M2, %Set), {\+ %M1 = %M2}.
The procedural attachment is shown between braces, and calls on prolog to check that %M1 is not equal to %M2 (\+ is prolog's notation for negation, and it uses = for identity/unifiability). Without the negative equality check, we could pick the same member twice from a singleton set. It is important to know that procedural attachments are the last patterns to be matched in a rule, when all the variables have been instantiated. You should not rely on procedural attachments to instantiate variables.

Another use of  prolog equality is, for example, in removing set-valued features, where the members of the set are atomic values, rather than f-structure nodes. This can be done as follows
in_set(%Atom, %%), {\+ %Atom = var(%%)} ==> 0.
This tests that the element Atom is not of the form var(...), which characterizes f-structure nodes, and then deletes the fact.

The transfer system pre-defines the following procedural attachments:

strip_trailing_underscore(%Constant_, %Constant).

concat_preds(%Constant1, %Constant2, %Constant1_Constant2).

new_constant(%Stem, %StemN).

strip_trailing_underscore strips off the final underscore from an atomic expression (if present). concat_preds concatenates two atoms together, with an underscore between them. new_constant creates a brand new constant consisting of a specified atomic Stem and a number N chosen to make the new constant unique.  In addition to these procedures, you can also make use of standard prolog procedures, the most useful of which are probably:
%X = %Y.

member(%Item, %List).

append(%List1, %List2, %List3).

Negation in prolog is expressed as \+.
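As an illustration of using a pre-defined attachment, the following rule (a sketch) uses strip_trailing_underscore to normalize predicate names; when the name has no trailing underscore, the fact is simply rewritten unchanged:

PRED(%X, %P_), {strip_trailing_underscore(%P_, %P)} ==> PRED(%X, %P).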

Defining Further Attachments

You can also define your own procedural attachments. Let's assume that you just want to define a procedure not_equal(%X, %Y),  which succeeds when X and Y do not unify.

First,  you need to have your rule file specify where the additional attachments are defined. This is done by means of a single statement of the form
procedural_attachments = code_file.
where code_file is the name of the file containing the prolog code (as with including rule files, the path name of the code file is taken relative to the location of the rule file declaring it).

The code file first needs to specify the instantiation pattern for the procedure.  (Note that it is no longer necessary to include a prolog module declaration). This says when the transfer system is going to view the procedure call as being sufficiently instantiated to evaluate. In the case of evaluating an inequality, both arguments have to be fully instantiated.  We indicate this by the declaration
:- assert(instantiation_pattern(not_equal(+, +))).
where the + signals that the argument must be instantiated. If  it does not matter whether an argument is instantiated or not, use - instead.

Finally, you need to define the prolog procedure, in this case
not_equal(X, Y) :- \+ X = Y.
Note that prolog variables begin with uppercase letters or underscores, and that % is the prolog comment character.
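Once the code file is loaded, the new attachment can be called from rules in the same way as the pre-defined ones. For example (a sketch; the distinct_preds output fact is hypothetical):

+PRED(%X, %P1), +PRED(%Y, %P2), {not_equal(%P1, %P2)} ==> distinct_preds(%X, %Y).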


WordNet Interface

The system has an interface to WordNet (currently version 2.1).  You must obtain and install your own version of WordNet, and ensure that the WNHOME environment variable is set as required by WordNet (i.e. to the WordNet directory containing the installed bin sub-directory).

WordNet lookup is done through a number of predefined procedural attachments.
You can also define your own procedures built around these pre-defined ones.  In this case, calls to the pre-defined procedures in the prolog code for procedural attachments should be preceded by the module prefix wn_prolog.

C++ Procedural Attachments

The C++ transfer system only supports a limited number of procedural attachments:

Prolog operators:

Prolog predicates:

The standard Prolog predicates can be found using SICStus's predicate index. For information on how to create new C++ procedural attachments, please contact John Maxwell or Yingwei Zhang.

Stop Rules

The single atom stop on the righthand side of a rule has special significance. All analyses / facts that fall within the context in which the lefthand side of the rule is matched will be deleted.  This can be an expensive operation.
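For example, assuming a hypothetical UNKNOWN-WORD attribute marking unrecognized input items, the following rule deletes every analysis containing such a fact:

UNKNOWN-WORD(%%, %%) ==> stop.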

Rule unions

The rule notation provides for complex rules, or rule unions, to be made up from sets of simpler rules.  This facility is most useful when all, or most, of the rules are specified using templates. If R and S are rules, then R && S is a rule whose lefthand side is all the predicates occurring on the lefthand sides of both R and S, and whose righthand side, in like manner, is the union of the righthand sides of R and S.  Strictly speaking, what is formed in each case is not a set—the union of the contributing sets—but a bag made up by collecting the members of the contributing bags. In other words, if identical predicates occur on, say, the lefthand sides of R and S, then there will be two copies on the lefthand side of the union rule.

Rule unions can be used both to specify particular rules and in the definition of templates. If either R or S is a template that expands into a sequence of rules, then the union is formed from the Cartesian product of the two sequences, arranged in the following way.  Suppose R is the sequence R1, R2, ... Rm and S is the sequence S1, S2, ... Sn; then the new sequence will be R1 && S1, R1 && S2, ... R1 && Sn, R2 && S1, R2 && S2, ... Rm && Sn. When the union of a pair of elementary rules is formed, the new rule is optional unless both members of the contributing pair are obligatory, in which case it is obligatory.

Here is a more specific, if not entirely plausible, example (the rules shown are purely illustrative). Suppose that the grammar contains:
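number_agr ::
   NUM(%X, sg) ?=> 0;
   NUM(%X, pl) ==> 0.

noun_rule ::
   PRED(%X, dog) ==> PRED(%X, chien).

@noun_rule && @number_agr.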

When expanded by the grammar compiler, this would be as follows:
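PRED(%X, dog), NUM(%X, sg) ?=> PRED(%X, chien).
PRED(%X, dog), NUM(%X, pl) ==> PRED(%X, chien).

The first rule of the product is optional, since only one member of its contributing pair is obligatory; the second is obligatory, since both of its contributing rules are.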


Disjunction in Rules

The rule notation allows for disjunction (and non-atomic negation) on the lefthand sides of rules. For example
Pattern1, (PatternA | PatternB | -(PatternC, PatternD)), Pattern2 ==> RHS.
matches when (i) Pattern1 and Pattern2 match, and (ii) either PatternA or PatternB matches, or PatternC and PatternD do not both match. Parentheses are used to delimit the scope of operators. While conjunction (comma) binds more tightly than disjunction (bar),  it is wise to explicitly bracket complex formulas, thus:
(a, b |  x, y)  == ((a,b) | (x,y))
Disjunction and negation are not permitted on the righthand sides of rules. To capture the effect of a disjunctive righthand side, e.g.
Pattern ==> RHS, (RhsA | RhsB | RhsC).
you should write a sequence of rules:
Pattern ?=> RHS, RhsA.
Pattern ?=> RHS, RhsB.
Pattern ==> RHS, RhsC.

Rule unions can be used to write (though not always clarify) disjunctions.  This used to be the only way of encoding disjunction in the transfer system. We could express the rule
Pattern1, (PatternA | PatternB | PatternC), Pattern2 ==>  RHS.
 as the following union of templates
disjunctive_lhs ::
   PatternA ?=> 0;
   PatternB ?=> 0;
   PatternC ==> 0.

rewrite1 ::
   Pattern1, Pattern2 ==> RHS.

@rewrite1 && @disjunctive_lhs.
 The 0 on the righthand side of the disjunction rules ensures that these rules contribute nothing (other than possibly variable instantiations) to the unioned rules.   (Note that the templates can be defined in either order, or unioned in either order: it is just the order of the Pattern{A|B|C} rewrites in the disjunctive_lhs template that is important.)

The rule(s) with a disjunctive righthand side could be expressed as the following union
disjunctive_rhs ::
   0 ?=> RhsA;
   0 ?=> RhsB;
   0 ==> RhsC.

rewrite2 ::
   Pattern ==> RHS.

@rewrite2 && @disjunctive_rhs.
The 0 on the lefthand side means that the disjunction rules contribute nothing to the lefthand side of the rule unions.

Non-Resourced Assertions

Sometimes it is useful to include non-resourced facts as part of your rules. For example, you might want to classify a number of prepositions as being (potentially) locative, e.g.
|- locative(in).
|- locative(at).
|- locative(near).
where the |- notation signals the declaration of a non-resourced fact.   These facts can then be invoked in the normal way by rules, e.g.
PRED(%X, %Prep), locative(%Prep), ... ==> ....
No plus sign is required in front of the non-resourced fact (and in fact should not be included), since it will not in any case be consumed. It is important that at least one declaration of a non-resourced fact occurs before the first time any fact of that type is mentioned in a rule. Otherwise, the rule compiler will not recognize the fact as being non-resourced.

There are two types of non-resourced facts: static and dynamic. Dynamic facts allow you to add more facts on the righthand side of a rule, so that you can temporarily add more facts when running transfer on a particular input, e.g.
+PRED(%X, %UnknownPrep), +PTYPE(%X, +), ... ==> locative(%UnknownPrep).
Dynamic facts are introduced with the /- operator. It is rare to find circumstances where you would want to use dynamic non-resourced facts.

Static facts are just like dynamic facts, except that the effects of trying to create new instances on the fly are undefined. In compensation, they are slightly more efficient to use. Static facts are introduced by the |- operator, e.g.
|- locative(in).
|- locative(at).
|- locative(near).


Both static and dynamic facts create look-up tables accessed by procedural attachment.  It can sometimes be useful to indicate the required instantiation pattern for the facts.  This can be done by declarations in the grammar of the form
:- instantiation_pattern(locative(+)).
where instantiation patterns are defined in the same way as for procedural attachments.


External Database

Large bodies of non-resourced facts are best incorporated by means of the external database.  An external database is declared in a grammar by means of the following construction

external_db(<DBName>,
            [<DBSourceFile1>,..., <DBSourceFileN>],
            [<RecordSpec1>, ..., <RecordSpecM>]).

For example:

external_db(ul_data`.pdb,
            [ul_verb_data`.pl, ul_cnoun_data`.pl, ul_mnoun_data`.pl],
            [verb_map(+,+,+,-,-,-,-,-,-,-),
             verb_map(+,+,+,-,-,-,-,-,-,-,-,-,-),
             verb_map(+,+,+,-,-,-,-,-,-,-,-,-,-,-,-,-),
             noun_map(+,-,-,-,-),
             noun_map(+,-,-,-,-,-,-,-),
             noun_map(+,-,-,-,-,-,-,-,-,-,-),
             nn_map(+,+,-,-,-,-),
             nn_map(+,+,-,-,-,-,-,-,-)]).


This makes available a database compiled into the file ul_data.pdb from the source files ul_verb_data.pl, ul_cnoun_data.pl and ul_mnoun_data.pl.  The database contains the records listed in the record specs. The format for the record specs is the same as for the instantiation_patterns for procedural attachments, where + indicates that the field has to be instantiated prior to database lookup.  Each record will have its own corresponding procedural attachment, defined by declaring the database.  Thus to look up a noun map in a rule, you should do something like

NTYPE(%N,%%), PRED(%N,%P),
{noun_map(%P,%MassCount,%SynSet,%N2,%Concept)}
==> ....


The source files must be in valid prolog notation.  Moreover, none of the records are allowed to contain prolog variables.  If the source files are more recent than the database files, the database will be rebuilt. This will produce three files: the actual database file,  <db>.pdb, and two index files <db>.pih and <db>.pin.

It is possible to access an external database that is constructed independently of the rules. This is indicated by having an empty list of source files. For example:
external_db(ul_data`.pdb,
            [],
            [verb_map(+,+,+,-,-,-,-,-,-,-),
             verb_map(+,+,+,-,-,-,-,-,-,-,-,-,-),
             verb_map(+,+,+,-,-,-,-,-,-,-,-,-,-,-,-,-),
             noun_map(+,-,-,-,-),
             noun_map(+,-,-,-,-,-,-,-),
             noun_map(+,-,-,-,-,-,-,-,-,-,-),
             nn_map(+,+,-,-,-,-),
             nn_map(+,+,-,-,-,-,-,-,-)]).

This is a good way of hiding the source data that is used to build the database: Load the rules with the source files specified, and then either remove all the source files, or set the list of source files to the empty list []. Also, if the path/file name of the pdb file is preceded by an environment variable, this can be a way of varying the database used without having to change the rules.


Lefthand Side Query Ordering

When trying to match the lefthand side of a particular rule, the system will attempt to select the optimum order for matching the individual rule components.  Thus, the order in which the left hand side is matched will not necessarily be the order in which it is written down.  Sometimes, you want the system to match the left hand side, or part of the lefthand side, in the order in which the rule was written.  This can be achieved by surrounding the part of the rule that you want matched in the order it is written between ![...]!.  For example
f1, ![f2, (f3 | f4), {p1}]!, f5 ==> f6.
will result in the system deciding whether to match f1, f5 or the more complex expression first.  But when it decides to match the complex expression, this will first match f2, then the disjunction (f3 | f4), and then the procedural attachment.

Note:  If the part of the rule to be matched in fixed order contains any macro or template calls, the expansions of the macros and templates will also be matched in the order in which they are written:  query optimization is turned off within the scope of  ![...]!.

Recursive Rules

By default, rules are not recursively applied to their own output.  This ensures decidability of the rewriting process.  There are special cases where a rule can be applied recursively to its own output without compromising decidability.  The system supports two special kinds of recursive rule, where in both cases the depth of the recursion is bound by the size of the input.

The first kind of recursive rule is appropriate when a recursive input structure is being decomposed into its parts.  For example, here is a rule to decompose a complex conjunction into its component conjuncts
and(%P, %Q)  *=> %P, %Q.
Suppose the input is and(a, and(b,c)).  The first application of the rule will break this down into a  plus and(b,c). With a non-recursive rule, this is how matters would remain.  However, the *=> indicates that the rule is to be reapplied to its output, until eventually there is nothing left to which it can apply.  Reapplication of the rule breaks the conjunction and(b,c) down into its component parts.  At this point there is nothing left to which the rule can apply, and the recursion terminates.  We are left with the facts a, b, c.

Some things to note about this kind of recursion. First, the recursive rewrite is obligatory and never optional. (Optional rewriting would leave triggering input still in place, guaranteeing that the recursion will not terminate).  Second, the output must never create new facts that might match the input, or again the recursion will not terminate.  The system tries to check for this non-terminating condition by ensuring that  at least some input must be consumed by the rule, and also that either (i) some of the output matches a negative condition on the input, (ii) no output  matches a positive input fact, or (iii) if it does, then it must be a subformula of some consumed input.  At present, there is no guarantee that these compile-time checks will catch all possible causes of non-terminating recursion, so user beware!

The second kind of recursion is more useful when trying to build structure up.   Suppose that we have a set of conjuncts, and we want to gather them together into a single list.  Here are two rules that bring this about
"First, set up an empty seed for the list of conjuncts.  Although the empty seed is produced multiple times, they will all be merged by the system into a single fact."

+conjunct(%P) ==> and([]).

"Now the recursion/iteration: The iterator before the ** collects the list of conjunct(%P) facts available before any rule applications take place. The rule following the ** is applied to each of the conjuncts in turns.  As you recurse / iterate down the list of the conjuncts, the output from the last rule application provides the input to the next.
Note that
[%P|%Cs] is the prolog-style notation for consing %P onto the head of list %Cs"

conjunct(%P) **
   [and(%Cs)
    ==>
    and([%P|%Cs])
   ].


With the input conjunct(a), conjunct(b), conjunct(c), these rules will produce an output of the form and([b,c,a]), where the order of the elements in the list is arbitrary.  Assuming that the iterator gathers the inputs in order a, c, b (this order is arbitrary), then rule applications proceed as follows:  The first application places a onto the head of the empty list, consuming the facts conjunct(a) and and([]), replacing them with and([a]). The second iteration consumes conjunct(c) and places c onto the head of the list in and([a]), consuming both to give and([c, a]). The final iteration places b onto the head of the conjunct list.  Since there are no more input conjuncts left, the iteration terminates.

In this kind of iterative recursion, the embedded rule can be either optional or obligatory.  The iterator can be any expression that is valid on the lefthand side of a rule.  The operational model is that all possible matches to the iterator expression are determined before any rule applications. Then for each rule iteration, one of the iterator matches is added to the lefthand side of the embedded rule, and the rule evaluated as normal.  

For both the fully recursive and iterative rules, it is important to remember that the scope of the recursion is only over a single rule.  It is not possible to set up a collection of rules that call on one another recursively.  However, careful use of disjunction in a single rule can achieve many of the effects that this would give.




Redefining Notation

The truly masochistic have the option of defining their own notation for transfer rules. To do this, you need to write a prolog definite clause grammar (DCG) to define the new syntax. The DCG for the current syntax is shown here as a model. You need to adapt this to your own ends, using the module name alt_rule_dcg.  The start symbol of the grammar should be transfer_term.  To make a rule set abide by the new notation, the first non-blank line in the file should be
" PRS (filename) "
where filename states the location of the dcg file relative to the rule file.


Including Extra f-structure Information and Setting Transfer Options

When transferring f-structures, it is sometimes necessary to include extra information over and above the bare f-structure facts, e.g. c-structure facts or the root category of the grammar. This extra information comes in the form:
phi(CStrNode, FStrNode)
subtree(CStrNode, NodeLabel, LeftDtrNode, RightDtrNode)
terminal(CStrNode, TermLabel, SurfaceFormIds)
surfaceform(SurfaceFormId, SurfaceForm, LVertex, RVertex)

rootcategory(Category)
taken directly from the prolog c-structure representations and the specification of the root category in the list of f-structure properties. These facts can be manipulated in the normal way by transfer rules.

The easiest way of controlling what additional facts are to be included, plus other options, is from the rule file. For example, to ensure that transfer rules operate on both c- and f-structure facts, the following line should be present in the rule file
:- set_transfer_option(include_cstr, 1).
The options currently available are listed below, with non-default values shown:
:- set_transfer_option(include_cstr, 1).
       include c-structure facts

:- set_transfer_option(include_root_category, 1).
      include specification of root category of c-structure, in the form cf(1, rootcategory(%C))
:- set_transfer_option(include_proj, 0).
      include f-structure projections

:- set_transfer_option(include_eqs, 1).
      include equalities from unnormalized f-structures as first order transfer facts (not recommended)

:- set_transfer_option(treat_subsumes_as_eq, 0).
      when set, treats any subsumption relations in the f-structure where the nodes are mutually subsuming as equalities, and ignores any other subsumptions.

:- set_transfer_option(include_fstr_properties, 1).
      include all the items from the input f-structure's properties as facts of the form   cf(1, fstr_property(%P))

:- set_transfer_option(extra, [cf(1, Fact1), ... cf(1, FactN)]).
      include specified additional facts in transfer

:- set_transfer_option(conflict_resolution, 0).
      turn off the conflict resolution mechanism (see below)

:- set_transfer_option(conflict_resolution_limit, fail_after(100)).
      when conflict resolution mechanism is on, controls limits and behavior for detecting when the conflict is liable to be too large to resolve. The default value is ignore_after(30), which means that if more than 30 rule applications are in conflict, the conflict will be ignored. The example value fail_after(100) means that if more than 100 rule applications are in conflict, then transfer of the structure will be terminated since it is liable to time out anyway. (See below)

:- set_transfer_option(normalize, 1).
      ensure that equalities in input are normalized before running transfer

:- set_transfer_option(prune_final_choice_space, 0).
      clean out any parts of the final choice space that have no facts sitting under them

:- set_transfer_option(include_rule_traces, 1).
      include rule trace information as pseudo f-structure facts (see below)

:- set_transfer_option(xfr_history_limit, <Integer>).
      length of queue used to store history of previous transfer representations (see below)


It is also possible to set these options for a session, e.g. when running transfer from the XLE (see below)
prolog "set_transfer_option(include_cstr,1)."
prolog "set_transfer_option(include_root_category, 0)."
Another possibility is to include atoms like  include_cstr or include_root_category in the list of options passed to the transfer command.


Conflict Resolution

Conflicts arise in applying transfer rules when distinct applications of a rule attempt to consume the same resource. Suppose for example that we wanted to flatten out set valued ADJUNCT features, so that every member of X's adjunct set now becomes a direct adjunct of X. A plausible rule for doing this is

ADJUNCT(%X, %Y), in_set(%Z, %Y) ==>  ADJUNCT_REL(%X, %Z).
Application of this rule to the input
ADJUNCT(var(1), var(2))
in_set(var(3), var(2))
would lead to the output
ADJUNCT_REL(var(1), var(3))
But what happens if there are two or more members of the adjunct set?  According to the rule above, each set member will try to consume the ADJUNCT fact. But this fact can only be consumed once, and then it is gone. We should not resolve this conflict over resources by placing an ordering over set members (in the way that we place an ordering over transfer rules), since any ordering over facts would be arbitrary.  In any case, the intent of the rule is to replace all adjunct set members with ADJUNCT_RELs, not just the first one that gets to a match.

To resolve this kind of conflict, the system will split the context in which the adjunct fact lives into disjoint sub-contexts; one for each rule application in conflict. Thus each set member will completely consume the adjunct fact, but in distinct contexts. This would lead to input of the form
cf(1, ADJUNCT(var(1), var(2)))
cf(1, in_set(var(3), var(2)))
cf(1, in_set(var(4), var(2)))
cf(1, in_set(var(5), var(2)))
becoming
cf(A1, ADJUNCT_REL(var(1), var(3)))
cf(A2, ADJUNCT_REL(var(1), var(4)))
cf(A3, ADJUNCT_REL(var(1), var(5)))
While this is the appropriate way of resolving the resource conflict for the rule as written, it probably does not reflect the intention of the rule writer, which was to have all the ADJUNCT_RELs in the same, true, context.

There are three ways of achieving this. The correct way is simply to write the rule properly; it should instead be
+ADJUNCT(%X, %Y), in_set(%Z, %Y) ==>  ADJUNCT_REL(%X, %Z).
ADJUNCT(%%, %%) ==> 0.
That is, consume all the in_set facts without consuming the ADJUNCT fact, and once this is done remove the adjunct fact.

The hacky way of achieving the desired result is to turn the system's conflict resolution mechanism off. With the mechanism turned off, copies of conflicted resources are made for each clashing rule application. Thus, the three matches of the original rule would apply to three copies of the ADJUNCT fact, all sitting in the true context.

The third way is to make the hackery more local. That is, leave conflict resolution turned on globally, but turn it off for individual rules. This can be done by using versions of the rewrite arrows prefixed with a plus:
+==>
+?=>
+*=>
These signal the usual obligatory, optional or recursive rules, but with conflict resolution turned off for the scope of the rule.
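For example, the adjunct-flattening rule from above could keep its original one-rule form, with conflict resolution disabled just for that rule:

ADJUNCT(%X, %Y), in_set(%Z, %Y) +==> ADJUNCT_REL(%X, %Z).

Each set member then consumes its own copy of the ADJUNCT fact, and all the resulting ADJUNCT_REL facts end up in the same context.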

Rule Traces

The system automatically constructs a trace of the rules that have been applied in mapping from input to output (see transfer command). It is sometimes convenient to include some of this information as if the rule traces were facts (of a special kind) in the f-structure. The option
set_transfer_option(include_rule_traces, 1).
will ensure that traces are included as f-structure facts. These take the form (one for each application of each rule)
cf(C, in_set(rule_trace(RuleNum, ApplicationNum, 'LHS', 'RHS'),
             attr(var(0), 'RULE-TRACE')))
where RuleNum is the number of the rule, ApplicationNum counts which instance of a rule application gave rise to the trace, LHS is the instantiated lefthand side of the rule, and RHS is the instantiated righthand side of the rule.

Rule Indexing

The interpreter for the rewrite system indexes the lefthand sides of rules, so that it can often detect if a rule will fail to apply without having to attempt matching it.   A transfer grammar can place additional control  over indexing by including declarations of the form:
:- index_on(Predicate, Arity, ArgNum).
For example, suppose you have the following indexing declaration and rule:
:- index_on(PRON-FORM, 2, 2).

PRON-FORM(%X, he), CASE(%X,nom) ==> PRON-FORM(%X, il).
The declaration says that the second argument of the 2-place PRON-FORM fact is indexed.  The rule will only be picked up for possible matching if the input contains a PRON-FORM fact whose second argument is "he".  

Rules always implicitly have the indexing declaration
:- index_on(PRED, 2, 2).
which means that rules containing PREDs will only be picked up and tried if there is an exactly matching PRED fact in the input.  In a translation system, this means that rules that only apply to words not occurring in the input sentence will in general not be processed.

In debugging mode, you can turn off all rule indexing by means of the command tdbg(skip_rule_indexing). (see below for how to invoke debugging).

Loading Multiple Rule Sets

You can have several different sets of transfer rules loaded at the same time.  Each set is identified by the grammar name given after the "grammar = ..." declaration.

The command list-transfer-grammars will list the currently loaded transfer grammars.  The command active-transfer-grammar will say which grammar is currently the active one.  The command activate-transfer-grammar <GName> will set a loaded grammar to be the active one.  

Labelling F-structure Nodes for Triples

The file $XLEPATH/include/node_label_rules.pl defines a number of templates that are useful if you want to convert f-structures to a triples format. In particular, it includes the definition of the template node_label, which will label each f-structure node with a term Pred:Id, where Pred is the PRED-FN of the node's PRED (if any), and Id is the lexical-id of the node's PRED.  These are used below in defining the default fs_triples rule set for mapping f-structures to unstripped triples.  The definition is somewhat complicated, in order for it to handle un-normalized packed f-structures, where interactions between node-labelling and node equalities can get very hairy.

XML Output

Special purpose transfer facts can be used to specify xml output from the transfer system; when used in conjunction with the xml_file output mode when calling transfer, this will construct xml.

The following is a simple example of some transfer facts and the xml it specifies.

top_xml_element(id1)
xml_element(id1,tag1, [attr(attribute1, value1), attr(attribute2,value2)])
xml_sub_element(id1, id2)
xml_sub_element(id1, id3)
xml_element(id2, tag2, [],[xml_elem(tag4,[attr(a4,v4)])])
xml_element(id3, tag2, [attr(a3,v3)])
which produces the xml
<tag1 attribute1="value1" attribute2="value2">
  <tag2 >
    <tag4 a4="v4"/>
  </tag2>
  <tag2 a3="v3"/>
</tag1>

Going through this example in more detail:

Some additional notes, advice, and warnings:

Choice and Ambiguity in XML

What happens if you attempt to construct xml from a packed/ambiguous transfer structure? The rules for constructing xml_element and xml_sub_element facts will place these facts under different parts of the choice space, in the normal way. How does this get interpreted on writing out the xml?

The xml write-out deals with choices and ambiguity by adding extra amb attributes to elements. The default (unspecified) value for this attribute is 1. If an xml-subelement applies in a different part of the choice space from its parent (note that this will always be a sub-part of the parent choice space), then an amb attribute will be added to the sub-element. If the sub-element is in the same part of the choice space as the parent (even if that choice is not 1), then the amb attribute will be omitted, hence defaulting to 1.
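For example, if the id3 element above had been produced only in a sub-part of its parent's choice space, its output might look something like the following (the choice value 2 is arbitrary):

<tag2 a3="v3" amb="2"/>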

There are two modes in which the values of the choices can be written out in xml. By default, the values will be arbitrary integers (which are in fact the names of the pointers to choice spaces inside transfer, but which have no special meaning in the xml). However, if a prolog call to

setp(print_full_xml_choices(1)).
has been made, then the choices will be printed out in full, readable form, as boolean combinations of choice variables. Be warned that this can be very verbose.

Parse probabilities are also added to the xml if a prolog call has been made to

enable_choice_probabilities(1).
This will add prob and prob_bucket attributes to every element.

State Facts

The use of state facts is intended as a clearer, more flexible replacement for the transfer history mechanism described below.

The transfer_seq/Arity family of prolog calls to invoke transfer have corresponding state_transfer_seq/Arity+2 calls, whose two extra arguments thread a list of state facts into and out of the call to transfer. State facts take the form

state(Term)
When a list of state facts is threaded in, they are all added to the transfer facts in the true choice space. Transfer rules can then access and modify these facts, just like any other transfer facts. At the end of transfer, all state/1 transfer facts in the true choice space are collected together and provide the output state list. At present, these output state facts are also left within the transfer facts; this decision may need to be revisited.
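For example, a rule along the following lines (a sketch, using a hypothetical main_pred state term and assuming that var(0) is the outermost f-structure node) records the main predicate of the current input for use by subsequent calls:

+PRED(var(0), %P) ==> state(main_pred(%P)).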

Rules updating state facts might want to make use of the <1> operator to ensure that they are placed in the true choice.

Transfer History Mechanism

A history mechanism allows you to store a limited queue of previous transfer outputs. This is useful for such things as resolving cross-sentential anaphors, where the antecedent for a pronoun might occur in the transfer representation of a previously processed sentence.

To add the current set of transfer facts to the history, use a rule with store_in_xfr_history as the sole item in the righthand side, e.g.

+possible_pronoun_antecedent(%%) ==> store_in_xfr_history.

This will store the transfer facts in association with a system-created unique identifier (so that different sentences can be named). If you want to associate the facts with an identifier of your own choosing, use store_in_xfr_history(%Id), where %Id is the chosen identifier.

The transfer history is held as a first-in, first-out queue with a maximum length, as specified by the xfr_history_limit transfer option. When the queue reaches its allowed limit, storing another representation will push the oldest item off the list. To clear the current history, use a rule with delete_xfr_history as the sole item on the righthand side, e.g.

+start_of_document ==> delete_xfr_history.

Individual facts can be retrieved from the history by means of the following procedural attachment

{from_xfr_history(%Fact, %Id, %Displacement, %ChoiceSpace)}

where %Id is the identifier associated with the entire set of facts for the stored sentence. The choice space is an inactive representation of the choice space for the preceding sentence, and the %Fact is not located within that choice space; in other words, the history mechanism currently ignores choice spaces. The Displacement shows the position of the sentence in the history queue, starting at 1 for the oldest sentence; that is, the higher the displacement number, the more recent the item.

The procedural attachment

{current_xfr_history_displacement(%D)}

will access the displacement number of the sentence currently being processed. If the history queue is full, the current displacement will be identical to the xfr_history_limit. It may seem counterintuitive to have the displacement of the current sentence be a high number rather than zero. The motivation is that lexical-ids in f-structures are in ascending surface order, so that at the end of a sentence the most recently preceding word is the one with the highest lexical-id number. Similarly, the most recently preceding sentence is the one with the highest displacement value.


Compiling Transfer Rules

Users with access to the SICStus prolog development environment can compile transfer rules into a format that can be reloaded more directly. The advantages of this are that (i) a complex set of rules, which includes many subsidiary rule files, can be dumped out as a single file; (ii) any procedural attachments defined will be loaded in compiled rather than interpreted form, which means that they run more efficiently; and (iii) any external databases will be built and dumped in binary form.
To compile rules, you must have the same version of SICStus prolog as was used when building the XLE: the version number can be determined by looking at the file $XLEPATH/bin/sp-*, e.g. sp-3.7.12 signifying version 3.7.12 of SICStus. Suppose that your transfer rules are defined in a file called my_transfer_rules.pl. You can create a Makefile like the following:

TRANSFER = ${XLEPATH}/bin/transfer.sav

.PHONY: my_transfer_rules.po

my_transfer_rules.po :
    echo "restore('${TRANSFER}'), \
          compile_rules('my_transfer_rules.pl'), \
          compile('cmp_xfr_my_transfer_rules.pl'), \
          save_files('cmp_xfr_my_transfer_rules.pl', 'my_transfer_rules.po'), \
          halt." | sicstus -f

From the shell command line, running
make my_transfer_rules.po
will read in the transfer rules file, and compile both the procedural attachments and any external databases (see below). After running make, an XLE call to prolog "load_files('my_transfer_rules')." will load the compiled set of rules. Loading the transfer rules in the usual way will ensure that fully compiled versions of the procedural attachments and databases are loaded, but the rules themselves will be re-read from source.

The prolog procedure build_pdb(+DBFile, +SourceFiles) can also be called from a makefile to compile an external database independently of a rule set.
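For example, a Makefile target along the following lines (a sketch, reusing the TRANSFER variable and the file names from the external database example above) would rebuild just the database:

ul_data.pdb : ul_verb_data.pl ul_cnoun_data.pl
    echo "restore('${TRANSFER}'), \
          build_pdb('ul_data.pdb', ['ul_verb_data.pl', 'ul_cnoun_data.pl']), \
          halt." | sicstus -f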

Running the Transfer System

There are a number of different ways of running the transfer system.  The most straightforward is to run it directly from the XLE. This is most appropriate if you are developing / debugging transfer rules for f-structure to f-structure translation.  A second way of running the transfer system is through a number of shell commands, which are appropriate for batch processing transfer input to output files. A third way involves file based communication between separate XLE and transfer processes, and is useful for programmers developing new functionality.  A fourth way involves embedding the XLE within a prolog-based transfer application. Finally, it is still possible to run things in the old style where the XLE communicates via sockets with a separate transfer server (should anyone still want to do this). The transfer system is currently implemented in SICStus prolog, but it is not necessary to install or obtain a license for SICStus to run the system.  Under Linux and Solaris it should run 'out of the box', but for MacOSX you may need to consult the platform specific notes.


Running Transfer Direct from the XLE

To set up the xle to run transfer, invoke xle in the standard way and run the command

create-transfer

This will add extra commands to the menus in the f-structure and fs-chart windows. You also need to load a transfer grammar by means of the command load-transfer-rules (if you do not do so, the default f-structure to triples rules will be used). The following interaction is typical:

~ xle
XLE loaded from xle.
XLEPATH = /project/xle/current
....
Type 'help' for more information

% create-transfer
% create-parser /project/pargram/english/standard/english.lfg
.....
% load-transfer-rules /tilde/crouch/transfer_rules/rules1.pl
Initializing prolog engine
Loading prolog image at /project/xle/current/bin/transfer.sav
.....
% parse {The boy stood on the burning deck.}
%

The Commands menu for the f-structure window will contain the following additional items

The Commands menu for the fschart window will also have Transfer and Translate commands.

A number of transfer commands are available directly from the xle command line. The online help command transfer-help will list these. Amongst the more useful ones are


It is possible to invoke a much wider range of transfer commands directly from the XLE command line. This is done by preceding the command with prolog and surrounding the command in double quotes.
Loading the transfer component into the XLE installs a prolog engine, and the prolog command gives access to this functionality.   The prolog commands, surrounded in double quotes, should obey the usual rules of prolog syntax and are passed direct to the prolog engine. In particular, note the final period at the end of the command. The full range of functionality is documented in the Programmer's Guide, and online documentation is available through the command transfer-help, but the most useful commands are:

tdbg.
Turn on basic tracing of transfer rules.
tdbg(Monitor).
Turn on tracing of specified Monitor.
monitors.
List available debugging monitors
monitoring.
List currently active debugging monitors
no_tdbg.
Turn off all debugging monitors.
no_tdbg(Monitor).
Turn off specified debugging monitor
transfer_timing(1).
Print timing information about transfer. transfer_timing(0) turns it off.
transfer(In, Out, InMode, OutMode).
Transfer In to Out. InMode/OutMode can be fs_file, in which case In/Out is the name of a prolog f-structure file. Or it can be xfr_file, in which case In/Out is the name of a prolog file containing the transfer predicates. (See the example after this list.)
timed_transfer(I,O,IM,OM).
Time limited version of transfer
set_transfer_timeout_limit(T1,T2,T3).
Set the time limits (in milliseconds) for timed_transfer. T1 is the maximum time allowed for transferring un-normalized f-structures, T2 the maximum time allowed for normalizing them, and T3 the maximum time allowed for transferring the normalized structure.
reload_rules.
Reload the previously loaded transfer rule file
reload_rules(File).
(Re)load the rules in File.
print_compiled_transfer_rules.
This causes the transfer rule grammar to be listed with all templates and macros expanded and in a form that the system could accept as input.  This is useful for verifying that templates and macros have been expanded in the intended way.
set_active_transfer_grammar(Id).
It is possible to have multiple transfer grammars loaded at the same time. To make one of these the active one, use this command and specify the grammar name (Id) of the rule set, i.e. the identifier in the grammar = Id. declaration, and not the name of the rule file. By default, the most recently loaded set of rules is active.
restore_previous_transfer_grammar.
Re-activate the previous transfer grammar; repeated calls keep on popping the stack of previously active grammars.
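As an illustration of calling one of these through the XLE prolog command (the file names here are hypothetical):

prolog "transfer('/tmp/in_fstr.pl', '/tmp/out_xfr.pl', fs_file, xfr_file)."

This applies the currently active rules to the f-structures stored in /tmp/in_fstr.pl and writes the resulting transfer facts to /tmp/out_xfr.pl.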
  
The use of debugging monitors and rule set specifications to monitor and control transfer is described in more detail below (Rule Sets). The tdbg (for transfer debug) commands turn monitors on and off for full rule sets, and are most useful for non-advanced debugging.

It may also be useful to know that the XLE and transfer still communicate data by writing it to files. Amongst other things, this makes troubleshooting easier. The files used are /tmp/$USER-xferin.pl and /tmp/$USER-xferout.pl.

 Debugging and Rule Sets

There is a notion of a rule set that plays a central role in the transfer system.  Consider the following interaction, in which a transfer grammar is loaded into the system. The system reports that a set called "verb" is initially defined with rule 32 as its only member.  Later in the course of loading the grammar, this set comes to contain rules numbered 32 through 34, to which rule 38 is added still later. The last line concerning the "verb" set of rules reads "[r(32,34),r(38,38)]", indicating that the set contains rules 32 through 34 and 38 through 38. After all the rules have been loaded, all of them become part of the set called "active", as evidenced by the line "active: [r(0,0)]". If, as in this case, the second member of the pair of rule numbers giving the members of a consecutively numbered subset is 0, this is equivalent to the highest rule number in the grammar.

The definitions of the "verb" rule set in the above example arise because part of the grammar file looks somewhat like this:
 

The appearance of a single word—in this case "verb"—where a definition or a rule would otherwise be expected causes the next rule in the file to be added to the set with that name.  If the name is preceded by a "+" sign, as in the second instance above, then not only the next rule is added to the set, but all rules starting with the last addition to the set and extending down to that point.  In the above example, we suppose that the "add -> ajouter" rule is the 32nd in the file and we have seen that this is the first rule to be added to the set.  Next, the rule "button -> bouton", number 34 is added, and finally, all rules between there and "can -> pouvoir", rule 38, are also added.

The utility of rule sets lies in the fact that sets with certain names are treated specially by the transfer system.  In particular, the set named "active" contains the rules that will be invoked when the grammar is applied to a set of predicates. As the above example illustrates, all the rules in a grammar become members of this set when the grammar is loaded. After a grammar has been loaded, however, the active set can be redefined by typing the appropriate command to the xle shell.  To remove rules 23 through 26, for example, it is necessary to redefine the active set so that it has its original contents minus these rules.  This is accomplished by the following command to the xle shell:

This causes a line to appear in the transfer window giving the new contents of the set. A similar command can activate rule 25 again, with the updated contents again reported in the transfer window. Notice that the command must always be terminated by a period. By typing the monitoring command with no arguments, so that the period follows "monitoring" immediately, one can obtain a listing in the transfer window of all currently defined sets.

Rule Set Names

Any number of rule sets can be defined at any time with arbitrary names of the user's choosing.  However, certain names, like "active", are used for sets that the system will treat in a special way. Some of those are of interest only to system developers. Those in the list given below are, however, of more general utility.  Most of them cause information on the progress of the transfer component to be supplied to the user.  One of them—active—as we have seen, determines which rules in the transfer grammar will actually be used in the process. For some sets, the specific membership of the set is not important because it will be used simply to activate or deactivate a part of the system without reference to specific rules.  For convenience in these cases, the user can specify the content of the set as either "yes" or "no", which will be taken as referring to the sets of all rules and no rules respectively.
  
active: This set contains the rules that will be used when the transfer component is applied to a set of predicates. When a grammar is read into the system, it is automatically set to cover all of the rules. It is sometimes convenient to remove certain rules from the set after a grammar has been loaded, or to activate only a small set of rules for testing purposes.
compile: Giving this a value other than the empty set causes detailed information on each rule to be displayed as the rule is loaded into the system. The listing shows in detail what templates and macros are involved in building each rule and how they are expanded.
detail: Details of the attempt to match rules in this set against the current set of predicates are displayed.
input: Before the attempt is made to apply any rule in this set, the complete current predicate set is displayed.
match_rule: For the rules in this set, details beyond those given for the detail set are displayed as the matching process is carried out.
rule: When the transfer system considers a rule in this set for application to the current predicate set, it displays the rule and, if it matches, shows what predicates it matches against.  Note that there can be many rules that are not considered at all because they can be eliminated early for lack of a key predicate.
success: A message will be displayed when any rule in this set succeeds.
  

Defining Rules Sets in terms of Rule Sets

A rule set can be defined by typing a command to the parser shell associating name(s) with member(s), where name(s) and member(s) are as we shall now describe.

names(s) is either a single name, beginning with a lower-case letter, or entirely enclosed in single quotation marks, or is a set of names in square brackets and separated by commas. These are the names to which the monitoring command will give a new value. The new value associated with a particular name may be defined, partially or completely, in terms of the current values of any names, including those being defined.

member(s) is an expression over range specifiers. There are four kinds of elementary range specifier, namely:

  1. yes: an alternative notation for the set containing the whole grammar, or [r(0, 0)].
  2. no: the empty set.
  3. An integer.
  4. The name of an existing set.
Expressions over specifiers are composed with the operators ++, --, and /\. If R and S are specifiers, elementary or complex, then
  1. R ++ S is the union of the sets R and S;
  2. R -- S is the set R minus any of its members that are in S;
  3. R /\ S is the intersection of R and S.
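For example, yes -- verb denotes all the rules in the grammar except those in the verb set, while active /\ verb picks out just the verb rules that are currently active.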


The translate command

It is also possible to translate directly, using a translate command similar to XLE's parse command:

translate {Ed slept.}

If transfer rules and parser and generator grammars have not been loaded, then default ones will be used, as specified in the file $XLEPATH/bin/translate.tcl.  To use non-default settings, either edit the file or use the command create-translator with arguments specifying the grammars, rules and gen-adds.  The relevant settings in the file are of the following form:

                              

The first three settings provide (full) file names giving the locations of the parsing grammar, the generation grammar, and the transfer grammar.  The fourth argument is a string in double quotation marks of the form "addonly word1, word2 ... wordn", where the words are the names of attributes in the generation grammar whose values may be left unspecified in the files that are output by the transfer component and input by the generator.

To  use non-default settings, the command  create-translator can be run with arguments specifying the grammars, rules and gen-adds. For example

The arguments specify (i) the parser grammar, (ii) the generator grammar, (iii) the transfer rules, and (iv) the gen adds.   As the example illustrates, the default value for a parameter can be left in place by simply supplying an empty string—adjacent double quotation marks—in place of the corresponding parameter to create-translator.

The create-translator command does not install any transfer related command buttons on the XLE's windows. A variant of the command, create-translator-menu, also installs these additional menu items. It takes exactly the same arguments as create-translator, but additionally sets up the following items on the f-structure and fs-chart windows.
To run  transfer using these menu items, first parse a source sentence, e.g.
% parse "The printer stops."
Then use the pull-down menus on  the f-structure and/or fs-chart windows.

Transfer Shell Commands

A number of  shell commands allow the transfer system to be run independently of the XLE. The transfer  command allows one to transfer a set of source files to a set of target files.  The triples  command allows one to transfer f-structure files to triples files in the format of the PARC 700 dependency bank. It also has a matching option that compares the similarity of f-structures by mapping them to dependency triples and calculating the precision and recall of the dependency match. The extract  command allows one to specify search patterns to extract corpus information from an f-structure bank of parsed sentences.

The transfer Shell Command

This command runs in three modes: interactive, server and batch.  In interactive mode, the command places you in a prolog emulator that allows direct access to transfer functionality, similar to the XLE  prolog  command, but without the need for  prefixing or double quotation.  In server mode, a transfer server is set up on a host and port specified in the command line. In batch mode transfer rules can be applied to files specified in the command line.

Interactive transfer is invoked from the shell command line, optionally with an --init argument. The --init argument specifies the name of a file containing prolog commands, which is consulted to provide user specific initialization on start up. Commands can be entered at the prompt, just as if you were sitting inside a normal prolog interpreter.  However, there are no prolog debugging commands, and code can only be consulted, not compiled. The interactive command is principally useful for setting up a transfer listener.

The transfer server is invoked as, e.g.
~ transfer server localhost 2458
Initializing prolog engine.
Loading prolog image at /project/xle/current/bin/transfer.sav.

Starting server on port 2458
This option is probably only useful if you are masochistic enough to try setting the XLE up with a transfer server.

Batch transfer can be invoked with two different sets of arguments.  In one mode, it is assumed that the input files come in a numbered sequence   <InStem>M.pl  to <InStem>N.pl,  e.g. /project/nltt/data/fs1.pl  to /project/nltt/data/fs75.pl.  The results of transfer will be written out as <OutStem>M.pl  to <OutStem>N.pl.  Both <InStem> and <OutStem> are specified as command line arguments. The command line arguments can be provided in any order and are:

   transfer
      --inStem   <FileStem>            % To transfer multiple, consecutively
      --outStem  <FileStem>            % numbered, files
      --from     <Number>
      --to       <Number>
      --inMode   <fs_file|xfr_file>    % Type of input/output file
      --outMode  <fs_file|xfr_file>
      --rules    <FileName>            % File containing transfer rules
      --select                         % Optional: transfer only selected
                                       % input analyses
      --init     <InitFile>            % Optional: specify a file containing
                                       % parameter settings for transfer


 Example:
 transfer  --inStem /project/nltt-2/TESTDATA/10-01-02/S --inMode fs_file --outStem /tmp/T  --outMode xfr_file --from 1 --to 700 --rules /tilde/thking/pred_arg_rules.pl

This reads in the f-structure files /project/nltt-2/TESTDATA/10-01-02/S1.pl  to S700.pl, applies transfer rules to the packed structures, and writes out files of packed transfer structures in  /tmp/T1.pl  to  T700.pl.  The arguments to the command should all be included on a single line.  If any files are missing from the numbered sequence, a message will be printed saying that they are missing and the command will continue onto the next file.

The second way of invoking batch transfer allows for the use of wildcards to specify the input files

  transfer
      --inMode   <fs_file|xfr_file>    % Type of input/output file
      --outMode  <fs_file|xfr_file>
      --rules    <FileName>            % File containing transfer rules
      --select                         % Optional
      --outStem  <FileStem>            % Prefix for output files
      --inFiles  <Files>               % List of input files


 Example:
 transfer   --inMode fs_file --outMode xfr_file --rules /tilde/thking/pred_arg_rules.pl --inFiles /project/nltt-2/TESTDATA/10-01-02/S*.pl --outStem /tmp/T_  --select
         
This reads in files /project/nltt-2/TESTDATA/10-01-02/S*.pl, transfers only the selected analyses and writes the results to files /tmp/T_S*.pl.  The names of the output files are obtained by stripping the directory prefix from the input file (if any), and then prefixing the resulting file name with the value of the --outStem argument.

The --init option allows one to load a file of prolog commands to set up parameters controlling the behaviour of the transfer system.   The contents of an init file might look something like
% Transfer initialization file

% define an initialization procedure:
init :-
     set_transfer_timeout_limit(0, 100000, 1000000),
     set_transfer_option(include_cstr, 1),
     set_transfer_option(no_select, 0),
     set_transfer_option(include_proj, 0).

% Run the initialization procedure
:- init.

The triples Shell Command

The triples shell command has a batch and an interactive form (but no server form). Interactive triples is invoked in the same way as interactive transfer.
As with interactive transfer, this command is principally useful for setting up a triples listener, to assist in the debugging of transfer rules mapping f-structures onto dependency triples.

In batch mode, the triples command can be used in two ways.  The first is to map f-structure files onto dependency triples files in the PARC 700 Dependency Bank format. Usage is either
 
triples transfer
      --inMode   fs_file | dep_file   % Type of input file
      --inStem   <FileStem>           % To transfer multiple, consecutively
      --outStem  <FileStem>           % numbered, files
      --from     <Number>
      --to       <Number>
      --rules    <FileName>           % File containing transfer rules


 Or:
 
triples transfer
      --inMode   fs_file | dep_file  
      --inFile   <FileName>           % To transfer single file
      --outFile  <FileName> 
      --rules    <FileName>           

Example:
triples transfer --inMode fs_file --inStem /project/nltt-2/TESTDATA/10-01-02/G --outStem /tmp/T  --from 1 --to 700 --rules /tilde/thking/triples_rules.pl

This reads in files /project/nltt-2/TESTDATA/10-01-02/G1.pl .. G700.pl, converts the f-structures to dependency triples and writes out the files  /tmp/T1.fdsc ... T700.fdsc (by convention, triples files have a .fdsc suffix). As the --inMode argument options indicate, it is possible to read in triples files and re-map them through the transfer rules --- an example of transfer being applied to non f-structure based input.

The triples command can also be used to compare f-structures, either to each other or to triples-based gold standards. Source and comparison (target) structures are converted to triples if they are not already in that form, and then matched:
 
triples match
   --sourceMode  fs_file | dep_file
   --targetMode  fs_file | dep_file
   --matchMode   selected | best | average
                        % selected will match only selected source analyses
                          (e.g. from a packed fs with selections marked)
                          against the target

                        % best will match all source analyses (not just
                          the selected ones) to find the best match.

                        % average will also match all source analyses,
                          and determine not only the best match but also
                          the overall match rate

   --rules  <FileName>  % transfer rules to use to map fs to deps, or
                          reconfigure deps

   --reConfigure        % Optional: if specified any source or target
                          dependency files will be reconfigured by
                          passing them through the transfer rules
   --matchLimit <N>     % Optional: if specified only the first N parses will be matched
                          Default is 50,000
   --diff               % Optional: if specified, will print out unmatched source and
                          target facts for each sentence (selected or best mode only)

   --sourceStem  <FileStem>    % To match multiple, consecutively
   --targetStem  <FileStem>    %   numbered, files
  
   --rebankStem  <FileStem>    % Optional: to store best matching structure
   --from        <Number>
   --to          <Number>

 Example:
triples match --matchMode best --sourceMode fs_file  --sourceStem /project/nltt-2/TESTDATA/10-01-02/G  --targetMode dep_file  --targetStem /project/nltt-2/new-depbank/gold1-700-files/G  --from 1 --to 700  --rules /tilde/thking/triples_rules.pl
This matches the f-structure files against the dependency files, in each case trying to find the source analysis that best matches the target dependency.  Note that in matching dependency structures, we cannot assume that the numerical indices in source and target will be identical.

In place of the --sourceStem, --targetStem, --from and --to arguments, it is instead possible to specify matching of a single pair of files using the arguments

   --sourceFile  <FileName>   % for single file matching
   --targetFile  <FileName>
   --rebankFile  <FileName>

To match f-structures via triples requires a set of transfer rules for converting f-structures to triples. As a guide, the default set of rules for mapping f-structures from an English grammar can be seen here. If no rule file is specified, then a default set of rules is used to map f-structures to triples; this default preserves all f-structure information.

The extract Shell Command

The extract command is intended as a way of extracting information from a corpus of parsed sentences. For example, a transfer rule to detect possible relational nouns might look for all nouns that occur in parses where they are modified by an of phrase, as follows

 
+NTYPE(%X,%%), +ADJUNCT(%X,%Y), +in_set(%Z,%Y), +PRED(%X,%Head),
              +PRED(%Z,of), +PTYPE(%Z,%%), +OBJ(%Z,%W), +PRED(%W,%Mod)
    ==>  ccollect(of(%Head,%Mod)).


where ccollect is a special predicate used to accumulate corpus properties.   Assuming that this rule is included in a transfer file of search properties, search_rules.pl, we can look for candidate relational nouns in a parsed corpus e.g. as follows

         extract  /tilde/crouch/corpora/search_rules.pl  ../eureka/fs*.pl

This will print a frequency-sorted list of of(Head,Mod) occurrences.

The unpack Shell Command

As a convenience, the command

unpack PackedFile UnpackedFile

will take a packed file (fs_file, xfr_file), and unpack all the individual analyses into a sequence of unpacked structures in a single file UnpackedFile.

Transfer Listeners

The simplest way of running transfer is undoubtedly to call it directly from the XLE (Running transfer direct from the XLE). However, there may be circumstances when this is not entirely appropriate. For example, there are cases when the output of transfer is not intended to be an f-structure even though the input is.  Converting f-structures to dependency triples is an example of this.   Another example is where a prolog programmer is trying to develop additional functionality based around the transfer component, and is hampered by the lack of debugging tools in the runtime prolog engine loaded into the XLE.

A transfer listener is a useful alternative in such cases.   It relies on setting up two separate processes that communicate through files.  One is an XLE process, invoked in the standard way. Another is a transfer process, perhaps invoked by using the interactive options on one of the transfer shell commands.    The transfer process enters into a dumb listener loop, where it waits for user interactions with the XLE to write files in a specified location.  When it finds these files, it processes them, displays results etc., and then returns to listening out for new instances of the files.

Two different listeners are included with the XLE distribution (the programmer's guide explains how you can write your own prolog listener loop should you want to). One is a general transfer listener, and the other is a dependency triples listener.  The latter is useful for debugging transfer rules mapping f-structures onto dependency relations, and we will look at the use of this listener in detail.

The triples Listener

To run a triples listener, first make sure that you have an XLE process running.  Then from the XLE run the command create-listener (with no arguments) to install the necessary listener menu items on the f-structure and fs-chart windows:
% create-listener
You should also make sure that you have a separate interactive triples process running, and run the triples command
~ triples interactive
Initializing prolog engine.
Loading prolog image at /project/xle/current/bin/triples.sav.
This is a prolog reader.  Rules of prolog syntax apply.
Type  halt.  to exit;  help. for information

prolog>   triples.

%Waiting for XLE communication on /tilde/crouch/.transpipe0 or /tilde/crouch/.transpipe1 ...


As the waiting message indicates, the triples listener is now just waiting to receive input from the XLE, and is expecting to find it in the file indicated (located in the user's home directory --- this means that the XLE and triples processes do not have to be running on the same machine, provided that both machines have access to the user's home directory). If you parse a sentence from the XLE you can access the listener menu items.  To load a set of transfer rules, you can click on the "Reload rules" button. Two things should happen.  (1) If this is the first interaction between the XLE and the listener, the XLE should print out a message confirming that it has successfully established file-based communication, or a message indicating a failure to communicate through what it thinks is the correct file. By comparing the names of the files the listener is expecting to use and the files the XLE is expecting to use, one can try to diagnose any problems.  (2)  The listener window should display a prompt asking you to enter the file name of the rules you want to load. If you just hit return, the previously loaded file will be reloaded. Otherwise, you can specify a new rule file to load.

Having loaded the rules, you can now select the "Transfer" button in the XLE window. The triples listener  will display the dependency relations derived from the f-structure and then return to a dumb prompt saying  %Ready...  Typing things in at this listener prompt will have no effect.

If you want to break out of the listener loop to enter transfer commands, e.g. to adjust the debugging monitors, you must first click on the "Break" button in the XLE window. The listener will return to an active prolog> prompt, at which you can enter commands. For example, you can control whether or not the listener displays the input predicates to transfer by calling show_xfr_input or dont_show_xfr_input. To return to the listener loop, just enter the command triples. again.

Two additional XLE menu items allow you to turn basic transfer tracing on and off without having to break out of the listener.

The transfer Listener

The transfer listener operates in the same way as the triples listener. You need to run create-listener in the XLE and have a separate interactive transfer process running
~ transfer interactive
Initializing prolog engine.
Loading prolog image at /project/xle/current/bin/transfer.sav.
This is a prolog reader.  Rules of prolog syntax apply.
Type  halt.  to exit;  help. for information

prolog>   xfr.

%Waiting for XLE communication on /tilde/crouch/.transpipe0 or /tilde/crouch/.transpipe1 ...

The command xfr. is used to set the listener running. The transfer listener shows the output of transfer in transfer predicate notation, before it is converted back to f-structure.  You can also control whether or not the input transfer predicates are displayed by means of the show_xfr_input or dont_show_xfr_input commands.

Transfer Servers

Previously, the only way of integrating the XLE and the transfer system was to have the XLE start up a separate transfer server and communicate with it via sockets. It's hard to see why anyone would still want to do this, but this possibility has been maintained.  To start a transfer server, use the XLE command create-translation-server-menu, with the same arguments specifying parser and generator grammars, transfer rules and gen-adds as the create-translator-menu command, e.g.
 ~ xle
XLE loaded from xle.
XLEPATH = /project/xle/current.
Type 'help' for more information.

% create-translation-server-menu "/project/pargram/english/homecentre/english-hc.lfg" "/project/pargram/french/homecentre/french-hc.lfg" "/project/pargram/trans/eng_to_fre_rules" ""
%
This will load the grammars, and then create an iconified xterm window named Transfer_Server.  If you open this window you should see something like the following
Loading prolog image located at transfer.sav
Starting transfer server on localhost 2548

Processing "nl,write(Transfer system is ready ...),nl,nl"

Transfer system is ready ...

Processing "force_load_rules(/project/pargram/trans/eng_to_fre_rules)"

active: no monitors.
active: [r(0,0)].

Sometimes the Transfer_Server window only blinks into existence and then dies. In this case, it is worth parsing a sentence and clicking on one of the transfer server menu items.  This will often succeed in starting the server up again.

You can send prolog commands directly to the transfer server by means of the XLE's pl command. This is analogous to the prolog command: the transfer command must be in proper prolog syntax, period terminated, and surrounded by double quotes. The server should respond to this by printing a Processing "<Command>" message in its window, and executing the command.

You should note that there is currently no server-based analogue of the XLE's translate command.   The only way to run server-based transfer is through pull-down menus.

As with running transfer directly from the XLE, data is passed between XLE and transfer by means of the files /tmp/$USER-xferin.pl   and /tmp/$USER-xferout.pl.


Platform Specific Notes

MacOSX: The environment variable DYLD_LIBRARY_PATH must be set to include $XLEPATH/lib. To set this variable, invoke
setenv DYLD_LIBRARY_PATH $XLEPATH/lib
if the environment variable is undefined, or
setenv DYLD_LIBRARY_PATH $XLEPATH/lib:${DYLD_LIBRARY_PATH}
otherwise. (There is no need to set the corresponding LD_LIBRARY_PATH variables under linux or solaris).

MacOSX: As documented in the SICStus release notes, sometimes the default limit on the process's data-segment is unreasonably small, which may lead to unexpected memory allocation failures. To check this limit, do
    tcsh> limit data
    datasize 6144 kbytes

    bash> ulimit -d
    6144
This indicates that the maximum size of the data-segment is only 6 Mb. To remove the limit, do
    tcsh> limit datasize unlimited
    datasize unlimited

    bash> ulimit -d unlimited
    bash> ulimit -d
    unlimited
  Note: limit (ulimit) is a shell built-in in csh/tcsh (sh/bash). It may have a different name in other shells.


MacOSX and Solaris: It has been observed that with large parser and generator grammars loaded, there is sometimes insufficient remaining memory either to load the prolog engine or to apply the transfer rules.  This has not so far been observed under Linux, and the cause is still under investigation.  If this is a problem, you should use some other way of interfacing XLE to transfer than the direct embedding of transfer within XLE.

Programmer's Guide to the Transfer System

The following is intended for those who have access to sicstus prolog, and wish to embed transfer functionality in their own applications. The transfer system and its related applications (triples mapping and matching, property extraction from f-structure banks) are distributed as sicstus prolog saved images plus associated shell commands

$XLEPATH/bin/transfer.sav & transfer
$XLEPATH/bin/triples.sav & triples
$XLEPATH/bin/extract.sav & extract

This unfortunately means that you don't have access to the source code. But you can load the images into prolog and develop additional functionality based around them:

| ?- restore('$XLEPATH/bin/transfer.sav').

For this to work, you will need to be running the same version of sicstus as the image was saved under, which will usually be the latest release. You can tell which version of sicstus was used by looking at $XLEPATH/bin/sp-{Version} (e.g. $XLEPATH/bin/sp-3.11.0 means sicstus version 3.11.0). This directory contains the runtime prolog system, which (a) can be distributed without a sicstus license, and (b) is required in the same directory as the transfer, triples and extract shell commands (see sicstus release notes on distributing runtime systems).

Restoring the transfer.sav image makes the following prolog predicates available, which are described in greater detail below

Top level calls to transfer:
  
  transfer_seq/5, state_transfer_seq/7, 
  transfer_seq/4, state_transfer_seq/6,
  transfer_seq/3, state_transfer_seq/5,
  transfer_seq_charlist/5, state_transfer_seq_charlist/7,
  transfer/4, transfer/5, transfer/2, timed_transfer/4,
  timed_transfer/5,   transfer_facts/2, transfer_input/3,
  transfer_input/4,   transfer_output/3, transfer_output/4, 
  transfer_files/7,transfer_files/6,transfer_files/5,transfer_files/4,
  transfer_timing/1, set_transfer_timeout_limit/3,
  set_transfer_option/2,   load_rules/1, reload_rules/0,
  reload_rules/1, force_load_rules/1,
  print_compiled_transfer_rules/1,   monitor/1, monitor/2,
  monitor/3, monitoring/0,   monitoring/1, monitoring/2,
  monitors/0,   tdbg/0, tdbg/1, full_tdbg/0, vfull_tdbg/0,
  no_tdbg/0,   xfr/0,xfr_help/0,main/0, show_xfr_input/0,
  dont_show_xfr_input/0,   run_transfer_reload/0

Conversion between f- and transfer structures:
   xfr2fs/2,write_xfr/1, write_xfr/2,
write_xfr_no_trace/1, write_xfr_no_trace/2
fs2xfr/2,fsfacts2preds/5

Transfer interface predicates:
   listen/2, communication_pipes/3
start_server/2, server_loop/1
my_prolog_loop/0, my_prolog_loop/2,
print_help/0, call_string/1
xle_exec/1, xle_exec/2, silent_xle_exec/3,
xle_exit_all/0, check_xle_running/0

XLE library calls:
   init_xle/2, create_parser/2, parse_sentence/4,
next_graph_solution/3, free_graph_solution/1,
reset_storage/1, create_graph/2, create_generator/2,
generate_from_graph/5, print_net_as_regexp/4,
read_prolog_graph_file/3, print_prolog_graph_file/2,
make_new_choice_disjunction/3, create_disjunction/4,
get_choice/4, conjoin_clauses/5, disjoin_clauses/4,
subtract_clause/4, negate_clause/4, not_clause/3,
assert_nogood/2, assert_nogood/7,
evaluate_clause/3, evaluate_choices/2, covers_clause/3,
count_solutions/2,
get_edge_solutions/2, first_dnf_solution/3, next_dnf_solution/3,
set_solution_choice_values/2, xle_true_context/1, xle_false_context/1,
use_primary_choice_space/0, use_alternate_choice_space/1,
reset_choice_space/0, reset_choice_space/1,
create_fs_choice_space/3, xle_context/3,
xle_safe_context/3, select_choice/1,
set_choice_values/2, unpack_choice_space/2,
xle_unpack_fstr/3, collect_true_facts/3,
name_internal_choices/2, name_internal_equivs/2,
name_internal_context/2,
ext2int_contexts/3, ext2int_contexts/5,
int2ext_contexts/5, named2ext_contexts/6,
named2int_contexts/5, int2named_contexts/5,
ext2named_contexts/6, write_fs/1, write_fs/2,
write_cf_list/1, write_cf_list/2,
write_named_context/1, write_named_context/2,
write_named_context_no_commas/1,
write_named_context_no_commas/2,
fs2graph/2, graph2fs/2

Miscellaneous utilities:
   strict_member/2, strict_memberchk/2,
vartail/2, vt_append/2, vt_member/2,
list_to_vtlist/2, null_vtlist/1,
unkey/2, concat_list/2, generated_symbol/2,
time_call/1, time_call/2,
time_msg/1, time_msg_if/2, reset_time_msg/0,
contains_somewhere/2,
setsys/2, getsys/2, setp/1,
pp_debug/0, nopp_debug/0,
pp/1, pp_underscore/1,
format_if/3, format_if/4,
format_level/1,
get_option/3, get_optional/3, get_option_list/4,
add_dir_slash/2, file_name_concat/3,strip_file_suffix/3,
file_suffix/2, dir_and_file/3,
atom_to_num/2, assert_number_of_digits_in_file_numbers/4,
numbered_file_name/4


Top level calls to transfer

transfer(+In,-Out,+InMode,+OutMode,+Options)
transfer(+In,-Out,+InMode,+OutMode)
transfer_seq(+In,-Out,+InMode,+OutMode,+RuleSequence)
    Applies loaded transfer rules to In to produce Out.  InMode and OutMode 
    specify the format of the input and output. Options is a list specifying 
    any further manipulations of In and/or Out to be carried out before/after 
    transfer.  transfer/4 calls transfer/5 with Options=[]
 
    InMode/OutMode can be one of :
     fs_file | fs | xfr_file | xfr | xle_graph  
    where
      fs_file means In/Out is the name of  a prolog f-structure file
      fs means In/Out is a prolog f-structure,
         i.e. fstructure(Sentence,Properties,Choices,Equivalences,FS,CS)
      xfr_file means In/Out is the name of a prolog transfer-structure file
      xfr means In/Out is a prolog transfer structure, 
         i.e. xfr(Choices,Equivalences,Equalities,Facts,Doc)
      xle_graph means In/Out is an integer serving as an aligned pointer 
         to an xle-internal f-structure (see XLE library calls).
   OutMode can additionally include:
      xml | xml_file

   RuleSequence is an atom, comprising a space separated sequence of
      ruleset names, specifying the sequence of rulesets the input must
      be passed through to create the output
  
   Options are
    no_select: 
       Ignore any choice selections marked on the input, and apply transfer 
       to the whole packed input
    include_cstr: 
       Include c-structure facts along with f-structure facts.
    include_root_category: 
       Include the root category along with f-structure facts, in case you want to
       rewrite it.
    include_proj: 
       Include f-structure projections.
    include_eqs: 
       Include any un-normalized equalities from the input f-structure in 
       with the transfer Facts. Normally, these equalities are only included 
       with the transfer Equalities.  By including them in with the facts, 
       transfer rules can explicitly manipulate equalities.  (This is not 
       highly recommended, since the same equalities are still added to the 
       Equalities, and the transfer system matches facts with reference to 
       these equalities)
    extra([Fact|Facts])
       Add the specified extra facts to the transfer input. The facts must 
       take the form cf(Context, Predication) where Context is a boolean 
       context, probably 1. Predication is a basic transfer fact.
    Typically, however, options are specified in the individual rulesets.
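
For instance, a minimal sketch of a direct call (file names hypothetical, and assuming a rule file has already been loaded):

    % Transfer one f-structure file, ignoring any choice selections:
    ?- transfer('/tmp/in.pl', '/tmp/out.pl', fs_file, fs_file, [no_select]).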

state_transfer_seq(+In,-Out,+InMode,+OutMode,+StateIn,-StateOut,+RuleSequence)

   Just like transfer_seq/5, except that it passes in a list of state facts
        (of the form: state(Fact))
   and collects a list of state facts from the output of transfer
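
As a sketch (file and ruleset names hypothetical):

    % Thread a state fact through two rulesets, collecting the final state:
    ?- state_transfer_seq('/tmp/in.pl', Out, fs_file, xfr,
                          [state(pass(1))], StateOut, 'normalize anaphora').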


transfer_state_facts(+XfrStructure, -StateFacts)
   Given a transfer structure, will collect together any state facts
   in the transfer facts.

A note about transfer structures. The transfer system is intended to provide a general purpose contexted rewriting system, with a contexted f-structure to f-structure rewriting system as a special case. Therefore the input and output to transfer are transfer structures. F-structure input/output must be mapped to/from transfer structures. A transfer structure is a 5-tuple

         xfr(Choices,Equivalences,Equalities,Facts,Documentation)

where Choices defines a choice space (as in f-structure) and Equivalences define any context abbreviations (as in f-structure). Choices and contexts in general may occur in one of two forms:
  1. As an external/prolog choice space, where contexts are represented as prolog variables, and the Choices is a list of the form
          [choice([A1,A2,A3],1), choice([B1,B2],A1),...]
    That is: as will be familiar from prolog fstructures
  2. As an internal/xle choice space, where contexts are integers corresponding to pointers to xle data structures representing contexts, and Choices = choice(Integer), where Integer corresponds to a pointer to the xle data structure for the choice space. xfr_files use only external, prolog choice spaces.
Equalities allow the transfer system to deal with input from un-normalized f-structures. These equalities are typically gleaned from the cf(C,eq(var(99), var(101))) type FS facts that litter un-normalized fs-charts.

Facts have to be contexted, and take the form
                cf(Context, Predication)
where Context is an internal or external context from the choice space, and predication can be any (ground) prolog term, e.g. SUBJ(var(0), var(3)) or give('Fred', 'Ed', book31). It is recommended that all Predications are ground (i.e. contain no prolog variables). Transfer matching against non-ground predications is liable to be unpredictable.

Documentation is a list of arbitrary prolog terms, providing whatever additional documentation is deemed necessary (cf Properties in prolog fstructures). This includes a term, number_of_solutions(N), which reports the number of solutions in the packed transfer structure. Transfer output also includes a list of rule traces as part of the documentation, where the rule traces record which rules were applied and how; this is intended to support such things as stochastic selection of transfer output. The form of a rule trace is
         '$rule_trace'(RuleNum, ApplicationNum, LHS, RHS, MatchCtx, ApplyCtxs)
where the arguments record the rule number, the application instance, the
rule's left- and right-hand sides, the context in which the rule matched,
and the contexts in which it was applied.


load_rules(+RuleFile)
reload_rules(+RuleFile)
force_load_rules(+RuleFile)

Load the specified transfer rule file. load_rules/1 will not load
anything if a transfer rule file has already been loaded.
reload_rules/1 will load even if a rule file of the same or a
different name has already been loaded, overwriting any previously
loaded rules. This function also catches any exceptions (e.g.
specified file does not exist) and prints out an error message before
failing. Because it catches exceptions, the C interface to transfer
uses this procedure to load rules. force_load_rules/1 is like
reload_rules/1, but does not catch exceptions.
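
For example (using the rule file from the triples examples above):

    ?- load_rules('/tilde/thking/triples_rules.pl').       % no-op if rules already loaded
    ?- reload_rules('/tilde/thking/triples_rules.pl').     % always loads; catches exceptions
    ?- force_load_rules('/tilde/thking/triples_rules.pl'). % always loads; exceptions propagate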

reload_rules
Reloads the previously loaded rules file (catches exceptions)

print_compiled_transfer_rules(+File)
Prints the compiled / expanded transfer rules to File

timed_transfer(+In,-Out,+InMode,+OutMode,+Options)
timed_transfer(+In,-Out,+InMode,+OutMode)
As for transfer/5 and transfer/4, except that time
out limits are imposed. There are three limits for
(i) un-normalized transfer, (ii) normalization of transfer
input, and (iii) normalized transfer. In the first
instance, transfer is run on un-normalized inputs.
If this times out, and the input was un-normalized
(i.e. contained equalities), then the input is normalized,
and transfer is run once again on the normalized input.
If the input was already normalized (i.e. no equalities),
then nothing more happens after the first timeout.
Setting a time limit of 0 for un-normalized transfer ensures
that input is automatically normalized prior to transfer

set_transfer_timeout_limit(+UnNormalizedXfr, +Normalization, +NormalizedXfr)
Set the time out limits (in CPU ms) for timed_transfer.
Default is set_transfer_timeout_limit(10000,100000,100000)

set_transfer_option(+Option, +Value)
Sets the default values for the Options argument to transfer,
to be used whenever this argument is not explicitly provided, e.g.
transfer/4. Options and values are:
no_select 1 | 0
include_cstr 1 | 0
include_proj 1 | 0
include_eqs 1 | 0
extra [List of extra facts]


transfer_timing(+Level)
Level is an integer specifying the level of detail at which
timing messages about transfer should be printed. Default is 0
(no messages), 1 is a sensible alternative.

transfer(+In,-Out)
Calls timed_transfer(In,Out,fs_file,fs_file)

transfer_files(+From,+To,+InStem,+OutStem,+InMode,+OutMode,+Options)
transfer_files(+From,+To,+InStem,+OutStem,+InMode,+OutMode)

Applies timed_transfer to a succession of numbered files
   InStem<From>.pl to InStem<To>.pl
writing the results to
   OutStem<From>.pl to OutStem<To>.pl.
If files are missing from the numbered sequence, will
print a message saying the file is missing and continue.
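
For example, a prolog analogue of the batch shell command shown earlier:

    ?- transfer_files(1, 700, '/project/nltt-2/TESTDATA/10-01-02/G', '/tmp/T',
                      fs_file, fs_file).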

transfer_files(+InFiles,+OutStem,+InMode,+OutMode,+Options)
transfer_files(+InFiles,+OutStem,+InMode,+OutMode)

 Applies timed_transfer to each file in InFiles. Writes result to file named
by (a) stripping any directory off InFile to get the base file name, and then
(b) prefixing the base file name with OutStem. If a file in the list does not
exist, will print a message saying the file is missing and continue.
 
transfer_input(+In,-Xfr,+InMode,+Options)
transfer_input(+In,-Xfr,+InMode)

Convert In to prolog transfer structure Xfr (with internal xle choice space).
InMode and Options are as for transfer/5

transfer_output(+Xfr,-Out,+OutMode,+Options)
transfer_output(+Xfr,-Out,+OutMode)

Convert prolog transfer structure Xfr to Out.
OutMode and Options are as for transfer/5

transfer_facts(+XfrIn,-XfrOut)
Run transfer on input prolog transfer structure to produce
 output prolog transfer structure.
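
Together, these predicates decompose transfer into its stages. A sketch (the predicate name my_transfer is hypothetical):

    my_transfer(InFile, OutFile) :-
        transfer_input(InFile, XfrIn, fs_file),    % fs file -> transfer structure
        transfer_facts(XfrIn, XfrOut),             % apply the loaded rules
        transfer_output(XfrOut, OutFile, fs_file). % transfer structure -> fs file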

tdbg
Turn on basic tracing of transfer rules. Equivalent to
monitoring([rule],[r(0,0)])

tdbg(+Monitors)
Turn on debugging for specified transfer Monitors. Monitors can
either be a single monitor name or a list of monitor names. Available
monitors can be found using monitors/0. Equivalent to
monitoring(Monitors,[r(0,0)])

monitors
List all documented transfer monitors. These are documented via
the (multifile) predicate user:monitor_doc(MonitorName,DocString).

monitoring
List all the monitors for which debugging is currently turned on,
showing their range sets

monitoring(+Monitors)
List range sets for specified monitors

monitoring(+Monitors, +RangeSpec)
Change RangeSpec for specified monitors

monitor(+MonitorCondition)
monitor(+MonitorCondition, +Then)
monitor(+MonitorCondition, +Then, +Else)

Perform Then action if MonitorCondition holds, otherwise Else
Action. Then and Else are arbitrary prolog goals (typically print
statements), MonitorCondition is an Expr of the following form
Expr ::= (Expr, Expr)    Conjunction
Expr ::= (Expr; Expr)    Disjunction
Expr ::= MonitorName     Named range set
Expr ::= r(L, H)         Explicit range
Expr ::= Integer         Shorthand for r(Integer, Integer)
Expr ::= Prolog          Arbitrary goal
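
For example, a sketch of a conditional debugging action (the monitor name rule is taken from tdbg/0's default set):

    % Print a note when the rule monitor fires within applications 5 to 10:
    ?- monitor((rule, r(5,10)), format("rule applications 5-10~n", [])).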

full_tdbg
vfull_tdbg

Turn on debugging for [rule,match_rule,detail] (full) or
[rule,rule_input,rule_output,match_rule,detail,garbage] (vfull)


xfr
This starts up an xle listener loop (see listen/2,
communication_pipes/3), which is useful for debugging transfer
rules. This allows separate XLE and transfer
processes (running on the same machine) to communicate via files.
The tcl command install-xfr-listener issued to the XLE process
(either at the command line, or via an xlerc file) will set up
additional buttons on the fstructure and fs-chart Tk windows.
These are
fstructure window:
Transfer, Load Transfer Rules, Debug, Debug off, Break
fs-chart window:
Transfer
Clicking on these buttons will cause two files to be written,
~/.transpipe0 and ~/.transpipe1 (where ~ is the user's home
directory). (You can change the name of these communication files
in translate.tcl). The prolog listener loop polls for the
existence of these two files. When they are written, the listener
reads them, deletes them, acts on their contents, and then returns
to polling.

~/.transpipe0 normally contains the f-structure or fs-chart from
the window in which the Transfer button was pressed, or is empty.
~/.transpipe1 is the command file, which contains a prolog term
specifying the action to be performed. Possible contents of
transpipe1 are:
break. Leave the listener loop
ready. Transfer contents of transpipe0 and display results
debug. Run tdbg/0.
no_debug. Run no_tdbg/0.
reload. Run run_transfer_reload/0.

You can also break out of the listener loop (in a prolog
development system) by hitting ^C. However, in a prolog runtime,
as brought up by running "transfer interactive", ^C aborts the
whole process

run_transfer_reload
This prompts the user to enter the name of a rule file to be
loaded. If the user just hits return, the previously loaded file
is reloaded.

show_xfr_input
dont_show_xfr_input

These commands alter the behaviour of how the xfr loop displays
transfer input

xfr_help
Prints a help message

main
This is used to read in arguments from the command line when
transfer.sav is called as part of the transfer shell
command. You probably don't ever want to use this...




Conversion between f- and transfer structures


write_xfr(+Stream,+Xfr)
write_xfr(+Xfr)
write_xfr_no_trace(+Stream,+Xfr)
write_xfr_no_trace(+Xfr)

Write a prolog transfer structure Xfr to a Stream or to
user_output, printing choices in external prolog format. Xfr can
have either an internal or external choice space. The
no_trace versions suppress the printing of any rule trace
information. Note that no terminating period is printed

write_fs(+Stream,+FS)
write_fs(+FS)

Write a prolog f-structure FS to a Stream or to user output, printing
choices in external prolog format. FS can have either an internal
or external choice space. Note that no terminating period is printed

xfr2fs(+Xfr,-FS)
Converts a prolog transfer structure (internal or external
choice space) to a prolog f-structure (internal choice space).

fs2xfr(+FS,-Xfr)
Converts a prolog f-structure (internal or external
choice space) to a prolog transfer structure (internal choice space).

fsfacts2preds(+FS, +CS, -Preds, -Eqs, +ChoicePtr)
Given contexted f-structure and c-structure facts FS and CS, in
internal context notation under choice(ChoicePtr), will convert
them to a list of transfer facts, Preds, and extract out any
un-normalized equalities. (CS is currently ignored).


XLE-Transfer Interfaces:


There are a number of different ways of getting the transfer component to interface with the xle:
a) Embed transfer directly within the XLE
b) Have XLE start up a separate transfer server and communicate with it via sockets (the original mode of interaction)
c) Have transfer start up a separate XLE server, and communicate with it via pipes
d) Have separate transfer and XLE process communicate via files though a listener loop, like xfr
The socket- and pipe-based interfaces are still a little fragile, and the direct embedding of transfer within XLE makes it harder to include new transfer functionality, and is of course next to impossible for prolog programmers to debug. The listener loop is handy for developing and debugging new functionality, but is probably not ideal for a final application.


Embedding Transfer Directly within XLE

For the direct embedding, xle and libxle make the following Tcl and C commands available

int load_prolog_image()
This loads a prolog image into the xle. If the
extern char *prolog_image is non null, then the saved image located
at the full path name given by prolog_image will be loaded. If it
is null, then it will load the first one it finds of
$XLEPATH/bin/transfer.sav, $XLEPROLOGPATH/bin/transfer.sav, and
$PWD/transfer.sav (note: XLEPROLOGPATH will probably not be
defined).

From Tcl, you can set the prolog_image variable by, e.g.

set prolog_image /tilde/crouch/xle/transfer/transfer.sav

int prolog(char *command)
This passes a command string to prolog, which prolog executes. The
command string should obey prolog syntax, except that there should
be no terminating period. Values assigned to prolog variables in
the command string will not be available --- i.e. only the side
effects of the prolog command are relevant.

The corresponding Tcl command is, e.g.

prolog "X is 2+2, write(X), nl, nl"

Note that it is OK to include a final period in the Tcl command
string, though this is not obligatory.

Both commands are implemented by means of the prolog predicate
call_string/1, which evaluates its string argument as a prolog goal,
and catches any exceptions.

int load_transfer_rules(char *file)
Call reload_rules to load the transfer rule file

From Tcl, you are probably best off doing
prolog "reload_rules('/tilde/crouch/xle/transfer/rules')"

Graph *fs_transfer(Chart *chart, Graph *fsIn);
Passes fsIn through transfer and returns the result.

From Tcl, the following will not only transfer, but also generate:

create-parser /project/pargram/english/standard/english.lfg
create-generator /project/pargram/english/standard/english.lfg
translate_sentence "Ed slept." $defaultparser $defaultgenerator



From within the xle you can also invoke the Tcl command

prolog "my_prolog_loop"

This will place you inside a rudimentary prolog read loop (the same one as is used in the interactive versions of the transfer and triples shell commands). From here you have direct access to the prolog runtime system distributed alongside the XLE. Note, however, that runtime systems do not provide a prolog debugger, and can only consult new prolog code, not compile it. This may be different if you have your own licensed development system, and replace $XLEPATH/bin/sp-{Version} with a soft link to $SP_PATH/lib (see sicstus release notes for distributing run time systems).

You can also call my_prolog_loop/2 with two prolog goals. The first is called on entering the loop, and the second on leaving it.




Transfer Server

For socket based transfer servers there is the following, if anyone is brave enough to try

start_server(+Host,+Port)
This will open a prolog socket on the specified Host and Port, and
enter a loop that reads and acts on commands written to the stream
associated with the socket.

From the XLE side, the file $XLEPATH/bin/translate.tcl contains a bunch of possibly unmaintained code that fires up a prolog process passing the Host and Port as command line arguments, and passing them through to start_server, e.g.

sicstus -l run_server.pl -a localhost 2453

where the contents of run_server.pl might be something like
   main :- prolog_flag(argv, [Host,Port]), start_server(Host,Port).
   :- main, halt.



Pipe-Based XLE Servers

Setting up the XLE as a pipe based server to the transfer system has been rather better maintained. Much of the code for this has been adapted from code written by Martin Emele. This mode of operation has been used for the sentence condensation application.


check_xle_running
This will fire up an xle process if one is not already running, and open
a pipe to it, as follows

xle_err(_,Err),
exec('xle',[pipe(Input),pipe(Output),Err], PID),

The identity of the process Id (PID) and Input and Output pipes are
asserted as
xleinterface:xle_pid/1,
xleinterface:xle_input/1,
xleinterface:xle_output/1

What is done with the error stream can be controlled by
:- assert(xleinterface:xle_err(_,null)). % suppress errors (default)
%:- assert(xleinterface:xle_err(_,std)). % write errors to std

xle_exec(+CommandString)
xle_exec(+FormatString,+ArgList)
silent_xle_exec(+FormatString,+ArgList,+ReturnString)

The exec commands write their strings to the Input pipe, and read
what the XLE returns from the Output pipe. The FormatString
versions of the exec commands use prolog's format/3 to write to
Input. You should always include a new line character in the format
string (the CommandString version automatically inserts a new
line). Normally you would want to call check_xle_running before
executing a command, just to check it is still running.

silent_xle_exec reads the response XLE writes to Output into the
ReturnString. The other two commands print Output directly to
user_output. The silent option is useful if you want to have the
XLE running completely hidden in the background, and have prolog
munge over the results.
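
A sketch of silent execution (the XLE command string is hypothetical):

    get_parse(Response) :-
        check_xle_running,
        % Run a command in the hidden XLE and capture what it prints:
        silent_xle_exec("parse {~a}~n", ['Ed slept.'], Response).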

To interact with the XLE via Tk windows etc, call

xle_exec("set no_Tk 0~n",[]).

This forces the XLE to open up its usual windows. To prevent XLE
from showing its windows, use xle_exec("set no_Tk 1~n",[])

Clicking on command buttons in the Tk windows will not have any
effect on the transfer process, which remains resolutely in control
of things. However, you can use xle_exec to issue Tcl commands that
get the XLE to write results to specified files, and then have
prolog read the contents of the files, e.g.

get_fs_from_xle(FS) :-
    check_xle_running,
    xle_exec("print-fs-as-prolog ~a~n", ['.tmp_fs.pl']),
    open('.tmp_fs.pl', read, Stream),
    read(Stream, FS),
    close(Stream).


xle_exit_all
This looks up all the process ids of active XLE processes, and
shuts the processes down. A common problem is that the XLE does
not like one of the commands it has been sent. It does not usually
complain straight away, but next time you issue a command you get
an exception complaining about a format error in trying to write -1
where a character was expected. Under these circumstances, your
best bet is just to shut the xle process down, and start up all
over again.



XLE-Transfer Listener

The XLE listener interface has already been mentioned in connection with xfr/0. You can define your own listener loops in the following way. First of all, you need to define your own listener predicate that takes two file names as arguments, and a third Break flag. As a schematic example

:- module(my_module, [go/0]).

my_listener(DataFile, CommandFile, Break) :-
    open(CommandFile, read, In),
    read(In, Command),
    close(In),
    (   Command = process ->
        Break = fail,
        my_process(DataFile)
    ;   Command = break ->
        Break = true
    ).

go :- listen(my_module, my_listener).
:- communication_pipes('.mydatafile', '.mycommandfile', '.my_ping').

The user defined predicate my_listener reads the Data and Command Files. If the Command file contains what should be taken as an instruction to break out of the loop (e.g. the command "break."), then the Break flag is set to true. This acts as an instruction to listen/2 to break out of its loop.

To activate the loop, you need to call listen/2, providing the name of the module in which the listener predicate is defined and the name of the predicate. In the example, go/0 is defined so as to call this in the correct way. You also need to specify what files communication is to take place through. This is done by means of communication_pipes/3. This asserts which files (in the user's home directory) are going to be used to communicate Data, Commands and Pings through. The Ping file is used in an initial handshaking routine that confirms whether the prolog process and the XLE process are using the same files.

On the XLE side of things, you will want to define some Tk buttons that write things to the communication files. A typical example might be including the following in your xlerc file

#####################################################################
proc install-my-listener-menu {} {
    global fsCommands fsChartCommands

    add-item-to-xle-menu \
        {command -label "Process" \
             -command "data-to-listener $self" \
             -doc "Runs process on fs."} \
        fsCommands

    add-item-to-xle-menu \
        {command -label "Break " \
             -command "command-to-listener break." \
             -doc "Breaks out of listener."} \
        fsCommands
}

# Specify communication pipes (must coincide with communication_pipes/3)

set datapipe [glob ~/]/.mydatafile
set commandpipe [glob ~/]/.mycommandfile
set pingpipe [glob ~/]/.my_ping
set pinged 0

proc command-to-listener {command} {
    global pinged
    global datapipe
    global commandpipe
    global pingpipe

    set cmd1 "echo '$command' > '$datapipe'"
    set cmd2 "echo '$command' > '$commandpipe'"
    exec sh -c $cmd1
    exec sh -c $cmd2
    if {$pinged == 0} {
        check-ping-result $datapipe $pingpipe
    }
}


proc data-to-listener {window} {
    global pinged
    global datapipe
    global commandpipe
    global pingpipe

    print-fs-as-prolog $datapipe $window
    set cmd "echo 'process.' > '$commandpipe'"
    exec sh -c $cmd
    if {$pinged == 0} {
        check-ping-result $datapipe $pingpipe
    }
}

# The function check-ping-result is defined in translate.tcl

#####################################################################

To initiate the listener: start the prolog process, start the XLE process, call install-my-listener-menu in XLE to set up the right buttons, call go/0 from the prolog process, parse a sentence in the XLE process, and click on one of the newly installed command buttons. The order in which the processes are started does not matter. However, clicking on listener buttons will have no effect if go/0 is not running.

Note that listen/2 takes care of passing the right file names to the my_listener predicate, and of deleting the files once my_listener has read them.



XLE library calls:

Just as prolog functionality is directly available to the XLE, XLE functionality is directly available to prolog. The following predicates are defined through a foreign language interface that loads in $XLEPATH/bin/foreign_language_interface/xleprologlib.so and libxlecore.so. The location of these two libraries is important.  When creating a saved image sicstus unloads all foreign resources.  If the saved image is then moved to a new location (as it most certainly is when you download it), these foreign resources can only be relocated if they sit in the same relative location to the saved image as they were in when the image was created. Thus the saved images will only work if they are sitting in a directory that has a foreign_language_interface subdirectory containing xleprologlib.so and libxlecore.so.

Most of the library procedures are fairly direct calls to C-functions defined in the various header files in the include directory. In many cases, the arguments to these functions are pointers to XLE data structures. Prolog represents these pointers as (large) integers, which can be passed around as ordinary arguments in prolog predicates. However, you need to exercise some caution if you want to assert pointers into the prolog database, and have procedures pick them up from there rather than passing them as explicit arguments. This has a tendency to produce segmentation errors, possibly because the asserted pointers in prolog fall out of sync with the XLE pointers. However, it has (so far) been safe to assert pointers to charts (as set up by create_parser).

The following higher level convenience predicates have been defined in terms of calls to lower level calls to XLE functions

create_fs_choice_space(+PrologChoices, +PrologEquivs, -XLEChoices)
ext2int_contexts(+PrologChoices, +PrologEquivs, -XLEChoices)
PrologChoices and PrologEquivs are the choice space and set of
equivalences such as might be taken from a prolog f-structure (in
external, prolog variable context form). XLEChoices is a pointer
to the XLE internal choice space (graph).

This procedure will create a new chart if one is not already in
existence using create_parser('',Chart). Otherwise it picks up the
existing chart and resets its storage. It then goes through the
prolog equivalences instantiating any context selections made.
Finally it constructs a choice space within the chart, essentially
by calling create_disjunction/4 whenever a new choices is
encountered, and instantiating prolog variables representing
contexts to the corresponding pointers to XLE contexts. As a side
effect of calling this procedure, prolog variables in contexted
facts will also be instantiated to their corresponding context
pointers

xle_safe_context(+Boolean, +XLEChoices, -XLEContext)
xle_context(+Boolean, +XLEChoices, -XLEContext)

Given a boolean context expression (with context variables
instantiated to XLEContext pointers), and the XLE choice space,
this converts the boolean expression into a pointer to a
context. This is the simplest procedure to use for creating new
boolean combinations of context. The boolean connectives permitted
are and(C1,C2), or(C1,C2), not(C1). However, negation tends to
be an expensive operation --- you would be well advised to use the
lower level subtract_clause/4 if possible.

The safe procedure (a) checks that all context variables are indeed
instantiated, and (b) binarizes all n-ary boolean expressions so
that and(C1,C2,C3,...Cn) becomes
and(C1,and(C2,and(C3,...and(Cn-1,Cn)..))). The non-safe procedure
performs none of these checks. It thus provides a more efficient
way of evaluating boolean expressions against the choice space, but
will cause unpredictable results if the boolean expression is
either not ground or not binarized.
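
For example, a sketch (the predicate name combine_contexts is hypothetical; C1, C2 and C3 must already be instantiated to XLE context pointers):

    % Pointer for the boolean combination (C1 and C2) or (not C3):
    combine_contexts(C1, C2, C3, XLEChoices, Ctx) :-
        xle_safe_context(or(and(C1,C2), not(C3)), XLEChoices, Ctx).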

xle_true_context(-TrueContext)
xle_false_context(-FalseContext)
Returns the true or false XLE contexts. These are in fact the
integers 1 and 0 respectively.

use_primary_choice_space
use_alternate_choice_space(+Id)

Sometimes it is necessary to keep more than one choice space around
at a single time. By calling use_alternate_choice_space(Id) before
creating a choice space (e.g. with create_fs_choice_space), a
choice space will be set up under an alternative chart, identified
by Id. By calling use_primary_choice_space, you will revert to using the
principal chart (Id = 0). By making sure you pass the pointer to
the correct XLEChoice to functions such as xle_context, you can
manipulate several choice spaces at the same time. For example

create_fs_choice_space(PlgChoices0, PlgEquivs0, XleCS0),
use_alternate_choice_space(1),
create_fs_choice_space(PlgChoices1, PlgEquivs1, XleCS1),
use_primary_choice_space,
....
xle_context(and(C0_1, C0_2), XleCS0, C0_12),
xle_context(or(and(C1_1,C1_2), C1_3), XleCS1, C1_123),
...
% reset and overwrite XleCS1:
use_alternate_choice_space(1),
create_fs_choice_space(PlgChoices2, PlgEquivs2, XleCS2),


reset_choice_space
reset_choice_space(Id)

Resets the storage / choice space either for the chart identified
by Id, or for whichever chart is current as identified by
either use_primary_choice_space or use_alternate_choice_space.



unpack_choice_space(+XLEChoices, -Solution)
Successive backtracking through this will unpack a sequence of
choice space solutions. When there are no more solutions left,
will set Solution = 0.

Typical calling sequence:

repeat,
  unpack_choice_space(ChoiceSpace, Solution),
  do_something(Solution, ChoiceSpace, ContextedFacts),
  Solution == 0,
  !


xle_unpack_fstr(+PackedFStr, +XLEChoices, -UnPackedFStr)
This uses unpack_choice_space to backtrack through unpackings of an
fschart (in internal, XLE choice form). It is defined as follows:

xle_unpack_fstr(PackedFstr, ChoiceSpace, Fstr) :-
    PackedFstr = fstructure(Sent,Props,_Choice,_Eqv,PFS,PCS),
    Fstr = fstructure(Sent,Props,[],[],FS,CS),
    unpack_choice_space(ChoiceSpace, Solution),
    \+ Solution = 0,
    collect_true_facts(PFS, ChoiceSpace, FS),
    collect_true_facts(PCS, ChoiceSpace, CS).


collect_true_facts(+PackedFacts, +XLEChoices, -TrueFacts)
Collects whichever cf(Ctx, Pred) facts are in the true context
(i.e. Ctx evaluates to 1). Typically this is called after a
particular solution has been imposed on the choice space.

collect_true_facts([cf(C,Fact)|Facts], ChoiceSpace, TrueFacts) :-
    xle_context(C, ChoiceSpace, Context),
    evaluate_clause(Context, 1, Value),
    (   Value == 1 ->
        TrueFacts = [cf(1,Fact)|TrueFacts1]
    ;   otherwise ->
        TrueFacts = TrueFacts1
    ),
    collect_true_facts(Facts, ChoiceSpace, TrueFacts1).



count_solutions(+XLEChoices, -Num)
Num is the number of solutions encoded in the XLEChoices.



The following procedures map between XLE-internal, Prolog-external and Named-external context notations. Note that in converting collections of facts, compound terms are inspected to find occurrences of cf(Context, Pred) expressions at any level. However, no descent is made inside such expressions. This allows you to convert two discrete collections of facts at the same time, e.g. [FS_Facts, CS_Facts]


ext2int_contexts(+PrologChoices, +PrologEquivs, +PrologFacts,
                 -XLEChoices, -IntFacts)

This is like ext2int_contexts/3, except that in addition
xle_safe_context is applied to all the contexted Facts in
PrologFacts. This produces contexted facts in internal context
format, where each context is a pointer rather than a boolean
expression.

int2ext_contexts(+IntChoices,+IntFacts,-ExtChoices,-ExtEquivs,-ExtFacts)
The inverse of ext2int_contexts. Given a set of contexted facts,
IntFacts in internal context form, returns an external prolog
choice space, list of equivalences (usually empty) and contexted
facts in external prolog form, with boolean combinations of prolog
variables for contexts.

int2named_contexts(+IntChoices,+IntFacts,-NChoices,-NEquivs,-NFacts)
Like int2ext_contexts, except that instead of prolog variables for
context choices, they are replaced by mnemonic names, e.g.
[choice([cv('A',1), cv('A',2)], 1),
 choice([cv('B',1), cv('B',2)], cv('A',1))]

[cf(cv('A',1), SUBJ(var(0), var(1)))]

This is used, e.g. when writing contexted structures to a prolog
file, so that context variables get their familiar names like A1,
A2, instead of the arbitrary prolog variables of external format like
_12983, _12985.

Inverse mapping is
named2int_contexts(+NChoices,+NEquivs,+NFacts,-IntChoices,-IntFacts)


ext2named_contexts(+ExtChoices, +ExtEquivs, +ExtFacts,
                   -NChoices, -NEquivs, -NFacts)

Replaces prolog context variables by their mnemonic names in
copies of the external structures.
Inverse mapping is
named2ext_contexts(+NChoices,+NEquivs,+NFacts,
                   -ExtChoices,-ExtEquivs,-ExtFacts)

name_internal_choices(+IntChoices, -NChoices)
Given an internal XLEChoice e.g. 109367, returns a named choice
space, e.g.
[choice([cv('A',1), cv('A',2)], 1),
 choice([cv('B',1), cv('B',2)], cv('A',1))]

name_internal_equivs(+IntChoices, -NEquivs)
Given an internal XLEChoice e.g. 109367, returns a named list of
equivalences, usually = []

name_internal_context(+XLEContext, -NamedContext)
Given a pointer to an XLE context, return a named boolean
expression, e.g. and(cv('AQ', 1), cv('B',5))

write_named_context(+Stream, +NamedContext)
write_named_context(+NamedContext)

Writes the named context either to Stream or user_output so that
it looks as though it contains prolog variables, e.g.
and(cv('AQ', 1), cv('B',5))
==> and(AQ1, B5)

write_named_context_no_commas(+Stream, +NamedContext)
write_named_context_no_commas(+NamedContext)

Writes the named context either to Stream or user_output so that
it looks as though it contains prolog variables, but without
commas (used for triples notation)
and(cv('AQ', 1), cv('B',5))
==> and(AQ1 B5)

write_cf_list(+Stream,+NamedList)
write_cf_list(+NamedList)

Given a list (i.e. not a compound expression) of named contexted
facts, named choice definitions, named selections, and/or named
definitions, will print out a list of the same, but using
write_named_context throughout.

fs2graph(+PrologFStructure, -XleFSGraph)
graph2fs(+XleFSGraph, -PrologFStructure)

Converts between prolog fstructures and pointers to XLE internal
representations of f-structures. This is not stable at the
moment. It is implemented at present by writing structures to
files and reading them back in again using
print_prolog_graph_file(File,XleFSGraph) and
read_prolog_graph_file(File,Chart,Graph)
The trouble with this is that reading in a prolog file to create
an FS Graph is dependent on having the Chart set up for the
correct parser/grammar. However, prolog's Charts are usually
created with a null grammar, so that the file gets read in
incorrectly.



The following functions are more or less directly derived from the corresponding C-functions, described in the relevant .h files. Bear in mind that when a C-function returns a result, this is reflected by an additional final argument in the corresponding prolog predicate. A few functions place additional C wrappers around the C library functions to make them easier to access from prolog. These are listed first

read_prolog_graph_file(+File,+Chart,-Graph)
print_prolog_graph_file(+File,+XleFSGraph)

These call the C functions read_prolog_graph and print_prolog_graph
(which require stream arguments) after having first opened the File
to create the stream.

generate_from_graph(+GeneratorPointer,+FStrPointer,
                    +NormalizeInt,+UTF8Int,-WordNetPointer)

Calls C generate_from_graph, but returns a pointer to the WordNet
rather than the WordNet itself

print_net_as_regexp(+WordNetPtr,+File,+NormalizeInt,+UTF8Int)
Calls C print_net_as_regexp, but inputs a pointer to the WordNet
rather than the WordNet itself

assert_nogood(+XLEChoices, +XLEContext)
Calls C assert_nogood, but with NULL pointers to all the items
documenting the nogood. (Note: asserting nogoods can be expensive)


The following are listed without description, see the relevant C header files:
init_xle/2, create_parser/2, parse_sentence/4,
next_graph_solution/3, free_graph_solution/1,
reset_storage/1, create_graph/2, create_generator/2,
make_new_choice_disjunction/3, create_disjunction/4,
get_choice/4, conjoin_clauses/5, disjoin_clauses/4,
subtract_clause/4, negate_clause/4, not_clause/3,
evaluate_clause/3, evaluate_choices/2, covers_clause/3,
get_edge_solutions/2, first_dnf_solution/3, next_dnf_solution/3,
set_solution_choice_values/2, select_choice/1, set_choice_values/2


Miscellaneous utilities:


strict_member(+Item,+List)
strict_memberchk(+Item,+List)

Like member and memberchk, except that Item is strictly identical
(==) to some element in list

contains_somewhere(Item,Expr)
Item is strictly identical to some subexpression of Expr


vartail(+VTList, -VariableTail)

Returns the variable tail of a variable tail list, e.g.
vartail([a,b,c|X],X)

vt_append(+VTList1,+VTList2)
Appends two variable tail lists by setting the tail of List1 = List2

vt_member(Item, +VTList)
Gets members of variable tail lists

list_to_vtlist(List,VTList)
Converts an ordinary list to a variable tail list

null_vtlist(VTList)
VTList is an empty variable tail list
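
A sketch of how these fit together:

    ?- list_to_vtlist([a,b], L1),   % L1 = [a,b|T1]
       list_to_vtlist([c], L2),     % L2 = [c|T2]
       vt_append(L1, L2),           % binds T1, so L1 = [a,b,c|T2]
       vt_member(c, L1).            % succeeds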

unkey(+KeyedList,-UnkeyedList)
Removes the keys from a key-sorted list

concat_list(+ListOfAtoms, -Atom)
Concatenates all the atoms (and/or integers) in ListofAtoms
together to create a new atom

generated_symbol(+Prefix,-GenSym)
Generates a unique atom with the specified prefix

time_call(Goal,Repetitions)
time_call(Goal)

Time the Goal, repeated either 1 or Repetitions times

format_if(Level,String,Args)
format_if(Level,Stream,String,Args)

Conditional format --- only print if current format level is less
than or equal to Level

format_level(Integer)
Set the current format level

time_msg(String)
time_msg_if(Level,String)

Print String followed by CPU time (in secs) since either last call
to time_msg, or last call to reset_time_msg. Conditional version
depends on current format_level

reset_time_msg
Call statistics(runtime,_) to reset timings

setp(Parameter)
Sets the value of a parameter. Retracts all previous parameter
settings and then asserts user:Parameter

setsys(Parameter, Value)
getsys(Parameter, Value)

An alternative to setp (and not integrated with it). Used only to
control pretty printer (below).

pp(Expr)
pp_underscore(Expr)

Pretty print the expression (can be a bit ropey on printing
anonymous variables). You can control the line length with
setsys(pagewidth, N)
and the print depth with
setsys(ppdepth, M)
where N and M are integers

pp_debug
nopp_debug

Set the prolog debugger to use pp rather than write, and unset it.

get_option(Option, Value, ArgList)
get_optional(Option, Value, ArgList)

Value is the item following Option in the ArgList. get_option
prints an error message if Option is not present in the list,
whereas get_optional fails silently


get_option_list(Option, Values, ArgList, Terminators)
Values is the sublist of ArgList lying between Option and the
first member of Terminators. Prints an error message if Option is
missing.
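
For example, over a command-line style argument list:

    ?- get_option('--from', From, ['--inStem','/tmp/G','--from','1','--to','700']).
    % From = '1'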

file_suffix(+File,-Suffix)
Returns the suffix following the final . in File

dir_and_file(+DirFile, -Dir, -File)
Splits DirFile into its slash terminated directory and the file
name

file_name_concat(+Dir,+File,-DirFile)
Concatenates directory to file (Dir must be slash terminated)

add_dir_slash(+Dir,-DirSlash)
Adds a trailing slash to Dir, if missing.

strip_file_suffix(+FileSfx,+Sfx,-File)
Removes the specified suffix from the filename

atom_to_num(Atom,Num)
Converts e.g. '3' to 3. Useful for manipulating command line
arguments, where numerical items get read as atoms not integers.
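
For example:

    ?- atom_to_num('700', N).   % N = 700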