XLE consists of cutting-edge algorithms for parsing and generating Lexical Functional Grammars (LFGs) along with a rich graphical user interface for writing and debugging such grammars. It is the basis for the Parallel Grammar Consortium, which is interested developing industrial-strength grammars. Past and present efforts have included grammars for English, French, German, Hungarian, Indonesian, Japanese, Murrihn-Patha, Norwegian, Polish, Tigrinya, Turkish, Urdu, Welsh and Wolof. XLE is written in C and uses Tcl/Tk for the user interface. It currently runs on Solaris Unix, Linux, and Mac OS X. Support for Windows is patchy.
One of the main goals of the XLE is to parse and generate with LFGs efficiently. This is difficult because the LFG formalism, like most unification-based grammar formalisms, is NP complete. This means that in the worst case the time that it takes to parse or generate with an LFG can be exponential in the length of the input. However, natural languages are mostly context-free equivalent, and one should be able to parse them in mostly cubic time. XLE is designed to automatically take advantage of context-freeness in the grammar of a natural language so that it typically parses in cubic time and generates in linear time. This lets grammar writers write grammars in an expressive formalism without necessarily sacrificing performance.
There are three key ideas that XLE uses to make its parser efficient. The first idea is to pay careful attention to the interface between the phrasal and functional constraints. In particular, XLE processes all of the phrasal constraints first using a chart, and then uses the results to decide which functional constraints to process. This is more efficient than interleaving phrasal and functional constraints since the phrasal constraints can be processed in cubic time, whereas the functional constraints may take exponential time. Thus, the phrasal constraints make a good polynomial filter for the functional constraints.
The second key idea is to use contexted unification to merge multiple feature structures together into a single, packed feature structure. For instance, here is a contexted feature structure for "They saw the girl with the telescope":
In this sentence, "saw" is ambiguous between the past tense of "saw" (labeled a:1) and the present tense of "see" (a:2). Also, "with the telescope" is either an adjunct of "saw" (b:1) or "the girl" (b:2). The contexts a:1 and a:2 are mutually exclusive. Similarly for b:1 and b:2. Furthermore, a:1 and a:2 are independent from b:1 and b:2. So this feature structure represents four different solutions. XLE uses a contexted feature structure like this for each edge in a chart to represent all of the possible feature structures that the edge can have.
The third key idea is to use lazy contexted copying during unification. Lazy contexted unification only copies up as much of the two daughter feature structures of a subtree as is needed to determine whether the feature structures are unifiable. If the interactions between the daughter feature structures are bounded, then XLE will do a bounded amount of work per subtree. Since there are at most a cubic number of subtrees in the chart, then XLE can parse sentences in cubic time when the interactions between daughter feature structures are bounded.
Here is a scatterplot of XLE applied to a corpus of sentences from the documentation of the HomeCentre, a multi-function copier/printer device marketed by Xerox. Each sentence is plotted according to the number of local subtrees that it has (on the x-axis) and the time that it takes to parse the sentence (on the y-axis). If the time grows linearly in the number of subtrees, then this means that a bounded amount of work is being done per subtree and so XLE is parsing in cubic time. By doing a linear regression we find that 79 percent of the time can be explained by the number of subtrees in this corpus. The data points that are higher than expected correspond to sentences that have a lot of coordination and/or long-distance dependencies. These constructions require a lot of the feature structure to be copied up, and so they increase the effective grammar constant.
XLE produces a packed representation of all possible solutions as its output. This has a number of advantages over trying to pick the "right" solution:
Documentation for the current version of XLE is available here.
XLE and its sources are available for non-commercial purposes by filling out this license. Please sign and date the license on the last page and mail it to maxwell "at" parc.com. Then contact miriam.butt "at" uni-konstanz.de to obtain XLE. Here is the list of current licensees. If this license is too restrictive for you, you can get just the XLE binaries using this license.
The XLE project began in October of 1993 and was originally a joint project between the NLTT group at PARC and the MLTT group in Grenoble. It evolved from the Grammar Writer's Workbench, which was implemented by Ron Kaplan and John Maxwell. John Maxwell was the technical lead for XLE and also designed the algorithms and implemented the user interface. Hadar Shemtov did the initial implementation of the parser and generator, and Max Copperman wrote the LFG formalism parser and the bracketing tool. Paula Newman wrote the morphology and lexicon components and also worked on the generator. Andreas Eisele implemented complex categories and parameterized rules. Martin Kay wrote a prototype system for doing transfer in Prolog. Dick Crouch rewrote the transfer system and is also responsible for the rest of the Prolog subsystem. Alex Vasserman wrote the code for stochastic disambiguation under the supervision of Stefan Riezler. Ron Kaplan and Annie Zaenen provided managerial support. XLE makes heavy use of the c-fsm package for processing the morphology, which was written in C by Lauri Karttunen and others in Grenoble based on work done by Ron Kaplan in Interlisp.
For publications on ParGram and the ParGram grammars see the ParGram site and the individual project sites.