Resources
This page lists some of the software and resources that I have developed or helped to develop during my studies. All of it is available for free download. If you use any of the software or resources on this page in your work, please acknowledge this by citing the relevant paper (given with each resource).
Perl Scripts for MWE Candidate Extraction
A Reference Dependency Bank for Complex Predicates
Description
This paper discusses the construction of a parallel treebank currently involving ten languages from six language families. The treebank is based on deep LFG (Lexical-Functional Grammar) grammars that were developed within the framework of the ParGram (Parallel Grammar) effort. The grammars produce output that is maximally parallelized across languages and language families. This output forms the basis of a parallel treebank covering a diverse set of phenomena.
The treebank is publicly available via the INESS treebanking environment, which also allows for the alignment of language pairs. We thus present a unique, multilayered parallel treebank that represents more and different types of languages than are available in other treebanks, that represents deep linguistic knowledge and that allows for the alignment of sentences at several levels: dependency structures, constituency structures and POS information.
Download
ParGramBank can be accessed and downloaded for free via the INESS treebanking environment.
Paper
Sebastian Sulger, Miriam Butt, Tracy Holloway King, Paul Meurer, Tibor Laczkó, György Rákosi, Cheikh Bamba Dione, Helge Dyvik, Victoria Rosén, Koenraad De Smedt, Agnieszka Patejuk, Özlem Çetinoğlu, I Wayan Arka and Meladel Mistica: ParGramBank: The ParGram Parallel Treebank. To appear in Proceedings of ACL 2013 (Long Papers).
paper - PDF; poster - PDF; BibTeX
Description
These Perl scripts were written by Annette Hautli and me in 2010. They extract bigrams and trigrams from raw text corpora, sort them by frequency, and structure the output so that it can be read by the collocation analysis tool UCS, developed by Stefan Evert (at least in the case of bigrams).
The scripts are useful if you want to extract multiword expression (MWE) candidates from a corpus in some language using the UCS toolset: UCS requires its input files in a special format, and these scripts produce that format.
UCS handles bigram lists efficiently, but it cannot deal with trigrams, so you will need some other tool to work on the trigram lists produced by our Perl script. After downloading, change the input and output file settings directly in the scripts, as they currently expect Urdu input and output.
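The counting step described above can be illustrated with a minimal Python sketch. This is not the Perl code offered for download, and it does not reproduce the input format that UCS expects. The whitespace tokenization, the tab-separated rows, and the function names `ngrams`, `count_ngrams` and `format_rows` are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable, Iterator


def ngrams(tokens: list[str], n: int) -> Iterator[tuple[str, ...]]:
    """Yield every run of n adjacent tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def count_ngrams(lines: Iterable[str], n: int) -> list[tuple[tuple[str, ...], int]]:
    """Count n-grams line by line and return them, most frequent first."""
    counts: Counter = Counter()
    for line in lines:
        # Assumption: tokens are separated by whitespace, and no n-gram
        # spans a line break.
        counts.update(ngrams(line.split(), n))
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_rows(ranked: list[tuple[tuple[str, ...], int]]) -> list[str]:
    """Render rows as 'frequency<TAB>token1<TAB>token2...' (illustrative layout only)."""
    return [f"{freq}\t" + "\t".join(gram) for gram, freq in ranked]


sample = ["the cat sat on the mat", "the cat ate"]
bigrams = count_ngrams(sample, 2)
trigrams = count_ngrams(sample, 3)

for row in format_rows(bigrams):
    print(row)
```

The same function covers bigrams and trigrams by changing `n`. A real corpus would need language-specific tokenization, which this sketch leaves out.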
Download
Download bigram Perl script
Download trigram Perl script
Download UCS tool by Stefan Evert
Description
When dealing with languages of South Asia from an NLP perspective, a problem that repeatedly crops up is the treatment of complex predicates. In Ahmed et al. (2012), we present a first approach to the analysis of complex predicates (CPs) in the context of dependency bank development. The efforts originate in theoretical work on CPs done within Lexical Functional Grammar (LFG), but are intended to provide a guideline for analyzing different types of CPs in an independent framework. The design of the dependencies is kept parallel to the triples in PARC700 (King et al. 2003) and general enough to account for CP constructions across languages.
Download
The dependency bank is available from the 'Resources' page on our project website.
Paper
Tafseer Ahmed, Miriam Butt, Annette Hautli and Sebastian Sulger: A Reference Dependency Bank for Analyzing Complex Predicates. Proceedings of the Eighth Conference on International Language Resources and Evaluation (LREC 2012), Istanbul, May 2012. European Language Resources Association (ELRA).
paper - PDF; slides - PDF; BibTeX
Description
The Eclipse plugin 'eXLEpse' provides functionality for editing computational LFG grammars and an interface to the XLE grammar development platform. The plugin was implemented by two computer science students, Roman Rädle and Michael Zöllner, who took the course 'Grammar Development' in the summer of 2010.
The primary goal of eXLEpse was to develop an easy-to-use editor for computational grammars and an interface to XLE. The editor replaces Emacs and provides an alternative to the shell-based interaction with the XLE platform.
For novices in XLE grammar development, it can be quite hard to get used to Emacs and the XLE command prompt. The grammar syntax required by XLE can also be confusing. eXLEpse addresses these problems by providing a native Mac OS X GUI via the Eclipse platform, various error support functions and advanced syntax highlighting. This lets the novice user concentrate on the grammar development process without first having to learn the details of Emacs and its concepts.
Additionally, Eclipse offers support for the version management software Subversion via the 'Subclipse' plugin. Developers can make use of all the Subversion features through this plugin, without having to leave Eclipse. This eases version control in bigger grammar projects.
The eXLEpse plugin, just like the Eclipse platform it goes with, is free software under the terms of the Eclipse Public License (EPL). The plugin is distributed together with Eclipse and the Subclipse plugin in a single package, for different operating systems. For further information, please refer to the documentation, which is available in German and English. If you have trouble getting started, take a look at the getting started documentation.
Documentation
Getting started with eXLEpse (English)
Download
To download eXLEpse, visit the eXLEpse website.
Paper
Roman Rädle, Michael Zöllner and Sebastian Sulger: eXLEpse: An Eclipse-based, Easy-to-Use Editor for Computational LFG Grammars. Online Proceedings of the LFG11 Conference, Hong Kong University, Hong Kong. CSLI Publications.
paper - PDF; poster - PDF; BibTeX