Statistical Grammar Development and Corpus-Linguistic Information Extraction

Heike Zinsmeister and Sabine Schulte im Walde

This course introduces corpus-linguistic information extraction based on statistical grammar models with lexical extensions.

First, we will introduce the linguistic and mathematical properties of statistical grammars and statistical parsing. This will enable the participants to understand how linguistic information is encoded in a statistical grammar model.

The participants will then learn to write and train a statistical grammar for extracting reliable subcategorisation frames of verbs, together with linguistically relevant properties of the subcategorised verbal arguments. The exercises will be based on the Huge German Corpus. Finally, the participants will be introduced to a large-scale statistical grammar model. We will discuss possibilities for extracting further linguistic information (such as collocations) and for applying the extracted information (for example, in lexicon acquisition).


*) Papers can be downloaded from the "IMS Bibliography".