Tibor Kiss, Ruhr-Universität Bochum

15.09.2009, G 309, 6:15 - 7:45 pm

Building a Treebank for Annotation Mining: Mining for licensing conditions of Preposition-Noun-Combinations


We present a treebank for preposition-noun-combinations (PNCs), such as nach Stilllegung einer Verbrennungsanlage (after closedown of an incineration plant), and for the corresponding PPs. The treebank comprises lexical, syntactic, relational, and global information about PNCs and PPs, represented in an extensible stand-off annotation format. For some time, PNCs have been treated as exceptions, but recent research has shown that they are in fact productive and no more idiomatic than other phrasal combinations. However, they violate a constraint on the realization of countable singular nouns, viz. that such nouns have to appear with a determiner. It has not yet been possible to determine the exact licensing conditions for PNCs, since not every preposition can be combined with an arbitrary noun in a PNC. As we are particularly interested in the interpretation of prepositions in PNCs and PPs, we had to develop an annotation scheme for the interpretation of German prepositions that keeps annotation feasible. The treebank serves as input for annotation mining, with the ultimate goal of identifying the pertinent licensing conditions for PNCs.