Annette Hautli, Sebastian Sulger and Miriam Butt. 2012. Adding an Annotation Layer to the Hindi/Urdu Treebank. Linguistic Issues in Language Technology, Volume 7, Issue 3, pages 1-18. Stanford: CSLI Publications. [ bib ]
This paper proposes an additional layer of annotation for the recently established Hindi Treebank. Despite the fact that the treebank already features a number of annotation layers such as phrase structure, dependency relations and predicate-argument structure, we see potential for the inclusion of a dependency layer generated from Lexical Functional Grammar (LFG) f-structures with relations that we believe are crucial for a deep analysis of Urdu/Hindi. The suggestions are based on theoretical and computational investigations into Hindi/Urdu in the context of the Urdu ParGram grammar, through which we can automatically create the additional annotation layer.
Dirk Saleschus and Annette Hautli. 2008. On dissolving long distance dependencies in Russian verbs. Linguistic Issues in Language Technology, Volume 1, Issue 3. Stanford: CSLI Publications. [ bib ]
This paper proposes a linguistic analysis of two long distance dependencies in the morphology of Russian verbs, namely secondary imperfectivization and deverbal nominalization. We show how these processes can be reanalysed as local dependencies. Although finite-state frameworks are not bound by linguistic considerations, the implementation of our analysis in a finite-state framework does not complicate the grammar or enlarge the network disproportionately.
Annette Hautli-Janisz. 2014. Urdu/Hindi Motion Verbs and Their Implementation in a Lexical Resource. University of Konstanz.
In this thesis, I investigate the ways that the spatial notions of figure, ground, path and manner of motion are realized in Urdu/Hindi and I implement these insights in a computationally-usable lexical resource, namely Urdu/Hindi VerbNet. I show that in particular the encoding of complex predicates can serve as a guiding principle for the encoding of similar constructions in other VerbNets. (For a more detailed abstract, use the link to the publication page.)
Conference proceedings (all peer-reviewed)
Aikaterini-Lida Kalouli, Katharina Kaiser, Annette Hautli-Janisz, Georg A. Kaiser, Miriam Butt. accepted. A Multilingual Approach to Question Classification. In Proceedings of LREC 2018.
Mennatallah El-Assady, Annette Hautli-Janisz, Valentin Gold, Miriam Butt, Katharina Holzinger and Daniel A. Keim. 2017. Interactive Visual Analysis of Transcribed Multi-Party Discourse. In Proceedings of ACL 2017, System Demonstrations, pages 49-54, Vancouver, Canada. [ bib ]
We present the first web-based Visual Analytics framework for the analysis of multi-party discourse data using verbatim text transcripts. Our framework supports a broad range of server-based processing steps, ranging from data mining and statistical analysis to deep linguistic parsing of English and German. On the client-side, browser-based Visual Analytics components enable multiple perspectives on the analyzed data. These interactive visualizations allow exploratory content analysis, argumentation pattern review and speaker interaction modeling.
Annette Hautli-Janisz and Miriam Butt. 2016. On the role of discourse particles for mining arguments in German dialogs. In Proceedings of the COMMA 2016 workshop 'Foundations of the Language of Argumentation', pp. 10-17. [ bib ]
Argument mining in dialogs or multilogs necessarily must take into account the pragmatic relations that hold between dialog participants, their arguments and the ongoing discourse. This paper analyzes the role of German discourse particles and the illocutionary force contributed by the particles. We investigate a set of highly frequent discourse particles in German and propose a categorization that complements those levels of analysis that are pursued in opinion mining and dialog act annotation. Incorporating the subtle pragmatic information encoded by the discourse particles into Argument Mining offers a new way of pragmatically underpinning the propositional content of arguments in German dialog data.
Annette Hautli-Janisz, Tracy Holloway King and Gillian Ramchand. 2015. Encoding event structure in Urdu/Hindi VerbNet. In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation (NAACL 2015), pp. 25-33.
We propose a new kind of event structure representation for computational linguistics, based on the theoretical framework of First-Phase Syntax (Ramchand, 2008). We show that the approach not only gives a theoretically well-motivated set of subevents and related semantic roles, it also posits the levels of representation needed for analyzing a linguistic phenomenon that has repeatedly caused problems in computational systems, namely the treatment of complex predication. In particular, we look at V+V complex predicates in Urdu/Hindi and show that Ramchand’s subevent decomposition implemented in a VerbNet-style resource allows for a consistent semantic analysis of these complex events. We also show how the proposed event representation can be added to existing resources in the language, in particular the Hindi-Urdu Treebank and Hindi PropBank.
Tina Bögel, Annette Hautli-Janisz, Sebastian Sulger and Miriam Butt. 2014. Automatic Detection of Causal Relations in German Multilogs. In Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL), pp. 20-27, Gothenburg, Sweden. [ bib ]
This paper introduces a linguistically motivated, rule-based annotation system for causal discourse relations in transcripts of spoken multilogs in German, with the aim of providing an automatic means of determining the degree of justification provided by a speaker in the delivery of an argument in a multiparty discussion. The system comprises two parts: a disambiguation module which differentiates causal connectors from their other senses, and a discourse relation annotation system which marks the spans of text that constitute the reason and the result/conclusion expressed by the causal relation. The system is evaluated against a gold standard of German transcribed spoken dialogue. The results show that our system performs reliably well with respect to both tasks.
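The connector-plus-span idea can be illustrated with a minimal Python sketch. The connector set and the split heuristic below are invented toy examples for exposition, not the paper's actual disambiguation rules:

```python
# Toy sketch of rule-based causal span marking in German transcripts.
# Real systems also disambiguate connectors like "da", which has
# non-causal (locative/temporal) senses; this sketch skips that step.
CAUSAL_CONNECTORS = {"weil", "denn"}  # illustrative subset

def mark_causal(utterance):
    """Split an utterance at a causal connector into result and reason spans."""
    tokens = utterance.lower().split()
    for i, tok in enumerate(tokens):
        if tok in CAUSAL_CONNECTORS:
            return {"connector": tok,
                    "result": " ".join(tokens[:i]),
                    "reason": " ".join(tokens[i + 1:])}
    return None

print(mark_causal("Ich stimme zu weil die Kosten sinken"))
# → {'connector': 'weil', 'result': 'ich stimme zu',
#    'reason': 'die kosten sinken'}
```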
Annette Hautli-Janisz. 2013. Moving right along: Motion verb sequences in Urdu. In Proceedings of the LFG13 Conference, pages 295-315, Stanford: CSLI Publications. [ bib ] [ handout ]
In this paper I survey the phenomenon of motion verb sequences (MVSs) in Urdu/Hindi, a combination of two motion verbs denoting a complex motion event. First noted by Hook (1973), the construction exhibits interesting syntactic and semantic properties and behaves unlike other complex verbal expressions found in the language. The paper shows that MVSs should be treated as complex predicates of motion, complementing the various types of complex predicates already established in Urdu/Hindi (e.g., Mohanan (1994), Butt (1995)). This paper provides a first formal analysis of the construction and accounts for the types of combinations, word orders and argument structures that are possible in the language.
Andreas Lamprecht, Annette Hautli, Christian Rohrdantz and Tina Bögel. 2013. A Visual Analytics System for Cluster Exploration. In Proceedings of ACL 2013, System Demonstrations, pages 109-114, Sofia, Bulgaria. [ bib ] [ poster ]
This paper offers a new way of representing the results of automatic clustering algorithms by employing a Visual Analytics system which maps members of a cluster and their distance to each other onto a two-dimensional space. A case study on Urdu complex predicates shows that the system allows for an appropriate investigation of linguistically motivated data.
Miriam Butt, Tina Bögel, Annette Hautli, Sebastian Sulger and Tafseer Ahmed. 2012. Identifying Urdu Complex Predication via Bigram Extraction. In Proceedings of COLING 2012, Technical Papers, pages 409–424, Mumbai, India. [ bib ] [ slides ]
A problem that crops up repeatedly in shallow and deep syntactic parsing approaches to South Asian languages like Urdu/Hindi is the proper treatment of complex predications. Complex predications pose problems for NLP because of their productivity and the ill-understood range of their combinatorial possibilities. This paper presents an investigation into whether fine-grained information about the distributional properties of nouns in N+V CPs can be identified by the comparatively simple process of extracting bigrams from a large “raw” corpus of Urdu. In gathering the relevant properties, we were aided by visual analytics: we coupled our computational data analysis with interactive visual components in the analysis of the large data sets.
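The core extraction step can be sketched in a few lines of Python. The token list and the light-verb set are invented romanized examples; the paper works over a large raw Urdu corpus:

```python
from collections import Counter

# Hedged sketch: extract bigrams and keep those whose second element is a
# light verb, yielding N+V complex predicate candidates with frequencies.
LIGHT_VERBS = {"kar", "ho", "de"}  # illustrative Urdu light verbs (romanized)

def cp_candidates(tokens, min_freq=2):
    """Count adjacent token pairs and filter for frequent N+V candidates."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {bg: n for bg, n in bigrams.items()
            if bg[1] in LIGHT_VERBS and n >= min_freq}

corpus = "yad kar yad kar intizar kar yad a".split()
print(cp_candidates(corpus))
# → {('yad', 'kar'): 2}
```

A frequency threshold like `min_freq` is one simple way to separate productive N+V combinations from noise before any visual inspection.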
Tafseer Ahmed, Miriam Butt, Annette Hautli and Sebastian Sulger. 2012. A Reference Dependency Bank for Analyzing Complex Predicates. In Proceedings of LREC12, pages 3145-3152, Istanbul, Turkey. [ bib ] [ slides ]
When dealing with languages of South Asia from an NLP perspective, a problem that repeatedly crops up is the treatment of complex predicates. This paper presents a first approach to the analysis of complex predicates (CPs) in the context of dependency bank development. The effort originates in theoretical work on CPs done within Lexical-Functional Grammar (LFG), but is intended to provide a guideline for analyzing different types of CPs in an independent framework. Despite the fact that we focus on CPs in Hindi and Urdu, the design of the dependencies is kept general enough to account for CP constructions across languages.
Christian Rohrdantz, Andreas Niekler, Annette Hautli, Miriam Butt and Daniel A. Keim. 2012. Lexical Semantics and Distribution of Suffixes — A Visual Analysis. In Proceedings of the EACL 2012 Joint Workshop of LINGVIS and UNCLH, pages 7-15, Avignon, France. [ bib ] [ slides ]
We present a quantitative investigation of the cross-linguistic usage of some (relatively) newly minted derivational morphemes. In particular, we examine the lexical semantic content expressed by three suffixes originating in English: -gate, -geddon and -athon. Using data from newspapers, we look at the distribution and lexical semantic usage of these morphemes not only within English, but across several languages and also across time, with a time-depth of 20 years. The occurrences of these suffixes in the available corpora are comparatively rare; however, by investigating huge amounts of data, we are able to arrive at interesting insights into the distribution, meaning and spread of the suffixes. Processing and understanding these huge amounts of data is accomplished via visualization methods that allow the presentation of an overall distributional picture, with further details and different types of perspectives available on demand.
Annette Hautli and Miriam Butt. 2011. Towards a Computational Semantic Analyzer for Urdu. In Proceedings of The 5th International Joint Conference on Natural Language Processing, 9th Workshop On Asian Language Resources, pages 71-78. Chiang Mai, Thailand. [ bib ] [ slides ]
This paper describes a first approach to a computational semantic analyzer for Urdu on the basis of the deep syntactic analysis done by the Urdu ParGram grammar. Apart from the semantic construction, external lexical resources such as Urdu WordNet and a preliminary resource for Urdu verbs are developed and connected to the semantic analyzer. These resources allow for a more abstract level of representation by providing real-world knowledge such as hypernyms of lexical entities and information on thematic roles. We therefore contribute to the overall goal of providing more insights into the computationally efficient analysis of Urdu, in particular to the computational semantic analysis of the language.
Rajesh Bhatt, Tina Bögel, Miriam Butt, Annette Hautli and Sebastian Sulger. 2011. Urdu/Hindi Modals. In Proceedings of the LFG11 Conference, pages 47-67. Stanford: CSLI Publications. [ bib ] [ handout ]
In this paper we survey the various ways of expressing modality in Urdu/Hindi and propose an analysis that does justice to the various different realizations of modality in Urdu. In terms of syntactic analysis, we use the standard LFG (Bresnan 1982) and ParGram (Butt et al. 1999) analysis as a point of departure. In terms of semantics, we rely on Kratzer’s (1977, 1981, 1991) landmark proposal for an analysis of modality, but show that this approach needs to be extended in order to account for the range of meaning encoded by the different strategies for encoding modality in Urdu. In particular, there is very clear evidence in Urdu for a two-place modal operator in addition to the one-place operator usually assumed in the literature (a.o. Lewis (1944), Carnap (1947)).
Christian Rohrdantz, Annette Hautli, Thomas Mayer, Miriam Butt, Daniel A. Keim and Frans Plank. 2011. Towards Tracking Semantic Change By Visual Analytics. In Proceedings of ACL 2011, pages 305-310, Portland, OR, USA. [ bib ] [ slides ]
This paper presents a new approach to detecting and tracking changes in word meaning by visually modeling and representing the diachronic development in word contexts. Whereas previous studies have shown that computational models are capable of disambiguating senses, a more recent trend investigates whether changes in word meaning can be tracked by automatic methods. The aim of our study is to offer a new instrument to investigate the diachronic development of word senses in a way that allows for a better understanding of the nature of semantic change in general. For this purpose we combine techniques from the field of Visual Analytics with unsupervised methods from Natural Language Processing, providing an interactive visual exploration of semantic change.
Annette Hautli and Sebastian Sulger. 2011. Extracting and Classifying Urdu Multiword Expressions. In Proceedings of the ACL-HLT 2011 Student Session, pages 24-29, Portland, OR, USA. [ bib ] [ poster ]
This paper describes a method for automatically extracting and clustering multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus. The MWEs are extracted by an unsupervised method and clustered into two distinct classes, namely locations and person names. The clustering is based on simple heuristics that take the co-occurrence of MWEs with distinct postpositions into account. The resulting classes are evaluated against a hand-annotated gold standard and achieve f-scores of 0.5 and 0.746 for locations and persons, respectively. A target application is the Urdu ParGram grammar, where MWEs are needed to generate a more precise syntactic and semantic analysis.
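The postposition heuristic can be sketched as a simple vote over observed contexts. The postposition sets below are simplified romanized examples, not the paper's exact heuristics:

```python
# Toy sketch: classify an extracted MWE as location or person name by
# which postpositions follow it in the corpus. Locative postpositions
# suggest places; ergative/dative marking suggests human referents.
LOC_POSTP = {"mem", "se", "tak"}   # 'in', 'from', 'until' (illustrative)
PERS_POSTP = {"ne", "ko"}          # ergative/dative, typical of agents

def classify_mwe(contexts):
    """contexts: list of postpositions observed right after the MWE."""
    loc = sum(1 for p in contexts if p in LOC_POSTP)
    pers = sum(1 for p in contexts if p in PERS_POSTP)
    if loc > pers:
        return "location"
    if pers > loc:
        return "person"
    return "unknown"

print(classify_mwe(["mem", "tak", "ne"]))  # → location
```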
Tafseer Ahmed and Annette Hautli. 2010. A First Approach Towards an Urdu WordNet. Linguistics and Literature Review, Vol. 1, No. 1, pages 1-14. (Proceedings of the Conference on Language and Technology 2010 (CLT10), Islamabad, Pakistan) [ bib ]
This paper reports on a first approach for developing a lexical knowledge resource for Urdu on the basis of Hindi WordNet. Due to the structural similarity of Urdu and Hindi, we can focus on overcoming the differences in the writing systems of the two languages by using transliterators. Various natural language processing tools, among them a computational semantics based on the Urdu ParGram grammar, can use the resulting basic lexical knowledge base for Urdu.
Annette Hautli, Özlem Çetinoğlu and Josef van Genabith. 2010. Closing the Gap Between Stochastic and Hand-crafted LFG Grammars. In Proceedings of the LFG10 Conference, pages 270-289. Stanford: CSLI Publications. [ bib ] [ slides ]
This paper presents an approach to extend the stochastic DCU LFG annotation algorithm with more detailed f-structure information. It thereby reaches the feature detailedness of state-of-the-art hand-crafted grammars such as the English XLE grammar, while profiting from the robustness and the good coverage of stochastic grammars.
Annette Hautli and Tracy Holloway King. 2009. Adapting Stochastic LFG Input for Semantics. In Proceedings of the LFG09 Conference, pages 357-377. Stanford: CSLI Publications. [ bib ] [ poster ]
In this paper, we present a system in which a stochastic LFG-like grammar of English provides the input to the semantic processing. The LFG-like grammar uses stochastic methods to create a c-structure and a proto f-structure. A set of ordered rewrite rules augments and reconfigures the proto f-structure to add more information to the stochastic output. Evaluation of the resulting derived f-structures and of the semantic representations based on them indicates that the stochastic LFG-like grammar can be used to produce input to the semantics.
Tina Bögel, Miriam Butt, Annette Hautli and Sebastian Sulger. 2009. Urdu and the Modular Architecture of ParGram. In Proceedings of CLT09, CRULP, Lahore, Pakistan. [ bib ] [ slides ]
This paper reports on the modular architecture for natural language parsing and generation that is a consequence of using Lexical Functional Grammar as the linguistic framework in the context of the ParGram (Parallel Grammar) project. In particular, we discuss the following modules: the tokenizer and morphological analyzer, the syntax as implemented in the grammar development platform XLE and the semantics, which is effected through rewrite rules. We also briefly touch upon the ability to allow for extra projections, such as the prosodic projection. Overall, Lexical-Functional Grammar in conjunction with the XLE development platform allows not only for robust and large-scale natural language parsing and generation, but also for the incorporation of deep linguistic insights.
Tina Bögel, Miriam Butt, Annette Hautli and Sebastian Sulger. 2007. Developing a Finite-State Morphological Analyzer for Urdu and Hindi: Some Issues. In Proceedings of FSMNLP 2007, Potsdam, Germany. [ bib ] [ slides ]
We introduce and discuss a number of issues that arise in the process of building a finite-state morphological analyzer for Urdu, in particular issues with potential ambiguity and non-concatenative morphology. Our approach allows for an underlyingly similar treatment of both Urdu and Hindi via a cascade of finite-state transducers that transliterates the two very different scripts into a common ASCII transcription system. As this transliteration system is based on the same XFST tools in which the common Urdu/Hindi morphological analyzer is implemented, no compatibility problems arise.
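The effect of the transliteration cascade can be illustrated with a toy Python mapping. The character tables below are tiny invented fragments; the actual system is a cascade of XFST finite-state transducers covering the full Urdu (Perso-Arabic) and Hindi (Devanagari) scripts:

```python
# Toy illustration: map both scripts onto one ASCII transcription so a
# single morphological analyzer can serve Urdu and Hindi alike.
URDU_TO_ASCII = {"ک": "k", "ا": "A", "م": "m"}    # fragment, not complete
HINDI_TO_ASCII = {"क": "k", "ा": "A", "म": "m"}   # fragment, not complete

def transliterate(word, table):
    """Replace each character via the table, passing unknowns through."""
    return "".join(table.get(ch, ch) for ch in word)

# Urdu 'کام' and Hindi 'काम' ("work") converge on the same ASCII form:
print(transliterate("کام", URDU_TO_ASCII))    # → kAm
print(transliterate("काम", HINDI_TO_ASCII))   # → kAm
```

The convergence on a shared transcription is what lets the downstream analyzer stay script-agnostic.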
My Google Scholar Citation page