The Architecture of the Grammar


Overall Architecture

The Urdu grammar is based on Lexical Functional Grammar (LFG), which allows for a modular architecture. Morphology, syntax, semantics and prosody are therefore encoded at independent levels, which provides the necessary flexibility.
Each level of analysis uses its own type of representation, chosen according to how the content is best represented. All of these levels interact with each other, providing broad grammatical coverage of each string entered. Furthermore, the grammar can be used for both parsing and generation.

[Figure: overview of the architecture. Morphology (xfst); Syntax (XLE), with c- and f-structure; Prosody, with p-structure; Semantics (xfr).]

Morphology

The morphology is based on finite state machines, mainly the finite state morphology tools described (and developed) by Beesley and Karttunen (2003). Using this technology, the grammar can deal with the full range of inflectional and derivational morphology in Urdu, including difficult phenomena such as reduplication.
The morphological analyzer provides each string with tags, e.g.

kitAb+Noun+Unmarked+Fem+Sg

is the analysis of the Urdu word kitAb, which means "book".
The morphological analysis is then fed into the syntax via a morphology-syntax interface. This interface makes the tags readable to XLE and passes the analysis on to the grammar for further use.
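To make the shape of such an analysis concrete, the following is a minimal Python sketch that splits an analysis string into its stem and tags and maps some tags to attribute-value pairs. It is not the actual morphology-syntax interface, which is defined inside the XLE grammar; the table `TAG_FEATURES`, the attribute names `POS`, `GEND`, `NUM` and `STEM`, and the function `parse_analysis` are assumptions made for illustration only.

```python
# Hypothetical tag-to-feature table, for illustration only. The real
# morphology-syntax interface lives inside the XLE grammar.
TAG_FEATURES = {
    "Noun": ("POS", "noun"),
    "Fem": ("GEND", "fem"),
    "Sg": ("NUM", "sg"),
}


def parse_analysis(analysis):
    """Split 'stem+Tag1+Tag2...' into a feature dict and a list of unmapped tags."""
    stem, *tags = analysis.split("+")
    if not stem:
        raise ValueError(f"no stem in analysis: {analysis!r}")
    features = {"STEM": stem}
    unmapped = []
    for tag in tags:
        if tag in TAG_FEATURES:
            attribute, value = TAG_FEATURES[tag]
            features[attribute] = value
        else:
            # Tags without an entry in the toy table are kept, not dropped.
            unmapped.append(tag)
    return features, unmapped


features, unmapped = parse_analysis("kitAb+Noun+Unmarked+Fem+Sg")
print(features)  # {'STEM': 'kitAb', 'POS': 'noun', 'GEND': 'fem', 'NUM': 'sg'}
print(unmapped)  # ['Unmarked']
```

The tag `Unmarked` is deliberately left unmapped here, since the sketch makes no claim about how the grammar interprets it.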

Syntax

The syntax component is at the core of the Urdu grammar. Its theoretical background is Lexical Functional Grammar (LFG), which assumes a modular architecture. It runs on the XLE platform, which was developed at Xerox/PARC. The syntax rests on two pillars: the c-structure, which encodes the basic constituent "tree" and linear precedence, and the f-structure, which encodes grammatical relations and functional information.

[Figure: c-structure and f-structure of the example sentence]
Above are the c- and the f-structure for the Urdu sentence billI mEz kE Upar hE ("The cat is on the table").
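An f-structure can be thought of as a nested attribute-value structure. One common way to picture how information contributed by different words ends up in a single f-structure is unification. The Python sketch below illustrates that idea with nested dictionaries. It is not how XLE is implemented, and the attribute names and values (`PRED`, `SUBJ`, `NUM`, `TENSE`) are illustrative assumptions rather than the grammar's actual output for this sentence.

```python
def unify(f, g):
    """Unify two feature structures given as nested dicts.

    Returns a new merged structure; raises ValueError if the two
    structures assign conflicting values to the same attribute.
    """
    result = dict(f)
    for attribute, value in g.items():
        if attribute not in result:
            result[attribute] = value
        elif isinstance(result[attribute], dict) and isinstance(value, dict):
            result[attribute] = unify(result[attribute], value)
        elif result[attribute] != value:
            raise ValueError(
                f"clash on {attribute}: {result[attribute]!r} vs {value!r}"
            )
    return result


# Illustrative contributions (assumed, not taken from the grammar):
from_noun = {"SUBJ": {"PRED": "billI", "NUM": "sg"}}
from_verb = {"PRED": "hE", "TENSE": "pres", "SUBJ": {"NUM": "sg"}}

print(unify(from_noun, from_verb))
# {'SUBJ': {'PRED': 'billI', 'NUM': 'sg'}, 'PRED': 'hE', 'TENSE': 'pres'}
```

If the verb required a plural subject instead, `unify` would raise a `ValueError`, which corresponds to the analysis being rejected.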

Currently, the grammar has about 40 phrase-structure rules, covering basic clauses with free word order, complex verbal constructions, tense and aspect, causative verbs and complex predicates. Note, however, that large grammars have about 360 rules, so there is still a lot of work to be done.

Prosody

Prosody is a further module that connects with the syntax. Although developing it is not a focus of the project, we have implemented some prosodic material for special constructions such as the Urdu Ezafe, which can only be analyzed correctly when prosody is taken into account. Since the prosody is consistent with the other modules most of the time, we do not plan to concentrate on this section.

Semantics

The semantics module relies on the f-structures encoded in Prolog. We take this Prolog code and apply ordered rewrite rules (XFR) to it, consuming the f-structure input step by step. XLE then outputs the resulting semantic form.
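The idea of ordered rewriting that consumes its input can be sketched in a few lines of Python. This is a strongly simplified illustration, not XFR syntax and not the project's rule set: the fact format, the rules and the output predicates `entity` and `cardinality` are all hypothetical, and each rule here matches only a single fact at a time.

```python
def apply_rules(facts, rules):
    """Apply each rule once, in order, to the output of the previous rule.

    A rule is a pair (matches, rewrite). A fact that matches is consumed
    and replaced by whatever rewrite(fact) returns (possibly nothing).
    """
    current = list(facts)
    for matches, rewrite in rules:
        next_facts = []
        for fact in current:
            if matches(fact):
                next_facts.extend(rewrite(fact))  # input fact is consumed
            else:
                next_facts.append(fact)  # untouched facts pass through
        current = next_facts
    return current


# Hypothetical f-structure facts: (attribute, f-structure id, value)
facts = [("PRED", "f0", "billI"), ("NUM", "f0", "sg"), ("GEND", "f0", "fem")]

# Hypothetical ordered rules
rules = [
    (lambda f: f[0] == "PRED", lambda f: [("entity", f[1], f[2])]),
    (lambda f: f[0] == "NUM" and f[2] == "sg",
     lambda f: [("cardinality", f[1], "one")]),
    (lambda f: f[0] == "GEND", lambda f: []),  # consumed without output
]

print(apply_rules(facts, rules))
# [('entity', 'f0', 'billI'), ('cardinality', 'f0', 'one')]
```

Because the rules run in a fixed order and consume what they match, a later rule never sees a fact that an earlier rule has already rewritten.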
The semantics is the last component in the pipeline. Since the grammar is still at an early stage, the semantics has not yet been worked out in detail.