For several years after the beginning of the 'statistical revolution', computational linguistics focused primarily on the syntactic aspects of language interpretation: first on part-of-speech tagging, then on parsing. In the past ten years, however, interest has returned to semantic interpretation problems such as semantic role labelling and anaphora resolution, the task of recognizing which mentions in a text refer to the same object. Larger annotated corpora such as OntoNotes can now be used to train and test our models; the improving performance of parsers is making mention detection easier; the availability of resources such as WordNet, FrameNet and Wikipedia is enabling the interpretation of more complex anaphors; and, last but not least, better machine learning models are being developed. These developments are also reviving interest in the more challenging aspects of the problem that were at the heart of research until the mid-1990s but were considered too complex to handle with the methods and resources available until 2005 or so.
In this course we will begin by discussing linguistic and psychological data about anaphora, with particular attention to the problem of salience. We will then turn to anaphoric annotation, discussing available corpora and the practice of anaphoric annotation, and covering our recent work on using games for anaphoric annotation (www.phrasedetectives.org). Next we will discuss anaphora resolution models, beginning with the traditional algorithms (Hobbs, Sidner), moving on to the first statistical models (Soon et al., Ng), and then discussing recent developments, with particular emphasis on proposals for using lexical and commonsense knowledge and on more advanced machine learning models of the anaphora resolution task. We will conclude by introducing the BART toolkit for developing anaphora resolution tools.