Konstanz Prosodically Annotated Infant-Directed Speech Corpus
May 2016: We will present the KIDS Corpus at the 8th International Conference on Speech Prosody, which will be hosted at Boston University from Tuesday, May 31 through Friday, June 3, 2016. You can download the poster and the paper here.
May 2016: We are very happy to announce that the KIDS Corpus will soon be available on CHILDES (MacWhinney 2000) and Phon (Rose et al. 2006; Rose & MacWhinney 2014). Thanks to Brian MacWhinney and Yvan Rose for their support!
The KIDS Corpus is the first prosodically annotated infant-directed speech corpus in German – a tool for formulating hypotheses and modeling acquisition processes in the prosodic domain and at the prosody-syntax interface. This multi-layered corpus consists of 524 intonation phrases (IPs) directed to infants younger than one year (196 IPs extracted from the CHILDES database; 328 IPs from our own recordings). Pitch accents (n=832) and boundary tones (n=1048) were labeled according to GToBI (Grice, Baumann & Benzmüller 2005; click here for training materials). Furthermore, we annotated the presence of unaccented syllables and pitch targets before and after the accented syllable. Such an additional theory-neutral prosodic annotation is important because we do not know whether infants are more sensitive to the pitch movement leading to the accented syllable (a.k.a. onglides) or to the pitch movement following the accented syllable (a.k.a. offglides). The current corpus hence captures the tonal surroundings on both sides of the accented syllable. We also tagged the word-prosodic structure of all accented words (e.g., trochaic, iambic) and the syntactic category of both accented and unaccented words (e.g., noun, verb, adjective).
Eight mothers from the CHILDES database (MacWhinney 2000); a subset was selected so that there was a more balanced number of Intonational Phrases (IPs) from each mother (see Zahner, Pohl & Braun 2015) (henceforth, CHILDES subset).
Seven mothers recorded at the Baby Speech Lab (BSL) at the University of Konstanz and one mother recorded in a home environment (henceforth, BSL subset).
Recordings of mother-infant dyads (in both subsets) took place in natural play situations.
In total, the KIDS Corpus comprises 10 min 12 s of speech, 2014 words (513 word forms, counting each inflected form of a word as a separate type), 524 IPs, and 832 pitch accents.
Table 1 shows more detailed information on each mother.
Two trained annotators (first and last author of Zahner, Schönhuber, Grijzenhout & Braun (2016)) labeled the corpus together, using Praat (Boersma & Weenink 2014).
For each wav-file, a corresponding TextGrid-file consisting of ten tiers was created; see the example annotation below.
In the following, we specify the information that is provided on each tier. We also indicate whether the information is annotated on an interval or a point tier.
Word category of both accented and unaccented words (more detailed word category labels, following the guidelines of the STTS (Stuttgart-Tübingen Tagset; Schiller, Teufel, Stöckert & Thielen 1999), e.g., "ADJA" for an adjective in attributive position or "ADJD" for an adjective used predicatively or adverbially; point tier)
S: primary stressed syllable (e.g., S for "Maus")
W: unstressed, weak syllable (e.g., SW for "'Mama" or WS for "Mu'sik")
s: secondary stressed syllable (typically in compounds, e.g., SWsW for "'Sandel,eimer")
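As an illustration, the S/W/s labels can be read as one character per syllable. The following minimal Python sketch interprets such a pattern string; the function name `describe_pattern` and the returned fields are hypothetical and not part of the corpus tools, and mapping "S", "SW" and "WS" to the terms monosyllabic, trochaic and iambic follows the usage on this page.

```python
def describe_pattern(pattern: str) -> dict:
    """Interpret a word-prosodic pattern such as 'S', 'SW', 'WS' or 'SWsW'.

    Assumption: one character per syllable, with
    S = primary stress, s = secondary stress, W = weak (unstressed).
    """
    allowed = {"S", "W", "s"}
    if not pattern or any(ch not in allowed for ch in pattern):
        raise ValueError(f"unexpected pattern: {pattern!r}")

    # Position of the primary stressed syllable (0-based);
    # str.index raises ValueError if the pattern has no 'S'.
    primary = pattern.index("S")

    # Only the structures named on this page get a specific label.
    names = {"S": "monosyllabic", "SW": "trochaic", "WS": "iambic"}
    return {
        "syllables": len(pattern),
        "primary_stress_index": primary,
        "has_secondary_stress": "s" in pattern,
        "type": names.get(pattern, "other"),
    }


if __name__ == "__main__":
    for p in ("S", "SW", "WS", "SWsW"):
        print(p, describe_pattern(p))
```

Longer structures such as SWW or SWsW are grouped as "other" here only to keep the sketch short.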
Prosodic domain of accent (indication of availability of unaccented syllables to the left or right of the accented syllable (=a) on which leading or trailing tones could be realized; 1 = unaccented syllables available; 0 = no unaccented syllables available; analysis is performed irrespective of word boundaries; point tier)
0a0: accented syllable is immediately surrounded by other accented syllables or boundary tones (e.g., % "NEIN" %; capitalization indicates the accented syllable; % indicates an IP boundary)
1a1: accented syllable has at least one unaccented syllable to its right and its left (e.g., "geSCHLAfen", "was MACHST du", "der RAsselt")
0a1: accented syllable has at least one unaccented syllable to its right and is preceded by another pitch accent or boundary tone (e.g., % "KAtze", % "SCHAU mal", % "HINsetzen")
1a0: accented syllable has at least one unaccented syllable to its left and is immediately followed by another pitch accent or boundary tone (e.g., "MuSIK" %, "mit SAND" %)
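The four labels above follow mechanically from what stands immediately to the left and right of the accented syllable. The sketch below assumes a hypothetical encoding that is not used in the corpus files: an utterance is a list of units, where "a" is an accented syllable, "u" an unaccented syllable and "%" an IP boundary.

```python
def domain_labels(units):
    """Return one label ('0a0', '1a1', '0a1' or '1a0') per accented syllable.

    units: list of 'a' (accented syllable), 'u' (unaccented syllable)
           and '%' (IP boundary); this encoding is an assumption made
           for illustration only.
    """
    labels = []
    for i, unit in enumerate(units):
        if unit != "a":
            continue
        # 1 if the immediate neighbour is an unaccented syllable, else 0
        # (neighbour is another accent, a boundary, or missing).
        left = 1 if i > 0 and units[i - 1] == "u" else 0
        right = 1 if i + 1 < len(units) and units[i + 1] == "u" else 0
        labels.append(f"{left}a{right}")
    return labels


if __name__ == "__main__":
    print(domain_labels(["%", "a", "%"]))            # % "NEIN" %
    print(domain_labels(["%", "u", "a", "u", "%"]))  # "geSCHLAfen"
    print(domain_labels(["%", "a", "u", "%"]))       # % "KAtze"
    print(domain_labels(["u", "a", "%"]))            # "mit SAND" %
```

Because the units are listed without word boundaries, the sketch mirrors the convention that the analysis is performed irrespective of word boundaries.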
Tritonal pattern analysis (for the 1a1-condition, i.e., unaccented material available on both sides of the accented syllable: indication of the tonal surroundings on both sides of the accented syllable; point tier). For more details on the motivation for this analysis and the precise labeling conventions, see Zahner, Pohl & Braun (2015) (paper) and Zahner, Schönhuber, Grijzenhout & Braun (2016) (paper).
Similar to ToBI (Silverman et al. 1992), the tone associated with the accented syllable is marked by an asterisk (e.g., LH*L, HH*L)
If the preceding or following tonal target is not associated with a syllable adjacent to the accented syllable, this separation of tonal targets is indicated by ".." (e.g., LH*..L)
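A label of this kind can be split into its three tonal targets plus the information whether each flanking target is adjacent to the accented syllable. The Python sketch below is only an illustration under stated assumptions: tones are single letters L or H, and a separated preceding target is written with ".." before the accentual tone (e.g., L..H*L), which is extrapolated from the LH*..L example. The names `Tritonal`, `parse_tritonal` and `collapse` are hypothetical.

```python
import re
from typing import NamedTuple


class Tritonal(NamedTuple):
    pre: str             # tonal target before the accented syllable
    accent: str          # tone on the accented syllable, with asterisk
    post: str            # tonal target after the accented syllable
    pre_adjacent: bool   # False if '..' separates pre from the accent
    post_adjacent: bool  # False if '..' separates the accent from post


# Assumed label shape: tone, optional '..', starred tone, optional '..', tone.
_LABEL = re.compile(r"^([LH])(\.\.)?([LH])\*(\.\.)?([LH])$")


def parse_tritonal(label: str) -> Tritonal:
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a tritonal label under these assumptions: {label!r}")
    pre, pre_gap, accent, post_gap, post = match.groups()
    return Tritonal(pre, accent + "*", post, pre_gap is None, post_gap is None)


def collapse(label: str) -> str:
    """Drop the '..' separators, e.g. 'LH*..L' -> 'LH*L'."""
    return label.replace("..", "")


if __name__ == "__main__":
    print(parse_tritonal("LH*..L"))
    print(collapse("LH*..L"))
```

The `collapse` helper corresponds to counting LH*..L as LH*L when only the tonal sequence matters and not the alignment of the flanking targets.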
In the following, we present some figures and tables on different distribution frequencies, which are also found in Zahner, Schönhuber, Grijzenhout & Braun (2016).
Verbs are the most frequent word category among the words used by the mothers in the KIDS Corpus (23%), followed by pronouns (19%), adverbs (18%) and nouns (12%), see Figure 2. Within the 524 IPs, 832 words are accented. Thus, an IP contains 1.6 pitch accents on average. 41% of the words carry a pitch accent (832 out of 2014). In total, 26% of the accented words are nouns, 25% are verbs, 16% are adverbs, and 10% are adjectives, see Figure 2. Most of the accented words follow a typical Germanic word-prosodic structure: 52% are monosyllabic (S), followed in frequency by trochaic words (SW, 30%). Other structures are considerably less frequent (e.g., WS: 4%, SWW: 4%).
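The two summary figures in this paragraph can be recomputed from the corpus totals given on this page. A small Python check, with variable names chosen only for illustration:

```python
# Corpus totals as reported on this page.
n_ips = 524        # intonation phrases
n_accents = 832    # pitch accents (= accented words)
n_words = 2014     # word tokens

# Average number of pitch accents per intonation phrase.
accents_per_ip = n_accents / n_ips

# Percentage of word tokens that carry a pitch accent.
accented_share = 100 * n_accents / n_words

print(f"pitch accents per IP: {accents_per_ip:.1f}")
print(f"accented words: {accented_share:.0f}%")
```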
Table 2 shows the distribution of boundary tones in the KIDS Corpus, Table 3 the distribution of pitch accents. The corpus comprises 524 initial and 524 final boundary tones. In the majority of cases, the utterances start with a low boundary tone (69% of the IPs). The most frequent final boundary tone is L-% (46% of the IPs). The next most frequent patterns are a high plateau (H-%, 25%) and a low rise (L-H%, 13%). Incomplete falls (!H-%, 7%) and high rises (H-^H%, 7%) are least frequent.
Overall, the most frequent accent types are H* and L+H*, each accounting for more than 25% of the accents. Low-pitched monotonal accents (L*) are also common (18%). Note, however, that L* accents are often followed by a high tone, in particular a high boundary tone (see analysis of three-tone-sequences in Table 4). In the CHILDES recordings, L* accents are significantly more frequent than in the utterances recorded in our lab (25% vs. 13%; p=0.003 in a glmer with dataset as fixed factor and mother as crossed random factor). Accents with a high leading tone (H+L*, H+!H*) are only sparsely represented in the corpus (6% and 2%, respectively).
Table 4 shows the results for the three-tone-sequences in accents that are surrounded by at least one unaccented syllable on both sides, i.e., the 1a1-cases (see annotation on tier 9 in the TextGrid-file). For the sake of clarity, the results are simplified in two respects: First, Table 4 ignores scaling differences, i.e., an L+H* !H-% is counted as LH*L, see Figure 1. Second, it is not taken into account whether a preceding or following pitch target is associated with a syllable adjacent to the accented syllable or is realized later, i.e., an LH*..L notation is counted as LH*L here, see Figure 1. In total, the relevant 1a1-cases account for more than half of the data (426 accented syllables). By far the most frequent accentual pattern is a rising-falling movement (LH*L), which occurs in 34% of the cases. The second most common accentual pattern is LL*H, i.e., a low accentual tone preceded by a low and followed by a high tone, occurring in 14% of the cases. For the three-tone-sequence LL*H in the 1a1-cases of our corpus, we again observed a distributional difference between the two subsets: LL*H patterns are significantly more frequent in the CHILDES subset than in the BSL recordings (20% vs. 11%, p=0.04 in a glmer with dataset as fixed factor and mother as crossed random factor).
The TextGrid-files can be downloaded here (zip).
For more information on the annotation and analysis, please contact Katharina Zahner (firstname dot lastname at uni-konstanz dot de).
If you wish to gain access to the wav-files, please fill in the following form (German version or English version) and send it to Katharina Zahner. We will respond to your request as soon as possible.
Zahner, K., Schönhuber, M., Grijzenhout, J. & Braun, B. (2016). Konstanz prosodically annotated infant-directed speech corpus (KIDS Corpus). Proceedings of the 8th International Conference on Speech Prosody. Boston, USA. (paper)
We thank Isabelle Auriga, Andrea Beeken, Sophie Egger, Angela James and Stephanie Gustedt for help with preparation and analyses of the data. We also thank Clara Huttenlauch for writing PRAAT scripts for TextGrid preparation and analyses as well as designing this homepage. We further appreciate discussion of data at the DIMA (Annotation Guidelines for German Intonation) meeting in Potsdam (March 2015). We owe special thanks to Brian MacWhinney and Yvan Rose from the CHILDES team for their support with data conversion in order to make the data available on the CHILDES online database soon. Last but definitely not least, we thank all mothers and their infants for their participation in our study.