
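Foundations of Corpus Japanese Linguistics (Speech and Dialogue Group)
Kikuo MAEKAWA (Professor, Department of Corpus Studies, National Institute for Japanese Language and Linguistics)
Friday, October 15, 2010, 17:00-19:00 (closed to the public)
National Institute for Japanese Language and Linguistics, 4th floor, Room 405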


"Learning phonemes with a pseudo-lexicon" Andy MARTIN (RIKEN)

Infants acquiring their native language must overcome the variability inherent to speech, which alters phonemes and words according to their phonological context. Experimental evidence indicates that infants begin to learn how sounds vary in their language before they know many words, suggesting that early phonological learning takes place without the benefit of top-down lexical knowledge. Peperkamp, Le Calvez, Nadal, & Dupoux (2006) modeled this acquisition with an algorithm that compares the statistical distributions of pairs of segments in order to determine which are allophones of the same phoneme. I test the performance of this algorithm on a greater range of data than that used by Peperkamp et al., and demonstrate that although it is effective for simple artificial phonologies, it fails to scale up to realistically complex systems. I propose an alternative model in which infants build a crude approximation of the lexicon consisting of the high-frequency n-grams present in their speech input, and use this to significantly reduce the search space. I show that this model is superior to the Peperkamp et al. algorithm on data containing a realistic number of allophones.
The "pseudo-lexicon" allows infants to access lexical information without the learnability problems posed by models which require them to know words before they can learn phonemes.
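
As a rough illustration of the two ingredients named in the abstract, the sketch below compares the context distributions of two segments and extracts high-frequency n-grams as a crude pseudo-lexicon. It is not the implementation from the talk or from Peperkamp et al. (2006). The abstract does not name the comparison measure or say how the pseudo-lexicon narrows the search, so the symmetrized Kullback-Leibler divergence, the use of the following segment as context, the add-one smoothing, the `#` boundary symbol, and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import log


def context_distribution(utterances, segment, alphabet, smoothing=1.0):
    """Smoothed distribution over the symbol that follows `segment`.

    Utterances are plain strings, one character per segment; '#' marks
    the utterance boundary and must be included in `alphabet`.
    """
    counts = Counter()
    for utt in utterances:
        # Pair each segment with its successor ('#' after the last one).
        for cur, nxt in zip(utt, utt[1:] + "#"):
            if cur == segment:
                counts[nxt] += 1
    total = sum(counts.values()) + smoothing * len(alphabet)
    return {c: (counts[c] + smoothing) / total for c in alphabet}


def symmetric_kl(p, q):
    """Symmetrized KL divergence between two distributions on the same keys."""
    return sum(p[c] * log(p[c] / q[c]) + q[c] * log(q[c] / p[c]) for c in p)


def pseudo_lexicon(utterances, n, top_k):
    """Return the `top_k` most frequent n-grams (hypothetical pseudo-lexicon)."""
    counts = Counter(
        utt[i:i + n] for utt in utterances for i in range(len(utt) - n + 1)
    )
    return [gram for gram, _ in counts.most_common(top_k)]
```

Under these assumptions, a pair of segments whose context distributions diverge strongly would be a candidate allophone pair, since allophones of one phoneme tend to occur in complementary contexts. The pseudo-lexicon could then be used to filter such candidates, although the abstract leaves the exact filtering step unspecified.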