[Last update] January 10, 2024

C-JAS 中国語・韓国語母語の日本語学習者縦断発話コーパス

This spoken corpus is based on a series of longitudinal studies and contains data collected from 3 Chinese and 3 Korean learners of Japanese. Its title is "Spoken Corpus of Longitudinal Research on Chinese and Korean Learners of Japanese", abbreviated as "C-JAS (Corpus of Japanese as a Second Language)".
The data comprises 570,000 words and about 46.5 hours of speech. An online system lets users search the data for examples by morpheme, character string, and other criteria.

I-JAS 多言語母語の日本語学習者横断コーパス

This corpus is based on cross-sectional research and contains spoken and written data from 1,000 learners of Japanese with 12 different native languages. Its title is "International Corpus of Cross-sectional Research of Japanese learners", abbreviated as "I-JAS (International Corpus of Japanese as a Second Language)". The participating learners were asked to take a Japanese proficiency test to assess their language proficiency, so the data can be compared by level, native language, task, and learning environment. Examples can be searched online, along with the audio data for the spoken portion.