Introducing the NPCMJ

For the world's major languages, corpora annotated with syntactic information (treebanks) have been steadily developed, and research using them has produced significant results in linguistics and language processing. For Japanese, the Collaborative Research Project “Development of and Linguistic Research with a Parsed Corpus of Japanese” began at the National Institute for Japanese Language and Linguistics (NINJAL) in 2016 and built the NPCMJ (NINJAL Parsed Corpus of Modern Japanese). The project aimed to annotate written and spoken Contemporary Japanese texts with syntactic and semantic information, so that a rich inventory of function words, phrase structures, clause types, and complex constructions can be searched for and extracted from the data, and the results used actively in research. The project ended in March 2022, but its results remain available to the public in the form of approximately 90,000 sentences (90,000 trees).

Source | Number of Trees | Word Count
Aozora Bunko | 12,810 | 246,568
Bible | 1,664 | 26,089
Blog | 219 | 3,218
Book | 553 | 10,992
Dictionary | 26,279 | 141,297
Diet | 1,698 | 32,715
Essay | 3,264 | 70,167
Fiction | 7,597 | 84,169
Law | 337 | 6,943
News | 5,979 | 90,570
Nonfiction | 234 | 4,118
Patent | 261 | 8,636
Spoken | 2,382 | 12,720
TED Talk | 1,453 | 21,420
Textbook | 6,950 | 63,952
Whitepaper | 13,433 | 398,347
Wikipedia | 2,745 | 59,833
Misc. | 2,211 | 22,754
Total | 90,069 | 1,304,508

Online Tool for using the NPCMJ

The Kainoki Treebank Homepage
This site has been continuously maintained under the name “Kainoki Treebank” since the research project was completed. Its powerful search interface lets you query almost every aspect of the annotations. We hope you will make use of it.
The Kainoki Treebank Homepage (External Link)

Annotation Manual

NPCMJ Annotation Manual
Download NPCMJ Annotation Manual

Full Download

Bracketed tree file format
This is a zip file containing all the sample files of the NPCMJ in bracketed tree format.
Download bracketed kana tree files
Download bracketed romaji tree files
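
For readers who want to process the downloaded files programmatically, the following is a minimal Python sketch of reading bracketed trees. It assumes the files use Penn-Treebank-style nesting, in which each tree is wrapped in an outer pair of parentheses and may carry an (ID ...) node. The sample string, its labels (IP-MAT, PP, NP, N, P, VB), and its ID value are hand-written illustrations, not data from the corpus, and the function names are hypothetical. Consult the NPCMJ Annotation Manual for the actual label inventory and file conventions, including how parentheses inside words are represented.

```python
from typing import List, Tuple, Union

# A tree is either a token (label or word) or a list of subtrees.
Tree = Union[str, List["Tree"]]


def tokenize(text: str) -> List[str]:
    # Pad parentheses with spaces so split() separates them from labels and words.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_trees(text: str) -> List[Tree]:
    """Parse every top-level bracketed tree in `text` into nested lists."""
    stack: List[list] = [[]]
    for tok in tokenize(text):
        if tok == "(":
            node: list = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def leaves(tree: Tree, skip_labels: Tuple[str, ...] = ("ID",)) -> List[str]:
    """Collect the words of a tree, skipping metadata nodes such as (ID ...)."""
    if isinstance(tree, str):
        return []
    if tree and isinstance(tree[0], str) and tree[0] in skip_labels:
        return []
    # A preterminal node has the shape [LABEL, word].
    if len(tree) == 2 and all(isinstance(x, str) for x in tree):
        return [tree[1]]
    words: List[str] = []
    for child in tree:
        words.extend(leaves(child, skip_labels))
    return words


# Hypothetical romaji example written for illustration only.
sample = "( (IP-MAT (PP (NP (N inu)) (P ga)) (VB hoeru)) (ID 1_sample))"
trees = parse_trees(sample)
print(leaves(trees[0]))  # ['inu', 'ga', 'hoeru']
```

Nested lists keep the sketch free of dependencies. For real work, an established treebank reader or the Kainoki search interface is likely a better choice than this simplified parser.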

Exercises for Syntax Textbook

Exercises for Analyzing Japanese Syntax: A Generative Perspective
Analyzing Japanese Syntax: A Generative Perspective is an introductory syntax textbook that explains the basic ideas of generative grammar and uses concrete examples to show how Japanese syntax can be analyzed. The exercises in this textbook were developed in cooperation with the NPCMJ project. You can download the exercises for the beginner and intermediate levels. The advanced-level exercises are no longer available because the project has ended.
Download Exercises