Notice concerning the discontinuation of the NPCMJ search interface
The search interface had been associated with the NPCMJ since its first publication in 2016. However, this interface was discontinued on March 1, 2022.
Note that the NPCMJ Explorer interface will continue, although functionality gained through linking to the search interface will be lost.
Going forward, we encourage NPCMJ users to become accustomed to the NPCMJ project’s development interfaces, accessible from https://oncoj.orinst.ox.ac.uk/. Select ‘Kainoki’ to access contemporary Japanese data.
The development interface is a more powerful search interface and is being continually updated with the latest developments of the NPCMJ project. It also shares features with a growing number of corpora covering, for example, Old Japanese, Japanese regional dialects, Japanese child language, and JFL/JSL Learner Japanese data.
Introducing the NPCMJ
For the major languages of the world, progress has been made in creating corpora annotated with syntactic information (treebanks), and research using these corpora has produced significant results in linguistics and language processing. For Japanese, the Collaborative Research Project “Development of and Linguistic Research with a Parsed Corpus of Japanese” began in 2016 at the National Institute for Japanese Language and Linguistics (NINJAL), and is presently building the NPCMJ (NINJAL Parsed Corpus of Modern Japanese). The project aims to annotate texts of written and spoken Contemporary Japanese with syntactic and semantic information, making it possible to search the data for a rich inventory of function words, phrase structures, clause types, and complex constructions, to extract matching examples, and to use the results actively in research. Approximately 90,000 sentences (90,000 trees) have been made publicly available. Together with the data, the project offers a variety of tools designed for use with the NPCMJ, enabling searches of many different kinds. We invite you to see for yourself what can be done with the tools and the data.
Source | Number of Trees | Word Count |
Aozora Bunko | 12,810 | 246,568 |
Bible | 1,664 | 26,089 |
Blog | 219 | 3,218 |
Book | 553 | 10,992 |
Dictionary | 26,279 | 141,297 |
Diet | 1,698 | 32,715 |
Essay | 3,264 | 70,167 |
Fiction | 7,597 | 84,169 |
Law | 337 | 6,943 |
News | 5,979 | 90,570 |
Nonfiction | 234 | 4,118 |
Patent | 261 | 8,636 |
Spoken | 2,382 | 12,720 |
TED Talk | 1,453 | 21,420 |
Textbook | 6,950 | 63,952 |
Whitepaper | 13,433 | 398,347 |
Wikipedia | 2,745 | 59,833 |
Misc. | 2,211 | 22,754 |
Total | 90,069 | 1,304,508 |
Online Tools for using the NPCMJ
NPCMJ Development Interfaces (data is updated daily)
This new interface offers direct access to the most up-to-date working files of the NPCMJ database. It also supports search with Tregex, a very powerful search tool for tree-structured data. Many additional features are available, including alternative analysis views that showcase the depth of information present in the annotation.
Start the NPCMJ Development Interfaces (External Link)
NPCMJ Explorer (for entry-level users)
This is a pattern browser that searches the corpus for examples matching the grammatical descriptions in Kisonihongo bunpo, Revised edition, by Masuoka Takashi and Takubo Yukinori, from Kuroshio Publishers. It also includes a character string search function with which users can look for examples based on strings they enter themselves.
Start the NPCMJ Explorer
NPCMJ Child Language Development Timeline (NPCMJ-CLDT)
The NPCMJ Child Language Development Timeline (NPCMJ-CLDT) provides an interface for interacting with Soyogo. Soyogo is a parsed corpus of child language Japanese, with data sourced from the CHILDES database. The NPCMJ-CLDT interface makes the morpho-syntactic analysis of child language accessible to search and exploration through an age range filter. Soyogo and the NPCMJ-CLDT interface are components of the NPCMJ project.
Start NPCMJ-CLDT
NPCMJ Annotation Manual
Download NPCMJ Annotation Manual
Full Download
Bracketed tree file format
This is a compressed zip file containing all the sample files of the NPCMJ in bracketed tree format.
Download bracketed kana tree files
Download bracketed romaji tree files
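For readers who want to work with the downloaded files programmatically, the following is a minimal sketch of reading bracketed trees in Python. It assumes the common parenthesised convention in which each node is written as (LABEL child ...) and each word as (LABEL word). The sample tree, its labels, and the function names parse_bracketed and leaves are illustrative assumptions, not taken from the NPCMJ files or tools; check the Annotation Manual for the actual label inventory and file layout.

```python
from typing import List, Union

# A tree is either a token (label or word) or a list of subtrees.
Tree = Union[str, List["Tree"]]


def parse_bracketed(text: str) -> List[Tree]:
    """Parse one or more bracketed trees into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[List[Tree]] = [[]]  # stack[0] collects the top-level trees
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def leaves(tree: Tree) -> List[str]:
    """Collect the words, i.e. the second item of each (LABEL word) pair."""
    if isinstance(tree, str):
        return []
    if len(tree) == 2 and all(isinstance(item, str) for item in tree):
        return [tree[1]]
    words: List[str] = []
    for child in tree:
        words.extend(leaves(child))
    return words


# Hypothetical romaji example; real files may wrap each tree differently
# (for instance with extra metadata nodes), which this sketch does not handle.
sample = "(IP-MAT (NP-SBJ (N inu) (P-ROLE ga)) (VB hoeta))"
trees = parse_bracketed(sample)
words = leaves(trees[0])
print(words)
```

Nested lists keep the sketch free of dependencies. If the downloaded trees carry metadata nodes alongside the sentence, those nodes would need to be filtered out before collecting words.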
Exercises for Syntax Textbook
Exercises for Analyzing Japanese Syntax: A Generative Perspective
Analyzing Japanese Syntax: A Generative Perspective is an introductory textbook that illustrates the basic ideas of the most influential syntactic theory of generative grammar and shows how Japanese syntax can be analyzed using concrete examples from a range of phenomena. Exercises for this textbook have been developed in collaboration with the NPCMJ project, and include advanced-level exercises using an online search engine for NPCMJ.
Start Exercises for Analyzing Japanese Syntax: A Generative Perspective