Here we introduce some of the Institute's major research achievements.
Vocabulary research: Laying the foundations and developing quantitative linguistics
The Institute has conducted "vocabulary research" into how words are used in the media and in school textbooks, because these sources strongly influence the everyday language use of Japanese citizens.
The first vocabulary project used women’s magazines.
The data came from the 1950 (Showa 25) issues of two magazines, Shufu No Tomo and Fujin Seikatsu. The analysis was based on a total of 200,000 words drawn from 15–16% of all pages. These magazines were chosen because the study targeted the vocabulary of daily family life, especially clothing, food, and housing. The project used statistical sampling methods to select the data, laying the methodological foundation for later vocabulary research.
Vocabulary research using magazines continued and expanded. One later project used 16 magazines covering a wide range of fields, published in 1953 and 1954 (Showa 28 and 29); another used 90 general-interest magazines published in 1956 (Showa 31).
The "90-magazine" project was pioneering in scale. It stood out not only for the unprecedented volume of its data but also for its statistical precision.
These vocabulary research activities developed further with the transition to data analysis by computer.
In 1966 (Showa 41), the Institute began vocabulary research on newspapers, using that year's issues of the Asahi, Mainichi, and Yomiuri. At that time, the Institute introduced a large computer, a rare piece of equipment even in science laboratories, to process the vocabulary data. This not only increased the volume of data that could be handled but also led to many new research methods, including various types of quantitative analysis and a system for producing keyword-in-context (KWIC) indexes, which list every occurrence of a word together with its surrounding text. In this way, the Institute played a major role in establishing the research field known as mathematical linguistics.
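The idea behind a KWIC index can be illustrated with a short sketch. This is not the Institute's system; it is a minimal illustration of the general technique, assuming the text has already been split into words (Japanese text would first need word segmentation). The function names `kwic` and `format_kwic` and the `width` parameters are hypothetical.

```python
from typing import List, Tuple

def kwic(tokens: List[str], keyword: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every occurrence of keyword.

    Assumes tokens is an already word-segmented text; width is the number of
    context words kept on each side (a hypothetical parameter).
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def format_kwic(lines: List[Tuple[str, str, str]], width: int = 30) -> List[str]:
    """Right-align the left context so every keyword lines up in one column."""
    return [f"{left:>{width}} [{kw}] {right}" for left, kw, right in lines]

if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat too".split()
    for line in format_kwic(kwic(tokens, "sat", width=2), width=10):
        print(line)
```

Aligning the keyword in a fixed column is what makes a KWIC listing easy to scan; a full concordance would typically also sort the lines, for example by the right-hand context.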
The tradition of vocabulary and kanji research on magazines continues to the present day. A recent example is the "two million character modern magazine language survey", which used the 1994 (Heisei 6) issues of 70 general-interest magazines.
One of the crowning achievements of the Institute's vocabulary research is Bunrui Goihyo (Word List by Semantic Principles). A pioneering thesaurus (synonym dictionary), it has been widely used since its publication in 1964 (Showa 39). An enlarged and revised edition was published in 2004 (Heisei 16).
The Institute conducted a five-year collaborative project with the Communications Research Laboratory (now the National Institute of Information and Communications Technology) and the Tokyo Institute of Technology to build a large-scale database of spoken Japanese. The resulting "Corpus of Spontaneous Japanese" is a world-class database in both quality and size (7,520,000 words).