This page summarizes the key columns (fields) that appear in search results and in downloaded result files
(e.g., CSV) when searching BCCWJ2 in Chunagon, focusing on the items most important for corpus use.
Note: The following description applies to the book data that is publicly available as of March 2026.
Field groups (categories)
Corpus information
Sample ID / Start position / Serial number / Register / Core
Morphological information
Lexeme / Lexeme reading / Lexeme Subclassification / Word form / Orthographic Form / Phonetic Surface Form
Source information
Author (contributor) / Birth decade / Genre / Editors / Publisher / Publication year / ISBN
Corpus information
Sample ID
- An ID assigned to each sample indicating the year of publication and the genre (NDC).
Start position / Serial number
- Start position: The offset value from the beginning of the sample in the original text string (in increments of 10).
- Serial number: The order of short units within the sample (in increments of 10).
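Because the serial number gives the order of short units within a sample, downloaded rows can be put back into text order by sorting on sample ID and then serial number. The following is a minimal sketch, not an official procedure. The column names ("Sample ID", "Serial number", "Word") and the rows are made-up placeholders, so check the header row of the actual download before reusing it.

```python
import csv
import io

# Hypothetical column names; the real header row may differ.
SAMPLE_ID = "Sample ID"
SERIAL = "Serial number"

def sort_rows(rows):
    """Order rows by sample, then by serial number within each sample."""
    # Convert the serial number to int so that "100" sorts after "20".
    return sorted(rows, key=lambda r: (r[SAMPLE_ID], int(r[SERIAL])))

# Made-up rows for illustration only (not real corpus data).
demo_csv = """Sample ID,Serial number,Word
SAMPLE_B,10,d
SAMPLE_A,100,c
SAMPLE_A,20,b
SAMPLE_A,10,a
"""

rows = list(csv.DictReader(io.StringIO(demo_csv)))
ordered = sort_rows(rows)
for r in ordered:
    print(r[SAMPLE_ID], r[SERIAL], r["Word"])
```

Values read from a CSV file are strings, so the numeric conversion matters: a plain string sort would place 100 before 20.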
Register
- A language variety used in a particular context or situation.
Core
- Indicates whether the sample is a core sample, that is, a sample whose analysis accuracy has been improved through manual annotation.
Morphological information
Lexeme / Lexeme reading / Lexeme Subclassification / Word form
- Lexeme: Corresponds to the representative written form of a dictionary headword (written using a mixture of kanji and kana).
- Lexeme reading: Corresponds to the dictionary headword reading (written in katakana).
- Lexeme Subclassification: Information that further subdivides a lexeme according to distinctions such as sense or meaning.
- Word form: A headword level that distinguishes variant forms (written in katakana).
Orthographic Form / Phonetic Surface Form
- Orthographic Form: The headword form at the level that distinguishes different written representations.
- Phonetic Surface Form: The headword form at the level that distinguishes different pronunciations.
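The headword levels above can be read as successively finer distinctions under one lexeme. As a rough sketch, assuming the levels nest in that way, the code below groups rows by lexeme and word form and collects the orthographic forms seen under each. The dictionary keys are assumed column names and the rows are made-up illustrations, not actual search results.

```python
from collections import defaultdict

# Illustrative rows only; keys are assumed column names.
rows = [
    {"Lexeme": "矢張り", "Word form": "ヤハリ", "Orthographic Form": "やはり"},
    {"Lexeme": "矢張り", "Word form": "ヤハリ", "Orthographic Form": "矢張り"},
    {"Lexeme": "矢張り", "Word form": "ヤッパリ", "Orthographic Form": "やっぱり"},
    {"Lexeme": "矢張り", "Word form": "ヤハリ", "Orthographic Form": "やはり"},
]

def variants_by_lexeme(rows):
    """Map lexeme -> word form -> set of orthographic forms."""
    tree = defaultdict(lambda: defaultdict(set))
    for r in rows:
        tree[r["Lexeme"]][r["Word form"]].add(r["Orthographic Form"])
    return tree

tree = variants_by_lexeme(rows)
for lexeme, forms in tree.items():
    for form, spellings in forms.items():
        print(lexeme, form, sorted(spellings))
```

Grouping on the lexeme collects all variants of a word in one place, while grouping on a lower level keeps spelling or pronunciation variants apart.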
Source information
Source information is based on the National Bibliography data provided by the National Diet Library: https://www.ndl.go.jp/data/data_service/jnb.
Author (contributor)
- Indicates the author of the sample.
Birth decade
- Shows the author’s birth year in 10-year units (decades). The value is the birth year in the National Bibliography data rounded down to a multiple of 10.
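The truncation described for the birth decade amounts to integer division by 10 followed by multiplication by 10. A minimal sketch follows; the function name birth_decade is hypothetical, and how the value is actually formatted in the results is not specified here.

```python
def birth_decade(birth_year: int) -> int:
    """Round a birth year down to a multiple of 10."""
    return (birth_year // 10) * 10

print(birth_decade(1964))  # 1960
print(birth_decade(1970))  # 1970
```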
Genre
- Indicates the genre of the source (e.g., books, textbooks, SNS). (As of March 2026: books only.)
- For books, NDC (Nippon Decimal Classification) major/minor classes and NDLC (National Diet Library Classification) are shown.
Editors
- Shows the authors and editors of the book.
Publisher
- Shows the publisher of the book.
Publication year
- Shows the publication year of the book.
ISBN
- Shows the ISBN (International Standard Book Number) of the book.
Differences from BCCWJ1
- The author's gender is not shown.
- In BCCWJ1, the third genre column used C-codes, whereas in BCCWJ2 it uses NDLC (National Diet Library Classification).


