
krm_wakun

Overview and file formats

This data file is derived by extracting Japanese Native Readings (*wakun*) from the KRM.tsv file (an older version of krm_main) of the Myōgishō database, organizing variant forms of these wakun, and adjusting their correspondence with variant characters (*itaiji*).

Collation notes and source investigations related to wakun are documented in the krm_notes file (which contains data for Compiler's Remarks), so they are omitted here.

In some wakun entries, an alternative reading is written alongside the main one as an annotation. For example, the wakun “マサル” (masaru) is assigned to the Hanzi (Chinese character) “倍” (bai), but “ス” (su) is written in small katakana to the right of “ル” (ru). This indicates that “マス” (masu) is recorded as a second wakun in addition to “マサル” (masaru).

Because information from the JapanKnowledge version of the Nihon Kokugo Daijiten (Second Edition) will be added to the wakun data, the data must accommodate cases where variant forms of a wakun are presented together.

The correspondence with variant characters (*itaiji*) has been adjusted because Headwords in the Myōgishō sometimes indicate such variants. For example, the wakun “ヤツカレ” (yatsukare) appears in the Definition (Original Glosses) for the Headword(s) “㒒/僕”. “ヤツカレ” is therefore a Japanese Native Reading for both “僕” (boku) and “㒒”. Pairs of standard and variant forms such as “爲” and “為”, or “來” and “来”, are handled in the same way.

The JapanKnowledge version of the Nihon Kokugo Daijiten (Second Edition) has a “Notation” (表記) field that includes Hanzi (Chinese character) notations from the Myōgishō; this adjustment is a measure to ensure correspondence with that resource.

The filenames use lowercase “krm” rather than uppercase “KRM” to mark them explicitly as the names adopted after the specification change in March 2025. The files are named krm_wakun.tsv and krm_wakun.json.
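As a minimal sketch of reading the TSV file, the following uses Python's standard `csv` module. It assumes the file has a header row holding the column names and is tab-separated without quoting; neither detail is stated above. The function name `read_wakun_rows` and the sample rows (including their IDs) are hypothetical and serve only as illustration.

```python
import csv
import io

def read_wakun_rows(stream):
    """Yield one dict per record from a tab-separated text stream.

    Assumption: the first line is a header row holding the column names.
    """
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row

# Hypothetical two-record sample; the IDs are invented for illustration.
SAMPLE_TSV = (
    "wakun_id\thanzi_entry\twakun_form\n"
    "00000_01\t倍\tマサル\n"
    "00000_01b\t倍\tマス\n"
)

rows = list(read_wakun_rows(io.StringIO(SAMPLE_TSV)))
for row in rows:
    print(row["wakun_id"], row["hanzi_entry"], row["wakun_form"])
```

For a real file, the stream could be opened with `open("krm_wakun.tsv", encoding="utf-8", newline="")`, where the UTF-8 encoding is likewise an assumption.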

Column name comparison

The new and old column names correspond as follows:

| New Column Name (v1.2.0) | Old Column Name (v1.1.97) |
| --- | --- |
| wakun_id | KRID_wakun_no |
| definition_seq_id | KRID_no |
| kazama_entry_location | KR2ID |
| hanzi_entry | Entry |
| wakun_elements | Def |
| wakun_form | Word_form |
| wakun_standard_hanzi | Wakun_Hanzi |
| wakun_variant_in_hanzi | Wakun_variant |
| variant_hanzi_for_wakun | Hanzi_variant |
| japan_knowledge_id | JK_URL |
| - | Remarks |

The Remarks column has been dropped because this type of information is now consolidated in the krm_notes file (data for Compiler's Remarks).
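The correspondence above can be expressed as a lookup table for converting records keyed by the old column names. This is a sketch only: the names `OLD_TO_NEW`, `DROPPED`, and `rename_record` are hypothetical, and the function renames keys without touching values, so any change in the content of a column between versions would need separate handling.

```python
# Old (v1.1.97) -> new (v1.2.0) column names, taken from the comparison above.
OLD_TO_NEW = {
    "KRID_wakun_no": "wakun_id",
    "KRID_no": "definition_seq_id",
    "KR2ID": "kazama_entry_location",
    "Entry": "hanzi_entry",
    "Def": "wakun_elements",
    "Word_form": "wakun_form",
    "Wakun_Hanzi": "wakun_standard_hanzi",
    "Wakun_variant": "wakun_variant_in_hanzi",
    "Hanzi_variant": "variant_hanzi_for_wakun",
    "JK_URL": "japan_knowledge_id",
}

# Columns with no counterpart in the new format.
DROPPED = {"Remarks"}

def rename_record(old_record):
    """Return a copy of an old-format record keyed by the new column names."""
    new_record = {}
    for key, value in old_record.items():
        if key in DROPPED:
            continue  # Remarks now live in krm_notes
        # Unknown keys are passed through unchanged.
        new_record[OLD_TO_NEW.get(key, key)] = value
    return new_record

# Hypothetical old-format record, for illustration only.
converted = rename_record({"Entry": "倍", "Word_form": "マサル", "Remarks": "x"})
print(converted)
```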

Description of each column

The content of each column is described below.

| New Column Name (v1.2.0) | Explanation |
| --- | --- |
| wakun_id | An ID for each Japanese Native Reading (*wakun*). It is derived from definition_seq_id by extracting only those elements whose type (definition_type_name in krm_notes) is Japanese Native Reading (*wakun*). The suffixes ‘b’, ‘c’, ‘d’ are appended for variant forms. |
| definition_seq_id | An identifier for each component of the Definition (Original Glosses), or for the Headword itself, within an Entry. It is formed by appending a sequential suffix to the 5-digit numeric part of the entry_id: “_00” for the Headword or an overall Entry note, then “_01”, “_02”, and so on for the elements of the Definition (Original Glosses) in order of appearance. This ID links to records in krm_notes. |
| kazama_entry_location | An ID that includes location information (Kazama edition: K, Book(volume), page(xxx), line(y), column(zz)), ranked 1, 2, …, n when a column contains multiple Entries. “Book(volume)” represents the volume number, “page(xxx)” the page number, “line(y)” the line number, and “column(zz)” the column number. |
| hanzi_entry | The collated Headword (in Hanzi (Chinese characters)) to which this Japanese Native Reading (*wakun*) pertains. Forms are principally those of the Kangxi Dictionary, though new forms representable in Unicode (common-use forms, popular variants) may be retained. |
| wakun_elements | The elements of Japanese Native Readings (*wakun*) extracted from the full Definition (Original Glosses). Each record typically corresponds to one such element. |
| wakun_form | The lexical form of the Japanese Native Reading (*wakun*). Inflected words are generally given in their dictionary (citation) form, excluding grammatical particles. The particles ‘no’ and ‘to’ from Monzen (文選) style readings are omitted. |
| wakun_standard_hanzi | Notation of the Japanese Native Reading (*wakun*) using standard Hanzi (Chinese characters). |
| wakun_variant_in_hanzi | Notation of a variant form of the Japanese Native Reading (*wakun*) using standard Hanzi (Chinese characters). |
| variant_hanzi_for_wakun | Notation of the Japanese Native Reading (*wakun*) using variant characters (*itaiji*) of Hanzi (Chinese characters). |
| japan_knowledge_id | If this Japanese Native Reading (*wakun*) exists as a headword in the JapanKnowledge version of the Nihon Kokugo Daijiten (2nd Ed.), the alphanumeric part of its URL (from “20020” to the end) is recorded here. Otherwise, “null” is entered. |