krm_main

Overview and file formats

This section describes the core files of the database for the Kanchiin manuscript of the Ruiju Myōgishō (hereinafter “Myōgishō”).

Earlier releases distributed this data as a TSV file named KRM.tsv.

The file contains the headwords, the full text of the definitions (original glosses), the volume and radical, and each entry's location in the Kazama Shobō edition and in the Tenri Central Library/Yōtokusha (Tenri Zenhon Sōsho) edition.

In March 2025, the column names and the way tone marks (*shōten*) are displayed were revised. To make clear that a file follows the revised specification, it was renamed krm_main.tsv. A JSON version of this file is also available.
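As a rough illustration of working with the TSV file, the sketch below reads tab-separated rows into dictionaries using Python's standard `csv` module. It assumes that krm_main.tsv is UTF-8 encoded and that its first line is a header row carrying the column names; neither is stated in this section. The function name `read_krm_rows` and the sample row are invented placeholders, not real data from the database.

```python
import csv
import io


def read_krm_rows(stream):
    """Yield one dict per data row, keyed by the names in the header row.

    Assumes the first line of the stream is a tab-separated header.
    QUOTE_NONE keeps any quotation marks inside glosses as literal text.
    """
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Reading the real file would look like this (path and encoding are assumptions):
# with open("krm_main.tsv", encoding="utf-8", newline="") as f:
#     rows = list(read_krm_rows(f))

# Invented placeholder content, only to show the shape of the result.
sample = "entry_id\thanzi_entry\tdefinition\nF00001\t■\tplaceholder gloss\n"
rows = list(read_krm_rows(io.StringIO(sample)))
print(rows[0]["entry_id"])
```

Disabling quote handling is a deliberate choice here: dictionary glosses may contain quotation marks, and treating them as CSV quoting could merge or truncate fields.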

Column name comparison

The correspondence between the old and new column names is as follows:

| New column name (v1.2.5) | Old column name (v1.1.347) |
| --- | --- |
| entry_id | KRID_n |
| hanzi_id | KRID_sn |
| - | KR2ID |
| kazama_location | KRID |
| tenri_location | KR_Tenri_p |
| volume_name | KR_vol_name |
| radical_name | KR_radical |
| volume_radical_index | KR_vol_radical |
| hanzi_entry | Entry |
| original_entry | Entry_original |
| definition | Def |
| - | Remarks |

The KR2ID column was omitted, and the kazama_location column was aligned with the KRID column.

The Remarks column was omitted; this information is now consolidated in the krm_notes file (which contains data for the Compiler's Remarks).
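The column correspondence can be written down as a simple lookup table. The following is a minimal sketch, not a migration tool provided by the project: the names `OLD_TO_NEW`, `DROPPED`, and `convert_row` are hypothetical, and the sketch only renames keys. It does not reproduce the March 2025 change to how tone marks are displayed, so its output should not be expected to match krm_main.tsv exactly.

```python
# Old (v1.1.347) column names mapped to their new (v1.2.5) names.
OLD_TO_NEW = {
    "KRID_n": "entry_id",
    "KRID_sn": "hanzi_id",
    "KRID": "kazama_location",
    "KR_Tenri_p": "tenri_location",
    "KR_vol_name": "volume_name",
    "KR_radical": "radical_name",
    "KR_vol_radical": "volume_radical_index",
    "Entry": "hanzi_entry",
    "Entry_original": "original_entry",
    "Def": "definition",
}

# Old columns that have no counterpart in the new file.
DROPPED = {"KR2ID", "Remarks"}


def convert_row(old_row):
    """Return a copy of old_row with renamed keys and dropped columns removed.

    Keys that appear in neither table are kept unchanged.
    """
    new_row = {}
    for key, value in old_row.items():
        if key in DROPPED:
            continue
        new_row[OLD_TO_NEW.get(key, key)] = value
    return new_row


# Placeholder values, used only to show the key renaming.
example = {"KRID_n": "F00001", "KR2ID": "x", "Def": "gloss", "Remarks": "note"}
print(convert_row(example))
```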

Description of each column

The content of each column (v1.2.5) is explained below.

| New column name (v1.2.5) | Explanation |
| --- | --- |
| entry_id | Headword entry ID: ‘F’ followed by a 5-digit number. Some added entries carry a ‘b’ suffix. |
| hanzi_id | Headword character (hanzi) ID: ‘S’ followed by a 5-digit number. Some added entries carry a ‘b’ suffix. |
| kazama_location | Location ID in the form K + volume (2 digits) + Kazama edition page (3 digits) + line (1 digit) + segment (1 digit) + character order (字順, jijun) (1 digit). The rules for assigning character order are defined separately. |
| tenri_location | Location ID in the form T + volume (a/b/c) + Tenri edition page (3 digits) + line (1 digit) + segment (1 digit) + character order (字順, jijun) (1 digit). The rules for assigning character order are defined separately. |
| volume_name | Name of the volume. There are 10 volumes: 仏上, 仏中, 仏下本, 仏下末, 法上, 法中, 法下, 僧上, 僧中, and 僧下. |
| radical_name | Name of the radical used to classify the characters. There are 120 radicals, from 人 to 雑. |
| volume_radical_index | Volume and radical number, from v1#1 (volume 1, radical 1) to v10#120 (volume 10, radical 120), indicating where the entry falls in the text (corresponding to 第1帖仏上 through 第10帖僧下). |
| hanzi_entry | Collated headword (校訂漢字). In principle it uses the Kangxi Dictionary form, including simplified character forms encoded in Unicode (common-use forms, popular variants). Characters not encoded in Unicode are represented as follows: (1) if the character can be composed from character components, it is entered as an IDS (Ideographic Description Sequence); (2) if a character or component is hard to represent with IDS or standard Unicode, a simplified notation based on the entity reference systems of CHISE and GlyphWiki is used (e.g., CDP-8C55, koseki-00001); (3) characters that none of these methods can represent, and characters illegible in the original (because of wormholes or other damage), are entered as ‘■’ (black square). Headwords consisting of multiple characters are separated by ‘/’ (full-width slash). The abbreviation symbol ‘\|’ is entered as ‘ー’ (long vowel mark), followed by the character it stands for in full-width parentheses (). |
| original_entry | Headword in the character form of the original manuscript. Scribal errors in the original are preserved. Characters outside Unicode follow the same rules as hanzi_entry. If no original-form headword is needed, ‘〇’ is entered. |
| definition | The definition (original glosses). It includes notes on character form, phonetic glosses, semantic glosses in Chinese, Japanese native readings (*wakun*), and other relevant information, separated by spaces. As a rule, characters are given in “Kangxi Dictionary style” forms. |