krm_main

Overview and file formats

This section describes the core files of the database for the Kanchi-in manuscript of the Ruiju Myōgishō (hereinafter “Myōgishō”).

Previously, the released file was a TSV file named KRM.tsv.

The file contains the Headwords, the full text of the Definition (Original Glosses), the volume, the radical, and the locations in the Kazama Shobō edition and the Tenri Central Library/Yōtokusha (Tenri Zenhon Sōsho) edition.

In March 2025, the specifications for column names and the display method for Tone marks (*shōten*) were updated. To make clear that the file follows the updated specifications, it was renamed krm_main.tsv. A JSON version of this file is also available.

Column name comparison

The correspondence between the old and new column names is as follows:

New Column Name (v1.2.5)	Old Column Name (v1.1.347)
entry_id	KRID_n
hanzi_id	KRID_sn
-	KR2ID
kazama_location	KRID
tenri_location	KR_Tenri_p
volume_name	KR_vol_name
radical_name	KR_radical
volume_radical_index	KR_vol_radical
hanzi_entry	Entry
original_entry	Entry_original
definition	Def
-	Remarks

The KR2ID column was omitted, and the kazama_location column was aligned with the KRID column.

The Remarks column was omitted; this information is now consolidated in the krm_notes file (which contains data for the Compiler's Remarks).
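The correspondence above can be sketched as a simple lookup table. This is a minimal illustration, not part of the released tooling: the names `OLD_TO_NEW` and `rename_row` are hypothetical, and the sketch only renames keys. It does not convert values, so any change in content between versions (such as the updated display of Tone marks) is not reproduced.

```python
# Old (v1.1.347) -> new (v1.2.5) column names, copied from the comparison table.
# Columns mapped to None (KR2ID, Remarks) have no counterpart in v1.2.5.
OLD_TO_NEW = {
    "KRID_n": "entry_id",
    "KRID_sn": "hanzi_id",
    "KR2ID": None,
    "KRID": "kazama_location",
    "KR_Tenri_p": "tenri_location",
    "KR_vol_name": "volume_name",
    "KR_radical": "radical_name",
    "KR_vol_radical": "volume_radical_index",
    "Entry": "hanzi_entry",
    "Entry_original": "original_entry",
    "Def": "definition",
    "Remarks": None,
}


def rename_row(old_row):
    """Return a copy of an old-format row keyed by the new column names.

    Omitted columns are dropped; unknown column names pass through unchanged.
    """
    new_row = {}
    for old_name, value in old_row.items():
        new_name = OLD_TO_NEW.get(old_name, old_name)
        if new_name is not None:
            new_row[new_name] = value
    return new_row
```

For example, a row read from the old file with `csv.DictReader(f, delimiter="\t")` could be passed to `rename_row` to obtain a dictionary keyed by the v1.2.5 names.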

Description of each column

The content of each column (v1.2.5) is explained below.

New Column Name (v1.2.5)	Explanation
entry_id	ID of the `Entry`: ‘F’ followed by a 5-digit number. For some added entries, a ‘b’ suffix is appended.
hanzi_id	ID of the `Hanzi (Chinese character)`: ‘S’ followed by a 5-digit number. For some added entries, a ‘b’ suffix is appended.
kazama_location	An ID indicating K + Volume (2 digits) + Kazama Edition Page (3 digits) + Line (1 digit) + Segment (1 digit) + Character order (字順, jijun) (1 digit). Details of the rules for assigning Character order are defined separately.
tenri_location	An ID indicating T + Volume (a/b/c) + Tenri Edition Page (3 digits) + Line (1 digit) + Segment (1 digit) + Character order (字順, jijun) (1 digit). Details of the rules for assigning Character order are defined separately.
volume_name	Name of the volume; one of 10 volumes: 仏上, 仏中, 仏下本, 仏下末, 法上, 法中, 法下, 僧上, 僧中, and 僧下.
radical_name	Name of the radical; one of 120 radicals, ranging from 人 to 雑, used to classify `Hanzi (Chinese characters)`.
volume_radical_index	Volume and radical number, ranging from v1#1 (Volume 1, Radical 1) to v10#120 (Volume 10, Radical 120), indicating the location of the `Entry` within the text (corresponding to 第1帖仏上 through 第10帖僧下).
hanzi_entry	The collated `Headword` (校訂漢字), given principally in Kangxi Dictionary form; simplified characters included in Unicode (common-use forms, popular variants) may also appear. Characters not included in Unicode are represented as follows. If a character can be composed from character components, it is entered as an IDS (Ideographic Description Sequence). If a character or one of its components is difficult to represent with IDS or standard Unicode, a simplified notation based on the entity reference systems of CHISE and GlyphWiki is used (e.g., CDP-8C55, koseki-00001). Characters that cannot be represented by any of these methods, and characters that are unreadable in the original text (owing to wormholes or other damage), are entered as ‘■’ (black square). `Headwords` consisting of multiple characters are separated by ‘／’ (full-width slash). The abbreviation symbol ‘｜’ is written as ‘ー’ (long vowel mark), with the corresponding character appended in full-width parentheses ().
original_entry	`Headword` based on the original character form. Typographical errors in the original are preserved. The representation of Chinese characters outside Unicode follows the rules for `hanzi_entry`. If the original-form `Headword` is not needed, ‘〇’ is used.
definition	The `Definition (Original Glosses)`. It includes `Notes on Character Form`, `Phonetic Glosses`, `Semantic Glosses in Chinese`, `Japanese Native Readings (wakun)`, and other relevant information, separated by spaces. As a general rule, character forms in the “Kangxi Dictionary style” are used.
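
The ID layouts described in the table can be checked and decomposed with regular expressions. The following is a minimal sketch that takes the column descriptions literally; all function names are hypothetical, the sample values in the usage note are invented for illustration, and actual rows may contain exceptions that these patterns do not cover (in which case the functions return `None`).

```python
import re

# Patterns follow the column descriptions literally:
#   kazama_location:      K + volume (2) + page (3) + line (1) + segment (1) + order (1)
#   tenri_location:       T + volume (a/b/c) + page (3) + line (1) + segment (1) + order (1)
#   volume_radical_index: v<volume>#<radical>, e.g. v1#1 .. v10#120
KAZAMA_RE = re.compile(r"K([0-9]{2})([0-9]{3})([0-9])([0-9])([0-9])")
TENRI_RE = re.compile(r"T([abc])([0-9]{3})([0-9])([0-9])([0-9])")
VOL_RADICAL_RE = re.compile(r"v([0-9]{1,2})#([0-9]{1,3})")


def _location_fields(match):
    """Turn the five captured groups into a dict (volume is kept as a string)."""
    volume, page, line, segment, order = match.groups()
    return {
        "volume": volume,
        "page": int(page),
        "line": int(line),
        "segment": int(segment),
        "char_order": int(order),
    }


def parse_kazama_location(value):
    """Split a kazama_location ID into its parts, or return None if it does not match."""
    m = KAZAMA_RE.fullmatch(value)
    if m is None:
        return None
    fields = _location_fields(m)
    fields["volume"] = int(fields["volume"])  # Kazama volume is numeric
    return fields


def parse_tenri_location(value):
    """Split a tenri_location ID into its parts, or return None if it does not match."""
    m = TENRI_RE.fullmatch(value)
    return _location_fields(m) if m else None


def parse_volume_radical_index(value):
    """Return (volume, radical) as integers, or None if the value does not match."""
    m = VOL_RADICAL_RE.fullmatch(value)
    return (int(m.group(1)), int(m.group(2))) if m else None


def split_headwords(hanzi_entry):
    """Split a multi-character Headword on the full-width slash separator."""
    return hanzi_entry.split("／")
```

With the invented value `"K01004321"`, `parse_kazama_location` yields volume 1, page 4, line 3, segment 2, and character order 1; `parse_volume_radical_index("v10#120")` yields `(10, 120)`. Returning `None` for non-matching values, rather than raising, makes it easy to list rows that depart from the documented layout.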