krm_pronunciations #

Overview and file formats #

The Phonetic Glosses in the Kanchi-in manuscript of the Ruiju Myōgishō (hereafter Myōgishō) include Fanqie spellings (反切), Similar sound notes (類音注, ruion-chū), and Kana glosses (仮名注, kana-chū). These are often accompanied by Tone marks (声点, shōten). As a database for Sino-Japanese pronunciations, the “Database of Historical Sino-Japanese Readings” (abbreviated as DHSJR), developed by Professor Katō Taitsuru and others, offers exceptionally rich content. Its specifications are also publicly available in detail. Accordingly, as part of the HDIC project, we have decided to release data in accordance with the DHSJR specifications.

The DHSJR defines a data structure with 23 column names.

To facilitate linkage with the Myōgishō data included in HDIC, it is necessary for HDIC to assign unique column names to its own data files and to establish Primary Keys and Foreign Keys for interoperability between HDIC’s internal data files.

For this purpose, pronunciation_id (音注ID) has been set as the Primary Key, and definition_seq_id (注文ID) as the Foreign Key.

Since the Myōgishō features diverse formats for its Phonetic Glosses, a classification field named annotation_format (音注型) has been established to categorize them.

While DHSJR uses Japanese column names, HDIC employs English ones. Therefore, for data processing convenience within HDIC, English column names have been adopted.

Column name comparison #

The current draft, with English and Japanese explanations side-by-side, is as follows. The Japanese explanations are those stipulated by DHSJR. The English explanations are formulated to facilitate correspondence with HDIC. This is a provisional measure until official English explanations are released by DHSJR.

HDIC’s original column names are indicated in bold.

DHSJR (Japanese)	HDIC (English)	Key	English Explanation	Japanese Explanation (from DHSJR)
ID	dhsjr_id		DHSJR unique ID for each single `Hanzi (Chinese character)` (integrated data only)	単字ごとのユニークID（統合データのみ）
音注ID	`pronunciation_id`	Primary Key	ID for each `Phonetic Gloss`. This is derived from `definition_seq_id` by extracting only those elements where the type (from `definition_type_name` in `krm_notes`) is `Phonetic Gloss`. Suffixes ‘b’, ‘c’, ’d’ are appended for variant forms.	音注ID。kr_definition_sequence_idから、注文の種類が音注のものだけを取り出したもの。変異形を追加したものには末尾にxを付した。 (User indicates ‘x’ is incorrect, and ‘b,c,d’ is correct for variants)
注文ID	`definition_seq_id`	Foreign Key	An identifier for each component of the `Definition (Original Glosses)` or for the `Headword` itself within an `Entry`. It is formed by appending a sequential suffix (e.g., “_00” for the `Headword`, “_01”, “_02” for subsequent elements) to the corresponding `entry_id`.	連番で与えられるFで始まる5桁の見出しの数値IDに加えて、見出しの下に記される注文の各要素を出現順に区分し、出現の順番に_01、_02のように追加したもの。見出しには_00を追加する。
資料番号	material_id		Material ID	資料ID
資料名	material_name		Name of the material	資料の名称
資料内漢字番号	material_character_index		Sequential number of a `Hanzi (Chinese character)`’s appearance in the material	漢字の資料内出現順の通し番号
資料内漢語番号	material_word_index		Sequential number of a Chinese word’s appearance in the material	漢語の資料内出現順の通し番号
単字_見出し	character_headword		`Headword` column for `Hanzi (Chinese characters)` with `Phonetic Glosses`	音注が付された漢字の見出し列
単字_出現形	character_form		`Hanzi (Chinese characters)` that have `Phonetic Glosses`	音注が付された漢字
漢語_見出し	word_headword		`Headword` column of Chinese words containing `Hanzi (Chinese characters)` with `Phonetic Glosses`	音注が付された漢字を含む漢語の見出し列
漢語_出現形	word_form		Chinese words containing `Hanzi (Chinese characters)` with `Phonetic Glosses`	音注が付された漢字を含む漢語
漢語_alphabet	word_alpha		Entered when there is an alphabetic representation of the Chinese word	欧文による漢語の表記がある場合に入力されている。
語種	word_type		Indicates the word type when there are mixed-language words (e.g., hybrid Sino-Japanese words)	混種語がある場合に、語種を示す。
漢語内位置	word_position		Position of the single `Hanzi (Chinese character)` within the Chinese word	漢語内での単字の位置
単字長	character_mora_count		Number of morae for the single `Hanzi (Chinese character)`	単字の拍数
声点	tone_marks		`Tone marks` for single `Hanzi (Chinese characters)`, indicating Four Tones (平上去入), Six Tones (平平軽上去入軽入), and voicing (清濁).	単字に対する四声（平上去入）、六声（平平軽上去入軽入）及び清濁。
声点型	tone_pattern		Combination of `Tone marks` for Chinese words. `Hanzi (Chinese characters)` without `Tone marks` are represented by a full-width asterisk (＊).	漢語に対する声点の組合せ。声点がない単字については＊で表す。
仮名注	kana_notes		`Kana glosses` (仮名注) for `Hanzi (Chinese characters)`, including kana-based fanqie.	仮名表記による字音注（仮名反切を含む）
仮名型	kana_pattern		Combination of `Kana glosses` for Chinese words. `Hanzi (Chinese characters)` without `Kana glosses` are represented by a full-width asterisk (＊).	漢語に対する仮名注の組合せ。仮名注がない単字については＊で表す。
反切	fanqie		`Fanqie spellings` (反切) for single `Hanzi (Chinese characters)`.	単字に対する反切注
類音	similar_sound		`Similar sound notes` (類音注) for single `Hanzi (Chinese characters)`.	単字に対する類音注
音注型	`annotation_format`		Pattern of combined phonetic information (e.g., `Kana glosses`, `Fanqie spellings`, `Similar sound notes`, `Tone marks`).	仮名注、反切、類音、声点などの複数の音注が組み合わさった形式のパターン。
節博士	fushi_hakase		`Fushi-hakase notations` (melodic or intonational markings) attached to musical materials such as Shōmyō (Buddhist chant).	声明等音楽資料に付される博士譜など
その他	other_phonetic_annotations		Other types of `Phonetic Glosses`.	その他の音注
出現位置	material_location		Location of single `Hanzi (Chinese characters)` and Chinese words within the material.	資料内の単字・漢語の所在
備考	remarks_pronunciation		Matters to be noted regarding these phonetic elements.	注記すべき事柄

The material_location is indicated in the format: K + Volume (2 digits) + Kazama Edition Page (3 digits) + Line (1 digit) + Segment (1 digit). For example, K0201474 indicates an appearance in Volume 2, Page 14, Line 7, Segment 4.

Currently, this is under consideration in the case study “Linkage with DHSJR,” which should also be consulted.