Attribution
Last updated: 2026-04-29
Tadorimichi separates its data into two layers: an originally compiled core covering the words and characters Chinese-speaking JLPT learners use most, and a JMdict / KANJIDIC2 fallback that catches long-tail words pasted into the reader. This page lays out exactly what comes from where, and under which license.
Originally compiled by Tadorimichi
The following content is © Mason AI Lab, all rights reserved:
- vocab_entries — 8,047-entry vocabulary core: Chinese / English / learner notes / examples / collocations / register / domain / 30+ metadata fields per entry, generated by Claude (Anthropic) from the Japanese surface forms and reviewed for quality. Entry ids are our own (tdrm-XXXXX), not inherited from any external source.
- kanji — 2,682-kanji core: Chinese / English meanings, phonetic group analysis, component decomposition, mnemonics, look-alikes, compound word selections, learner notes, all originally generated.
- grammar_points — 723 JLPT N1–N5 grammar points: pattern, usage rules, similar-pattern disambiguation, examples, all written from scratch.
- articles — JLPT-graded reading material: every article body, summary, vocabulary annotation, grammar annotation, and metadata is Tadorimichi original work.
- UI, design, code, learning algorithms, mock exam logic: all owned by Mason AI Lab.
Factual data layer
Some data classifications are objective grammatical or linguistic facts — not protected by copyright in any jurisdiction we operate in — and are compiled into our tables alongside the originally generated content:
- Kanji surface forms (the characters themselves), stroke counts, on/kun readings
- Kangxi radical assignments (a public-domain classical Chinese reference)
- Part-of-speech classifications (verb / adjective / noun, etc.)
- Word frequency rankings derived from public corpora (JPDB, Mainichi)
- JLPT level tags compiled from publicly available exam syllabi
JMdict / KANJIDIC2 fallback
For Japanese surface forms outside our 8,047-word core, the reader's hover lookup falls back to the JMdict (Japanese-Multilingual Dictionary) and KANJIDIC2 community projects. This covers long-tail words such as proper nouns, archaic forms, technical jargon, and words readers paste in from the wild.
- JMdict — © James William Breen and the Electronic Dictionary Research and Development Group, licensed under CC BY-SA 4.0. We import the dictionary in full and store the long-tail subset (208,286 entries not in our core) in our fallback tables.
- KANJIDIC2 — © EDRDG, also licensed under CC BY-SA 4.0. Used for kanji factual data (strokes, radical, on/kun readings, JLPT tier).
Modifications to the imported data: only the long-tail subset is retained (entries already covered by our compiled core are not duplicated). No other transformations are applied to JMdict / KANJIDIC2 records as stored.
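The two-layer behavior described in this section can be illustrated with a minimal Python sketch. It is not our production code: the `Entry` type, the `build_fallback` and `lookup` helpers, and the sample records are all hypothetical, and it assumes "already covered by our core" means an exact match on the surface form, with one record per surface form.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Entry:
    surface: str  # Japanese surface form, used as the lookup key (assumption)
    gloss: str    # stand-in for whatever the real record holds
    source: str   # "core" or "jmdict"


def build_fallback(jmdict: Iterable[Entry], core: dict[str, Entry]) -> dict[str, Entry]:
    """Keep only imported entries whose surface form is absent from the core."""
    return {e.surface: e for e in jmdict if e.surface not in core}


def lookup(surface: str, core: dict[str, Entry], fallback: dict[str, Entry]) -> Optional[Entry]:
    """Try the compiled core first, then the fallback; None if neither has it."""
    hit = core.get(surface)
    return hit if hit is not None else fallback.get(surface)


# Hypothetical sample data, for illustration only.
core = {"食べる": Entry("食べる", "to eat (compiled entry)", "core")}
jmdict = [
    Entry("食べる", "to eat", "jmdict"),
    Entry("薩摩芋", "sweet potato", "jmdict"),
]
fallback = build_fallback(jmdict, core)
```

With this data, `lookup("食べる", core, fallback)` returns the core entry, `lookup("薩摩芋", core, fallback)` returns the imported one, and the duplicate 食べる record never reaches the fallback table.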
Other community resources
- JLPT level tagging — derived from yomitan-jlpt-vocab (community-maintained, attribution preserved).
- Word frequency — JPDB Mainichi-derived corpus.
- Tokenization — kuromoji.js (Apache 2.0).
License of derived data
Per the share-alike obligation in CC BY-SA 4.0, any redistribution of the JMdict / KANJIDIC2 long-tail subset stored in our jmdict_fallback* tables is licensed under the same CC BY-SA 4.0 terms. Our originally compiled vocab_entries, kanji, grammar_points, and articles tables are not redistributed and remain proprietary.
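As a rough illustration of the split above, the table-to-license mapping could be expressed as follows. This is a sketch under stated assumptions: the `license_for_table` helper and the example table name `jmdict_fallback_entries` are hypothetical, and only the `jmdict_fallback*` prefix and the four proprietary table names come from this page.

```python
from fnmatch import fnmatchcase

CC_BY_SA = "CC BY-SA 4.0"
PROPRIETARY = "proprietary (not redistributed)"

# Originally compiled tables named on this page.
PROPRIETARY_TABLES = {"vocab_entries", "kanji", "grammar_points", "articles"}


def license_for_table(table: str) -> str:
    """Return the license label that applies to a table name."""
    # Anything under the jmdict_fallback* prefix inherits share-alike terms.
    if fnmatchcase(table, "jmdict_fallback*"):
        return CC_BY_SA
    if table in PROPRIETARY_TABLES:
        return PROPRIETARY
    # Refuse to guess for tables this page does not mention.
    raise KeyError(f"no license recorded for table {table!r}")
```

Raising on unknown names, instead of defaulting to one license, keeps an unlisted table from being silently treated as redistributable.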
Contact
Questions about attribution, licensing, or commercial use of our originally compiled content: [email protected].