©2001-2008 The CJK Dictionary Institute, Inc.

Index to This Document
  1. Overview
  2. Derivational Affixes
  3. Adjacency Attributes
  4. Binding Valency
  5. Morphological Status

1. Overview

The CJK Dictionary Institute (CJKI), which specializes in CJK computational lexicography, is engaged in the continuous expansion of comprehensive CJK lexical databases. Currently, our databases contain nearly eight million entries, including a variety of grammatical and semantic attributes required for developing information retrieval applications, input method editors, named entity extraction, and electronic dictionaries.

This document describes some of the morphological attributes in our Japanese lexical databases, such as derivational affixes and binding valency, which are particularly useful for disambiguating and identifying Japanese lexemes in such applications as input method editors (IME) and search engine query processing. Information on our rich set of grammatical attributes, such as parts of speech (POS) and conjugation pattern codes, can be found at jappos.htm, and our extensive frequency statistics database is described at japfreq.htm.

We also maintain a comprehensive database of Simplified Chinese and Traditional Chinese morphological attributes and other data described at chinsam.htm.

2. Derivational Affixes

  1. What is a Derivational Affix?

    A derivational affix (DA) is a word element (bound morpheme) that is prefixed or suffixed to a base or stem to create new words, not merely different forms of the same word. Strictly speaking, especially in traditional morphology, derivational affixes do not have lexical meaning of their own, and only add grammatical meanings such as negation and the formation of new parts of speech. Here, we use the term to include compound-forming word elements that have a substantial lexical meaning of their own, such as 人 jin in アメリカ人 amerikajin 'American'. See "A Brief Introduction to Japanese Morphology" for a fuller discussion.

  2. The CJKI Database of Japanese Affixes

    Jack Halpern's New Japanese English Character Dictionary (NJECD) provides an in-depth treatment of both on (Chinese-derived) and kun (native Japanese) derivational affixes, as explained in detail on pages 72a to 78a of the front matter to that dictionary. During a period of 16 years, we have made a systematic effort to collect these and provide accurate and comprehensive coverage.

    Currently our database of derivational and inflectional suffixes contains about 2700 entries. Samples of these appear in subsection 4, "Examples of Derivational Affixes", below. To get a fuller understanding, please look up these affixes in NJECD by referring to the entry numbers, where you will find many compounds and see how these affixes actually work in forming compounds.

  3. Benefits of Derivational Affixes

    The most important benefit is that derivational affixes can significantly improve the accuracy of identifying the countless lexemes that an author can coin freely. That is, they allow an application to algorithmically construct lexemes not present in the lexicon, like 食べ始める tabehajimeru 'start to eat' from 食べる taberu and the derivational suffix -始める -hajimeru 'start to do'.
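
    The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CJKI implementation: the names `Suffix`, `CONTINUATIVE_STEMS`, `SUFFIXES` and `analyze` are hypothetical, the verb 読む is added purely as a second example, and continuative stems are listed by hand rather than derived from conjugation pattern codes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Suffix:
    surface: str  # written form, e.g. "始める"
    reading: str  # romanized reading, e.g. "hajimeru"
    gloss: str    # English gloss


# Hypothetical mini-lexicon mapping a dictionary form to its continuative
# stem. A real system would derive stems from conjugation pattern codes.
CONTINUATIVE_STEMS = {
    "食べる": "食べ",
    "読む": "読み",
}

# One derivational suffix, taken from the example in the text.
SUFFIXES = [Suffix("始める", "hajimeru", "start to do")]


def analyze(token: str) -> Optional[Tuple[str, Suffix]]:
    """Return (base verb, suffix) if token is a known stem plus a known suffix."""
    for suffix in SUFFIXES:
        if not token.endswith(suffix.surface):
            continue
        stem = token[: -len(suffix.surface)]
        for base, base_stem in CONTINUATIVE_STEMS.items():
            if stem == base_stem:
                return base, suffix
    return None


if __name__ == "__main__":
    print(analyze("食べ始める"))  # recognized although not listed as a lexeme
    print(analyze("食べる"))      # plain verb, no derivational suffix found
```

    The lexicon only needs the base verbs and the suffix list; compounds such as 食べ始める are recognized at analysis time instead of being stored as separate entries.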

    It is important to note that even though derivational affixes are often highly productive, ordinary word dictionaries, which focus on free lexemes, normally do not include them because they are bound morphemes.

  4. Examples of Derivational Affixes

    Some Kun Derivational Suffixes
    Affix NJECD Entry Number Reading Kind of Affix English
    0006 kawa suffix names of rivers
    小- 0007 ko- also prefix little, small
    小- 0007 sa- also prefix little, nice
    水- 0010 mizu- prefix water
    0011 kokoro also suffix heart, mind, spirit, soul; thoughts, ideas
    -切る 0027 -kiru verbal suffix  
    -切り 0027 -giri also suffix way of cutting
    -切れる 0027 -kireru verbal suffix be able to do, be able to finish
    -切れ 0027 -gire also suffix  
    -代わり 0030 -gawari suffix substitute, replacement
    -付ける 0031 -zukeru verbal suffix give, impart, provide with

    Some On Derivational Suffixes
    Affix Entry Number Reading Kind Of Affix English
    0007 shoo- also prefix small, little, minor, tiny
    0007 -shoo suffix names of elementary schools
    0010 -sui also suffix water, liquid, fluid; soda
    0011 -shin also suffix sense, motive
    0011 -shin also suffix heart (the organ)
    0014 kyuu- also prefix former, ex-, old-time, old
    0018 -jun also suffix order, sequence, turn
    0019 -butsu also suffix Buddhist image
    0021 -ka also suffix -ize, -ify
    0026 -hi also suffix ratio
    0026 -hi also prefix specific
    0030 -dai suffix range of a person's age in ten-year periods
    0030 -dai suffix years spanning a specific period
    0030 -dai also suffix charge, fare, fee, price

    Some Derivational Affixes by Rank
    Affix Rank POS Sub-POS Reading
    000026 WS ねん
    000028 WS にち
    000037 WP えん
    000037 WS えん
    000044 WS まえ
    000067 WP だい
    始める 000068 WS はじめる
    うち 000086 WP うち
    000096 WS しゃ
    000107 WP いま
    続ける 000155 WS つづける
    出す 000162 WS だす
    000179 WS はなし
    上がる 000185 WS あがる
    000196 WS こえ
    000204 WS とう
    上げる 000239 WS あげる
    000247 WS ほん
    000248 WS れい
    キロ 000250 WP きろ

    The table below shows some examples of Japanese inflectional affixes. Inflection is explained in "A Brief Introduction to Japanese Morphology" and the POS codes are defined on the Japanese Part of Speech Codes page.

    Some Inflectional Affixes by Rank
    Affix Rank POS Sub-POS Reading

3. Adjacency Attributes

    An adjacency attribute is a part of speech (POS) code that indicates the morphological restrictions that apply to adjacent words or word elements when these are actually used in context in the formation of compound words or affixed lexemes. There are three types of adjacency attributes, as shown in the table below:

    Adjacency Attribute Codes
    BEFORE An adjacency attribute that indicates the part of speech (POS) of the lexeme, stem or base preceding a suffix or suffix-like element. For example, "NX" for the compounding suffix 員 means that 員 can be preceded by a common noun or verbal noun, as in 研究員. Given only for suffixes.
    AFTER An adjacency attribute that indicates the part of speech (POS) of the lexeme following a prefix or prefix-like element. For example, "NC" for the adnominal prefix 元 means that 元 can be followed by a common noun, as in 元総理大臣. Given only for prefixes.
    RESULT The part of speech (POS) of the lexeme resulting from affixing a prefix or suffix. For example, "NC" for the adnominal prefix 元 means that prefixing 元 (to a common noun) results in a common noun (元総理大臣). Given only for affixes.
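
    As a rough illustration of how an application might apply these three attributes, the following Python sketch checks whether an affix may attach to a base and reports the POS of the result. The `Affix` record, `accepts` and `attach` are hypothetical names, not part of the CJKI data format. Treating 研究 as a verbal noun (VN) is an assumption made for the example, and the RESULT of 員 is left unset because the text above does not give it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# NX is described in the text as accepting a common noun (NC) or a verbal
# noun (VN); every other code is assumed here to match only itself.
CODE_EXPANSIONS = {"NX": frozenset({"NC", "VN"})}


@dataclass(frozen=True)
class Affix:
    surface: str
    before: Optional[str] = None  # BEFORE code, set for suffixes only
    after: Optional[str] = None   # AFTER code, set for prefixes only
    result: Optional[str] = None  # RESULT code, None when not known


def accepts(code: str, pos: str) -> bool:
    """True if an adjacency code allows a neighbour with the given POS."""
    return pos in CODE_EXPANSIONS.get(code, frozenset({code}))


def attach(affix: Affix, base: str, base_pos: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (derived surface, result POS), or None if adjacency forbids it."""
    if affix.before is not None and accepts(affix.before, base_pos):
        return base + affix.surface, affix.result  # suffix follows the base
    if affix.after is not None and accepts(affix.after, base_pos):
        return affix.surface + base, affix.result  # prefix precedes the base
    return None


SUFFIX_IN = Affix("員", before="NX")                 # RESULT not stated in the text
PREFIX_MOTO = Affix("元", after="NC", result="NC")

if __name__ == "__main__":
    print(attach(PREFIX_MOTO, "総理大臣", "NC"))
    print(attach(SUFFIX_IN, "研究", "VN"))
    print(attach(PREFIX_MOTO, "研究", "VN"))
```

    Under these assumptions the first two calls succeed, while the third is rejected because the AFTER code of 元 lists only NC.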

    The table below describes the POS (part of speech) codes used exclusively in the BEFORE and AFTER adjacency attributes. For other part of speech codes, see jappos.htm.

    BEFORE and AFTER Adjacency POS Codes
    (table under construction)
    POS SubPOS English Description Japanese Description Notes Example Binding Valency
    NX   Noun class 名詞及び「する」名詞連節 Same as NC and VN.   0
    VC   Continuative 連用形     1
    SV   Verb stem        
    SA   Noun adjective stem        
    SJ   Adjective stem        
    SN   Noun stem        

    The table below shows the adjacency attribute codes for some Japanese derivational affixes. The POS codes are explained in jappos.htm.

    Sample of Adjacency Attributes
    Affix Reading POS Sub-POS Valency Rank Before After Result
    がましげ がましげ FS M 1 061089 VC   AN
    がましさ がましさ WS   1 061089 VC   NC
    がらみ がらみ WS   1 061089 NC   NC
    がわり がわり WS   1 061089 NC   VN
    慣れる なれる WS   1 002465 VC   V1
    慣わす ならわす WS   1 061089 VC   V5
    WS   1 061089 NC   NC
    うまれ WS   1 061089 NC NP   NC
    WP   1 061089   NC NC
    せい WS   1 003721 NC   NC
    なま WP   1 010656   NC NC
    生まれ うまれ WS   1 002465 NC NP   NC
    ぎれ WS   1 061089 NC   NC
    切っての きっての WS   1 061089 NC NP   AA
    切る きる WS   1 001494 VC   V5
    切れ ぎれ WS   1 061089 NC   NC
    切れる きれる WS   1 002247 VC   V1
    さき WS   1 000491 VC VN   NC
    先々 せんせん FP P 0 061089   NC NC
    先先 せんせん FP P 0 061089   NC NC
    せん WS   1 061089 NC   NC
    せん WS   1 019135 NX   NC
    せん WS   1 061089 NC   NC
    ぞめ WS   1 061089 NC NP   NC
    染みる じみる WS   1 061089 NC   V1
    染め ぞめ WS   1 061089 NC   VN
    染める しめる WS   1 061089 VC   V1
    せん WS   1 002652 NC   NC
    選び えらび WS   1 041445 NC   VN
    ぜん FP P 0 005835   NC NC
    ぜん FS S 0 005835 NX   NC
    まえ FS S 0 000044 NX   NC
    前々 ぜんぜん FP P 0 061089   NC NC
    前前 ぜんぜん FP P 0 061089   NC NC
    文応 ぶんおう NE   0 061089   NN NC
    文化 ぶんか NE   0 000536   NN NC
    へい WS   1 002150 NX   NC
    だいら WS   1 061089 NP   NP
    ひら WP   1 061089   NC NC
    平成 へいせい NE   0 061089   NN NC
    べつ FS S 0 000331 NC   NC
    べつ WP   1 000331   NC VC NC
    かた WP   1 028538   NC V NC
    へん WS   1 025149 NC   NC
    ぺん WS N 1 061089 NN   NC
    へん WS   1 008970 NC NP   NC
    編み あみ WS   1 061089 NX   NC
    へん NC   0 008112 NC NP   NC
    WS   1 061089 NC   NC
    返す かえす WS   1 002476 VC   V5
    返る かえる WS   1 061089 VC   V5
    便 びん WS   1 003030 NC NN   NC

4. Binding Valency

    The binding valency code indicates the degree of binding between a stem/lexeme and an affix. It enables an application such as a morphological analyzer or IME system to determine whether a given element is bound or free, aiding in the accurate identification of lexemes not registered in the lexicon.

    Binding Valency Codes
    (table under construction)
    Binding Valency English Description Japanese Description Notes
    0 Free form    
    1 Always bound    
    2 Sometimes bound, sometimes free?    
    D Binding valency default    
    U Binding valency unknown    
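
    A morphological analyzer could consult these codes roughly as in the Python sketch below. The function names `can_stand_alone` and `can_attach` are hypothetical. Because the table is still under construction and does not define how D and U should be handled, the sketch assumes they place no restriction on the element.

```python
# Codes from the Binding Valency Codes table.
FREE_OK = {"0", "2"}       # free form, or sometimes free
BOUND_OK = {"1", "2"}      # always bound, or sometimes bound
UNDETERMINED = {"D", "U"}  # default / unknown: assumed to impose no restriction


def _check(valency: str) -> None:
    if valency not in FREE_OK | BOUND_OK | UNDETERMINED:
        raise ValueError(f"unrecognized binding valency code: {valency!r}")


def can_stand_alone(valency: str) -> bool:
    """True if an element with this code may appear as an independent word."""
    _check(valency)
    return valency in FREE_OK or valency in UNDETERMINED


def can_attach(valency: str) -> bool:
    """True if an element with this code may be bound to a stem or lexeme."""
    _check(valency)
    return valency in BOUND_OK or valency in UNDETERMINED


if __name__ == "__main__":
    for code in ("0", "1", "2", "D", "U"):
        print(code, can_stand_alone(code), can_attach(code))
```

    With these predicates, a segmenter could discard any candidate analysis in which an element coded 1 stands alone, or in which an element coded 0 is treated as an affix.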

5. Morphological Status

    The morphological status prefix is a broad category for classifying Japanese affixes, and is used as the first letter in the POS codes.

    Morphological Status Prefixes
    Morph English Description Japanese Description Binding Valency
    E Japanese phrase 0
    F Non-derivational affix 非派生接辞 1
    H Honorific affix 待遇接辞 1
    N Non-affix (free word) 非接辞 0
    W Derivational or compounding affix 派生接辞 1
    X Generic affix 接辞 1
    M Word element (bound morpheme) 造語成分 1
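
    Since the status prefix is the first letter of a POS code, it can be read off with a simple lookup, as in the Python sketch below. `MORPH_STATUS` and `morph_status` are hypothetical names. The valency stored with each prefix is assumed to be a default only, since some entries in the Sample of Adjacency Attributes table carry a different per-entry value. Letters not listed in the table return `None`.

```python
from typing import Optional, Tuple

# (English description, binding valency) keyed by morphological status prefix,
# transcribed from the Morphological Status Prefixes table.
MORPH_STATUS = {
    "E": ("Japanese phrase", "0"),
    "F": ("Non-derivational affix", "1"),
    "H": ("Honorific affix", "1"),
    "N": ("Non-affix (free word)", "0"),
    "W": ("Derivational or compounding affix", "1"),
    "X": ("Generic affix", "1"),
    "M": ("Word element (bound morpheme)", "1"),
}


def morph_status(pos_code: str) -> Optional[Tuple[str, str]]:
    """Look up the status of a POS code by its first letter, or return None."""
    if not pos_code:
        return None
    return MORPH_STATUS.get(pos_code[0])


if __name__ == "__main__":
    for code in ("WS", "WP", "NC", "V5"):
        print(code, morph_status(code))
```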