Multilingual Database of Proper Nouns
We maintain the world's largest databases of CJK proper nouns, with over ten million entries. They are used by some of the world's major IT companies for a wide variety of applications, such as named entity recognition (NER), machine translation (MT), information retrieval (IR) and input method editors (IMEs). This edition, the Multilingual Database of Proper Nouns (CJKE-DPN), currently contains about 150,000 entries (including variants) covering the most common CJK and Western personal names and surnames. It brings together five languages, Simplified Chinese (SC), Traditional Chinese (TC), Japanese, Korean and English, in a multidirectional format, and has been expanded to include Arabic (see Table 2) and Spanish (not shown here).
The database includes various data fields, only some of which appear in the samples below: readings in pinyin, zhuyin fuhao (註音符號), hiragana and several romanization systems; semantic classification codes; locale codes; and frequency rankings and statistics.
Editorial Policy
It is important to note that the TC names are not merely code-converted equivalents of the SC names, but are accurate on both the orthographic and the lexemic levels. An orthographic difference is comparable to American 'color' vs. British 'colour', whereas a lexemic difference is comparable to American 'gas' vs. British 'petrol'. For example, New Zealand in SC is 新西兰 Xīnxīlán but in TC it is 紐西蘭 Niǔxīlán.
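The difference between code conversion and lexemic conversion can be illustrated with a minimal sketch. The character map below is a tiny hand-made stand-in, not the database's actual conversion data, and `code_convert` and `lexemic_convert` are hypothetical helper names; the name pairs are copied from the sample place-name table.

```python
# Toy character-level SC -> TC map (an assumption for illustration only).
CHAR_MAP = {"兰": "蘭", "门": "門", "尔": "爾", "爱": "愛"}

# Name-level SC -> TC pairs taken from the sample place-name table.
NAME_MAP = {
    "新西兰": "紐西蘭",  # New Zealand: lexemic difference
    "也门": "葉門",      # Yemen: lexemic difference
    "爱尔兰": "愛爾蘭",  # Ireland: orthographic difference only
}


def code_convert(sc: str) -> str:
    """Convert character by character, ignoring which name is meant."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in sc)


def lexemic_convert(sc: str) -> str:
    """Prefer a whole-name lookup; fall back to character conversion."""
    return NAME_MAP.get(sc, code_convert(sc))


for sc in NAME_MAP:
    print(sc, code_convert(sc), lexemic_convert(sc))
```

For 新西兰, character conversion yields 新西蘭, while the name-level lookup returns the form actually used in TC, 紐西蘭. For 爱尔兰 both approaches agree, since the difference there is purely orthographic.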
This database is kept constantly up to date and includes recent changes and additions to proper names, such as the late-2005 change of the Chinese name for Seoul (서울) from 汉城 (Hànchéng) to 首尔 (Shǒu'ěr).
A unique feature of this database is that we distinguish between SC and TC readings. Thus the pinyin for SC 期荣 is qīróng, but for TC 期榮 it is qíróng. For details, see Taiwan and PRC Pinyin differences.
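The sample tables store readings as numbered pinyin (for example `qi1-rong2`), while the prose uses tone marks. The following is a minimal sketch of one way to convert between the two, assuming hyphen-separated syllables with a trailing tone digit; `mark_syllable` and `to_marked` are hypothetical names, and the tone-placement rule is the usual textbook one (mark `a` or `e` if present, the `o` of `ou`, otherwise the last vowel).

```python
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syl: str) -> str:
    """Convert one numbered syllable such as 'rong2' to 'róng'."""
    syl = syl.replace("v", "ü")  # 'v' is a common ASCII stand-in for 'ü'
    if not syl or not syl[-1].isdigit():
        return syl  # no tone digit: leave unchanged
    base, tone = syl[:-1], int(syl[-1])
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone carries no mark
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        idx = max(base.rfind(v) for v in "iouü")  # last vowel
    if idx < 0:
        return base  # no vowel found
    return base[:idx] + TONE_MARKS[base[idx]][tone - 1] + base[idx + 1:]


def to_marked(numbered: str) -> str:
    """Convert 'qi1-rong2' style readings to tone-marked pinyin."""
    out = ""
    for syl in numbered.split("-"):
        # Standard spelling puts an apostrophe before a non-initial
        # syllable that starts with a, o or e.
        if out and syl.startswith(("a", "o", "e")):
            out += "'"
        out += mark_syllable(syl)
    return out


print(to_marked("qi1-rong2"))  # SC reading of 期荣
print(to_marked("qi2-rong2"))  # TC reading of 期榮
```

With the readings from the sample row for 期荣/期榮, the two calls produce qīróng and qíróng respectively, which makes the SC/TC tone difference easy to spot.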
Data Fields
CJKE Multilingual Database of Place Names
ID | TYPE | English | Japanese | SC | TC | Korean | LO | YOMI | SC_PIN | TC_PIN | MOE |
---|---|---|---|---|---|---|---|---|---|---|---|
N002657 | P | Aruba | アルーバ | 阿鲁巴 | 阿盧巴 | 아루바섬 | L | あるーば | a1-lu3-ba1 | a1-lu2-ba1 | a-ru-pa-so~m |
N001635 | P | Azerbaijan | アゼルバイジャン | 阿塞拜疆 | 亞塞拜然 | 아제르바이잔 | L | あぜるばいじゃん | a1-sai1-bai4-jiang1 | ya4-se4-bai4-ran2 | a-che-ru~-pa-i-chan |
N081006 | P | Brasilia | ブラジリア | 巴西利亚 | 巴西利亞 | 브라질리아 | O | ぶらじりあ | ba1-xi1-li4-ya4 | ba1-xi1-li4-ya4 | pu~-ra-chil-ri-a |
N016658 | P | Caracas | カラカス | 加拉加斯 | 卡拉卡斯 | 카라카스 | L | からかす | jia1-la1-jia1-si1 | ka3-la1-ka3-si1 | k'a-ra-k'a-su~ |
N014214 | P | Cairo | カイロ | 开罗 | 開羅 | 카이로 | O | かいろ | kai1-luo2 | kai1-luo2 | k'a-i-ro |
N017653 | P | Canton | 広東 | 广东 | 廣東 | 광둥 | O | かんとん | guang3-dong1 | guang3-dong1 | kwang-tung |
N058842 | SP | Chad | チャド | 乍得 | 查德 | 차드 | L | ちゃど | zha4-de2 | cha2-de2 | ch'a-tu~ |
N047517 | GPu | Georgia | ジョージア | 乔治亚 | 喬治亞 | 조지아 | O | じょーじあ | qiao2-zhi4-ya4 | qiao2-zhi4-ya4 | cho-chi-a |
N023778 | P | Guinea | ギニア | 几内亚 | 幾內亞 | 기니 | O | ぎにあ | ji3-nei4-ya4 | ji3-nei4-ya4 | ki-ni |
N078960 | SP | Fukuoka | 福岡 | 福冈 | 福岡 | 후쿠오카 | O | ふくおか | fu2-gang1 | fu2-gang1 | hu-k'u-o-k'a |
N000617 | P | Ireland | アイルランド | 爱尔兰 | 愛爾蘭 | 아일랜드 | O | あいるらんど | ai4-er3-lan2 | ai4-er3-lan2 | a-il-raen-tu~ |
N068134 | P | New Zealand | ニュージーランド | 新西兰 | 紐西蘭 | 뉴질랜드 | L | にゅーじーらんど | xin1-xi1-lan2 | niu3-xi1-lan2 | nyu-chil-raen-tu~ |
N36301 | P | Seoul | ソウル | 首尔 | 首爾 | 서울 | O | そうる | shou3-er3 | shou3-er3 | so~-ul |
N054474 | P | Seoul | ソウル | 汉城 | 漢城 | 서울 | O | そうる | han4-cheng2 | han4-cheng2 | so~-ul |
N062125 | P | Tel Aviv | テルアビブ | 特拉维夫 | 特拉維夫 | 텔아비브 | O | てるあびぶ | te4-la1-wei2-fu1 | te4-la1-wei2-fu1 | t'el-a-pi-pu~ |
N004005 | P | Yemen | イエメン | 也门 | 葉門 | 예멘 | L | いえめん | ye3-men2 | ye4-men2 | ye-men |
N100468 | P | Weishan | 微山 | 微山 | 微山 | 웨이산 | O | びざん | wei1-shan1 | wei2-shan1 | we-i-san |
N080687 | P | Wuhan | 武漢 | 武汉 | 武漢 | 우한 | O | ぶかん | wu3-han4 | wu3-han4 | u-han |
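
Rows in this layout are straightforward to load into records. The sketch below assumes the pipe-delimited form shown in the sample (the delivered data format may differ) and uses the column names from the table header; `parse_row` is a hypothetical helper. It then lists the entries whose SC and TC pinyin readings differ.

```python
FIELDS = ["ID", "TYPE", "English", "Japanese", "SC", "TC",
          "Korean", "LO", "YOMI", "SC_PIN", "TC_PIN", "MOE"]

# Three rows copied from the sample place-name table.
SAMPLE_ROWS = [
    "N068134 | P | New Zealand | ニュージーランド | 新西兰 | 紐西蘭 | 뉴질랜드 | L | にゅーじーらんど | xin1-xi1-lan2 | niu3-xi1-lan2 | nyu-chil-raen-tu~ |",
    "N014214 | P | Cairo | カイロ | 开罗 | 開羅 | 카이로 | O | かいろ | kai1-luo2 | kai1-luo2 | k'a-i-ro |",
    "N100468 | P | Weishan | 微山 | 微山 | 微山 | 웨이산 | O | びざん | wei1-shan1 | wei2-shan1 | we-i-san |",
]


def parse_row(line: str) -> dict:
    """Split one pipe-delimited row into a field-name -> value dict."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} cells, got {len(cells)}")
    return dict(zip(FIELDS, cells))


records = [parse_row(row) for row in SAMPLE_ROWS]

# Entries whose Mainland and Taiwan readings are not identical.
reading_differs = [r["English"] for r in records
                   if r["SC_PIN"] != r["TC_PIN"]]
print(reading_differs)
```

On these three rows the filter picks out New Zealand, where the names themselves differ, and Weishan, where the characters are identical and only the tone of 微 differs.
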
CJKA Multilingual Database of Place Names
English | Japanese | SC | LO | TC | Korean | Arabic |
---|---|---|---|---|---|---|
Aruba | アルーバ | 阿鲁巴 | L | 阿盧巴 | 아루바섬 | أروبا |
Brasilia | ブラジリア | 巴西利亚 | O | 巴西利亞 | 브라질리아 | برازيليا |
Caracas | カラカス | 加拉加斯 | L | 卡拉卡斯 | 카라카스 | كراكاس |
Cairo | カイロ | 开罗 | O | 開羅 | 카이로 | القاهرة |
Chad | チャド | 乍得 | L | 查德 | 차드 | تشاد |
Georgia | ジョージア | 乔治亚 | O | 喬治亞 | 조지아 | جورجيا |
Ireland | アイルランド | 爱尔兰 | O | 愛爾蘭 | 아일랜드 | آيرلندا |
Seoul | ソウル | 首尔 | O | 首爾 | 서울 | سيول |
Seoul | ソウル | 汉城 | O | 漢城 | 서울 | سيول |
Tel Aviv | テルアビブ | 特拉维夫 | O | 特拉維夫 | 텔아비브 | تل أبيب |
Yemen | イエメン | 也门 | L | 葉門 | 예멘 | اليمن |
CJKE Multilingual Database of Personal Names
ID | TYPE | English | Japanese | SC | TC | Korean | LO | YOMI | SC_PIN | TC_PIN | MOE |
---|---|---|---|---|---|---|---|---|---|---|---|
N000034 | S | Abba | アッバ | 阿巴 | 亞伯 | 아바 | L | あっば | a1-ba1 | ya4-bo2 | a-pa |
N000035 | S | Abbas | アッバース | 阿巴斯 | 阿巴斯 | 아바스 | O | あっばーす | a1-ba1-si1 | a1-ba1-si1 | a-pa-su~ |
N002982 | G | Alberto | アルベルト | 阿尔韦托 | 阿爾韋托 | 알베르토 | O | あるべると | a1-er3-wei2-tuo1 | a1-er3-wei2-tuo1 | al-pe-ru~-t'o |
N0386171 | G | Qirong | 期栄 | 期荣 | 期榮 | 치룽 | O | きえい | qi1-rong2 | qi2-rong2 | ch'i-rung |
N000871 | F | Akiko | 暁子 | 晓子 | 曉子 | 아키코 | O | あきこ | xiao3-zi3 | xiao3-zi3 | a-k'i-k'o |
N000872 | F | Akiko | 顕子 | 显子 | 顯子 | 아키코 | O | あきこ | xian3-zi3 | xian3-zi3 | a-k'i-k'o |
N000873 | F | Akiko | 昭子 | 昭子 | 昭子 | 아키코 | O | あきこ | zhao1-zi3 | zhao1-zi3 | a-k'i-k'o |
N001161 | FM | Akira | 明 | 明 | 明 | 아키라 | O | あきら | ming2 | ming2 | a-k'i-ra |
C139707 | G | Deng | 登 | 登 | 登 | 덩 | O | とう | deng1 | deng1 | to~ng |
N000629 | S | Einstein | アインスタイン | 爱因斯坦 | 愛因斯坦 | 아인슈타인 | O | あいんすたいん | ai4-yin1-si1-tan3 | ai4-yin1-si1-tan3 | a-in-syu-t'a-in |
N000134 | G | Ernest | アーネスト | 欧内斯特 | 歐尼斯特 | 어니스트 | L | あーねすと | ou1-nei4-si1-te4 | ou1-ni2-si1-te4 | o~-ni-su~-t'u~ |
N026074 | S | Gregg | グレッグ | 格雷格 | 葛瑞格 | 그레그 | L | ぐれっぐ | ge2-lei2-ge2 | ge3-rui4-ge2 | ku~-re-ku~ |
N026075 | G | Greg | グレッグ | 格雷格 | 葛瑞格 | 그레그 | L | ぐれっぐ | ge2-lei2-ge2 | ge3-rui4-ge2 | ku~-re-ku~ |
N014143 | G | Haiyang | 海洋 | 海洋 | 海洋 | 하이양 | O | かいよう | hai3-yang2 | hai3-yang2 | ha-i-yang |
N014144 | G | Huaiyang | 懐陽 | 怀阳 | 懷陽 | 화이양 | O | かいよう | huai2-yang2 | huai2-yang2 | hwa-i-yang |
N046125 | G | Jack | ジャック | 杰克 | 傑克 | 잭 | O | じゃっく | jie2-ke4 | jie2-ke4 | chaek |
N046119 | G | Jackie | ジャッキー | 杰基 | 傑基 | 재키 | O | じゃっきー | jie2-ji1 | jie2-ji1 | chae-k'i |
N028385 | S | Kennedy | ケネディ | 肯尼迪 | 甘迺迪 | 케네디 | L | けねでぃ | ken3-ni2-di2 | gan1-nai3-di2 | k'e-ne-ti |
N014142 | P | Kaiyang | 開陽 | 开阳 | 開陽 | 카이양 | O | かいよう | kai1-yang2 | kai1-yang2 | k'a-i-yang |
N067417 | SP | Nakajima | 中島 | 中岛 | 中島 | 나카지마 | O | なかじま | zhong1-dao3 | zhong1-dao3 | na-k'a-chi-ma |
N006561 | G | William | ウィリアム | 威廉 | 威廉 | 빌리암 | O | うぃりあむ | wei1-lian2 | wei1-lian2 | pil-ri-am |
C110425 | S | Zhang | 張 | 张 | 張 | 장 | O | ちょう | zhang1 | zhang1 | chang |
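
Because each entry pairs a written form with its reading, the data can be inverted into a reading-to-candidates index of the kind an input method editor needs. The following is a minimal sketch using (YOMI, Japanese) pairs copied from the sample table above; `build_index` is a hypothetical helper, not part of the database.

```python
from collections import defaultdict

# (YOMI, Japanese) pairs copied from the sample personal-name table.
PAIRS = [
    ("あきこ", "暁子"),
    ("あきこ", "顕子"),
    ("あきこ", "昭子"),
    ("あきら", "明"),
    ("かいよう", "海洋"),
    ("かいよう", "懐陽"),
    ("かいよう", "開陽"),
]


def build_index(pairs):
    """Map each kana reading to its written forms, in first-seen order."""
    index = defaultdict(list)
    for yomi, written in pairs:
        if written not in index[yomi]:  # skip duplicate forms
            index[yomi].append(written)
    return dict(index)


index = build_index(PAIRS)
for yomi, candidates in index.items():
    print(yomi, candidates)
```

The reading あきこ maps to three written forms and かいよう to three more, which is the ambiguity an input method has to resolve, typically with the help of frequency data.
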