China's 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo)

Dr. Ken Lunde, Adobe Blogs (汉字), 2014-03-28 14:28:20

As the title makes blatantly obvious, today we will cover a topic about China (中华人民共和国 zhōnghuá rénmín gònghéguó).

China recently standardized a set of 8,105 ideographs called 通用规范汉字表 (Tōngyòng Guīfàn Hànzìbiǎo), which was published during the latter half of 2013 (its ISBN is 978-7-01-010281-8). These 8,105 ideographs, called hanzi (汉字 hànzì) in Chinese, are separated into three levels that consist of 3,500, 3,000, and 1,605 ideographs, respectively.
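As a rough illustration of how the three levels partition the list, the following sketch derives a level from an index number alone. It assumes that the index numbers run consecutively from 1 through 8,105, level by level (3,500 + 3,000 + 1,605); the function name `level_of` is hypothetical.

```python
# Assumption: index numbers 1-8105 run consecutively through the three
# levels, so level boundaries follow from the level sizes alone.
LEVEL_SIZES = (3500, 3000, 1605)


def level_of(index: int) -> int:
    """Return the level (1, 2, or 3) for a hypothetical 通用规范汉字表 index number."""
    if not 1 <= index <= sum(LEVEL_SIZES):
        raise ValueError(f"index out of range: {index}")
    upper = 0
    for level, size in enumerate(LEVEL_SIZES, start=1):
        upper += size  # last index number belonging to this level
        if index <= upper:
            return level
    raise AssertionError("unreachable: range check above covers all cases")


if __name__ == "__main__":
    for n in (1, 3500, 3501, 6500, 6501, 8105):
        print(n, level_of(n))
```

Under this assumption, index numbers 1 through 3,500 fall in level 1, 3,501 through 6,500 in level 2, and 6,501 through 8,105 in level 3.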

Perhaps of more importance for the readership of this blog, all but three of its ideographs are either already in Unicode or will be in short order. The bulk of them, 7,829 to be exact, are in the URO (Unified Repertoire and Ordering, the original CJK Unified Ideographs block), which is to be expected. Of the remaining ideographs, 77 are in Extension A, 36 are in Extension B, 44 are in Extension C, eight are in Extension D, and 108 are in Extension E.

The three that are not yet in Unicode are index numbers 6774, 7146, and 7373. China first submitted them to IRG #41 as a UNC (Urgently Needed Character) submission (IRG N1967), which was approved and then forwarded to WG2 #62 (WG2 N4508, a ZIP file). A revised version was submitted for IRG #42 (IRG N1988), which suggests that these three ideographs will be included in Extension F. If Extension F is not ready to be submitted to WG2 after IRG #42, they will simply be appended to the URO, perhaps at U+9FCD through U+9FCF.
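One way to reproduce a distribution like this from a list of code points is to classify each code point by the CJK Unified Ideographs block that contains it. The sketch below is illustrative only: it uses the full block allocations (Extension E as later published in Unicode 8.0), treats an unencoded ideograph as `None`, and the names `cjk_block` and `tally` are hypothetical. The sample code points are placeholders, not entries from the 通用规范汉字表.

```python
from collections import Counter
from typing import Iterable, Optional

# Full block allocations (inclusive). Some code points near the end of a
# block may be unassigned; this check looks only at block membership.
CJK_BLOCKS = (
    ("URO", 0x4E00, 0x9FFF),
    ("Extension A", 0x3400, 0x4DBF),
    ("Extension B", 0x20000, 0x2A6DF),
    ("Extension C", 0x2A700, 0x2B73F),
    ("Extension D", 0x2B740, 0x2B81F),
    ("Extension E", 0x2B820, 0x2CEAF),
)


def cjk_block(cp: Optional[int]) -> str:
    """Name the CJK Unified Ideographs block containing cp, else 'Other'."""
    if cp is None:  # an ideograph that has no code point yet
        return "Other"
    for name, first, last in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return "Other"


def tally(code_points: Iterable[Optional[int]]) -> Counter:
    """Count how many code points fall into each block."""
    return Counter(cjk_block(cp) for cp in code_points)


if __name__ == "__main__":
    sample = [0x4E00, 0x3400, 0x20000, 0x2A700, 0x2B740, 0x2B820, None]
    print(tally(sample))
```

Run over all 8,105 entries of a mapping table, a tally of this kind should reproduce the per-block counts given in the post.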

Below is a table that summarizes the distribution of these 8,105 ideographs in Unicode:

Block                         Ideographs
URO                                7,829
Extension A                           77
Extension B                           36
Extension C                           44
Extension D                            8
Extension E                          108
Other (not yet in Unicode)             3
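A quick consistency check on these counts: they should sum to 8,105, the same total as the three levels, and subtracting the URO and Extension A should leave the 199 ideographs that fall outside those two blocks. The variable names below are illustrative.

```python
# Counts from the distribution table, keyed by Unicode block.
DISTRIBUTION = {
    "URO": 7829,
    "Extension A": 77,
    "Extension B": 36,
    "Extension C": 44,
    "Extension D": 8,
    "Extension E": 108,
    "Other": 3,  # not yet in Unicode
}
LEVEL_SIZES = (3500, 3000, 1605)

total = sum(DISTRIBUTION.values())
beyond_ext_a = total - DISTRIBUTION["URO"] - DISTRIBUTION["Extension A"]

print(total)         # 8105
print(beyond_ext_a)  # 199
```

The 199 breaks down as 196 ideographs in Extensions B through E (36 + 44 + 8 + 108) plus the three that are not yet encoded.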

I prepared a mapping table that provides Unicode code points and 通用规范汉字表 index numbers for these 8,105 characters, arranged by CJK Unified Ideograph blocks.

I hereby predict that the 199 ideographs beyond the URO and Extension A (the 196 in Extensions B through E plus the three not yet in Unicode), which are outside the current scope of GB 18030 compliance, will be added to the mandatory portion of that standard at some point in the near future. Font developers who care about this region should seriously consider supporting these 199 additional ideographs in their products to get ahead of a future GB 18030 requirement to do so.
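A font developer could audit coverage of these additional ideographs along the lines of the sketch below. It follows this post's framing of the URO plus Extension A as the current GB 18030 scope; the function names are hypothetical, and the sample code points are placeholders rather than entries from the mapping table.

```python
from typing import Iterable, List, Set

# Blocks treated here as already within GB 18030's current scope,
# following the post's framing; anything outside them is "additional".
URO = range(0x4E00, 0x9FFF + 1)
EXT_A = range(0x3400, 0x4DBF + 1)


def additional_ideographs(required: Iterable[int]) -> Set[int]:
    """Keep only the required code points outside the URO and Extension A."""
    return {cp for cp in required if cp not in URO and cp not in EXT_A}


def missing_from_font(required: Iterable[int], supported: Iterable[int]) -> List[int]:
    """Return, sorted, the additional ideographs that the font does not cover."""
    return sorted(additional_ideographs(required) - set(supported))


if __name__ == "__main__":
    # Placeholder inputs, not taken from the actual 通用规范汉字表 mapping.
    required = [0x4E00, 0x3400, 0x20000, 0x2A700]
    supported = {0x4E00, 0x3400, 0x20000}
    print([f"U+{cp:04X}" for cp in missing_from_font(required, supported)])
    # ['U+2A700']
```

With fontTools, the `supported` set could be taken from `TTFont(path).getBestCmap().keys()`. The three ideographs that are not yet in Unicode cannot be checked by code point until they are encoded.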