http://input.cpatch.org/code/big5.zip -> big5-1.txt

BIG-5 DATA
==========

(C) Copyright 1993, Guo Jin (guojin@iss.nus.sg)
----------------------------------------------------

(1) BIG-5 double-byte coding
----------------------------

First byte: 0xA1 - 0xF9
    symbols:        0xA1 - 0xA3  ( 3 sectors)
    common hanzi:   0xA4 - 0xC6  (35 sectors)
    undefined:      0xC7 - 0xC8  ( 2 sectors)
    rare hanzi:     0xC9 - 0xF9  (49 sectors)
    total defined:  3+35+49      (87 sectors)

Second byte: 0x40 - 0xFE
    part one:       0x40 - 0x7E  (63 codes)
    undefined:      0x7F - 0xA0  (34 codes)
    part two:       0xA1 - 0xFE  (94 codes)
    total defined:  63+94        (157 codes)

coding space: 87 x 157 = 13659 codes

(2) BIG-5 three sections
------------------------

symbols:
    full:           3x157 = 471 codes
    undefined:      0xA3C0 - 0xA3FE (63 codes)
    total defined:  471-63 (408 codes)

common hanzi:
    full:           35x157 = 5495 codes
    undefined:      0xC6A1 - 0xC6FE (94 codes)
    total defined:  5495-94 (5401 codes)

rare hanzi:
    full:           49x157 = 7693 codes
    undefined:      0xF9D6 - 0xF9FE (41 codes)
    total defined:  7693-41 (7652 codes)

defined characters:
    pure hanzi:     5401+7652 = 13053
    all:            408+13053 = 13461

(3) original BIG-5 coding scheme
--------------------------------

warning: the original coding scheme is quite different from the
implementation listed above.

o code: double bytes

o each byte uses 7 bits, 0x21 - 0x7E, to avoid the ASCII control codes
  (0x00 - 0x1F), space (0x20), and delete (0x7F)

o plane: selected by bit 8 of each byte

      Byte1-bit8  Byte2-bit8  no.  plane
      0           0           0    ASCII
      0           1           1
      1           0           2    first
      1           1           3    second

o symbols: (first plane)
  * section 0x21 - 0x41 (code 0x2121 - 0x417E) with
    (0x41-0x21+1)x94 = 33x94 = 3102 codes, but only 651 defined.
  * since 651 < 7x94 = 658, only 7 sectors (0x21 - 0x27) are really defined.
  * in the last sector really defined, 7 = 658-651 codes are undefined.

o control codes: (first plane)
  * section 0x42 - 0x43 (code 0x4221 - 0x437E) with
    (0x43-0x42+1)x94 = 2x94 = 188 codes, but only 33 defined.
  * since 33 < 1x94 = 94, only 1 sector (0x42) is really defined.
  * 61 = 94-33 codes are undefined in sector 0x42.

o first level hanzi: (first plane)
  * section 0x44 - 0x7D (code 0x4421 - 0x7D4B) with
    (0x7D-0x44+1)x94 = 58x94 = 5452 codes, but only 5401 defined.
  * 51 = 5452-5401 codes are undefined in the last sector.

o second level hanzi: (second plane, bit pattern 11)
  * section 0x21 - 0x72 (code 0x2121 - 0x7244) with
    (0x72-0x21+1)x94 = 82x94 = 7708 codes, but only 7652 defined.
  * 56 = 7708-7652 codes are undefined.

o derived data:
      pure hanzi:                            5401+7652 = 13053
      symbols + pure hanzi:                  651+13053 = 13704
      symbol and hanzi sectors defined:      7+58+82 = 147
      codes undefined in those sectors:      147x94-13704 = 114

(4) references
--------------

Huang, Jack K. T. and Huang, Timothy D., "Republic of China's Big Fives",
in An Introduction to Chinese, Japanese and Korean Computing,
Chapter 5, Section 6. World Scientific, 1989.

(5) tools for BIG-5 code table
------------------------------

o to display: use any BIG-5 Chinese environment (for example, KC for
  DOS, ftp-able from NCTUCCCA.edu.tw [140.111.3.21] under
  /Chinese/DOS/Chinese-Systems/KC).

o to print: use any BIG-5 print program (for example, hz2ps for UNIX,
  or cnprint for DOS, UNIX and VMS, both ftp-able from
  ifcss.org [129.107.1.155]).

(6) C source code
-----------------

/*********************************************************************
 *
 *   big5-table.c
 *   ============
 *
 *   make:      gcc big5-table.c -o big5-table
 *   usage:     big5-table >! big5.dat
 *
 *   author:    Guo Jin (guojin@iss.nus.sg)
 *   date:      Oct. 11, 1993
 *
 *   function:  to construct a BIG-5 coding table for easy reference.
 *
 *   algorithm: simply list all valid BIG-5 codes (including a few
 *              undefined ones; see also the BIG-5 coding scheme above)
 *              in a section-by-section format.
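 *********************************************************************/

#include <stdio.h>

#define CHAR_PER_LINE 16

/* The header text printed by the original program was not preserved;
   this string is a placeholder (assumption), not the author's text. */
static const char head[] = "BIG-5 code table\r\n\r\n";

int main(void)
{
    unsigned char sec, off;

    printf("%s", head);
    for (sec = 0xA1; sec <= 0xF9; sec++) {
        /* part one: second byte 0x40 - 0x7E, 16 codes per row */
        for (off = 0x40; off <= 0x7E; off++) {
            if (!((off - 0x40) % CHAR_PER_LINE)) {
                printf("%2x%2x\t", sec, off);   /* row label, e.g. a140 */
            }
            printf("%c%c ", sec, off);
            if (!((off - 0x40 + 1) % CHAR_PER_LINE)) {
                printf("\r\n");
            }
        }
        printf("\r\n");

        /* part two: second byte 0xA1 - 0xFE; 0xA0 is not a valid second
           byte, so its cell is padded with 0xA140 (the full-width space)
           to keep each row starting on a multiple of 0x10 */
        for (off = 0xA0; off <= 0xFE; off++) {
            if (!((off - 0xA0) % CHAR_PER_LINE)) {
                printf("%2x%2x\t", sec, off);
            }
            if (off == 0xA0)
                printf("%c%c ", 0xA1, 0x40);
            else
                printf("%c%c ", sec, off);
            if (!((off - 0xA0 + 1) % CHAR_PER_LINE)) {
                printf("\r\n");
            }
        }
        printf("\r\n\r\n");
    }
    return 0;
}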
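Sections (1) and (2) together say which byte pairs are well formed and which of those actually carry characters. The following is a minimal sketch of that classification in C, not part of the original program; the names enum big5_class, big5_classify and big5_count are hypothetical, and it assumes the undefined tails exactly as given by the ranges in section (2), taking the symbol gap as 0xA3C0 - 0xA3FE. Counting over the whole space with these ranges gives 408 symbol, 5401 common hanzi and 7652 rare hanzi codes.

```c
#include <stdbool.h>

/* Hypothetical classes derived from sections (1) and (2). */
enum big5_class {
    BIG5_INVALID,      /* byte pair outside the BIG-5 coding space */
    BIG5_UNDEFINED,    /* well formed, but in an undefined range */
    BIG5_SYMBOL,       /* 0xA140 - 0xA3BF */
    BIG5_COMMON_HANZI, /* 0xA440 - 0xC67E */
    BIG5_RARE_HANZI    /* 0xC940 - 0xF9D5 */
};

/* Second byte must lie in part one (0x40-0x7E) or part two (0xA1-0xFE). */
static bool big5_trail_ok(unsigned b2)
{
    return (b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0xA1 && b2 <= 0xFE);
}

enum big5_class big5_classify(unsigned b1, unsigned b2)
{
    unsigned code = (b1 << 8) | b2;

    if (b1 < 0xA1 || b1 > 0xF9 || !big5_trail_ok(b2))
        return BIG5_INVALID;
    if (b1 <= 0xA3)            /* symbols; 0xA3C0 - 0xA3FE undefined */
        return code >= 0xA3C0 ? BIG5_UNDEFINED : BIG5_SYMBOL;
    if (b1 <= 0xC6)            /* common hanzi; 0xC6A1 - 0xC6FE undefined */
        return code >= 0xC6A1 ? BIG5_UNDEFINED : BIG5_COMMON_HANZI;
    if (b1 <= 0xC8)            /* sectors 0xC7 and 0xC8 are undefined */
        return BIG5_UNDEFINED;
    /* rare hanzi; 0xF9D6 - 0xF9FE undefined */
    return code >= 0xF9D6 ? BIG5_UNDEFINED : BIG5_RARE_HANZI;
}

/* Count how many byte pairs in 0xA140 - 0xF9FE fall into one class. */
unsigned big5_count(enum big5_class cls)
{
    unsigned n = 0;
    for (unsigned b1 = 0xA1; b1 <= 0xF9; b1++)
        for (unsigned b2 = 0x40; b2 <= 0xFE; b2++)
            if (big5_classify(b1, b2) == cls)
                n++;
    return n;
}
```

For example, big5_classify(0xA4, 0x40) returns BIG5_COMMON_HANZI, while any second byte in 0x7F - 0xA0 yields BIG5_INVALID. The sketch deliberately ignores later vendor extensions (such as ETEN) that fill some of the undefined ranges.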
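The arithmetic in section (3) rests on two rules: a section running from sector a to sector b holds (b-a+1)x94 codes, because each 7-bit byte spans the 94 values 0x21 - 0x7E inclusive, and the plane number is the two-bit value formed from bit 8 of the first byte and bit 8 of the second. A small sketch of both rules follows; the helper names section_size, undefined_in_sectors and plane_of are hypothetical and only illustrate the counting.

```c
/* Each 7-bit byte runs over 0x21 - 0x7E, so one sector holds 94 codes. */
#define CODES_PER_SECTOR 94

/* Number of codes in the inclusive sector range [first, last]. */
unsigned section_size(unsigned first, unsigned last)
{
    return (last - first + 1) * CODES_PER_SECTOR;
}

/* Codes left undefined when `defined` codes occupy `sectors` sectors. */
unsigned undefined_in_sectors(unsigned sectors, unsigned defined)
{
    return sectors * CODES_PER_SECTOR - defined;
}

/* Plane from the plane table: Byte1-bit8 is the high bit,
   Byte2-bit8 the low bit (0 ASCII, 2 first, 3 second). */
unsigned plane_of(unsigned char b1, unsigned char b2)
{
    return ((b1 >> 7) << 1) | (b2 >> 7);
}
```

For instance, section_size(0x44, 0x7D) gives 5452, the first level hanzi figure, and undefined_in_sectors(147, 13704) gives 114.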