The MandarWeb Website: The Rounding Out page





Section 16. Rounding Out
This section covers how the site was constructed, along with a few notes that did not fit in other sections. You don't need to read it unless you are already familiar with much of the site.

The Database
The "ZCodelBase.pdf" and "ZMandarinBase.pdf" files available for download were both extracted from a database file which I maintain using the Macintosh app "Bento".

This database has five fields: "Pinyin", "Flag", "Character", "Codel", and "English". Here is how a few records appear.



The "Pinyin" field follows the normal conventions of this site. "Flag" is the number of Codels in the decoded character, and is filled in only for single-character records. "Character" holds the Chinese character or characters. The fourth field, "Codel", contains the Codels of a decoded character, and is likewise present only for a single character. The fifth field, "English", holds the English for the Pinyin text.
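As a rough illustration of this record layout, here is a minimal Python sketch. The class and function names are hypothetical, the Pinyin is written with tone digits purely for the example, and the Codel values are invented placeholders, not real MandarWeb Codels. The text does not say how Codels are separated within the field, so whitespace is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    pinyin: str
    flag: Optional[int]  # number of Codels; only set for a single character
    character: str       # one Chinese character, or several for a word
    codel: str           # Codels of the decoded character; empty otherwise
    english: str


def is_decoded_character(rec: Record) -> bool:
    """True when the record is a single character that carries a decoding."""
    return len(rec.character) == 1 and rec.flag is not None


def flag_is_consistent(rec: Record) -> bool:
    """Check that Flag matches the number of Codels.

    Assumption: Codels are separated by whitespace within the field.
    """
    if not is_decoded_character(rec):
        # Words and expressions should leave Flag and Codel empty.
        return rec.flag is None and rec.codel == ""
    return rec.flag == len(rec.codel.split())


# Placeholder records; "codelA" and "codelB" are invented names.
single = Record("hao3", 2, "好", "codelA codelB", "good")
phrase = Record("ni3 hao3", None, "你好", "", "hello")

print(is_decoded_character(single), flag_is_consistent(single))
print(is_decoded_character(phrase), flag_is_consistent(phrase))
```

A check like `flag_is_consistent` could be run over every record after an edit, to catch a Flag that no longer matches its Codel field.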

The English column may include some abbreviations, both normal English ones and some specific to MandarWeb. Among the latter, (SN), meaning Surname, and (MW), meaning Measure Word, are the most frequent. (PL) marks a place name.
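For anyone processing the English column by program, the three MandarWeb abbreviations can be picked out with a short pattern match. This is only a sketch: the function name is hypothetical and the sample entries are made up for illustration, not copied from the database.

```python
import re
from typing import List

# The three MandarWeb-specific abbreviations named in the text.
ABBREVIATIONS = {
    "SN": "Surname",
    "MW": "Measure Word",
    "PL": "Place name",
}

# Matches a parenthesised run of capitals, such as "(SN)".
TAG_PATTERN = re.compile(r"\(([A-Z]+)\)")


def expand_tags(english: str) -> List[str]:
    """Return the meanings of any MandarWeb abbreviations in an English entry."""
    found = TAG_PATTERN.findall(english)
    # Ignore parenthesised capitals that are not MandarWeb abbreviations.
    return [ABBREVIATIONS[tag] for tag in found if tag in ABBREVIATIONS]


# Invented sample entries, for illustration only.
print(expand_tags("Li (SN); plum"))
print(expand_tags("(MW) for flat objects"))
print(expand_tags("good"))
```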

Currently the database contains over 7500 words or expressions, and just over 2300 characters with their decodings. These are not all unique, as the same character may appear with different tones.

For those who are familiar with databases (and perhaps spreadsheets), and who would like to manipulate the database themselves, an Excel spreadsheet version can be downloaded from:

http://www.aoi.com.au/Data/FullMandarin.xls

Anyone who would like to upgrade or maintain this database, or use it elsewhere, is welcome to do so. In future years I may replace "FullMandarin.xls" with an expanded or corrected version.
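One simple way to work with the data outside Excel is to save the spreadsheet as a CSV file and read it with a short script. The sketch below makes several assumptions: that the export has a header row naming the five fields, that empty fields stay empty, and that the Pinyin uses tone digits. The function name and the sample rows are invented for illustration. It counts the records, the single-character records, and the distinct characters, since the same character may appear more than once with different tones.

```python
import csv
import io


def summarize(lines):
    """Count records, single-character records, and distinct characters.

    `lines` is any iterable of CSV lines, such as an open file.
    Assumption: the first row holds the five field names.
    """
    rows = list(csv.DictReader(lines))
    singles = [row for row in rows if len(row["Character"]) == 1]
    return {
        "records": len(rows),
        "single_character_records": len(singles),
        "unique_characters": len({row["Character"] for row in singles}),
    }


# Invented sample rows; "codelA" and "codelB" are placeholder Codel names.
SAMPLE = """Pinyin,Flag,Character,Codel,English
hao3,2,好,codelA codelB,good
hao4,2,好,codelA codelB,to be fond of
ni3 hao3,,你好,,hello
"""

summary = summarize(io.StringIO(SAMPLE))
print(summary)
```

For a real export, the same function can be given an open file, for example `summarize(open("FullMandarin.csv", newline="", encoding="utf-8"))`, where the file name is whatever you chose when saving.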

Characters you can't find with MandarWeb
Chinese written texts extend back over centuries, and some of the older ones contain characters which are now little used. In fact, there are over 80,000 characters, but some will only be known to scholars.

According to one site, the 2500 most common characters cover about 98% of current Chinese usage. At 3500 characters, the coverage is about 99.5%. 7000 characters are considered "in general use".

Suppose you have encountered a Chinese character which you cannot decode with any of the tools in MandarWeb. This is particularly possible with place names, some of which have remained the same over long ages. Personal and place names may have less usual characters because of a desire to be distinctive -- in the West, John Smith may change his name to Jonny Smythe to stand out from the crowd.

If you have the MacKEY5 or KEY5 package, there is a way to find less usual characters. Find a character which is similar but simpler, write it in a MacKEY5 write-in file, and check its radical with the tool-tip feature. Then go to the list of characters with this radical, estimate the additional strokes needed to make the sought-after one, and check through the character list to locate it.

Of course, you can do the same thing with an older printed Chinese dictionary based on radicals. But less usual characters are likely to be complex ones including many strokes, and this makes such a search difficult.
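The radical-and-stroke search described above can be sketched in a few lines of Python. This is not how MacKEY5 works internally; it only illustrates the idea. The table names and the function are hypothetical, and the tiny table covers just four characters under the "tree" radical 木. Because the number of additional strokes is an estimate, the function accepts a tolerance.

```python
from typing import Dict, List, Tuple

# Stroke count of each radical (tiny illustrative table).
RADICAL_STROKES: Dict[str, int] = {"木": 4}

# Characters filed under each radical, with their total stroke counts.
CHARACTERS_BY_RADICAL: Dict[str, List[Tuple[str, int]]] = {
    "木": [("材", 7), ("村", 7), ("林", 8), ("森", 12)],
}


def candidates(radical: str, extra_strokes: int, tolerance: int = 0) -> List[str]:
    """List characters under `radical` with about `extra_strokes` added strokes."""
    base = RADICAL_STROKES[radical]
    return [
        char
        for char, total in CHARACTERS_BY_RADICAL[radical]
        # Additional strokes = total strokes minus the radical's own strokes.
        if abs((total - base) - extra_strokes) <= tolerance
    ]


print(candidates("木", 3))
print(candidates("木", 7, tolerance=1))
```

Widening the tolerance gives a longer list to scan, which is the trade-off the same search has in a printed dictionary when the stroke count is uncertain.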








(previous version 1.04, on Web 2008 Oct 22, see http://www.aoi.com.au/mandarin/ ).
Version 2.01, 2015 Mar 26.