Eng2Jpn - 97-002
----------------

WHAT IS IT?
-----------
Eng2Jpn is a conversion/modification of Professor Jim Breen's EDICT_S
dictionary file for Scott Powell's "Dictionary" program, which is
available for the US Robotics Palm Pilot PDA.

The original EDICT_S and master EDICT files are available from the main
site:

  ftp://ftp.cc.monash.edu.au/pub/nihongo/

or any of the mirror sites:

  ftp://ftp.cdrom.com/pub/japanese/monash/                        US (California)
  ftp://kuso.shef.ac.uk/pub/japanese/monash/                      UK
  ftp://enterprise.ic.gc.ca/pub/nihongo/                          Canada
  ftp://ftp.sedl.org/pub/mirrors/nihongo/                         US (Texas)
  ftp://ftp.uwtc.washington.edu/pub/Japanese/Monash/              US (Washington)
  ftp://ftp.xmission.com/pub/users/s/snowhare/nihongo/monash/     US (Utah)
  ftp://ftp.u-aizu.ac.jp/pub/SciEng/nihongo/ftp.cc.monash.edu.au/ Japan
  ftp://ftp.funet.fi/pub/culture/japan/mirrors/monash/            Finland
  ftp://ftp.uni-duisburg.de

Scott Powell's "Dictionary" software is available as time-limited
shareware from:

  http://www.kagi.com/scottpowell/

** The dictionary converter program is also available from this address.

THE FILES
---------
Eng2Jpn.pdb - The actual dictionary file for the Pilot
Eng2Jpn.vox - The original unconverted dictionary in text format
Eng2Jpn.txt - This readme file

WHY?
----
Well, the original E2J dictionary which is distributed with the
"Dictionary" program for the Pilot has a number of problems.

First off is the size. At around 1,600 words, it's a pretty small
dictionary, and doesn't cover a lot of what might be needed by those
studying Japanese, or wanting to look up words which might be a little
more obscure.

Secondly, the dictionary uses a rather odd format for some of the
romanizations of the Japanese language. For example, in the E2J
dictionary, FATHER is romanized as "titi". A better romanization is
actually the Hepburn form, "chichi".

SOURCE
------
The EDICT master dictionary file was/is compiled by Professor Jim Breen
of Monash University in Australia.
Currently at version 97-004, it weighs in at around 61,000 words. It's
one of the largest and most complete English/Japanese dictionaries
available, and is highly recommended for people interested in the
Japanese language.

Unfortunately, at 61,000 words and approximately 2.7 MB, it's a little
large to fit into the Pilot. For this reason, I used the EDICT_S
dictionary, which lands at around 10,600 words.

The original EDICT file contains high-code Japanese characters
(Kanji/Katakana/Hiragana), encoded in one of the standard 2-byte
character sets such as S-JIS. The EDICT_S file from which this Pilot
dictionary was created has been romanized to remove the high characters,
which aren't compatible with the "Dictionary" software.

Here's the original documentation file from the EDICT_S dictionary.

EDICT_S
-------
The EDICT_S file consists of a selection of entries from the V97-002
edition of the EDICT file. These entries have had all fields containing
kanji removed, and all the readings have been converted to Hepburn
romaji. Note that the romaji is in fact "wa-puro-" romaji, i.e. it
differs slightly from the usual Hepburn form as follows:

See the "edict_r.doc" file for further information.

The selection of entries in the EDICT_S file has been made simply by
only including those entries found in the smaller "JDDICT" file.

jwb@dgs.monash.edu.au
April/May 1997

Some further modifications were necessary in order to fit the dictionary
into a reasonable amount of space in the Pilot, and to keep it from
blowing up the conversion program. These changes were:

- Sorted alphabetically
- Truncated to the FIRST meaning of a given word where multiple meanings
  were present
- Multiples were erased (where a word appears multiple times in
  Japanese, with different English translations)
- Japanese words longer than 9 characters in length were removed
- Some additional (bracketed) information on words was removed
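
For anyone wanting to repeat the trimming on a newer EDICT_S release,
the changes listed above can be sketched roughly as below. This is only
an illustration, not the procedure actually used for this release: the
function name trim_entries, the (romaji, meanings) input layout, and the
choice to sort by the English word are all assumptions, and reading the
EDICT_S file itself is left out.

```python
import re

MAX_JAPANESE_LEN = 9  # length limit stated in this readme


def trim_entries(entries, max_len=MAX_JAPANESE_LEN):
    """Trim a list of (romaji, [english meanings]) pairs.

    The input layout is hypothetical; it assumes the EDICT_S lines
    have already been parsed into pairs.
    Returns a list of (english, romaji) pairs.
    """
    seen = set()
    trimmed = []
    for romaji, meanings in entries:
        if len(romaji) > max_len:
            continue  # Japanese word too long
        if romaji in seen:
            continue  # same Japanese word listed again: keep the first only
        seen.add(romaji)
        if not meanings:
            continue
        # Keep only the first meaning and drop any (bracketed) notes.
        first = re.sub(r"\s*\([^)]*\)", "", meanings[0]).strip()
        if first:
            trimmed.append((first, romaji))
    # Sort alphabetically by the English word (assumed sort key).
    trimmed.sort(key=lambda pair: pair[0].lower())
    return trimmed


if __name__ == "__main__":
    sample = [
        ("chichi", ["father (humble)", "dad"]),
        ("chichi", ["milk"]),
        ("arigatougozaimasu", ["thank you"]),
        ("inu", ["dog"]),
    ]
    for english, romaji in trim_entries(sample):
        print(english, romaji)
```

With the sample data, the second "chichi" entry and the 17-letter word
are dropped, and the "(humble)" note is stripped from the first meaning.
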
These modifications had the net effect of trimming the dictionary's size
from 10,600 words (362k) down to a more manageable 8,100 words (145k),
allowing it to fit into the Pilot and still be compatible with the
"Dictionary" software.

Version Notes:

1.0 - Initial release
      Dictionary at 7,417 words

1.1 - Update release
      With a new (more robust) conversion program from the author of the
      "Dictionary" software, I was able to expand the dictionary
      slightly, cutting words longer than 9 letters in length. It looks
      like the "Dictionary" program probably maxes out the index at
      around 16,384 entries. My release uses 16,291. Cutting it rather
      close ;)

      Dictionary now at 8,116 words (about 9.4% bigger)

      Also included is the output "longwords" file, listing what didn't
      make it into the file due to word length.

David S. Griffiths
dgriff@direct.ca