Robert Wells Pilot Page

Introduction

JStroke 1.2 is a minor update released 26-Sept-1997. It displays the SJIS hex code of each recognized character in addition to its recognition score, and it corrects a bug in the JStroke 1.1 update of 24-Sept-1997 that sometimes caused the SJIS code to be off by 0x100.

This SJIS display feature was requested by a Canon Wordtank owner, so that he can write a Kanji in JStroke and then quickly enter its SJIS code into the Wordtank to reach the information available there about that Kanji. This is a stopgap feature until real integrated Japanese-Kanji-English dictionary support is ready on the Pilot. This release also has much better organized source code, thanks to Owen Taylor's contributed work, as described a bit more at the end of this page. I still intend to start serious work on it again in early October.

JStroke 1.2 is currently more a technology demonstration than a really useful application. On my Pilot 5000 it correctly recognizes over 95% of the characters I draw in less than 3.5 seconds each, often in well under a second. I am "releasing" it now in a limited way, to get reactions from other people who might be interested in collaborating on it, or in using it as a development component later. My primary goal is to get other programmers interested in working cooperatively to build Japanese study modules that work together, to increase the overall offering, and to get better software to use myself. This program is free software covered by the GNU General Public License, and sources are available, as described at the end of this page.

Features

When you start JStroke 1.2, it displays a full screen form with a 50x50 box near the bottom. Draw a Kanji character in the box. Touch the Convert button to recognize the character and convert it to the corresponding Shift-JIS character code. A popup list on the upper left side will show the 5 most likely Kanji characters, each followed by its score in the recognition algorithm. The number of seconds it took to recognize the character is displayed in a text field at the top of the screen - generally characters of 10 or 11 strokes will take the longest because there are so many to choose from. Touch the Erase button to erase the Kanji rectangle and recognition fields. The Angles and Replay buttons are for debugging purposes, but won't hurt anything if you touch them. The menu About command gives basic information about JStroke 1.2; the Cut, Copy, and Paste commands relate to the text field, and aren't very useful currently.
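For readers unfamiliar with the Shift-JIS code shown next to each candidate, here is a minimal sketch of the standard arithmetic that maps a two-byte JIS X 0208 code to Shift-JIS. It is not taken from the JStroke sources, which get their codes from the processed KANJIDIC data as described in the Credits; the function name jis_to_sjis is hypothetical and used only for illustration.

```c
/* Convert a two-byte JIS X 0208 code (for example 0x3021) to the
 * corresponding Shift-JIS code (0x889F for that example).
 * Returns 0 if either byte is outside the valid 0x21..0x7E range.
 * Illustrative sketch only; jis_to_sjis is a hypothetical name and
 * this is not code from JStroke. */
unsigned int jis_to_sjis(unsigned int jis)
{
    unsigned int j1 = (jis >> 8) & 0xFF;   /* JIS row byte  */
    unsigned int j2 = jis & 0xFF;          /* JIS cell byte */
    unsigned int s1, s2;

    if (j1 < 0x21 || j1 > 0x7E || j2 < 0x21 || j2 > 0x7E)
        return 0;

    /* Two JIS rows share one Shift-JIS lead byte. Lead bytes run
     * 0x81..0x9F, then continue at 0xE0. */
    s1 = ((j1 + 1) >> 1) + (j1 < 0x5F ? 0x70 : 0xB0);

    if (j1 & 1) {
        /* Odd row: trail bytes 0x40..0x9E, skipping 0x7F. */
        s2 = j2 + 0x1F + (j2 >= 0x60 ? 1 : 0);
    } else {
        /* Even row: trail bytes 0x9F..0xFC. */
        s2 = j2 + 0x7E;
    }
    return (s1 << 8) | s2;
}
```

The result can be printed as four hex digits with a format such as "%04X". Because the lead byte is the high byte of the result, any slip in computing it changes the displayed code by a multiple of 0x100.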

It is important that you follow the normal conventions for stroke count, stroke order, and stroke direction, to get successful recognition. Here are some of the more important conventions, abridged from Kanji & Kana: A Handbook and Dictionary of the Japanese Writing System, by Wolfgang Hadamitzky and Mark Spahn, Tuttle, 1981:

  1. Horizontal strokes are written left to right.

  2. Vertical or slanting strokes are written top to bottom.

  3. A stroke may change direction several times - right and down, and down and right are common cases.

  4. Stroke order is from top to bottom, then left to right, middle part before short flanking side-strokes, horizontal stroke before intersecting vertical stroke. When slanting strokes intersect, upper right to lower left is written first. A piercing vertical stroke is written last; a vertical stroke that does not protrude above or below is written between the upper and lower parts. A piercing horizontal stroke is written last. First the vertical stroke, then a short horizontal stroke adjoining it to the right. An enclosure is written first.
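Conventions 1 and 2 above can be expressed as a very crude check on the first and last pen points of a stroke. This is only a sketch under stated assumptions: it is not Todd Rudick's scoring algorithm and not JStroke code, the names PenPoint and stroke_direction_ok are hypothetical, and it assumes screen coordinates in which x grows to the right and y grows downward. It looks only at the endpoints, so strokes that change direction (convention 3) and shallow slanting strokes are beyond what it can judge.

```c
#include <stdlib.h>

/* A pen position in screen coordinates: x grows rightward,
 * y grows downward. Hypothetical type, for illustration only. */
typedef struct {
    int x;
    int y;
} PenPoint;

/* Returns 1 if a stroke from start to end is consistent with
 * "horizontal strokes left to right" and "vertical or slanting
 * strokes top to bottom", judged from its endpoints alone.
 * Returns 0 otherwise. */
int stroke_direction_ok(PenPoint start, PenPoint end)
{
    int dx = end.x - start.x;
    int dy = end.y - start.y;

    if (dx == 0 && dy == 0)
        return 1;            /* a dot: nothing to check */

    if (abs(dx) > abs(dy))
        return dx > 0;       /* mostly horizontal: must run left to right */

    return dy > 0;           /* vertical or slanting: must run top to bottom */
}
```

A stroke drawn from (5, 20) to (45, 22) passes, while the same stroke drawn right to left does not.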

What's Next?

Here is what I currently intend to do with JStroke over the next few months:

  1. Reduce the form to FEP proportions, probably full width but only 50-56 pixels high; reduce box from 50x50 to 48x48 or a bit smaller, have preference for left or right handed use, and use rest of space on left or right for popup list(s) and text field(s).

  2. Build variant for entering sample characters, and storing them into side database, to build set of regression tests eventually covering all characters to be recognized. Intend to use the Hadamitzky & Spahn books as authoritative on stroke order for Jouyou Kanji. Intend to enter the characters during subway rides. Leads later to automated regression test program for Pilot that runs through sample characters and records accuracy, differentiation, and speed of recognition in memo. Allows me to make experimental changes in the algorithms for better performance without fear that I'll break some recognition without realizing it.

  3. Further performance optimization. I believe I can reduce the worst case performance to less than 1 second, and that should be acceptable for most usage. I have already reduced the worst case from 19 seconds to 3.5 seconds.

  4. Develop recognition for Kana.

  5. Further x2 reduction in size of processed StrokeDic database, down to 13-16kb for 2000 characters.

  6. Improve GUI, making it smoother and faster to enter characters in rapid succession. Experiment with controllable timeout to differentiate the pause at the end of a character from the gaps between strokes.

  7. Get back to the Pilot dictionary project I originally intended to work on...

  8. As of 24-Sept-1997, this is still a pretty good list, but my current development plan is: upgrade to 0.5.0 GCC toolset; move to a new smaller database format generated directly from StrokeDic using an available Java package for Pilot database creation; speed up the algorithm to take advantage of features of the new data format; then add Kana, and the notion that the user can indicate the type of character to influence new extra score operators; then work on shared library packaging; then releasing, and shifting to dictionary focus for a while.

  9. Another quick 24-Sept-1997 note, based on email design questions from several different people: I intend to get the third party 3MB upgrade board sometime in October, in preparation for serious dictionary work. A friend of mine (Hi, Chan!) bought it back in August for around $270, and found it very easy to install and upgrade in his PalmPilot. I have some ideas about compressing and hyperlinking the dictionary, and also subsetting it to focus on the needs of less advanced students like myself. I intend to do it in such a way that different people can create different subsets for their own purposes, using the Java dictionary building tools on a PC or other development system. I believe a really awesome Japanese study dictionary tool can be fit in around 2MB of data space, leaving plenty of room for code and other applications. I also intend to move up to the 16x16 font, for easier reading in the dictionary. My original starting point for this project was the idea of building a dictionary that manifested all the different verb forms as separate entries that could be recognized in a search, so I could quit carrying around my verb dictionary. Note that I don't have to store all these variants in memory; I can generate them on the fly if the algorithms are clever enough. I'm getting most of the verb forms pretty solidly now, so it's less of an issue for me and Ann, but others must transition through the same need. Some of the verb forms look pretty different, particularly when your grasp of forms like conditional is pretty shaky anyway... A verb dictionary could fit nicely in a 1MB Pilot, and I may return to that subset idea again later on, or maybe someone else will pursue it later.

  10. Recent email exchanged with Todd Rudick suggests we may want to work sooner on a web page to allow many people to contribute handwriting samples so we can quickly build up a regression test suite for the recognition scoring engine. I will be thinking about this over the next few days... -rwells, 26-Sept-1997.
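The controllable timeout mentioned in item 6 could be sketched roughly as follows. This is an illustration of the idea only, not planned JStroke code: the names GapKind and classify_gap are hypothetical, and the tick values are assumed to come from whatever free-running system timer the platform provides.

```c
/* What a pause with the pen lifted is taken to mean. */
typedef enum {
    GAP_BETWEEN_STROKES,    /* the user is still drawing this character */
    GAP_END_OF_CHARACTER    /* the pause was long enough to start recognition */
} GapKind;

/* Classify the pause since the pen was last lifted.
 *   pen_up_tick   - timer value when the pen left the screen
 *   now_tick      - current timer value
 *   timeout_ticks - user-adjustable threshold, in the same units
 * Unsigned subtraction keeps the elapsed time correct even if the
 * timer has wrapped around once between the two readings. */
GapKind classify_gap(unsigned long pen_up_tick,
                     unsigned long now_tick,
                     unsigned long timeout_ticks)
{
    unsigned long elapsed = now_tick - pen_up_tick;

    return (elapsed >= timeout_ticks) ? GAP_END_OF_CHARACTER
                                      : GAP_BETWEEN_STROKES;
}
```

Making timeout_ticks a preference lets a fast writer shorten it for rapid entry, while a slower writer can lengthen it so that a character is not cut off between strokes.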

Download Instructions

Download the following ZIP file:

jstroke.zip contains two files, JStroke.prc (14kb) and JStroke.pdb (28kb). Both these files need to be downloaded to your Pilot using the Pilot Desktop "pilot\instapp.exe" program and a subsequent HotSync.

J-OS 1.9 by Yamada Tatsushi ("Hacker Dude-san") is a prerequisite for JStroke. I depend on it to display the recognized Kanji characters. See Yamada Tatsushi's Home Page for more information about his Pilot software in general - he does great stuff. Here are links to download the relevant ZIP files from his site in Japan; I believe they are also available at various Pilot application download sites:

After downloading these files to your PC, you will need to unzip the contents, and install and download to your Pilot the following files:

After downloading these four files to your Pilot, you will need to run HackMaster on your Pilot, and check the box next to J-OS 1.9, to enable the hack.

Note for Macintosh users: StuffIt Expander, ZipIt, or un-zip-512 can be used to process downloaded zip files on the Macintosh. Using un-zip-512 I have verified that the Pilot Desktop will correctly download the resulting .prc and .pdb files, even though they won't have the expected Macintosh file type or desktop icon. There used to be a "Cross Platform Page" that gave more information about Macintosh unzip alternatives, at http://www.mcad.edu/guests/ericb/xplat.comp.html, but it doesn't seem to exist anymore.

Why am I doing this?

I got a U.S. Robotics Pilot 5000 for Christmas at the end of 1996, and fell in love with it. I depend on it for scheduling, telephone numbers, and memos. The fits-in-your-pocket size, the long battery life, the easy HotSync with the PC or Mac, the direct handwriting and pen input, and the open software platform make for a brilliant set of design choices.

I have been studying Japanese for almost 3 years, and often bring several Japanese dictionaries along as I ride the subway to downtown Boston, so I can study on the train. In May 1997 I bought a Canon Wordtank 9500, a top of the line Japanese-English electronic dictionary, in hopes that I could just bring it, and leave the heavy books at home. It is an impressive device, but has a rather clunky interface (no stylus, heavy use of tiny keyboard), and while it tries to support American users, it still must be easier to use for a fluent Japanese speaker. I found myself bringing all the books AND the Wordtank...

I became convinced that I could do better developing Japanese study programs for my Pilot myself. I spent a week in June researching Japanese-related programs available already for the Pilot, Japanese software resources available over the internet, and different toolsets for developing Pilot applications. I hope to later add a page here describing what I found for development. For the moment suffice it to say that I settled after some experimentation on developing in C using the GCC toolset on my Windows95 laptop, doing quick turn-around testing using the CoPilot emulator, and using trace output into Pilot memos for debugging. I decided to use Jim Breen's repository of Japanese processing data and software as a starting point, and to contribute to it when I've got something worth contributing. I started using Yamada Tatsushi's JSuite Japanese OS for Japanese entry and display on my Pilot.

Jim Breen in Australia suggested I contact Todd David Rudick in Arizona, who has written JavaDict, a Java program that provides English-Japanese-Kanji dictionary lookup, using Jim Breen's EDICT and KANJIDIC datafiles. Todd has also developed a Kanji handwriting recognition algorithm and entered the "StrokeDic" data for the 1,945 Jouyou Kanji characters. Todd has put his software under the GNU General Public License. When I downloaded JavaDict and tried it out, it seemed pretty accurate and fast. I decided to start by doing Japanese handwriting recognition on the Pilot, using JavaDict as my starting point.

Credits

I would like to thank various people:

  1. Todd David Rudick, for JavaDict. You can download his Java application class files and his Java source code at this web site. His Kanji handwriting data and algorithms are very impressive, a wonderful piece of work. I process his StrokeDic file to produce the JStroke.pdb file, compressing it x2 in the process. I read his Java code and wrote C code to implement his algorithms. I hope to share data enhancements and algorithm improvements with him. I currently generate my .pdb file in a really awkward way, building it by downloading variants of the JStroke code that have parts of the data embedded in them and assembling it on the Pilot, then HotSync'ing it back to the PC backup area for later downloads to the Pilot. I intend to retool with a Java program that will generate the .pdb file from the StrokeDic and KanjiDic files, and freely distribute that program. Todd David Rudick's JavaDict is covered by the GNU General Public License, which you can read for more info on distribution rights.

  2. Yamada Tatsushi ("Hacker Dude-san"), for J-OS 1.9. This is a clever hack that enables Pilot applications in general for Japanese character input and display, albeit in a really tiny font. See Yamada Tatsushi's Home Page for more information. I hope to explore integrating handwriting input into his Japanese FEP with him.

  3. Jim Breen of Monash University in Australia, for his Monash Nihongo ftp Archive. The files of the EDICT/KANJIDIC Japanese/English electronic dictionary project form the starting point for most interesting work in this area, and the site also serves as a point of contact and exchange for continuing work. I used the KANJIDIC data file to do Unicode to Shift-JIS conversion, and Halpern and Nelson code lookup, while processing Todd Rudick's StrokeDic data file. I intend to use EDICT and KANJIDIC in a subsetted form in doing my dictionary software for the Pilot. Jim Breen's EDICT/KANJIDIC are covered by their own licenses; read edict.doc and kanjidic.doc for more info on distribution rights.

  4. Owen Taylor (otaylor@gtk.org), for his work on the JStroke 1.0 sources, to make them more modular so he could use the recognition scoring code in his Linux based KanjiPad application. He very quickly sent me a handoff back, and I've merged it into my sources as the primary new value in JStroke 1.1. See Owen's home page at http://www.gtk.org/~otaylor for more information about KanjiPad and his other interests.
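As background for the .pdb generation tool mentioned in item 1, here is a minimal sketch of writing the fixed 78-byte header that starts a Pilot database file. It is an illustration only, not the planned Java tool and not JStroke code: the function name pdb_write_header is hypothetical, the type and creator codes passed in are placeholders rather than JStroke's real ones, and the date and attribute fields are simply left at zero for brevity. Multi-byte fields are big-endian, and the record list and record data that follow the header are not shown.

```c
#include <string.h>

#define PDB_HEADER_SIZE 78   /* fixed-size header at the start of the file */

/* Store a 16-bit value in big-endian byte order. */
static void put_be16(unsigned char *p, unsigned int v)
{
    p[0] = (unsigned char)((v >> 8) & 0xFF);
    p[1] = (unsigned char)(v & 0xFF);
}

/* Fill buf (at least PDB_HEADER_SIZE bytes) with a minimal header.
 *   name        - database name, at most 31 characters are kept
 *   type        - 4-character database type code
 *   creator     - 4-character creator code
 *   num_records - number of records that will follow
 * All other fields (attributes, version, dates, and so on) are
 * zeroed in this sketch. */
void pdb_write_header(unsigned char *buf, const char *name,
                      const char *type, const char *creator,
                      unsigned int num_records)
{
    memset(buf, 0, PDB_HEADER_SIZE);
    strncpy((char *)buf, name, 31);    /* bytes 0..31, NUL-terminated */
    memcpy(buf + 60, type, 4);         /* bytes 60..63 */
    memcpy(buf + 64, creator, 4);      /* bytes 64..67 */
    put_be16(buf + 76, num_records);   /* bytes 76..77 */
}
```

A real generator would go on to write one 8-byte entry per record (data offset, attributes, unique ID) and then the record data itself.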

Copyrights, Licensing, and Source Code

// JStroke 1.x - Japanese Kanji handwriting recognition technology demo.
// Copyright (C) 1997  Robert E. Wells
// http://wellscs.com/pilot  
// mailto:robert@wellscs.com 
// 
// This program is free software; you can redistribute it and/or
// modify it under the terms of the GNU General Public License
// as published by the Free Software Foundation; either version 2
// of the License, or (at your option) any later version.
// 
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
// the GNU General Public License, at
// http://www.gnu.org/copyleft/gpl.html, for more details.
// 
// You should have received a copy of the GNU General Public License
// along with this program (gpl.html); if not, write to the Free Software
// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

To get the source code for JStroke 1.2, download the following ZIP file:

After unzipping, study the readme.txt file. You will need GCC for the Pilot and PilRC in order to compile the source code; readme.txt includes links to where the necessary development tools can be downloaded. I am intending to rework the source code extensively in my next major release, which I hope will be finished by the end of October, 1997, or in November ... I'll do periodic releases as it reaches stable points during this upcoming development period. Please contact me if you download a copy of the source code, so I can let you know when new releases are available, and coordinate changes. I welcome collaborators and would like to receive any improvements you make so I can post them here for the benefit of others.

Contact robert@wellscs.com | Updated November 27, 1998.