Updated 2000-09-01
The jeKai Dictionary Project
by Tom Gally

JAT Bulletin 184-185, July-August 2000, June 17 JAT Meeting Report

JAT was kind enough to invite me to speak on the jeKai dictionary project at JAT's monthly meeting on Saturday, June 17.

The jeKai project to prepare a free, open, online Japanese-English dictionary began soon after this year's IJET in Kyoto. While still in its infancy, the project is moving along steadily. You can see the project's current status and find out how to participate at
http://www.jekai.org/

I didn't count the audience, but the room was nearly full; I would guess that there were maybe thirty people there. Most seemed to know about jeKai in advance, though I suspect that current jeKai contributors and members of the jeKai mailing list were in the minority. Perhaps half the attendees were Japanese and half were foreign. As is usual at JAT meetings, the Japanese spoke mostly in Japanese and the foreigners in English.

After brief self-introductions by everyone present, I spoke about the origin and goals of jeKai for about thirty minutes. I used a printout I had prepared in advance listing the features of jeKai as they appear in the description on the Web site:  

* Definitions that explain the meaning of words as completely as possible
* As many examples as possible of each word in real contexts
* Photographs and other illustrations, especially for entries about uniquely Japanese things
* No restrictions on the type or range of vocabulary
* No restrictions on the length of entries  

I had also printed out three jeKai entries, for 赤提灯, 彼岸花, and 地球儀, so that people could see the sorts of entries that have been produced so far. These entries, and all of the others, can be seen at the Web site.

The rest of the time was devoted to questions and discussion. I can't summarize everything (the entire meeting lasted for about two hours), so I'll just mention a few topics that hadn't been discussed in depth on the jeKai mailing list yet: corpora, editorial control, and ensuring a steady flow of new entries.

Traditionally, the best dictionaries have been produced by assembling vast numbers of citations from written works and using those citations to make judgments about the meanings, usages, and histories of words. The most famous example of a dictionary produced in this way is the Oxford English Dictionary, and a recent best-seller, "The Professor and the Madman," told one part of that story in an entertaining way. Until recently, the citation gathering had to be done by hand, but now computers have made it possible to gather and search vast collections of texts--called "corpora" (singular: "corpus")--much faster and more efficiently. As one example, I mentioned the COBUILD corpus and dictionary project in the U.K., which has yielded an excellent dictionary for learners of English. Other recent English dictionaries, including the New Oxford Dictionary of English and the Encarta World English Dictionary, are also based on electronic corpora.
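
For readers unfamiliar with corpus searching, the basic idea of pulling citations for a word out of a collection of texts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how COBUILD or any other project actually works: the function name `find_citations`, the `width` parameter, and the two-document sample corpus are all hypothetical, and a real corpus tool would add tokenization, indexing, and source metadata.

```python
def find_citations(texts, word, width=10):
    """Return (source, left context, word, right context) for each hit.

    texts: dict mapping a source name to its full text (hypothetical format).
    width: number of characters of context to keep on each side.
    """
    if not word:
        return []  # an empty search string would match everywhere
    citations = []
    for source, text in texts.items():
        start = text.find(word)
        while start != -1:
            end = start + len(word)
            left = text[max(0, start - width):start]
            right = text[end:end + width]
            citations.append((source, left, word, right))
            # Continue searching after the current hit.
            start = text.find(word, end)
    return citations


# Hypothetical two-document corpus, invented for this illustration.
corpus = {
    "diary.txt": "駅前の赤提灯で一杯飲んだ。",
    "essay.txt": "赤提灯は庶民的な飲み屋の目印である。",
}

for source, left, keyword, right in find_citations(corpus, "赤提灯"):
    print(f"{source}: {left}[{keyword}]{right}")
```

Each hit is shown with a little context on either side, which is roughly what a lexicographer scans when judging how a word is used. Plain substring matching is used here because Japanese text has no spaces between words; it will also match the search string inside longer compounds, which a real tool would need to handle.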

Though I have heard rumors of corpus projects for Japanese, I don't know of any Japanese dictionary that is corpus-based. Lexicographer and jeKai contributor Hitoshi Sobo (惣坊均) has given some thought to the idea of producing a free corpus of Japanese, and he described his ideas at the meeting. Some others also mentioned an interesting bilingual corpus of Japanese and English that is available at

http://www.idd.tamabi.ac.jp/corpus/yourei/index.htm

Ideally, a corpus would be drawn from a collection of texts (including transcripts of speech) that are somehow balanced and representative, and COBUILD and other corpora strive toward that goal. Since jeKai contributors do not have access to such a corpus of Japanese, we must make do mostly with citations from the Web. The Web is great for size and cost, of course, but it is not really representative of the full range of Japanese. Searches for Japanese words tend to yield many hits from government reports, corporate documents, bulletin board logs, and, especially, personal diaries. The quality of the language is mixed, and it can be hard, especially for us nonnative speakers of Japanese, to spot errors and to distinguish between standard usages and personal idiosyncrasies. The Web is also very weak for literature, expository prose, transcripts of spoken language, and anything more than five or six years old. If there were a free corpus such as Sobo-san has proposed, then it would be a great boon not only to jeKai and other lexicographic projects but also to Japanese language educators, linguists, and other scholars.

A couple of people raised the question of how and whether editorial control is to be exerted at jeKai. Among other things, Brian Chandler spoke about the photo.net model of incremental additions to articles, and he mentioned the importance of preserving previous versions of articles. I hope that we can implement such a system in the near future.

Another issue is where to draw the line with questionable or offensive content. At present we have no rules about what cannot be in jeKai. While I can imagine situations in which we might want to exclude certain types of content, I feel that, for the time being, we should continue to be open about accepting contributions and refuse submissions only if they are clearly incorrect or present imminent legal problems. If problems do arise in the future (such as with offensive or defamatory content), we can discuss what to do about them here.

The most important issue raised at the meeting, I felt, was how to encourage people to continue contributing to jeKai. One person said he wished there were an easier interface, as he knows no HTML. I said that, for the time being, plain text will be fine (I will prepare the HTML). Also, Paul Flint is working on a Web form that will make it possible to input entries more easily. As soon as that's ready, we will put a link on the Web site to the form. I'm also hoping that we can prepare a revision form; if you want to correct, revise, or add to an existing entry, you'll just click on a button and the entry's content will appear in a Web form, where you can edit it in your browser.

But even more important, I think, is the challenge of motivating ourselves to finish and send in entries. To maintain the momentum of the project, I have been trying to make sure that we have at least one new jeKai entry a day (I've missed a couple of days, including the day of the JAT meeting), and that is one reason why perhaps half of the entries have been created by me. One person mentioned that he wanted to prepare an entry for the word 武道, an interest of his, but that it turned out to be an immensely complex topic. I suggested choosing something smaller, like some specific 武道-related term, and starting with that. Since even the simplest-seeming word can, once you look into it, contain many unexpected facets, I also suggested that we should set deadlines for ourselves. Give yourself at most two hours, for example, to prepare an entry, and when you reach the deadline send it in as it is even if there is still much more that you could say. If we insist on completeness and perfection, we will achieve nothing.

While jeKai is intended not only for translators but also for students, scholars, and others interested in learning more about the meanings of Japanese words, translators are uniquely qualified to contribute to the dictionary. I hope that many JAT members and other translators will come to regard jeKai as a useful and permanent forum for sharing their knowledge about Japanese with the rest of the world.
