Natural Language Toolkit

...software, data sets and tutorials for natural language processing...

Translation

 


Natural language processing raises new scientific and engineering challenges for each new language. An excellent way to improve the quality of NLP for a given language in the longer term is for more speakers of that language to become active members of the NLP community. You can help this process by forming a group to translate the NLTK book into your language.

Ongoing Translation Work

Work is ongoing in the following languages; write to the contact address (mailing list or individual) if you are interested in participating.

Language   | Materials      | Contact      | Leaders                                    | Corpora
Greek      | Gr:Book        | Steven Bird  | Theodosios Chimonidis, Evangelos Himonides | none
Hindi      | Hi:Book        | mailing list | Grishma Govani                             | none
Portuguese | Pt:Book, Guide | mailing list | Lucia Specia, Tiago Tresoldi               | tagged text, treebank, no lexicon
Spanish    | Es:Book        | mailing list | Antoni Oliver, Maria Dolores Rodríguez     | none
Tamil      | Ta:Book        | mailing list | Sri Ramadoss M                             | none

Obtaining Data

An initial step, before translating the book, is to obtain linguistically annotated data such as tagged text, a treebank, or a lexicon. Please try to get permission for this data (or a sample of it) to be included in NLTK's data distribution. Consider also writing a guide to basic NLP tasks in your language (cf. http://nltk.org/doc/guides/portuguese.html).

Translating the book

  • Contact Steven Bird to indicate your interest, and to obtain a wiki account
  • Ideally there will be multiple people sharing the task, and a new mailing list will be set up to facilitate communication
  • Check whether an automatic translation service exists from English to your language (e.g. Babelfish supports about a dozen languages)
  • Create a wiki page Lg:Book where Lg is the ISO 639 code for the language; use this to link to the chapters being translated and to keep a public record of progress
  • Create a wiki page Lg:Terminology to hold a table of terminology translations, for consistency across the book; discuss terminology issues on the mailing list (cf. our term index)
  • Obtain the English source for the chapter you wish to translate by replacing the filename suffix in the chapter's URL with .txt, e.g. http://nltk.org/doc/en/tag.txt
  • Create a wiki page Lg:Chapter, where Chapter is the chapter name
  • Convert the section headings and program listings into wiki text, using == for chapter headings, === for section headings, and <pre>...</pre> for program listings
  • Rework some of the English examples with equivalent examples using available corpora for your language
  • Write an appendix, focussing on any issues specific to your language that are not covered elsewhere in the book.
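
The mechanical parts of the steps above (naming wiki pages, deriving the .txt source URL, and wrapping headings and program listings in wiki text) can be sketched in Python. This is only an illustrative sketch, not an official NLTK tool: the function names are hypothetical, the ".html" suffix in the usage example is an assumption about how chapter URLs look, and capitalising the language code simply follows the page names in the table above (e.g. Pt:Book).

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def wiki_page(lang_code, name):
    """Build a wiki page title such as 'Pt:Book' (hypothetical helper)."""
    # Capitalisation mirrors the existing page names, e.g. Pt:Book, Es:Book.
    return f"{lang_code.strip().capitalize()}:{name}"


def source_url(chapter_url):
    """Replace the filename suffix of a chapter URL with '.txt'."""
    parts = urlsplit(chapter_url)
    root, _suffix = posixpath.splitext(parts.path)
    return urlunsplit(parts._replace(path=root + ".txt"))


def chapter_heading(title):
    """Wiki text for a chapter heading (==)."""
    return f"== {title} =="


def section_heading(title):
    """Wiki text for a section heading (===)."""
    return f"=== {title} ==="


def program_listing(code):
    """Wrap a program listing in <pre>...</pre>."""
    return f"<pre>\n{code.rstrip()}\n</pre>"


if __name__ == "__main__":
    print(wiki_page("pt", "Book"))
    # Assumed input form: the chapter URL ends in .html.
    print(source_url("http://nltk.org/doc/en/tag.html"))
    print(chapter_heading("Tagging"))
    print(section_heading("Introduction"))
    print(program_listing(">>> text = 'example'\n"))
```

The translated prose itself still has to be written by hand; a helper like this only removes some of the repetitive formatting work.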