Code
NLTK includes the following software modules (47k lines of Python code):
- Corpus readers
  - interfaces to many corpora
- Tokenizers
  - whitespace, newline, blankline, word, wordpunct, treebank, sexpr, regexp, Punkt sentence segmenter
- Stemmers
  - Porter, Lancaster, regexp
- Taggers
  - regexp, n-gram, backoff, Brill, HMM
- Parsers
  - recursive descent, shift-reduce, chunk, chart, feature-based, probabilistic, ...
- Semantic interpretation
  - untyped lambda calculus, first-order models, parser interface
- WordNet
  - WordNet interface, lexical relations, similarity
- Classifiers
  - decision tree, maximum entropy, naive Bayes, Weka interface
- Clusterers
  - expectation maximization, agglomerative, k-means
- Evaluation
  - accuracy, precision, recall, windowdiff
- Estimation
  - uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
- Miscellaneous
  - feature detection, unification, chatbots, many utilities
- NLTK-Contrib
  - many more packages (54k lines of code)
Browse the source code: http://nltk.org/nltk/
Browse the Subversion repository: http://nltk.svn.sourceforge.net/viewvc/nltk/trunk/nltk/nltk/
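
Several of the estimators named under Estimation (maximum likelihood, Lidstone, Laplace, expected likelihood) differ only in how much pseudo-count they add to each outcome before normalizing. The sketch below illustrates that arithmetic in plain Python. It is not NLTK's code: the function names `lidstone_probs` and `unseen_prob`, the toy sentence, and the vocabulary size of 10 are assumptions made for this example. In NLTK itself, the corresponding classes live in the `nltk.probability` module (for example `MLEProbDist`, `LidstoneProbDist`, `LaplaceProbDist`).

```python
from collections import Counter


def lidstone_probs(counts, gamma, bins):
    """Smoothed probability of each observed outcome.

    counts: mapping from outcome to observed count
    gamma:  pseudo-count added to every bin
            (0 = maximum likelihood, 0.5 = expected likelihood, 1 = Laplace)
    bins:   assumed number of possible outcomes, seen or unseen
    """
    total = sum(counts.values())
    denominator = total + gamma * bins
    if denominator == 0:
        raise ValueError("no observations and no smoothing: nothing to estimate")
    return {outcome: (count + gamma) / denominator
            for outcome, count in counts.items()}


def unseen_prob(counts, gamma, bins):
    """Probability given to any single outcome that was never observed."""
    total = sum(counts.values())
    return gamma / (total + gamma * bins)


# Toy sample: 7 tokens, 4 distinct word types.
counts = Counter("the cat saw the dog the cat".split())
BINS = 10  # assumed vocabulary size, chosen only for illustration

mle = lidstone_probs(counts, 0.0, BINS)      # (c) / N
ele = lidstone_probs(counts, 0.5, BINS)      # (c + 0.5) / (N + 0.5 * B)
laplace = lidstone_probs(counts, 1.0, BINS)  # (c + 1) / (N + B)
unseen_laplace = unseen_prob(counts, 1.0, BINS)

for name, dist in (("MLE", mle), ("ELE", ele), ("Laplace", laplace)):
    print(name, {word: round(p, 3) for word, p in sorted(dist.items())})
print("Laplace, each unseen word:", round(unseen_laplace, 3))
```

With maximum likelihood, unseen words get probability zero. The smoothed variants move some probability mass from seen words to the unseen bins, so the estimates over all 10 assumed bins still sum to one.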



