Natural Language Toolkit

...software, data sets and tutorials for natural language processing...

Code

 

NLTK includes the following software modules (47k lines of Python code):

Corpus readers
    interfaces to many corpora
Tokenizers
    whitespace, newline, blankline, word, wordpunct, treebank, sexpr, regexp, Punkt sentence segmenter
Stemmers
    Porter, Lancaster, regexp
Taggers
    regexp, n-gram, backoff, Brill, HMM
Parsers
    recursive descent, shift-reduce, chunk, chart, feature-based, probabilistic, ...
Semantic interpretation
    untyped lambda calculus, first-order models, parser interface
WordNet
    WordNet interface, lexical relations, similarity
Classifiers
    decision tree, maximum entropy, naive Bayes, Weka interface
Clusterers
    expectation maximization, agglomerative, k-means
Evaluation
    accuracy, precision, recall, windowdiff
Estimation
    uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
Miscellaneous
    feature detection, unification, chatbots, many utilities
NLTK-Contrib
    many more packages (54k lines of code)
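
To give a flavour of what a few of the listed techniques do, here is a minimal standalone sketch in plain Python. It does not use NLTK and is not NLTK's implementation: the function names `wordpunct_tokenize`, `lidstone_prob` and `precision_recall` are hypothetical, chosen only for this illustration. It shows a wordpunct-style regexp tokenizer, Lidstone estimation (of which Laplace is the special case gamma = 1 and maximum likelihood the case gamma = 0), and set-based precision and recall.

```python
import re
from collections import Counter


def wordpunct_tokenize(text):
    """Split text into runs of word characters and runs of punctuation.

    Hypothetical helper; a sketch of a wordpunct-style regexp tokenizer.
    """
    return re.findall(r"\w+|[^\w\s]+", text)


def lidstone_prob(counts, sample, gamma, bins):
    """Lidstone estimate: (count + gamma) / (N + gamma * bins).

    counts: Counter of observed samples; N is the total number of observations.
    bins:   number of possible sample values (an assumption the caller supplies).
    gamma:  smoothing constant; 1 gives Laplace, 0 gives maximum likelihood.
    """
    total = sum(counts.values())
    return (counts[sample] + gamma) / (total + gamma * bins)


def precision_recall(reference, test):
    """Precision and recall of a set of test items against a reference set."""
    true_positives = len(reference & test)
    precision = true_positives / len(test) if test else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


tokens = wordpunct_tokenize("Good muffins cost $3.88 in New York.")
counts = Counter(tokens)

# Laplace-smoothed probability of "." with one bin per observed token type.
laplace_dot = lidstone_prob(counts, ".", gamma=1.0, bins=len(counts))

# An unseen token still gets non-zero probability under Laplace smoothing.
laplace_unseen = lidstone_prob(counts, "bagels", gamma=1.0, bins=len(counts))

print(tokens)
print(laplace_dot, laplace_unseen)
print(precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"}))
```

Setting `bins` to the number of observed types is a simplification made for this sketch; in practice the number of bins usually reflects the full vocabulary, including unseen items.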

Browse the source code: http://nltk.org/nltk/

Browse the Subversion repository: http://nltk.svn.sourceforge.net/viewvc/nltk/trunk/nltk/nltk/
