Nltk

Latest version: v3.8.1


Page 10 of 13

0.9.5

Not secure

NLTK:
* text module with support for concordancing, text generation, plotting
* book module
* Major reworking of the automated theorem proving modules (Dan Garrette)
* draw.dispersion now uses pylab
* draw.concordance GUI tool
* nltk.data supports reading corpora and other data files from within zipfiles
* trees can be constructed from strings with Tree(s) (cf Tree.parse(s))
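
The last item in the list above, building a tree from a bracketed string, can be illustrated with a small standard-library sketch. This is not NLTK's parser: the names parse_bracketed and leaves are hypothetical, trees are represented as plain (label, children) tuples, and the input is assumed to be well-formed.

```python
import re


def parse_bracketed(s):
    """Parse a string such as "(S (NP I) (VP saw him))" into nested
    (label, children) tuples. Leaves are plain strings.

    Illustrative only: assumes balanced, well-formed input.
    """
    # Tokens are "(", ")" or any run of characters that is neither
    # whitespace nor a bracket.
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]           # first token after "(" is the label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # nested subtree
            else:
                children.append(tokens[pos])    # leaf word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree):
    """Return the leaf strings of a parsed tree, left to right."""
    _label, children = tree
    out = []
    for child in children:
        if isinstance(child, tuple):
            out.extend(leaves(child))
        else:
            out.append(child)
    return out


tree = parse_bracketed("(S (NP I) (VP (V saw) (NP him)))")
words = leaves(tree)
```

Here tree holds the nested structure and words holds the sentence tokens in order.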

Contrib (work in progress):
* many updates to student projects
  - nltk_contrib.agreement (Thomas Lippincott)
  - nltk_contrib.coref (Joseph Frazee)
  - nltk_contrib.depparser (Jason Narad)
  - nltk_contrib.fuf (Petro Verkhogliad)
  - nltk_contrib.hadoop (Xinfan Meng)
* clean-ups: deleted stale files; moved some packages to misc

Data:
* Cleaned up Gutenberg text corpora
* added Moby Dick; removed redundant copy of Blake songs.
* more tagger models
* renamed to nltk_data to facilitate installation
* stored each corpus as a zip file for quicker installation and access, and to solve a problem with the Propbank corpus, which includes a file whose name (con.xml) is illegal on MS Windows
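
Storing each corpus as a zip file means a data file can be read in place without being unpacked. Because no file is created on disk under the member's name, a name that the operating system rejects does no harm. The sketch below shows the general idea using only Python's standard zipfile module. It is not NLTK's loader: the function read_member, the archive contents, and the member path are all made up for illustration.

```python
import io
import zipfile


def read_member(archive, member, encoding="utf-8"):
    """Return the text of one file stored inside a zip archive.

    `archive` may be a filesystem path or a seekable file-like object.
    The member is read directly from the archive; nothing is extracted.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fh:
            return fh.read().decode(encoding)


# Build a tiny in-memory archive standing in for a zipped corpus
# (hypothetical layout, not the real nltk_data structure).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo_corpus/sample.txt", "Call me Ishmael.\n")

text = read_member(buffer, "demo_corpus/sample.txt")
```

The same function accepts a path to a .zip file on disk in place of the in-memory buffer.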

Book:
* changed filenames to chNN format
* reworked opening chapters (work in progress)

Distributions:
* fixed problem with mac installer that arose when Python binary
couldn't be found
* removed dependency of NLTK on nltk_data so that NLTK code can be
installed before the data

0.9.4

Not secure

NLTK:
- Expanded semantics package for first order logic, linear logic,
glue semantics, DRT, LFG (Dan Garrette)
- new WordSense class in wordnet.synset supporting access to synsets
from sense keys and accessing sense counts (Joel Nothman)
- interface to Mallet's linear chain CRF implementation (nltk.tag.crf)
- misc bugfixes incl Punkt, synsets, maxent
- improved support for chunkers incl flexible chunk corpus reader,
new rule type: ChunkRuleWithContext
- new GUI for pos-tagged concordancing nltk.draw.pos_concordance
- new GUI for developing regexp chunkers nltk.draw.rechunkparser
- added bio_sents() and bio_words() methods to ConllChunkCorpusReader in conll.py
to allow reading (word, tag, chunk_typ) tuples off of CoNLL-2000 corpus. Also
modified ConllChunkCorpusView to support these changes.
- feature structures support values with custom unification methods
- new flag on tagged corpus readers to use simplified tagsets
- new package for ngram language modeling with Katz backoff nltk.model
- added classes for single-parented and multi-parented trees that
automatically maintain parent pointers (nltk.tree.ParentedTree and
nltk.tree.MultiParentedTree)
- new WordNet browser GUI (Jussi Salmela, Paul Bone)
- improved support for lazy sequences
- added generate() method to probability distributions
- more flexible parser for converting bracketed strings to trees
- made fixes to docstrings to improve API documentation
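
The single-parented trees mentioned in the list above keep a parent pointer on every subtree and update it whenever the tree changes. A minimal sketch of that bookkeeping follows. It is an assumption-laden illustration, not the code of nltk.tree.ParentedTree: the class ParentedNode and its methods are hypothetical.

```python
class ParentedNode:
    """Tree node whose parent pointer is maintained automatically.

    Each node may have at most one parent (the single-parented case).
    Children are either ParentedNode instances or plain leaf strings.
    """

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self._children = []
        for child in children:
            self.append(child)

    def append(self, child):
        # Attaching a subtree sets its parent pointer; a subtree that is
        # already attached elsewhere is rejected.
        if isinstance(child, ParentedNode):
            if child.parent is not None:
                raise ValueError("node already has a parent")
            child.parent = self
        self._children.append(child)

    def remove(self, child):
        # Detaching a subtree clears its parent pointer.
        self._children.remove(child)
        if isinstance(child, ParentedNode):
            child.parent = None

    def root(self):
        # Follow parent pointers up to the top of the tree.
        node = self
        while node.parent is not None:
            node = node.parent
        return node


obj = ParentedNode("NP", ["him"])
vp = ParentedNode("VP", ["saw", obj])
s = ParentedNode("S", [ParentedNode("NP", ["I"]), vp])
```

After construction, obj.parent is vp and obj.root() is s. Calling vp.remove(obj) resets obj.parent to None. A multi-parented variant would hold a list of parents instead of a single pointer.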

Contrib (work in progress):
- new NLG package, FUF/SURGE (Petro Verkhogliad)
- new dependency parser package (Jason Narad)
- new Coreference package, incl support for
ACE-2, MUC-6 and MUC-7 corpora (Joseph Frazee)
- CCG Parser (Graeme Gange)
- first order resolution theorem prover (Dan Garrette)

Data:
- New NPS Chat Corpus and corpus reader (nltk.corpus.nps_chat)
- ConllCorpusReader can now be used to read CoNLL 2004 and 2005 corpora.
- Implemented HMM-based Treebank POS tagger and phrase chunker for
nltk_contrib.coref in api.py. Pickled versions of these objects are checked
in under data/taggers and data/chunkers.

Book:
- misc corrections in response to feedback from readers

0.9.3

Not secure

NLTK:
- modified WordNet similarity code to use pre-built information content files
- new classifier-based tagger, BNC corpus reader
- improved unicode support for corpus readers
- improved interfaces to Weka, Prover9/Mace4
- new support for using MEGAM and SciPy to train maxent classifiers
- rewrite of Punkt sentence segmenter (Joel Nothman)
- bugfixes for WordNet information content module (Jordan Boyd-Graber)
- code clean-ups throughout

Book:
- miscellaneous fixes in response to feedback from readers

Contrib:
- implementation of incremental algorithm for generating
referring expressions (contributed by Margaret Mitchell)
- refactoring WordNet browser (Paul Bone)

Corpora:
- included WordNet information content files

0.9.2

NLTK:
- new theorem-prover and model-checker module nltk.inference,
including interface to Prover9/Mace4 (Dan Garrette, Ewan Klein)
- bugfix in Reuters corpus reader that causes Python
to complain about too many open files
- VerbNet and PropBank corpus readers

Data:
- VerbNet Corpus version 2.1: hierarchical verb lexicon linked to WordNet
- PropBank Corpus: predicate-argument structures, as stand-off annotation of Penn Treebank

Contrib:
- New work on WordNet browser, incorporating a client-server model (Jussi Salmela)

Distributions:
- Mac OS 10.5 distribution

0.9.1

NLTK:
- new interface for text categorization corpora
- new corpus readers: RTE, Movie Reviews, Question Classification, Brown Corpus
- bugfix in ConcatenatedCorpusView that caused iteration to fail if it didn't start from the beginning of the corpus

Data:
- Question classification data, included with permission of Li & Roth
- Reuters 21578 Corpus, ApteMod version, from CPAN
- Movie Reviews corpus (sentiment polarity), included with permission of Lillian Lee
- Corpus for Recognising Textual Entailment (RTE) Challenges 1, 2 and 3
- Brown Corpus (reverted to original file structure: ca01-cr09)
- Penn Treebank corpus sample (simplified implementation, new readers treebank_raw and treebank_chunk)
- Minor redesign of corpus readers, to use filenames instead of "items" to identify parts of a corpus

Contrib:
- theorem_prover: Prover9, tableau, MaltParser, Mace4, glue semantics, docs (Dan Garrette, Ewan Klein)
- drt: improved drawing, conversion to FOL (Dan Garrette)
- gluesemantics: GUI demonstration, abstracted LFG code, documentation (Dan Garrette)
- readability: various text readability scores (Thomas Jakobsen, Thomas Skardal)
- toolbox: code to normalize toolbox databases (Greg Aumann)

Book:
- many improvements in early chapters in response to reader feedback
- updates for revised corpus readers
- moved unicode section to chapter 3
- work on engineering.txt (not included in 0.9.1)

Distributions:
- Fixed installation for Mac OS 10.5 (Joshua Ritterman)
- Generalize doctest_driver to work with doc_contrib

0.9

Not secure

NLTK:
- New naming of packages and modules, and more functions imported into
top-level nltk namespace, e.g. nltk.chunk.Regexp -> nltk.RegexpParser,
nltk.tokenize.Line -> nltk.LineTokenizer, nltk.stem.Porter -> nltk.PorterStemmer,
nltk.parse.ShiftReduce -> nltk.ShiftReduceParser
- processing class names changed from verbs to nouns, e.g.
StemI -> StemmerI, ParseI -> ParserI, ChunkParseI -> ChunkParserI, ClassifyI -> ClassifierI
- all tokenizers are now available as subclasses of TokenizeI;
selected tokenizers are also available as functions, e.g. wordpunct_tokenize()
- rewritten ngram tagger code, collapsed lookup tagger with unigram tagger
- improved tagger API, permitting training in the initializer
- new system for deprecating code so that users are notified of name changes.
- support for reading feature cfgs to parallel reading cfgs (parse_featcfg())
- text classifier package, maxent (GIS, IIS), naive Bayes, decision trees, weka support
- more consistent tree printing
- wordnet's morphy stemmer now accessible via stemmer package
- RSLP Portuguese stemmer (originally developed by Viviane Moreira Orengo, reimplemented by Tiago Tresoldi)
- promoted ieer_rels.py to the sem package
- improvements to WordNet package (Jussi Salmela)
- more regression tests, and support for checking coverage of tests
- miscellaneous bugfixes
- removed numpy dependency
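
With this many renames in one release, the deprecation system listed above is what tells users that an old name still works but has been replaced. One common way to build such a mechanism in Python is a decorator that emits a DeprecationWarning. The sketch below shows that general pattern only; it is not NLTK's implementation, and the names deprecated, tokenise, and tokenize_words are hypothetical.

```python
import functools
import warnings


def deprecated(message):
    """Decorator factory: mark a function as deprecated.

    Each call to the wrapped function emits a DeprecationWarning that
    includes `message`, then runs the function as usual.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "%s is deprecated: %s" % (func.__name__, message),
                DeprecationWarning,
                stacklevel=2,   # attribute the warning to the caller
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def tokenize_words(text):
    """Hypothetical new name: split text on whitespace."""
    return text.split()


@deprecated("use tokenize_words() instead")
def tokenise(text):
    """Hypothetical old name, kept as a thin alias for compatibility."""
    return tokenize_words(text)


# Record warnings so the effect of calling the old name can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = tokenise("a b c")
```

The old name keeps returning the same result as the new one, and caught holds the single warning that points users to the replacement.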

Data:
- new corpus reader implementation, refactored syntax corpus readers
- new data package: corpora, grammars, tokenizers, stemmers, samples
- CESS-ESP Spanish Treebank and corpus reader
- CESS-CAT Catalan Treebank and corpus reader
- Alpino Dutch Treebank and corpus reader
- MacMorpho POS-tagged Brazilian Portuguese news text and corpus reader
- trained model for Portuguese sentence segmenter
- Floresta Portuguese Treebank version 7.4 and corpus reader
- TIMIT player audio support

Contrib:
- BioReader (contributed by Carlos Rodriguez)
- TnT tagger (contributed by Sam Huston)
- wordnet browser (contributed by Jussi Salmela, requires wxpython)
- lpath interpreter (contributed by Haejoong Lee)
- timex -- regular expression-based temporal expression tagger

Book:
- polishing of early chapters
- introductions to parts 1, 2, 3
- improvements in book processing software (xrefs, avm & gloss formatting, javascript clipboard)
- updates to book organization, chapter contents
- corrections throughout suggested by readers (acknowledged in preface)
- more consistent use of US spelling throughout
- all examples redone to work with single import statement: "import nltk"
- reordered chapters: 5->7->8->9->11->12->5
  * language engineering in part 1 to broaden the appeal
    of the earlier part of the book and to talk more about
    evaluation and baselines at an earlier stage
  * concentrate the partial and full parsing material in part 2,
    and remove the specialized feature-grammar material into part 3

Distributions:
- streamlined mac installation (Joshua Ritterman)
- included mac distribution with ISO image
