Nltk

Latest version: v3.8.1



0.8

Not secure
Code:
- changed nltk.__init__ imports to explicitly import names from top-level modules
- changed corpus.util to use the 'rb' flag for opening files, to fix problems
  reading corpora under MS Windows
- updated stale examples in engineering.txt
- extended feature structure interface to permit chained features, e.g. fs['F','G']
- further misc improvements to test code plus some bugfixes
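
The chained-feature access mentioned above (fs['F','G']) can be pictured with a small sketch. This is not NLTK's featstruct code: ChainedFeatures is a hypothetical dict subclass, written only to show how a tuple key can be read as a path through nested structures.

```python
class ChainedFeatures(dict):
    """Hypothetical stand-in for a feature structure (not NLTK's class)."""

    def __getitem__(self, key):
        # A tuple key such as fs['F', 'G'] is treated as a path:
        # look up 'F' first, then 'G' inside the result.
        if isinstance(key, tuple):
            value = self
            for name in key:
                value = value[name]
            return value
        return super().__getitem__(key)


fs = ChainedFeatures(F=ChainedFeatures(G="x"), H="y")
chained = fs["F", "G"]   # chained form
nested = fs["F"]["G"]    # equivalent step-by-step form
```

Under this assumption the chained form is just shorthand for repeated indexing, so both lookups return the same value.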
Tutorials:
- rewritten opening section of tagging chapter
- reorganized some exercises

0.8b2

Code (major):
- new corpus package, obsoleting old corpora package
  - supports caching, slicing, corpus search path
  - more flexible API
- global updates so all NLTK modules use new corpus package
- moved nltk/contrib to separate top-level package nltk_contrib
- changed wordpunct tokenizer to use \w instead of a-zA-Z0-9
  as this will be more robust for languages other than English,
  with implications for many corpus readers that use it
- known bug: certain re-entrant structures in featstruct
- known bug: when the LHS of an edge contains an ApplicationExpression,
  variable values in the RHS bindings aren't copied over when the
  fundamental rule applies
- known bug: HMM tagger is broken
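
The effect of the wordpunct change from a-zA-Z0-9 to \w can be illustrated with the standard re module alone. The two patterns below are assumptions that approximate the old and new tokenizer behaviour; they are not copied from the NLTK source of this release.

```python
import re

# Assumed approximations of the tokenizer patterns, for illustration only.
ASCII_PATTERN = r"[a-zA-Z0-9]+|[^a-zA-Z0-9\s]+"   # old style: ASCII letters and digits
WORD_PATTERN = r"\w+|[^\w\s]+"                     # new style: \w word characters

text = "na\u00efve caf\u00e9"   # "naïve café", written with escapes to be explicit

ascii_tokens = re.findall(ASCII_PATTERN, text)
word_tokens = re.findall(WORD_PATTERN, text)

print(ascii_tokens)   # accented letters are split off as separate tokens
print(word_tokens)    # each word stays in one piece
```

In Python 3, \w matches Unicode word characters for str patterns, so accented letters stay inside their word instead of being treated as punctuation.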
Tutorials:
- global updates to NLTK and docs
- ongoing polishing
Corpora:
- treebank sample reverted to published multi-file structure
Contrib:
- DRT and Glue Semantics code (nltk_contrib.drt, nltk_contrib.gluesemantics, by Dan Garrette)

0.8b1

Code (major):
- changed package name to nltk
- import all top-level modules into nltk, reducing need for import statements
- reorganization of sub-package structures to simplify imports
- new featstruct module, unifying old featurelite and featurestructure modules
- FreqDist now inherits from dict, fd.count(sample) becomes fd[sample]
- FreqDist initializer permits: fd = FreqDist(len(token) for token in text)
- made numpy optional
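
The dict-style FreqDist usage described above can be tried without installing NLTK. The sketch below uses collections.Counter as a stand-in (an assumption made so the example runs on the standard library alone); it accepts the same kind of generator expression and supports the same fd[sample] lookup.

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog".split()

# Same shape as: fd = FreqDist(len(token) for token in text)
fd = Counter(len(token) for token in text)

# Dict-style lookup replaces the older fd.count(sample) call.
count_of_three = fd[3]
print(count_of_three)
```

Counter returns 0 for a sample it has never seen; whether FreqDist in this release behaved the same way is not stated in the changelog.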
Code (minor):
- changed GrammarFile initializer to accept filename
- consistent tree display format
- fixed loading process for WordNet and TIMIT that prevented code installation if data not installed
- taken more care with unicode types
- incorporated pcfg code into cfg module
- moved cfg, tree, featstruct to top level
- new filebroker module to make handling of example grammar files more transparent
- more corpus readers (webtext, abc)
- added cfg.covers() to check that a grammar covers a sentence
- simple text-based wordnet browser
- known bug: parse/featurechart.py uses incorrect apply() function
Corpora:
- csv data file to document NLTK corpora
Contrib:
- added Glue semantics code (contrib.glue, by Dan Garrette)
- Punkt sentence segmenter port (contrib.punkt, by Willy)
- added LPath interpreter (contrib.lpath, by Haejoong Lee)
- extensive work on classifiers (contrib.classifier*, Sumukh Ghodke)
Tutorials:
- polishing on parts I, II
- more illustrations, data plots, summaries, exercises
- continuing to make prose more accessible to non-linguistic audience
- new default import that all chapters presume: from nltk.book import *
Distributions:
- updated to latest version of numpy
- removed WordNet installation instructions as WordNet is now included in corpus distribution
- added pylab (matplotlib)

0.7.5

Code:
- improved WordNet and WordNet-Similarity interface
- the Lancaster Stemmer (contributed by Steven Tomcavage)
Corpora:
- Web text samples
- BioCreAtIvE-PPI - a corpus for protein-protein interactions
- Switchboard Telephone Speech Corpus Sample (via Talkbank)
- CMU Problem Reports Corpus sample
- CONLL2002 POS+NER data
- Patient Information Leaflet corpus
- WordNet 3.0 data files
- English wordlists: basic English, frequent words
Tutorials:
- more improvements to text and images

0.7.4

Code:
- Indian POS tagged corpus reader: corpora.indian
- Sinica Treebank corpus reader: corpora.sinica_treebank
- new web corpus reader corpora.web
- tag package now supports pickling
- added function to utilities.py to guess character encoding
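
One simple way to guess a character encoding is to try a list of candidates in order and keep the first that decodes without error. The sketch below shows that idea only; guess_encoding and its candidate list are hypothetical, and the changelog does not say how the function in utilities.py works.

```python
def guess_encoding(data, candidates=("ascii", "utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `data`, or None.

    Hypothetical helper for illustration; not the NLTK function.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


print(guess_encoding(b"plain text"))
print(guess_encoding("caf\u00e9".encode("utf-8")))
print(guess_encoding(b"caf\xe9"))
```

Order matters here: latin-1 accepts any byte sequence, so it only makes sense as the last candidate.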
Corpora:
- Rotokas texts from Stuart Robinson
- POS-tagged corpora for several Indian languages (Bangla, Hindi, Marathi, Telugu) from A Kumaran
Tutorials:
- Substantial work on Part II of book on structured programming, parsing and grammar
- More bibliographic citations
- Improvements in typesetting, cross references
- Redimensioned images and tables for better use of page space
- Moved project list to wiki
Contrib:
- validation of toolbox entries using chunking
- improved classifiers
Distribution:
- updated for Python 2.5.1, NumPy 1.0.2

0.7.3

* Code:
- made chunk.Regexp.parse() more flexible about its input
- developed new syntax for PCFG grammars, e.g. A -> B C [0.3] | D [0.7]
- fixed CFG parser to support grammars with slash categories
- moved beta classify package from main NLTK to contrib
- Brill taggers loaded correctly
- misc bugfixes
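
The PCFG production syntax quoted above, A -> B C [0.3] | D [0.7], attaches a bracketed probability to each alternative. The sketch below reads one such line using only the standard library; parse_pcfg_line is a hypothetical helper and not the NLTK grammar parser.

```python
import re


def parse_pcfg_line(line):
    """Split 'A -> B C [0.3] | D [0.7]' into (lhs, rhs_symbols, probability) tuples.

    Hypothetical helper for illustration; not part of NLTK.
    """
    lhs, rhs_part = (side.strip() for side in line.split("->", 1))
    productions = []
    for alternative in rhs_part.split("|"):
        # Each alternative is a run of symbols followed by [probability].
        match = re.fullmatch(r"\s*(.*?)\s*\[([0-9.]+)\]\s*", alternative)
        if match is None:
            raise ValueError(f"missing probability in {alternative!r}")
        symbols = tuple(match.group(1).split())
        productions.append((lhs, symbols, float(match.group(2))))
    return productions


rules = parse_pcfg_line("A -> B C [0.3] | D [0.7]")
total = sum(prob for _, _, prob in rules)
print(rules, total)
```

A useful sanity check on such a grammar is that the probabilities of all alternatives for one left-hand side sum to 1, as they do in the example line.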
* Corpora:
- Shakespeare XML corpus sample and corpus reader
* Tutorials:
- improvements to prose, exercises, plots, images
- expanded and reorganized tutorial on structured programming
- formatting improvements for Python listings
- improved plots (using pylab)
- categorization of problems by difficulty
* Contrib:
- more work on kimmo lexicon and grammar
- more work on classifiers
