Gensim

Latest version: v4.3.2

0.13.0

* Added Distance Metrics to matutils.py (bhargavvader, 656)
* Tutorials migrated from website to ipynb (j9chan, 721), (jesford, 733), (jesford, 725), (jesford, 716)
* New doc2vec intro tutorial (seanlaw, 730)
* Gensim Quick Start Tutorial (andrewjlm, 727)
* Add export_phrases(sentences) to the Phrases model (hanabi1224, 588)
* SparseMatrixSimilarity returns a sparse matrix if `maintain_sparsity` is True (davechallis, 590)
* Added functionality for topics of words in a document, i.e. dynamic topics (bhargavvader, 704)
- also included a tutorial which explains the new functionality and document word-topic coloring.
* Made normalization an explicit transformation. Added 'l1' norm support (dsquareindia, 649)
* Added term-topics API for the most probable topics of a word in the vocab (bhargavvader, 706)
* build_vocab takes a progress_per parameter to control how often progress is logged, for smaller log output (zer0n, 624)
* Control whether to use lowercase for computing word2vec accuracy. (alantian, 607)
* Easy import of GloVe vectors using Gensim (Manas Ranjan Kar, 625)
- Allow easy port of GloVe vectors into Gensim
- Standalone script with command line arguments, compatible with Python>=2.6
- Usage: python -m gensim.scripts.glove2word2vec -i glove_vectors.txt -o output_word2vec_compatible.txt
* Add `similar_by_word()` and `similar_by_vector()` to word2vec (isohyt, 381)
* Convenience method in doc2vec for the similarity of two sentences not seen during training (ellolo, 707)
* Dynamic Topic Modelling Tutorial updated with Dynamic Influence Model (bhargavvader, 689)
* Added function to filter 'n' most frequent words from the dictionary (abhinavchawla, 718)
* Raise warnings in word2vec/doc2vec if the vocab consists of single-character elements and if alpha is increased (dsquareindia, 705)
* Tests for wikidump (jonmcoe, 723)
* Mallet wrapper sparse format support (RishabGoel, 664)
* Doc2vec pre-processing script translated from bash to Python (andrewjlm, 720)
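
The distance metrics added to matutils compare probability distributions such as LDA topic mixtures. As an illustration only, the sketch below computes one such metric, the Hellinger distance, in plain Python; it assumes dense, equal-length inputs that each sum to 1, and it is not gensim's own matutils code.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two dense discrete probability distributions.

    Assumes both inputs have equal length and sum to 1 (illustrative sketch).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2)), bounded to [0, 1] for distributions
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Identical distributions are at distance 0; non-overlapping ones at distance 1.
print(hellinger([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(hellinger([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Because the value stays between 0 and 1 for probability distributions, it is convenient for thresholding how different two documents' topic mixtures are.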

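The GloVe import script exists because GloVe's text output has no header line, while the word2vec text format starts with a "<number of vectors> <dimensions>" line. A minimal in-memory sketch of that conversion follows; the function name is hypothetical, and unlike the real script it works on lists of lines rather than the -i/-o file arguments.

```python
def glove_to_word2vec_lines(glove_lines):
    """Return word2vec-text-format lines for the given GloVe-format lines.

    GloVe text files have one "word v1 v2 ... vN" entry per line and no header;
    the word2vec text format expects a "<num_vectors> <dimensions>" header first.
    Hypothetical helper for illustration; assumes words contain no spaces.
    """
    lines = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not lines:
        raise ValueError("no vectors found")
    dims = len(lines[0].split()) - 1  # the first token on each line is the word
    return ["%d %d" % (len(lines), dims)] + lines


glove = ["the 0.1 0.2 0.3\n", "cat 0.4 0.5 0.6\n"]
for line in glove_to_word2vec_lines(glove):
    print(line)
# 2 3
# the 0.1 0.2 0.3
# cat 0.4 0.5 0.6
```
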
0.12.4

* Better internal handling of job batching in word2vec (535)
- up to 300% speed up when training on very short documents (~tweets)
* Word2vec CLI in line with original word2vec.c (Andrey Kutuzov, 538)
- Same default values. See diff https://github.com/akutuzov/gensim/commit/6456cbcd75e6f8720451766ba31cc046b4463ae2
- Standalone script with command line arguments matching those of original C tool.
- Usage: python -m gensim.scripts.word2vec_standalone -train data.txt -output trained_vec.txt -size 200 -window 2 -sample 1e-4
* Improved load_word2vec_format() performance (svenkreiss, 555)
- Remove `init_sims()` call for performance improvements when normalized vectors are not needed.
- Remove `norm_only` parameter (API change). Call `init_sims(replace=True)` after the `load_word2vec_format()` call for the old `norm_only=True` behavior.
* Word2vec allows non-strict unicode error handling (ignore or replace) (Gordon Mohr, 466)
* Doc2Vec `model.docvecs[key]` now raises KeyError for unknown keys (Gordon Mohr, 520)
* Fix `DocvecsArray.index_to_doctag` so `most_similar()` returns string doctags (Gordon Mohr, 560)
* On-demand loading of the `pattern` library in utils.lemmatize (Jan Zikes, 461)
- `utils.HAS_PATTERN` flag moved to `utils.has_pattern()`
* Threadsafe Word2Vec/Doc2Vec finish-check to avoid hanging or never-ending Word2Vec/Doc2Vec training (Gordon Mohr, 571)
* Tuned `TestWord2VecModel.test_cbow_hs()` against random failures (Gordon Mohr, 531)
* Prevent ZeroDivisionError when `default_timer()` indicates no elapsed time (Gordon Mohr, 518)
* Forwards compatibility for NumPy > 1.10 (Matti Lyra, 494, 513)
- LdaModel and LdaMulticore produce a large number of DeprecationWarnings from
.inference() because the term ids in each chunk returned from utils.grouper
are floats. This behaviour has been changed so that the term IDs are now ints.
- utils.grouper returns a Python list instead of a NumPy array in .update() when
LdaModel is called in non-distributed mode
- in distributed mode .update() will still call utils.grouper with as_numpy=True
to save memory
- LdaModel.update and LdaMulticore.update have a new keyword parameter
chunks_as_numpy=True/False (defaults to False) that allows controlling
this behaviour
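
The grouper change is easier to see in code. Below is a simplified, stdlib-only sketch of a chunking helper that yields plain Python lists, the non-distributed behaviour described above; it is an illustration rather than gensim's utils.grouper, and it leaves out the as_numpy path. Because each document stays a list of (term_id, count) tuples, the integer term ids are never coerced to floats.

```python
from itertools import islice


def grouper(iterable, chunksize):
    """Yield successive chunks of up to `chunksize` items as plain Python lists."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


# A toy bag-of-words corpus: each document is a list of (term_id, count) pairs.
corpus = [[(0, 1), (3, 2)], [(1, 1)], [(2, 4)], [(0, 1), (1, 1)], [(4, 1)]]
print([len(chunk) for chunk in grouper(corpus, 2)])  # [2, 2, 1]
```
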

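For the old norm_only=True behaviour, the note above says to call init_sims(replace=True) after load_word2vec_format(). What that replace step amounts to can be sketched in plain Python: scale each vector to unit L2 length and overwrite the original, so only one copy is kept in memory. The function below is hypothetical and operates on lists of floats, not on a gensim model.

```python
import math


def normalize_in_place(vectors):
    """Scale every vector to unit L2 length, overwriting the originals.

    Hypothetical sketch of the idea behind init_sims(replace=True): keep only
    the normalized vectors instead of holding a second, normalized copy.
    """
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:  # leave all-zero vectors untouched to avoid division by zero
            for i, x in enumerate(vec):
                vec[i] = x / norm


vectors = [[3.0, 4.0], [0.0, 2.0]]
normalize_in_place(vectors)
print(vectors)  # [[0.6, 0.8], [0.0, 1.0]]
```
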
0.12.3

* Make show_topics return value consistent across models (Christopher Corley, 448)
- All models with the `show_topics` method should return a list of
`(topic_number, topic)` tuples, where `topic` is a list of
`(word, probability)` tuples.
- This is a breaking change that affects users of the `LsiModel`, `LdaModel`,
and `LdaMulticore` that may be reliant on the old tuple layout of
`(probability, word)`.
* Mixed integer & string document-tags (keys to doc-vectors) now work (Gordon Mohr, 491)
- DocvecsArray's `index2doctag` list is renamed/reinterpreted as `offset2doctag`
- `offset2doctag` entries map to `doctag_syn0` indexes *after* last plain-int doctag (if any)
- (If using only string doctags, `offset2doctag` may be interpreted the same as `index2doctag`.)
* New Tutorials on Dynamic Topic Modelling and Classification via Word2Vec (arttii 471, mataddy 500)
* Auto-learning for the eta parameter on the LdaModel (Christopher Corley, 479)
* Python 3.5 support
* Speed improvements to keyword and summarisation methods (erbas 441)
* OSX wheels (504)
* Win build (492)
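
Code written against the old (probability, word) tuple order can be adapted to the new show_topics layout with a small shim. The helper below is hypothetical (not part of gensim) and assumes the new layout exactly as described above, a list of (topic_number, [(word, probability), ...]) tuples; the sample topics are made up.

```python
def to_legacy_pairs(topics):
    """Flip each (word, probability) pair back to the old (probability, word) order.

    `topics` follows the new layout: [(topic_number, [(word, probability), ...]), ...].
    Returns {topic_number: [(probability, word), ...]}.
    """
    return {topic_id: [(prob, word) for word, prob in pairs] for topic_id, pairs in topics}


new_style = [(0, [("graph", 0.12), ("trees", 0.08)]), (1, [("user", 0.2)])]
print(to_legacy_pairs(new_style))
# {0: [(0.12, 'graph'), (0.08, 'trees')], 1: [(0.2, 'user')]}
```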

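The mixed-doctag indexing rule in the 0.12.3 notes can be sketched as follows: plain-int doctags index rows directly, and string doctags are placed after the largest plain-int doctag, in first-seen order. Only the offset2doctag name comes from the changelog; the DoctagIndex class, its methods, and the max_rawint bookkeeping are assumptions made for illustration.

```python
class DoctagIndex:
    """Hypothetical sketch: map mixed int/string doctags to doc-vector rows."""

    def __init__(self):
        self.max_rawint = -1     # largest plain-int doctag seen so far
        self.offset2doctag = []  # string doctags, in first-seen order
        self._offsets = {}       # string doctag -> offset into offset2doctag

    def add(self, tag):
        if isinstance(tag, int):
            self.max_rawint = max(self.max_rawint, tag)
        elif tag not in self._offsets:
            self._offsets[tag] = len(self.offset2doctag)
            self.offset2doctag.append(tag)

    def index(self, tag):
        """Row of `tag`; raises KeyError for unknown string tags."""
        if isinstance(tag, int):
            return tag  # plain ints index rows directly
        return self.max_rawint + 1 + self._offsets[tag]  # strings come after the ints


idx = DoctagIndex()
for tag in [0, 1, "intro", 2, "summary"]:
    idx.add(tag)
print([idx.index(t) for t in [0, 2, "intro", "summary"]])  # [0, 2, 3, 4]
```

Because string rows are offset by the largest int doctag, their positions are only final once every doctag has been seen, so in this sketch lookups are meant to happen after the full vocabulary scan.
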
0.12.2

* tutorial on text summarization (Ólavur Mortensen, 436)
* more flexible vocabulary construction in word2vec & doc2vec (Philipp Dowling, 434)
* added support for sliced TransformedCorpus objects, so that after applying (for instance) TfidfModel the returned corpus remains randomly indexable. (Matti Lyra, 425)
* changed LdaModel.save so that a custom `ignore` list can be passed in (Matti Lyra, 331)
* added support for NumPy style fancy indexing to corpus objects (Matti Lyra, 414)
* py3k fix in distributed LSI (spacecowboy, 433)
* Windows fix for setup.py (428)
* fix compatibility for scipy 0.16.0 (415)
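
The slicing and fancy-indexing additions mean a corpus can be indexed with a slice or a list of positions and still give back something corpus-like. A minimal sketch of that interface over an in-memory list follows; the ListCorpus class is hypothetical, accepts only lists/tuples for fancy indexing (no NumPy arrays), and does not reproduce gensim's lazy TransformedCorpus behaviour.

```python
class ListCorpus:
    """Hypothetical in-memory corpus supporting int, slice and list-of-int indexing."""

    def __init__(self, docs):
        self.docs = list(docs)

    def __len__(self):
        return len(self.docs)

    def __iter__(self):
        return iter(self.docs)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return ListCorpus(self.docs[key])  # a slice stays a corpus object
        if isinstance(key, (list, tuple)):
            return ListCorpus(self.docs[i] for i in key)  # fancy indexing
        return self.docs[key]  # a single int returns one document


corpus = ListCorpus([[(0, 1)], [(1, 2)], [(2, 3)], [(3, 4)]])
print(list(corpus[1:3]))     # [[(1, 2)], [(2, 3)]]
print(list(corpus[[0, 3]]))  # [[(0, 1)], [(3, 4)]]
print(corpus[2])             # [(2, 3)]
```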

0.12.1

* improvements to testing, switch to Travis CI containers
* support for loading old word2vec models (<=0.11.1) in 0.12+ (Gordon Mohr, 405)
* various bug fixes to word2vec, doc2vec (Gordon Mohr, 393, 386, 404)
* TextSummarization support for very short texts (Federico Barrios, 390)
* support for word2vec[['word1', 'word2'...]] convenience API calls (Satish Palaniappan, 395)
* MatrixSimilarity supports indexing generator corpora (single pass)
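
The word2vec[['word1', 'word2'...]] convenience call lets one indexing expression fetch several vectors. The sketch below shows the dispatch on a single word versus a list of words with a hypothetical TinyVectors class and made-up vectors; it returns plain lists, whereas gensim works with NumPy arrays.

```python
class TinyVectors:
    """Hypothetical word -> vector store whose [] accepts one word or a list of words."""

    def __init__(self, vectors):
        self.vectors = dict(vectors)

    def __getitem__(self, words):
        if isinstance(words, str):
            return self.vectors[words]  # single word -> single vector
        return [self.vectors[w] for w in words]  # list of words -> list of vectors


vecs = TinyVectors({"king": [0.5, 0.1], "queen": [0.4, 0.2]})
print(vecs["king"])             # [0.5, 0.1]
print(vecs[["king", "queen"]])  # [[0.5, 0.1], [0.4, 0.2]]
```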

0.12.0

* complete API, performance, memory overhaul of doc2vec (Gordon Mohr, 356, 373, 380, 384)
- fast infer_vector(); optional memory-mapped doc vectors; memory savings with int doc IDs
- 'dbow_words' for combined DBOW & word skip-gram training; new 'dm_concat' mode
- multithreading & negative-sampling optimizations (also benefitting word2vec)
- API NOTE: doc vectors must now be accessed/compared through model's 'docvecs' field
(e.g. "model.docvecs['my_ID']" or "model.docvecs.most_similar('my_ID')")
- https://github.com/piskvorky/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb
* new "text summarization" module (PR 324: Federico Lopez, Federico Barrios)
- https://github.com/summanlp/docs/raw/master/articulo/articulo-en.pdf
* new matutils.argsort with partial sort
- performance speedups to all similarity queries (word2vec, Similarity classes...)
* word2vec can compute likelihood scores for classification (Mat Addy, 358)
- http://arxiv.org/abs/1504.07295
- http://nbviewer.ipython.org/github/taddylab/deepir/blob/master/w2v-inversion.ipynb
* word2vec supports "encoding" parameter when loading from C format, for non-utf8 models
* more memory-efficient word2vec training (385)
* fixes to Python3 compatibility (Pavel Kalaidin 330, S-Eugene 369)
* enhancements to save/load format (Liang Bo Wang 363, Gordon Mohr 356)
- pickle defaults to protocol=2 for better py3 compatibility
* fixes and improvements to wiki parsing (Lukas Elmer 357, Excellent5 333)
* fix to phrases scoring (Ikuya Yamada, 353)
* speed up of phrases generation (Dave Challis, 349)
* changes to multipass LDA training (Christopher Corley, 298)
* various doc improvements and fixes (Matti Lyra 331, Hongjoo Lee 334)
* fixes and improvements to LDA (Christopher Corley 323)
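
The partial-sort idea behind the new matutils.argsort is that a top-n similarity query does not need the whole score array ordered. The changelog does not spell out the implementation, so this sketch swaps in the standard library's heapq.nlargest to show the same idea; topn_indices is a hypothetical name.

```python
import heapq


def topn_indices(scores, n):
    """Indices of the `n` largest scores, best first, without fully sorting.

    heapq.nlargest takes roughly O(len(scores) * log n) time rather than the
    O(len(scores) * log len(scores)) of a full sort, which pays off for small n.
    """
    return heapq.nlargest(n, range(len(scores)), key=scores.__getitem__)


sims = [0.10, 0.93, 0.42, 0.87, 0.05]
print(topn_indices(sims, 2))  # [1, 3]
```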
