Gensim

Latest version: v4.3.2

Page 7 of 15

1.0.0rc1

New features:
* Add Author-topic modeling (olavurmortensen, [893](https://github.com/RaRe-Technologies/gensim/pull/893))
* Add FastText word embedding wrapper (Jayantj, [847](https://github.com/RaRe-Technologies/gensim/pull/847))
* Add WordRank word embedding wrapper (parulsethi, [1066](https://github.com/RaRe-Technologies/gensim/pull/1066), [#1125](https://github.com/RaRe-Technologies/gensim/pull/1125))
* Add sklearn wrapper for LDAModel (AadityaJ, [932](https://github.com/RaRe-Technologies/gensim/pull/932))

Improvements:
* Python 3.6 support (tmylk, [1077](https://github.com/RaRe-Technologies/gensim/pull/1077))
* Phrases and Phraser allow a generator corpus (ELind77, [1099](https://github.com/RaRe-Technologies/gensim/pull/1099))
* Ignore DocvecsArray.doctag_syn0norm in save. Fix 789 (accraze, [1053](https://github.com/RaRe-Technologies/gensim/pull/1053))
* Move load and save word2vec_format out of word2vec class to KeyedVectors (tmylk, [1107](https://github.com/RaRe-Technologies/gensim/pull/1107))
* Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (cvangysel, [1103](https://github.com/RaRe-Technologies/gensim/pull/1103))
* Fix broken link to paper in readme (bhargavvader, [1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
* Lazy formatting in evaluate_word_pairs (akutuzov, [1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
* Deacc option to keywords pre-processing (bhargavvader, [1076](https://github.com/RaRe-Technologies/gensim/pull/1076))

Tutorial and doc improvements:

* Clarifying comment in is_corpus func in utils.py (greninja, [1109](https://github.com/RaRe-Technologies/gensim/pull/1109))
* Tutorial Topics_and_Transformations fix markdown and add references (lgmoneda, [1120](https://github.com/RaRe-Technologies/gensim/pull/1120))
* Fix doc2vec-lee.ipynb results to match previous behavior (bahbbc, [1119](https://github.com/RaRe-Technologies/gensim/pull/1119))
* Remove Pattern lib dependency in News Classification tutorial (luizcavalcanti, [1118](https://github.com/RaRe-Technologies/gensim/pull/1118))
* Corpora_and_Vector_Spaces tutorial text clarification (lgmoneda, [1116](https://github.com/RaRe-Technologies/gensim/pull/1116))
* Update Transformation and Topics link from quick start notebook (mariana393, [1115](https://github.com/RaRe-Technologies/gensim/pull/1115))
* Quick Start Text clarification and typo correction (luizcavalcanti, [1114](https://github.com/RaRe-Technologies/gensim/pull/1114))
* Fix typos in Author-topic tutorial (Fil, [1102](https://github.com/RaRe-Technologies/gensim/pull/1102))
* Address benchmark inconsistencies in Annoy tutorial (droudy, [1113](https://github.com/RaRe-Technologies/gensim/pull/1113))

0.13.4.1

* Disable direct access warnings on save and load of Word2vec/Doc2vec (tmylk, [1072](https://github.com/RaRe-Technologies/gensim/pull/1072))
* Making Default hs error explicit (accraze, [1054](https://github.com/RaRe-Technologies/gensim/pull/1054))
* Removed unnecessary numpy imports (bhargavvader, [1065](https://github.com/RaRe-Technologies/gensim/pull/1065))
* Utils and Matutils changes (bhargavvader, [1062](https://github.com/RaRe-Technologies/gensim/pull/1062))
* Tests for the evaluate_word_pairs function (akutuzov, [1061](https://github.com/RaRe-Technologies/gensim/pull/1061))

0.13.4

* Added suggested lda model method and print methods to HDP class (bhargavvader, [1055](https://github.com/RaRe-Technologies/gensim/pull/1055))
* New class KeyedVectors to store embedding separate from training code (anmol01gulati and droudy, [980](https://github.com/RaRe-Technologies/gensim/pull/980))
* Evaluation of word2vec models against semantic similarity datasets like SimLex-999 (akutuzov, [1047](https://github.com/RaRe-Technologies/gensim/pull/1047))
* TensorBoard word embedding visualisation of Gensim Word2vec format (loretoparisi, [1051](https://github.com/RaRe-Technologies/gensim/pull/1051))
* Throw exception if load() is called on instance rather than the class in word2vec and doc2vec (dust0x, [889](https://github.com/RaRe-Technologies/gensim/pull/889))
* Loading and Saving LDA Models across Python 2 and 3. Fix 853 (anmolgulati, [913](https://github.com/RaRe-Technologies/gensim/pull/913), [#1093](https://github.com/RaRe-Technologies/gensim/pull/1093))
* Fix automatic learning of eta (prior over words) in LDA (olavurmortensen, [1024](https://github.com/RaRe-Technologies/gensim/pull/1024)).
- eta should have dimensionality V (size of vocab), not K (number of topics). eta with shape K x V is still allowed, as the user may want to impose specific prior information on each topic.
- eta no longer accepts the "asymmetric" option. Asymmetric priors over words in general are fine (learned or user defined).
- As a result, the eta update (`update_eta`) was simplified somewhat. It also no longer logs eta when updated, because it is too large for that.
- Unit tests were updated accordingly: they expect a different shape than before, some became redundant after the change, and `eta='asymmetric'` should now raise an error.
* Optimise show_topics to only call get_lambda once. Fix 1006. (bhargavvader, [1028](https://github.com/RaRe-Technologies/gensim/pull/1028))
* HdpModel doc improvement. Inference and print_topics (dsquareindia, [1029](https://github.com/RaRe-Technologies/gensim/pull/1029))
* Removing Doc2Vec defaults so that it won't override Word2Vec defaults. Fix 795. (markroxor, [929](https://github.com/RaRe-Technologies/gensim/pull/929))
* Remove warning on gensim import "pattern not installed". Fix 1009 (shashankg7, [1018](https://github.com/RaRe-Technologies/gensim/pull/1018))
* Add delete_temporary_training_data() function to word2vec and doc2vec models. (deepmipt-VladZhukov, [987](https://github.com/RaRe-Technologies/gensim/pull/987))
* Documentation improvements (IrinaGoloshchapova, [1010](https://github.com/RaRe-Technologies/gensim/pull/1010), [#1011](https://github.com/RaRe-Technologies/gensim/pull/1011))
* LDA tutorial by Olavur, tips and tricks (olavurmortensen, [779](https://github.com/RaRe-Technologies/gensim/pull/779))
* Add double quotes in command line to run on Windows (akarazeev, [1005](https://github.com/RaRe-Technologies/gensim/pull/1005))
* Fix directory names in notebooks to be OS-independent (mamamot, [1004](https://github.com/RaRe-Technologies/gensim/pull/1004))
* Respect clip_start, clip_end in most_similar. Fix 601. (parulsethi, [994](https://github.com/RaRe-Technologies/gensim/pull/994))
* Replace Python sigmoid function with scipy in word2vec & doc2vec (markroxor, [989](https://github.com/RaRe-Technologies/gensim/pull/989))
* WMD to return 0 instead of inf for sentences that contain a single word (rbahumi, [986](https://github.com/RaRe-Technologies/gensim/pull/986))
* Pass all the params through the apply call in lda.get_document_topics(), test case to use the per_word_topics through the corpus in test_ldamodel (parthoiiitm, [978](https://github.com/RaRe-Technologies/gensim/pull/978))
* Pyro annotations for lsi_worker (markroxor, [968](https://github.com/RaRe-Technologies/gensim/pull/968))
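
The eta shape rules from the LDA fix in this release (PR 1024) can be illustrated with a small check. This is a minimal sketch in plain Python, not gensim's code: the helper `check_eta` and its arguments are hypothetical, and it covers only the rules stated in the entry (a length-V vector or a K x V matrix is accepted, `'asymmetric'` is rejected). Other string options are deliberately left out of the sketch.

```python
def check_eta(eta, num_topics, vocab_size):
    """Return a label for an accepted eta shape, or raise ValueError.

    Hypothetical helper that mirrors only the changelog's stated rules.
    """
    if isinstance(eta, str):
        if eta == "asymmetric":
            raise ValueError("eta='asymmetric' is not allowed")
        # Other string options are outside the scope of this sketch.
        raise ValueError("string options are not handled here")

    rows = list(eta)
    # Case 1: a flat vector of numbers must have one entry per vocabulary word.
    if rows and all(isinstance(x, (int, float)) for x in rows):
        if len(rows) != vocab_size:
            raise ValueError("eta vector must have length V, not K")
        return "V"

    # Case 2: a nested sequence must be K rows of V numbers each.
    is_matrix = (
        len(rows) == num_topics
        and all(isinstance(r, (list, tuple)) and len(r) == vocab_size for r in rows)
    )
    if is_matrix:
        return "K x V"
    raise ValueError("eta must have shape V or K x V")


# Example with K = 2 topics and V = 3 vocabulary words.
vector_label = check_eta([0.1, 0.1, 0.1], num_topics=2, vocab_size=3)
matrix_label = check_eta([[0.1] * 3, [0.2] * 3], num_topics=2, vocab_size=3)
```

A length-K vector such as `[0.1, 0.1]` fails this check when K differs from V, which is the mistake the entry describes.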

0.13.3

* Add vocabulary expansion feature to word2vec. (isohyt, [900](https://github.com/RaRe-Technologies/gensim/pull/900))
* Tutorial: Reproducing Doc2vec paper result on wikipedia. (isohyt, [654](https://github.com/RaRe-Technologies/gensim/pull/654))
* Add Save/Load interface to AnnoyIndexer for index persistence (fortiema, [845](https://github.com/RaRe-Technologies/gensim/pull/845))
* Fixed issue [938](https://github.com/RaRe-Technologies/gensim/issues/938), creating a unified base class for all topic models. ([markroxor](https://github.com/markroxor), [#946](https://github.com/RaRe-Technologies/gensim/pull/946))
- breaking change in `HdpTopicFormatter.show_topics`
* Add Phraser for Phrases optimization. (gojomo & anujkhare, [837](https://github.com/RaRe-Technologies/gensim/pull/837))
* Fix issue 743: word2vec's n_similarity method now raises ZeroDivisionError if at least one empty list is passed (pranay360, [883](https://github.com/RaRe-Technologies/gensim/pull/883))
* Change export_phrases in Phrases model. Fix issue 794 (AadityaJ, [879](https://github.com/RaRe-Technologies/gensim/pull/879))
- bigram construction can now support multiple bigrams within one sentence
* Fix issue [838](https://github.com/RaRe-Technologies/gensim/issues/838), RuntimeWarning: overflow encountered in exp ([markroxor](https://github.com/markroxor), [#895](https://github.com/RaRe-Technologies/gensim/pull/895))
* Change some log messages to warnings as suggested in issue 828. (rhnvrm, [884](https://github.com/RaRe-Technologies/gensim/pull/884))
* Fix issue 851: in summarizer.py, a RuntimeError is raised if single-sentence input is provided, to avoid a ZeroDivisionError. (metalaman, 887)
* Fix issue [791](https://github.com/RaRe-Technologies/gensim/issues/791), correct logic for iterating over SimilarityABC interface. ([MridulS](https://github.com/MridulS), [#839](https://github.com/RaRe-Technologies/gensim/pull/839))
* Fix RP model loading for large Fortran-order arrays (piskvorky, [605](https://github.com/RaRe-Technologies/gensim/issues/938))
* Remove ShardedCorpus from init because of Theano dependency (tmylk, [919](https://github.com/RaRe-Technologies/gensim/pull/919))
* Documentation improvements (dsquareindia & tmylk, [914](https://github.com/RaRe-Technologies/gensim/pull/914), [#906](https://github.com/RaRe-Technologies/gensim/pull/906))
* Add Annoy memory-mapping example (harshul1610, [899](https://github.com/RaRe-Technologies/gensim/pull/899))
* Fixed issue [601](https://github.com/RaRe-Technologies/gensim/issues/601), correct docID in most_similar for clip range (parulsethi, [#994](https://github.com/RaRe-Technologies/gensim/pull/994))

0.13.2

* wordtopics has changed to word_topics in ldamallet, and fixed issue 764. (bhargavvader, [771](https://github.com/RaRe-Technologies/gensim/pull/771))
- assigning wordtopics value of word_topics to keep backward compatibility, for now
* topics, topn parameters changed to num_topics and num_words in show_topics() and print_topics() (droudy, [755](https://github.com/RaRe-Technologies/gensim/pull/755))
- In hdpmodel and dtmmodel
- NOT BACKWARDS COMPATIBLE!
* Added random_state parameter to LdaState initializer and check_random_state() (droudy, [113](https://github.com/RaRe-Technologies/gensim/pull/113))
* Topic coherence update with `c_uci`, `c_npmi` measures. LdaMallet, LdaVowpalWabbit support. Add `topics` parameter to coherencemodel. Can now provide tokenized topics to calculate coherence value. Faster backtracking. (dsquareindia, [750](https://github.com/RaRe-Technologies/gensim/pull/750), [#793](https://github.com/RaRe-Technologies/gensim/pull/793))
* Added a check for empty (no words) documents before starting to run the DTM wrapper if model = "fixed" is used (DIM model), as this causes an error when such documents are reached in training. (eickho, [806](https://github.com/RaRe-Technologies/gensim/pull/806))
* New parameters `limit`, `datatype` for load_word2vec_format(); `lockf` for intersect_word2vec_format (gojomo, [817](https://github.com/RaRe-Technologies/gensim/pull/817))
* Changed `use_lowercase` option in word2vec accuracy to `case_insensitive` to account for case variations in training vocabulary (jayantj, [804](https://github.com/RaRe-Technologies/gensim/pull/804))
* Link to Doc2Vec on airline tweets example in tutorials page (544895340, [823](https://github.com/RaRe-Technologies/gensim/pull/823))
* Small error on Doc2vec notebook tutorial (charlessutton, [816](https://github.com/RaRe-Technologies/gensim/pull/816))
* Bugfix: Full2sparse clipped to use abs value (tmylk, [811](https://github.com/RaRe-Technologies/gensim/pull/811))
* WMD docstring: add tutorial link and query example (tmylk, [813](https://github.com/RaRe-Technologies/gensim/pull/813))
* Annoy integration to speed word2vec and doc2vec similarity. Tutorial update (droudy, [799](https://github.com/RaRe-Technologies/gensim/pull/799), [#792](https://github.com/RaRe-Technologies/gensim/pull/799))
* Add converter of LDA model between Mallet, Vowpal Wabbit and gensim (dsquareindia, [798](https://github.com/RaRe-Technologies/gensim/pull/798), [#766](https://github.com/RaRe-Technologies/gensim/pull/766))
* Distributed LDA in different network segments without broadcast (menshikh-iv, [782](https://github.com/RaRe-Technologies/gensim/pull/782))
* Update Corpora_and_Vector_Spaces.ipynb (megansquire, [772](https://github.com/RaRe-Technologies/gensim/pull/772))
* DTM wrapper bug fixes caused by renaming num_words in 755 (bhargavvader, [770](https://github.com/RaRe-Technologies/gensim/pull/770))
* Add LsiModel.docs_processed attribute (hobson, [763](https://github.com/RaRe-Technologies/gensim/pull/763))
* Dynamic Topic Modelling in Python. Google Summer of Code 2016 project. (bhargavvader, [739](https://github.com/RaRe-Technologies/gensim/pull/739), [#831](https://github.com/RaRe-Technologies/gensim/pull/831))
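
The `limit` parameter added to load_word2vec_format() in this release (PR 817) suggests reading only the first vectors of a file. The idea can be sketched on the plain-text word2vec format, assuming a header line of `vocab_size vector_size` followed by one `word v1 v2 ...` line per entry. The function `read_word2vec_text` below is hypothetical and is not gensim's implementation, which also handles the binary format and the `datatype` option.

```python
import io


def read_word2vec_text(stream, limit=None):
    """Read up to `limit` vectors from a text-format word2vec stream.

    Hypothetical helper for illustration only.
    """
    header = stream.readline().split()
    vocab_size, vector_size = int(header[0]), int(header[1])
    if limit is not None:
        # Stop early: only the first `limit` entries are read.
        vocab_size = min(vocab_size, limit)

    vectors = {}
    for _ in range(vocab_size):
        parts = stream.readline().rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != vector_size:
            raise ValueError("unexpected vector length for %r" % word)
        vectors[word] = values
    return vectors


# A tiny in-memory file with three 2-dimensional vectors.
sample = io.StringIO("3 2\nthe 0.1 0.2\ncat 0.3 0.4\nsat 0.5 0.6\n")
first_two = read_word2vec_text(sample, limit=2)
```

Because such files are commonly sorted by descending word frequency, capping the read this way tends to keep the most frequent words while using less memory.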

0.13.1

* Topic coherence C_v and U_mass (dsquareindia, 710)
