Annif

Latest version: v1.1.0

0.54.0

This release adds a new `--jobs` parameter for the `annif train` command, which allows easy control of the number of threads/CPUs when training MLLM, fasttext and Omikuji backends. Many other improvements are included that speed up the MLLM backend, especially in the case of a large vocabulary. Also a few minor bugs have been fixed.
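As a sketch of the new option, a training run with parallelism might look like this (the project ID and corpus path are placeholders, not from the release notes):

```shell
# Train a project using 4 parallel jobs (threads/CPUs, depending on the backend).
# "my-project" and the corpus path are hypothetical placeholders.
annif train --jobs 4 my-project /path/to/training/corpus
```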

This release also introduces support for adding new text-input transformation operations to Annif. Previously the input-limiting feature was implemented as a backend mechanism (446, 452), set up in the project configuration with a setting such as `input_limit=5000`; it is now implemented as a more general input-text transform and configured with `transform=limit(5000)`.
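For example, the new transform could be enabled in a project configuration roughly like this (the section name and the other settings are illustrative placeholders, not from the release notes):

```ini
[my-project]
name=My project
language=en
backend=mllm
vocab=my-vocab
analyzer=snowball(en)
# previously configured as a backend setting: input_limit=5000
transform=limit(5000)
```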

New features:
512 Support jobs parameter in train command
496 Support for adding input-transformation operations

Improvements:
500 Implement custom MeanLayer in nn_ensemble
511/483 Process training docs in parallel in MLLM backend
513/519 Keep serialized dump of SKOS graph to save parsing time
518 Use least frequent token as key in TokenSetIndex used by MLLM
520 Optimize limit_mask creation

Bug fixes:
510/502 Use set as container of uris instead of list in DocumentFile
515/453 Allow NN ensemble to be used for parallel eval
517 Skip unimportant subjects in _vector_to_list_suggestion
522/521 Allow private projects to be accessed from CLI

0.53.2

This patch release includes the following changes:
- 506 Fix NN ensemble training and learning on one-document corpus
- 509 Warn instead of error in case of multiple subjects per doc in SVC training
- 503 Fix read-the-docs documentation build error due to package conflict

0.53.1

This patch release [fixes](https://github.com/NatLibFi/Annif/pull/501) a bug which prevented training the SVC backend on fulltext corpus.

0.53.0

This release adds two new backends, YAKE and SVC. The YAKE backend is a wrapper around the [YAKE library](https://github.com/LIAAD/yake), which performs unsupervised lexical keyword extraction, so no training data is needed. See the [YAKE](https://github.com/NatLibFi/Annif/wiki/Backend%3A-YAKE) wiki page for more information. In future Annif releases, YAKE support could be extended so that it can also suggest new terms for a vocabulary (keywords that are not found in the vocabulary).

The SVC backend implements Linear Support Vector Classification. It is well suited for multiclass (but not multilabel) classification, for example classifying documents with the Dewey Decimal Classification or the 20 Newsgroups classification. It requires relatively little training data, and is suitable for classifications of up to around 10,000 classes. See the [SVC](https://github.com/NatLibFi/Annif/wiki/Backend%3A-SVC) wiki page for more information.
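A minimal SVC project configuration might look roughly like the following; the section name, vocabulary and analyzer are hypothetical placeholders (see the SVC wiki page for the actual options):

```ini
[ddc-svc-en]
name=DDC SVC English
language=en
backend=svc
vocab=ddc
analyzer=snowball(en)
```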

This release also upgrades many dependencies, which enables all Annif backends to run on Python 3.9 (previously the nn_ensemble backend was available only for Python 3.6-3.8). The Docker image now uses Python 3.8 instead of 3.7.

Note that nn_ensemble models are not compatible across Python versions: e.g. a model trained on Python 3.7 can only be used on Python 3.7. Training an nn_ensemble model shows a `CustomMaskWarning`, but it is harmless (caused by a [TensorFlow bug](https://github.com/tensorflow/tensorflow/issues/49754)) and can be ignored.

Due to the scikit-learn update, using TFIDF, MLLM or Omikuji models trained on older Annif versions will show warnings about the `TfidfVectorizer`. To the best of our knowledge these are harmless and can be ignored; retrain the models to get rid of the warnings.

This release also includes many minor improvements and bug fixes.

New features:
486 New SVC (support vector classification) backend using scikit-learn
439/461 YAKE backend
490/494 Make --version option show Annif version

Improvements:
488 Add support for ngram setting in omikuji backend

0.52.0

This release includes a new MLLM backend, a Python implementation of the Maui-like Lexical Matching algorithm. It was inspired by the [Maui algorithm](https://hdl.handle.net/10289/3513) (by Alyona Medelyan) but is not a direct reimplementation. It is meant for long full-text documents and, like Maui, needs to be trained with a relatively small number (hundreds or thousands) of manually indexed documents so that the algorithm can choose the mix of heuristics that achieves the best results on a particular document collection. See [the MLLM Wiki page](https://github.com/NatLibFi/Annif/wiki/Backend%3A-MLLM) for more information.

New features include the possibility to configure two project parameters:
- `token_min_length` [can be set in the analyzer parameters](https://github.com/NatLibFi/Annif/wiki/Analyzers); e.g. setting the value to 2 allows the word "UK" to pass through to a backend, while with the default value (3) the analyzer filters the word out
- `lr` can be set in the [neural-network ensemble](https://github.com/NatLibFi/Annif/wiki/Backend%3A-nn_ensemble) project configuration to define the learning rate.
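Combined into a project-configuration sketch (the section names, vocabulary and source projects are hypothetical; the exact parameter placement follows the wiki pages linked above):

```ini
# Allow 2-character tokens such as "UK" through the analyzer
[my-tfidf-en]
name=TFIDF English
language=en
backend=tfidf
vocab=my-vocab
analyzer=snowball(en,token_min_length=2)

# Set the learning rate of the neural-network ensemble
[my-ensemble-en]
name=NN ensemble English
language=en
backend=nn_ensemble
sources=proj-a,proj-b
vocab=my-vocab
lr=0.001
```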

The STWFSA backend has been updated to use a newer version of the [stwfsapy library](https://github.com/zbw/stwfsapy). Old STWFSA models are not compatible with the new version, so any STWFSA projects must be retrained. The release also includes several minor improvements and bug fixes.

New features:
462 New lexical backend MLLM
456/468 Allow configuration of token min length (credit: [mo-fu](https://github.com/mo-fu))
475 Allow configuration of nn ensemble learning rate (credit: [mo-fu](https://github.com/mo-fu))

Improvements:
478/479 Update stwfsa to 0.2.* (credit: [mo-fu](https://github.com/mo-fu))
472 Cleanup suggestion tests
480 Optimize check for deprecated subject IDs using a set

Maintenance:
474 Use GitHub Actions as CI service

Bug fixes:
470/471 Make sure suggestion scores are in the range 0.0-1.0
477 Optimize the optimize command
481 Backwards compatibility fix for the token_min_length setting
482 MLLM fix: don't include use_hidden_labels in hyperopt, it won't have any effect
