Annif

Latest version: v1.1.0

0.61

691 Upgrade Docker image to Python 3.10

Bug fixes:
674/677 Memory leak in NN ensemble backend

0.61.0

The main improvements in this release are internal changes that allow batch processing of documents for better suggestion performance, and the streamlining of the suggestion result representation by using sparse arrays. Currently, batch processing of documents is implemented in the Omikuji, SVC, and all ensemble backends. A new REST API method for suggesting subjects for multiple documents has also been added.
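
The benefit of a sparse representation can be sketched as follows: when a batch of documents is scored against a vocabulary of thousands of subjects, only a few scores per document are nonzero, so storing just those entries saves memory and time. The minimal pure-Python sketch below stores a batch in compressed sparse row (CSR) form and reads the top-scoring subjects of one document. The helper names `to_csr` and `top_k` are hypothetical illustrations, not Annif's internal API, and a production implementation would more likely rely on a library such as SciPy.

```python
from typing import Dict, List, Tuple

# CSR triple: (data, indices, indptr), as in the usual compressed sparse row layout.
CSR = Tuple[List[float], List[int], List[int]]


def to_csr(rows: List[Dict[int, float]]) -> CSR:
    """Convert per-document {subject index: score} dicts into CSR arrays.

    Zero scores are dropped, so each row stores only its nonzero suggestions.
    """
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for row in rows:
        for col in sorted(row):
            score = row[col]
            if score != 0.0:
                indices.append(col)
                data.append(score)
        # indptr[i + 1] marks where row i ends in data/indices
        indptr.append(len(data))
    return data, indices, indptr


def top_k(csr: CSR, row: int, k: int) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (subject index, score) pairs of one document."""
    data, indices, indptr = csr
    start, end = indptr[row], indptr[row + 1]
    pairs = list(zip(indices[start:end], data[start:end]))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:k]


if __name__ == "__main__":
    batch = [{3: 0.9, 17: 0.4}, {}, {5: 0.7, 3: 0.2, 8: 0.0}]
    csr = to_csr(batch)
    print(csr)               # ([0.9, 0.4, 0.2, 0.7], [3, 17, 3, 5], [0, 2, 2, 4])
    print(top_k(csr, 2, 1))  # [(5, 0.7)]
```

The second document has no suggestions at all, yet it costs only one repeated entry in `indptr`.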

The new REST API method `/v1/projects/{project_id}/suggest-batch` accepts at most 32 documents in one POST request; the documents in the batch are processed in parallel when the backend in use supports this. The request body is given in JSON format and, as with the regular single-document suggest method, the limit, threshold and language parameters are optional and can be given as URL query parameters. For details, see the [interactive OpenAPI documentation](https://api.annif.org/v1/ui/#/Automatic%20subject%20indexing/annif.rest.suggest_batch) of the REST API of annif.org.
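
A client that needs to process more than 32 documents therefore has to split them into several requests. The standard-library sketch below builds such requests. The JSON body layout (`{"documents": [{"text": ...}]}`), the function names and the example project id `yso-en` are assumptions made for illustration; the OpenAPI documentation is the authoritative source for the schema.

```python
import json
import urllib.request
from typing import Any, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

MAX_BATCH_SIZE = 32  # per-request limit stated in the release notes


def batches(texts: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def build_suggest_batch_request(
    base_url: str,
    project_id: str,
    texts: List[str],
    limit: Optional[int] = None,
    threshold: Optional[float] = None,
    language: Optional[str] = None,
) -> Tuple[str, bytes]:
    """Return (url, JSON body) for one suggest-batch request."""
    if len(texts) > MAX_BATCH_SIZE:
        raise ValueError(f"at most {MAX_BATCH_SIZE} documents per request")
    # Optional parameters go into the URL query string.
    params = {
        name: value
        for name, value in (("limit", limit), ("threshold", threshold), ("language", language))
        if value is not None
    }
    url = f"{base_url.rstrip('/')}/v1/projects/{project_id}/suggest-batch"
    if params:
        url += "?" + urlencode(params)
    # Assumed body schema: a list of documents, each with a "text" field.
    body = {"documents": [{"text": text} for text in texts]}
    return url, json.dumps(body).encode("utf-8")


def post(url: str, body: bytes) -> Any:
    """Send one request and decode the JSON response."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    texts = [f"document {i}" for i in range(70)]
    for chunk in batches(texts):
        url, body = build_suggest_batch_request(
            "https://api.annif.org", "yso-en", chunk, limit=5
        )
        print(url, len(chunk))  # 32, 32 and 6 documents
        # results = post(url, body)  # uncomment to send (needs network access)
```

Keeping request construction separate from sending makes the batching logic testable without a running Annif server.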

The [`annif suggest`](https://annif.readthedocs.io/en/v0.61.0/source/commands.html#annif-suggest) CLI command now also accepts paths to files to be processed, in addition to reading from stdin, so that it can operate on multiple documents. The [`annif optimize`](https://annif.readthedocs.io/en/v0.61.0/source/commands.html#annif-optimize) command is now much faster and supports a `--jobs` parameter for parallel processing.

The Annif Docker image has been updated to use Python 3.10.

Various maintenance tasks have also been performed: for example, the default branch of the git repository has been renamed from `master` to `main`, the [Schemathesis](https://github.com/schemathesis/schemathesis) tool has been introduced for testing the REST API, and many dependencies have been updated. A bug causing a memory leak in the neural network ensemble backend has been fixed.

The next release of Annif will be version 1.0. For this purpose we have opened the [issue 616](https://github.com/NatLibFi/Annif/issues/616) for discussing the expectations of backward compatibility and Semantic Versioning in releases beyond 1.0.

Backward compatibility:
* Models trained with Annif v0.60 should continue to work; the warnings from scikit-learn are harmless
* LRAP metric has been removed from evaluation results

New features:
664 Add REST API method `/v1/projects/{project_id}/suggest-batch`
663 Support for batch suggest operations for CLI commands
423/681 Parallelize optimize command

Improvements:
678/681 Represent suggestion results as sparse arrays
665/669 Batch suggest in Omikuji backend
667/670 Batch suggest in SVC backend
677 Batch suggest in ensemble backends
671 Add log message indicating finishing projects initialization
673 Suppress duplicate log messages from subject module

Maintenance:
668 Migrate codestyle to Black v23
679/680 Switch default git branch to main
672 Fix slow CI/CD runs for Python 3.10
675 Refactor and cleanup CLI module
682/685 Schemathesis tests for REST API and OpenAPI schema fixes

0.60

647/661 Order of projects when using project configuration directory

Maintenance:
609/640 Use black code style
641 Use isort to order import statements
656 Install linting tools with Poetry in CI/CD pipeline
624 Increase timeout of test and publish GH Actions jobs
653 Add CodeQL workflow for GitHub code scanning
599/650 Avoid using pytest-flake8 plugin
657/662 Upgrade GitHub Actions
636 Better setup for docker-compose

0.60.0

This release includes improvements and maintenance updates, in particular to the Web UI and REST API, as well as some new functionality, especially related to multilingual support. The Web UI no longer relies on jQuery, as its last remaining uses were replaced with Axios. The REST API and Web UI updates are by UnniKohonen, who has joined NatLibFi as a trainee in the Annif & Finto development teams.

It is now possible to override the language of subject suggestion labels instead of always using the project language: with the new `--language/-L` option of the `annif suggest` command, and with the new optional `language` parameter of the REST API suggest method.

A new resource has been added to the root of the REST API (i.e. `http://<annif_host>/v1/`) that gives basic information about the API (a title for the API and the version of Annif being used). The REST API spec has also been updated to OpenAPI 3.0. In the Web UI, it is now possible to see detailed information about a project (language, backend type, modification timestamp, etc.).

Multiprocessing support in macOS and Windows environments has been improved by adding support for the 'spawn' multiprocessing start method.
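
The 'spawn' start method, the only one available on Windows and the default on macOS since Python 3.8, launches a fresh interpreter for each worker process instead of forking the parent. The generic standard-library sketch below (not Annif code) shows the two constraints this places on code: the worker function must be importable by reference, and the code that starts the pool must be guarded by `if __name__ == "__main__":`, because spawned workers re-import the main module. The function name `parallel_factorials` is hypothetical.

```python
import math
import multiprocessing
from typing import List


def parallel_factorials(values: List[int], jobs: int = 2) -> List[int]:
    """Compute factorials in worker processes started with the 'spawn' method."""
    # get_context() selects the start method for this pool only, without
    # changing the global default set by multiprocessing.set_start_method().
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=jobs) as pool:
        # math.factorial is picklable by reference, so spawned workers can import it.
        return pool.map(math.factorial, values)


if __name__ == "__main__":
    # Without this guard, each spawned worker would try to start its own pool
    # while re-importing the main module, which multiprocessing rejects with
    # a RuntimeError.
    print(parallel_factorials([3, 4, 5]))  # [6, 24, 120]
```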

Language detection is now performed with [Simplemma](https://github.com/adbar/simplemma) instead of pycld3. This functionality is now installed by default instead of being an optional extra.

New code style tools, Black and isort, are now used to help maintain good code quality; see [CONTRIBUTING.md](https://github.com/NatLibFi/Annif/blob/master/CONTRIBUTING.md) for how to use them and for instructions on how best to participate in Annif development.

Many dependencies have been updated to their most recent versions.

Note also that we are preparing for Annif 1.0 release. For this purpose we have opened the [issue 616](https://github.com/NatLibFi/Annif/issues/616) for discussing the expectations of backward compatibility and Semantic Versioning in releases beyond 1.0.


Backward compatibility:

* Models trained with Annif v0.59 should continue to work; the warnings from scikit-learn are harmless
* The `annif loadvoc` command has been removed; it was deprecated in the previous release and replaced by the `annif load-vocab` command.

New features:
628/630 Allow overriding subject label language in CLI and REST suggest operations
637/638 Add support for spawn multiprocessing mode
654 Add project info to web UI
655/658 Add REST API root resource

Improvements:
593/626 Use Simplemma instead of pycld3 for language detection
643 Add CONTRIBUTING.md file
645 Use tailored user-agent in requests by HTTP-backend
644/649 Upgrade REST API spec to OpenAPI 3
627 Upgrade joblib to 1.2.x

0.59.0

This release makes many changes to how Annif handles vocabularies.

First, the vocabularies are now multilingual: projects with different languages can share the same vocabulary by using a common vocabulary id in the project configurations. The vocabulary id should no longer include a language specifier, which has been the practice until now. The language of the labels of subject suggestions is now defined by the project's language setting, or it can be overridden in a project by giving the language code in parentheses after the vocabulary id (e.g. `vocab=lcsh(en)` in a Finnish language project). These changes break the backward compatibility of existing projects and vocabularies.
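
The vocabulary setting can be illustrated with a small sketch: three hypothetical projects share vocabularies, and the third one overrides the label language with `vocab=lcsh(en)`. The regular expression and the `parse_vocab_spec` helper are illustrative assumptions, not Annif's own parser, and other required project settings (such as `backend` and `analyzer`) are omitted.

```python
import configparser
import re
from typing import Tuple

# Hypothetical project configuration; only the settings relevant here are shown.
PROJECTS_CFG = """
[yso-fi]
name=YSO Finnish
language=fi
vocab=yso

[yso-en]
name=YSO English
language=en
vocab=yso

[lcsh-fi]
name=LCSH for Finnish documents
language=fi
vocab=lcsh(en)
"""

# A vocabulary id (word characters and hyphens) optionally followed by "(lang)".
VOCAB_SPEC = re.compile(r"(?P<vocab_id>[\w-]+)(?:\((?P<lang>[a-z]{2,3})\))?")


def parse_vocab_spec(spec: str, project_language: str) -> Tuple[str, str]:
    """Return (vocabulary id, label language) for a project's vocab setting."""
    match = VOCAB_SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"invalid vocab setting: {spec!r}")
    # Fall back to the project language when no override is given.
    return match.group("vocab_id"), match.group("lang") or project_language


if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read_string(PROJECTS_CFG)
    for project_id in config.sections():
        section = config[project_id]
        print(project_id, parse_vocab_spec(section["vocab"], section["language"]))
    # yso-fi ('yso', 'fi')
    # yso-en ('yso', 'en')
    # lcsh-fi ('lcsh', 'en')
```

Both YSO projects resolve to the same vocabulary id, which is what lets them share one loaded vocabulary.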

The CLI command for loading a vocabulary has changed: the command is now `annif load-vocab`, to align with the other Annif commands, and its first argument is a vocabulary id instead of a project id. When loading a vocabulary from a TSV file, the `--language` option needs to be given to set the language. A new command, `annif list-vocabs`, has been introduced for listing vocabularies. The old `annif loadvoc` command still works in this release, but it has been deprecated and will be removed in the next Annif release.

The CLI commands are now documented on a [page on ReadTheDocs](https://annif.readthedocs.io/en/stable/source/commands.html) instead of the Annif wiki. Development installations of Annif now use [Poetry](https://python-poetry.org/) for managing Python virtual environments and dependencies. There are also a few other minor changes, including an upgrade to the Simplemma v0.8 series, which introduced support for new languages.

Note also that we are starting to prepare for Annif 1.0 release. For this purpose we have opened the [issue 616](https://github.com/NatLibFi/Annif/issues/616) for discussing the expectations of backward compatibility and Semantic Versioning in releases beyond 1.0.

Backward compatibility
The changes in the vocabulary functionality require **reloading of previously loaded vocabularies** and **retraining of existing models**.

New features
559/600 Make vocabularies multilingual
602/614 Implement `load-vocab` and `list-vocabs` commands
603/610 Store vocabs in AnnifRegistry so they are shared between projects
597 Include labels without language tag and concepts without labels in vocabulary

Improvements
617/618 Upgrade to simplemma 0.8 and disable unnecessary cache
595/611 Autogenerated CLI commands documentation on ReadTheDocs
612 Add Annif logo to ReadTheDocs sidebar
608 Multilingual SubjectIndex backed by CSV file
604 Refactor SubjectSuggestion to store subject_id - not uri, label, notation

Maintenance
607 Remove language suffixes from vocabulary ids in example config
606 Refactor SubjectSet and Document to store subject IDs instead of URIs and labels
601/605 Switch to Poetry for dependency management
621 Remove curl from Docker image
622 Remove Poetry cache from Docker image

Fixes
613 Restore ability to use vocab language different from project language
619 Allow use of hyphens in vocabulary IDs
620 Make NN ensemble suggest operations silent

0.58.0

This release introduces a new [Simplemma analyzer](https://github.com/NatLibFi/Annif/wiki/Analyzers#simplemma-analyzer), support for multiple configuration files in a directory, and support for Python 3.10; support for Python 3.7 is removed.

[Simplemma](https://github.com/adbar/simplemma) is a lightweight multilingual lemmatizer, which currently supports 38 languages; an analyzer based on Simplemma is now implemented as a core feature of Annif. Using multiple project configuration files is made possible by implementing support for a project configuration directory: Annif reads all files matching the patterns `*.cfg` and `*.toml` in the directory and merges their contents. The default name of the configuration directory is `projects.d`, but any directory can be selected with the `-p/--projects` command option or the `ANNIF_PROJECTS` environment variable.
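
Reading such a configuration directory can be sketched with the standard library. This is an illustration under stated assumptions, not Annif's implementation: files are merged in sorted file-name order (chosen here for determinism), later files override earlier ones for duplicate settings, the lookup order for the directory (argument, then `ANNIF_PROJECTS`, then `projects.d`) is assumed, and TOML files are only read when the standard-library `tomllib` module (Python 3.11+) is available. The function name `read_projects_dir` is hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Optional

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older Pythons: TOML files cannot be read here
    tomllib = None


def read_projects_dir(directory: Optional[str] = None) -> configparser.ConfigParser:
    """Merge all *.cfg and *.toml files of a directory into one ConfigParser."""
    # Assumed lookup order: explicit argument, then ANNIF_PROJECTS, then projects.d
    directory = directory or os.environ.get("ANNIF_PROJECTS", "projects.d")
    parser = configparser.ConfigParser()
    paths = sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix in (".cfg", ".toml")
    )
    for path in paths:
        if path.suffix == ".cfg":
            parser.read(path, encoding="utf-8")
        elif tomllib is not None:
            with path.open("rb") as f:
                # Each top-level TOML table becomes one project section.
                parser.read_dict(tomllib.load(f))
        else:
            raise RuntimeError(f"cannot read {path}: tomllib requires Python 3.11+")
    return parser
```

Files with other extensions in the directory are ignored, so notes or backups can sit alongside the project files.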

Python 3.10 support was achieved by updating multiple dependencies; retraining existing projects should not be necessary. The optional language filtering feature is not yet available on Python 3.10, because pycld3 does not support Python 3.10.

New features:
* 584/585 Support for multiple configuration files in a directory
* 590/591 Add simplemma analyzer

Improvements:
* 589/592 Add Python 3.10 support & update dependencies

Maintenance:
* 594 Upgrade Simplemma to version 0.7
* 587 Update GitHub Actions
* 588 Delete .coveragerc configuration file
* 598 Pin flake8 to version 4.x to avoid pytest-flake8 breakage

Bug fixes:
* 586 Fix readthedocs documentation builds

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.