in the [Releases section](https://github.com/SeldonIO/MLServer/releases) of the MLServer repository.
1. Once you are happy with the draft, click `Publish`.
For official releases, MLServer uses long-lived release branches.
These branches will always follow the `release/<major>.<minor>.x` pattern (e.g.
`release/1.2.x`) and will be used for every `<major>.<minor>.x` [official
release](versioning-scheme) (e.g. the `release/1.2.x` branch will be used for
`1.2.0`, `1.2.1`, etc.).
Note that these branches will always be pushed straight to the main
`github.com/SeldonIO/MLServer` repository (not to a fork) and will never get merged back into `master`.
Therefore, when starting a new **major or minor official release** please
create a `release/<major>.<minor>.x` branch.
Alternatively, when preparing a **patch official release**, please
_cherry-pick_ all relevant merged PRs from `master` into the existing
`release/<major>.<minor>.x` branch.
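As a sketch, the branching steps above might look like the following (the branch name and commit SHA are illustrative, and pushing assumes write access to the main repository):

```shell
# Cut a new release branch for a major / minor official release (e.g. 1.2.x)
git fetch origin master
git checkout -b release/1.2.x origin/master
git push origin release/1.2.x

# For a patch official release, cherry-pick the relevant merged PRs from master
git checkout release/1.2.x
git cherry-pick <merge-commit-sha>
git push origin release/1.2.x
```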
The MLServer project publishes three types of release versions:
- **Dev pre-releases**, used to test new features before an official release.
They will follow the schema `<next-minor-version>.dev<incremental-index>` (e.g. `1.2.0.dev3`).
- **Release candidates**, used to test an official release before the actual
release goes out. This type of release can be useful to test minor releases across different
projects. They follow the schema `<next-minor-version>.rc<incremental-index>` (e.g. `1.2.0.rc1`).
- **Official releases**, used only for actual public releases. The version tag
will only contain the next minor version (e.g. `1.2.0`), without any suffix.
Based on the above, a usual release cycle between two official releases would
generally look like the following (where stability increases as you go down the list).
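As a quick illustration of the three schemas, here is a tiny classifier for version tags (the helper function and regexes are illustrative, not part of MLServer):

```python
import re

def release_type(version: str) -> str:
    """Classify an MLServer version tag according to the three schemas above."""
    if re.fullmatch(r"\d+\.\d+\.\d+\.dev\d+", version):
        return "dev pre-release"
    if re.fullmatch(r"\d+\.\d+\.\d+\.?rc\d+", version):
        return "release candidate"
    if re.fullmatch(r"\d+\.\d+\.\d+", version):
        return "official release"
    return "unknown"

print(release_type("1.2.0.dev3"))  # dev pre-release
print(release_type("1.2.0.rc1"))   # release candidate
print(release_type("1.2.0"))       # official release
```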
Each release of MLServer will build and publish a set of artifacts, both at the
runtime level and the wider MLServer level:
- Docker image containing every inference runtime maintained within the
MLServer repo, tagged as `seldonio/mlserver:<version>` (e.g.
`seldonio/mlserver:1.2.0`). Note that this image can grow quite large.
- _Slim_ Docker image containing only the core MLServer package (i.e. without
any runtimes), tagged as `seldonio/mlserver:<version>-slim` (e.g.
`seldonio/mlserver:1.2.0-slim`). This image is used as the default base image for custom runtimes.
- Python package for the core MLServer modules (i.e. without any runtime),
which will get published to PyPI, named simply `mlserver`.
- For each inference runtime (e.g. `mlserver-sklearn`, `mlserver-xgboost`,
etc.), MLServer will also publish:
  - Docker image containing only that specific runtime, tagged as
    `seldonio/mlserver:<version>-<runtime-name>` (e.g. `seldonio/mlserver:1.2.0-sklearn`).
  - Python package for the specific runtime, which will get published to PyPI
    (e.g. `mlserver-sklearn`).
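The naming patterns above can be summarised in a small sketch (the helper is illustrative only, and the per-runtime image tag format is inferred from the `-slim` pattern, so it may differ for specific runtimes):

```python
def release_artifacts(version: str, runtimes: list) -> dict:
    """Build the artifact names published for a given MLServer release,
    following the naming patterns listed above."""
    return {
        "docker_full": f"seldonio/mlserver:{version}",
        "docker_slim": f"seldonio/mlserver:{version}-slim",
        "pypi_core": "mlserver",
        "docker_runtimes": [f"seldonio/mlserver:{version}-{r}" for r in runtimes],
        "pypi_runtimes": [f"mlserver-{r}" for r in runtimes],
    }

print(release_artifacts("1.3.2", ["sklearn", "xgboost"])["docker_slim"])
# seldonio/mlserver:1.3.2-slim
```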
[1.3.2](https://github.com/SeldonIO/MLServer/releases/tag/1.3.2) - 10 May 2023
* Use default initialiser if not using a custom env by [adriangonz](https://github.com/adriangonz) in https://github.com/SeldonIO/MLServer/pull/1104
* Add support for online drift detectors by [ascillitoe](https://github.com/ascillitoe) in https://github.com/SeldonIO/MLServer/pull/1108
* Added intra- and inter-op parallelism parameters to the HuggingFace … by [saeid93](https://github.com/saeid93) in https://github.com/SeldonIO/MLServer/pull/1081
* Fix settings reference in runtime docs by [adriangonz](https://github.com/adriangonz) in https://github.com/SeldonIO/MLServer/pull/1109
* Bump Alibi libs requirements by [adriangonz](https://github.com/adriangonz) in https://github.com/SeldonIO/MLServer/pull/1121
* Add default LD_LIBRARY_PATH env var by [adriangonz](https://github.com/adriangonz) in https://github.com/SeldonIO/MLServer/pull/1120
* Ignore both .metrics and .envs folders by [adriangonz](https://github.com/adriangonz) in https://github.com/SeldonIO/MLServer/pull/1132
* [ascillitoe](https://github.com/ascillitoe) made their first contribution in https://github.com/SeldonIO/MLServer/pull/1108
**Full Changelog**: https://github.com/SeldonIO/MLServer/compare/1.3.1...1.3.2
[1.3.1](https://github.com/SeldonIO/MLServer/releases/tag/1.3.1) - 27 Apr 2023
- Move OpenAPI schemas into Python package ([#1095](https://github.com/SeldonIO/MLServer/issues/1095))
[1.3.0](https://github.com/SeldonIO/MLServer/releases/tag/1.3.0) - 27 Apr 2023
> WARNING :warning: : The `1.3.0` release has been yanked from PyPI due to a packaging issue. This has now been resolved in `>= 1.3.1`.
Custom Model Environments
More often than not, your custom runtimes will depend on third-party dependencies which are not included within the main MLServer package - or on different versions of the same package (e.g. `scikit-learn==1.1.0` vs `scikit-learn==1.2.0`). In these cases, to load your custom runtime, MLServer will need access to these dependencies.
In MLServer `1.3.0`, it is now [possible to load this custom set of dependencies by providing them](https://mlserver.readthedocs.io/en/latest/user-guide/custom.html#loading-a-custom-python-environment), through an [environment tarball](https://mlserver.readthedocs.io/en/latest/examples/conda/README.html), whose path can be specified within your `model-settings.json` file. This custom environment will get provisioned on the fly after loading a model - alongside the default environment and any other custom environments.
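As a sketch of what this could look like, assuming the tarball path is configured via an `environment_tarball` field under `parameters` (see the linked docs for the exact setting; the model name, implementation path, and tarball path below are illustrative):

```json
{
  "name": "my-model",
  "implementation": "models.MyCustomRuntime",
  "parameters": {
    "environment_tarball": "./environment.tar.gz"
  }
}
```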
Under the hood, each of these environments will run their own separate pool of workers.
The MLServer framework now includes a simple interface that allows you to register and keep track of any [custom metrics](https://mlserver.readthedocs.io/en/latest/user-guide/metrics.html#custom-metrics):
- [`mlserver.register()`](https://mlserver.readthedocs.io/en/latest/reference/api/metrics.html#mlserver.register): Register a new metric.
- [`mlserver.log()`](https://mlserver.readthedocs.io/en/latest/reference/api/metrics.html#mlserver.log): Log a new set of metric / value pairs.
Custom metrics will generally be registered in the [`load()`](https://mlserver.readthedocs.io/en/latest/reference/api/model.html#mlserver.MLModel.load) method and then used in the [`predict()`](https://mlserver.readthedocs.io/en/latest/reference/api/model.html#mlserver.MLModel.predict) method of your [custom runtime](https://mlserver.readthedocs.io/en/latest/user-guide/custom.html). These metrics can then be polled and queried via [Prometheus](https://mlserver.readthedocs.io/en/latest/user-guide/metrics.html#settings).
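Put together, this pattern might look like the following sketch (the runtime class, metric name, and logged value are illustrative; running it requires MLServer installed and a model actually being served):

```python
import mlserver
from mlserver.types import InferenceRequest, InferenceResponse

class MyCustomRuntime(mlserver.MLModel):
    async def load(self) -> bool:
        # Register the custom metric once, when the model gets loaded
        mlserver.register("my_custom_metric", "An example custom metric")
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Log a value for the metric on every inference call
        mlserver.log(my_custom_metric=34)
        ...  # custom inference logic goes here
```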
MLServer `1.3.0` now includes an autogenerated Swagger UI which can be used to interact dynamically with the Open Inference Protocol.
The autogenerated Swagger UI can be accessed under the `/v2/docs` endpoint.
Alongside the [general API documentation](https://mlserver.readthedocs.io/en/latest/user-guide/openapi.html#Swagger-UI), MLServer now also exposes a set of API docs tailored to individual models, showing the specific endpoints available for each one.
The model-specific autogenerated Swagger UI can be accessed under the following endpoints:
MLServer now includes improved codec support for the main types that can be returned by HuggingFace models - ensuring that the values returned via the Open Inference Protocol are more semantic and meaningful.
Massive thanks to [pepesi](https://github.com/pepesi) for taking the lead on improving the HuggingFace runtime!
Support for Custom Model Repositories
Internally, MLServer leverages a Model Repository implementation, which is used to discover the different models (and their versions) available to load. The latest version of MLServer now allows you to swap this for your own model repository implementation - letting you integrate against your own model repository workflows.
This is exposed via the [model_repository_implementation](https://mlserver.readthedocs.io/en/latest/reference/settings.html#mlserver.settings.Settings.model_repository_implementation) flag of your `settings.json` configuration file.
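For example, a `settings.json` could point the flag at your own class (where `my_package.MyModelRepository` is a hypothetical import path for your implementation):

```json
{
  "model_repository_implementation": "my_package.MyModelRepository"
}
```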
Thanks to [jgallardorama](https://github.com/jgallardorama) (aka [jgallardorama-itx](https://github.com/jgallardorama-itx) ) for his effort contributing this feature!
Batch and Worker Queue Metrics