CatBoost


0.20

New submodule for text processing!
It contains two classes that help you prepare text features for training:
- [Tokenizer](https://github.com/catboost/catboost/blob/afb8331a638de280ba2aee3831ac9df631e254a0/library/text_processing/tokenizer/tokenizer.pxi#L77) -- use this class to split text into tokens (automatic lowercase and punctuation removal)
- [Dictionary](https://github.com/catboost/catboost/tree/master/library/text_processing/dictionary) -- with this class you create a dictionary which maps tokens to numeric identifiers. You then use these identifiers as new features.
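
The two steps above can be sketched in pure Python. This is an illustration of the concept (lowercase tokenization, then token-to-id mapping), not the CatBoost implementation or its API:

```python
import re

def tokenize(text):
    """Lowercase the text, drop punctuation, and split into word/number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TokenDictionary:
    """Maps tokens to integer ids assigned in first-seen order."""
    def __init__(self):
        self.token_to_id = {}

    def fit(self, tokens):
        for token in tokens:
            if token not in self.token_to_id:
                self.token_to_id[token] = len(self.token_to_id)
        return self

    def apply(self, tokens):
        # Unknown tokens map to -1, mimicking an out-of-vocabulary id.
        return [self.token_to_id.get(t, -1) for t in tokens]

tokens = tokenize("CatBoost handles text, too!")  # ['catboost', 'handles', 'text', 'too']
ids = TokenDictionary().fit(tokens).apply(tokens)  # [0, 1, 2, 3]
```

The resulting numeric ids are what you would feed downstream as new features.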

New features:
- Enabled `boost_from_average` for `MAPE` loss function
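
With `boost_from_average`, training starts from a constant baseline that is optimal for the loss rather than from zero. The brute-force sketch below only illustrates what a loss-optimal starting constant means for MAPE (the optimum of this piecewise-linear loss lies at one of the label values); CatBoost computes its initialization internally:

```python
def mape(labels, constant):
    """Mean absolute percentage error of a constant prediction."""
    return sum(abs(y - constant) / abs(y) for y in labels) / len(labels)

def best_constant(labels):
    # MAPE is piecewise linear in the constant, so the minimum is
    # attained at one of the label values; brute force suffices here.
    return min(labels, key=lambda c: mape(labels, c))

labels = [1.0, 2.0, 4.0, 10.0]
start = best_constant(labels)  # a better starting point than 0 or the mean
```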

Bug fixes:
- Fixed `Pool` creation from `pandas.DataFrame` with discontinuous columns, 1079
- Fixed `standalone_evaluator`, PR 1083

Speedups:
- Huge speedup of preprocessing in the python-package for datasets with many samples (>10 million)

0.19.1

New features:
- With this release we support `Text` features for *classification on GPU*. To specify text columns, use the `text_features` parameter. Using the text information in your dataset can improve quality. See more in [Learning CatBoost with text features](https://github.com/catboost/tutorials/blob/master/text_features/text_features_in_catboost.ipynb)
- The `MultiRMSE` loss function is now available on CPU. Labels for multi-regression mode should be specified in separate `Label` columns
- MonoForest framework for model analysis, based on our NeurIPS 2019 [paper](https://papers.nips.cc/paper/9530-monoforest-framework-for-tree-ensemble-analysis). Learn more in [MonoForest tutorial](https://github.com/catboost/tutorials/tree/master/model_analysis/monoforest_tutorial.ipynb)
- `boost_from_average` is now `True` by default for `Quantile` and `MAE` loss functions, which improves the resulting quality
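
The multi-regression setup above gives each sample one label per `Label` column. A simplified (unweighted) sketch of a multi-target RMSE of this shape, for illustration only, sums squared errors over targets and averages over samples before taking the square root:

```python
import math

def multi_rmse(labels, preds):
    """Unweighted multi-target RMSE: rows are samples, columns are targets."""
    n = len(labels)
    total = sum(
        sum((y - p) ** 2 for y, p in zip(row_y, row_p))
        for row_y, row_p in zip(labels, preds)
    )
    return math.sqrt(total / n)

labels = [[1.0, 2.0], [3.0, 4.0]]
preds = [[1.0, 2.0], [3.0, 2.0]]  # one target off by 2 on one sample
error = multi_rmse(labels, preds)  # sqrt(4 / 2) = sqrt(2)
```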

Speedups:
- Huge reduction of preprocessing time for datasets loaded from files and for datasets with many samples (> 10 million), which was a bottleneck for GPU training
- 3x speedup for small datasets

0.18.1

New features:
- `datasets.msrank()` now returns the _full_ msrank dataset. Previously, it returned only the first 10k samples.
We have added the `msrank_10k()` dataset, which preserves the previous behaviour.

Bug fixes:
- `get_object_importance()` now respects the `top_size` parameter, 1045 by ibuda

0.18

- The main feature of this release is a huge speedup on small datasets. MVS sampling is now used by default for CPU regression and binary classification training, together with the `Plain` boosting scheme for both small and large datasets. This change not only gives a huge speedup but also improves quality!
- The `boost_from_average` parameter is available in `CatBoostClassifier` and `CatBoostRegressor`
- We have added new formats for describing monotonic constraints. For example, `"(1,0,0,-1)"` or `"0:1,3:-1"` or `"FeatureName0:1,FeatureName3:-1"` are all valid specifications. With Python and `params-file` json, lists and dictionaries can also be used
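		
A hypothetical sketch of parsing the string formats listed above into a per-feature constraint list (1 = increasing, -1 = decreasing, 0 = unconstrained); this illustrates the formats, it is not CatBoost's parser:

```python
def parse_monotone_constraints(spec, n_features, feature_names=None):
    """Parse '(1,0,0,-1)', '0:1,3:-1', or 'FeatureName0:1,...' into a list."""
    if spec.startswith("("):
        # Dense format: one value per feature, in order.
        return [int(v) for v in spec.strip("()").split(",")]
    # Sparse 'key:value' format: unlisted features default to 0.
    constraints = [0] * n_features
    for item in spec.split(","):
        key, value = item.split(":")
        index = int(key) if key.isdigit() else feature_names.index(key)
        constraints[index] = int(value)
    return constraints
```

All three example specifications from the release note parse to the same result, `[1, 0, 0, -1]`, for a four-feature pool.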

Bugs fixed:
- Error in `Multiclass` classifier training, 1040
- Unhandled exception when saving quantized pool, 1021
- Python 3.7: `RuntimeError` raised in `StagedPredictIterator`, 848

0.17.5

Bugs fixed:
- `System of linear equations is not positive definite` when training MultiClass on Windows, 1022

0.17.4

Improvements:
- Massive 2x speedup for `MultiClass` with many classes
- Updated MVS implementation. See _Minimal Variance Sampling in Stochastic Gradient Boosting_ by Bulat Ibragimov and Gleb Gusev at [NeurIPS 2019](https://neurips.cc/Conferences/2019)
- Added `sum_models` in R-package, 1007
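
The idea behind the MVS sampling mentioned above can be sketched in a greatly simplified form (this is not the paper's exact algorithm; `mu` and the scaling scheme are assumptions for illustration): examples with larger gradients are more likely to be kept, and kept examples are reweighted by `1/p` so that weighted gradient sums stay unbiased.

```python
import math
import random

def mvs_sample(gradients, sample_rate=0.5, mu=1e-2, rng=random.random):
    """Return (index, importance_weight) pairs for a sampled subset."""
    scores = [math.sqrt(g * g + mu) for g in gradients]
    # Scale scores so that, before clipping at 1, the probabilities
    # sum to sample_rate * n.
    scale = sample_rate * len(gradients) / sum(scores)
    kept = []
    for i, s in enumerate(scores):
        p = min(1.0, s * scale)
        if rng() < p:
            kept.append((i, 1.0 / p))  # reweight to keep estimates unbiased
    return kept
```

Examples with small gradients that do get sampled carry a larger importance weight, which is what keeps the gradient estimate unbiased while concentrating the sample on high-gradient examples.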

Bugs fixed:
- Multi-model initialization in Python, 995
- Mishandling of 255 borders in training on GPU, 1010
