CatBoost

Latest version: v1.2.5

1.2.5

New features
* \[Python-package\]: Support custom eval metrics on GPU. 1792. Thanks to pnsemyon.

Bugfixes
* \[Python-package\]: Check eval_period parameter validity for staged prediction. 2593
* \[Python-package\]: Fix _CustomLoggersStack.pop logic. 2620
* \[R-package\]: Fix Caret object: Inconsistent grid creation with documentation. 2606
* \[JVM applier\]: Fix issues with exposing undesired symbols in JNI shared libraries (including allocators) on macOS. 2606
* Fix training with embedding features on GPU. 2249, 2308, 2591
* Fix training with text features on GPU
* Use correct sample count in MultiRMSE on multiple GPUs. 2557
* Fix sign of 2nd order derivative in Huber loss
* Enable gradient walker for non-additive metrics
* Fixes for Cox objective: buffer overflow in derivatives calculation, derivatives summation, metric calculation, disable ordered boosting
* Fix text features data serialization in the model files
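
The Huber loss entry above concerns the sign of the second derivative. As a rough illustration (not CatBoost's internal code), the sketch below writes Huber loss as a function of the prediction and checks the analytic second derivative against a finite difference. The function names and the `delta` default are hypothetical. The signs shown are those of a loss to be minimized; a framework that maximizes the negated loss would flip both derivatives.

```python
def huber_loss(approx, target, delta=1.0):
    """Huber loss: quadratic near zero residual, linear beyond delta."""
    r = approx - target
    if abs(r) <= delta:
        return 0.5 * r * r
    return delta * (abs(r) - 0.5 * delta)


def huber_der1(approx, target, delta=1.0):
    """First derivative with respect to the prediction."""
    r = approx - target
    if abs(r) <= delta:
        return r
    return delta if r > 0 else -delta


def huber_der2(approx, target, delta=1.0):
    """Second derivative: 1 in the quadratic zone, 0 in the linear zone."""
    r = approx - target
    return 1.0 if abs(r) <= delta else 0.0


def numeric_der2(f, x, h=1e-4):
    """Central finite-difference estimate of the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
```

Comparing `huber_der2(0.3, 0.0)` with `numeric_der2(lambda a: huber_loss(a, 0.0), 0.3)` is a quick way to catch a sign error of this kind.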

1.2.3

Python package
* Support Python 3.12. 2510
* \[Performance\]: Fix ineffective loops in Cython. Significant speedups (up to 3x) on dataset construction from data in C-order can be expected.
* \[Performance\]: Make features data initialization from C-order `numpy.ndarray`s with `float32` data type multithreaded. Significant speedups of 5x up to 10x (on CPUs with many cores) can be expected. 385, 2542
* Save training metrics into the model metadata. So `best_score_`, `evals_result_`, `best_iteration_` model attributes now work after model saving and loading. Can be removed by model metadata manipulation if needed. 1166
* \[Breaking change\]: Support a separate boolean target type. `Class` predictions for models trained with boolean targets are now boolean as well, instead of the strings `True` and `False` as before. Such models are incompatible with previous versions of CatBoost appliers. To keep the old behavior, convert your target to `False`/`True` strings before training. 1954
* Restrict `jupyterlab` version for setup to 3.x for now. Fixes 2530
* `utils.read_cd`: Support CD files with non-increasing column indices.
* Make `log_cout`, `log_cerr` specification consistent, avoid reset in recursive calls.
* Late-initialize default values for `log_cout`, `log_cerr`. 2195
* Add missing generated metrics: `Cox`, `PairLogitPairwise`, `UserPerObjMetric`, `SurvivalAft`.
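
For the breaking change to boolean targets above, the release note suggests converting the target to `True`/`False` strings to keep the old behavior. A minimal sketch of that conversion in plain Python follows, assuming the labels are held in a list. The helper name `to_legacy_string_labels` is hypothetical and not part of CatBoost.

```python
def to_legacy_string_labels(labels):
    """Map boolean labels to the strings 'True' / 'False'.

    Non-boolean values are returned unchanged.
    """
    return [str(v) if isinstance(v, bool) else v for v in labels]


y = [True, False, True]
y_legacy = to_legacy_string_labels(y)  # pass this to training instead of y
```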

New features
* Support boolean target/labels type during training in Python and Spark (in the latter case only when using `fit` with `Pool` arguments) and `Class` prediction in Python. 1954
* \[Spark\]: Support Spark 3.5.x.
* \[C/C++ applier\]. Add functions for getting indices of features of different types to C and C++ API. 2568. Thanks to nimusp.
* \[C/C++ applier\]. Add staged prediction functions to C API. 2584. Thanks to Mb-NextTime.
* \[JVM applier\]. Add loading CatBoostModel from a byte array to API. 2539
* \[Linux\] Support CgroupsV2 when computing default number of threads used in parallel computations. 2519. Thanks to elukey.
* \[CLI\] Support printing `Auxiliary` columns by name in evaluation result output. 1659
* Save training metrics into the model metadata. Can be removed by model metadata manipulation if needed. 1166

Build & testing
* \[Windows\]: Use `clang-cl` from Visual Studio 2022 for the build without CUDA (build with CUDA still uses standard Microsoft toolchain from Visual Studio 2019).
* \[macOS\]: Pass `os.version` to `conan` host settings to ensure version consistency.
* \[Linux aarch64\]: Set `-mno-outline-atomics` for modern versions of Clang and GCC to avoid unresolved symbols linking errors. 2527
* Added missing `CMakeLists` for unit tests for `util`. 2525

Bugfixes
* \[Performance\]: Fix performance regression that could slow down training on GPU by 50% on some datasets that had been introduced in release 1.2. Thanks to JeanPaulShapo.
* \[Python-package\]: Fix segfault on Pool(data=None). 2522
* \[Python-package\]: Fix Python exception in `Pool()` when `pairs_weight` is a numpy array. 1913
* \[Python-package\]: Fix segfault and other strange errors when specifying custom logger with `__call__` method. 2277
* \[Python-package\]: Fix returning complex params in hyperparameter search. 1741, 1833
* \[Python-package\]: Fix ignored exceptions for missed metrics descriptions on startup. This has not been visible to users but has been making debugging more difficult.
* \[Python-package\]: Fix misleading `Targets are required for YetiRank loss function.` error in Cross validation. 2083
* \[Python-package\]: Fix `Pool.get_label()` returning constant `True` for boolean labels. 2133
* \[Python-package\]: Copying models does not lose `best_score_`, `evals_result_`, `best_iteration_` attributes values anymore. 1793
* \[Spark\]: Fix hangs at the end of the training. 2151
* `Precision` metric default value in the absence of positive samples is changed to 0 and a warning is added (similar to the behavior of `scikit-learn` implementation). 2422
* Fix ignoring embedding features
* Try to avoid hash collisions when computing group ids for datasets with a lot of groups (may occur in datasets with around 10^9 samples).
* Fix Multiclass models export to C++ and Python code. 2549
* Fix dataset_statistics mode when no `Target` data is available.
* Fix `Error: can't proceed some features` error on GPU. 1024
* Fix `allow_const_label=True` for classification. 1933
* Add checking of approx and target dimensions for `SurvivalAft` objective/metric.
* Fix Focal loss derivatives sign. 2563
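
The `Precision` change above can be illustrated with a small sketch. It assumes that "positive samples" refers to samples predicted as positive (the denominator of precision), which the release note does not spell out. The function name and warning text are hypothetical, and this is not CatBoost's implementation.

```python
import warnings


def precision_with_default(y_true, y_pred):
    """Precision = TP / (TP + FP); returns 0.0 with a warning when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if tp + fp == 0:
        # No predicted positives: the ratio is undefined, so fall back to 0.
        warnings.warn("No positive predictions; precision is set to 0.")
        return 0.0
    return tp / (tp + fp)
```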

1.2.2

Bugfixes
* Fix Segmentation fault when using custom `eval_metric` in binary python packages of version 1.2.1 on PyPI. 2486
* Fix LossFunctionChange fstr with embedding features.
* Fix a segmentation fault in JVM applier when using embedding features on JVM 11+.
* Fix CTR data handling in model summation (especially for models with CTRs with multiple target quantizations).

1.2.1

New features
* Allow optimizing specific ranking loss functions with YetiRank and YetiRankPairwise by specifying the `mode` parameter. See the [Which Tricks are Important for Learning to Rank?](https://arxiv.org/abs/2204.01500) paper for details (this family of losses is called `YetiLoss` there). CPU-only for now.
* Add Kernel Gradient Boosting support (use `catboost.sample_gaussian_process` function). 2408, thanks to TakeOver. See [Gradient Boosting Performs Gaussian Process Inference](https://arxiv.org/abs/2206.05608) paper for details.
* LambdaMart loss: support new target metrics MRR, ERR and MAP.
* StochasticRank loss: support new target metrics ERR and MRR.
* Support MultiRMSE on GPU. 2264, 2390
* Load JSON model format in Java Client. 1627, thanks to timotta
* Implement exporting of Multiclass models to C++ and Python. 2283, thanks to antoninkriz
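
The LambdaMart and StochasticRank entries above mention MRR and ERR as target metrics. As a reminder of what these measure, here is a sketch of their textbook definitions. It is not CatBoost's implementation, and the function names, the "relevance greater than zero" rule for MRR, and the `max_grade` parameter for ERR are choices made for illustration.

```python
def reciprocal_rank(relevances):
    """1 / rank of the first relevant item in a ranked list, or 0.0 if none."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over several ranked lists (one per query)."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def expected_reciprocal_rank(grades, max_grade):
    """ERR for one ranked list of integer relevance grades.

    Each grade g maps to a stop probability (2**g - 1) / 2**max_grade;
    the user is modeled as scanning down the list until they stop.
    """
    err = 0.0
    p_continue = 1.0  # probability the user reaches the current position
    for rank, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / (2 ** max_grade)
        err += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return err
```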

Improvements
* Speedup BM25 feature calcers 3x
* Use `int` instead of deprecated `numpy.int`. 2378
* Add `ModelCalcerWrapper::CalcFlatTransposed`, 2413 thanks to faucct
* Update dependencies to avoid known vulnerabilities

Bugfixes
* Fix `__shfl_up_sync` mask. 2339
* TFocalMetric negative values fix. 2386, thanks to diditforlulz273
* Focal loss: Use user-defined alpha and gamma
* Fix exception propagation: Rethrow exceptions caused by user's python code as C++ exceptions
* CatBoost trained with user defined objective was incompatible with ShapValues calculation
* Avoid nan's in Newton step calculation for RMSEWithUncertainty
* Fix score method for y with shape (N, 1). 2405
* Fix scalePosWeight support for Spark. 2470

1.2

Not secure
Major changes
CatBoost's build system has been switched from Ya Make (Yandex's build system) to [CMake](https://cmake.org/). This means more transparency in the build process and more familiar tools for Open Source developers.
For now it is possible to build CatBoost for:
* Linux on x86-64 with or without CUDA
* Linux on aarch64 with or without CUDA
* macOS on x86-64 and arm64, including creating universal binaries
* Windows on x86-64 with or without CUDA
* Android (only model applier) on [All supported ABIs](https://developer.android.com/ndk/guides/abis).

This allowed us to prepare the Python package in the source distribution form (also known as `sdist`). 830

* `msvs` subdirectory with the Microsoft Visual Studio solution has been removed. Visual Studio solutions can be generated using CMake instead.
* `make` subdirectory with Makefiles has been removed. Use `CMake` + `ninja` (recommended) or `CMake` + `make` instead.

Python package
* Switch to the standard Python build and installation method that uses `setup.py` instead of the custom `mk_wheel.py` script. All common scenarios (`sdist`, `build`, `install`, editable `install`, `bdist_wheel`) are supported.
* Switch wheel platform tag on Linux from obsolete `manylinux1` to `manylinux2014`.
* The source distribution is now available on PyPI. 830
* Support Python 3.11. 2213
* Drop support for obsolete Python 3.6.
* Make wheels [PEP427](https://peps.python.org/pep-0427/)-compliant. 2165
* Fix wrong checksums in wheels that caused problems with poetry. 2331
* Improved performance due to caching TBB local executors. 2203
* Add `fixed_binary_splits` to the regressor, classifier, and ranker.
* Compatibility with pandas 2.0. 2320
* CatBoost widget is now compatible with ipywidgets 8.x. 2266

Rust package
* Support CUDA applier. 1925, thanks to getumen.
* Properly forward debug/release setting to native library build.
* Passing features: switch from `String` and `Vec` types for features to `AsRef` of slices to make code more generic
* Support text and embedding features.
* Support multidimensional output in predictions.

New features
* \[JVM applier\]: Support CUDA.
* \[Spark\]: Support Spark 3.4.x (if you want to use Spark with python 3.11 use this version).
* Static model applier library now works on Windows.
* Add `binary-classification-threshold` parameter to the CLI model applier.
* Support Multi-target regression with text features (but only Bag-of-Words features are generated for now). 2229
* Support `RMSEWithUncertainty` loss function on GPU.
* Support `MultiLogloss` and `MultiCrossEntropy` loss functions with numerical features on GPU.
* Support `MultiLogloss` loss function with text features on CPU and GPU. 1885
* Enable univariate metrics for models with uncertainty
* Add `Focal` loss (CPU-only for now). 1807, thanks to diditforlulz273.
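
For context on the `Focal` loss entry above: for binary classification it is commonly written as `-alpha_t * (1 - p_t)**gamma * log(p_t)`, where `p_t` is the predicted probability of the true class. The sketch below implements that textbook formula. The function name and the default `alpha` and `gamma` values are illustrative assumptions, not CatBoost's defaults or internals.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one sample.

    p: predicted probability of class 1, strictly between 0 and 1.
    y: true label, 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)**gamma down-weights samples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted log loss.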

Improvements
* Removed legacy dependency on Python 2 interpreter in the build process. 2297
* Calc metrics: Throw catboost exception if column index exceeds column count.
* Speedup `MultiLogloss` on CPU by 8% per tree (110K samples, 20 targets, 480 float features, 3 cat features, 16 cores CPU).
* Update .NET projects from obsolete .NET Core 2.1 to .NET Core 3.1.
* Code generation for new CUDA Compute Architectures 8.6, 8.9 and 9.0 is enabled by default (requires CUDA 11.8 to build from source).
* Check that evaluator implementation is available in `TFullModel::SetEvaluatorType` (it was possible to get a Segmentation fault when calling it for a non-available implementation). Add `TFullModel::GetSupportedEvaluatorTypes`.
* Cross Validation on GPU no longer requires `allow_write_files=True`.

Bugfixes
* \[Python-package\]: Clear model params before load_model. Fixes 2205.
* \[Python-package\]: Fix CatBoostRanker score computation. 2231
* \[Python-package\]: Fix `_get_embedding_feature_indices`. 2273
* \[Python-package\]: Fix `set_feature_names` with text or embedding features. 2090
* \[Python-package\]: pandas.Categorical.categories is not necessarily a numpy.ndarray. 1965
* \[Spark\]: Pass classpath in a file to avoid hitting cmdline length limits. 1842
* \[CUDA Applier\]: Apply scale and bias.
* \[CUDA Applier\]: Fix that `libs/model_interface applier` always produced an error in CUDA mode.
* Fix CUDA error 700 in pairwise ranking.
* Fix kernel registration for distributed training on GPU.
* Fix `floating point exception` on CPU for small datasets on GPU.
* Fix wrong log message 'There are invalid params and some of them will be ignored'. 2253
* Fix incorrect results and crashes for GPU applier on Nvidia Ampere-based GPUs.
* Fix 'CUDA error 9' in Multi-GPU training.
* Fix serialization of embedding features structures in the model.
* Fix GPU buffer overrun in distributed multi-classification training.
* Fix `catboost/cuda/cuda_util/sort.cpp:166: CUDA error 9` on Nvidia Ampere-based GPUs.
* Fix inf/nan parsing in dataset input files.
* Fix floating point exception for very small datasets on GPU.
* Fix: built static applier library lacked the part with 'global' objects. 2187
* Fix sum of models with categorical features with CTRs.
* Fix: model_interface/cmake_example failed build "‘runtime_error’ is not a member of ‘std’". 2324, thanks to Mandelag.
* Fix Segmentation fault in Cross Validation and hyperparameter search functions that use it on GPU.
* Fix Segmentation fault in `utils.eval_metrics` for groupwise metrics when group data has not been specified. 2343
* Fix errors when running Cross Validation repeatedly on GPU. 2221

1.1.1

Not secure
New features
* Support building for Linux on aarch64 from sources using CMake (no prebuilt binaries or PyPI packages yet). 1981
* [C/C++ applier] Support embedding features. 2172
* [C/C++ applier] Add `GetModelUsedFeaturesNames`. 2204
* [Python] Add text features to `utils.create_cd`. 2193
* [Spark] Full support for Apache Spark 3.3
* [Spark] Read/write PySpark's DataFrame-like API for Pool. 2030
* [Spark] Allow specifying trainingDriver and worker listening ports. 2181

Bugfixes
* Fix prediction dimension check for RMSEWithUncertainty and MultiQuantile. 2155
* [C/C++ applier] Fix segmentation fault in prediction for multiple objects for multiple dimension models.
* [JVM applier] Fix catboost-common dependency version in catboost-prediction (Fixes JVM applier on macOS). 2121
* [Python] Update for pandas 1.5.0: iteritems -> items (Fixes annoying deprecation warning). 2179
* [Python] Fix segmentation fault when target is `np.ndarray` with `dtype=object`. 2201
* [Python] Fix specifying `feature_names` in `utils.create_cd`. 2211
