AutoGluon

Latest version: v1.1.0


0.5.2

Not secure
v0.5.2 is a security hotfix release.

This release is **non-breaking** when upgrading from v0.5.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.

See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.5.1...v0.5.2

This version supports Python versions 3.7 to 3.9.

0.5.1

Not secure
We're happy to announce the AutoGluon 0.5.1 release. This release contains major optimizations and bug fixes to the autogluon.multimodal and autogluon.timeseries modules, as well as inference speed improvements to autogluon.tabular.

This release is non-breaking when upgrading from v0.5.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.

This release contains [58 commits from 14 contributors](https://github.com/awslabs/autogluon/graphs/contributors?from=2022-06-22&to=2022-07-18&type=c)!

Full Contributor List (ordered by number of commits):

- zhiqiangdon, yinweisu, Innixma, canerturkmen, sxjscience, bryanyzhu, jsharpna, gidler, gradientsky, Linuxdex, muxuezi, yiqings, huibinshen, FANGAreNotGnu

This version supports Python versions 3.7 to 3.9.

See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.5.0...v0.5.1


AutoMM

Changed to a new namespace `autogluon.multimodal` (AutoMM), which is a deep learning "model zoo" of model zoos. On one hand, AutoMM can automatically train deep models for unimodal (image-only, text-only, or tabular-only) problems. On the other hand, AutoMM can automatically solve multimodal (any combination of image, text, and tabular) problems by fusing multiple deep learning models. In addition, AutoMM can be used as a base model in AutoGluon Tabular and participate in the model ensemble.
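A minimal sketch of the unified API, assuming the 0.5.x `MultiModalPredictor` entry point (the column names and image paths below are hypothetical):

```python
# Minimal sketch, assuming the 0.5.x autogluon.multimodal API; column
# names and image paths are hypothetical.
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

train_data = pd.DataFrame({
    "image": ["img/0.jpg", "img/1.jpg"],                # image column (paths)
    "description": ["a red bicycle", "a black sedan"],  # text column
    "price": [120.0, 15000.0],                          # tabular column
    "label": [0, 1],
})

# AutoMM inspects the column types and automatically selects (and, for
# multimodal data, fuses) suitable deep learning backbones.
predictor = MultiModalPredictor(label="label")
predictor.fit(train_data)
```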

New features

- Supported zero-shot learning with CLIP (#1922) zhiqiangdon
- Users can directly perform zero-shot image classification with the [CLIP model](https://arxiv.org/abs/2103.00020). Moreover, users can extract image and text embeddings with CLIP to do image-to-text or text-to-image retrieval.
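A hedged sketch of zero-shot classification (the exact call signature is covered in the zero-shot tutorial linked under "More tutorials and examples" below; the image path and candidate labels are hypothetical):

```python
# Hedged sketch of zero-shot image classification with CLIP; see the
# zero-shot tutorial in this release for the exact API.
from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(problem_type="zero_shot")
probs = predictor.predict_proba(
    {"image": ["dog.jpg"]},                              # hypothetical path
    {"text": ["a photo of a dog", "a photo of a cat"]},  # candidate labels
)
```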

- Improved efficient finetuning
- Support “bit_fit”, “norm_fit”, “lora”, “lora_bias”, “lora_norm”. On four multilingual datasets ([xnli](https://huggingface.co/datasets/xnli), [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt), [paws-x](https://huggingface.co/datasets/paws-x), [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi)), “lora_bias”, which is a combination of [LoRA](https://arxiv.org/abs/2106.09685) and [BitFit](https://arxiv.org/abs/2106.10199), achieved the best overall performance. Compared to finetuning the whole network, “lora_bias” finetunes only **<0.5%** of the network parameters and can achieve comparable performance on “stsb_multi_mt” (#1780, #1809). Raldir zhiqiangdon
- Support finetuning the [mT5-XL](https://huggingface.co/google/mt5-xl) model that has 1.7B parameters on a single NVIDIA G4 GPU. In AutoMM, we only use the T5-encoder (1.7B parameters) like [Sentence-T5](https://aclanthology.org/2022.findings-acl.146.pdf). (#1933) sxjscience

- Added more data augmentation techniques
- [Mixup](https://arxiv.org/pdf/1710.09412.pdf) for image data. (#1730) Linuxdex
- [TrivialAugment](https://arxiv.org/pdf/2103.10158.pdf) for both image and text data. (#1792) lzcemma
- [Easy text augmentations](https://arxiv.org/pdf/1901.11196.pdf). (#1756) lzcemma

- Enhanced teacher-student model distillation
- Support distilling the knowledge from a unimodal/multimodal teacher model to a student model. (#1670, #1895) zhiqiangdon

More tutorials and examples

- [Beginner tutorials](https://auto.gluon.ai/stable/tutorials/multimodal/index.html) of applying AutoMM to image, text, or multimodal (including tabular) data. (#1861, #1908, #1858, #1869) bryanyzhu sxjscience zhiqiangdon

- [A zero-shot image classification tutorial](https://auto.gluon.ai/stable/tutorials/multimodal/clip_zeroshot.html) with the CLIP model. (#1942) bryanyzhu

- A tutorial of using [CLIP model to extract embeddings](https://auto.gluon.ai/stable/tutorials/multimodal/clip_embedding.html) for image-text retrieval. (#1957) bryanyzhu

- [A tutorial](https://auto.gluon.ai/stable/tutorials/multimodal/customization.html) to introduce comprehensive AutoMM configurations (#1861). zhiqiangdon

- [AutoMM for tabular data examples](https://github.com/awslabs/autogluon/tree/master/examples/automm/tabular_dl) (#1752, #1893, #1903). yiqings

- [AutoMM distillation example](https://github.com/awslabs/autogluon/tree/master/examples/automm/distillation) (#1846). FANGAreNotGnu

- A Kaggle notebook about how to use AutoMM to predict pet adoption: https://www.kaggle.com/code/linuxdex/use-autogluon-to-predict-pet-adoption. The model achieves a score equivalent to **top 1% (20th/3537) in this kernel-only competition (test data is only available in the kernel, without internet access)** (#1796, #1847, #1894, #1943). Linuxdex

0.5.0

Not secure
We're happy to announce the AutoGluon 0.5 release. This release contains major new modules `autogluon.timeseries` and `autogluon.multimodal`. In collaboration with [the Yu Group](https://www.stat.berkeley.edu/~yugroup/) of Statistics and EECS from UC Berkeley, we have added interpretable models (imodels) to `autogluon.tabular`.

This release is non-breaking when upgrading from v0.4.2. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.

This release contains [**91** commits from **13** contributors](https://github.com/awslabs/autogluon/graphs/contributors?from=2022-06-01&to=2022-06-22&type=c)!

Full Contributor List (ordered by number of commits):

- Innixma, canerturkmen, zhiqiangdon, sxjscience, yinweisu, Linuxdex, yiqings, gradientsky, csinva, FANGAreNotGnu, huibinshen, Raldir, lzcemma

The imodels integration is based on the following work:

Singh, C., Nasseri, K., Tan, Y.S., Tang, T. and Yu, B., 2021. [imodels: a python package for fitting interpretable models.](https://joss.theoj.org/papers/10.21105/joss.03192#) Journal of Open Source Software, 6(61), p.3192.

This version supports Python versions 3.7 to 3.9.

See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.4.1...v0.5.0

Full release notes will be available shortly.

0.4.3

Not secure
v0.4.3 is a security hotfix release.

This release is **non-breaking** when upgrading from v0.4.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.

See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.4.2...v0.4.3

This version supports Python versions 3.7 to 3.9.

0.4.2

Not secure
v0.4.2 is a hotfix release to fix a [breaking change](https://github.com/protocolbuffers/protobuf/issues/10051) in protobuf.

This release is **non-breaking** when upgrading from v0.4.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.

See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.4.1...v0.4.2

This version supports Python versions 3.7 to 3.9.

0.4.1

Not secure
We're happy to announce the AutoGluon 0.4.1 release. 0.4.1 contains minor enhancements to Tabular, Text, Image, and Multimodal modules, along with many quality of life improvements and fixes.

This release is **non-breaking** when upgrading from v0.4.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.

This release contains [**55** commits from **10** contributors](https://github.com/awslabs/autogluon/graphs/contributors?from=2022-03-10&to=2022-05-23&type=c)!

See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.4.0...v0.4.1

Special thanks to yiqings, leandroimail, and huibinshen, who were first-time contributors to AutoGluon this release!

Full Contributor List (ordered by number of commits):
- Innixma, zhiqiangdon, yinweisu, sxjscience, yiqings, gradientsky, willsmithorg, canerturkmen, leandroimail, huibinshen.

This version supports Python versions 3.7 to 3.9.

Changes

AutoMM

New features

- Added `optimization.efficient_finetune` flag to support multiple efficient finetuning algorithms. (#1666) sxjscience
- Supported options (see the sketch below):
- `bit_fit`: ["BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models"](https://arxiv.org/abs/2106.10199)
- `norm_fit`: An extension of the algorithm in ["Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs"](https://arxiv.org/abs/2003.00152) and BitFit. We finetune the parameters in the norm layers as well as the biases.
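A minimal sketch of selecting one of these options via the new flag (the toy dataset and column names are hypothetical):

```python
# Hedged sketch: picking a parameter-efficient finetuning algorithm via
# the optimization.efficient_finetune flag; the toy dataset is hypothetical.
import pandas as pd
from autogluon.text.automm import AutoMMPredictor

train_data = pd.DataFrame({
    "sentence": ["great movie", "terrible plot", "loved it", "boring"],
    "label": [1, 0, 1, 0],
})

predictor = AutoMMPredictor(label="label")
predictor.fit(
    train_data,
    hyperparameters={"optimization.efficient_finetune": "bit_fit"},
)
```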

- Enabled knowledge distillation for AutoMM (#1670) zhiqiangdon
- Distillation API for `AutoMMPredictor` reuses the `.fit()` function:

```python
from autogluon.text.automm import AutoMMPredictor

# Fit the teacher first, then pass it to the student's fit() call.
teacher_predictor = AutoMMPredictor(label="label_column").fit(train_data)
student_predictor = AutoMMPredictor(label="label_column").fit(
    train_data,
    hyperparameters=student_and_distiller_hparams,
    teacher_predictor=teacher_predictor,
)
```


- Option to turn on returning feature column information (#1711) zhiqiangdon
- The feature column information is turned on for feature column distillation; for other cases, it is turned off by default to reduce the dataloader's latency.
- Added a `requires_column_info` flag in data processors and a utility function to turn this flag on or off.

- FT-Transformer implementation for tabular data in AutoMM (#1646) yiqings
- Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko, "Revisiting Deep Learning Models for Tabular Data" 2022. ([arxiv](https://arxiv.org/abs/2106.11959), [official implementation](https://github.com/Yura52/tabular-dl-revisiting-models))

- Make CLIP support multiple images per sample (#1606) zhiqiangdon
- Added multiple-image support for CLIP. Improved data loader robustness: added missing-image handling to prevent training crashes.
- Added the choice of using a zero image if an image is missing.

- Avoid using `eos` as the sep token for CLIP. (#1710) zhiqiangdon

- Update fusion transformer in AutoMM (#1712) yiqings
- Support constant learning rate in `polynomial_decay` scheduler.
- Update `[CLS]` token in numerical/categorical transformer.

- Added more image augmentations: `verticalflip`, `colorjitter`, `randomaffine` (#1719) Linuxdex, sxjscience

- Added prompts for the percentage of missing images during image column detection. (#1623) zhiqiangdon

- Support `average_precision` in AutoMM (#1697) sxjscience

- Convert `roc_auc` / `average_precision` to `log_loss` for torchmetrics (#1715) zhiqiangdon
- `torchmetrics.AUROC` requires that both positive and negative examples be available in a mini-batch. When training a large model, the per-GPU batch size is probably small, leading to an incorrect `roc_auc` score. Converting from `roc_auc` to `log_loss` improves training stability.
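For intuition, an illustration of the underlying failure mode using scikit-learn (not AutoMM internals): AUROC is undefined on a batch containing only one class, while log loss remains well-defined.

```python
# Illustration (scikit-learn, not AutoMM internals): AUROC is undefined
# when a mini-batch contains only one class; log loss still works.
from sklearn.metrics import roc_auc_score, log_loss

y_true = [1, 1, 1]            # a small per-GPU batch with no negatives
y_prob = [0.9, 0.8, 0.7]

try:
    roc_auc_score(y_true, y_prob)
except ValueError as err:
    print("AUROC undefined:", err)

print("log_loss:", log_loss(y_true, y_prob, labels=[0, 1]))
```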

- Added `pytorch-lightning` 1.6 support (#1716) sxjscience

Checkpointing and Model Outputs Changes

- Updated the names of the top-k checkpoint average methods and supported customizing model names for terminal input (#1668) zhiqiangdon
- Following the paper https://arxiv.org/pdf/2203.05482.pdf, updated the top-k checkpoint average names: `union_soup` -> `uniform_soup` and `best_soup` -> `best`.
- Updated function names (`customize_config_names` -> `customize_model_names` and `verify_config_names` -> `verify_model_names`) to make them easier to understand.
- Supported customizing model names for the terminal input.

- Implemented the GreedySoup algorithm proposed in [paper](https://arxiv.org/pdf/2203.05482.pdf). Added `union_soup`, `greedy_soup`, `best_soup` flags and changed the default value correspondingly. (#1613) sxjscience

- Updated the `standalone` flag in `automm.predictor.save()` to save the pretrained model for offline deployment (#1575) yiqings
- An efficient implementation to save the downloaded models from transformers for offline deployment. The revised logic is in #1572 and discussed in #1572 (comment).
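A hedged sketch of the flag in use (`predictor` is assumed to be a fitted `AutoMMPredictor`; the path is hypothetical):

```python
# Hedged sketch: standalone=True bundles the downloaded transformers
# artifacts with the checkpoint so it can be loaded without internet access.
# `predictor` is a fitted AutoMMPredictor (assumed); the path is hypothetical.
predictor.save(path="./automm_offline", standalone=True)
offline_predictor = AutoMMPredictor.load(path="./automm_offline")
```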

- Simplified checkpoint template (#1636) zhiqiangdon
- Stopped using PyTorch Lightning's model checkpoint template in saving `AutoMMPredictor`'s final model checkpoint.
- Improved the logic of continuous training. We pass the `ckpt_path` argument to PyTorch Lightning's trainer only when `resume=True`.

- Unified AutoMM's model output format and support customizing model names (#1643) zhiqiangdon
- Now each model's output is a dictionary with the model prefix as the first-level key. The format is uniform between single models and fusion models.
- Now users can customize model names by using the internally registered names (`timm_image`, `hf_text`, `clip`, `numerical_mlp`, `categorical_mlp`, and `fusion_mlp`) as prefixes. This is helpful when users want to simultaneously use two models of the same type, e.g., `hf_text`. They can just use the names `hf_text_0` and `hf_text_1`.
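A hedged sketch of configuring two text backbones at once (the config keys below are extrapolated from the prefix naming scheme above, not confirmed; the checkpoint names are examples only):

```python
# Hedged sketch: config keys extrapolated from the prefix naming scheme
# described above (an assumption); checkpoint names are examples only.
hyperparameters = {
    "model.names": ["hf_text_0", "hf_text_1", "fusion_mlp"],
    "model.hf_text_0.checkpoint_name": "prajjwal1/bert-tiny",
    "model.hf_text_1.checkpoint_name": "google/electra-small-discriminator",
}
```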

- Support `standalone` feature in `TextPredictor` (#1651) yiqings

- Fixed saving and loading tokenizers and text processors (#1656) zhiqiangdon
- Saved pre-trained huggingface tokenizers separately from the data processors.
- This change is backward-compatible with checkpoints saved by version `0.4.0`.

- Changed `load` from a classmethod to a staticmethod to avoid incorrect usage. (#1697) sxjscience

- Added `AutoMMModelCheckpoint` to avoid evaluating the models to obtain the scores (#1716) sxjscience
- The checkpoint saves `best_k_models` into a YAML file so that it can be loaded later to determine the path to the model checkpoints.

- Extract column features from AutoMM's model outputs (#1718) zhiqiangdon
- Added a utility function to extract column features for both image and text.
- Support extracting column features for the `timm_image`, `hf_text`, and `clip` models.

- Make AutoMM dataloader return feature column information (#1710) zhiqiangdon

Bug fixes

- Fixed calling `save_pretrained_configs` in `AutoMMPredictor.save(standalone=True)` when no fusion model exists ([here](https://github.com/awslabs/autogluon/blob/5a323641072431091d2be5e6dbef5a87b646a408/text/src/autogluon/text/automm/utils.py#L644)) (#1651) yiqings

- Fixed error raising when setting a key that does not exist in the configuration (#1613) sxjscience

- Fixed the warning message about bf16. (#1625) sxjscience

- Fixed the corner case of calculating the gradient accumulation step (#1633) sxjscience

- Fixes for top-k averaging in the multi-GPU setting (#1707) zhiqiangdon

Tabular

- Limited RF `max_leaf_nodes` to 15000 (previously uncapped) (#1717) Innixma
- Previously, for very large datasets, RF/XT memory and disk usage would quickly become unreasonable. This change ensures that, beyond a certain point, RF and XT no longer grow larger when given more rows of training data. Benchmark results showed that the change is an improvement, particularly for the `high_quality` preset.
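An illustrative sketch of the idea (scikit-learn shown for clarity; not AutoGluon's internal RF wrapper):

```python
# Illustrative sketch (scikit-learn, not AutoGluon internals): capping
# max_leaf_nodes bounds each tree's size, so model memory/disk usage stops
# growing once the cap is reached, regardless of how many rows are added.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=300, max_leaf_nodes=15000)
```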

- Limit KNN to 32 CPUs to avoid OpenBLAS error (#1722) Innixma
- Issue #1020. When training K-nearest-neighbors (KNN) models, a rare error can sometimes occur that crashes the entire process:

```
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Segmentation fault: 11
```

This error occurred on machines with many CPU cores (>64 vCPUs) due to too many threads being created at once. By limiting KNN to 32 CPUs, the error is avoided.
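An illustrative sketch of the mitigation (scikit-learn shown for clarity; not AutoGluon's internal KNN wrapper):

```python
# Illustrative sketch (not AutoGluon internals): capping the worker count
# avoids spawning too many OpenBLAS threads on machines with many vCPUs.
import os
from sklearn.neighbors import KNeighborsClassifier

n_jobs = min(os.cpu_count() or 1, 32)   # mirror AutoGluon's 32-CPU cap
knn = KNeighborsClassifier(n_jobs=n_jobs)
```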

- Improved memory warning thresholds (#1626) Innixma

- Added `get_results` and `model_base_kwargs` (#1618) Innixma
- Added `get_results` to searchers, useful for debugging and for future extensions to HPO functionality.
- Added a new way to init a `BaggedEnsembleModel` that avoids having to init the base model prior to initing the bagged ensemble model.

- Updated resource logic in models (#1689) Innixma
- The previous implementation would crash if the user specified `auto` for resources; this is fixed in this PR.
- Added `get_minimum_resources` to explicitly define minimum resource requirements within a method.

- Updated feature importance defaults: `subsample_size` 1000 -> 5000, `num_shuffle_sets` 3 -> 5 (#1708) Innixma
- This will improve the quality of the feature importance values by default, especially the 99% confidence bounds. The change increases the time taken by ~8x, but this is acceptable because of the numerous inference speed optimizations made since these defaults were first introduced.
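The old, faster behavior can still be requested explicitly; a hedged sketch (assuming a fitted `TabularPredictor` named `predictor` and a held-out `test_data` DataFrame):

```python
# Hedged sketch: overriding the new defaults to recover the old, faster
# behavior. `predictor` is a fitted TabularPredictor and `test_data` is a
# held-out DataFrame (both assumed).
importance = predictor.feature_importance(
    test_data,
    subsample_size=1000,   # new default: 5000
    num_shuffle_sets=3,    # new default: 5
)
```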

- Added notice to ensure serializable custom metrics (#1705) Innixma
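A hedged sketch of a serializable custom metric: define the scoring function at module level (not as a lambda or closure) so it can be pickled, then wrap it with AutoGluon's `make_scorer`.

```python
# Hedged sketch: a module-level scoring function (not a lambda/closure)
# keeps the custom metric picklable by worker processes.
import numpy as np
from autogluon.core.metrics import make_scorer

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

my_metric = make_scorer(
    name="mean_absolute_error",
    score_func=mean_absolute_error,
    optimum=0,
    greater_is_better=False,
)
```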

Bug fixes

- Fixed `evaluate` when `weight_evaluation=True` (#1612) Innixma
- Previously, AutoGluon would crash if the user called `predictor.evaluate(...)` or `predictor.evaluate_predictions(...)` when `self.weight_evaluation==True`.

- Fixed `RuntimeError: dictionary changed size during iteration` (#1684, #1685) leandroimail

- Fixed CatBoost custom metric & F1 support (#1690) Innixma

- Fixed HPO not working for bagged models if the bagged model is loaded from disk (#1702) Innixma

- Fixed feature importance erroring if `self.model_best` is `None` (which can happen if no weighted ensemble is fit) (#1702) Innixma

Documentation

- Updated the text tutorial on customizing hyperparameters (#1620) zhiqiangdon
- Added customizable backbones from the Huggingface model zoo and how to use local backbones.

- Improved implementations and docstrings of `save_pretrained_models` and `convert_checkpoint_name`. (#1656) zhiqiangdon

- Added cheat sheet to website (#1605) yinweisu

- Doc fix to use the correct predictor when calling leaderboard (#1652) Innixma

Miscellaneous changes

- [security] Updated `pillow` to `9.0.1`+ (#1615) gradientsky

- [security] Updated `ray` to `1.10.0`+ (#1616) yinweisu

- Tabular regression test improvements (#1555) willsmithorg
- Regression testing of model list and scores in tabular on small synthetic datasets (for speed).
- Tests about 20 different calls to `TabularPredictor` on both regression and classification tasks, multiple presets, etc.
- When a test fails, it dumps out the config change required to make it pass, for ease of updating.

- Disabled image/text predictor when GPU is not available in `TabularPredictor` (#1676) yinweisu
- Resources are validated before bagging is started. The image/text predictor models require a minimum of 1 GPU.

- Used a class property to set keys in model classes. In this way, if we customize the prefix key, the other keys are automatically updated. (#1669) zhiqiangdon

Various bugfixes, documentation, and CI improvements
- yinweisu (#1605, #1611, #1631, #1638, #1691)
- zhiqiangdon (#1721)
- Innixma (#1608, #1701)
- sxjscience (#1714)
