TRL


0.7.6

Patch release: Multi-tag instead of single tags for `xxxTrainer`

This is a patch release that pushes multiple tags (e.g. `trl` & `sft`) to the Hub instead of a single tag.

What's Changed

0.7.5

IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`

Important enhancements for `DPOTrainer`

This release introduces many new features in TRL for `DPOTrainer`:

- IPO loss, for better generalization of the DPO algorithm
- KTO & cDPO losses
- Support for passing pre-computed reference log-probabilities to `DPOTrainer` (see the sketch after this list)
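
As a hedged sketch of how a loss is selected (the names `model`, `ref_model`, `train_dataset`, and `tokenizer` are assumed to be defined elsewhere; argument names follow the 0.7.5-era `DPOTrainer` API):

```python
from trl import DPOTrainer

# Sketch only: model, ref_model, train_dataset, and tokenizer are assumed
# to be defined elsewhere. loss_type selects the objective: the default
# sigmoid DPO loss, the hinge (SLiC) loss, "ipo", or the KTO variant.
trainer = DPOTrainer(
    model,
    ref_model,
    beta=0.1,
    loss_type="ipo",
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```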

* [DPO] Refactor eval logging of dpo trainer by mnoukhov in https://github.com/huggingface/trl/pull/954
* Fixes reward and text gathering in distributed training by edbeeching in https://github.com/huggingface/trl/pull/850
* remove spurious optimize_cuda_cache deprecation warning on init by ChanderG in https://github.com/huggingface/trl/pull/1045
* Revert "[DPO] Refactor eval logging of dpo trainer (954)" by lvwerra in https://github.com/huggingface/trl/pull/1047
* Fix DPOTrainer + PEFT 2 by rdk31 in https://github.com/huggingface/trl/pull/1049
* [DPO] IPO Training loss by kashif in https://github.com/huggingface/trl/pull/1022
* [DPO] cDPO loss by kashif in https://github.com/huggingface/trl/pull/1035
* [DPO] use ref model logprobs if it exists in the data by kashif in https://github.com/huggingface/trl/pull/885
* [DPO] save eval_dataset for subsequent calls by kashif in https://github.com/huggingface/trl/pull/1125
* [DPO] rename kto loss by kashif in https://github.com/huggingface/trl/pull/1127
* [DPO] add KTO loss by kashif in https://github.com/huggingface/trl/pull/1075

Automatic `xxxTrainer` tagging on the Hub

TRL trainers now automatically push the tags `trl-sft`, `trl-dpo`, or `trl-ddpo` when pushing models to the Hub.

* [`xxxTrainer`] Add tags to all trainers in TRL by younesbelkada in https://github.com/huggingface/trl/pull/1120
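
Nothing changes on the user side; as a sketch (assuming a trained `trainer` instance):

```python
# Push to the Hub as usual; the matching tag (e.g. trl-sft for
# SFTTrainer) is now added to the model card automatically.
trainer.push_to_hub()
```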

unsloth 🤝 TRL

We encourage users to try out the [unsloth library](https://github.com/unslothai/unsloth) for faster LLM fine-tuning with PEFT and TRL's `SFTTrainer` and `DPOTrainer`.

* [`Docs`] Add unsloth optimizations in TRL's documentation by younesbelkada in https://github.com/huggingface/trl/pull/1119
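
A minimal sketch following unsloth's README (the model name and hyperparameters are illustrative, and unsloth must be installed separately):

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized model through unsloth's patched loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # illustrative checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters, then pass the model to SFTTrainer or DPOTrainer as usual.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```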

What's Changed

* set dev version by younesbelkada in https://github.com/huggingface/trl/pull/970
* [`Tests`] Add non optional packages tests by younesbelkada in https://github.com/huggingface/trl/pull/974
* [DOCS] Fix outdated references to `examples/` by alvarobartt in https://github.com/huggingface/trl/pull/977
* Update README.md by GeekDream-x in https://github.com/huggingface/trl/pull/994
* [DataCollatorForCompletionOnlyLM] Warn on identical `eos_token_id` and `pad_token_id` by MustSave in https://github.com/huggingface/trl/pull/988
* [`DataCollatorForCompletionOnlyLM`] Add more clarification / guidance in the case `tokenizer.pad_token_id == tokenizer.eos_token_id` by younesbelkada in https://github.com/huggingface/trl/pull/992
* make distributed true for multiple process by allanj in https://github.com/huggingface/trl/pull/997
* Fixed wrong trigger for warning by zabealbe in https://github.com/huggingface/trl/pull/971
* Update how_to_train.md by halfrot in https://github.com/huggingface/trl/pull/1003
* Adds `requires_grad` to input for non-quantized peft models by younesbelkada in https://github.com/huggingface/trl/pull/1006
* [Multi-Adapter PPO] Fix and Refactor reward model adapter by mnoukhov in https://github.com/huggingface/trl/pull/982
* Remove duplicate data loading in rl_training.py by viethoangtranduong in https://github.com/huggingface/trl/pull/1020
* [Document] Minor fixes of sft_trainer document by mutichung in https://github.com/huggingface/trl/pull/1029
* Update utils.py by ZihanWang314 in https://github.com/huggingface/trl/pull/1012
* spelling is hard by grahamannett in https://github.com/huggingface/trl/pull/1043
* Fixing accelerator version function call. by ParthaEth in https://github.com/huggingface/trl/pull/1056
* [SFT Trainer] precompute packed iterable into a dataset by lvwerra in https://github.com/huggingface/trl/pull/979
* Update doc CI by lewtun in https://github.com/huggingface/trl/pull/1060
* Improve PreTrainedModelWrapper._get_current_device by billvsme in https://github.com/huggingface/trl/pull/1048
* Update doc for the `compute_metrics` argument of SFTTrainer by albertauyeung in https://github.com/huggingface/trl/pull/1062
* [`core`] Fix failing tests on main by younesbelkada in https://github.com/huggingface/trl/pull/1065
* [`SFTTrainer`] Fix Trainer when args is None by younesbelkada in https://github.com/huggingface/trl/pull/1064
* enable multiple eval datasets by peter-sk in https://github.com/huggingface/trl/pull/1052
* Add missing `loss_type` in `ValueError` message by alvarobartt in https://github.com/huggingface/trl/pull/1067
* Add args to SFT example by lewtun in https://github.com/huggingface/trl/pull/1079
* add local folder support as input for rl_training. by sywangyi in https://github.com/huggingface/trl/pull/1078
* Make CI happy by younesbelkada in https://github.com/huggingface/trl/pull/1080
* Removing `tyro` in `sft_llama2.py` by vwxyzjn in https://github.com/huggingface/trl/pull/1081
* Log arg consistency by tcapelle in https://github.com/huggingface/trl/pull/1084
* Updated documentation for docs/source/reward_trainer.mdx to import th… by cm2435 in https://github.com/huggingface/trl/pull/1092
* [Feature] Add Ascend NPU accelerator support by statelesshz in https://github.com/huggingface/trl/pull/1096
* `peft_module_casting_to_bf16` util method, `append_concat_token` flag, remove callback `PeftSavingCallback` by pacman100 in https://github.com/huggingface/trl/pull/1110
* Make prepending of bos token configurable. by pacman100 in https://github.com/huggingface/trl/pull/1114
* fix gradient checkpointing when using PEFT by pacman100 in https://github.com/huggingface/trl/pull/1118
* Update `description` in `setup.py` by alvarobartt in https://github.com/huggingface/trl/pull/1101

New Contributors

* alvarobartt made their first contribution in https://github.com/huggingface/trl/pull/977
* GeekDream-x made their first contribution in https://github.com/huggingface/trl/pull/994
* MustSave made their first contribution in https://github.com/huggingface/trl/pull/988
* allanj made their first contribution in https://github.com/huggingface/trl/pull/997
* zabealbe made their first contribution in https://github.com/huggingface/trl/pull/971
* viethoangtranduong made their first contribution in https://github.com/huggingface/trl/pull/1020
* mutichung made their first contribution in https://github.com/huggingface/trl/pull/1029
* ZihanWang314 made their first contribution in https://github.com/huggingface/trl/pull/1012
* grahamannett made their first contribution in https://github.com/huggingface/trl/pull/1043
* ChanderG made their first contribution in https://github.com/huggingface/trl/pull/1045
* rdk31 made their first contribution in https://github.com/huggingface/trl/pull/1049
* ParthaEth made their first contribution in https://github.com/huggingface/trl/pull/1056
* billvsme made their first contribution in https://github.com/huggingface/trl/pull/1048
* albertauyeung made their first contribution in https://github.com/huggingface/trl/pull/1062
* peter-sk made their first contribution in https://github.com/huggingface/trl/pull/1052
* sywangyi made their first contribution in https://github.com/huggingface/trl/pull/1078
* tcapelle made their first contribution in https://github.com/huggingface/trl/pull/1084
* cm2435 made their first contribution in https://github.com/huggingface/trl/pull/1092
* statelesshz made their first contribution in https://github.com/huggingface/trl/pull/1096
* pacman100 made their first contribution in https://github.com/huggingface/trl/pull/1110

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.4...v0.7.5

0.7.4

Patch Release

This is a patch release that addresses an issue for users who have TRL installed without PEFT.

What's Changed

0.7.3

`IterativeTrainer`, NEFTune and major bugfixes for `DPOTrainer` and Distributed Training

In this release we introduce two new features, `IterativeTrainer` from gaetanlop and NEFTune, together with important bugfixes for distributed training.

IterativeTrainer

Iterative fine-tuning is a training method that lets you perform custom actions (generation and filtering, for example) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code.

Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer

* Introducing the Iterative Trainer by gaetanlop in https://github.com/huggingface/trl/pull/737
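
A minimal sketch of the iterative loop (here `model` and `tokenizer` are assumed to be defined, and `generate_and_filter` and `num_iterations` are hypothetical stand-ins for your custom logic):

```python
from trl import IterativeSFTTrainer

# Sketch only: model and tokenizer are assumed to be defined, and
# generate_and_filter is a hypothetical user-supplied function.
trainer = IterativeSFTTrainer(model=model, tokenizer=tokenizer)

for _ in range(num_iterations):
    texts = generate_and_filter(model)  # custom action between steps
    trainer.step(texts=texts)           # one optimization step on the new data
```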

NEFTune

NEFTune is a technique to boost the performance of chat models, introduced in the paper [“NEFTune: Noisy Embeddings Improve Instruction Finetuning”](https://arxiv.org/abs/2310.05914) by Jain et al. It consists of adding noise to the embedding vectors during training.

* [`SFTTrainer`] Adds NEFTune into `SFTTrainer` by younesbelkada in https://github.com/huggingface/trl/pull/871
* [`NEFTune`] Make use of forward hooks instead by younesbelkada in https://github.com/huggingface/trl/pull/889
* Generalize NEFTune for FSDP, DDP, ... by younesbelkada in https://github.com/huggingface/trl/pull/924

Read more about it [here](https://huggingface.co/docs/trl/sft_trainer#enhance-models-performances-using-neftune)
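
Enabling it is a one-argument change on `SFTTrainer`; a minimal sketch along the lines of the TRL docs:

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

# neftune_noise_alpha controls the scale of the noise added to the
# embedding vectors during training.
trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    neftune_noise_alpha=5,
)
trainer.train()
```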

Major bugfixes

This release addresses major bugs affecting distributed training and gradient checkpointing.

* [`DPO`] fix DPO + GC issues by younesbelkada in https://github.com/huggingface/trl/pull/927
* [`core` / `DDP`] Fix RM trainer + DDP + quantization + propagate `gradient_checkpointing_kwargs` in SFT & DPO by younesbelkada in https://github.com/huggingface/trl/pull/912
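
For reference, a sketch of how `gradient_checkpointing_kwargs` can now be forwarded through the training arguments (requires a recent `transformers`; `use_reentrant=False` is a common choice for PEFT + DDP setups):

```python
from transformers import TrainingArguments

# Sketch: gradient_checkpointing_kwargs is forwarded to the model's
# gradient checkpointing setup; use_reentrant=False selects the
# non-reentrant implementation.
args = TrainingArguments(
    output_dir="output",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```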

DPOTrainer enhancements and fixes

The `DPOTrainer` now comes with multiple enhancements and bugfixes! Check them out below:

* [DPO] add SLiC hinge loss to DPOTrainer by kashif in https://github.com/huggingface/trl/pull/866
* Fix DPOTrainer + PEFT by younesbelkada in https://github.com/huggingface/trl/pull/941
* [DPO] Merge initial peft model if trainer has a peft_config by kashif in https://github.com/huggingface/trl/pull/956
* Adds model kwargs to SFT and DPO trainers by edbeeching in https://github.com/huggingface/trl/pull/951
* fix: dpo trainer ds config by mengban in https://github.com/huggingface/trl/pull/957
* hotfix for dpo trainer by mnoukhov in https://github.com/huggingface/trl/pull/919
* Fix dpo_llama2.py by younesbelkada in https://github.com/huggingface/trl/pull/934

What's Changed

0.7.2

In this release we provide minor bugfixes and a smoother user experience for all public classes. We also clarified in the documentation how to use Flash Attention with `SFTTrainer`.

How to use Flash Attention with `SFTTrainer`:

* Update sft_trainer.mdx to highlight Flash Attention features by younesbelkada in https://github.com/huggingface/trl/pull/807
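
A hedged sketch of the pattern the docs describe (recent `transformers` versions; requires a supported GPU and the `flash-attn` package; the model name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: load the model with Flash Attention 2 enabled, then pass it
# to SFTTrainer as usual.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```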

What's Changed

0.7.1

Patch release: fix bug with `PPOTrainer` and `log_stats`

Fixed a bug with `log_stats` of `PPOTrainer` to avoid breaking behaviour.

* [`PPOTrainer`] A workaround for failing log_stats by younesbelkada in https://github.com/huggingface/trl/pull/708

What's Changed
