TRL


0.4.4

Patch release

* [`core`] unpin accelerate by younesbelkada in https://github.com/lvwerra/trl/pull/418


**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.3...v0.4.4

0.4.3

Patch release - pin accelerate version

* Skip flaky test until next transformers release by younesbelkada in https://github.com/lvwerra/trl/pull/410
* Pin accelerate version by younesbelkada in https://github.com/lvwerra/trl/pull/414


**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.2...v0.4.3

0.4.2

QLoRA RLHF, SFT Trainer and RewardTrainer

This release adds support for training larger models with QLoRA (4-bit quantization through `bitsandbytes`), plus two brand-new classes, `RewardTrainer` and `SFTTrainer`, so you can easily run your RLHF projects end-to-end!

Introducing `SFTTrainer` and `RewardTrainer`

Use the brand-new trainers to train your reward model and supervised fine-tuned (SFT) model with a few lines of code!

* [`core`] officially support SFT (Supervised Finetuning) by younesbelkada in https://github.com/lvwerra/trl/pull/323
* [`SFT`] Fix sft issues by younesbelkada in https://github.com/lvwerra/trl/pull/336
* [`docs`] fix SFT doc by younesbelkada in https://github.com/lvwerra/trl/pull/367
* [`core`] Officially Support Reward Modeling by younesbelkada in https://github.com/lvwerra/trl/pull/303
* Resolve broken evaluation/prediction for RewardTrainer by tomaarsen in https://github.com/lvwerra/trl/pull/404
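Under the hood, a reward model is trained on preference pairs. A minimal pure-Python sketch of the pairwise ranking objective (illustrative only; function and argument names here are not the `RewardTrainer` API):

```python
import math

def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Negative log-sigmoid of the score margin: small when chosen >> rejected."""
    margin = chosen_score - rejected_score
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model ranks the chosen answer higher.
confident = pairwise_reward_loss(2.0, -1.0)  # large margin -> small loss
uncertain = pairwise_reward_loss(0.1, 0.0)   # tiny margin  -> close to log(2)
```

The trainer minimizes this loss over a dataset of (chosen, rejected) completion pairs scored by the model.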

QLoRA integration

Pass 4bit models directly into `PPOTrainer` for more memory efficient training

* [`core`] Add 4bit QLora by younesbelkada in https://github.com/lvwerra/trl/pull/383
* [`bnb`] fix 4 bit SFT by younesbelkada in https://github.com/lvwerra/trl/pull/396
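The memory savings come from storing base weights in 4 bits and dequantizing on the fly. A toy absmax round-trip below illustrates the idea only; `bitsandbytes` actually uses block-wise NF4/FP4 quantization, not this symmetric integer scheme:

```python
def quantize_4bit(values):
    """Toy absmax quantization to integer codes in [-7, 7] plus one scale."""
    scale = max(abs(v) for v in values) / 7.0 or 1.0
    return [round(v / scale) for v in values], scale

def dequantize_4bit(codes, scale):
    """Recover approximate floats from the codes."""
    return [c * scale for c in codes]

weights = [0.42, -1.3, 0.07, 0.9]
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)  # close to, but not exactly, weights
```

Each weight is reduced from 16 or 32 bits to 4, at the cost of a bounded rounding error of half a quantization step.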

Updated StackLlama example

Great work by mnoukhov, who fixed the issues with StackLlama caused by new versions of `accelerate`, `peft` and `transformers`. The examples below are now completely reproducible:

* StackLLaMA: correctly merge peft model by mnoukhov in https://github.com/lvwerra/trl/pull/398
* StackLlama: fixed RL training and added args by mnoukhov in https://github.com/lvwerra/trl/pull/400
* Fixed some type annotations of trl.trainer.PPOTrainer by JulesGM in https://github.com/lvwerra/trl/pull/392
* StackLLaMA: fix supervised finetuning and reward model training by mnoukhov in https://github.com/lvwerra/trl/pull/399

Bug fixes and improvements

* [`core`] refactor peft API by younesbelkada in https://github.com/lvwerra/trl/pull/231
* Batched generation by lvwerra in https://github.com/lvwerra/trl/pull/228
* Reduce memory consumption in batched_forward_pass by ohashi56225 in https://github.com/lvwerra/trl/pull/234
* [`core`] Add warning when negative KL by younesbelkada in https://github.com/lvwerra/trl/pull/239
* adds early stopping by edbeeching in https://github.com/lvwerra/trl/pull/238
* PPO config __init__ is bloated by GauravVirmani in https://github.com/lvwerra/trl/pull/241
* feat(ci): enable `pip` cache by SauravMaheshkar in https://github.com/lvwerra/trl/pull/198
* Improve logging for PPO + Docs page by natolambert in https://github.com/lvwerra/trl/pull/243
* Fix typo by heya5 in https://github.com/lvwerra/trl/pull/253
* Using batched generate in sentiment scripts by GauravVirmani in https://github.com/lvwerra/trl/pull/249
* [`core`] Fix DeepSpeed zero-3 issue by younesbelkada in https://github.com/lvwerra/trl/pull/182
* [`distributed`] Fix early stopping and DP by younesbelkada in https://github.com/lvwerra/trl/pull/254
* [`core`] Fix ds issue by younesbelkada in https://github.com/lvwerra/trl/pull/260
* Add LlaMa in tests + `create_reference_model` by younesbelkada in https://github.com/lvwerra/trl/pull/261
* Use active model to generate response in example on README (269) by rmill040 in https://github.com/lvwerra/trl/pull/271
* stack-llama by edbeeching in https://github.com/lvwerra/trl/pull/273
* Adding pointer back to Meta's LLaMA. by meg-huggingface in https://github.com/lvwerra/trl/pull/277
* fix doc string problem in ppo trainer loss function by thuwyh in https://github.com/lvwerra/trl/pull/279
* Add LLaMA tutorial to docs by natolambert in https://github.com/lvwerra/trl/pull/278
* Fix swapped helper texts by philipp-classen in https://github.com/lvwerra/trl/pull/284
* fix typo in gpt2-sentiment.ipynb by eltociear in https://github.com/lvwerra/trl/pull/293
* add functionality to push best models to the hub during training by Bearnardd in https://github.com/lvwerra/trl/pull/275
* Small improvements / fixes to toxicity example by natolambert in https://github.com/lvwerra/trl/pull/266
* Fix arguments description by lvzii in https://github.com/lvwerra/trl/pull/298
* [`t5`] Fix negative kl issue by younesbelkada in https://github.com/lvwerra/trl/pull/262
* Log Token distribution of Query / Response by natolambert in https://github.com/lvwerra/trl/pull/295
* clean examples folder by natolambert in https://github.com/lvwerra/trl/pull/294
* fixed typo in error message by soerenarlt in https://github.com/lvwerra/trl/pull/312
* fix DS for peft ref_model in ppo trainer by halfrot in https://github.com/lvwerra/trl/pull/309
* [`CI`] Fix broken tests by younesbelkada in https://github.com/lvwerra/trl/pull/318
* [`Docs`] Add details on multi-GPU / multi-node by younesbelkada in https://github.com/lvwerra/trl/pull/320
* Give a key to the wandb PPOConfig config entry by JulesGM in https://github.com/lvwerra/trl/pull/315
* added doc for using torch.distributed.launch/run by oroojlooy in https://github.com/lvwerra/trl/pull/324
* Fix argument's description by vinhkhuc in https://github.com/lvwerra/trl/pull/339
* stack_llama: update instructions in README, fix broken _get_submodules and save tokenizer by teticio in https://github.com/lvwerra/trl/pull/358
* stack_llama: add parameter to control max_length (to mitigate OOM errors) by teticio in https://github.com/lvwerra/trl/pull/359
* [`PPO`] Relax negative KL constraint by younesbelkada in https://github.com/lvwerra/trl/pull/352
* [`PPOTrainer`] Fix tensorboard issue by younesbelkada in https://github.com/lvwerra/trl/pull/330
* 140/best n sampling by metric-space in https://github.com/lvwerra/trl/pull/326
* Fix bug when loading local peft model by Opdoop in https://github.com/lvwerra/trl/pull/342
* add is_trainable in kwargs by Opdoop in https://github.com/lvwerra/trl/pull/363
* Remove obsolete layer_norm_names parameter and add peft>=0.3.0 to requirements by teticio in https://github.com/lvwerra/trl/pull/366
* Delete test_training.py by younesbelkada in https://github.com/lvwerra/trl/pull/371
* [`core`] Fix warning issue by younesbelkada in https://github.com/lvwerra/trl/pull/377
* Update customization.mdx by binganao in https://github.com/lvwerra/trl/pull/390
* fix dataloader typo in ppo_trainer.py by LZY-the-boys in https://github.com/lvwerra/trl/pull/389
* from_pretrain with peft adapter on the hub (379) by glerzing in https://github.com/lvwerra/trl/pull/380
* keep state_dict kwargs instead of popping it in save_pretrained by rizar in https://github.com/lvwerra/trl/pull/393
* Remove unused imports in docs. by vwxyzjn in https://github.com/lvwerra/trl/pull/406
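Several of the fixes above (e.g. the negative-KL warning and the relaxed KL constraint) concern the per-token KL estimate used as the PPO penalty. A minimal sketch of that check (illustrative, not trl's implementation):

```python
import warnings

def mean_approx_kl(logprobs_policy, logprobs_ref):
    """Mean of the per-token KL estimate log p_policy(t) - log p_ref(t)."""
    per_token = [p - r for p, r in zip(logprobs_policy, logprobs_ref)]
    mean_kl = sum(per_token) / len(per_token)
    if mean_kl < 0:
        # A persistently negative estimate usually signals a setup problem,
        # e.g. generation settings that differ between policy and reference.
        warnings.warn("mean KL is negative; check your generation settings")
    return mean_kl
```

In expectation the true KL is non-negative, which is why a negative running estimate is worth warning about.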

New Contributors

* ohashi56225 made their first contribution in https://github.com/lvwerra/trl/pull/234
* GauravVirmani made their first contribution in https://github.com/lvwerra/trl/pull/241
* SauravMaheshkar made their first contribution in https://github.com/lvwerra/trl/pull/198
* heya5 made their first contribution in https://github.com/lvwerra/trl/pull/253
* rmill040 made their first contribution in https://github.com/lvwerra/trl/pull/271
* thuwyh made their first contribution in https://github.com/lvwerra/trl/pull/279
* philipp-classen made their first contribution in https://github.com/lvwerra/trl/pull/284
* Bearnardd made their first contribution in https://github.com/lvwerra/trl/pull/275
* lvzii made their first contribution in https://github.com/lvwerra/trl/pull/298
* soerenarlt made their first contribution in https://github.com/lvwerra/trl/pull/312
* halfrot made their first contribution in https://github.com/lvwerra/trl/pull/309
* oroojlooy made their first contribution in https://github.com/lvwerra/trl/pull/324
* vinhkhuc made their first contribution in https://github.com/lvwerra/trl/pull/339
* teticio made their first contribution in https://github.com/lvwerra/trl/pull/358
* metric-space made their first contribution in https://github.com/lvwerra/trl/pull/326
* Opdoop made their first contribution in https://github.com/lvwerra/trl/pull/342
* binganao made their first contribution in https://github.com/lvwerra/trl/pull/390
* LZY-the-boys made their first contribution in https://github.com/lvwerra/trl/pull/389
* glerzing made their first contribution in https://github.com/lvwerra/trl/pull/380
* rizar made their first contribution in https://github.com/lvwerra/trl/pull/393
* mnoukhov made their first contribution in https://github.com/lvwerra/trl/pull/398
* tomaarsen made their first contribution in https://github.com/lvwerra/trl/pull/404
* vwxyzjn made their first contribution in https://github.com/lvwerra/trl/pull/406

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.1...v0.4.2

0.4.1

Large-model training, Naive Pipeline Parallelism, `peft` Data Parallelism support, and distributed training bug fixes

This release includes a set of features and bug fixes to scale up your RLHF experiments for much larger models leveraging `peft` and `bitsandbytes`.

Naive Pipeline Parallelism support

* Let's support naive Pipeline Parallelism by younesbelkada in https://github.com/lvwerra/trl/pull/210

We introduce a new paradigm in `trl`, termed Naive Pipeline Parallelism, to fit large-scale models on your training setup and apply RLHF to them. This feature uses `peft` to train adapters and `bitsandbytes` to reduce the memory footprint of your active model.

![image](https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/trl-npp.png)
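"Naive" here means the model's layers are simply spread across the available devices and executed in order, with no micro-batching. A toy placement function to illustrate the idea (in practice, layer placement is handled for you when the model is loaded with a device map; the names below are purely illustrative):

```python
def naive_device_map(num_layers: int, devices: list) -> dict:
    """Toy placement: split transformer blocks evenly across devices, in order."""
    per_device = -(-num_layers // len(devices))  # ceiling division
    return {f"layer.{i}": devices[i // per_device] for i in range(num_layers)}

# e.g. an 8-layer model across two GPUs:
placement = naive_device_map(8, ["cuda:0", "cuda:1"])
# layers 0-3 land on cuda:0, layers 4-7 on cuda:1
```

Activations flow from one device to the next during the forward pass, so only one shard of the model needs to fit on each GPU.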

`peft` Data Parallelism support

* [`peft`] Fix DP issues by younesbelkada in https://github.com/lvwerra/trl/pull/221
* [`core`] fix DP issue by younesbelkada in https://github.com/lvwerra/trl/pull/222

There were some bugs in the `peft` integration with respect to Data Parallelism. This release includes the fixes to enable multi-GPU training using `accelerate` + DDP (Distributed Data Parallel).

Memory optimization

Your training runs can now be much more memory efficient thanks to a few tricks and bug fixes.
`PPOConfig` also supports a new flag, `optimize_cuda_cache` (set to `False` by default), to mitigate growing CUDA memory usage.

* Grad accumulation and memory bugfix by edbeeching in https://github.com/lvwerra/trl/pull/220
* adds a missing detach to the ratio by edbeeching in https://github.com/lvwerra/trl/pull/224
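The "missing detach" fix above concerns the probability ratio in the PPO objective: the old log-probabilities must not carry gradients. A scalar sketch of the clipped surrogate objective (illustrative, not trl's implementation):

```python
import math

def clipped_surrogate(logprob_new, logprob_old, advantage, clip_range=0.2):
    """PPO clipped surrogate objective for a single action."""
    # In an autograd framework, logprob_old must be detached from the graph.
    ratio = math.exp(logprob_new - logprob_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_range), 1.0 - clip_range) * advantage
    return min(unclipped, clipped)  # pessimistic (lower) bound on the objective
```

Without the detach, gradients would also flow through the old policy's log-probabilities, silently corrupting the update.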

Pytorch 2.0 fixes

This release also includes minor fixes related to the PyTorch 2.0 release.

* [`test`] attempt to fix CI test for PT 2.0 by younesbelkada in https://github.com/lvwerra/trl/pull/225


What's Changed

* adds sentiment example for a 20b model by edbeeching in https://github.com/lvwerra/trl/pull/208
* Update README.md blog post link by TeamDman in https://github.com/lvwerra/trl/pull/212
* spell mistakes by k-for-code in https://github.com/lvwerra/trl/pull/213
* spell corrections by k-for-code in https://github.com/lvwerra/trl/pull/214
* Small changes when integrating into H4 by natolambert in https://github.com/lvwerra/trl/pull/216


New Contributors
* TeamDman made their first contribution in https://github.com/lvwerra/trl/pull/212
* k-for-code made their first contribution in https://github.com/lvwerra/trl/pull/213

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.0...v0.4.1

0.4.0

Apply RLHF and fine-tune your favorite large model on a consumer GPU using `peft` and `trl`! You can also easily share your trained RLHF adapters on the Hub with a few lines of code.

With this integration you can train `gpt-neo-x` (a 20B-parameter model, 40GB in `bfloat16`) on a 24GB consumer GPU!
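The back-of-the-envelope arithmetic behind that claim (assuming 8-bit weight loading via `bitsandbytes`, with `peft` training only a small adapter on top):

```python
params = 20e9                  # gpt-neo-x parameter count
bf16_gb = params * 2 / 1e9     # 2 bytes per parameter -> 40.0 GB, too big for 24 GB
int8_gb = params * 1 / 1e9     # 8-bit base weights    -> 20.0 GB
# The remaining ~4 GB holds the small trainable adapter, its optimizer
# state, and activations, which is why the frozen 8-bit base fits.
```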

What's Changed

* Allow running evaluate-toxicity with cpu by jordimas in https://github.com/lvwerra/trl/pull/195
* [`core`] Fix quality issue by younesbelkada in https://github.com/lvwerra/trl/pull/197
* Add 1.12.1 torch compatibility in sum method by PanchenkoYehor in https://github.com/lvwerra/trl/pull/190
* `peft` integration by edbeeching in https://github.com/lvwerra/trl/pull/163
* [`core`] Update dependency by younesbelkada in https://github.com/lvwerra/trl/pull/206

New Contributors

* PanchenkoYehor made their first contribution in https://github.com/lvwerra/trl/pull/190

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.3.1...v0.4.0

0.3.1

What's Changed
* Clarifications of acronyms and initialisms by meg-huggingface in https://github.com/lvwerra/trl/pull/185
* Update detoxifying_a_lm.mdx by younesbelkada in https://github.com/lvwerra/trl/pull/186
* Fix reference to example by jordimas in https://github.com/lvwerra/trl/pull/184

New Contributors
* meg-huggingface made their first contribution in https://github.com/lvwerra/trl/pull/185
* jordimas made their first contribution in https://github.com/lvwerra/trl/pull/184

**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.3.0...v0.3.1
