What's Changed
* fix style, typos, license by natolambert in https://github.com/lvwerra/trl/pull/103
* fix re-added file by natolambert in https://github.com/lvwerra/trl/pull/116
* add citation by natolambert in https://github.com/lvwerra/trl/pull/124
* add manual seeding for RL experiments by natolambert in https://github.com/lvwerra/trl/pull/118
* add `set_seed` to `__init__.py` by lvwerra in https://github.com/lvwerra/trl/pull/127
* update docs with Seq2seq models, set_seed, and create_reference_model by lvwerra in https://github.com/lvwerra/trl/pull/128
* [`bug`] Update gpt2-sentiment.py by younesbelkada in https://github.com/lvwerra/trl/pull/132
* Fix Sentiment control notebook by lvwerra in https://github.com/lvwerra/trl/pull/126
* realign values by lvwerra in https://github.com/lvwerra/trl/pull/137
* Change unclear variables & fix typos by natolambert in https://github.com/lvwerra/trl/pull/134
* Feat/reward summarization example by TristanThrush in https://github.com/lvwerra/trl/pull/115
* [`core`] Small refactor of forward pass by younesbelkada in https://github.com/lvwerra/trl/pull/136
* [`tests`] Add correct repo name by younesbelkada in https://github.com/lvwerra/trl/pull/138
* fix forward batching for seq2seq and right padding models by lvwerra in https://github.com/lvwerra/trl/pull/139
* fix bug in batched_forward_pass by ArvinZhuang in https://github.com/lvwerra/trl/pull/144
* [`core`] Add `torch_dtype` support by younesbelkada in https://github.com/lvwerra/trl/pull/147
* [`core`] Fix dataloader issue by younesbelkada in https://github.com/lvwerra/trl/pull/154
* [`core`] enable `bf16` training by younesbelkada in https://github.com/lvwerra/trl/pull/156
* [`core`] fix saving multi-gpu by younesbelkada in https://github.com/lvwerra/trl/pull/157
* Added imports by BirgerMoell in https://github.com/lvwerra/trl/pull/159
* Add CITATION.cff by kashif in https://github.com/lvwerra/trl/pull/169
* [Doc] Add how to use Lion optimizer by younesbelkada in https://github.com/lvwerra/trl/pull/152
* policy kl [old | new] by kashif in https://github.com/lvwerra/trl/pull/168
* add minibatching by lvwerra in https://github.com/lvwerra/trl/pull/153
* fix bugs in tutorial by shizhediao in https://github.com/lvwerra/trl/pull/175
* [`core`] Add `max_grad_norm` support by younesbelkada in https://github.com/lvwerra/trl/pull/177
* Add toxicity example by younesbelkada in https://github.com/lvwerra/trl/pull/162
* [`Docs`] Fix barplot by younesbelkada in https://github.com/lvwerra/trl/pull/181
New Contributors
* natolambert made their first contribution in https://github.com/lvwerra/trl/pull/103
* ArvinZhuang made their first contribution in https://github.com/lvwerra/trl/pull/144
* BirgerMoell made their first contribution in https://github.com/lvwerra/trl/pull/159
* kashif made their first contribution in https://github.com/lvwerra/trl/pull/169
* shizhediao made their first contribution in https://github.com/lvwerra/trl/pull/175
**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.2.1...v0.3.0