Torchrl

Latest version: v0.3.1

0.3.1

This release provides a bunch of bug fixes and speedups.

What's Changed

[BugFix] Fix broken gym tests (1980)
[BugFix,CI] Fix Windows CI (1983)
[Minor] Cleanup
[CI] Install stable torch and tensordict for release tests (1978)
[Refactor] Remove remnant legacy functional calls (1973)
[Minor] Use the main branch for the M1 build wheels (1965)
[BugFix] Fixed import for importlib (1914)
[BugFix] Fix offline CatFrames for pixels (1964)
[BugFix] Fix offline CatFrames (1953)
[BugFix] Fix batch-size expansion in functionalization (1959)
[BugFix] Update iql docstring example (1950)
[BugFix] Update cql docstring example (1951)
[BugFix] Fix examples (1945)
[BugFix] Remove reset on last step of a rollout (1936)
[BugFix] Vmap randomness for value estimator (1942)
[BugFix] Fix multiple context syntax in multiagent examples (1943)
[BugFix] Fix habitat (1941)
[BugFix] Fix env.shape regex matches (1940)
[Minor] Add env.shape attribute (1938)
[BugFix] Fix replay buffer extension with lists (1937)
[BugFix] No grad on collector reset (1927)
[BugFix] fix trunc normal device (1931)
[BugFix, Performance] Fewer imports at root (1930)
[BugFix] Fix OOB TruncatedNormal LP (1924)
[BugFix] Fix KLPENPPOLoss KL computation (1922)
[Doc] Fix onw typo (1917)
[BugFix] Make sure ParallelEnv does not overflow mem when policy requires grad (1909)
[BugFix] Non exclusive terminated and truncated (1911)
[BugFix] Use setdefault in _cache_values (1910)
[BugFix] Fix Ray collector example error (1908)
[BugFix] Make KL-controllers independent of the model (1903)
[Minor] Remove warnings in test_cost (1902)
[BugFix] Adaptable non-blocking for mps and non cuda device in batched-envs (1900)
[BugFix] Fix flaky rb tests (1901)
[BugFix] Fix exploration in losses (1898)
[BugFix] Solve recursion issue in losses hook (1897)
[Doc] Update getting-started-5.py (1894)
[Doc] Getting started tutos (1886)
[BugFix] Use traj_terminated in SliceSampler (1884)
[Doc] Improve PrioritizedSampler doc and get rid of np dependency as much as possible (1881)
[BugFix] Fix _reset data passing in parallel env (1880)
[BugFix] state typo in RNG control module (1878)
[BugFix] Fix a bug in SliceSampler, indexes outside sampler lengths were produced (1874)
[BugFix] check_env_specs seeding logic (1872)
[BugFix] Fix update in serial / parallel env (1866)
[Doc] Installation instructions in API ref (1871)
[BugFix] better device consistency in EGreedy (1867)
[BugFix] Fix load_state_dict and is_empty td bugfix impact (1869)
[Doc] Fix tutos (1863)

**Full Changelog**: https://github.com/pytorch/rl/compare/v0.3.0...v0.3.1

0.3.0

In this release, we focused on building a [**Data Hub for offline RL**](https://pytorch.org/rl/reference/data.html#datasets), providing a universal <lib>2gym conversion tool (1795) and improving the doc.

TorchRL Data Hub

TorchRL now offers many offline datasets in robotics, control and gaming, all under a single data format ([TED for TorchRL Episode Data Format](https://pytorch.org/rl/reference/data.html#torchrl-episode-data-format-ted)). All datasets are one step away from being downloaded: `dataset = <Name>ExperienceReplay(dataset_id, root="/path/to/storage", download=True)` is all you need to get started.
This means that you can now download OpenX (1751) or Roboset (1743) datasets and combine them in a single replay buffer (1768), or swap one for another in no time and with no extra code.
We support many new sampling techniques, such as sampling slices of trajectories with or without repetition.
As always, you can append your favourite transform to these datasets.
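
To make "sampling slices of trajectories" concrete, here is a minimal pure-Python sketch of the idea. It is not TorchRL's sampler: the helper `sample_slices` and its arguments are hypothetical, and a flat list of trajectory ids (stored back to back, one distinct id per trajectory) stands in for the dataset.

```python
import random


def sample_slices(traj_ids, slice_len, num_slices, replacement=True, seed=None):
    """Return lists of contiguous indices, each lying inside one trajectory.

    Hypothetical helper for illustration only; not part of TorchRL.
    """
    rng = random.Random(seed)
    # Keep every start index whose slice does not cross a trajectory boundary.
    starts = [
        i
        for i in range(len(traj_ids) - slice_len + 1)
        if traj_ids[i] == traj_ids[i + slice_len - 1]
    ]
    if replacement:
        chosen = [rng.choice(starts) for _ in range(num_slices)]
    else:
        # Without repetition: each start index is used at most once.
        chosen = rng.sample(starts, num_slices)
    return [list(range(s, s + slice_len)) for s in chosen]


# Three trajectories of lengths 4, 3 and 5 stored back to back.
traj_ids = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
slices = sample_slices(traj_ids, slice_len=3, num_slices=4, replacement=False, seed=0)
```

Each returned slice can then be used to index the stored observations, actions and rewards, which is what makes slice sampling convenient for recurrent or sequence models.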

TorchRL2Gym universal converter

PR 1795 introduces a new universal converter from simulation libraries to gym.
As an RL practitioner, it can be difficult to accommodate the many different environment APIs that exist. TorchRL now provides a way of registering any env in gym(nasium). This allows users to build their datasets in torchrl and integrate them into their code base with little effort if they already use gym as a backend. It also makes it possible to convert DMControl or Brax envs (among others) to gym without an extra library.
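
The general idea of exposing a differently shaped environment API through the gymnasium-style `reset`/`step` signature can be sketched in plain Python. Everything below is hypothetical (`DictEnv`, `GymStyleAdapter` and the dict keys are made up for illustration) and it does not show TorchRL's actual registration mechanism.

```python
class DictEnv:
    """Hypothetical simulator whose step() returns a single dict."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"observation": 0.0}

    def step(self, action):
        self.t += 1
        return {
            "observation": float(self.t),
            "reward": float(action),
            "terminated": False,
            # The episode is cut off once the time limit is reached.
            "truncated": self.t >= self.horizon,
        }


class GymStyleAdapter:
    """Expose a DictEnv through the gymnasium-style reset/step signature."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        out = self.env.reset()
        return out["observation"], {}

    def step(self, action):
        out = self.env.step(action)
        return (
            out["observation"],
            out["reward"],
            out["terminated"],
            out["truncated"],
            {},  # info dict, left empty in this sketch
        )


env = GymStyleAdapter(DictEnv(horizon=2))
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(1)
obs, reward, terminated, truncated, info = env.step(1)
```

Code that already consumes the five-value `step` return can drive the wrapped simulator unchanged, which is the convenience the converter aims for.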

PPO and A2C compatibility with distributed models

Functional calls can now be turned off for PPO and A2C loss modules, allowing users to run RLHF training loops at scale (1804).

TensorDict-free replay buffers

You can now use TorchRL's replay buffer with ANY tensor-based structure, whether it involves dicts, tuples or lists. In principle, storing data **contiguously** on **disk** given any gym environment is as simple as

```python
from torchrl.data import LazyMemmapStorage, ReplayBuffer

# `capacity`, `batch_size`, `env`, `obs` and `action` are assumed to be defined elsewhere
rb = ReplayBuffer(storage=LazyMemmapStorage(capacity))
obs_, reward, terminal, truncated, info = env.step(action)
rb.add((obs, obs_, reward, terminal, truncated, info, action))

# sampling returns a tuple with the same layout as the one that was added
obs, obs_, reward, terminal, truncated, info, action = rb.sample(batch_size)
```

This is independent of TensorDict and it supports many components of our replay buffers as well as transforms. Check the doc [here](https://pytorch.org/rl/reference/data.html#composable-replay-buffers).

Multiprocessed replay buffers

TorchRL's replay buffers can now be shared across processes. Multiprocessed RBs can not only be read from but also extended on different workers (1724).

SOTA checks

We introduce a set of scripts that check our training scripts still run correctly before each release (1822).

Throughput of Gym and DMControl

We now skip many checks in GymLikeEnv when some basic conditions are met, which significantly improves throughput for simple envs (1803).

Algorithms

We introduce discrete CQL (1666), discrete IQL (1793) and IMPALA (1506).

What's Changed: PR description
* [BugFix] Fix incorrect deprecation warning by mikemykhaylov in https://github.com/pytorch/rl/pull/1655
* [Bug] TensorDictMaxValueWriter raises error when no sample in a batch is accepted by albertbou92 in https://github.com/pytorch/rl/pull/1664
* [BugFix] Fix "done" instead of "terminated" mistakes by MarCnu in https://github.com/pytorch/rl/pull/1661
* [Feature] CatFrames constant padding by albertbou92 in https://github.com/pytorch/rl/pull/1663
* doc(README): remove typo by Deep145757 in https://github.com/pytorch/rl/pull/1665
* [Docs] Update README.md by vaibhav-009 in https://github.com/pytorch/rl/pull/1667
* [Minor] Update dreamer example tests by vmoens in https://github.com/pytorch/rl/pull/1668
* [Feature] Introduce grouping in VMAS by matteobettini in https://github.com/pytorch/rl/pull/1658
* [BugFix] assertion error message, envs/util.py by laszloKopits in https://github.com/pytorch/rl/pull/1669
* [Doc] Set `action_spec` instead of `input_spec` by FrankTianTT in https://github.com/pytorch/rl/pull/1657
* [BugFix] Fix submitit IP address/node name retrieval by vmoens in https://github.com/pytorch/rl/pull/1672
* [Doc] Document (and test) compound actor by vmoens in https://github.com/pytorch/rl/pull/1673
* [Doc] Update rollout_recurrent.png to account for terminal by vmoens in https://github.com/pytorch/rl/pull/1677
* [Doc] Add EGreedyWrapper back in the doc by vmoens in https://github.com/pytorch/rl/pull/1680
* [Doc] Fix `TanhDelta` docstring by matteobettini in https://github.com/pytorch/rl/pull/1683
* [Doc] Add discord badge on README by vmoens in https://github.com/pytorch/rl/pull/1686
* [CI] Downgrade RAY to fix CI by vmoens in https://github.com/pytorch/rl/pull/1687
* [BugFix] MaxValueWriter cuda compatibility by albertbou92 in https://github.com/pytorch/rl/pull/1689
* Upload docs for preview on HUD by DanilBaibak in https://github.com/pytorch/rl/pull/1682
* [Doc] Update pendulum and rnn tutos by vmoens in https://github.com/pytorch/rl/pull/1691
* [Algorithm] Discrete CQL by BY571 in https://github.com/pytorch/rl/pull/1666
* [BugFix] Minor fix in the logging of PPO and A2C examples by albertbou92 in https://github.com/pytorch/rl/pull/1693
* [CI] Enable retry mechanism by DanilBaibak in https://github.com/pytorch/rl/pull/1681
* [Refactor] Minor changes in prep of https://github.com/pytorch/tensordict/pull/541 by vmoens in https://github.com/pytorch/rl/pull/1696
* [BugFix] fix dreamer actor by FrankTianTT in https://github.com/pytorch/rl/pull/1697
* [Refactor] Deprecate direct usage of memmap tensors by vmoens in https://github.com/pytorch/rl/pull/1684
* Revert "[Refactor] Deprecate direct usage of memmap tensors" by vmoens in https://github.com/pytorch/rl/pull/1698
* [Refactor] Deprecate direct usage of memmap tensors by vmoens in https://github.com/pytorch/rl/pull/1699
* [Doc] Fix discord link by vmoens in https://github.com/pytorch/rl/pull/1701
* [BugFix] make sure the params of exploration-wrapper is float by FrankTianTT in https://github.com/pytorch/rl/pull/1700
* [Fix] EndOfLifeTransform fix in end of life detection by albertbou92 in https://github.com/pytorch/rl/pull/1705
* [CI] Fix benchmark on gpu by vmoens in https://github.com/pytorch/rl/pull/1706
* [Algorithm] IMPALA and VTrace module by albertbou92 in https://github.com/pytorch/rl/pull/1506
* [Doc] Fix discord link by vmoens in https://github.com/pytorch/rl/pull/1712
* [Refactor] Refactor functional calls in losses by vmoens in https://github.com/pytorch/rl/pull/1707
* [CI] Fix CI by vmoens in https://github.com/pytorch/rl/pull/1711
* [BugFix] Make casting to 'meta' device uniform across cost modules by vmoens in https://github.com/pytorch/rl/pull/1715
* [BugFix] Change ppo mujoco example to match paper results by albertbou92 in https://github.com/pytorch/rl/pull/1714
* [Minor] Hide params in ddpg actor-critic by vmoens in https://github.com/pytorch/rl/pull/1716
* [BugFix] Fix hold_out_net by vmoens in https://github.com/pytorch/rl/pull/1719
* [BugFix] `RewardSum` key check by matteobettini in https://github.com/pytorch/rl/pull/1718
* [Feature] Allow usage of a different device on main and sub-envs in ParallelEnv and SerialEnv by vmoens in https://github.com/pytorch/rl/pull/1626
* [Refactor] Better weight update in collectors by vmoens in https://github.com/pytorch/rl/pull/1723
* [Feature] Shared replay buffers by vmoens in https://github.com/pytorch/rl/pull/1724
* [CI] FIx nightly builds on osx by vmoens in https://github.com/pytorch/rl/pull/1726
* [BugFix] _call_actor_net does not handle multiple inputs by albertbou92 in https://github.com/pytorch/rl/pull/1728
* [Feature] Python-based RNN Modules by albertbou92 in https://github.com/pytorch/rl/pull/1720
* [BugFix, Test] Fix flaky gym vecenvs tests by vmoens in https://github.com/pytorch/rl/pull/1727
* [BugFix] Fix non-full TensorStorage indexing by vmoens in https://github.com/pytorch/rl/pull/1730
* [Feature] Minari datasets by vmoens in https://github.com/pytorch/rl/pull/1721
* [Feature] All VMAS scenarios available by matteobettini in https://github.com/pytorch/rl/pull/1731
* [Feature] pickle-free RB checkpointing by vmoens in https://github.com/pytorch/rl/pull/1733
* [CI] Fix doc upload by vmoens in https://github.com/pytorch/rl/pull/1738
* [BugFix] Fix RNNs trajectory split in VMAP calls by vmoens in https://github.com/pytorch/rl/pull/1736
* [CI] Fix doc upload by vmoens in https://github.com/pytorch/rl/pull/1739
* [BugFix, Feature] Fix DDQN implementation by vmoens in https://github.com/pytorch/rl/pull/1737
* [Algorithm] Update DQN example by albertbou92 in https://github.com/pytorch/rl/pull/1512
* [BugFix] Use rsync in doc workflow by vmoens in https://github.com/pytorch/rl/pull/1741
* [BugFix] Fix compat with new memmap API by vmoens in https://github.com/pytorch/rl/pull/1744
* [Feature] Roboset datasets by vmoens in https://github.com/pytorch/rl/pull/1743
* [Algorithm] Simpler IQL example by BY571 in https://github.com/pytorch/rl/pull/998
* [Performance] Faster RNNs by vmoens in https://github.com/pytorch/rl/pull/1732
* [BugFix, Test] Fix torch.vmap call in RNN tests by vmoens in https://github.com/pytorch/rl/pull/1749
* [BugFix] Fix discrete SAC log-prob by vmoens in https://github.com/pytorch/rl/pull/1750
* [Minor] Remove dead code in RolloutFromModel by ianbarber in https://github.com/pytorch/rl/pull/1752
* [Minor] Fix runnability of RLHF example in examples/rlhf by ianbarber in https://github.com/pytorch/rl/pull/1753
* [Feature] SliceSampler by vmoens in https://github.com/pytorch/rl/pull/1748
* [CI] Fix windows CI by vmoens in https://github.com/pytorch/rl/pull/1746
* [CI] Fix CI for optional dependencies by vmoens in https://github.com/pytorch/rl/pull/1754
* [Feature] V-D4RL by vmoens in https://github.com/pytorch/rl/pull/1756
* [Benchmark] Fix RB benchmarks by vmoens in https://github.com/pytorch/rl/pull/1760
* [BugFix] Fix RLHF by vmoens in https://github.com/pytorch/rl/pull/1757
* [BugFix] Fix slice sampler by vmoens in https://github.com/pytorch/rl/pull/1762
* [Feature] BurnInTransform by albertbou92 in https://github.com/pytorch/rl/pull/1765
* [Bug] Minor change burnin transform by albertbou92 in https://github.com/pytorch/rl/pull/1770
* [BugFix] Fix sampling of last item in SliceSampler by vmoens in https://github.com/pytorch/rl/pull/1774
* [Feature] Open-X Embodiement datasets by vmoens in https://github.com/pytorch/rl/pull/1751
* [BugFix] Fix documentation of threads for batched envs. by skandermoalla in https://github.com/pytorch/rl/pull/1776
* [BugFix, CI] Fix OpenML datasets runs by vmoens in https://github.com/pytorch/rl/pull/1779
* [Versioning] Bump v0.3.0 and fix m1-wheels by vmoens in https://github.com/pytorch/rl/pull/1780
* [Feature] Composite replay buffers by vmoens in https://github.com/pytorch/rl/pull/1768
* [BugFix, Feature] Vmap randomness in losses by BY571 in https://github.com/pytorch/rl/pull/1740
* [Algorithm] Update discrete SAC example by BY571 in https://github.com/pytorch/rl/pull/1745
* [Docs] Pointers to BenchMARL by matteobettini in https://github.com/pytorch/rl/pull/1710
* [Feature] Immutable writer for datasets by vmoens in https://github.com/pytorch/rl/pull/1781
* [Feature] Remove and check for prints in codebase using flake8-print by vmoens in https://github.com/pytorch/rl/pull/1758
* [BUG] Missing import for some Samplers in Data module by albertbou92 in https://github.com/pytorch/rl/pull/1784
* [BugFix] Ensure that infos and samples have the same batch-size in SamplerEnsemble by vmoens in https://github.com/pytorch/rl/pull/1786
* [BugFix] Writers extend() method should always return indices in data.device by albertbou92 in https://github.com/pytorch/rl/pull/1785
* [Doc] Revamp envs doc by vmoens in https://github.com/pytorch/rl/pull/1787
* [BugFix] Less flaky gym vecenv test by vmoens in https://github.com/pytorch/rl/pull/1790
* [CI] Regroup tests by vmoens in https://github.com/pytorch/rl/pull/1791
* [CI] Remove stable GPU tests from CI by vmoens in https://github.com/pytorch/rl/pull/1792
* Update README.md to fix CI banner by vmoens in https://github.com/pytorch/rl/pull/1794
* [Feature] `SamplerWithoutReplacement` state dictionary by matteobettini in https://github.com/pytorch/rl/pull/1788
* [BugFix] Higher time threshold for PEnv by vmoens in https://github.com/pytorch/rl/pull/1799
* [Feature] SignTransform by albertbou92 in https://github.com/pytorch/rl/pull/1798
* [Feature] Extend MaxValueWriter with reduce parameter for the rank_key by albertbou92 in https://github.com/pytorch/rl/pull/1796
* [BugFix] Fixes bug in MaxValueWriter tests by albertbou92 in https://github.com/pytorch/rl/pull/1801
* [Performance] faster gym-like class by vmoens in https://github.com/pytorch/rl/pull/1803
* [Feature] GenDGRL by vmoens in https://github.com/pytorch/rl/pull/1773
* [Performance] Minor improvements to step_and_maybe_reset in batched envs by vmoens in https://github.com/pytorch/rl/pull/1807
* [Algorithm] Discrete IQL by BY571 in https://github.com/pytorch/rl/pull/1793
* [Doc] More depth in VMAS docs by matteobettini in https://github.com/pytorch/rl/pull/1802
* [BugFix] Remove select() in favor of empty() by vmoens in https://github.com/pytorch/rl/pull/1811
* Bump jinja2 from 3.1.2 to 3.1.3 in /docs by dependabot in https://github.com/pytorch/rl/pull/1812
* [BugFix] Make `TransformedEnv` mirror `allow_done_after_reset` property of base env by matteobettini in https://github.com/pytorch/rl/pull/1810
* [Doc] Update StepCounter doc by skandermoalla in https://github.com/pytorch/rl/pull/1813
* [Feature] Improve info_dict reader by vmoens in https://github.com/pytorch/rl/pull/1809
* [CI, Minor] Regroup Gen-DGRL CI with other libs by vmoens in https://github.com/pytorch/rl/pull/1814
* [Versioning] Housekeeping in setup.py by vmoens in https://github.com/pytorch/rl/pull/1816
* [Feature] TorchRL2Gym conversion by vmoens in https://github.com/pytorch/rl/pull/1795
* [BugFix, CI] Fix snapshop imports in stable CI by vmoens in https://github.com/pytorch/rl/pull/1821
* [Feature] More flexibility in loading PettingZoo by matteobettini in https://github.com/pytorch/rl/pull/1817
* [Docs] Fix doc of ToTensorImage transforms.py by skandermoalla in https://github.com/pytorch/rl/pull/1824
* [BugFix] Fix device of container generated values in transforms by vmoens in https://github.com/pytorch/rl/pull/1827
* [Feature] Atari DQN dataset by vmoens in https://github.com/pytorch/rl/pull/1815
* [Feature] Non-functional objectives (PPO, A2C, Reinforce) by vmoens in https://github.com/pytorch/rl/pull/1804
* [Refactor] change default CKPT_BACKEND to torch by vmoens in https://github.com/pytorch/rl/pull/1830
* pyproject.toml: remove unknown properties by GaetanLepage in https://github.com/pytorch/rl/pull/1828
* [Doc, Feature] Doc improvements for video recording and CSV video formats by vmoens in https://github.com/pytorch/rl/pull/1829
* [Feature] PyTrees in replay buffers by vmoens in https://github.com/pytorch/rl/pull/1831
* [BugFix] Fix sequential step counts by vmoens in https://github.com/pytorch/rl/pull/1838
* [Doc] TED format by vmoens in https://github.com/pytorch/rl/pull/1836
* [Doc] References to TED by vmoens in https://github.com/pytorch/rl/pull/1839
* [BugFix] Temporarily set lazy legacy to True by vmoens in https://github.com/pytorch/rl/pull/1840
* [BugFix] Fix gym info scalar infos by vmoens in https://github.com/pytorch/rl/pull/1842
* [Refactor] LAZY_LEGACY_OP=False by vmoens in https://github.com/pytorch/rl/pull/1832
* [Feature] `serial_for_single` arg in batched envs by vmoens in https://github.com/pytorch/rl/pull/1846
* [BugFix] Fix VD4RL by vmoens in https://github.com/pytorch/rl/pull/1834
* [Doc] Make tutos runnable without colab by vmoens in https://github.com/pytorch/rl/pull/1826
* [Feature] Fine control over devices in collectors by vmoens in https://github.com/pytorch/rl/pull/1835
* [Feature, BugFix] Better thread control in penv and collectors by vmoens in https://github.com/pytorch/rl/pull/1848
* [CI] Update macos image by vmoens in https://github.com/pytorch/rl/pull/1849
* [BugFix] thread setting bug by vmoens in https://github.com/pytorch/rl/pull/1852
* Remove unused completed_keys property from StepCounter. by skandermoalla in https://github.com/pytorch/rl/pull/1854
* [Feature] Submitit run script by albertbou92 in https://github.com/pytorch/rl/pull/1822
* [BugFix] Fix flaky gym penv test by vmoens in https://github.com/pytorch/rl/pull/1853
* [CI] Fix macos build by vmoens in https://github.com/pytorch/rl/pull/1856

New Contributors
* mikemykhaylov made their first contribution in https://github.com/pytorch/rl/pull/1655
* MarCnu made their first contribution in https://github.com/pytorch/rl/pull/1661
* Deep145757 made their first contribution in https://github.com/pytorch/rl/pull/1665
* vaibhav-009 made their first contribution in https://github.com/pytorch/rl/pull/1667
* laszloKopits made their first contribution in https://github.com/pytorch/rl/pull/1669
* ianbarber made their first contribution in https://github.com/pytorch/rl/pull/1752
* dependabot made their first contribution in https://github.com/pytorch/rl/pull/1812
* GaetanLepage made their first contribution in https://github.com/pytorch/rl/pull/1828

**Full Changelog**: https://github.com/pytorch/rl/compare/v0.2.1...v0.3.0

0.2.1

New Contributors
* duburcqa made their first contribution in https://github.com/pytorch/rl/pull/1615

**Full Changelog**: https://github.com/pytorch/rl/compare/v0.2.0...v0.2.1

0.2.0

This release provides many new features and bug fixes.

TorchRL now publishes Apple Silicon compatible wheels.
We drop support for Python 3.7 and now cover Python 3.11.

New and updated algorithms

Most algorithms have been cleaned up and designed to reach (at least) SOTA results.

![image](https://github.com/pytorch/rl/assets/25529882/c6a97c8a-5efa-4508-ac34-79b860bac95b)

Compatibility with MARL settings has been drastically improved, and we provide a good amount of MARL examples within the library:

![image](https://github.com/pytorch/rl/assets/25529882/b7799087-cd0d-4476-8550-cc9514ca7271)

A prototype RLHF training script is also proposed (1597).

A whole new category of offline RL algorithms has been integrated: Decision transformers.

* [Algorithm] Update offpolicy examples by BY571 in https://github.com/pytorch/rl/pull/1206
* [Algorithm] Online Decision transformer by BY571 in https://github.com/pytorch/rl/pull/1149
* [Algorithm] QMixer loss and multiagent models by matteobettini in https://github.com/pytorch/rl/pull/1378
* [Algorithm] RLHF end-to-end, clean by vmoens in https://github.com/pytorch/rl/pull/1597
* [Algorithm] Update A2C examples by albertbou92 in https://github.com/pytorch/rl/pull/1521
* [Algorithm] Update DDPG Example by BY571 in https://github.com/pytorch/rl/pull/1525
* [Algorithm] Update DT by BY571 in https://github.com/pytorch/rl/pull/1560
* [Algorithm] Update PPO examples by albertbou92 in https://github.com/pytorch/rl/pull/1495
* [Algorithm] Update SAC Example by BY571 in https://github.com/pytorch/rl/pull/1524
* [Algorithm] Update TD3 Example by BY571 in https://github.com/pytorch/rl/pull/1523

New features

One of the major new features of the library is the introduction of the terminated / truncated / done distinction at __no cost__ within the library. All third-party and primary environments are now compatible with it, as are losses and data collection primitives (collectors, etc.). This feature is also compatible with complex data structures, such as those found in MARL training pipelines.
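
A minimal sketch of why the distinction matters, assuming the usual gymnasium-style convention that an episode is done when it is either terminated or truncated, and that value bootstrapping stops only on termination. The helpers `td_target` and `is_done` are hypothetical and are not taken from TorchRL's loss modules.

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target: bootstrap unless the episode truly terminated."""
    return reward + gamma * next_value * (0.0 if terminated else 1.0)


def is_done(terminated, truncated):
    """The episode is over (and the env must be reset) in either case."""
    return terminated or truncated


# Time-limit cut-off: the episode ends, but the next state still has value.
truncated_target = td_target(reward=1.0, next_value=10.0, terminated=False, gamma=0.5)
# True terminal state: there is nothing to bootstrap from.
terminal_target = td_target(reward=1.0, next_value=10.0, terminated=True, gamma=0.5)
```

Treating a truncated step as terminal would drop the `gamma * next_value` term and bias the value estimate downwards, which is the error a single "done" flag cannot avoid.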

All losses are now compatible with tensordict-free inputs, for a more generic deployment.

New transforms

Atari games can now benefit from an EndOfLifeTransform, which allows the end-of-life signal to be used as a done state in the loss (1605).
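
The end-of-life idea can be sketched without any Atari dependency, assuming the emulator exposes a per-step life counter. The helper `end_of_life_flags` is hypothetical and only illustrates the detection step, not the transform's implementation.

```python
def end_of_life_flags(lives):
    """Flag each step where the life counter dropped relative to the previous step."""
    if not lives:
        return []
    flags = [False]  # the first step has no predecessor to compare with
    for prev, cur in zip(lives, lives[1:]):
        flags.append(cur < prev)
    return flags


# Life counter observed over six consecutive steps.
lives = [3, 3, 2, 2, 2, 1]
eol = end_of_life_flags(lives)
```

A loss can then treat the flagged steps as episode boundaries even though the game itself keeps running.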

We provide a KL transform to add a KL factor to the reward in RLHF settings.
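
As an illustration, one common RLHF formulation subtracts a scaled, sample-based KL estimate (the log-probability under the trained policy minus the log-probability under a frozen reference policy) from the reward. The sketch below assumes that formulation; the function name, the `coef` parameter and the sign convention are assumptions and may differ from the transform shipped in the library.

```python
def kl_penalized_reward(reward, log_prob, ref_log_prob, coef=0.1):
    """Subtract a sample-based KL estimate from the reward."""
    # Positive when the policy assigns more probability than the reference.
    kl_estimate = log_prob - ref_log_prob
    return reward - coef * kl_estimate


# The policy drifted from the reference on this sample, so the reward shrinks.
shaped = kl_penalized_reward(reward=1.0, log_prob=-1.0, ref_log_prob=-2.0, coef=0.5)
```

The penalty discourages the fine-tuned policy from moving too far from the reference model while it optimizes the learned reward.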

Action masking is made possible through the ActionMask transform (1421).
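
What action masking amounts to can be sketched in a few lines of plain Python, assuming a boolean mask where `True` marks a legal action. The helpers below are hypothetical and independent of the library's transform.

```python
import random


def allowed_actions(mask):
    """Indices of the actions the mask allows."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("mask forbids every action")
    return allowed


def masked_argmax(q_values, mask):
    """Greedy choice restricted to legal actions."""
    return max(allowed_actions(mask), key=lambda i: q_values[i])


def masked_random_action(mask, rng=random):
    """Exploratory choice restricted to legal actions."""
    return rng.choice(allowed_actions(mask))


q_values = [0.2, 0.9, 0.5]
mask = [True, False, True]  # action 1 is illegal in this state
best = masked_argmax(q_values, mask)
explore = masked_random_action(mask, random.Random(0))
```

Both the greedy and the exploratory branch have to respect the mask; masking only one of them still lets illegal actions through.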

VC1 is also integrated for better image embedding.

* [Feature] Allow sequential transforms to work offline by vmoens in https://github.com/pytorch/rl/pull/1136
* [Feature] ClipTransform + rename `min/maximum` -> `low/high` by vmoens in https://github.com/pytorch/rl/pull/1500
* [Feature] End-of-life transform by vmoens in https://github.com/pytorch/rl/pull/1605
* [Feature] KL Transform for RLHF by vmoens in https://github.com/pytorch/rl/pull/1196
* [Features] Conv3dNet and PermuteTransform by xmaples in https://github.com/pytorch/rl/pull/1398
* [Feature, Refactor] Scale in ToTensorImage based on the dtype and new from_int parameter by hyerra in https://github.com/pytorch/rl/pull/1208
* [Feature] CatFrames used as inverse by BY571 in https://github.com/pytorch/rl/pull/1321
* [Feature] Masking actions by vmoens in https://github.com/pytorch/rl/pull/1421
* [Feature] VC1 integration by vmoens in https://github.com/pytorch/rl/pull/1211

New models

We provide GRU alongside LSTM for POMDP training.

MARL model coverage now includes MultiAgentMLP and MultiAgentCNN! Other improvements for MARL include support for nested keys in most places of the library (losses, data collection, environments...).

* [Feature] Support for GRU by vmoens in https://github.com/pytorch/rl/pull/1586
* [Feature] TanhModule by vmoens in https://github.com/pytorch/rl/pull/1213
* [Features] Conv3dNet and PermuteTransform by xmaples in https://github.com/pytorch/rl/pull/1398
* [Feature] CNN version of MultiAgentMLP by MarkHaoxiang in https://github.com/pytorch/rl/pull/1479

Other features (misc)

* [Feature] RLHF Rollouts (reopened) by vmoens in https://github.com/pytorch/rl/pull/1329
* [Feature] Add CQL by BY571 in https://github.com/pytorch/rl/pull/1239
* [Feature] Allow multiple (nested) action, reward, done keys in `env`,`vec_env` and `collectors` by matteobettini in https://github.com/pytorch/rl/pull/1462
* [Feature] Auto-DoubleToFloat by vmoens in https://github.com/pytorch/rl/pull/1442
* [Feature] CompositeSpec.lock by vmoens in https://github.com/pytorch/rl/pull/1143
* [Feature] Device transform by vmoens in https://github.com/pytorch/rl/pull/1472
* [Feature] Dispatch DiscreteSAC loss module by Blonck in https://github.com/pytorch/rl/pull/1248
* [Feature] Dispatch PPO loss module by Blonck in https://github.com/pytorch/rl/pull/1249
* [Feature] Dispatch REDQ loss module by Blonck in https://github.com/pytorch/rl/pull/1251
* [Feature] Dispatch SAC loss module by Blonck in https://github.com/pytorch/rl/pull/1244
* [Feature] Dispatch TD3 loss module by Blonck in https://github.com/pytorch/rl/pull/1254
* [Feature] Dispatch for DDPG loss module by Blonck in https://github.com/pytorch/rl/pull/1215
* [Feature] Dispatch for SAC loss module by Blonck in https://github.com/pytorch/rl/pull/1223
* [Feature] Dispatch reinforce loss module by Blonck in https://github.com/pytorch/rl/pull/1252
* [Feature] Distpatch IQL loss module by Blonck in https://github.com/pytorch/rl/pull/1230
* [Feature] Fix DType casting lazy init by vmoens in https://github.com/pytorch/rl/pull/1589
* [Feature] Heterogeneous Environments compatibility by matteobettini in https://github.com/pytorch/rl/pull/1411
* [Feature] Log hparams from python dict by matteobettini in https://github.com/pytorch/rl/pull/1517
* [Feature] MARL exploration e-greedy compatibility by matteobettini in https://github.com/pytorch/rl/pull/1277
* [Feature] Make advantages compatible with Terminated, Truncated, Done by vmoens in https://github.com/pytorch/rl/pull/1581
* [Feature] Make losses inherit from TDMBase by vmoens in https://github.com/pytorch/rl/pull/1246
* [Feature] Making action masks compatible with q value modules and e-greedy by matteobettini in https://github.com/pytorch/rl/pull/1499
* [Feature] Nested keys in `OrnsteinUhlenbeckProcess` by matteobettini in https://github.com/pytorch/rl/pull/1305
* [Feature] Optional mapping of "state" in gym specs by matteobettini in https://github.com/pytorch/rl/pull/1431
* [Feature] Parallel environments lazy heterogenous data compatibility by matteobettini in https://github.com/pytorch/rl/pull/1436
* [Feature] Pettingzoo: add multiagent dimension to single agent groups by matteobettini in https://github.com/pytorch/rl/pull/1550
* [Feature] RLHF Reward Model (reopened) by vmoens in https://github.com/pytorch/rl/pull/1328
* [Feature] RLHF dataloading by vmoens in https://github.com/pytorch/rl/pull/1309
* [Feature] RLHF networks by apbard in https://github.com/pytorch/rl/pull/1319
* [Feature] Refactor categorical dists: Masked one-hot and pass-through gradients by vmoens in https://github.com/pytorch/rl/pull/1488
* [Feature] ReplayBuffer.empty by vmoens in https://github.com/pytorch/rl/pull/1238
* [Feature] Separate losses by MateuszGuzek in https://github.com/pytorch/rl/pull/1240
* [Feature] Single call to value network in advantages [bis] by vmoens in https://github.com/pytorch/rl/pull/1263
* [Feature] Single call to value network in advantages by vmoens in https://github.com/pytorch/rl/pull/1256
* [Feature] TensorStorage by vmoens in https://github.com/pytorch/rl/pull/1310
* [Feature] Threaded collection and parallel envs by vmoens in https://github.com/pytorch/rl/pull/1559
* [Feature] Unbind specs by vmoens in https://github.com/pytorch/rl/pull/1555
* [Feature] VMAS obs dict by matteobettini in https://github.com/pytorch/rl/pull/1419
* [Feature] VMAS: choose between categorical or one-hot actions by matteobettini in https://github.com/pytorch/rl/pull/1484
* [Feature] dispatch for DQNLoss by vmoens in https://github.com/pytorch/rl/pull/1194
* [Feature] log histograms by vmoens in https://github.com/pytorch/rl/pull/1306
* [Feature] make csv logger `exist_ok` on logging folder by matteobettini in https://github.com/pytorch/rl/pull/1561
* [Feature] shifted for all adv by vmoens in https://github.com/pytorch/rl/pull/1276

New environments and third-party improvements

We now cover SMAC-v2, PettingZoo, IsaacGymEnvs (prototype) and RoboHive. The D4RL dataset can now be used without the eponymous library, which permits training with more recent or older versions of gym.

* [Environment, Docs] SMACv2 and docs on action masking by matteobettini in https://github.com/pytorch/rl/pull/1466
* [Environment] Petting zoo by matteobettini in https://github.com/pytorch/rl/pull/1471
* [Feature] D4rl direct download by MateuszGuzek in https://github.com/pytorch/rl/pull/1430
* [Feature] Gym 'vectorized' envs compatibility by vmoens in https://github.com/pytorch/rl/pull/1519
* [Feature] Gym compatibility: Terminal and truncated by vmoens in https://github.com/pytorch/rl/pull/1539
* [Feature] IsaacGymEnvs integration by vmoens in https://github.com/pytorch/rl/pull/1443
* [Feature] RoboHive integration by vmoens in https://github.com/pytorch/rl/pull/1119

Performance improvements

We provide several speed improvements, in particular for data collection.

![image](https://github.com/pytorch/rl/assets/25529882/b2894440-2ba2-4935-a3d8-05279577b5db)

* [Performance] Accelerate GAE by Blonck in https://github.com/pytorch/rl/pull/1142
* [Performance] Accelerate TD lambda return estimate by Blonck in https://github.com/pytorch/rl/pull/1158
* [Performance] Accelerate `_split_and_pad_sequence` by Blonck in https://github.com/pytorch/rl/pull/1147
* [Performance] Faster GAE by vmoens in https://github.com/pytorch/rl/pull/1153
* [Performance] Faster losses by vmoens in https://github.com/pytorch/rl/pull/1272
* [Performance] Improve performance and streamline the generating of the gammalambda tensor by Blonck in https://github.com/pytorch/rl/pull/1171
* [Performance] Miscellaneous efficiency improvements by vmoens in https://github.com/pytorch/rl/pull/1513
* [Performance] Reduce key accessing in transforms by matteobettini in https://github.com/pytorch/rl/pull/1590
* [Performance] Some efficiency improvements by vmoens in https://github.com/pytorch/rl/pull/1250
* [Performance] Vmas vectorized reset by matteobettini in https://github.com/pytorch/rl/pull/1146

Bug fixes

* [BugFIx] Fix entropy signature in truncated normal by vmoens in https://github.com/pytorch/rl/pull/1536
* [BugFix,CI] Fix virtualenv not found by vmoens in https://github.com/pytorch/rl/pull/1280
* [BugFix] Add `torch.no_grad()` for rendering in multiagent PPO tutorial by matteobettini in https://github.com/pytorch/rl/pull/1511
* [BugFix] Batched envs compatibility with custom keys by matteobettini in https://github.com/pytorch/rl/pull/1348
* [BugFix] C++17 by vmoens in https://github.com/pytorch/rl/pull/1169
* [BugFix] Check env specs for nested envs by matteobettini in https://github.com/pytorch/rl/pull/1332
* [BugFix] CompositeSpec.unsqueeze by btx0424 in https://github.com/pytorch/rl/pull/1464
* [BugFix] DDPG select also critic input for actor loss by matteobettini in https://github.com/pytorch/rl/pull/1563
* [BugFix] DQN loss dispatch respect configured tensordict keys by Blonck in https://github.com/pytorch/rl/pull/1285
* [BugFix] Discrete SAC rewrite by matteobettini in https://github.com/pytorch/rl/pull/1461
* [BugFix] Empty-spec tolerance by vmoens in https://github.com/pytorch/rl/pull/1501
* [BugFix] Fix Brax reset by vmoens in https://github.com/pytorch/rl/pull/1195
* [BugFix] Fix CatFrames by vmoens in https://github.com/pytorch/rl/pull/1336
* [BugFix] Fix ClipTransform device by vmoens in https://github.com/pytorch/rl/pull/1508
* [BugFix] Fix Cython for D4RL by vmoens in https://github.com/pytorch/rl/pull/1429
* [BugFix] Fix DDPG by vmoens in https://github.com/pytorch/rl/pull/1183
* [BugFix] Fix DDPG squeezing by matteobettini in https://github.com/pytorch/rl/pull/1487
* [BugFix] Fix Dreamer test error by vmoens in https://github.com/pytorch/rl/pull/1558
* [BugFix] Fix Gym Categorical/One-hot issues by vmoens in https://github.com/pytorch/rl/pull/1482
* [BugFix] Fix KL import errors by vmoens in https://github.com/pytorch/rl/pull/1207
* [BugFix] Fix KLTransform execution with LSTM by vmoens in https://github.com/pytorch/rl/pull/1426
* [BugFix] Fix KeyError in inverse transform replay buffer by BY571 in https://github.com/pytorch/rl/pull/1165
* [BugFix] Fix LSTM - VecEnv compatibility by vmoens in https://github.com/pytorch/rl/pull/1427
* [BugFix] Fix LSTM use with padded/masked segments by smorad in https://github.com/pytorch/rl/pull/1399
* [BugFix] Fix NoopResetEnv behavior when trials exceeded. by skandermoalla in https://github.com/pytorch/rl/pull/1477
* [BugFix] Fix QValueModule multi_one_hot by smorad in https://github.com/pytorch/rl/pull/1439
* [BugFix] Fix RLHF tests - transformers v4.34 by vmoens in https://github.com/pytorch/rl/pull/1601
* [BugFix] Fix RewardSum spec transform to mimic reward spec by matteobettini in https://github.com/pytorch/rl/pull/1478
* [BugFix] Fix SAC alpha optim by vmoens in https://github.com/pytorch/rl/pull/1192
* [BugFix] Fix SAC by vmoens in https://github.com/pytorch/rl/pull/1189
* [BugFix] Fix SAC by vmoens in https://github.com/pytorch/rl/pull/1190
* [BugFix] Fix SACv2 by vmoens in https://github.com/pytorch/rl/pull/1191
* [BugFix] Fix SMAC-v2 by vmoens in https://github.com/pytorch/rl/pull/1538
* [BugFix] Fix TD3 and compat with https://github.com/pytorch-labs/tensordict/pull/482 by vmoens in https://github.com/pytorch/rl/pull/1375
* [BugFix] Fix TD3 inplace updates by vmoens in https://github.com/pytorch/rl/pull/1219
* [BugFix] Fix TD3 target net by vmoens in https://github.com/pytorch/rl/pull/1186
* [BugFix] Fix `LazyStackedCompositeSpec` and introducing `consolidate_spec` by matteobettini in https://github.com/pytorch/rl/pull/1392
* [BugFix] Fix `step_mdp()` by matteobettini in https://github.com/pytorch/rl/pull/1334
* [BugFix] Fix action mask test by vmoens in https://github.com/pytorch/rl/pull/1492
* [BugFix] Fix brax by vmoens in https://github.com/pytorch/rl/pull/1346
* [BugFix] Fix bug in ppo example config by degensean in https://github.com/pytorch/rl/pull/1396
* [BugFix] Fix envpool by vmoens in https://github.com/pytorch/rl/pull/1530
* [BugFix] Fix error message of .set_keys() in advantage modules by Blonck in https://github.com/pytorch/rl/pull/1218
* [BugFix] Fix examples by vmoens in https://github.com/pytorch/rl/pull/1173
* [BugFix] Fix locked params modif by vmoens in https://github.com/pytorch/rl/pull/1307
* [BugFix] Fix max length by vmoens in https://github.com/pytorch/rl/pull/1233
* [BugFix] Fix missing ("next", "observation") key in dispatch of losses by Blonck in https://github.com/pytorch/rl/pull/1235
* [BugFix] Fix nested CompositeSpec creation by vmoens in https://github.com/pytorch/rl/pull/1261
* [BugFix] Fix nightly tensordict dependency by skandermoalla in https://github.com/pytorch/rl/pull/1302
* [BugFix] Fix ppo example by vmoens in https://github.com/pytorch/rl/pull/1225
* [BugFix] Fix ppo training NaN occurrences by vmoens in https://github.com/pytorch/rl/pull/1403
* [BugFix] Fix reward sum within parallel envs by vmoens in https://github.com/pytorch/rl/pull/1454
* [BugFix] Fix run_type_checks by vmoens in https://github.com/pytorch/rl/pull/1570
* [BugFix] Fix safe tanh for older torch versions by vmoens in https://github.com/pytorch/rl/pull/1220
* [BugFix] Fix serialization of parallel envs by vmoens in https://github.com/pytorch/rl/pull/1197
* [BugFix] Fix split_trajs by vmoens in https://github.com/pytorch/rl/pull/1444
* [BugFix] Fix tanh/atanh vmap compatibility by vmoens in https://github.com/pytorch/rl/pull/1217
* [BugFix] Fix the bug of `RoundRobinWriter.extend(data)` by xmaples in https://github.com/pytorch/rl/pull/1295
* [BugFix] Fix tutorials by vmoens in https://github.com/pytorch/rl/pull/1382
* [BugFix] Fix typo in CatFrames Transform error message. by skandermoalla in https://github.com/pytorch/rl/pull/1491
* [BugFix] Fix vmap in VmapModule (torch 1.13 compat) by vmoens in https://github.com/pytorch/rl/pull/1350
* [BugFix] Improve collector buffer initialisation when policy spec is unavailable by matteobettini in https://github.com/pytorch/rl/pull/1547
* [BugFix] Instantiate 2 losses with different keys by matteobettini in https://github.com/pytorch/rl/pull/1553
* [BugFix] KL module integration by vmoens in https://github.com/pytorch/rl/pull/1212
* [BugFix] Key selection in batched envs by vmoens in https://github.com/pytorch/rl/pull/1253
* [BugFix] Load collector frames and iter by matteobettini in https://github.com/pytorch/rl/pull/1557
* [BugFix] Make VecNorm Transform picklable by albertbou92 in https://github.com/pytorch/rl/pull/1596
* [BugFix] Minor fixes PPO / A2C examples by albertbou92 in https://github.com/pytorch/rl/pull/1591
* [BugFix] Multiagent "auto" entropy fix in SAC by matteobettini in https://github.com/pytorch/rl/pull/1494
* [BugFix] Nested envs compatibility by matteobettini in https://github.com/pytorch/rl/pull/1347
* [BugFix] Nested key in replay buffer by matteobettini in https://github.com/pytorch/rl/pull/1485
* [BugFix] Nested keys in transforms by matteobettini in https://github.com/pytorch/rl/pull/1355
* [BugFix] Nested keys to probabilistic modules by matteobettini in https://github.com/pytorch/rl/pull/1363
* [BugFix] Parametric `rand_action()` in `BaseEnv` by matteobettini in https://github.com/pytorch/rl/pull/1267
* [BugFix] Parametric collectors by matteobettini in https://github.com/pytorch/rl/pull/1303
* [BugFix] Patch SAC to allow state_dict manipulation before exec by vmoens in https://github.com/pytorch/rl/pull/1607
* [BugFix] PettingZoo seeding by matteobettini in https://github.com/pytorch/rl/pull/1554
* [BugFix] Picklable buffer by albertbou92 in https://github.com/pytorch/rl/pull/1410
* [BugFix] QValue modules and nested action by matteobettini in https://github.com/pytorch/rl/pull/1351
* [BugFix] Reward sum custom key by matteobettini in https://github.com/pytorch/rl/pull/1413
* [BugFix] SafeModule not safely handling specs by matteobettini in https://github.com/pytorch/rl/pull/1352
* [BugFix] Small patches to SMAC by matteobettini in https://github.com/pytorch/rl/pull/1533
* [BugFix] Sparse info in SMACv2 by matteobettini in https://github.com/pytorch/rl/pull/1546
* [BugFix] ToTensorImage unsqueeze would not update the observation spec by hyerra in https://github.com/pytorch/rl/pull/1161
* [BugFix] Torch 1.13 compat by vmoens in https://github.com/pytorch/rl/pull/1294
* [BugFix] Unbreak tensordict import by vmoens in https://github.com/pytorch/rl/pull/1231
* [BugFix] Vectorized priority update in replay buffers by matteobettini in https://github.com/pytorch/rl/pull/1598
* [BugFix] _transpose_time with single dim by vmoens in https://github.com/pytorch/rl/pull/1155
* [BugFix] `RewardSum` transform for multiple reward keys by matteobettini in https://github.com/pytorch/rl/pull/1544
* [BugFix] `step_mdp` nested keys by matteobettini in https://github.com/pytorch/rl/pull/1339
* [BugFix] include buffers in policy_weights by vmoens in https://github.com/pytorch/rl/pull/1185
* [BugFix] load_state_dict in param updates for collectors by vmoens in https://github.com/pytorch/rl/pull/1145
* [BugFix] make value estimator with value_key from the PPOLoss init arg by xmaples in https://github.com/pytorch/rl/pull/1144
* [BugFix] unlock in tensordictmodules tests by vmoens in https://github.com/pytorch/rl/pull/1417
* [BugFix] valid_size not saved as attribute by tcbegley in https://github.com/pytorch/rl/pull/1337

Miscellaneous

* Envpool Tests to Nova by osalpekar in https://github.com/pytorch/rl/pull/1283
* Fix CI by matteobettini in https://github.com/pytorch/rl/pull/1368
* Fix MacOS Mujoco Failure by osalpekar in https://github.com/pytorch/rl/pull/1450
* Linux GPU Brax Unittests by osalpekar in https://github.com/pytorch/rl/pull/1133
* Linux Gym Unittests to GHA by osalpekar in https://github.com/pytorch/rl/pull/1139
* Linux Olddeps tests to Nova by osalpekar in https://github.com/pytorch/rl/pull/1289
* Move to More Efficient Windows Runner by osalpekar in https://github.com/pytorch/rl/pull/1476
* OptDeps Tests to Nova by osalpekar in https://github.com/pytorch/rl/pull/1290
* Remove Distributed CCI job by osalpekar in https://github.com/pytorch/rl/pull/1374
* Remove Envpool from CCI by osalpekar in https://github.com/pytorch/rl/pull/1390
* Remove old CircleCI Lint by osalpekar in https://github.com/pytorch/rl/pull/1134
* Removing Migrated and Unused CCI jobs by osalpekar in https://github.com/pytorch/rl/pull/1288
* Revert "[Feature] Single call to value network in advantages" by vmoens in https://github.com/pytorch/rl/pull/1262
* Revert "[Refactor,Performance] Faster collectors" by vmoens in https://github.com/pytorch/rl/pull/1330
* Sklearn test to Nova by osalpekar in https://github.com/pytorch/rl/pull/1291
* Windows Unittests on GHA by osalpekar in https://github.com/pytorch/rl/pull/1086
* [Benchmark,CI] Benchmarks in PR (pre) by vmoens in https://github.com/pytorch/rl/pull/1342
* [Benchmark,CI] Benchmarks in PR by vmoens in https://github.com/pytorch/rl/pull/1341
* [Benchmark] Benchmark Gym vs TorchRL by vmoens in https://github.com/pytorch/rl/pull/1602
* [Benchmark] Benchmark losses by vmoens in https://github.com/pytorch/rl/pull/1287
* [Benchmark] Benchmark number GPU vectorised environments in VMAS (TorchRL vs RLlib) by matteobettini in https://github.com/pytorch/rl/pull/1446
* [Benchmark] Improve benchmark precision + step_mdp + fix GPU by vmoens in https://github.com/pytorch/rl/pull/1340
* [CI] Add macOS M1 binaries Wheels by DanilBaibak in https://github.com/pytorch/rl/pull/1504
* [CI] Add ninja for MacOS builds by vmoens in https://github.com/pytorch/rl/pull/1564
* [CI] Concurrency on gha by vmoens in https://github.com/pytorch/rl/pull/1152
* [CI] Deprecate Windows GPU CCI by osalpekar in https://github.com/pytorch/rl/pull/1387
* [CI] Doc CI fix by matteobettini in https://github.com/pytorch/rl/pull/1384
* [CI] Fix CI PettingZoo by matteobettini in https://github.com/pytorch/rl/pull/1528
* [CI] Fix CI by vmoens in https://github.com/pytorch/rl/pull/1529
* [CI] Fix GHA gpu tests by vmoens in https://github.com/pytorch/rl/pull/1356
* [CI] Fix Jax version in Jumanji by vmoens in https://github.com/pytorch/rl/pull/1242
* [CI] Fix Mujoco version by vmoens in https://github.com/pytorch/rl/pull/1475
* [CI] Fix RoboHive CI by vmoens in https://github.com/pytorch/rl/pull/1541
* [CI] Fix brax and habitat by vmoens in https://github.com/pytorch/rl/pull/1353
* [CI] Fix examples CI by matteobettini in https://github.com/pytorch/rl/pull/1489
* [CI] Fix failing jobs by vmoens in https://github.com/pytorch/rl/pull/1318
* [CI] Fix failing jobs by vmoens in https://github.com/pytorch/rl/pull/1335
* [CI] Fix habitat CI by vmoens in https://github.com/pytorch/rl/pull/1537
* [CI] Fix jumanji by vmoens in https://github.com/pytorch/rl/pull/1566
* [CI] Fix nightly build dependency on tensordict by vmoens in https://github.com/pytorch/rl/pull/1300
* [CI] Fix opt deps machine and docker by vmoens in https://github.com/pytorch/rl/pull/1362
* [CI] Fix tuto deps by matteobettini in https://github.com/pytorch/rl/pull/1416
* [CI] Fix wheels by vmoens in https://github.com/pytorch/rl/pull/1301
* [CI] Less old deps by vmoens in https://github.com/pytorch/rl/pull/1255
* [CI] Less warnings in CI (costs) by vmoens in https://github.com/pytorch/rl/pull/1349
* [CI] Merge Distributed and Linux GPU job by osalpekar in https://github.com/pytorch/rl/pull/1182
* [CI] Migrate examples by vmoens in https://github.com/pytorch/rl/pull/1364
* [CI] Move linux stable to GHA by vmoens in https://github.com/pytorch/rl/pull/1503
* [CI] Reduce CI time by vmoens in https://github.com/pytorch/rl/pull/1226
* [CI] Remove CCI Config by osalpekar in https://github.com/pytorch/rl/pull/1456
* [CI] Remove examples from CCI by vmoens in https://github.com/pytorch/rl/pull/1367
* [CI] Update cuda version by vmoens in https://github.com/pytorch/rl/pull/1380
* [CI] Windows GPU Tests by osalpekar in https://github.com/pytorch/rl/pull/1386
* [Doc] Add link to paper in readme by giadefa in https://github.com/pytorch/rl/pull/1298
* [Doc] Add paper refs in doc and KB by vmoens in https://github.com/pytorch/rl/pull/1241
* [Doc] CITATION.cff by vmoens in https://github.com/pytorch/rl/pull/1229
* [Doc] Do not clean gh-pages by vmoens in https://github.com/pytorch/rl/pull/1150
* [Doc] Fix GPU benchmark by vmoens in https://github.com/pytorch/rl/pull/1151
* [Doc] Fix advantage examples by vmoens in https://github.com/pytorch/rl/pull/1600
* [Doc] Fix default value of `tanh_loc` in the documentation of `TruncatedNormal`. by skandermoalla in https://github.com/pytorch/rl/pull/1205
* [Doc] Fix doctest examples by degensean in https://github.com/pytorch/rl/pull/1393
* [Doc] Fix exploration modules docstrings by vmoens in https://github.com/pytorch/rl/pull/1326
* [Doc] Fix tanh_loc in docstrings by vmoens in https://github.com/pytorch/rl/pull/1203
* [Doc] TorchRL Logo by vmoens in https://github.com/pytorch/rl/pull/1234
* [Doc] Update citation by vmoens in https://github.com/pytorch/rl/pull/1228
* [Doc] Update coding_ppo.py by kushaangupta in https://github.com/pytorch/rl/pull/1483
* [Doc] correct typos in pendulum tutorial by kushaangupta in https://github.com/pytorch/rl/pull/1502
* [Doc] fixed typos in ppo tutorial by MatteoGaetzner in https://github.com/pytorch/rl/pull/1314
* [Docs] Fix multi-agent tutorial by matteobettini in https://github.com/pytorch/rl/pull/1599
* [Docs] Multi-agent environments by matteobettini in https://github.com/pytorch/rl/pull/1383
* [Example] Multiagent examples: MAPPO-IPPO-MADDPG-IDDPG-IQL-QMIX-VDN by matteobettini in https://github.com/pytorch/rl/pull/1027
* [Fix] Remove loss device by matteobettini in https://github.com/pytorch/rl/pull/1395
* [Lint] Add TorchFix linter by kit1980 in https://github.com/pytorch/rl/pull/1580
* [Minor] Capture error in CatFrame edit by vmoens in https://github.com/pytorch/rl/pull/1498
* [Minor] Fix prints by vmoens in https://github.com/pytorch/rl/pull/1257
* [Minor] Fix typo by vmoens in https://github.com/pytorch/rl/pull/1193
* [Minor] Missing commit from 1488 by vmoens in https://github.com/pytorch/rl/pull/1490
* [Minor] Missing lint by vmoens in https://github.com/pytorch/rl/pull/1556
* [Minor] More efficient SAC v1 by vmoens in https://github.com/pytorch/rl/pull/1507
* [Minor] Remove ya gymnasium deprecation warning in vectorized envs by vmoens in https://github.com/pytorch/rl/pull/1573
* [Minor] small fixes by vmoens in https://github.com/pytorch/rl/pull/1237
* [Nova] Jumanji Tests to GHA by osalpekar in https://github.com/pytorch/rl/pull/1282
* [Nova] Remove windows Unittests from CCI by osalpekar in https://github.com/pytorch/rl/pull/1159
* [Nova] Removing CircleCI Gym Unittests by osalpekar in https://github.com/pytorch/rl/pull/1179
* [Nova] Vmas Tests to GHA by osalpekar in https://github.com/pytorch/rl/pull/1284
* [Quality] Filter out warnings in subprocs by vmoens in https://github.com/pytorch/rl/pull/1552
* [Refactor] Migration due to tensordict 473 and 474 by vmoens in https://github.com/pytorch/rl/pull/1354
* [Refactor,Performance] Faster collectors (bis) by vmoens in https://github.com/pytorch/rl/pull/1331
* [Refactor,Performance] Faster collectors by vmoens in https://github.com/pytorch/rl/pull/1327
* [Refactor] Better GymLikeEnv by vmoens in https://github.com/pytorch/rl/pull/1168
* [Refactor] Better batch-size handling by RBs by vmoens in https://github.com/pytorch/rl/pull/1311
* [Refactor] Better updaters by vmoens in https://github.com/pytorch/rl/pull/1184
* [Refactor] Change objectives parameter/buffer/target logic by vmoens in https://github.com/pytorch/rl/pull/1424
* [Refactor] Edit ppo params by vmoens in https://github.com/pytorch/rl/pull/1322
* [Refactor] Expose all wrappers in torchrl.envs by vmoens in https://github.com/pytorch/rl/pull/1532
* [Refactor] Faster envs (2) by vmoens in https://github.com/pytorch/rl/pull/1457
* [Refactor] Fix imports by vmoens in https://github.com/pytorch/rl/pull/1551
* [Refactor] Follow-up on tensordict PR 473 by vmoens in https://github.com/pytorch/rl/pull/1361
* [Refactor] More unravel fixes by vmoens in https://github.com/pytorch/rl/pull/1357
* [Refactor] Nested reward and done specs by vmoens in https://github.com/pytorch/rl/pull/1115
* [Refactor] Refactor DDPG loss in standalone methods by vmoens in https://github.com/pytorch/rl/pull/1603
* [Refactor] Refactor _reset in ParallelEnv by vmoens in https://github.com/pytorch/rl/pull/1172
* [Refactor] Refactor losses for generalization by vmoens in https://github.com/pytorch/rl/pull/1286
* [Refactor] Remove pkg_resources import by vmoens in https://github.com/pytorch/rl/pull/1379
* [Refactor] Remove private calls to _set by vmoens in https://github.com/pytorch/rl/pull/1370
* [Refactor] Shape ops in LSTM based on tensor shape, not tensordict by vmoens in https://github.com/pytorch/rl/pull/1170
* [Refactor] Use _set_tuple for faster set by vmoens in https://github.com/pytorch/rl/pull/1372
* [Refactor] Use `wait` instead of `is_set` to get results in ParallelEnv by vmoens in https://github.com/pytorch/rl/pull/1562
* [Refactor] Use masking in collectors by vmoens in https://github.com/pytorch/rl/pull/1412
* [Refactor] Vmas nested by matteobettini in https://github.com/pytorch/rl/pull/1366
* [Refactor] the usage of tensordict keys in loss modules by Blonck in https://github.com/pytorch/rl/pull/1175
* [Setup] Update setup.py python versions by vmoens in https://github.com/pytorch/rl/pull/1496
* [Test,BugFix] Fix Jax backend tests by vmoens in https://github.com/pytorch/rl/pull/1162
* [Test,CI,Feature] Total time per test by vmoens in https://github.com/pytorch/rl/pull/1232
* [Test] Remove import of test class by matteobettini in https://github.com/pytorch/rl/pull/1549
* [Test] Skip tests in python 3.11 by vmoens in https://github.com/pytorch/rl/pull/1535
* [Test] Skip threading tests in OSX by vmoens in https://github.com/pytorch/rl/pull/1571
* [Test] Test split trajs by vmoens in https://github.com/pytorch/rl/pull/1445
* [Test] Test state_dict and loss modules by vmoens in https://github.com/pytorch/rl/pull/1527
* [Tests] Collector compatibility for heterogeneous environments by matteobettini in https://github.com/pytorch/rl/pull/1414
* [Tests] DDPG extra critic input tests by matteobettini in https://github.com/pytorch/rl/pull/1568
* [Tutorial] Multiagent PPO tutorial by matteobettini in https://github.com/pytorch/rl/pull/1385
* [Versioning] Python 3.11 by vmoens in https://github.com/pytorch/rl/pull/1433
* [Versioning] Use python 3.8 for GPU tests by vmoens in https://github.com/pytorch/rl/pull/1577
* [Versioning] Write version all cases in setup.py by vmoens in https://github.com/pytorch/rl/pull/1579
* d4rl Test to Nova by osalpekar in https://github.com/pytorch/rl/pull/1293
* python 3.11 in README by vmoens in https://github.com/pytorch/rl/pull/1434

New Contributors
* Blonck made their first contribution in https://github.com/pytorch/rl/pull/1142
* hyerra made their first contribution in https://github.com/pytorch/rl/pull/1161
* skandermoalla made their first contribution in https://github.com/pytorch/rl/pull/1205
* giadefa made their first contribution in https://github.com/pytorch/rl/pull/1298
* MatteoGaetzner made their first contribution in https://github.com/pytorch/rl/pull/1314
* MateuszGuzek made their first contribution in https://github.com/pytorch/rl/pull/1240
* degensean made their first contribution in https://github.com/pytorch/rl/pull/1393
* smorad made their first contribution in https://github.com/pytorch/rl/pull/1399
* kushaangupta made their first contribution in https://github.com/pytorch/rl/pull/1483
* kit1980 made their first contribution in https://github.com/pytorch/rl/pull/1580
* MarkHaoxiang made their first contribution in https://github.com/pytorch/rl/pull/1479
* DanilBaibak made their first contribution in https://github.com/pytorch/rl/pull/1504

Many thanks to our contributors, and in particular (in no particular order) to skandermoalla, matteobettini, BY571 and albertbou92 for their tremendous dedication.

**Full Changelog**: https://github.com/pytorch/rl/compare/v0.1.1...v0.2.0

0.1.1

What's Changed
* [Feature] Stacking specs by vmoens in https://github.com/pytorch/rl/pull/892
* [Feature] Multicollector interruptor by albertbou92 in https://github.com/pytorch/rl/pull/963
* [BugFix] VMAS api fix by matteobettini in https://github.com/pytorch/rl/pull/978
* [CI] Fix D4RL tests in CI by vmoens in https://github.com/pytorch/rl/pull/976
* [CI] Fix CI by vmoens in https://github.com/pytorch/rl/pull/982
* [Refactor] Binary spec inherits from discrete spec by matteobettini in https://github.com/pytorch/rl/pull/984
* [Feature] `_DataCollector` -> `DataCollectorBase` by vmoens in https://github.com/pytorch/rl/pull/985
* [Feature] Discrete SAC by BY571 in https://github.com/pytorch/rl/pull/882
* [Refactor, Doc] Refactor refs to SafeModule to TensorDictModule unless necessary by vmoens in https://github.com/pytorch/rl/pull/986
* [BugFix] Quickfix by vmoens in https://github.com/pytorch/rl/pull/991
* [Feature] Add Dropout to MLP module by BY571 in https://github.com/pytorch/rl/pull/988
* [Feature] Warn when collectors collect more frames than requested by matteobettini in https://github.com/pytorch/rl/pull/989
* [BugFix] make "_reset", "step_count", and other done_based keys follow done_spec by matteobettini in https://github.com/pytorch/rl/pull/981
* [Feature] Bandit datasets by vmoens in https://github.com/pytorch/rl/pull/912
* [BugFix] Fix sampling in PPO tutorial by vmoens in https://github.com/pytorch/rl/pull/996
* [Refactor] Refactor losses (value function, doc, input batch size) by vmoens in https://github.com/pytorch/rl/pull/987
* [BugFix,Feature,Doc] Fix replay buffers sampling info, docstrings and iteration by vmoens in https://github.com/pytorch/rl/pull/1003
* [Feature] Replace ValueError by warning in collectors when total_frames is not an exact multiple of frames_per_batch by albertbou92 in https://github.com/pytorch/rl/pull/999
* [BugFix] Only call replay buffer transforms when there are by vmoens in https://github.com/pytorch/rl/pull/1008
* [BugFix] Patch tests in 1008 by vmoens in https://github.com/pytorch/rl/pull/1009
* [Feature] Multidim value functions by vmoens in https://github.com/pytorch/rl/pull/1007
* [BugFix] Fix exploration (OU and Gaussian) by vmoens in https://github.com/pytorch/rl/pull/1006
* [CI] Fix python version in habitat by vmoens in https://github.com/pytorch/rl/pull/1010
* Advantages pass `time_dim` and docfix by matteobettini in https://github.com/pytorch/rl/pull/1014
* [Refactor] Faster transformed distributions by vmoens in https://github.com/pytorch/rl/pull/1017
* [WIP, CI] Upgrade cuda channel by vmoens in https://github.com/pytorch/rl/pull/1019
* [BugFix] Fix collector reset with truncation by vmoens in https://github.com/pytorch/rl/pull/1021
* [Refactor] Improve collector performance by matteobettini in https://github.com/pytorch/rl/pull/1020
* [BugFix] Fix params and buffer casting for policies by vmoens in https://github.com/pytorch/rl/pull/1022
* [Feature] PPO allow entropy logging when entropy_coeff is 0 by matteobettini in https://github.com/pytorch/rl/pull/1025
* [Feature] Distributed data collector (ray) by albertbou92 in https://github.com/pytorch/rl/pull/930
* [Refactor] Minor changes in tensordict construction by vmoens in https://github.com/pytorch/rl/pull/1029
* [CI] Fix Brax 0.9.0 by vmoens in https://github.com/pytorch/rl/pull/1011
* [Feature] Multiagent API in vmas by matteobettini in https://github.com/pytorch/rl/pull/983
* [Feature] Benchmarking workflow by vmoens in https://github.com/pytorch/rl/pull/1028
* [Benchmark] Fix adv benchmark by vmoens in https://github.com/pytorch/rl/pull/1030
* [Doc] Refactor DDPG and DQN tutos to narrow the scope by vmoens in https://github.com/pytorch/rl/pull/979
* Revert "[Doc] Refactor DDPG and DQN tutos to narrow the scope" by vmoens in https://github.com/pytorch/rl/pull/1032
* [BugFix] Advantage normalisation in ClipPPOLoss is done after computing gain1 by albertbou92 in https://github.com/pytorch/rl/pull/1033
* [BugFix] Codecov SHA error by vmoens in https://github.com/pytorch/rl/pull/1035
* [Doc] DDPG and DQN refactoring -- Doc cleaning by vmoens in https://github.com/pytorch/rl/pull/1036
* [BugFix,CI] Fix macos codecov install by vmoens in https://github.com/pytorch/rl/pull/1039
* [BugFix] kwargs update in distributed collectors by vmoens in https://github.com/pytorch/rl/pull/1040
* [Feature] `make_composite_from_td` by vmoens in https://github.com/pytorch/rl/pull/1042
* [Refactor] Import envpool locally to avoid importing gym at root level by vmoens in https://github.com/pytorch/rl/pull/1041
* [Minor] Fix a typo by FrankTianTT in https://github.com/pytorch/rl/pull/1046
* [BugFix] Fix param tying in loss modules by vmoens in https://github.com/pytorch/rl/pull/1037
* [Refactor] less ad-hoc disable_env_checker check by vmoens in https://github.com/pytorch/rl/pull/1047
* [Refactor] Improve distributed collectors by vmoens in https://github.com/pytorch/rl/pull/1044
* [Doc] Document tensordict modules by vmoens in https://github.com/pytorch/rl/pull/1053
* [Doc] Minor changes to contributing.md by vmoens in https://github.com/pytorch/rl/pull/1054
* [Doc] A bit more doc on modules by vmoens in https://github.com/pytorch/rl/pull/1056
* [Refactor] Import enum and interaction_type utils by Goldspear in https://github.com/pytorch/rl/pull/1055
* [Feature] Deduplicate calls to common layers in PPO by vmoens in https://github.com/pytorch/rl/pull/1057
* [BugFix] CompositeSpec nested key deletion by btx0424 in https://github.com/pytorch/rl/pull/1059
* [Feature] Add MaskedCategorical distribution by xiaomengy in https://github.com/pytorch/rl/pull/1012
* [Refactor] resetting envs in collectors always passes the _reset entry by vmoens in https://github.com/pytorch/rl/pull/1061
* [Refactor] Better integration of QValue tools by vmoens in https://github.com/pytorch/rl/pull/1063
* MUJOCO_INSTALLATION.md: Fix typo by traversaro in https://github.com/pytorch/rl/pull/1064
* [Refactor] Removes "reward" from root tensordicts by vmoens in https://github.com/pytorch/rl/pull/1065
* [Test] Fix tests for older pytorch versions by vmoens in https://github.com/pytorch/rl/pull/1066
* [Feature] Reward2go Transform by BY571 in https://github.com/pytorch/rl/pull/1038
* [CI] Reduce tests by vmoens in https://github.com/pytorch/rl/pull/1071
* [Feature] Skip existing for advantage modules by vmoens in https://github.com/pytorch/rl/pull/1070
* [BugFix] Fix parallel env data passing on cuda by vmoens in https://github.com/pytorch/rl/pull/1024
* [Refactor] Deprecate interaction_mode by vmoens in https://github.com/pytorch/rl/pull/1067
* [Doc] Update KB: cannot find -lGL by vmoens in https://github.com/pytorch/rl/pull/1073
* [Doc] fix figures display issues in documentation of actors.py by DamienAllonsius in https://github.com/pytorch/rl/pull/1074
* [Example] PPO simplified example by albertbou92 in https://github.com/pytorch/rl/pull/1004
* [Feature] Update td in step (not overwrite) by vmoens in https://github.com/pytorch/rl/pull/1075
* [CI] Remove migrated CircleCI macOS jobs by seemethere in https://github.com/pytorch/rl/pull/1069
* [Feature] Target Return Transform by BY571 in https://github.com/pytorch/rl/pull/1045
* [Test] Fix tensorboard tests with ImageIO 2.26 by vmoens in https://github.com/pytorch/rl/pull/1083
* [Feature] LSTMModule by vmoens in https://github.com/pytorch/rl/pull/1084
* [BugFix] Change default of skip_existing to None by tcbegley in https://github.com/pytorch/rl/pull/1082
* [Example] A2C simplified example by albertbou92 in https://github.com/pytorch/rl/pull/1076
* [BugFix] Fix output_spec transform calls by vmoens in https://github.com/pytorch/rl/pull/1091
* [Feature] Indexing Discrete and OneHot specs by remidomingues in https://github.com/pytorch/rl/pull/1081
* [Refactor] Refactor DQN by vmoens in https://github.com/pytorch/rl/pull/1085
* [Feature] Auto-init updaters and raise a warning if not present by vmoens in https://github.com/pytorch/rl/pull/1092
* [BugFix] Remove false warnings in losses by vmoens in https://github.com/pytorch/rl/pull/1096
* [CI, BugFix] Fix CI warnings and errors by vmoens in https://github.com/pytorch/rl/pull/1100
* [Refactor] Update vmap imports to torch by vmoens in https://github.com/pytorch/rl/pull/1102
* [Refactor] Make advantages non-differentiable by default (except in losses) by vmoens in https://github.com/pytorch/rl/pull/1104
* [Feature] Indexing specs by remidomingues in https://github.com/pytorch/rl/pull/1105
* [BugFix] Fix EnvPool by vmoens in https://github.com/pytorch/rl/pull/1106
* [Feature,Doc] QValue refactoring and QNet + RNN tuto by vmoens in https://github.com/pytorch/rl/pull/1060
* [BugFix] Fix Gym imports by vmoens in https://github.com/pytorch/rl/pull/1023
* [CI] pytest should not skip tests for dependencies by rohitnig in https://github.com/pytorch/rl/pull/1048
* [BugFix, Doc] Fix tutos by vmoens in https://github.com/pytorch/rl/pull/1107
* [CI] Fix tutos (2) by vmoens in https://github.com/pytorch/rl/pull/1109
* [Doc] Fix doc rendering by vmoens in https://github.com/pytorch/rl/pull/1112
* Added the entry for skip-tests in the environment.yml by rohitnig in https://github.com/pytorch/rl/pull/1113
* [CI] Upgrade ubuntu version in GHA by vmoens in https://github.com/pytorch/rl/pull/1116
* Fix in windows unit test by mischab in https://github.com/pytorch/rl/pull/1099
* Revert "Fix in windows unit test" by mischab in https://github.com/pytorch/rl/pull/1117
* [Nova] Lint job on GHA by osalpekar in https://github.com/pytorch/rl/pull/1114
* [Nova] Remove CircleCI Wheels Builds by osalpekar in https://github.com/pytorch/rl/pull/1121
* [BugFix] Set exploration mode to MODE in all losses by default by vmoens in https://github.com/pytorch/rl/pull/1123
* [BugFix] Instruct the value key to PPOLoss by vmoens in https://github.com/pytorch/rl/pull/1124
* [Feature] CatFrames for offline data by vmoens in https://github.com/pytorch/rl/pull/1122
* [CI] Fix windows CI by vmoens in https://github.com/pytorch/rl/pull/1128
* [Refactor] Buffers tensorclass compat and tutorial by vmoens in https://github.com/pytorch/rl/pull/1101
* [Feature] Marking the time dimension by vmoens in https://github.com/pytorch/rl/pull/1095
* [Doc] Add tuto and time dim info in docs by vmoens in https://github.com/pytorch/rl/pull/1130
* [Doc] Fix locked samples from RBs and ccl of tuto by vmoens in https://github.com/pytorch/rl/pull/1132
* [BugFix] Fix unlock in RB by vmoens in https://github.com/pytorch/rl/pull/1135
* [BugFix] extract the info dict from a list by xmaples in https://github.com/pytorch/rl/pull/1131
* [Feature] Added support for vector-based rewards from environments in MO-Gymnasium by dennismalmgren in https://github.com/pytorch/rl/pull/992
* [Versioning] v0.1.1 by vmoens in https://github.com/pytorch/rl/pull/1137

New Contributors
* FrankTianTT made their first contribution in https://github.com/pytorch/rl/pull/1046
* Goldspear made their first contribution in https://github.com/pytorch/rl/pull/1055
* btx0424 made their first contribution in https://github.com/pytorch/rl/pull/1059
* traversaro made their first contribution in https://github.com/pytorch/rl/pull/1064
* DamienAllonsius made their first contribution in https://github.com/pytorch/rl/pull/1074
* seemethere made their first contribution in https://github.com/pytorch/rl/pull/1069
* remidomingues made their first contribution in https://github.com/pytorch/rl/pull/1081
* rohitnig made their first contribution in https://github.com/pytorch/rl/pull/1048
* mischab made their first contribution in https://github.com/pytorch/rl/pull/1099
* osalpekar made their first contribution in https://github.com/pytorch/rl/pull/1114
* xmaples made their first contribution in https://github.com/pytorch/rl/pull/1131
* dennismalmgren made their first contribution in https://github.com/pytorch/rl/pull/992

**Full Changelog**: https://github.com/pytorch/rl/compare/v0.1.0...v0.1.1

0.1.0

First official beta release of the library!

What's Changed
* QuickFix Versioning by fedebotu in https://github.com/pytorch/rl/pull/958
* Version 0.0.5 by vmoens in https://github.com/pytorch/rl/pull/957
* [Minor] Warning when loading memmap storage on uninitialized td by vmoens in https://github.com/pytorch/rl/pull/961
* [Refactor] Defaults split_trajs to False by vmoens in https://github.com/pytorch/rl/pull/947
* [Feature] InitTracker transform by vmoens in https://github.com/pytorch/rl/pull/962
* [Feature] RenameTransform by vmoens in https://github.com/pytorch/rl/pull/964
* [Feature] Implicit Q-Learning (IQL) by BY571 in https://github.com/pytorch/rl/pull/933
* [Refactor] Refactor data collectors constructors by vmoens in https://github.com/pytorch/rl/pull/970
* [Feature, Refactor] Iterable replay buffers by vmoens in https://github.com/pytorch/rl/pull/968
* [Doc] README rewrite by vmoens in https://github.com/pytorch/rl/pull/971
* [Refactor] A less verbose torchrl by vmoens in https://github.com/pytorch/rl/pull/973
* [Feature] `torch.distributed` collectors by vmoens in https://github.com/pytorch/rl/pull/934
* [Feature] Offline datasets: D4RL by vmoens in https://github.com/pytorch/rl/pull/928


**Full Changelog**: https://github.com/pytorch/rl/compare/v0.0.5...v0.1.0
