Tianshou

Latest version: v1.0.0


1.0.0

This release focuses on updating and improving Tianshou's internals (in particular, code quality) while introducing relatively few breaking changes (apart from things like the required Python and dependency versions).

We view it as a significant step toward transforming Tianshou into the go-to place for RL researchers as well as for RL practitioners working on industry projects.
 
This is the first release since the [appliedAI Institute](https://www.appliedai-institute.de/en/) (specifically, its [TransferLab](https://transferlab.ai/) division) decided to further develop and provide long-term support for Tianshou.

Breaking Changes
- dropped support for Python < 3.11
- dropped support for OpenAI `gym`; from now on, only Gymnasium environments are supported
- removed functions like `offpolicy_trainer` in favor of `OffpolicyTrainer(...).run()` (this affects all example scripts; see the migration sketch after this list)
- several breaking changes related to removing `**kwargs` from signatures and renaming internal attributes (e.g., `critic1` -> `critic`)
- Outputs of training methods are now dataclasses instead of dicts
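The trainer migration is mechanical. Below is a minimal sketch, assuming `policy`, `train_collector`, and `test_collector` are constructed as before; the keyword arguments mirror the old `offpolicy_trainer` function, and the result field name is illustrative:

```python
from tianshou.trainer import OffpolicyTrainer

# Before 1.0: result = offpolicy_trainer(policy, train_collector, test_collector, ...)
# policy and the collectors are assumed to be set up exactly as before.
result = OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=10,
    step_per_epoch=10_000,
    step_per_collect=10,
    update_per_step=0.1,
    episode_per_test=100,
    batch_size=64,
).run()

# Results are now dataclasses rather than dicts, so access fields as
# attributes (e.g., result.best_reward instead of result["best_reward"];
# field name illustrative).
```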

Functionality Extensions
Major
- High level interfaces for experiments, demonstrated by the new example scripts with names ending in `_hl.py`
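For a flavor of the high-level interface, here is a rough sketch; the class and parameter names below (`PPOExperimentBuilder`, `ExperimentConfig`, `SamplingConfig`, the environment factory) are assumptions inferred from the `_hl.py` scripts, which remain the authoritative reference:

```python
# Hypothetical sketch of the high-level experiment API; exact names and
# signatures are assumptions -- consult the *_hl.py example scripts.
from tianshou.highlevel.config import SamplingConfig
from tianshou.highlevel.experiment import ExperimentConfig, PPOExperimentBuilder

# env_factory: an environment factory for your task (construction omitted)
experiment = PPOExperimentBuilder(
    env_factory,
    ExperimentConfig(seed=0),
    SamplingConfig(num_epochs=100, step_per_epoch=30_000),
).build()
experiment.run()
```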
Minor
- Method to compute an action directly from a policy and an observation, which can be used for unrolling (see the sketch after this list)
- Support for custom keys in ReplayBuffer
- Support for CalQL as part of CQL
- Support for explicit setting of multiprocessing context for SubprocEnvWorker
- `critic2` no longer has to be explicitly constructed and passed if it is supposed to be the same network as `critic` (formerly `critic1`)
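As a sketch of the unrolling use case: the snippet below assumes an already-trained `policy`, and the method name `compute_action` is taken from the release notes; check `BasePolicy` for the exact signature.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

# Unroll the policy for one episode, one observation at a time.
done = False
while not done:
    act = policy.compute_action(obs)  # method name per the release notes
    obs, rew, terminated, truncated, info = env.step(act)
    done = terminated or truncated
```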

Internal Improvements
Build and Docs
- Completely changed the build pipeline. Tianshou now uses poetry, black, ruff, poethepoet, nbqa and other niceties.
- Notebook tutorials are now part of the repository (previously they were in a drive). They were fixed and are executed during the build as integration tests, in addition to serving as documentation. Parts of the content have been improved.
- Documentation is now built with Jupyter Book. The JavaScript code has been slightly improved, and JS dependencies are now included as part of the repository.
- Many improvements in docstrings
Typing
- Adding BatchPrototypes to cover the fields needed and returned by methods relying on batches in a backwards compatible way
- Removing `**kwargs` from policies' constructors
- Overall, much stricter and more correct typing: removing `kwargs` and replacing dicts with dataclasses in several places
- Making use of `Generic` to express the different kinds of stats that can be returned by `learn` and `update` (the pattern is illustrated after this list)
- Improved typing in `tests` and `examples`, close to passing mypy
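To illustrate the `Generic` pattern (a minimal sketch of the idea, not Tianshou's actual class hierarchy): each policy class declares the concrete stats dataclass its `learn` returns, so type checkers and IDEs see the specific fields.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

@dataclass
class TrainingStats:
    loss: float

@dataclass
class ActorCriticStats(TrainingStats):
    actor_loss: float
    critic_loss: float

TStats = TypeVar("TStats", bound=TrainingStats)

class Policy(Generic[TStats]):
    def learn(self, batch: object) -> TStats:
        raise NotImplementedError

class ActorCriticPolicy(Policy[ActorCriticStats]):
    def learn(self, batch: object) -> ActorCriticStats:
        # ... compute losses ...
        return ActorCriticStats(loss=0.1, actor_loss=0.04, critic_loss=0.06)
```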
General
- Reduced duplication, improved readability and simplified code in several places
- Use `dist.mode` instead of inferring `loc` or `argmax` from the `dist_fn` input

Contributions
The OG creators
- Trinkle23897 participated in almost all aspects of the coordination and reviewed most of the merged PRs
- nuance1979 participated in several discussions
From appliedAI
The team working on this release of Tianshou consisted of opcode81, MischaPanch, maxhuettenrauch, carlocagnetta, and bordeauxred.
External contributions
- BFAnas participated in several discussions and contributed the CalQL implementation, extending the pre-processing logic.
- dantp-ai fixed many mypy issues and improved the tests
- arnaujc91 improved the logic of computing deterministic actions
- Several other contributors, among them many new ones, participated in this release. The Tianshou team is very grateful for your contributions!
- Quoding made their first contribution in https://github.com/thu-ml/tianshou/pull/840
- KwanWaiChung made their first contribution in https://github.com/thu-ml/tianshou/pull/849
- ligengen made their first contribution in https://github.com/thu-ml/tianshou/pull/860
- zhaozj89 made their first contribution in https://github.com/thu-ml/tianshou/pull/861
- blazejosinski made their first contribution in https://github.com/thu-ml/tianshou/pull/894
- smarianimore made their first contribution in https://github.com/thu-ml/tianshou/pull/980
- mturnshek made their first contribution in https://github.com/thu-ml/tianshou/pull/994
- spacefarers made their first contribution in https://github.com/thu-ml/tianshou/pull/1011
- ashok-arora made their first contribution in https://github.com/thu-ml/tianshou/pull/1062

0.5.0

Enhancement

1. Gymnasium Integration (789, Markus28)
2. Implement args/kwargs for init of norm_layers and activation (788, janofsun)
3. Add "act" to preprocess_fn call in collector. (801, jamartinh)
4. Various update (803, 826, Trinkle23897)

Bug fix

1. Fix a bug in batch._is_batch_set (825, zbenmo)
2. Fix a bug in HERReplayBuffer (817, sunkafei)

0.4.11

Enhancement

1. Hindsight Experience Replay as a replay buffer (753, Juno-T)
2. Fix Atari PPO example (780, nuance1979)
3. Update experiment details of MuJoCo benchmark (779, ChenDRAG)
4. Tiny change since the tests are more than unit tests (765, fzyzcjy)

Bug Fix

1. Multi-agent: gym->gymnasium; render() update (769, WillDudley)
2. Updated atari wrappers (781, Markus28)
3. Fix info not pass issue in PGPolicy (787, Trinkle23897)

0.4.10

Enhancement

1. Changes to support Gym 0.26.0 (748, Markus28)
2. Added pre-commit (752, Markus28)
3. Added support for new PettingZoo API (751, Markus28)
4. Fix docs tictactoe dummy vector env (749, 5cat)

Bug fix

1. Fix 2 bugs and refactor RunningMeanStd to support dict obs norm (695, Trinkle23897)
2. Do not allow async simulation for test collector (705, CWHer)
3. Fix venv wrapper reset retval error with gym env (712, Trinkle23897)

0.4.9

Bug Fix

1. Fix save_checkpoint_fn return value to checkpoint_path (659, Trinkle23897)
2. Fix an off-by-one bug in trainer iterator (659, Trinkle23897)
3. Fix a bug in Discrete SAC evaluation; default to deterministic mode (657, nuance1979)
4. Fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting (660, nuance1979)
5. Fix exception with watching pistonball environments (663, ycheng517)
6. Use `env.np_random.integers` instead of `env.np_random.randint` in Atari examples (613, ycheng517)

API Change

1. Upgrade gym to `>=0.23.1`, support `seed` and `return_info` arguments for reset (613, ycheng517)
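Concretely, with gym >= 0.23.1 a reset looks as follows (standard gym API of that era):

```python
import gym

env = gym.make("CartPole-v1")

# gym >= 0.23.1: reset accepts a seed and can optionally return an info dict
obs = env.reset(seed=42)
obs, info = env.reset(seed=42, return_info=True)
```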

New Features

1. Add BranchDQN for large discrete action spaces (618, BFAnas)
2. Add show_progress option for trainer (641, michalgregor)
3. Added support for clipping to DQNPolicy (642, michalgregor)
4. Implement TD3+BC for offline RL (660, nuance1979)
5. Add multiDiscrete to discrete gym action space wrapper (664, BFAnas)

Enhancement

1. Use envpool in vizdoom example (634, Trinkle23897)
2. Add Atari (discrete) SAC examples (657, nuance1979)

0.4.8

Bug fix

1. Fix action scaling bug in SAC (591, ChenDRAG)

Enhancement

1. Add write_flush in two loggers, fix argument passing in WandbLogger (581, Trinkle23897)
2. Update Multi-agent RL docs and upgrade pettingzoo (595, ycheng517)
3. Add learning rate scheduler to BasePolicy (598, alexnikulkov)
4. Add Jupyter notebook tutorials using Google Colaboratory (599, ChenDRAG)
5. Unify `utils.network`: change action_dim to action_shape (602, Squeemos)
6. Update MuJoCo benchmark's webpage (606, ChenDRAG)
7. Add Atari results (600, gogoduan) (616, ChenDRAG)
8. Convert RL Unplugged Atari datasets to tianshou ReplayBuffer (621, nuance1979)
9. Implement REDQ (623, Jimenius)
10. Improve data loading from D4RL and convert RL Unplugged to D4RL format (624, nuance1979)
11. Add vecenv wrappers for obs_norm to support running mujoco experiment with envpool (628, Trinkle23897)
