Tianshou

Latest version: v1.0.0


0.4.2

Enhancement

1. Add model-free DQN-family algorithms: IQN (371), FQF (376)
2. Add model-free on-policy algorithms: NPG (344, 347), TRPO (337, 340)
3. Add offline RL algorithms: CQL (359), CRR (367)
4. Support deterministic evaluation for on-policy algorithms (354)
5. Make trainer resumable (350); see the sketch after this list
6. Support different state sizes and fix an exception in `venv.__del__` (352, 384)
7. Add a ViZDoom example (384)
8. Add a numerical analysis tool and interactive plots (335, 341)
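
A minimal sketch of how the resumable trainer (350) might be wired up. The `save_checkpoint_fn` and `resume_from_log` keyword names, the checkpoint path, and the surrounding trainer setup are assumptions based on typical Tianshou usage, not details stated in this changelog; `policy` and `optim` are assumed to be built elsewhere as in the usual training scripts.

```python
import torch

# Hypothetical checkpoint hook; persists everything needed to restart training.
def save_checkpoint_fn(epoch, env_step, gradient_step):
    torch.save(
        {"model": policy.state_dict(), "optim": optim.state_dict()},
        "checkpoint.pth",
    )

# The trainer call (arguments elided) would then look roughly like:
# result = offpolicy_trainer(
#     policy, train_collector, test_collector, ...,
#     save_checkpoint_fn=save_checkpoint_fn,  # called periodically during training
#     resume_from_log=True,                   # restore epoch/env_step from the logger
# )
```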

0.4.1

API Change

1. Add observation normalization in BaseVectorEnv (`norm_obs`, `obs_rms`, `update_obs_rms` and `RunningMeanStd`) (308)
2. Add `policy.map_action` to bound raw actions (e.g., map from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (313)
3. Add an `lr_scheduler` option to on-policy algorithms, typically used with `LambdaLR` (318); see the sketch below
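
A minimal sketch of the `lr_scheduler` usage, assuming a linear learning-rate decay over a hypothetical `max_update_num` updates; the model and hyperparameters below are purely illustrative.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# illustrative model/optimizer standing in for the actor/critic networks
model = torch.nn.Linear(4, 2)
optim = torch.optim.Adam(model.parameters(), lr=3e-4)

# decay the learning rate linearly to zero over `max_update_num` policy updates
max_update_num = 1000
lr_scheduler = LambdaLR(optim, lr_lambda=lambda n: 1 - n / max_update_num)

# the scheduler is then handed to an on-policy policy via the new keyword, e.g.
# PPOPolicy(actor, critic, optim, dist_fn, lr_scheduler=lr_scheduler, ...)
```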

Note

To adapt to this version, change `action_range=...` to `action_space=env.action_space` in policy initialization.
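
For instance, a continuous-control policy would now be constructed roughly as below. This is a sketch only: the network shapes, hyperparameters, and the choice of `DDPGPolicy` on `Pendulum-v0` are illustrative, not part of this release note.

```python
import gym
import torch
from tianshou.policy import DDPGPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import Actor, Critic

env = gym.make("Pendulum-v0")
state_shape = env.observation_space.shape
action_shape = env.action_space.shape

# actor/critic networks built on top of the shared MLP backbone
net_a = Net(state_shape, hidden_sizes=[64, 64])
actor = Actor(net_a, action_shape, max_action=env.action_space.high[0])
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
net_c = Net(state_shape, action_shape, hidden_sizes=[64, 64], concat=True)
critic = Critic(net_c)
critic_optim = torch.optim.Adam(critic.parameters(), lr=1e-3)

policy = DDPGPolicy(
    actor, actor_optim, critic, critic_optim,
    # 0.4.0 and earlier took `action_range=...`; from 0.4.1 on,
    # pass the gym space itself:
    action_space=env.action_space,
)
```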

Bug Fix

1. Fix incorrect behaviors with on-policy algorithms (an error when `n/ep==0`, and the reward shown in tqdm) (306, 328)
2. Fix the Q-value `mask_action` error for `obs_next` (310)

Enhancement

1. Release a SOTA MuJoCo benchmark (DDPG/TD3/SAC: 305, REINFORCE: 320, A2C: 325, PPO: 330) and add corresponding notes in [/examples/mujoco/README.md](/examples/mujoco/README.md)
2. Fix `numpy>=1.20` typing issue (323)
3. Add cross-platform unittest (331)
4. Add a test on how to deal with finite environments (324)
5. Add value normalization in on-policy algorithms (319, 321)
6. Separate advantage normalization and value normalization in PPO (329)

0.4.0

This release contains several API and behavior changes.

API Change

Buffer

1. Add `ReplayBufferManager`, `PrioritizedReplayBufferManager`, `VectorReplayBuffer`, `PrioritizedVectorReplayBuffer`, and `CachedReplayBuffer` (278, 280);
2. Change the `buffer.add` API from `buffer.add(obs, act, rew, done, obs_next, info, policy, ...)` to `buffer.add(batch, buffer_ids)` in order to add data more efficiently (280); see the sketch after this list;
3. Add a `set_batch` method to the buffer (278);
4. Add a `sample_index` method, the same as `sample` but returning only the indices instead of both indices and batch data (278);
5. Add `prev` (one-step previous transition index), `next` (one-step next transition index), and `unfinished_index` (the last modified index whose `done==False`) (278);
6. Add an internal method `_alloc_by_keys_diff` in Batch to support newly appearing keys of any form (280);
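
A minimal sketch of the new `buffer.add(batch, buffer_ids)` call; the shapes and the all-zero data below are purely illustrative.

```python
import numpy as np
from tianshou.data import Batch, VectorReplayBuffer

# a buffer holding 4 sub-buffers (one per parallel env), 20_000 transitions in total
buf = VectorReplayBuffer(total_size=20_000, buffer_num=4)

# new-style add: a Batch of transitions plus the indices of the envs they came from
batch = Batch(
    obs=np.zeros((4, 3)),
    act=np.zeros((4, 1)),
    rew=np.zeros(4),
    done=np.zeros(4, dtype=bool),
    obs_next=np.zeros((4, 3)),
)
buf.add(batch, buffer_ids=[0, 1, 2, 3])

# sample_index returns only indices; sample returns (batch, indices)
indices = buf.sample_index(2)
sampled, sampled_indices = buf.sample(2)
```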

Collector

1. Rewrite the original Collector and split the async functionality into AsyncCollector: Collector only supports sync mode, while AsyncCollector supports both modes (280);
2. Drop `collector.collect(n_episode=List[int])` because the new collector can collect episodes without bias (280);
3. Move `reward_metric` from Collector to the trainer (280);
4. Change the `Collector.collect` logic: `AsyncCollector.collect` keeps the previous semantics, where `collect(n_step or n_episode)` does not collect exactly n_step or n_episode transitions; `Collector.collect(n_step or n_episode)` now collects **exactly** n_step transitions or n_episode episodes (280); see the sketch after this list;
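
A minimal sketch of the new exact-collection semantics; the DQN policy on CartPole and all hyperparameters below are illustrative stand-ins, not prescribed by this release.

```python
import gym
import torch
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import DQNPolicy
from tianshou.utils.net.common import Net

env = gym.make("CartPole-v0")
net = Net(env.observation_space.shape, env.action_space.n, hidden_sizes=[64, 64])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = DQNPolicy(net, optim, discount_factor=0.99, target_update_freq=100)

train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
test_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(2)])
buffer = VectorReplayBuffer(20_000, buffer_num=4)
train_collector = Collector(policy, train_envs, buffer, exploration_noise=True)
test_collector = Collector(policy, test_envs)

# the rewritten sync Collector collects *exactly* what is requested
train_collector.collect(n_step=1000)   # exactly 1000 transitions across the 4 envs
train_collector.collect(n_episode=8)   # exactly 8 finished episodes
```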

Policy

1. Add a `policy.exploration_noise(action, batch) -> action` method instead of implementing exploration noise inside `policy.forward()` (280); see the sketch after this list;
2. Add a `Timelimit.truncate` handler in `compute_*_returns` (296);
3. Remove the `ignore_done` flag (296);
4. Remove the `reward_normalization` option in off-policy algorithms (an error is raised if it is set to True) (298);
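
A sketch of the new hook, using a hypothetical DDPG subclass that adds Gaussian noise; the noise scale and the choice of base class are illustrative.

```python
import numpy as np
from tianshou.policy import DDPGPolicy


class NoisyDDPGPolicy(DDPGPolicy):
    """Illustrative subclass: exploration noise is now a separate hook
    (called by the Collector) instead of being baked into policy.forward()."""

    def exploration_noise(self, act, batch):
        # add zero-mean Gaussian noise to the deterministic action
        return act + np.random.normal(0.0, 0.1, size=np.shape(act))
```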

Trainer

1. Change `collect_per_step` to `step_per_collect` (293);
2. Add `update_per_step` and `episode_per_collect` (293); see the sketch after this list;
3. `onpolicy_trainer` now supports collecting by either steps or episodes (293);
4. Add `BasicLogger` and `LazyLogger` to log data more conveniently (295).
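
A sketch of a trainer call with the renamed and added arguments, reusing `policy`, `train_collector`, and `test_collector` from the Collector sketch above; every numeric value here is illustrative.

```python
from torch.utils.tensorboard import SummaryWriter
from tianshou.trainer import offpolicy_trainer
from tianshou.utils import BasicLogger

writer = SummaryWriter("log/dqn")
logger = BasicLogger(writer)

result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10,
    step_per_epoch=10_000,
    step_per_collect=10,    # renamed from collect_per_step
    update_per_step=0.1,    # one gradient step per 10 collected transitions
    episode_per_test=10,
    batch_size=64,
    logger=logger,
)
```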

Bug Fix

1. Fix VectorEnv `action_space` seeding randomness: calling `env.seed(seed)` now also calls `env.action_space.seed(seed)`; otherwise, `Collector.collect(..., random=True)` would produce different results each time (300, 303). See the snippet below.
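
A small illustration, reusing `train_envs` and `train_collector` from the Collector sketch above; the seed value is arbitrary.

```python
# seeding the vector env now also seeds each env.action_space under the hood
train_envs.seed(0)

# so a purely random rollout is reproducible across runs
train_collector.collect(n_step=100, random=True)
```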

0.3.2

Bug Fix

1. Fix networks under `utils/discrete` and `utils/continuous` not working under CUDA with torch<=1.6.0 (289)
2. Fix two Batch bugs: creating keys in `Batch.__setitem__` now throws `ValueError` instead of `KeyError`, and `_create_value` now allows a placeholder with the `stack=False` option (284)

Enhancement

1. Add QR-DQN algorithm (276)

0.3.1

API Change

1. Change `utils.network` arguments to support arbitrary MLPs by default (275): remove `layer_num` and `hidden_layer_size`, and add `hidden_sizes` (a list of ints describing the network architecture); see the sketch after this list
2. Add HDF5 save/load methods for ReplayBuffer (261)
3. Add `offline_trainer` (263)
4. Move Atari-related networks to `examples/atari/atari_network.py` (275)
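
A sketch of items 1 and 2 together, with illustrative sizes and file paths; the single `buf.add(...)` call below uses the pre-0.4.0 add signature, since this is a 0.3.x release.

```python
import gym
from tianshou.data import ReplayBuffer
from tianshou.utils.net.common import Net

env = gym.make("CartPole-v0")

# `hidden_sizes` replaces `layer_num` + `hidden_layer_size`:
# each entry is the width of one hidden layer
net = Net(env.observation_space.shape, env.action_space.n, hidden_sizes=[128, 128, 128])

# HDF5 round-trip for a replay buffer
buf = ReplayBuffer(size=1000)
buf.add(obs=0, act=0, rew=0.0, done=False, obs_next=1)  # old-style add (changed in 0.4.0)
buf.save_hdf5("buffer.hdf5")
buf = ReplayBuffer.load_hdf5("buffer.hdf5")
```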

Bug Fix

1. Fix a potential bug in the discrete behavior cloning policy (263)

Enhancement

1. Update the SAC MuJoCo result (246)
2. Add the C51 algorithm with benchmark results (266)
3. Enable type checking in `utils.network` (275)

0.3.0.post1

Several bug fixes (trainer, tests, and docs).
