Stable-baselines

Latest version: v2.10.2


1.0.1

- refactored A2C, ACER, ACKTR, DDPG, DeepQ, GAIL, TRPO, PPO1 and PPO2 under a single consistent class
- added callback to refactored algorithm training
- added saving and loading to refactored algorithms
- refactored ACER, DDPG, GAIL, PPO1 and TRPO to fit with A2C, PPO2 and ACKTR policies
- added new policies for most algorithms (Mlp, MlpLstm, MlpLnLstm, Cnn, CnnLstm and CnnLnLstm)
- added dynamic environment switching (so continual RL learning is now feasible)
- added prediction from observation and action probability from observation for all the algorithms (see the sketch after this list)
- fixed graph issues so models won't collide in names
- fixed behavior_clone weight loading for GAIL
- fixed Tensorflow using all the GPU VRAM
- fixed models so that they are all compatible with vectorized environments
- fixed set_global_seed to update the gym.spaces random seed
- fixed PPO1 and TRPO performance issues when learning identity function
- added new tests for loading, saving, continuous actions and learning the identity function
- fixed DQN wrapping for atari
- added saving and loading for the VecNormalize wrapper
- added automatic detection of action space (for the policy network)
- fixed ACER buffer with constant values assuming n_stack=4
- fixed some RL algorithms not clipping the action to be in the action_space, when using gym.spaces.Box
- refactored algorithms can take either a gym environment instance or a str ([if the environment name is registered](https://github.com/openai/gym/wiki/Environments))
- Hotfix in ACER (compared to v1.0.0)
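
A minimal, illustrative sketch of the refactored API described above (prediction from observation, action probability from observation, saving/loading, and passing a registered environment id as a string). Exact import paths and policy names varied across early stable-baselines releases, so treat this as an approximation rather than the canonical v1.0.1 API:

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

# Refactored algorithms accept either a gym environment instance or a registered env id (str)
model = PPO2(MlpPolicy, "CartPole-v1")
model.learn(total_timesteps=10000)

obs = gym.make("CartPole-v1").reset()
action, _states = model.predict(obs)    # prediction from observation
probs = model.action_probability(obs)   # action probability from observation

model.save("ppo2_cartpole")             # saving ...
model = PPO2.load("ppo2_cartpole")      # ... and loading
```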

Future Work:
- Finish refactoring HER
- Refactor ACKTR and ACER to support continuous action spaces

1.0

**First Major Version**

Blog post: https://araffin.github.io/post/sb3/

100+ pre-trained models in the zoo: https://github.com/DLR-RM/rl-baselines3-zoo

Breaking Changes:

- Removed `stable_baselines3.common.cmd_util` (already deprecated), please use `env_util` instead
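
For illustration, the typical migration is a one-line import change (sketch, assuming you were using `make_vec_env` from the deprecated module):

```python
# Before (removed in v1.0):
# from stable_baselines3.common.cmd_util import make_vec_env
# After:
from stable_baselines3.common.env_util import make_vec_env

env = make_vec_env("CartPole-v1", n_envs=4)
```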

**Warning**

A refactoring of the `HER` algorithm is planned, together with support for dictionary observations (see [PR 243](https://github.com/DLR-RM/stable-baselines3/pull/243) and
[PR 351](https://github.com/DLR-RM/stable-baselines3/pull/351)).
This will be a backward-incompatible change (models trained with a previous version of `HER` won't work with the new version).

New Features:

- Added support for `custom_objects` when loading models
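
For example (a hedged sketch; the keys shown here are placeholders, any attribute stored with the model can be overridden at load time):

```python
from stable_baselines3 import PPO

# Override saved attributes that may not deserialize cleanly,
# e.g. schedules saved by a different version of the library
custom_objects = {"learning_rate": 0.0, "clip_range": lambda _: 0.2}
model = PPO.load("path/to/model.zip", custom_objects=custom_objects)
```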

Bug Fixes:

- Fixed a bug in the `DQN` predict method when using `deterministic=False` with an image space

Documentation:

- Fixed examples
- Added new project using SB3: rl_reach (PierreExeter)
- Added note about slow-down when switching to PyTorch
- Added a note on continual learning and resetting the environment
- Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents
- Added images to illustrate the training loop and custom policies (created with <https://excalidraw.com/>)
- Updated the custom policy section

1.0.0

Do not use: bug in ACER, fixed in v1.0.1

1.0rc1

Second release candidate

1.0rc0

0.11.1

Breaking Changes:

- ``evaluate_policy`` now returns rewards/episode lengths from a ``Monitor`` wrapper if one is present; this allows returning the unnormalized reward in the case of Atari games, for instance.
- Renamed ``common.vec_env.is_wrapped`` to ``common.vec_env.is_vecenv_wrapped`` to avoid confusion
with the new ``is_wrapped()`` helper
- Renamed ``_get_data()`` to ``_get_constructor_parameters()`` for policies (this affects independent saving/loading of policies)
- Removed ``n_episodes_rollout`` and merged it with ``train_freq``, which now accepts a tuple ``(frequency, unit)`` (see the example below)
- ``replay_buffer`` in ``collect_rollout`` is no longer optional

```python
# SB3 < 0.11.0
model = SAC("MlpPolicy", env, n_episodes_rollout=1, train_freq=-1)

# SB3 >= 0.11.0
model = SAC("MlpPolicy", env, train_freq=(1, "episode"))
```

New Features:

- Added support for ``VecFrameStack`` to stack on the first or last observation dimension, along with an automatic check for image spaces.
- ``VecFrameStack`` now has a ``channels_order`` argument to specify whether observations should be stacked on the first or last observation dimension (previously always stacked on the last); see the sketch after this list.
- Added ``common.env_util.is_wrapped`` and ``common.env_util.unwrap_wrapper`` functions for checking/unwrapping an environment for a specific wrapper.
- Added ``env_is_wrapped()`` method for ``VecEnv`` to check if its environments are wrapped
with given Gym wrappers.
- Added ``monitor_kwargs`` parameter to ``make_vec_env`` and ``make_atari_env``
- Environments are now automatically wrapped with a ``Monitor`` wrapper when possible.
- ``EvalCallback`` now logs the success rate when available (``is_success`` must be present in the info dict)
- Added new wrappers to log images and matplotlib figures to tensorboard. (zampanteymedio)
- Add support for text records to ``Logger``. (lorenz-h)
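
A short, hedged sketch of some of the helpers above (``channels_order``, ``is_wrapped`` and ``env_is_wrapped()``); it assumes the Atari extras are installed so that ``make_atari_env`` can create the environments:

```python
import gym

from stable_baselines3.common.atari_wrappers import AtariWrapper
from stable_baselines3.common.env_util import is_wrapped, make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Stack 4 frames; channels_order controls whether frames are stacked
# on the first or last dimension of the (image) observation
venv = make_atari_env("PongNoFrameskip-v4", n_envs=4)
venv = VecFrameStack(venv, n_stack=4, channels_order="last")

# Check whether the underlying envs are wrapped with a given Gym wrapper
print(venv.env_is_wrapped(AtariWrapper))

# Non-vectorized counterpart
env = gym.make("CartPole-v1")
print(is_wrapped(env, gym.wrappers.TimeLimit))
```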

Bug Fixes:

- Fixed a bug where ``VecTranspose`` was added on top of channel-first image environments (thanks qxcv)
- Fixed ``DQN`` predict method when using single ``gym.Env`` with ``deterministic=False``
- Fixed a bug where the argument order of ``explained_variance()`` in ``ppo.py`` and ``a2c.py`` was incorrect (thisray)
- Fixed bug where full ``HerReplayBuffer`` leads to an index error. (megan-klaiber)
- Fixed a bug where the replay buffer could not be saved if it was too big (> 4 GB) on Python < 3.8 (thanks hn2)
- Added informative ``PPO`` construction error in edge-case scenario where ``n_steps * n_envs = 1`` (size of rollout buffer),
which otherwise causes downstream breaking errors in training (decodyng)
- Fixed discrete observation space support when using multiple envs with A2C/PPO (thanks ardabbour)
- Fixed a bug for TD3 delayed update (the update was off-by-one and not delayed when ``train_freq=1``)
- Fixed numpy warning (replaced ``np.bool`` with ``bool``)
- Fixed a bug where ``VecNormalize`` was not normalizing the terminal observation
- Fixed a bug where ``VecTranspose`` was not transposing the terminal observation
- Fixed a bug where the terminal observation stored in the replay buffer was not the right one for off-policy algorithms
- Fixed a bug where ``action_noise`` was not used when using ``HER`` (thanks ShangqunYu)
- Fixed a bug where ``train_freq`` was not properly converted when loading a saved model

Others:

- Add more issue templates
- Add signatures to callable type annotations (ernestum)
- Improve error message in ``NatureCNN``
- Added checks for supported action spaces to improve clarity of error messages for the user
- Renamed variables in the ``train()`` method of ``SAC``, ``TD3`` and ``DQN`` to match SB3-Contrib.
- Updated docker base image to Ubuntu 18.04
- Set tensorboard min version to 2.2.0 (earlier versions apparently do not work with PyTorch)
- Added warning for ``PPO`` when ``n_steps * n_envs`` is not a multiple of ``batch_size`` (last mini-batch truncated) (decodyng)
- Removed some warnings in the tests

Documentation:

- Updated algorithm table
- Minor docstring improvements regarding rollout (stheid)
- Fix migration doc for ``A2C`` (epsilon parameter)
- Fix ``clip_range`` docstring
- Fix duplicated parameter in ``EvalCallback`` docstring (thanks tfederico)
- Added example of learning rate schedule
- Added SUMO-RL as example project (LucasAlegre)
- Fixed docstrings of classes in atari_wrappers.py that were placed inside the constructor (LucasAlegre)
- Added SB3-Contrib page
- Fix bug in the example code of DQN (AptX395)
- Add example on how to access the tensorboard summary writer directly. (lorenz-h)
- Updated migration guide
- Updated custom policy doc (separate policy architecture recommended)
- Added a note about OpenCV headless version
- Corrected a typo in the documentation (mschweizer)
- Provide the environment when loading the model in the examples (lorepieri8)
