Stable-baselines

Latest version: v2.10.2

0.10.0

Breaking Changes

- **Warning:** Renamed ``common.cmd_util`` to ``common.env_util`` for clarity (affects ``make_vec_env`` and ``make_atari_env`` functions)

New Features

- Allow custom actor/critic network architectures using ``net_arch=dict(qf=[400, 300], pi=[64, 64])`` for off-policy algorithms (SAC, TD3, DDPG); see the sketch after this list
- Added Hindsight Experience Replay ``HER``. (megan-klaiber)
- ``VecNormalize`` now supports ``gym.spaces.Dict`` observation spaces
- Support logging videos to Tensorboard (SwamyDev)
- Added ``share_features_extractor`` argument to ``SAC`` and ``TD3`` policies
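
A minimal sketch of the new off-policy ``net_arch`` dictionary, assuming SAC on the Gym ``Pendulum-v0`` environment (environment ID and layer sizes are illustrative):

```python
from stable_baselines3 import SAC

# Separate critic (qf) and actor (pi) network sizes for an off-policy algorithm.
model = SAC(
    "MlpPolicy",
    "Pendulum-v0",
    policy_kwargs=dict(net_arch=dict(qf=[400, 300], pi=[64, 64])),
    verbose=1,
)
model.learn(total_timesteps=1_000)
```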

Bug Fixes

- Fix GAE computation for on-policy algorithms (off-by-one error for the last value) (thanks Wovchena)
- Fixed potential issue when loading a different environment
- Fixed the ``exclude`` parameter being ignored when recording logs with the json, csv or log formats (SwamyDev)
- Make ``make_vec_env`` support the ``env_kwargs`` argument when using an env ID string (ManifoldFR); see the sketch after this list
- Fix model creation initializing CUDA even when `device="cpu"` is provided
- Fix ``check_env`` not checking if the env has a ``Dict`` action space before calling ``_check_nan`` (wmmc88)
- Update the check for spaces unsupported by Stable Baselines 3 to include checks on the action space (wmmc88)
- Fixed a feature extractor bug where the target network shared the same network instead of using a separate copy.
This bug affects ``SAC``, ``DDPG`` and ``TD3`` when using ``CnnPolicy`` (or a custom feature extractor)
- Fixed a bug where the environment passed when loading a saved model with a ``CnnPolicy`` was not wrapped properly
(the bug was introduced when implementing ``HER``, so it should not be present in previous versions)
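
A minimal sketch of ``make_vec_env`` forwarding ``env_kwargs`` when given an env ID string; the environment and keyword argument are illustrative:

```python
from stable_baselines3.common.env_util import make_vec_env

# FrozenLake accepts constructor keyword arguments such as is_slippery.
vec_env = make_vec_env("FrozenLake-v0", n_envs=2, env_kwargs=dict(is_slippery=False))
```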

Others

- Improved typing coverage
- Improved error messages for unsupported spaces
- Added ``.vscode`` to the gitignore

Documentation

- Added first draft of migration guide
- Added intro to [imitation](https://github.com/HumanCompatibleAI/imitation) library (shwang)
- Enabled doc for ``CnnPolicies``
- Added advanced saving and loading example
- Added base doc for exporting models
- Added example for getting and setting model parameters

0.9.0

Breaking Changes:

- Removed the ``device`` keyword argument of policies; use ``policy.to(device)`` instead (qxcv); see the sketch after this list
- Rename ``BaseClass.get_torch_variables`` -> ``BaseClass._get_torch_save_params`` and
``BaseClass.excluded_save_params`` -> ``BaseClass._excluded_save_params``
- Renamed saved items ``tensors`` to ``pytorch_variables`` for clarity
- ``make_atari_env``, ``make_vec_env`` and ``set_random_seed`` must be imported from their respective submodules (and not directly from ``stable_baselines3.common``):

```python
from stable_baselines3.common.cmd_util import make_atari_env, make_vec_env
from stable_baselines3.common.utils import set_random_seed
```
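
A minimal sketch of moving a policy to a device now that the ``device`` keyword argument of policies was removed, assuming PPO on the built-in ``CartPole-v1`` environment:

```python
import torch

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
# Replaces the removed device keyword argument on policies.
model.policy.to(torch.device("cpu"))
```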


New Features:

- Added ``unwrap_vec_wrapper()`` to ``common.vec_env`` to extract ``VecEnvWrapper`` if needed
- Added ``StopTrainingOnMaxEpisodes`` to callback collection (xicocaio)
- Added ``device`` keyword argument to ``BaseAlgorithm.load()`` (liorcohen5)
- Callbacks have access to rollout collection locals as in SB2. (PartiallyTyped)
- Added ``get_parameters`` and ``set_parameters`` for accessing/setting parameters of the agent; see the sketch after this list
- Added actor/critic loss logging for TD3. (mloo3)
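
A minimal sketch of the new parameter accessors, assuming PPO on the built-in ``CartPole-v1`` environment:

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")
params = model.get_parameters()  # dict of state dicts (policy, optimizers, ...)
model.set_parameters(params, exact_match=True)
```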

Bug Fixes:

- Fixed a bug where the environment was reset twice when using ``evaluate_policy``
- Fix logging of ``clip_fraction`` in PPO (diditforlulz273)
- Fixed a bug where cuda support was wrongly checked when passing the GPU index, e.g., ``device="cuda:0"`` (liorcohen5)
- Fixed a bug where the random seed was not properly set on CUDA when passing the GPU index


Others:

- Improve typing coverage of the ``VecEnv``
- Fix type annotation of ``make_vec_env`` (ManifoldFR)
- Removed ``AlreadySteppingError`` and ``NotSteppingError`` that were not used
- Fixed typos in SAC and TD3
- Reorganized functions for clarity in ``BaseClass`` (save/load functions close to each other, private
functions at top)
- Clarified docstrings on what is saved and loaded to/from files
- Simplified ``save_to_zip_file`` function by removing duplicate code
- Store library version along with the saved models
- DQN loss is now logged

Documentation:

- Added ``StopTrainingOnMaxEpisodes`` details and example (xicocaio)
- Updated custom policy section (added custom feature extractor example)
- Re-enable ``sphinx_autodoc_typehints``
- Updated doc style for type hints and remove duplicated type hints

0.8.0

Breaking Changes:

- ``AtariWrapper`` and other Atari wrappers were updated to match SB2 ones
- ``save_replay_buffer`` now receives as argument the file path instead of the folder path (tirafesi)
- Refactored the ``Critic`` class for ``TD3`` and ``SAC``; it is now called ``ContinuousCritic``
and has an additional parameter ``n_critics``
- ``SAC`` and ``TD3`` now accept an arbitrary number of critics (e.g. ``policy_kwargs=dict(n_critics=3)``)
instead of only 2 previously
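
A minimal sketch of the new ``n_critics`` option, assuming SAC on the Gym ``Pendulum-v0`` environment:

```python
from stable_baselines3 import SAC

# Request three critics instead of the default two.
model = SAC("MlpPolicy", "Pendulum-v0", policy_kwargs=dict(n_critics=3))
```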

New Features:

- Added the ``DQN`` algorithm (Artemis-Skade); see the sketch after this list
- Buffer dtype is now set according to action and observation spaces for ``ReplayBuffer``
- Added warning when allocation of a buffer may exceed the available memory of the system
when ``psutil`` is available
- Saving models now automatically creates the necessary folders and raises appropriate warnings (PartiallyTyped)
- Refactored opening paths for saving and loading to use strings, pathlib or io.BufferedIOBase (PartiallyTyped)
- Added ``DDPG`` algorithm as a special case of ``TD3``.
- Introduced ``BaseModel`` abstract parent for ``BasePolicy``, which critics inherit from.
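
A minimal sketch of the new ``DQN`` algorithm, assuming the built-in ``CartPole-v1`` environment:

```python
from stable_baselines3 import DQN

model = DQN("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=1_000)
```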

Bug Fixes:

- Fixed a bug in the ``close()`` method of ``SubprocVecEnv``, causing wrappers further down in the wrapper stack to not be closed. (NeoExtended)
- Fixed the target used for updating Q-values in SAC: the entropy term was not conditioned on terminal states
- Use ``cloudpickle.load`` instead of ``pickle.load`` in ``CloudpickleWrapper``. (shwang)
- Fixed a bug with orthogonal initialization when `bias=False` in custom policy (rk37)
- Fixed approximate entropy calculation in PPO and A2C. (andyshih12)
- Fixed DQN target network sharing feature extractor with the main network.
- Fixed storing correct ``dones`` in on-policy algorithm rollout collection. (andyshih12)
- Fixed number of filters in final convolutional layer in NatureCNN to match original implementation.


Others:

- Refactored off-policy algorithm to share the same ``.learn()`` method
- Split the ``collect_rollout()`` method for off-policy algorithms
- Added ``_on_step()`` for off-policy base class
- Optimized replay buffer size by removing the need of ``next_observations`` numpy array
- Optimized polyak updates (1.5-1.95x speedup) through in-place operations (PartiallyTyped)
- Switched to ``black`` code style and added ``make format``, ``make check-codestyle`` and ``commit-checks``
- Ignored errors from newer pytype version
- Added a check when using ``gSDE``
- Removed codacy dependency from Dockerfile
- Added ``common.sb2_compat.RMSpropTFLike`` optimizer, which corresponds more closely to the TensorFlow implementation of RMSprop.
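
A minimal sketch of the SB2-compatible optimizer, assuming A2C on the built-in ``CartPole-v1`` environment (the ``eps`` value is illustrative):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike

model = A2C(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5)),
)
```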

Documentation:

- Updated notebook links
- Fixed a typo in the section of Enjoy a Trained Agent, in RL Baselines3 Zoo README. (blurLake)
- Added Unity reacher to the projects page (koulakis)
- Added PyBullet colab notebook
- Fixed typo in PPO example code (joeljosephjin)
- Fixed typo in custom policy doc (RaphaelWag)

0.7.0

Breaking Changes:

- ``render()`` method of ``VecEnvs`` now only accepts one argument: ``mode``
- Created a new file, ``common/torch_layers.py``, similar to the SB refactoring

- Contains all PyTorch network layer definitions and feature extractors: ``MlpExtractor``, ``create_mlp``, ``NatureCNN``

- Renamed ``BaseRLModel`` to ``BaseAlgorithm`` (along with its off-policy and on-policy variants)
- Moved on-policy and off-policy base algorithms to ``common/on_policy_algorithm.py`` and ``common/off_policy_algorithm.py``, respectively.
- Moved ``PPOPolicy`` to ``ActorCriticPolicy`` in common/policies.py
- Moved ``PPO`` (algorithm class) into ``OnPolicyAlgorithm`` (``common/on_policy_algorithm.py``), to be shared with A2C
- Moved the following functions out of ``BaseAlgorithm``:

- ``_load_from_file`` to ``load_from_zip_file`` (save_util.py)
- ``_save_to_file_zip`` to ``save_to_zip_file`` (save_util.py)
- ``safe_mean`` to ``safe_mean`` (utils.py)
- ``check_env`` to ``check_for_correct_spaces`` (utils.py; renamed to avoid confusion with the environment checker tools)

- Moved static function ``_is_vectorized_observation`` from common/policies.py to common/utils.py under name ``is_vectorized_observation``.
- Removed the ``{save,load}_running_average`` functions of ``VecNormalize`` in favor of ``load/save``; see the sketch after this list.
- Removed ``use_gae`` parameter from ``RolloutBuffer.compute_returns_and_advantage``.
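
A minimal sketch of the consolidated ``save``/``load`` interface of ``VecNormalize``, assuming a vectorized ``CartPole-v1`` environment (the file path is illustrative):

```python
import gym

from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

venv = VecNormalize(DummyVecEnv([lambda: gym.make("CartPole-v1")]))
venv.save("vec_normalize.pkl")
venv = VecNormalize.load("vec_normalize.pkl", DummyVecEnv([lambda: gym.make("CartPole-v1")]))
```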

Bug Fixes:

- Fixed ``render()`` method for ``VecEnvs``
- Fixed ``seed()`` method for ``SubprocVecEnv``
- Fixed loading on GPU for testing when using gSDE and ``deterministic=False``
- Fixed ``register_policy`` to allow re-registering same policy for same sub-class (i.e. assign same value to same key).
- Fixed a bug where the gradient was passed when using ``gSDE`` with ``PPO``/``A2C``, this does not affect ``SAC``

Others:

- Re-enable unsafe ``fork`` start method in the tests (was causing a deadlock with tensorflow)
- Added a test for seeding ``SubprocVecEnv`` and rendering
- Fixed reference in NatureCNN (pointed to older version with different network architecture)
- Fixed comments saying "CxWxH" instead of "CxHxW" (same style as in torch docs / commonly used)
- Added further comments on registering/getting policies ("MlpPolicy", "CnnPolicy").
- Renamed ``progress`` (value from 1 in start of training to 0 in end) to ``progress_remaining``.
- Added ``policies.py`` files for A2C/PPO, which define MlpPolicy/CnnPolicy (renamed ActorCriticPolicies).
- Added some missing tests for ``VecNormalize``, ``VecCheckNan`` and ``PPO``.

Documentation:

- Added a paragraph on "MlpPolicy"/"CnnPolicy" and policy naming scheme under "Developer Guide"
- Fixed second-level listing in changelog

0.6.0

Breaking Changes:

- Remove State-Dependent Exploration (SDE) support for ``TD3``
- Methods were renamed in the logger:

  - ``logkv`` -> ``record``, ``writekvs`` -> ``write``, ``writeseq`` -> ``write_sequence``
  - ``logkvs`` -> ``record_dict``, ``dumpkvs`` -> ``dump``
  - ``getkvs`` -> ``get_log_dict``, ``logkv_mean`` -> ``record_mean``
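
A minimal sketch of the renamed helpers, assuming the module-level ``common.logger`` interface of this release (the key and value are illustrative):

```python
from stable_baselines3.common import logger

logger.record("custom/my_metric", 42)  # formerly logger.logkv(...)
logger.dump(step=0)                    # formerly logger.dumpkvs(...)
```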


New Features:

- Added env checker (Sync with Stable Baselines)
- Added ``VecCheckNan`` and ``VecVideoRecorder`` (Sync with Stable Baselines)
- Added determinism tests
- Added ``cmd_util`` and ``atari_wrappers``
- Added support for ``MultiDiscrete`` and ``MultiBinary`` observation spaces (rolandgvc)
- Added ``MultiCategorical`` and ``Bernoulli`` distributions for PPO/A2C (rolandgvc)
- Added support for logging to tensorboard (rolandgvc)
- Added ``VectorizedActionNoise`` for continuous vectorized environments (PartiallyTyped)
- Log evaluation in the ``EvalCallback`` using the logger
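
A minimal sketch of evaluation results being written through the logger via ``EvalCallback``, assuming PPO on the built-in ``CartPole-v1`` environment (frequencies are illustrative):

```python
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

eval_callback = EvalCallback(gym.make("CartPole-v1"), eval_freq=1_000, n_eval_episodes=5)
model = PPO("MlpPolicy", "CartPole-v1")
model.learn(total_timesteps=5_000, callback=eval_callback)
```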

Bug Fixes:

- Fixed a bug that prevented a model trained on CPU from being loaded on GPU
- Fixed the version number that had a newline included
- Fixed a segmentation fault in the Docker image caused by ``FakeImageEnv`` by reducing the screen size
- Fixed ``sde_sample_freq`` not being taken into account for SAC
- Pass the logger module to ``BaseCallback``; otherwise, callbacks cannot write to the logger used by the algorithms

Others:

- Renamed to Stable-Baselines3
- Added Dockerfile
- Sync ``VecEnvs`` with Stable-Baselines
- Update requirement: ``gym>=0.17``
- Added ``.readthedoc.yml`` file
- Added ``flake8`` and ``make lint`` command
- Added Github workflow
- Added warning when passing both ``train_freq`` and ``n_episodes_rollout`` to Off-Policy Algorithms

Documentation:

- Added most documentation (adapted from Stable-Baselines)
- Added link to CONTRIBUTING.md in the README (kinalmehta)
- Added the gSDE project and updated docstrings accordingly
- Fix ``TD3`` example code block

0.1.6

- Fixed ``tf.Session().__enter__()`` being used, rather than ``sess = tf.Session()`` and passing the session to the objects
- Fixed uneven scoping of TensorFlow Sessions throughout the code
- Fixed rolling vecwrapper to handle observations that are not only grayscale images
- Fixed deepq saving the environment when trying to save itself
- Fixed ``ValueError: Cannot take the length of Shape with unknown rank.`` in ACKTR when running the ``run_atari.py`` script.
- Fixed graph conflicts when calling baselines sequentially
- Fixed mean on empty array warning with deepq
- Fixed the KFAC eigendecomposition not being cast to float64 when the parameter ``use_float64`` is set to ``True``
- Fixed the ``Dataset`` data loader not correctly resetting the id position if shuffling is disabled
- Fixed ``EOFError`` when reading from the connection in the worker in ``subproc_vec_env.py``
- Fixed ``behavior_clone`` weight loading and saving for GAIL
- Avoid taking the square root of a negative number in ``trpo_mpi.py``
- Removed some duplicated code (a2cpolicy, trpo_mpi)
- Removed the unused, undocumented and crashing function ``reset_task`` in ``subproc_vec_env.py``
- Reformatted code to PEP8 style
- Documented the entire codebase
- Added atari tests
- Added logger tests

Missing: tests for acktr continuous (+ HER, gail but they rely on mujoco...)
