Mlagents

Latest version: v1.0.0


0.3.1b

Fixes

* Behavioral cloning fix (use stored info rather than previous info)
* Value bootstrapping fixed for PPO

0.3.1a

Fixes

* Removed links to out-of-date Unity packages
* Fixed the CoreInternalBrain for discrete vector observations
* Retrained the Basic environment
* Fixed the normalization of images in the internal brain

0.3.0

Environments

To learn more about new and improved environments, see our [Example Environments page](../master/docs/Learning-Environment-Examples.md).

New

* **Soccer Twos** - Multi-agent environment in which competitive and cooperative behavior arises from the reward functions. Used to demonstrate multi-brain training.

* **Banana Collectors** - Multi-agent resource collection environment where competitive or cooperative behavior comes about dynamically based on available resources. Used to demonstrate Imitation Learning.

* **Hallway** - Single agent environment in which an agent must explore a room, remember the object within the room, and use that information to navigate to the correct goal. Used to demonstrate LSTM models.

* **Bouncer** - Single agent environment provided as an example of our new On-Demand Decision-Making feature. In this environment, an agent can apply force to itself in order to bounce around a platform, and attempt to collide with floating bananas.

Improved

* All environments have been visually refreshed with a consistent color palette and design language.
* Revamped GridWorld to only use visual observations and a 5x5 grid by default.
* Revamped Tennis to use continuous actions.
* Revamped Push Block to use local perception.
* Revamped Wall Jump to use local perception.
* Added Hard version of 3DBall which doesn’t contain velocity information in observations.

New Features

* **[Unity]** On-Demand Decision Making - It is now possible to have agents request decisions from their brains only when necessary, using `RequestDecision()` and `RequestAction()`. For more information, see [here](../master/docs/Learning-Environment-Design-Agents.md#on-demand-decision-making).
* **[Unity]** Added vector-observation stacking - The past n vector observations for each agent can now be stored and used as input to a Brain for decision making.
* **[Python]** Added Behavioral Cloning (Imitation Learning) algorithm - Train a neural network to imitate either player behavior or a hand-coded game bot using behavioral cloning. For more info, see [here](../master/docs/Training-Imitation-Learning.md).
* **[Python]** Support for training multiple brains simultaneously - Two or more different brains can now be trained simultaneously using the provided PPO algorithm.
* **[Python]** Added LSTM models - We now support training and embedding recurrent neural networks using the PPO algorithm. This allows for learning temporal dependencies between observations.
* **[Unity] [Python]** Added Docker Image for RL-training - We now provide a Docker image which allows users to train their brains in an isolated environment without the need to install Python, TensorFlow, and other dependencies. For more information, see [here](../master/docs/Using-Docker.md).
* **[Python]** Ability to provide a random seed to the training process and environment - Allows for reproducible experimentation (a minimal API sketch follows this list). For more information, see [here](../master/docs/Python-API.md#loading-a-unity-environment). (Note: Unity Physics is non-deterministic, so fully reproducible experiments are currently not possible when physics-based interactions are involved.)
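
As a rough illustration of the seeding and multi-brain changes above, the sketch below drives an environment directly from Python. It assumes the v0.3-era `unityagents` package, a `UnityEnvironment` constructor that accepts a `seed` argument, and a `BrainInfo.agents` field; the executable name is a placeholder, so check the Python API docs for the exact interface.

```python
from unityagents import UnityEnvironment  # v0.3-era package name (an assumption)

# Passing `seed` makes the run reproducible, up to the Unity Physics
# non-determinism noted above. "3DBall" is a placeholder executable name.
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)

# reset() returns one BrainInfo per brain, which is what lets two or more
# brains be stepped, and trained, within the same simulation.
brain_infos = env.reset(train_mode=True)
for brain_name in env.brain_names:
    print(brain_name, "controls", len(brain_infos[brain_name].agents), "agents")

env.close()
```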

Changes

* **[Unity]** Memory size has been removed as a user-facing brain parameter. It is now defined when creating models from `unitytrainers`.
* **[Unity] [Python]** The API as well as the general semantics used throughout ML-Agents has changed. See [here](../master/docs/Migrating-v0.3.md) for information on these changes, and how to easily adjust current projects to be compatible with these changes.
* **[Python]** Training hyperparameters are now defined in a `.yaml` file instead of via command-line arguments (an illustrative config sketch follows this list).
* **[Python]** Training now takes place via `learn.py`, which launches trainers for multiple brains.
* **[Python]** Python 2 is no longer supported.
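
As a rough illustration of the new configuration style, a hypothetical trainer config might look like the sketch below. The key names are drawn from the PPO hyperparameters documented for this release, but the brain name and values are placeholders and the exact schema may differ between versions.

```yaml
# Hypothetical trainer_config.yaml sketch. Key names follow the documented
# PPO hyperparameters; exact names and defaults may differ per version.
default:
    trainer: ppo
    batch_size: 1024
    buffer_size: 10240
    learning_rate: 3.0e-4
    gamma: 0.99
    max_steps: 5.0e4
    use_recurrent: false

BallBrain:            # per-brain overrides, keyed by Brain name (hypothetical)
    normalize: true
    time_horizon: 1000
```

Training is then launched through `learn.py` (the docs describe a `--train` flag for training mode) rather than by passing each hyperparameter on the command line.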

Documentation

Documentation has been significantly re-written to include many new sections, in addition to updated tutorials and guides. Check it out [here](../master/docs).

Fixes & Performance Improvements

* **[Unity]** Improved memory management - Reduced garbage collection memory usage by up to 5x when using External Brain.
* **[Unity]** `Time.captureFramerate` is now set by default to help sync Update and FixedUpdate frequencies.
* **[Unity]** It is now possible to instantiate and destroy GameObjects which are Agents.
* **[Unity]** Improved visual observation inference time by 3x.
* **[Unity]** Tooltips added to Unity Inspector for ML-Agents variables and functions.
* **[Unity] [Python]** Epsilon is now a built-in part of the PPO graph. It no longer needs to be specified separately under “Graph Placeholders” in Unity.
* **[Python]** Changed value bootstrapping in the PPO algorithm to properly calculate returns on episode time-out (see the sketch after this list).
* **[Python]** The neural network graph is now automatically saved as a `.bytes` file when training is interrupted.
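
To illustrate the value-bootstrapping change referenced above: when an episode ends because it hit its step limit rather than reaching a true terminal state, the return should be bootstrapped from the critic's value estimate of the last observed state instead of being truncated to zero. The following is a generic, self-contained sketch of that idea, not the trainer's actual implementation.

```python
import numpy as np

def discounted_returns(rewards, gamma, timed_out, last_value_estimate):
    """Compute discounted returns for one trajectory.

    If the episode ended on a time-out (max steps reached) rather than a
    true terminal state, bootstrap from the critic's value estimate of the
    last observed state instead of implicitly assuming a return of zero.
    """
    running = last_value_estimate if timed_out else 0.0
    returns = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A 3-step episode cut off by max_steps, with the critic estimating
# V(s_T) = 0.5 for the final state:
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.99, timed_out=True,
                         last_value_estimate=0.5))
```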

Acknowledgements

Thanks to everyone at Unity who contributed to v0.3, as well as:

asolano, LionH, MarcoMeter, srcnalt, wouterhardeman, 60days, floAr, Coac, Zamaroht, slightperturbation

0.2.1d

Fixes

* Fixes a bug where visual observations could not be used with PPO.

0.3.0preview

0.3.0b

Fixes

* Fixes internal brain for Banana Imitation.
* Fixes Discrete Control training for Imitation Learning.
* Fixes Visual Observations in internal brain with non-square inputs.

0.3.0a

Fixes

* Added the missing Ray Perception components to the agents in the BananaImitation scene.
