## Environments
To learn more about new and improved environments, see our [Example Environments page](../master/docs/Learning-Environment-Examples.md).
### New
* **Walker** - A physics-based humanoid agent. The agent must move its body in the goal direction as quickly as possible without falling.
* **Pyramids** - Sparse reward environment. The agent must press a button, then topple a pyramid of blocks to get the golden brick at the top. Used to demonstrate Curiosity.
### Improved
* Revamped the Crawler environment
* Added visual-observation-based scenes for:
* BananaCollector
* PushBlock
* Hallway
* Pyramids
* Added Imitation Learning-based scenes for:
* Tennis
* Bouncer
* PushBlock
* Hallway
* Pyramids
## New Features
* **[Unity]** In-Editor Training - It is now possible to train agents directly in the Editor without building the scene. For more information, see [here](../master/docs/Basic-Guide.md#training-the-brain-with-reinforcement-learning) and the command-line sketch after this list.
* **[Training]** Curiosity-Driven Exploration - Added a curiosity-based intrinsic reward signal when using PPO. Enable it by setting the `use_curiosity` brain training hyperparameter to `true` (see the configuration sketch after this list).
* **[Unity]** Support for providing player input using axes within the Player Brain.
* **[Unity]** TensorFlowSharp Plugin has been upgraded to version 1.7.1.
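A minimal command-line sketch of the in-Editor training workflow, assuming the v0.4 `learn.py` entry point in the repository's `python/` directory (the `--run-id` value is illustrative):

```sh
# Run from the repository's python/ directory.
python learn.py --train --run-id=first-run
# When the trainer reports it is ready, press Play in the Unity Editor
# to start training without building the scene.
```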
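And a minimal trainer-configuration sketch for enabling curiosity, assuming the YAML format of `trainer_config.yaml`; the brain name and values below are illustrative (`curiosity_strength` and `curiosity_enc_size` follow the v0.4 PPO training docs):

```yaml
PyramidsBrain:                # illustrative brain name; match the brain in your scene
    use_curiosity: true       # enable the curiosity-based intrinsic reward signal
    curiosity_strength: 0.01  # weight of the intrinsic reward (illustrative value)
    curiosity_enc_size: 256   # encoding size of the curiosity module (illustrative value)
```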
## Changes
* Main ML-Agents code is now within the `MLAgents` namespace. Ensure that the `MLAgents` namespace is added to project scripts that need it, such as Agent subclasses (see the sketch after this list).
* ASCII art added to `learn.py` script.
* Communication now uses gRPC and Protobuf. JSON libraries removed.
* TensorBoard now reports the mean absolute loss rather than the total loss over the update loop.
* The PPO algorithm now uses a wider Gaussian output for continuous control models, improving performance.
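As a minimal sketch of the namespace change, an Agent subclass now needs the `MLAgents` using directive (the class below is hypothetical and uses the v0.4 `AgentAction` signature):

```csharp
using MLAgents;  // required now that ML-Agents code lives in the MLAgents namespace

// Hypothetical Agent subclass illustrating the updated using directive.
public class RollerAgent : Agent
{
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Illustrative only: add a small reward each step.
        AddReward(0.01f);
    }
}
```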
## Documentation
* Added Quick Start and FAQ sections to the documentation.
* Added documentation explaining how to use ML-Agents on Microsoft Azure.
* Added benchmark reward thresholds for example environments.
## Fixes & Performance Improvements
* Episode length is now properly reported in TensorBoard in the first episode.
* Behavioral Cloning now works with LSTM models.
## Known Issues
* Curiosity-driven exploration does not function with On-Demand Decision Making. Expect a fix in v0.4.0a.
## Acknowledgements
Thanks to everyone at Unity who contributed to v0.4, as well as: sterlingcrispin, ChrisRisner, akmadian, animaleja32, LeighS, and 5665tm.