- Removed agent arguments `execution`, `buffer_observe`, `seed`
- Renamed agent arguments `baseline_policy`/`baseline_network`/`critic_network` to `baseline`/`critic`
- Renamed agent `reward_estimation` arguments `estimate_horizon` to `predict_horizon_values`, `estimate_actions` to `predict_action_values`, `estimate_terminal` to `predict_terminal_values`
- Renamed agent argument `preprocessing` to `state_preprocessing`
- Default agent `state_preprocessing` is now `linear_normalization`
- Moved agent arguments for reward/return/advantage processing from `preprocessing` to `reward_preprocessing` and `reward_estimation[return_/advantage_processing]`
- New agent argument `config` with values `buffer_observe`, `enable_int_action_masking`, `seed` (see the first sketch after this list)
- Renamed PPO/TRPO/DPG arguments `critic_network`/`critic_optimizer` to `baseline`/`baseline_optimizer` (PPO/TRPO sketch below)
- Renamed PPO argument `optimization_steps` to `multi_step`
- New TRPO argument `subsampling_fraction`
- Changed agent argument `use_beta_distribution` default to false
- Added double DQN agent (`double_dqn`); see sketch below
- Removed `Agent.act()` argument `evaluation`
- Removed the `query` argument from agent functions (functionality removed)
- Agent saver functionality changed to TensorFlow Checkpoint/SavedModel instead of Saver/Protobuf: the `save`/`load` functions and the `saver` argument changed (saver sketch below)
- When specifying `saver`, the default behavior is no longer to load the agent, unless the agent is created via `Agent.load`
- Agent summarizer functionality changed: `summarizer` argument changed, some summary labels and other options removed
- Renamed RNN layers `internal_{rnn/lstm/gru}` to `rnn/lstm/gru` and `rnn/lstm/gru` to `input_{rnn/lstm/gru}`
- Renamed `auto` network argument `internal_rnn` to `rnn`
- Renamed `(internal_)rnn/lstm/gru` layer argument `length` to `horizon` (network sketch below)
- Renamed `update_modifier_wrapper` to `optimizer_wrapper`
- Renamed `optimizing_step` to `linesearch_step`, and `UpdateModifierWrapper` argument `optimizing_iterations` to `linesearch_iterations`
- Optimizer `subsampling_step` now accepts both absolute (int) and relative (float) fractions (optimizer sketch below)
- Objective `policy_gradient` argument `ratio_based` renamed to `importance_sampling`
- Added objectives `state_value` and `action_value`
- Added `Gaussian` distribution arguments `global_stddev` and `bounded_transform` for improved bounded-action-space handling (distribution sketch below)
- Changed default memory `device` argument to `CPU:0`
- Renamed reward summaries
- `Agent.create()` accepts an act-function as `agent` argument for recording (recorder sketch below)
- Singleton states and actions are now consistently handled as singletons
- Major changes to policy handling and defaults, in particular `parametrized_distributions`; new default policies `parametrized_state/action_value`
- Combined the `long` and `int` types
- Environments are now always wrapped in the `EnvironmentWrapper` class
- Changed `tune.py` arguments
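
The sketches below illustrate several of these changes. They are hedged migration examples with placeholder hyperparameters, not authoritative API documentation. First, the renamed agent arguments and the new `config` argument, assuming a Gym CartPole environment:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

# Old (0.5-style), for comparison:
#   Agent.create(agent='ppo', environment=environment, batch_size=10,
#                preprocessing='linear_normalization', baseline_network='auto',
#                buffer_observe=True, seed=42)

# New (0.6-style):
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    state_preprocessing='linear_normalization',  # renamed from `preprocessing`
    baseline='auto',                             # renamed from `baseline_network`
    config=dict(buffer_observe=True, seed=42)    # moved into the new `config` argument
)
```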
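
The renamed PPO arguments and the new TRPO `subsampling_fraction`, reusing the `environment` from the first sketch:

```python
from tensorforce import Agent

# PPO: `optimization_steps` is now `multi_step`, and `critic_network`/
# `critic_optimizer` are now `baseline`/`baseline_optimizer`.
ppo = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    multi_step=10,                                            # was `optimization_steps`
    baseline='auto',                                          # was `critic_network`
    baseline_optimizer=dict(type='adam', learning_rate=1e-3)  # was `critic_optimizer`
)

# TRPO: new `subsampling_fraction` argument.
trpo = Agent.create(
    agent='trpo', environment=environment, batch_size=10,
    subsampling_fraction=0.33
)
```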
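
The new double DQN agent; memory and batch size are placeholder values:

```python
from tensorforce import Agent

# Double DQN is selected like any other agent type.
agent = Agent.create(
    agent='double_dqn', environment=environment, memory=10000, batch_size=32
)
```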
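
The checkpoint-based saver and explicit loading; the directory name and save frequency are assumptions:

```python
from tensorforce import Agent

# Periodic TensorFlow checkpoints during training (instead of tf.train.Saver).
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver=dict(directory='checkpoints', frequency=100)
)
# ... training ...
agent.save(directory='checkpoints')
agent.close()

# Specifying `saver` alone no longer restores a previous agent;
# restore explicitly instead:
agent = Agent.load(directory='checkpoints')
```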
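
The renamed RNN layers and `horizon` argument in a layer-list network specification:

```python
from tensorforce import Agent

# `internal_lstm` is now `lstm` (the old `lstm` is now `input_lstm`),
# and its `length` argument is now `horizon`.
network = [
    dict(type='dense', size=64),
    dict(type='lstm', size=64, horizon=10)  # was: internal_lstm with length=10
]
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10, network=network
)

# For the auto network, `internal_rnn` is correspondingly now `rnn`:
# network = dict(type='auto', rnn=10)
```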
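
The `subsampling_step` optimizer with a relative fraction, sketched inside a `tensorforce` agent whose remaining arguments are placeholders (the `optimizer_wrapper` and `linesearch_*` renames apply to the same wrapper machinery):

```python
from tensorforce import Agent

# A float fraction subsamples 25% of the batch; an int such as fraction=64
# would instead subsample an absolute number of timesteps.
optimizer = dict(
    type='subsampling_step',
    optimizer=dict(type='adam', learning_rate=1e-3),
    fraction=0.25
)

agent = Agent.create(
    agent='tensorforce', environment=environment,
    memory=10000, update=dict(unit='timesteps', batch_size=64),
    optimizer=optimizer, policy=dict(network='auto'),
    objective='policy_gradient', reward_estimation=dict(horizon=20)
)
```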
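
The new `Gaussian` distribution arguments, sketched for a bounded continuous-action environment; the `'tanh'` value for `bounded_transform` and the Gym level name are assumptions:

```python
from tensorforce import Agent, Environment

# Pendulum has a bounded float action, where the new arguments matter.
environment = Environment.create(environment='gym', level='Pendulum-v1')

policy = dict(
    network='auto',
    distributions=dict(
        float=dict(type='gaussian', global_stddev=True, bounded_transform='tanh')
    )
)
agent = Agent.create(
    agent='tensorforce', environment=environment,
    memory=10000, update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=1e-3), policy=policy,
    objective='policy_gradient', reward_estimation=dict(horizon=20)
)
```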
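
Passing an act-function to `Agent.create()` for recording; the random policy and trace directory are assumptions:

```python
import numpy as np
from tensorforce import Agent

# A plain act-function (here: a random CartPole-style policy) ...
def fn_act(states):
    return int(np.random.randint(2))

# ... wrapped as an agent so that its experience can be recorded.
agent = Agent.create(agent=fn_act, recorder=dict(directory='traces'))
```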