Tensorforce

Latest version: v0.6.5

0.6.5

Agents:
- Renamed agent argument `reward_preprocessing` to `reward_processing`; in the case of the Tensorforce agent, it moved to `reward_estimation[reward_processing]` (see the sketch below)
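
A minimal sketch of where the renamed argument now lives, assuming a standard Gym CartPole setup; the clipping specification and the surrounding PPO/Tensorforce arguments are illustrative, not prescriptive:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

# Non-Tensorforce agents: reward_processing is now the top-level argument name.
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    reward_processing=dict(type='clipping', lower=-1.0, upper=1.0),
)

# Tensorforce agent: the same specification now goes inside reward_estimation, e.g.
#   reward_estimation=dict(horizon=20, reward_processing=dict(type='clipping', lower=-1.0, upper=1.0))
```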

Distributions:
- New `categorical` distribution argument `skip_linear` to not add the implicit linear logits layer

Environments:
- Support for multi-actor parallel environments via new function `Environment.num_actors()`
- `Runner` uses multi-actor parallelism by default if environment is multi-actor
- New optional `Environment` function `episode_return()` which returns the true return of the last episode, for cases where the cumulative sum of environment rewards is not a meaningful metric for runner display

Examples:
- New `vectorized_environment.py` and `multiactor_environment.py` scripts to illustrate how to set up a vectorized/multi-actor environment.

0.6.4

Agents:
- Agent argument `update_frequency` / `update[frequency]` now supports float values > 0.0, which specify the update frequency relative to the batch size (see the sketch after this list)
- Changed default value for argument `update_frequency` from `1.0` to `0.25` for DQN, DoubleDQN, DuelingDQN agents
- New arguments `return_processing` and `advantage_processing` (where applicable) for all agent sub-types
- New function `Agent.get_specification()` which returns the agent specification as dictionary
- New function `Agent.get_architecture()` which returns a string representation of the network layer architecture
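
A hedged sketch combining the fractional update frequency with the two new introspection functions; the DQN memory and batch values are illustrative:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

agent = Agent.create(
    agent='dqn', environment=environment,
    memory=10000, batch_size=32,
    update_frequency=0.25,  # relative to the batch size: 0.25 * 32 = an update every 8 timesteps here
)

print(agent.get_specification())  # agent specification as a dictionary
print(agent.get_architecture())   # string representation of the network layer architecture
```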

Modules:
- Improved and simplified module specification, for instance: `network=my_module` instead of `network=my_module.TestNetwork`, or `environment=envs.custom_env` instead of `environment=envs.custom_env.CustomEnvironment` (the module file needs to be in the same directory or a sub-directory); see the sketch below
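
A hedged sketch of the shorter module specification, assuming a hypothetical file `my_module.py` defining the network class sits next to the running script:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

# Before 0.6.4 the class had to be named explicitly, e.g. network='my_module.TestNetwork';
# now the module alone suffices when it lives in the same directory or a sub-directory.
agent = Agent.create(agent='ppo', environment=environment, batch_size=10,
                     network='my_module')
```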

Networks:
- New argument `single_output=True` for some policy types which, if `False`, allows the specification of additional network outputs for some/all actions via registered tensors
- `KerasNetwork` argument `model` now supports arbitrary functions as long as they return a `tf.keras.Model` (see the sketch below)
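
A hedged sketch of passing a plain function that builds a `tf.keras.Model` as the `KerasNetwork` model; the layer sizes are arbitrary:

```python
import tensorflow as tf
from tensorforce import Agent, Environment


def build_model():
    # Any callable is accepted as of 0.6.4, as long as it returns a tf.keras.Model.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
    ])


environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    network=dict(type='keras', model=build_model),
)
```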

Layers:
- New layer type `SelfAttention` (specification key: `self_attention`)

Parameters:
- Support tracking of non-constant parameter values

Runner:
- Renamed attribute `episode_rewards` to `episode_returns`, and TQDM status `reward` to `return`
- Extended argument `agent` to support `Agent.load()` keyword arguments, so that an existing agent can be loaded instead of creating a new one (see the sketch below).
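
A hedged sketch of handing `Agent.load()` keyword arguments to the `Runner`; the directory and format values are placeholders:

```python
from tensorforce import Runner

runner = Runner(
    # Interpreted as Agent.load(directory='checkpoints', format='checkpoint')
    # instead of Agent.create(...), so an existing agent is restored.
    agent=dict(directory='checkpoints', format='checkpoint'),
    environment=dict(environment='gym', level='CartPole-v1'),
)
runner.run(num_episodes=100)
runner.close()
```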

Examples:
- Added `action_masking.py` example script to illustrate an environment implementation with built-in action masking.

Bugfixes:
- Customized device placement was not applied to most tensors

0.6.3

Agents:
- New agent argument `tracking` and corresponding function `tracked_tensors()` to track and retrieve the current value of predefined tensors, similar to `summarizer` for TensorBoard summaries (see the sketch after this list)
- New experimental values `trace_decay` and `gae_decay` for the Tensorforce agent argument `reward_estimation`, soon to be available for other agent types as well
- New options `"early"` and `"late"` for value `estimate_advantage` of Tensorforce agent argument `reward_estimation`
- Changed default value for `Agent.act()` argument `deterministic` from `False` to `True`
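
A hedged sketch of the tracking mechanism; the value `'all'` follows the `summarizer` convention referenced above:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    tracking='all',  # track all predefined tensors
)

states = environment.reset()
actions = agent.act(states=states)  # note: deterministic=True is now the default
states, terminal, reward = environment.execute(actions=actions)
agent.observe(terminal=terminal, reward=reward)

print(agent.tracked_tensors())  # dict of current values of the tracked tensors
```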

Networks:
- New network type `KerasNetwork` (specification key: `keras`) as wrapper for networks specified as Keras model
- Passing a Keras model class/object as policy/network argument is automatically interpreted as `KerasNetwork`

Distributions:
- Changed `Gaussian` distribution argument `global_stddev=False` to `stddev_mode='predicted'`
- New `Categorical` distribution argument `temperature_mode=None`

Layers:
- New option for `Function` layer argument `function` to pass a string function expression with argument "x", e.g. "(x+1.0)/2.0" (see the sketch below)
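
A small sketch of the string expression form inside a layer-stack network specification; the surrounding dense layers are illustrative:

```python
# Network specification usable as the `network`/`policy` argument of Agent.create().
network = [
    dict(type='dense', size=64, activation='tanh'),
    dict(type='function', function='(x+1.0)/2.0'),  # string expression over the layer input "x"
    dict(type='dense', size=64, activation='tanh'),
]
```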

Summarizer:
- New summary `episode-length` recorded as part of summary label "reward"

Environments:
- Support for vectorized parallel environments via new function `Environment.is_vectorizable()` and new argument `num_parallel` for `Environment.reset()` (see the sketch after this list)
- See `tensorforce/environments/cartpole.py` for a vectorizable environment example
- `Runner` uses vectorized parallelism by default if `num_parallel > 1`, `remote=None` and environment supports vectorization
- See `examples/act_observe_vectorized.py` for more details on act-observe interaction
- New extended and vectorizable custom CartPole environment via key `custom_cartpole` (work in progress)
- New environment argument `reward_shaping` which provides a simple way to modify/shape the rewards of an environment; it can be specified either as a callable or as a string function expression
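
A hedged sketch of querying vectorization support and resetting several environment copies at once; the number of copies is arbitrary, and the full act-observe loop over the parallel copies is shown in `examples/act_observe_vectorized.py`:

```python
from tensorforce import Environment

# custom_cartpole is the vectorizable CartPole variant mentioned above.
environment = Environment.create(environment='custom_cartpole', max_episode_timesteps=500)

if environment.is_vectorizable():
    states = environment.reset(num_parallel=8)  # batched initial states, one per copy
else:
    states = environment.reset()
```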

run.py script:
- New option for command line arguments `--checkpoints` and `--summaries` to specify a comma-separated checkpoint/summary filename in addition to the directory
- Added episode lengths to logging plot besides episode returns

Bugfixes:
- Temporal horizon handling of RNN layers
- Critical bugfix for late horizon value prediction (including DQN variants and DPG agent) in combination with baseline RNN
- GPU problems with scatter operations

0.6.2

Bugfixes:
- Critical bugfix for DQN variants and DPG agent

0.6.1

Agents:
- Removed default value `"adam"` for Tensorforce agent argument `optimizer` (since default optimizer argument `learning_rate` removed, see below)
- Removed option `"minimum"` for Tensorforce agent argument `memory`, use `None` instead
- Changed default value for `dqn`/`double_dqn`/`dueling_dqn` agent argument `huber_loss` from `0.0` to `None`

Layers:
- Removed default value `0.999` for `exponential_normalization` layer argument `decay`
- Added new layer `batch_normalization` (generally should only be used for the agent arguments `reward_processing[return_processing]` and `reward_processing[advantage_processing]`)
- Added `exponential/instance_normalization` layer argument `only_mean` with default `False`
- Added `exponential/instance_normalization` layer argument `min_variance` with default `1e-4`

Optimizers:
- Removed default value `1e-3` for optimizer argument `learning_rate`
- Changed default value for optimizer argument `gradient_norm_clipping` from `1.0` to `None` (no gradient clipping)
- Added new optimizer `doublecheck_step` and corresponding argument `doublecheck_update` for optimizer wrapper
- Removed `linesearch_step` optimizer argument `accept_ratio`
- Removed `natural_gradient` optimizer argument `return_improvement_estimate`

Saver:
- Added option to specify agent argument `saver` as string, which is interpreted as `saver[directory]` with otherwise default values
- Added default value for agent argument `saver[frequency]` as `10` (save model every 10 updates by default)
- Changed default value of agent argument `saver[max_checkpoints]` from `5` to `10`

Summarizer:
- Added option to specify agent argument `summarizer` as a string, which is interpreted as `summarizer[directory]` with otherwise default values (see the sketch after this list)
- Renamed option of agent argument `summarizer` from `summarizer[labels]` to `summarizer[summaries]` (the term "label" stems from an earlier version and had become outdated and confusing)
- Changed interpretation of agent argument `summarizer[summaries] = "all"` to include only numerical summaries, i.e. all summaries except "graph"
- Changed default value of agent argument `summarizer[summaries]` from `["graph"]` to `"all"`
- Changed default value of agent argument `summarizer[max_summaries]` from `5` to `7` (number of different colors in TensorBoard)
- Added option `summarizer[filename]` to agent argument `summarizer`
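
A hedged sketch of the shorthand string form and its interpretation under the new defaults:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

agent = Agent.create(agent='ppo', environment=environment, batch_size=10,
                     summarizer='summaries')
# ...is interpreted as roughly:
#   summarizer=dict(directory='summaries', summaries='all', max_summaries=7)
```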

Recorder:
- Added option to specify agent argument `recorder` as string, which is interpreted as `recorder[directory]` with otherwise default values

run.py script:
- Added `--checkpoints`/`--summaries`/`--recordings` command line arguments to enable specifying the saver/summarizer/recorder agent arguments separately from the core agent configuration

Examples:
- Added `save_load_agent.py` example script to illustrate regular agent saving and loading

Bugfixes:
- Fixed problem with optimizer argument `gradient_norm_clipping` not being applied correctly
- Fixed problem with `exponential_normalization` layer not updating moving mean and variance correctly
- Fixed problem with `recent` memory for timestep-based updates sometimes sampling invalid memory indices

0.6.0

- Removed agent arguments `execution`, `buffer_observe`, `seed`
- Renamed agent arguments `baseline_policy`/`baseline_network`/`critic_network` to `baseline`/`critic`
- Renamed agent `reward_estimation` arguments `estimate_horizon` to `predict_horizon_values`, `estimate_actions` to `predict_action_values`, `estimate_terminal` to `predict_terminal_values`
- Renamed agent argument `preprocessing` to `state_preprocessing`
- New default agent state preprocessing: `linear_normalization`
- Moved agent arguments for reward/return/advantage processing from `preprocessing` to `reward_preprocessing` and `reward_estimation[return_/advantage_processing]`
- New agent argument `config` with values `buffer_observe`, `enable_int_action_masking`, `seed`
- Renamed PPO/TRPO/DPG arguments `critic_network`/`critic_optimizer` to `baseline`/`baseline_optimizer` (see the sketch after this list)
- Renamed PPO argument `optimization_steps` to `multi_step`
- New TRPO argument `subsampling_fraction`
- Changed agent argument `use_beta_distribution` default to `False`
- Added double DQN agent (`double_dqn`)
- Removed `Agent.act()` argument `evaluation`
- Removed agent function arguments `query` (functionality removed)
- Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): `save`/`load` functions and `saver` argument changed
- Default behavior when specifying `saver` is not to load agent, unless agent is created via `Agent.load`
- Agent summarizer functionality changed: `summarizer` argument changed, some summary labels and other options removed
- Renamed RNN layers `internal_{rnn/lstm/gru}` to `rnn/lstm/gru` and `rnn/lstm/gru` to `input_{rnn/lstm/gru}`
- Renamed `auto` network argument `internal_rnn` to `rnn`
- Renamed `(internal_)rnn/lstm/gru` layer argument `length` to `horizon`
- Renamed `update_modifier_wrapper` to `optimizer_wrapper`
- Renamed `optimizing_step` to `linesearch_step`, and `UpdateModifierWrapper` argument `optimizing_iterations` to `linesearch_iterations`
- Optimizer `subsampling_step` accepts both absolute (int) and relative (float) fractions
- Objective `policy_gradient` argument `ratio_based` renamed to `importance_sampling`
- Added objectives `state_value` and `action_value`
- Added `Gaussian` distribution arguments `global_stddev` and `bounded_transform` (for improved bounded action space handling)
- Changed default memory `device` argument to `CPU:0`
- Renamed rewards summaries
- `Agent.create()` accepts act-function as `agent` argument for recording
- Singleton states and actions are now consistently handled as singletons
- Major change to policy handling and defaults, in particular `parametrized_distributions`, new default policies `parametrized_state/action_value`
- Combined `long` and `int` type
- Always wrap environment in `EnvironmentWrapper` class
- Changed `tune.py` arguments
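
A hedged sketch of a PPO configuration using the renamed 0.6.0 arguments listed above; the concrete values and the `auto` baseline network are illustrative only:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1')

agent = Agent.create(
    agent='ppo',
    environment=environment,
    batch_size=10,
    multi_step=10,                               # formerly optimization_steps
    state_preprocessing='linear_normalization',  # formerly preprocessing (now also the default)
    baseline=dict(type='auto', size=32, depth=1),              # formerly critic_network
    baseline_optimizer=dict(type='adam', learning_rate=1e-3),  # formerly critic_optimizer
    config=dict(seed=42),                        # seed moved into the new config argument
)
```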
