This release contains several API and behavior changes.
API Change
Buffer
1. Add `ReplayBufferManager`, `PrioritizedReplayBufferManager`, `VectorReplayBuffer`, `PrioritizedVectorReplayBuffer`, and `CachedReplayBuffer` (278, 280);
2. Change the `buffer.add` API from `buffer.add(obs, act, rew, done, obs_next, info, policy, ...)` to `buffer.add(batch, buffer_ids)` so that data can be added more efficiently (280);
3. Add a `set_batch` method to the buffer (278);
4. Add a `sample_index` method, same as `sample` but returning only the index instead of both the index and the batch data (278);
5. Add `prev` (one-step previous transition index), `next` (one-step next transition index), and `unfinished_index` (the last modified index whose `done == False`) (278);
6. Add an internal method `_alloc_by_keys_diff` in `Batch` to support newly appearing keys of any form (280);
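The episode-aware index helpers can be illustrated with a minimal sketch (this is a stand-in, not Tianshou's actual implementation): `prev`/`next` step one transition backward/forward without crossing episode boundaries marked by `done == True`, and `unfinished_index` reports the last written index of a still-running episode.

```python
import numpy as np

class RingBuffer:
    """Minimal circular buffer sketch illustrating prev/next/unfinished_index."""

    def __init__(self, size):
        self.maxsize = size
        self.done = np.zeros(size, dtype=bool)
        self._index = 0   # next write position
        self._size = 0    # number of valid transitions

    def add(self, done):
        self.done[self._index] = done
        self._index = (self._index + 1) % self.maxsize
        self._size = min(self._size + 1, self.maxsize)

    def prev(self, index):
        """One-step previous index; stays put at an episode start."""
        index = np.asarray(index)
        prev_index = (index - 1) % self._size
        # if the previous transition ended an episode, do not cross it
        return np.where(self.done[prev_index], index, prev_index)

    def next(self, index):
        """One-step next index; stays put at a terminal transition."""
        index = np.asarray(index)
        return np.where(self.done[index], index, (index + 1) % self._size)

    def unfinished_index(self):
        """Last modified index whose done flag is still False."""
        if self._size == 0:
            return np.array([], dtype=int)
        last = (self._index - 1) % self._size
        return np.array([last]) if not self.done[last] else np.array([], dtype=int)
```

For example, after adding done flags `[False, False, True, False]`, `prev(3)` returns `3` (index 2 is terminal, so the boundary is not crossed) and `next(2)` returns `2`.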
Collector
1. Rewrite the original Collector and split the async logic into `AsyncCollector`: `Collector` now supports only sync mode, while `AsyncCollector` supports both modes (280);
2. Drop `collector.collect(n_episode=List[int])` because the new collector can collect episodes without bias (280);
3. Move `reward_metric` from Collector to trainer (280);
4. Change the `Collector.collect` logic: `AsyncCollector.collect` keeps the previous semantics, where `collect(n_step or n_episode)` may not collect exactly n_step or n_episode transitions; `Collector.collect(n_step or n_episode)` now collects **exactly** n_step or n_episode transitions (280);
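The exact-count semantics above can be sketched as follows. This is a simplified illustration, not Tianshou's implementation (which steps only a subset of the parallel environments near the end rather than discarding surplus transitions); `env_step` is a hypothetical callable returning one transition per sub-environment.

```python
def collect_exact_n_step(env_step, n_step):
    """Collect exactly n_step transitions from a vectorized environment.

    env_step() is assumed to return a list of transitions, one per
    parallel sub-environment.
    """
    collected = []
    while len(collected) < n_step:
        transitions = env_step()
        # keep only as many transitions as are still needed,
        # so the final count is exact even if n_step is not a
        # multiple of the number of parallel environments
        collected.extend(transitions[: n_step - len(collected)])
    return collected
```

With 4 parallel environments and `n_step=10`, the old semantics would yield 12 transitions (3 full vectorized steps); the exact semantics yield precisely 10.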
Policy
1. Add a `policy.exploration_noise(action, batch) -> action` method instead of implementing exploration noise inside `policy.forward()` (280);
2. Add a `Timelimit.truncate` handler in `compute_*_returns` (296);
3. Remove the `ignore_done` flag (296);
4. Remove the `reward_normalization` option in off-policy algorithms (an error is raised if it is set to True) (298);
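The factored-out exploration noise can be sketched like this. The class and noise scheme below are illustrative assumptions, not Tianshou's actual classes: the point is that `forward()` stays deterministic and the collector applies `exploration_noise` only during training rollouts.

```python
import numpy as np

class GaussianNoisePolicy:
    """Sketch: deterministic forward(), noise added in a separate method."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma

    def forward(self, obs):
        # stand-in for a deterministic network output
        return np.zeros_like(obs)

    def exploration_noise(self, action, batch=None):
        # additive Gaussian noise, applied by the collector at training
        # time only; evaluation uses forward() output unchanged
        return action + np.random.normal(0.0, self.sigma, size=np.shape(action))
```

Keeping the noise out of `forward()` means evaluation and training share one forward pass, and the noise scheme can be changed without touching the policy network logic.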
Trainer
1. Rename `collect_per_step` to `step_per_collect` (293);
2. Add `update_per_step` and `episode_per_collect` (293);
3. `onpolicy_trainer` now supports collection by either step (`step_per_collect`) or episode (`episode_per_collect`) (293);
4. Add `BasicLogger` and `LazyLogger` to log data more conveniently (295).
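A small sketch of how `step_per_collect` and the fractional `update_per_step` interact, under the assumption (hedged, not a quote of the trainer source) that the off-policy trainer performs a number of gradient steps proportional to the transitions just collected:

```python
def num_gradient_steps(step_per_collect, update_per_step):
    """Gradient steps performed after one collect phase.

    update_per_step is a fraction: e.g. 0.1 means one policy update
    per 10 collected environment steps.
    """
    return round(update_per_step * step_per_collect)
```

So `step_per_collect=100, update_per_step=0.1` gives 10 updates per collect, while `update_per_step=1` gives one update per collected transition.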
Bug Fix
1. Fix `VectorEnv` `action_space` seed randomness: calling `env.seed(seed)` now also calls `env.action_space.seed(seed)`; otherwise `Collector.collect(..., random=True)` would produce a different result each time (300, 303).
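The essence of the fix can be sketched with minimal stand-ins for the gym interfaces (these are not the real gym or Tianshou classes): seeding the environment must forward the seed to its action space, because `Collector.collect(..., random=True)` draws actions via `action_space.sample()`.

```python
import random

class ActionSpace:
    """Stand-in for a gym action space with its own RNG."""

    def __init__(self):
        self.rng = random.Random()

    def seed(self, seed):
        self.rng.seed(seed)

    def sample(self):
        return self.rng.randint(0, 9)

class Env:
    """Stand-in env whose seed() also seeds the action space (the fix)."""

    def __init__(self):
        self.action_space = ActionSpace()

    def seed(self, seed):
        # without this forwarding, action_space.sample() stays unseeded
        # and random collection is not reproducible
        self.action_space.seed(seed)
```

With the forwarding in place, two environments seeded identically produce identical random action sequences.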