Algorithm Implementation
1. N-step returns for all Q-learning based algorithms (51)
2. Auto alpha tuning (automatic entropy-coefficient adjustment) in SAC (80)
3. Reserve `policy._state` to support saving hidden states in the replay buffer (19)
4. Add a `sample_avail` argument to ReplayBuffer to sample only available indices in RNN training mode (19)
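The n-step return added in item 1 can be sketched in plain Python. This is an illustrative stand-in, not tianshou's actual implementation: `bootstrap_value` stands for the bootstrapped estimate at step n (e.g. max_a Q(s_{t+n}, a)).

```python
def n_step_return(rewards, gamma, n, bootstrap_value, done=False):
    """Illustrative n-step return: the sum of the first n discounted
    rewards, plus a discounted bootstrap value if the episode has not
    terminated within those n steps."""
    g = 0.0
    for k, r in enumerate(rewards[:n]):
        g += (gamma ** k) * r
    if not done:
        g += (gamma ** n) * bootstrap_value
    return g
```

With n=1 this reduces to the ordinary one-step TD target r + gamma * max_a Q(s', a).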
New Feature
1. `Batch.cat` (87), `Batch.stack` (93), `Batch.empty` (106, 110)
2. Advanced slicing methods for Batch (106)
3. `Batch(kwargs, copy=True)` performs a deep copy (110)
4. Add a `random=True` argument to `Collector.collect` to perform sampling with a random policy (78)
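`Batch.cat` and `Batch.stack` work per key, analogously to `np.concatenate` and `np.stack`. A rough stand-in for the assumed semantics, using plain dicts of lists rather than real Batch objects:

```python
def cat(batches):
    # Concatenate along the first axis: values for each key
    # are joined end to end.
    out = {}
    for b in batches:
        for key, value in b.items():
            out.setdefault(key, []).extend(value)
    return out

def stack(batches):
    # Stack along a new first axis: values for each key
    # are collected into a list of per-batch values.
    out = {}
    for b in batches:
        for key, value in b.items():
            out.setdefault(key, []).append(value)
    return out
```

With the real API, `b1 = Batch.cat([b1, b2])` replaces the removed `b1.append(b2)` (see the API Change section below).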
API Change
1. `Batch.append` -> `Batch.cat`
2. Move the Atari wrapper to examples, since it is not a core feature of tianshou (124)
3. Add some pre-defined networks in `tianshou.utils.net`. Since these only define an API rather than concrete classes, they are placed under `tianshou.utils.net` instead of `tianshou.net`. (123)
Docs
Add cheatsheet: https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html