This release adds support for [DeepSpeed](https://github.com/microsoft/DeepSpeed). While the basics are there to support ZeRO-2, ZeRO-3, as well as CPU and NVMe offload, the API might evolve a little as we polish it in the near future.
It also adds support for multi-node CPU training. In both cases, filling out the questionnaire produced by `accelerate config` and then launching your script with `accelerate launch` is enough; there are no changes in the main API.
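For illustration, here is a minimal sketch of what such a script can look like; the toy linear model and random data are placeholders for your own objects:

```python
# Minimal sketch of a training script whose code does not change across setups;
# the backend (single GPU, DeepSpeed, multi-node CPU, ...) is chosen when you
# run `accelerate config`. The linear model and random data are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
dataloader = DataLoader(dataset, batch_size=8)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# The same `prepare` call works regardless of the distributed setup selected.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```

Once the `accelerate config` questionnaire has been answered, the script is started the same way in every case, e.g. `accelerate launch train.py` (the file name here is just an example).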
## DeepSpeed support

- Add DeepSpeed support #82 (@vasudevgupta7)
- DeepSpeed documentation #140 (@sgugger)
## Multi-node CPU support

- Add distributed multi-node CPU-only support (MULTI_CPU) #63 (@ddkalamk)

## Various fixes
- Fix batch_sampler error for IterableDataset #62 (@ddkalamk)
- Honor namedtuples in inputs/outputs #67 (@sgugger)
- Fix examples README #70 (@cccntu)
- TPU not available in Kaggle #73 (@yuangan)
- Pass args in notebook_launcher for multi-GPU #78 (@sgugger)
- Fix `accelerate test` with no config file #79 (@cccntu)
- Use `optimizer` for consistency #81 (@kumapo)
- Update README.md #87 (@Separius)
- Add `unscale_gradients` method #88 (@sgugger) (this and the other new `Accelerator` helpers below are illustrated in the sketch after this list)
- Add `Accelerator.free_memory` #89 (@sgugger)
- [Feature] Add context manager to allow main process first #98 (@Guillem96)
- Pass along kwargs to backward #104 (@sgugger)
- Add course banner #107 (@sgugger)
- Added closure argument to `optimizer.step()` #105 (@pmelchior)
- Fix import error for torch 1.4.0 #108 (@sgugger)
- Unwrap optimizer before unscaling #115 (@sgugger)
- Fix DataLoader length when split_batches=True #121 (@sgugger)
- Fix `OptimWrapper` init #127 (@sgugger)
- Fix FP16 by converting outputs back to FP32 #134 (@sgugger)
- Add caveat on weight-tying on TPUs #138 (@sgugger)
- Add optimizer not stepped property #139 (@sgugger)
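Several of the additions above are small quality-of-life helpers on the `Accelerator` object. The snippet below is a rough sketch of how they fit together; the method names follow the PR titles (`unscale_gradients`, `free_memory`, and a main-process-first context manager assumed here to be called `main_process_first`), so double-check them against the documentation of your installed version:

```python
# Hedged sketch of the new Accelerator helpers from this release, using a toy
# model and dataset; exact method names/signatures may differ in your version.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Let the main process run this block first (e.g. to cache a preprocessed
# dataset) before the other processes enter it.
with accelerator.main_process_first():
    dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer, dataloader = accelerator.prepare(
    model, optimizer, DataLoader(dataset, batch_size=8)
)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)
    # Unscale the (possibly FP16-scaled) gradients before clipping them by hand.
    accelerator.unscale_gradients()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()  # the wrapped optimizer now also accepts a closure argument

# Drop references to the prepared objects so their memory can be reclaimed,
# e.g. before preparing a fresh model in the same process.
accelerator.free_memory()
```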