Accelerate

Latest version: v0.29.3


0.29.3

* Fixes an issue with the backend refactor not working on CPU-based distributed environments, by jiqing-feng in https://github.com/huggingface/accelerate/pull/2670
* Adds the missing `strict` argument to `load_checkpoint_and_dispatch` (see the sketch below), by SunMarc in https://github.com/huggingface/accelerate/pull/2641
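
For reference, a minimal sketch of the new argument (here `model` and the checkpoint path are placeholders):

```python
from accelerate import load_checkpoint_and_dispatch

# `strict=False` tolerates missing or unexpected keys in the checkpoint
model = load_checkpoint_and_dispatch(
    model, checkpoint="checkpoint.bin", device_map="auto", strict=False
)
```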

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.29.2...v0.29.3

0.29.2

* Fixes a missing parenthesis in the XPU backend in https://github.com/huggingface/accelerate/pull/2639
* Fixes XLA support and a performance degradation when initializing the state in https://github.com/huggingface/accelerate/pull/2634

0.29.1

Fixed an import that caused the `accelerate` CLI to fail if pytest wasn't installed.

0.29.0

Core
* Accelerate can now optimize NUMA affinity, which can help increase throughput on NVIDIA multi-GPU systems. To enable it, either follow the prompt during `accelerate config`, set the `ACCELERATE_CPU_AFFINITY=1` env variable, or set it manually:

```python
from accelerate.utils import set_numa_affinity

# For GPU 0
set_numa_affinity(0)
```

Big thanks to stas00 for the recommendation, request, and feedback during development
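
In a multi-process run, each rank is typically bound to the NUMA node of its own GPU. A minimal sketch (assuming one process per GPU and using the local process index as the GPU index):

```python
from accelerate import Accelerator
from accelerate.utils import set_numa_affinity

accelerator = Accelerator()
# Pin this process's CPU threads to the NUMA node closest to its GPU
set_numa_affinity(accelerator.local_process_index)
```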

* Allow for setting deterministic algorithms in `set_seed` (see the sketch after this list) by muellerzr in https://github.com/huggingface/accelerate/pull/2569
* Fixed the test script for TPU v2/v3 by vanbasten23 in https://github.com/huggingface/accelerate/pull/2542
* Cambricon MLU device support introduced by huismiling in https://github.com/huggingface/accelerate/pull/2552
* A large refactor of `PartialState` and `AcceleratorState` was performed to make future-proofing easier and to simplify adding new devices, by muellerzr in https://github.com/huggingface/accelerate/pull/2576
* Fixed a reproducibility issue in distributed environments with Dataloader shuffling when using `BatchSamplerShard` by universuen in https://github.com/huggingface/accelerate/pull/2584
* `notebook_launcher` can use multiple GPUs in Google Colab if using a custom instance that supports multiple GPUs by StefanTodoran in https://github.com/huggingface/accelerate/pull/2561
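
As referenced above, the deterministic flag can be passed directly when seeding (a minimal sketch):

```python
from accelerate.utils import set_seed

# Seeds the Python, NumPy, and torch RNGs; `deterministic=True` additionally
# asks PyTorch to use deterministic algorithms where available
set_seed(42, deterministic=True)
```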

Big Model Inference
* Added a log message for RTX 4000-series GPUs, where multi-GPU inference with `device_map` can lead to hanging, by SunMarc in https://github.com/huggingface/accelerate/pull/2557
* Fix `load_checkpoint_in_model` behavior when unexpected keys are in the checkpoint by fxmarty in https://github.com/huggingface/accelerate/pull/2588

DeepSpeed
* Fix an issue with the mapping of `main_process_ip` and `master_addr` when not using the standard DeepSpeed launcher, by asdfry in https://github.com/huggingface/accelerate/pull/2495
* Improve DeepSpeed env generation by checking for bad keys, by muellerzr and ricklamers in https://github.com/huggingface/accelerate/pull/2565
* We now support custom DeepSpeed env files. As with vanilla `deepspeed`, set it with the `DS_ENV_FILE` environment variable (see the sketch after this list), by muellerzr in https://github.com/huggingface/accelerate/pull/2566
* Resolve ZeRO-3 Initialization Failure in already-started distributed environments by sword865 in https://github.com/huggingface/accelerate/pull/2578
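
For example, a minimal sketch (the path is illustrative; in practice the variable is usually exported in the shell before running `accelerate launch`):

```python
import os

# Point the DeepSpeed integration at a custom env file, mirroring
# vanilla `deepspeed` behavior; the path below is a placeholder
os.environ["DS_ENV_FILE"] = "/path/to/.deepspeed_env"
```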

What's Changed
* Fix test_script.py on TPU v2/v3 by vanbasten23 in https://github.com/huggingface/accelerate/pull/2542
* Add mapping `main_process_ip` and `master_addr` when not using standard as deepspeed launcher by asdfry in https://github.com/huggingface/accelerate/pull/2495
* split_between_processes for Dataset by geronimi73 in https://github.com/huggingface/accelerate/pull/2433
* Include working driver check by muellerzr in https://github.com/huggingface/accelerate/pull/2558
* 🚨🚨🚨 Move to using tags rather than latest for docker images and consolidate image repos 🚨🚨🚨 by muellerzr in https://github.com/huggingface/accelerate/pull/2554
* Add Cambricon MLU accelerator support by huismiling in https://github.com/huggingface/accelerate/pull/2552
* Add NUMA affinity control for NVIDIA GPUs by muellerzr in https://github.com/huggingface/accelerate/pull/2535
* Add log message for RTX 4000 series when performing multi-gpu inference with device_map by SunMarc in https://github.com/huggingface/accelerate/pull/2557
* Improve deepspeed env gen by muellerzr in https://github.com/huggingface/accelerate/pull/2565
* Allow for setting deterministic algorithms by muellerzr in https://github.com/huggingface/accelerate/pull/2569
* Unpin deepspeed by muellerzr in https://github.com/huggingface/accelerate/pull/2570
* Rm uv install by muellerzr in https://github.com/huggingface/accelerate/pull/2577
* Allow for custom deepspeed env files by muellerzr in https://github.com/huggingface/accelerate/pull/2566
* [docs] Missing functions from API by stevhliu in https://github.com/huggingface/accelerate/pull/2580
* Update data_loader.py to Ensure Reproducibility in Multi-Process Environments with Dataloader Shuffle by universuen in https://github.com/huggingface/accelerate/pull/2584
* Refactor affinity and make it stateful by muellerzr in https://github.com/huggingface/accelerate/pull/2579
* Refactor and improve model estimator tool by muellerzr in https://github.com/huggingface/accelerate/pull/2581
* Fix `load_checkpoint_in_model` behavior when unexpected keys are in the checkpoint by fxmarty in https://github.com/huggingface/accelerate/pull/2588
* Guard stateful objects by muellerzr in https://github.com/huggingface/accelerate/pull/2572
* Expound PartialState docstring by muellerzr in https://github.com/huggingface/accelerate/pull/2589
* [docs] Fix kwarg docstring by stevhliu in https://github.com/huggingface/accelerate/pull/2590
* Allow notebook_launcher to launch to multiple GPUs from Colab by StefanTodoran in https://github.com/huggingface/accelerate/pull/2561
* Fix warning log for unused checkpoint keys by fxmarty in https://github.com/huggingface/accelerate/pull/2594
* Resolve ZeRO-3 Initialization Failure in Pre-Set Torch Distributed Environments (huggingface/transformers#28803) by sword865 in https://github.com/huggingface/accelerate/pull/2578
* Refactor PartialState and AcceleratorState by muellerzr in https://github.com/huggingface/accelerate/pull/2576
* Allow for force unwrapping by muellerzr in https://github.com/huggingface/accelerate/pull/2595
* Pin hub for tests by muellerzr in https://github.com/huggingface/accelerate/pull/2608
* Default false for trust_remote_code by muellerzr in https://github.com/huggingface/accelerate/pull/2607
* fix llama example for pippy by SunMarc in https://github.com/huggingface/accelerate/pull/2616
* Fix links in Quick Tour by muellerzr in https://github.com/huggingface/accelerate/pull/2617
* Link to bash in env reporting by muellerzr in https://github.com/huggingface/accelerate/pull/2623
* Unpin hub by muellerzr in https://github.com/huggingface/accelerate/pull/2625

New Contributors
* asdfry made their first contribution in https://github.com/huggingface/accelerate/pull/2495
* geronimi73 made their first contribution in https://github.com/huggingface/accelerate/pull/2433
* huismiling made their first contribution in https://github.com/huggingface/accelerate/pull/2552
* universuen made their first contribution in https://github.com/huggingface/accelerate/pull/2584
* StefanTodoran made their first contribution in https://github.com/huggingface/accelerate/pull/2561
* sword865 made their first contribution in https://github.com/huggingface/accelerate/pull/2578

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.28.0...v0.29.0

0.28.0

Core
* Introduce a `DataLoaderConfiguration` and begin deprecating the dataloader-related arguments of the `Accelerator`:
```diff
+from accelerate import DataLoaderConfiguration
+dl_config = DataLoaderConfiguration(split_batches=True, dispatch_batches=True)
-accelerator = Accelerator(split_batches=True, dispatch_batches=True)
+accelerator = Accelerator(dataloader_config=dl_config)
```
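
Put together, the migrated setup looks like this (a minimal sketch):

```python
from accelerate import Accelerator, DataLoaderConfiguration

# Dataloader behavior now lives on a dedicated config object instead of
# individual `Accelerator` arguments
dl_config = DataLoaderConfiguration(split_batches=True, dispatch_batches=True)
accelerator = Accelerator(dataloader_config=dl_config)
```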

* Allow gradients to be synced each data batch while performing gradient accumulation, which is useful when training with FSDP, by fabianlim in https://github.com/huggingface/accelerate/pull/2531:
```diff
 from accelerate import GradientAccumulationPlugin
 plugin = GradientAccumulationPlugin(
+    num_steps=2,
     sync_each_batch=sync_each_batch
 )
 accelerator = Accelerator(gradient_accumulation_plugin=plugin)
```
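
Filled in as a complete snippet (a sketch): gradients are all-reduced every batch, while the optimizer still steps only every `num_steps` batches, which bounds peak gradient memory under FSDP.

```python
from accelerate import Accelerator
# Depending on the version, the plugin may live in `accelerate.utils`
# rather than the top-level import shown in the diff above
from accelerate.utils import GradientAccumulationPlugin

plugin = GradientAccumulationPlugin(num_steps=2, sync_each_batch=True)
accelerator = Accelerator(gradient_accumulation_plugin=plugin)
```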



Torch XLA
* Support for XLA on the GPU by anw90 in https://github.com/huggingface/accelerate/pull/2176
* Enable gradient accumulation on TPU by muellerzr in https://github.com/huggingface/accelerate/pull/2453

FSDP
* Support downstream FSDP + QLoRA through tweaks allowing configuration of buffer precision, by pacman100 in https://github.com/huggingface/accelerate/pull/2544

`launch` changes
* Support `mpirun` for multi-CPU training by dmsuehir in https://github.com/huggingface/accelerate/pull/2493

What's Changed
* Fix model metadata issue check by muellerzr in https://github.com/huggingface/accelerate/pull/2435
* Use py 3.9 by muellerzr in https://github.com/huggingface/accelerate/pull/2436
* Fix seedable sampler logic and expound docs by muellerzr in https://github.com/huggingface/accelerate/pull/2434
* Fix tied_pointers_to_remove type by fxmarty in https://github.com/huggingface/accelerate/pull/2439
* Make test assertions more idiomatic by akx in https://github.com/huggingface/accelerate/pull/2420
* Prefer `is_torch_tensor` over `hasattr` for torch.compile. by PhilJd in https://github.com/huggingface/accelerate/pull/2387
* Enable more Ruff lints & fix issues by akx in https://github.com/huggingface/accelerate/pull/2419
* Fix warning when dispatching model by SunMarc in https://github.com/huggingface/accelerate/pull/2442
* Make torch xla available on GPU by anw90 in https://github.com/huggingface/accelerate/pull/2176
* Include pippy_file_path by muellerzr in https://github.com/huggingface/accelerate/pull/2444
* [Big deprecation] Introduces a `DataLoaderConfig` by muellerzr in https://github.com/huggingface/accelerate/pull/2441
* Check for None by muellerzr in https://github.com/huggingface/accelerate/pull/2452
* Fix the pytest version to be less than 8.0.1 by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2461
* Fix wrong `is_namedtuple` implementation by fxmarty in https://github.com/huggingface/accelerate/pull/2475
* Use grad-accum on TPU by muellerzr in https://github.com/huggingface/accelerate/pull/2453
* Add pre-commit configuration by akx in https://github.com/huggingface/accelerate/pull/2451
* Replace `os.path.sep.join` path manipulations with a helper by akx in https://github.com/huggingface/accelerate/pull/2446
* DOC: Fixes to Accelerator docstring by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2443
* Context manager fixes by akx in https://github.com/huggingface/accelerate/pull/2450
* Fix TPU with new `XLA` device type by will-cromar in https://github.com/huggingface/accelerate/pull/2467
* Free mps memory by SunMarc in https://github.com/huggingface/accelerate/pull/2483
* [FIX] allow `Accelerator` to detect distributed type from the "LOCAL_RANK" env variable for XPU by faaany in https://github.com/huggingface/accelerate/pull/2473
* Fix CI tests due to pathlib issues by muellerzr in https://github.com/huggingface/accelerate/pull/2491
* Remove all cases of torchrun in tests and centralize as `accelerate launch` by muellerzr in https://github.com/huggingface/accelerate/pull/2498
* Fix link typo by SunMarc in https://github.com/huggingface/accelerate/pull/2503
* [docs] Accelerator API by stevhliu in https://github.com/huggingface/accelerate/pull/2465
* Docstring fixup by muellerzr in https://github.com/huggingface/accelerate/pull/2504
* [docs] Divide training and inference by stevhliu in https://github.com/huggingface/accelerate/pull/2466
* add custom dtype INT2 by SunMarc in https://github.com/huggingface/accelerate/pull/2505
* quanto compatibility for cpu/disk offload by SunMarc in https://github.com/huggingface/accelerate/pull/2481
* [docs] Quicktour by stevhliu in https://github.com/huggingface/accelerate/pull/2456
* Check if hub down by muellerzr in https://github.com/huggingface/accelerate/pull/2506
* Remove offline stuff by muellerzr in https://github.com/huggingface/accelerate/pull/2509
* Fixed 0MiB bug in convert_file_size_to_int by StoyanStAtanasov in https://github.com/huggingface/accelerate/pull/2507
* Fix edge case in infer_auto_device_map when dealing with buffers by SunMarc in https://github.com/huggingface/accelerate/pull/2511
* [docs] Fix typos by omahs in https://github.com/huggingface/accelerate/pull/2490
* fix typo in launch.py (`----main_process_port` to `--main_process_port`) by DerrickWang005 in https://github.com/huggingface/accelerate/pull/2516
* Add copyright + some ruff lint things by muellerzr in https://github.com/huggingface/accelerate/pull/2523
* Don't manage `PYTORCH_NVML_BASED_CUDA_CHECK` when calling `accelerate.utils.imports.is_cuda_available()` by luiscape in https://github.com/huggingface/accelerate/pull/2524
* Quanto compatibility with QBitsTensor by SunMarc in https://github.com/huggingface/accelerate/pull/2526
* Remove unnecessary `env=os.environ.copy()`s by akx in https://github.com/huggingface/accelerate/pull/2449
* Launch mpirun from accelerate launch for multi-CPU training by dmsuehir in https://github.com/huggingface/accelerate/pull/2493
* Enable using dash or underscore for CLI args by muellerzr in https://github.com/huggingface/accelerate/pull/2527
* Update the default behavior of `zero_grad(set_to_none=None)` to align with PyTorch by yongchanghao in https://github.com/huggingface/accelerate/pull/2472
* Update link to dynamo/compile doc by WarmongeringBeaver in https://github.com/huggingface/accelerate/pull/2533
* Check if the buffers fit GPU memory after device map auto inferred by notsyncing in https://github.com/huggingface/accelerate/pull/2412
* [Refactor] Refactor send_to_device to treat tensor-like first by vmoens in https://github.com/huggingface/accelerate/pull/2438
* Overdue email change... by muellerzr in https://github.com/huggingface/accelerate/pull/2534
* [docs] Troubleshoot by stevhliu in https://github.com/huggingface/accelerate/pull/2538
* Remove extra double-dash in error message by drscotthawley in https://github.com/huggingface/accelerate/pull/2541
* Allow Gradients to be Synced Each Data Batch While Performing Gradient Accumulation by fabianlim in https://github.com/huggingface/accelerate/pull/2531
* Update FSDP mixed precision setter to enable fsdp+qlora by pacman100 in https://github.com/huggingface/accelerate/pull/2544
* Use uv instead of pip install for github CI by muellerzr in https://github.com/huggingface/accelerate/pull/2546

New Contributors
* anw90 made their first contribution in https://github.com/huggingface/accelerate/pull/2176
* StoyanStAtanasov made their first contribution in https://github.com/huggingface/accelerate/pull/2507
* omahs made their first contribution in https://github.com/huggingface/accelerate/pull/2490
* DerrickWang005 made their first contribution in https://github.com/huggingface/accelerate/pull/2516
* luiscape made their first contribution in https://github.com/huggingface/accelerate/pull/2524
* dmsuehir made their first contribution in https://github.com/huggingface/accelerate/pull/2493
* yongchanghao made their first contribution in https://github.com/huggingface/accelerate/pull/2472
* WarmongeringBeaver made their first contribution in https://github.com/huggingface/accelerate/pull/2533
* vmoens made their first contribution in https://github.com/huggingface/accelerate/pull/2438
* drscotthawley made their first contribution in https://github.com/huggingface/accelerate/pull/2541
* fabianlim made their first contribution in https://github.com/huggingface/accelerate/pull/2531

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.27.2...v0.28.0

0.27.0

PyTorch 2.2.0 Support

With the latest release of PyTorch 2.2.0, we've verified that Accelerate has no breaking changes with it.

PyTorch-Native Pipeline Parallel Inference

With this release we are excited to announce support for pipeline-parallel inference by integrating PyTorch's [PiPPy](https://github.com/pytorch/PiPPy) framework (so no need to use Megatron or DeepSpeed)! This supports automatic model-weight splitting across devices using a similar API to `device_map="auto"`. This is still under heavy development; however, the inference side is stable enough that we are ready for a release. Read more about it in [our docs](https://huggingface.co/docs/accelerate/usage_guides/distributed_inference#memory-efficient-pipeline-parallelism-experimental) and check out the [example zoo](https://github.com/huggingface/accelerate/tree/main/examples/inference).

Requires `pippy` version 0.2.0 or later (`pip install torchpippy -U`).

Example usage (combined with `accelerate launch` or `torchrun`):

```python
import torch
from transformers import AutoModelForSequenceClassification

from accelerate import PartialState, prepare_pippy

model = AutoModelForSequenceClassification.from_pretrained("gpt2")
# `input` is an example batch of inputs; its construction is elided here
model = prepare_pippy(model, split_points="auto", example_args=(input,))
input = input.to("cuda:0")
with torch.no_grad():
    output = model(input)
# The outputs are only on the final process by default.
# You can pass in `gather_outputs=True` to `prepare_pippy` to
# make them available on all processes.
if PartialState().is_last_process:
    output = torch.stack(tuple(output[0]))
    print(output.shape)
```


DeepSpeed

This release adds support for running DeepSpeed on XPU devices, thanks to faaany.

What's Changed
* Convert model.hf_device_map back to Dict by SunMarc in https://github.com/huggingface/accelerate/pull/2326
* Fix model memory issue by muellerzr in https://github.com/huggingface/accelerate/pull/2327
* Fixed typos in readme files of docs folder. by rishit5 in https://github.com/huggingface/accelerate/pull/2329
* Disable P2P in *just* the 4000 series by muellerzr in https://github.com/huggingface/accelerate/pull/2332
* Avoid duplicating memory for tied weights in `dispatch_model`, and in forward with offloading by fxmarty in https://github.com/huggingface/accelerate/pull/2330
* Show DeepSpeed option when multi-XPU is selected in `accelerate config` by faaany in https://github.com/huggingface/accelerate/pull/2346
* FIX: add oneCCL environment variable for non-MPI launcher (accelerate launch) by faaany in https://github.com/huggingface/accelerate/pull/2339
* device agnostic test_accelerator/test_multigpu by wangshuai09 in https://github.com/huggingface/accelerate/pull/2343
* Fix mpi4py/failing deepspeed test issues by muellerzr in https://github.com/huggingface/accelerate/pull/2353
* Fix `block_size` picking in `megatron_lm_gpt_pretraining` example. by nilq in https://github.com/huggingface/accelerate/pull/2342
* Fix dispatch_model with tied weights test on T4 by fxmarty in https://github.com/huggingface/accelerate/pull/2354
* bugfix to allow usage of TE or MSAMP in `FP8RecipeKwargs` by sudhakarsingh27 in https://github.com/huggingface/accelerate/pull/2355
* Pin DeepSpeed until patch by muellerzr in https://github.com/huggingface/accelerate/pull/2366
* Remove init_hook_kwargs by fxmarty in https://github.com/huggingface/accelerate/pull/2365
* device agnostic optimizer testing by statelesshz in https://github.com/huggingface/accelerate/pull/2363
* `add_hook_to_module` and `remove_hook_from_module` compatibility with fx.GraphModule by fxmarty in https://github.com/huggingface/accelerate/pull/2369
* Adding `requires_grad` to `kwargs` when registering empty parameters. by BlackSamorez in https://github.com/huggingface/accelerate/pull/2376
* Add `adapter_only` option to `save_fsdp_model` and `load_fsdp_model` to only save/load PEFT weights by AjayP13 in https://github.com/huggingface/accelerate/pull/2321
* device agnostic cli/data_loader/grad_sync/kwargs_handlers/memory_utils testing by wangshuai09 in https://github.com/huggingface/accelerate/pull/2356
* Fix batch_size sanity check logic for `split_batches` by izhx in https://github.com/huggingface/accelerate/pull/2344
* Pin Torch version to <2.2.0 by Rocketknight1 in https://github.com/huggingface/accelerate/pull/2394
* Address PIP-632 deprecation of distutils by AieatAssam in https://github.com/huggingface/accelerate/pull/2388
* [don't merge yet] unpin torch by ydshieh in https://github.com/huggingface/accelerate/pull/2406
* Revert "[don't merge yet] unpin torch" by muellerzr in https://github.com/huggingface/accelerate/pull/2407
* Fix CI due to pytest by muellerzr in https://github.com/huggingface/accelerate/pull/2408
* Added activateEnviroment.sh to readme by TJ-Solergibert in https://github.com/huggingface/accelerate/pull/2409
* Fix XPU inference by notsyncing in https://github.com/huggingface/accelerate/pull/2383
* Fix the size of int and bool type when computing module size by notsyncing in https://github.com/huggingface/accelerate/pull/2411
* Adding Local SGD support for NPU by statelesshz in https://github.com/huggingface/accelerate/pull/2415
* Unpin torch by muellerzr in https://github.com/huggingface/accelerate/pull/2418
* Use Ruff for formatting too by akx in https://github.com/huggingface/accelerate/pull/2400
* torch-native pipeline parallelism for big models by muellerzr in https://github.com/huggingface/accelerate/pull/2345
* Update FSDP docs by pacman100 in https://github.com/huggingface/accelerate/pull/2430
* Make output end up on all GPUs at the end by muellerzr in https://github.com/huggingface/accelerate/pull/2423
* Migrate pippy examples over and run tests by muellerzr in https://github.com/huggingface/accelerate/pull/2424
* [FIX] fix the wrong `nproc_per_node` in the multi gpu test by faaany in https://github.com/huggingface/accelerate/pull/2422
* Fix fp8 things by muellerzr in https://github.com/huggingface/accelerate/pull/2403
* [FIX] allow `Accelerator` to prepare models in eval mode for XPU&CPU by faaany in https://github.com/huggingface/accelerate/pull/2426
* [Fix] make all tests pass on XPU by faaany in https://github.com/huggingface/accelerate/pull/2427

New Contributors
* rishit5 made their first contribution in https://github.com/huggingface/accelerate/pull/2329
* faaany made their first contribution in https://github.com/huggingface/accelerate/pull/2346
* wangshuai09 made their first contribution in https://github.com/huggingface/accelerate/pull/2343
* nilq made their first contribution in https://github.com/huggingface/accelerate/pull/2342
* BlackSamorez made their first contribution in https://github.com/huggingface/accelerate/pull/2376
* AjayP13 made their first contribution in https://github.com/huggingface/accelerate/pull/2321
* Rocketknight1 made their first contribution in https://github.com/huggingface/accelerate/pull/2394
* AieatAssam made their first contribution in https://github.com/huggingface/accelerate/pull/2388
* ydshieh made their first contribution in https://github.com/huggingface/accelerate/pull/2406
* notsyncing made their first contribution in https://github.com/huggingface/accelerate/pull/2383
* akx made their first contribution in https://github.com/huggingface/accelerate/pull/2400

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.26.1...v0.27.0
