Accelerate

Latest version: v0.30.1

Page 2 of 14

0.28.0

Core
* Introduce a `DataLoaderConfiguration` and begin deprecation of arguments in the `Accelerator`
```diff
+from accelerate import DataLoaderConfiguration
+dl_config = DataLoaderConfiguration(split_batches=True, dispatch_batches=True)
-accelerator = Accelerator(split_batches=True, dispatch_batches=True)
+accelerator = Accelerator(dataloader_config=dl_config)
```

* Allow gradients to be synced each data batch while performing gradient accumulation, useful when training in FSDP by fabianlim in https://github.com/huggingface/accelerate/pull/2531
```diff
 from accelerate import GradientAccumulationPlugin
 plugin = GradientAccumulationPlugin(
+    num_steps=2,
     sync_each_batch=sync_each_batch
 )
 accelerator = Accelerator(gradient_accumulation_plugin=plugin)
```
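
To illustrate what `sync_each_batch` changes, here is a minimal pure-Python sketch of when gradients are synchronized and when the optimizer steps. The `plan_steps` function and its return format are hypothetical and not part of Accelerate. The sketch assumes that, without `sync_each_batch`, gradients are only synchronized on the batch that closes an accumulation window or on the final batch.

```python
def plan_steps(num_batches, num_steps, sync_each_batch=False):
    """Return one (grads_synced, optimizer_steps) pair per batch."""
    plan = []
    for i in range(num_batches):
        last_batch = i == num_batches - 1
        # The optimizer only steps at the end of an accumulation window
        # (or on the final batch of the dataloader).
        optimizer_steps = (i + 1) % num_steps == 0 or last_batch
        # With sync_each_batch, gradients are reduced across processes on
        # every batch; otherwise only when the optimizer is about to step.
        grads_synced = sync_each_batch or optimizer_steps
        plan.append((grads_synced, optimizer_steps))
    return plan


default_plan = plan_steps(num_batches=4, num_steps=2)
eager_plan = plan_steps(num_batches=4, num_steps=2, sync_each_batch=True)
print(default_plan)
print(eager_plan)
```

In both plans the optimizer steps on the same batches. Only the number of gradient synchronizations differs, so the option trades extra communication for whatever benefit earlier synchronization brings in a given setup.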

Torch XLA
* Support for XLA on the GPU by anw90 in https://github.com/huggingface/accelerate/pull/2176
* Enable gradient accumulation on TPU in https://github.com/huggingface/accelerate/pull/2453

FSDP
* Support downstream FSDP + QLORA support through tweaks by allowing configuration of buffer precision by pacman100 in https://github.com/huggingface/accelerate/pull/2544

`launch` changes
* Support `mpirun` for multi-cpu training by dmsuehir in https://github.com/huggingface/accelerate/pull/2493

What's Changed
* Fix model metadata issue check by muellerzr in https://github.com/huggingface/accelerate/pull/2435
* Use py 3.9 by muellerzr in https://github.com/huggingface/accelerate/pull/2436
* Fix seedable sampler logic and expound docs by muellerzr in https://github.com/huggingface/accelerate/pull/2434
* Fix tied_pointers_to_remove type by fxmarty in https://github.com/huggingface/accelerate/pull/2439
* Make test assertions more idiomatic by akx in https://github.com/huggingface/accelerate/pull/2420
* Prefer `is_torch_tensor` over `hasattr` for torch.compile. by PhilJd in https://github.com/huggingface/accelerate/pull/2387
* Enable more Ruff lints & fix issues by akx in https://github.com/huggingface/accelerate/pull/2419
* Fix warning when dispatching model by SunMarc in https://github.com/huggingface/accelerate/pull/2442
* Make torch xla available on GPU by anw90 in https://github.com/huggingface/accelerate/pull/2176
* Include pippy_file_path by muellerzr in https://github.com/huggingface/accelerate/pull/2444
* [Big deprecation] Introduces a `DataLoaderConfig` by muellerzr in https://github.com/huggingface/accelerate/pull/2441
* Check for None by muellerzr in https://github.com/huggingface/accelerate/pull/2452
* Fix the pytest version to be less than 8.0.1 by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2461
* Fix wrong `is_namedtuple` implementation by fxmarty in https://github.com/huggingface/accelerate/pull/2475
* Use grad-accum on TPU by muellerzr in https://github.com/huggingface/accelerate/pull/2453
* Add pre-commit configuration by akx in https://github.com/huggingface/accelerate/pull/2451
* Replace `os.path.sep.join` path manipulations with a helper by akx in https://github.com/huggingface/accelerate/pull/2446
* DOC: Fixes to Accelerator docstring by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2443
* Context manager fixes by akx in https://github.com/huggingface/accelerate/pull/2450
* Fix TPU with new `XLA` device type by will-cromar in https://github.com/huggingface/accelerate/pull/2467
* Free mps memory by SunMarc in https://github.com/huggingface/accelerate/pull/2483
* [FIX] allow `Accelerator` to detect distributed type from the "LOCAL_RANK" env variable for XPU by faaany in https://github.com/huggingface/accelerate/pull/2473
* Fix CI tests due to pathlib issues by muellerzr in https://github.com/huggingface/accelerate/pull/2491
* Remove all cases of torchrun in tests and centralize as `accelerate launch` by muellerzr in https://github.com/huggingface/accelerate/pull/2498
* Fix link typo by SunMarc in https://github.com/huggingface/accelerate/pull/2503
* [docs] Accelerator API by stevhliu in https://github.com/huggingface/accelerate/pull/2465
* Docstring fixup by muellerzr in https://github.com/huggingface/accelerate/pull/2504
* [docs] Divide training and inference by stevhliu in https://github.com/huggingface/accelerate/pull/2466
* add custom dtype INT2 by SunMarc in https://github.com/huggingface/accelerate/pull/2505
* quanto compatibility for cpu/disk offload by SunMarc in https://github.com/huggingface/accelerate/pull/2481
* [docs] Quicktour by stevhliu in https://github.com/huggingface/accelerate/pull/2456
* Check if hub down by muellerzr in https://github.com/huggingface/accelerate/pull/2506
* Remove offline stuff by muellerzr in https://github.com/huggingface/accelerate/pull/2509
* Fixed 0MiB bug in convert_file_size_to_int by StoyanStAtanasov in https://github.com/huggingface/accelerate/pull/2507
* Fix edge case in infer_auto_device_map when dealing with buffers by SunMarc in https://github.com/huggingface/accelerate/pull/2511
* [docs] Fix typos by omahs in https://github.com/huggingface/accelerate/pull/2490
* fix typo in launch.py (`----main_process_port` to `--main_process_port`) by DerrickWang005 in https://github.com/huggingface/accelerate/pull/2516
* Add copyright + some ruff lint things by muellerzr in https://github.com/huggingface/accelerate/pull/2523
* Don't manage `PYTORCH_NVML_BASED_CUDA_CHECK` when calling `accelerate.utils.imports.is_cuda_available()` by luiscape in https://github.com/huggingface/accelerate/pull/2524
* Quanto compatibility with QBitsTensor by SunMarc in https://github.com/huggingface/accelerate/pull/2526
* Remove unnecessary `env=os.environ.copy()`s by akx in https://github.com/huggingface/accelerate/pull/2449
* Launch mpirun from accelerate launch for multi-CPU training by dmsuehir in https://github.com/huggingface/accelerate/pull/2493
* Enable using dash or underscore for CLI args by muellerzr in https://github.com/huggingface/accelerate/pull/2527
* Update the default behavior of `zero_grad(set_to_none=None)` to align with PyTorch by yongchanghao in https://github.com/huggingface/accelerate/pull/2472
* Update link to dynamo/compile doc by WarmongeringBeaver in https://github.com/huggingface/accelerate/pull/2533
* Check if the buffers fit GPU memory after device map auto inferred by notsyncing in https://github.com/huggingface/accelerate/pull/2412
* [Refactor] Refactor send_to_device to treat tensor-like first by vmoens in https://github.com/huggingface/accelerate/pull/2438
* Overdue email change... by muellerzr in https://github.com/huggingface/accelerate/pull/2534
* [docs] Troubleshoot by stevhliu in https://github.com/huggingface/accelerate/pull/2538
* Remove extra double-dash in error message by drscotthawley in https://github.com/huggingface/accelerate/pull/2541
* Allow Gradients to be Synced Each Data Batch While Performing Gradient Accumulation by fabianlim in https://github.com/huggingface/accelerate/pull/2531
* Update FSDP mixed precision setter to enable fsdp+qlora by pacman100 in https://github.com/huggingface/accelerate/pull/2544
* Use uv instead of pip install for github CI by muellerzr in https://github.com/huggingface/accelerate/pull/2546

New Contributors
* anw90 made their first contribution in https://github.com/huggingface/accelerate/pull/2176
* StoyanStAtanasov made their first contribution in https://github.com/huggingface/accelerate/pull/2507
* omahs made their first contribution in https://github.com/huggingface/accelerate/pull/2490
* DerrickWang005 made their first contribution in https://github.com/huggingface/accelerate/pull/2516
* luiscape made their first contribution in https://github.com/huggingface/accelerate/pull/2524
* dmsuehir made their first contribution in https://github.com/huggingface/accelerate/pull/2493
* yongchanghao made their first contribution in https://github.com/huggingface/accelerate/pull/2472
* WarmongeringBeaver made their first contribution in https://github.com/huggingface/accelerate/pull/2533
* vmoens made their first contribution in https://github.com/huggingface/accelerate/pull/2438
* drscotthawley made their first contribution in https://github.com/huggingface/accelerate/pull/2541
* fabianlim made their first contribution in https://github.com/huggingface/accelerate/pull/2531

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.27.2...v0.28.0

0.27.0

PyTorch 2.2.0 Support

With the latest release of PyTorch 2.2.0, we've verified that it introduces no breaking changes for Accelerate.

PyTorch-Native Pipeline Parallel Inference

With this release we are excited to announce support for pipeline-parallel inference by integrating PyTorch's [PiPPy](https://github.com/pytorch/PiPPy) framework (so no need to use Megatron or DeepSpeed)! This supports automatic model-weight splitting to each device using a similar API to `device_map="auto"`. This is still under heavy development; however, the inference side is stable enough that we are ready for a release. Read more about it in [our docs](https://huggingface.co/docs/accelerate/usage_guides/distributed_inference#memory-efficient-pipeline-parallelism-experimental) and check out the [example zoo](https://github.com/huggingface/accelerate/tree/main/examples/inference).

Requires `pippy` of version 0.2.0 or later (`pip install torchpippy -U`)

Example usage (combined with `accelerate launch` or `torchrun`):

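```python
import torch
from transformers import AutoModelForSequenceClassification

from accelerate import PartialState, prepare_pippy

model = AutoModelForSequenceClassification.from_pretrained("gpt2")
# Example input (an assumption: random token ids; any batch of input ids works)
input = torch.randint(low=0, high=model.config.vocab_size, size=(2, 1024), dtype=torch.int64)
model = prepare_pippy(model, split_points="auto", example_args=(input,))
input = input.to("cuda:0")
with torch.no_grad():
    output = model(input)
# The outputs are only on the final process by default
# You can pass in `gather_output=True` to prepare_pippy to
# make them available on all processes
if PartialState().is_last_process:
    output = torch.stack(tuple(output[0]))
    print(output.shape)
```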
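
The idea of splitting a model into pipeline stages can be sketched without any framework. The example below is only an illustration of a balanced contiguous split: `split_into_stages` is a hypothetical helper, the layer sizes are made-up numbers, and this is not a description of how `prepare_pippy` or PiPPy choose split points. It assumes there are at least as many layers as stages.

```python
def split_into_stages(layer_sizes, num_stages):
    """Greedily group consecutive layers into stages of roughly equal total size."""
    target = sum(layer_sizes) / num_stages
    stages, current, running = [], [], 0
    for index, size in enumerate(layer_sizes):
        current.append(index)
        running += size
        layers_left = len(layer_sizes) - index - 1
        stages_left = num_stages - len(stages) - 1
        # Close the stage once it reaches the target size, while leaving at
        # least one layer for every stage that still has to be filled.
        if stages_left > 0 and (running >= target or layers_left == stages_left):
            stages.append(current)
            current, running = [], 0
    # Whatever remains forms the final stage.
    stages.append(current)
    return stages


# Six layers with hypothetical sizes, split across two devices.
stages = split_into_stages([4, 4, 2, 2, 2, 2], num_stages=2)
print(stages)
```

Each inner list holds the layer indices that one device would own. During inference, activations flow from one stage to the next, which is why the final output lives on the last process by default.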

DeepSpeed

This release provides support for utilizing DeepSpeed on XPU devices thanks to faaany

What's Changed
* Convert model.hf_device_map back to Dict by SunMarc in https://github.com/huggingface/accelerate/pull/2326
* Fix model memory issue by muellerzr in https://github.com/huggingface/accelerate/pull/2327
* Fixed typos in readme files of docs folder. by rishit5 in https://github.com/huggingface/accelerate/pull/2329
* Disable P2P in *just* the 4000 series by muellerzr in https://github.com/huggingface/accelerate/pull/2332
* Avoid duplicating memory for tied weights in `dispatch_model`, and in forward with offloading by fxmarty in https://github.com/huggingface/accelerate/pull/2330
* Show DeepSpeed option when multi-XPU is selected in `accelerate config` by faaany in https://github.com/huggingface/accelerate/pull/2346
* FIX: add oneCCL environment variable for non-MPI launcher (accelerate launch) by faaany in https://github.com/huggingface/accelerate/pull/2339
* device agnostic test_accelerator/test_multigpu by wangshuai09 in https://github.com/huggingface/accelerate/pull/2343
* Fix mpi4py/failing deepspeed test issues by muellerzr in https://github.com/huggingface/accelerate/pull/2353
* Fix `block_size` picking in `megatron_lm_gpt_pretraining` example. by nilq in https://github.com/huggingface/accelerate/pull/2342
* Fix dispatch_model with tied weights test on T4 by fxmarty in https://github.com/huggingface/accelerate/pull/2354
* bugfix to allow usage of TE or MSAMP in `FP8RecipeKwargs` by sudhakarsingh27 in https://github.com/huggingface/accelerate/pull/2355
* Pin DeepSpeed until patch by muellerzr in https://github.com/huggingface/accelerate/pull/2366
* Remove init_hook_kwargs by fxmarty in https://github.com/huggingface/accelerate/pull/2365
* device agnostic optimizer testing by statelesshz in https://github.com/huggingface/accelerate/pull/2363
* `add_hook_to_module` and `remove_hook_from_module` compatibility with fx.GraphModule by fxmarty in https://github.com/huggingface/accelerate/pull/2369
* Adding `requires_grad` to `kwargs` when registering empty parameters. by BlackSamorez in https://github.com/huggingface/accelerate/pull/2376
* Add `adapter_only` option to `save_fsdp_model` and `load_fsdp_model` to only save/load PEFT weights by AjayP13 in https://github.com/huggingface/accelerate/pull/2321
* device agnostic cli/data_loader/grad_sync/kwargs_handlers/memory_utils testing by wangshuai09 in https://github.com/huggingface/accelerate/pull/2356
* Fix batch_size sanity check logic for `split_batches ` by izhx in https://github.com/huggingface/accelerate/pull/2344
* Pin Torch version to <2.2.0 by Rocketknight1 in https://github.com/huggingface/accelerate/pull/2394
* Address PIP-632 deprecation of distutils by AieatAssam in https://github.com/huggingface/accelerate/pull/2388
* [don't merge yet] unpin torch by ydshieh in https://github.com/huggingface/accelerate/pull/2406
* Revert "[don't merge yet] unpin torch" by muellerzr in https://github.com/huggingface/accelerate/pull/2407
* Fix CI due to pytest by muellerzr in https://github.com/huggingface/accelerate/pull/2408
* Added activateEnviroment.sh to readme by TJ-Solergibert in https://github.com/huggingface/accelerate/pull/2409
* Fix XPU inference by notsyncing in https://github.com/huggingface/accelerate/pull/2383
* Fix the size of int and bool type when computing module size by notsyncing in https://github.com/huggingface/accelerate/pull/2411
* Adding Local SGD support for NPU by statelesshz in https://github.com/huggingface/accelerate/pull/2415
* Unpin torch by muellerzr in https://github.com/huggingface/accelerate/pull/2418
* Use Ruff for formatting too by akx in https://github.com/huggingface/accelerate/pull/2400
* torch-native pipeline parallelism for big models by muellerzr in https://github.com/huggingface/accelerate/pull/2345
* Update FSDP docs by pacman100 in https://github.com/huggingface/accelerate/pull/2430
* Make output end up on all GPUs at the end by muellerzr in https://github.com/huggingface/accelerate/pull/2423
* Migrate pippy examples over and run tests by muellerzr in https://github.com/huggingface/accelerate/pull/2424
* [FIX] fix the wrong `nproc_per_node` in the multi gpu test by faaany in https://github.com/huggingface/accelerate/pull/2422
* Fix fp8 things by muellerzr in https://github.com/huggingface/accelerate/pull/2403
* [FIX] allow `Accelerator` to prepare models in eval mode for XPU&CPU by faaany in https://github.com/huggingface/accelerate/pull/2426
* [Fix] make all tests pass on XPU by faaany in https://github.com/huggingface/accelerate/pull/2427

New Contributors
* rishit5 made their first contribution in https://github.com/huggingface/accelerate/pull/2329
* faaany made their first contribution in https://github.com/huggingface/accelerate/pull/2346
* wangshuai09 made their first contribution in https://github.com/huggingface/accelerate/pull/2343
* nilq made their first contribution in https://github.com/huggingface/accelerate/pull/2342
* BlackSamorez made their first contribution in https://github.com/huggingface/accelerate/pull/2376
* AjayP13 made their first contribution in https://github.com/huggingface/accelerate/pull/2321
* Rocketknight1 made their first contribution in https://github.com/huggingface/accelerate/pull/2394
* AieatAssam made their first contribution in https://github.com/huggingface/accelerate/pull/2388
* ydshieh made their first contribution in https://github.com/huggingface/accelerate/pull/2406
* notsyncing made their first contribution in https://github.com/huggingface/accelerate/pull/2383
* akx made their first contribution in https://github.com/huggingface/accelerate/pull/2400

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.26.1...v0.27.0

0.26.1

What's Changed
* Raise error when using batches of different sizes with `dispatch_batches=True` by SunMarc in https://github.com/huggingface/accelerate/pull/2325

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.26.0...v0.26.1

0.26.0

Support for MS-AMP

This release integrates [MS-AMP](https://github.com/Azure/MS-AMP) (Microsoft Automatic Mixed Precision Library) into Accelerate as an alternative backend for FP8 training on appropriate hardware. It is the default backend of choice. Read more in the docs [here](https://huggingface.co/docs/accelerate/concept_guides/low_precision_training). Introduced in https://github.com/huggingface/accelerate/pull/2232 by muellerzr

Core

The prior release introduced a new sampler for the `DataLoader`. Across seeds it shows no statistical difference in the results, but rerunning with the same seed would result in a different end accuracy, which alarmed some users. We have now disabled this behavior by default, since it required some additional setup, and brought back the original implementation. To use the new sampling technique (which can provide more accurate repeated results), pass `use_seedable_sampler=True` to the `Accelerator`. We will be propagating this up to the `Trainer` soon.
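
The idea behind a seedable sampler can be sketched in plain Python: the shuffle order is derived only from a base seed and the epoch number, so it can be reproduced exactly. The `SeedableSampler` class below is a hypothetical stand-in written for illustration and is not Accelerate's implementation.

```python
import random


class SeedableSampler:
    """Yields dataset indices in an order fixed by (seed, epoch)."""

    def __init__(self, num_items, seed=0):
        self.num_items = num_items
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        # A private generator keeps the order independent of global RNG state.
        rng = random.Random(self.seed + self.epoch)
        order = list(range(self.num_items))
        rng.shuffle(order)
        return iter(order)


sampler = SeedableSampler(num_items=8, seed=42)
first_run = list(sampler)
second_run = list(sampler)  # same seed and epoch: identical order
sampler.set_epoch(1)
next_epoch = list(sampler)  # shuffled again from a different derived seed
print(first_run == second_run)
```

In Accelerate itself this behavior is opt-in through `use_seedable_sampler=True`, as described above.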

Big Model Inference

* NPU support was added thanks to statelesshz in https://github.com/huggingface/accelerate/pull/2222
* When generating an automatic `device_map`, we've made it possible to not return grouped key results if desired in https://github.com/huggingface/accelerate/pull/2233
* We now handle corner cases better when users pass `device_map="cuda"` etc thanks to younesbelkada in https://github.com/huggingface/accelerate/pull/2254

FSDP and DeepSpeed

* Many improvements to the docs have been made thanks to stas00. Along with this we've made it easier to adjust the config for the sharding strategy and other config values thanks to pacman100 in https://github.com/huggingface/accelerate/pull/2288

* A regression introduced in Accelerate 0.23.0 made learning much slower on multi-GPU setups than on a single GPU. https://github.com/huggingface/accelerate/pull/2304 has now fixed this thanks to pacman100

* The DeepSpeed integration now also handles `auto` values better when making a configuration in https://github.com/huggingface/accelerate/pull/2313

Bits and Bytes
* `Params4bit` added to bnb classes in set_module_tensor_to_device() by poedator in https://github.com/huggingface/accelerate/pull/2315

Device Agnostic Testing

For developers, we've made it much easier to run the *tests* on different devices with no change to the code thanks to statelesshz in https://github.com/huggingface/accelerate/pull/2123 and https://github.com/huggingface/accelerate/pull/2235

Bug Fixes
* Check notebook launcher for 3090+ by muellerzr in https://github.com/huggingface/accelerate/pull/2212
* Fix dtype bug when `offload_state_dict=True` and `dtype` is specified by fxmarty in https://github.com/huggingface/accelerate/pull/2116
* fix tqdm wrapper to print when process id ==0 by kashif in https://github.com/huggingface/accelerate/pull/2223
* fix BFloat16 is not supported on MPS (2226) by jxysoft in https://github.com/huggingface/accelerate/pull/2227
* Fix MpDeviceLoaderWrapper not having attribute batch_sampler by vanbasten23 in https://github.com/huggingface/accelerate/pull/2242
* [deepspeed] fix setting `auto` values for comm buffers by stas00 in https://github.com/huggingface/accelerate/pull/2295
* Fix infer_auto_device_map when tied weights share the same prefix name by fxmarty in https://github.com/huggingface/accelerate/pull/2324
* Fixes bug in swapping weights when replacing with Transformer-Engine layers by sudhakarsingh27 in https://github.com/huggingface/accelerate/pull/2305
* Fix breakpoint API in test_script.py on TPU. by vanbasten23 in https://github.com/huggingface/accelerate/pull/2263
* Bring old seed technique back by muellerzr in https://github.com/huggingface/accelerate/pull/2319

Major Contributors

* statelesshz for their work on device-agnostic testing and NPU support
* stas00 for many docfixes when it comes to DeepSpeed and FSDP

General Changelog
* add missing whitespace by stas00 in https://github.com/huggingface/accelerate/pull/2206
* MNT Delete the delete doc workflows by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2217
* Update docker images by muellerzr in https://github.com/huggingface/accelerate/pull/2213
* Add allgather check for xpu by abhilash1910 in https://github.com/huggingface/accelerate/pull/2199
* Check notebook launcher for 3090+ by muellerzr in https://github.com/huggingface/accelerate/pull/2212
* Fix dtype bug when `offload_state_dict=True` and `dtype` is specified by fxmarty in https://github.com/huggingface/accelerate/pull/2116
* fix tqdm wrapper to print when process id ==0 by kashif in https://github.com/huggingface/accelerate/pull/2223
* [data_loader] expand the error message by stas00 in https://github.com/huggingface/accelerate/pull/2221
* Update the 'Frameworks using Accelerate' section to include Amphion by RMSnow in https://github.com/huggingface/accelerate/pull/2225
* [Docs] Add doc for cpu/disk offload by SunMarc in https://github.com/huggingface/accelerate/pull/2231
* device agnostic testing by statelesshz in https://github.com/huggingface/accelerate/pull/2123
* Make cleaning optional for device map by muellerzr in https://github.com/huggingface/accelerate/pull/2233
* Add npu support to big model inference by statelesshz in https://github.com/huggingface/accelerate/pull/2222
* fix the DS failing test by pacman100 in https://github.com/huggingface/accelerate/pull/2237
* Fix nb tests by muellerzr in https://github.com/huggingface/accelerate/pull/2230
* fix BFloat16 is not supported on MPS (2226) by jxysoft in https://github.com/huggingface/accelerate/pull/2227
* Fix MpDeviceLoaderWrapper not having attribute batch_sampler by vanbasten23 in https://github.com/huggingface/accelerate/pull/2242
* [`Big-Modeling`] Harmonize device check to handle corner cases by younesbelkada in https://github.com/huggingface/accelerate/pull/2254
* Support `log_images` for aim tracker by Justin900429 in https://github.com/huggingface/accelerate/pull/2257
* Integrate MS-AMP Support for FP8 as a seperate backend by muellerzr in https://github.com/huggingface/accelerate/pull/2232
* refactor deepspeed dataloader prepare logic by pacman100 in https://github.com/huggingface/accelerate/pull/2238
* device agnostic deepspeed&fsdp testing by statelesshz in https://github.com/huggingface/accelerate/pull/2235
* Solve CUDA issues by muellerzr in https://github.com/huggingface/accelerate/pull/2272
* Uninstall DVC in the Trainer tests by muellerzr in https://github.com/huggingface/accelerate/pull/2271
* Rm DVCLive from test reqs as latest version causes failures by muellerzr in https://github.com/huggingface/accelerate/pull/2279
* typo fix by stas00 in https://github.com/huggingface/accelerate/pull/2276
* Add condition before using `check_tied_parameters_on_same_device` by SunMarc in https://github.com/huggingface/accelerate/pull/2218
* [doc] FSDP improvements by stas00 in https://github.com/huggingface/accelerate/pull/2274
* [deepspeed docs] auto-values aren't being covered by stas00 in https://github.com/huggingface/accelerate/pull/2286
* Improve FSDP config usability by pacman100 in https://github.com/huggingface/accelerate/pull/2288
* [doc] language fixes by stas00 in https://github.com/huggingface/accelerate/pull/2292
* Bump tj-actions/changed-files from 22.2 to 41 in /.github/workflows by dependabot in https://github.com/huggingface/accelerate/pull/2300
* add back dvclive to tests by dberenbaum in https://github.com/huggingface/accelerate/pull/2280
* Fixes bug in swapping weights when replacing with Transformer-Engine layers by sudhakarsingh27 in https://github.com/huggingface/accelerate/pull/2305
* Fix breakpoint API in test_script.py on TPU. by vanbasten23 in https://github.com/huggingface/accelerate/pull/2263
* make test_state_checkpointing device agnostic by statelesshz in https://github.com/huggingface/accelerate/pull/2290
* [deepspeed] documentation by stas00 in https://github.com/huggingface/accelerate/pull/2296
* Add more missing items by muellerzr in https://github.com/huggingface/accelerate/pull/2309
* Update docs: Add warning for device_map=None for load_checkpoint_and_dispatch by PhilJd in https://github.com/huggingface/accelerate/pull/2308
* [deepspeed] fix setting `auto` values for comm buffers by stas00 in https://github.com/huggingface/accelerate/pull/2295
* DeepSpeed refactoring by pacman100 in https://github.com/huggingface/accelerate/pull/2313
* Fix DeepSpeed related regression by pacman100 in https://github.com/huggingface/accelerate/pull/2304
* Update test_deepspeed.py by pacman100 in https://github.com/huggingface/accelerate/pull/2323
* Bring old seed technique back by muellerzr in https://github.com/huggingface/accelerate/pull/2319
* Fix batch_size sanity check in `prepare_data_loader` by izhx in https://github.com/huggingface/accelerate/pull/2310
* `Params4bit` added to bnb classes in set_module_tensor_to_device() by poedator in https://github.com/huggingface/accelerate/pull/2315
* Fix infer_auto_device_map when tied weights share the same prefix name by fxmarty in https://github.com/huggingface/accelerate/pull/2324

New Contributors
* fxmarty made their first contribution in https://github.com/huggingface/accelerate/pull/2116
* RMSnow made their first contribution in https://github.com/huggingface/accelerate/pull/2225
* jxysoft made their first contribution in https://github.com/huggingface/accelerate/pull/2227
* vanbasten23 made their first contribution in https://github.com/huggingface/accelerate/pull/2242
* Justin900429 made their first contribution in https://github.com/huggingface/accelerate/pull/2257
* dependabot made their first contribution in https://github.com/huggingface/accelerate/pull/2300
* sudhakarsingh27 made their first contribution in https://github.com/huggingface/accelerate/pull/2305
* PhilJd made their first contribution in https://github.com/huggingface/accelerate/pull/2308
* izhx made their first contribution in https://github.com/huggingface/accelerate/pull/2310
* poedator made their first contribution in https://github.com/huggingface/accelerate/pull/2315

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.25.0...v0.26.0

0.25.0

Safetensors default

As of this release, `safetensors` will be the default format saved when applicable! To read more about safetensors and why it's best to use it for safety (and not pickle/torch.save), check it out [here](https://github.com/huggingface/safetensors)

New Experiment Trackers

This release has two new experiment trackers, ClearML and DVCLive!

To use them, just pass `clearml` or `dvclive` to `log_with` in the `Accelerator` init. h/t to eugen-ajechiloae-clearml and dberenbaum

DeepSpeed

* Accelerate's DeepSpeed integration now supports NPU devices, h/t to statelesshz
* DeepSpeed can now be launched via accelerate on single GPU setups

FSDP

FSDP had a huge refactoring so that the interface when using FSDP is the exact same as every other scenario when using `accelerate`. No more needing to call `accelerator.prepare()` twice!

Other useful enhancements

* We now try to disable P2P communications on consumer GPUs from the 3090 series onward. Without this, users were seeing timeout issues and the like, as NVIDIA dropped P2P support. When using `accelerate launch`, P2P is disabled automatically; if we sense that it is still enabled on a distributed setup using 3090s or newer, we raise an error.

* When doing `.gather()`, if tensors are on different devices we now explicitly raise an error (for now only on CUDA)

Bug fixes

* Fixed a bug that caused dataloaders to not shuffle despite `shuffle=True` when using multiple GPUs and the new `SeedableRandomSampler`.

General Changelog
* Add logs offloading by SunMarc in https://github.com/huggingface/accelerate/pull/2075
* Add ClearML tracker by eugen-ajechiloae-clearml in https://github.com/huggingface/accelerate/pull/2034
* CRITICAL: fix failing ci by muellerzr in https://github.com/huggingface/accelerate/pull/2088
* Fix flag typo by kuza55 in https://github.com/huggingface/accelerate/pull/2090
* Fix batch sampler by muellerzr in https://github.com/huggingface/accelerate/pull/2097
* fixed ip address typo by Fluder-Paradyne in https://github.com/huggingface/accelerate/pull/2099
* Fix memory leak in fp8 causing OOM (and potentially 3x vRAM usage) by muellerzr in https://github.com/huggingface/accelerate/pull/2089
* fix warning when offload by SunMarc in https://github.com/huggingface/accelerate/pull/2105
* Always use SeedableRandomSampler by muellerzr in https://github.com/huggingface/accelerate/pull/2110
* Fix issue with tests by muellerzr in https://github.com/huggingface/accelerate/pull/2111
* Make SeedableRandomSampler the default always by muellerzr in https://github.com/huggingface/accelerate/pull/2117
* Use "and" instead of comma in Bibtex citation by qgallouedec in https://github.com/huggingface/accelerate/pull/2119
* Add explicit error if empty batch received by YuryYakhno in https://github.com/huggingface/accelerate/pull/2115
* Allow for ACCELERATE_SEED env var by muellerzr in https://github.com/huggingface/accelerate/pull/2126
* add DeepSpeed support for NPU by statelesshz in https://github.com/huggingface/accelerate/pull/2054
* Sync states for npu fsdp by jq460494839 in https://github.com/huggingface/accelerate/pull/2113
* Fix import error when torch>=2.0.1 and torch.distributed is disabled by natsukium in https://github.com/huggingface/accelerate/pull/2121
* Make safetensors the default by muellerzr in https://github.com/huggingface/accelerate/pull/2120
* Raise error when saving with param on meta device by SunMarc in https://github.com/huggingface/accelerate/pull/2132
* Leave native `save` as `False` by muellerzr in https://github.com/huggingface/accelerate/pull/2138
* fix retie_parameters by SunMarc in https://github.com/huggingface/accelerate/pull/2137
* Deal with shared memory scenarios by muellerzr in https://github.com/huggingface/accelerate/pull/2136
* specify config file path on README by kwonmha in https://github.com/huggingface/accelerate/pull/2140
* Fix safetensors contiguous by SunMarc in https://github.com/huggingface/accelerate/pull/2145
* Fix more tests by muellerzr in https://github.com/huggingface/accelerate/pull/2146
* [docs] fixed a couple of broken links by MKhalusova in https://github.com/huggingface/accelerate/pull/2147
* [docs] troubleshooting guide by MKhalusova in https://github.com/huggingface/accelerate/pull/2133
* [Docs] fix doc typos by kashif in https://github.com/huggingface/accelerate/pull/2150
* Add note about GradientState being in-sync with the dataloader by default by muellerzr in https://github.com/huggingface/accelerate/pull/2134
* Deprecated runner stuff by muellerzr in https://github.com/huggingface/accelerate/pull/2152
* Add examples to tests by muellerzr in https://github.com/huggingface/accelerate/pull/2131
* Disable pypi for merge workflows + fix trainer tests by muellerzr in https://github.com/huggingface/accelerate/pull/2153
* Adds dvclive tracker by dberenbaum in https://github.com/huggingface/accelerate/pull/2139
* check port availability only in main deepspeed/torchrun launcher by Jingru in https://github.com/huggingface/accelerate/pull/2078
* Do not attempt to pad nested tensors by frankier in https://github.com/huggingface/accelerate/pull/2041
* Add warning for problematic libraries by muellerzr in https://github.com/huggingface/accelerate/pull/2151
* Add ZeRO++ to DeepSpeed usage docs by SumanthRH in https://github.com/huggingface/accelerate/pull/2166
* Fix Megatron-LM Arguments Bug by yuanenming in https://github.com/huggingface/accelerate/pull/2168
* Fix non persistant buffer dispatch by SunMarc in https://github.com/huggingface/accelerate/pull/1941
* Updated torchrun instructions by TJ-Solergibert in https://github.com/huggingface/accelerate/pull/2096
* New CI Runners by muellerzr in https://github.com/huggingface/accelerate/pull/2087
* Revert "New CI Runners" by muellerzr in https://github.com/huggingface/accelerate/pull/2172
* [Working again] New CI by muellerzr in https://github.com/huggingface/accelerate/pull/2173
* fsdp refactoring by pacman100 in https://github.com/huggingface/accelerate/pull/2177
* Pin DVC by muellerzr in https://github.com/huggingface/accelerate/pull/2196
* Apply DVC warning to Accelerate by muellerzr in https://github.com/huggingface/accelerate/pull/2197
* Explicitly disable P2P using `launch`, and pick up in `state` if a user will face issues. by muellerzr in https://github.com/huggingface/accelerate/pull/2195
* Better error when device mismatches when calling gather() on CUDA by muellerzr in https://github.com/huggingface/accelerate/pull/2180
* unpins dvc by dberenbaum in https://github.com/huggingface/accelerate/pull/2200
* Assemble state dictionary for offloaded models by blbadger in https://github.com/huggingface/accelerate/pull/2156
* Allow deepspeed without distributed launcher by pacman100 in https://github.com/huggingface/accelerate/pull/2204

New Contributors
* eugen-ajechiloae-clearml made their first contribution in https://github.com/huggingface/accelerate/pull/2034
* kuza55 made their first contribution in https://github.com/huggingface/accelerate/pull/2090
* Fluder-Paradyne made their first contribution in https://github.com/huggingface/accelerate/pull/2099
* YuryYakhno made their first contribution in https://github.com/huggingface/accelerate/pull/2115
* jq460494839 made their first contribution in https://github.com/huggingface/accelerate/pull/2113
* kwonmha made their first contribution in https://github.com/huggingface/accelerate/pull/2140
* dberenbaum made their first contribution in https://github.com/huggingface/accelerate/pull/2139
* Jingru made their first contribution in https://github.com/huggingface/accelerate/pull/2078
* frankier made their first contribution in https://github.com/huggingface/accelerate/pull/2041
* yuanenming made their first contribution in https://github.com/huggingface/accelerate/pull/2168
* TJ-Solergibert made their first contribution in https://github.com/huggingface/accelerate/pull/2096
* blbadger made their first contribution in https://github.com/huggingface/accelerate/pull/2156

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.24.1...v0.25.0

0.24.1

- Fixes https://github.com/huggingface/accelerate/issues/2091 by changing how checking for custom samplers is done
