Accelerate

Latest version: v0.30.1

0.12.0

New documentation

The documentation has been completely revamped. Take a look [here](https://huggingface.co/docs/accelerate)!

* Complete revamp of the docs by muellerzr in 495

New gather_for_metrics method

When doing distributed evaluation, the dataloader loops back to the beginning of the dataset to make batches whose total size is a round multiple of the number of processes. As a result, the gathered predictions are slightly longer than the dataset, which used to require some manual truncating. This is now all done behind the scenes if you replace the `gather` you did in evaluation with `gather_for_metrics`.

* Reenable Gather for Metrics by muellerzr in 590
* Fix gather_for_metrics by muellerzr in 578
* Add a gather_for_metrics capability by muellerzr in 540
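
As a rough illustration of the truncation that `gather_for_metrics` takes care of, the sketch below simulates the wrap-around padding in plain Python and then shows how the call might sit in an evaluation loop. The helpers `padded_indices`, `truncate_to_dataset` and `evaluate` are hypothetical names made up for this example, the padding rule is a simplification of what the real sampler does, and the model is assumed to return per-class scores for `(inputs, labels)` batches.

```python
def padded_indices(dataset_len, num_processes):
    """Simplified model of the sample indices seen across all processes.

    The sampler wraps around to the start of the dataset so that the total
    number of samples is a multiple of the number of processes.
    """
    remainder = dataset_len % num_processes
    pad = 0 if remainder == 0 else num_processes - remainder
    return list(range(dataset_len)) + [i % dataset_len for i in range(pad)]


def truncate_to_dataset(gathered, dataset_len):
    """What used to be done by hand: drop the duplicated samples at the end."""
    return gathered[:dataset_len]


def evaluate(accelerator, model, eval_dataloader):
    """Usage sketch: assumes model and eval_dataloader went through accelerator.prepare()."""
    model.eval()
    all_predictions, all_labels = [], []
    for inputs, labels in eval_dataloader:
        predictions = model(inputs).argmax(dim=-1)
        # gather_for_metrics gathers across processes and drops the duplicates
        all_predictions.append(accelerator.gather_for_metrics(predictions))
        all_labels.append(accelerator.gather_for_metrics(labels))
    return all_predictions, all_labels
```

With 10 samples and 4 processes, the simulated sampler yields 12 indices, and truncating back to 10 recovers each sample exactly once.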

Balanced device maps

When loading big models for inference, `device_map="auto"` used to fill the GPUs sequentially, making it hard to use a batch size > 1. It now balances the weights evenly on the GPUs so if you have more GPU space than the model size, you can do predictions with a bigger batch size!
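
To make the difference concrete, here is a toy sketch in plain Python comparing a sequential fill with a balanced one. It is not the library's actual algorithm: the function names, the per-GPU budget rule and the layer sizes are assumptions chosen only to show why balancing leaves headroom on every GPU.

```python
def fill(layer_sizes, budgets):
    """Place layers in order, moving to the next GPU once a budget is exhausted."""
    used = [0] * len(budgets)
    gpu = 0
    for size in layer_sizes:
        while gpu < len(budgets) and used[gpu] + size > budgets[gpu]:
            gpu += 1
        if gpu == len(budgets):
            raise MemoryError("model does not fit on the available GPUs")
        used[gpu] += size
    return used


def sequential_fill(layer_sizes, gpu_capacity, num_gpus):
    """Old behaviour (simplified): fill GPU 0 completely, then GPU 1, and so on."""
    return fill(layer_sizes, [gpu_capacity] * num_gpus)


def balanced_fill(layer_sizes, gpu_capacity, num_gpus):
    """New behaviour (simplified): cap each GPU at an even share of the model."""
    share = -(-sum(layer_sizes) // num_gpus)  # ceiling division
    budget = min(gpu_capacity, share)
    # The last GPU keeps its full capacity so leftover layers still fit.
    return fill(layer_sizes, [budget] * (num_gpus - 1) + [gpu_capacity])


layers = [2] * 8  # eight layers of 2 GB each (made-up numbers)
print(sequential_fill(layers, gpu_capacity=12, num_gpus=2))  # [12, 4]
print(balanced_fill(layers, gpu_capacity=12, num_gpus=2))    # [8, 8]
```

In the sequential case the first GPU is completely full, so there is no room left on it for activations from a larger batch; in the balanced case each GPU keeps 4 GB free.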

M1 GPU support

Accelerate now supports M1 GPUs. To learn more about how to set up your environment, see the [documentation](https://huggingface.co/docs/accelerate/v0.12.0/en/usage_guides/mps#accelerated-pytorch-training-on-mac).

* M1 GPU `mps` device integration by pacman100 in 596

What's new?

* Small fixed for balanced device maps by sgugger in 583
* Add balanced option for auto device map creation by sgugger in 534
* fixing deepspeed slow tests issue by pacman100 in 604
* add more conditions on casting by younesbelkada in 606
* Remove redundant `.run` in `WandBTracker`. by zh-plus in 605
* Fix some typos + wordings by muellerzr in 603
* reorg of test scripts and minor changes to tests by pacman100 in 602
* Move warning by muellerzr in 598
* Shorthand way to grab a tracker by muellerzr in 594
* Pin deepspeed by muellerzr in 595
* Improve docstring by muellerzr in 591
* TESTS! by muellerzr in 589
* Fix DispatchDataloader by sgugger in 588
* Use main_process_first in the examples by muellerzr in 581
* Skip and raise NotImplementedError for gather_for_metrics for now by muellerzr in 580
* minor FSDP launcher fix by pacman100 in 579
* Refine test in set_module_tensor_to_device by sgugger in 577
* Fix `set_module_tensor_to_device` by sgugger in 576
* Add 8 bit support - chapter II by younesbelkada in 539
* Fix tests, add wandb to gitignore by muellerzr in 573
* Fix step by muellerzr in 572
* Speed up main CI by muellerzr in 571
* ccl version check and import different module according to version by sywangyi in 567
* set default num_cpu_threads_per_process to improve oob performance by sywangyi in 562
* Add a tqdm helper by muellerzr in 564
* Rename actions to be a bit more accurate by muellerzr in 568
* Fix clean by muellerzr in 569
* enhancements and fixes for FSDP and DeepSpeed by pacman100 in 532
* fix: saving model weights by csarron in 556
* add on_main_process decorators by ZhiyuanChen in 488
* Update imports.py by KimBioInfoStudio in 554
* unpin `datasets` by lhoestq in 563
* Create good defaults in `accelerate launch` by muellerzr in 553
* Fix a few minor issues with example code in docs by BenjaminBossan in 551
* deepspeed version `0.6.7` fix by pacman100 in 544
* Rename test extras to testing by muellerzr in 545
* Add production testing + fix failing CI by muellerzr in 547
* Add a gather_for_metrics capability by muellerzr in 540
* Allow for kwargs to be passed to trackers by muellerzr in 542
* Add support for downcasting bf16 on TPUs by muellerzr in 523
* Add more documentation for device maps computations by sgugger in 530
* Restyle prepare one by muellerzr in 531
* Pick a better default for offload_state_dict by sgugger in 529
* fix some parameter setting does not work for CPU DDP and bf16 fail in… by sywangyi in 527
* Fix accelerate tests command by sgugger in 528

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* sywangyi
* ccl version check and import different module according to version (567)
* set default num_cpu_threads_per_process to improve oob performance (562)
* fix some parameter setting does not work for CPU DDP and bf16 fail in… (527)
* ZhiyuanChen
* add on_main_process decorators (488)

0.11.0

Gradient Accumulation

Accelerate can now handle gradient accumulation for you: pass `gradient_accumulation_steps=xxx` when instantiating the `Accelerator` and put each step of your training loop under a `with accelerator.accumulate(model):` block. Accelerate will then take care of the loss re-scaling and gradient accumulation (avoiding slowdowns in distributed training, since gradients only need to be synced when you actually step). More details in the [documentation](https://huggingface.co/docs/accelerate/gradient_accumulation#letting-accelerate-handle-gradient-accumulation).

* Add gradient accumulation doc by muellerzr in 511
* Make gradient accumulation work with dispatched dataloaders by muellerzr in 510
* Introduce automatic gradient accumulation wrapper + fix a few test issues by muellerzr in 484
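
A minimal sketch of the gradient accumulation pattern described above might look like the following. The function names are hypothetical, the batch format `(inputs, targets)` is an assumption, and `sync_schedule` is only a simplified plain-Python picture of when gradients get synced, not the library's internal logic.

```python
def sync_schedule(num_batches, accumulation_steps):
    """Simplified view: True on the batches where gradients are synced and the optimizer steps."""
    return [(step + 1) % accumulation_steps == 0 for step in range(num_batches)]


def train_one_epoch(accelerator, model, optimizer, dataloader, loss_fn):
    """Usage sketch.

    Assumes accelerator = Accelerator(gradient_accumulation_steps=N) and that
    model, optimizer and dataloader were returned by accelerator.prepare(...).
    """
    model.train()
    for inputs, targets in dataloader:
        with accelerator.accumulate(model):
            loss = loss_fn(model(inputs), targets)
            accelerator.backward(loss)  # loss re-scaling is handled here
            optimizer.step()            # the prepared optimizer only steps on sync batches
            optimizer.zero_grad()


print(sync_schedule(6, 3))  # [False, False, True, False, False, True]
```

The training loop itself stays the same as without accumulation; only the `accumulate` context manager and the `Accelerator` argument are added.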

Support for SageMaker Data parallelism

Accelerate now supports SageMaker's specific brand of data parallelism.

* SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging by pacman100 in 504
* SageMaker DP Support by pacman100 in 494

What's new?

* Fix accelerate tests command by sgugger in 528
* FSDP integration enhancements and fixes by pacman100 in 522
* Warn user if no trackers are installed by muellerzr in 524
* Fixup all example CI tests and properly fail by muellerzr in 517
* fixing deepspeed multi-node launcher by pacman100 in 514
* Add special Parameters modules support by younesbelkada in 519
* Don't unwrap in save_state() by cccntu in 489
* Fix a bug when reduce a tensor. by wwhio in 513
* Add benchmarks by sgugger in 506
* Fix DispatchDataLoader length when `split_batches=True` by sgugger in 509
* Fix scheduler in gradient accumulation example by muellerzr in 500
* update dataloader wrappers to have `total_batch_size` attribute by pacman100 in 493
* Introduce automatic gradient accumulation wrapper + fix a few test issues by muellerzr in 484
* add use_distributed property by ZhiyuanChen in 487
* fixing fsdp autowrap functionality by pacman100 in 475
* Use datasets 2.2.0 for now by muellerzr in 481
* Rm gradient accumulation on TPU by muellerzr in 479
* Revert "Pin datasets for now by muellerzr in 477)"
* Pin datasets for now by muellerzr in 477
* Some typos and cosmetic fixes by douwekiela in 472
* Fix when TPU device check is ran by muellerzr in 469
* Refactor Utility Documentation by muellerzr in 467
* Add docbuilder to quality by muellerzr in 468
* Expose some is_*_available utils in docs by muellerzr in 466
* Cleanup CI Warnings by muellerzr in 465
* Link CI slow runners to the commit by muellerzr in 464
* Fix subtle bug in BF16 by muellerzr in 463
* Include bf16 support for TPUs and CPUs, and a better check for if a CUDA device supports BF16 by muellerzr in 462
* Handle bfloat16 weights in disk offload without adding memory overhead by noamwies in 460
* Handle bfloat16 weights in disk offload by sgugger in 460
* Raise a clear warning if a user tries to modify the AcceleratorState by muellerzr in 458
* Right step point by muellerzr in 459
* Better checks for if a TPU device exists by muellerzr in 456
* Offload and modules with unused submodules by sgugger in 442

0.10.0

This release adds two major new features: the DeepSpeed integration has been revamped to match the one in Transformers Trainer, with multiple new options unlocked, and the TPU integration has been sped up.

This version also officially stops supporting Python 3.6 and requires Python 3.7+.

DeepSpeed integration revamp

Users can now specify a DeepSpeed config file when they want to use DeepSpeed, which unlocks many new options. More details in the new [documentation](https://huggingface.co/docs/accelerate/deepspeed).

* Migrate HFDeepSpeedConfig from trfrs to accelerate by pacman100 in 432
* DeepSpeed Revamp by pacman100 in 405

TPU speedup

If you're using TPUs, we have sped up the dataloaders and models quite a bit, on top of a few bug fixes.

* Revamp TPU internals to be more efficient + enable mixed precision types by muellerzr in 441

What's new?

* Fix docstring by muellerzr in 447
* Add psutil as depenedency by sgugger in 445
* fix fsdp torch version dependency by pacman100 in 437
* Create Gradient Accumulation Example by muellerzr in 431
* init by muellerzr in 429
* Introduce `no_sync` context wrapper + clean up some more warnings for DDP by muellerzr in 428
* updating tests to resolve runner failures wrt deepspeed revamp by pacman100 in 427
* Fix secrets in Docker workflow by muellerzr in 426
* Introduce a Dependency Checker to trigger new Docker Builds on main by muellerzr in 424
* Enable slow tests nightly by muellerzr in 421
* Push out python 3.6 + fix all tests related to the upgrade by muellerzr in 420
* Speedup main CI by muellerzr in 419
* Switch to evaluate for metrics by sgugger in 417
* Create an issue template for Accelerate by muellerzr in 415
* Introduce post-merge runners by muellerzr in 416
* Fix debug_launcher issues by muellerzr in 413
* Use main egg by muellerzr in 414
* Introduce nightly runners by muellerzr in 410
* Update requirements to pin tensorboard and include psutil by muellerzr in 408
* Fix CUDA examples tests by muellerzr in 407
* Move datasets and transformers to under func by muellerzr in 411
* Fix CUDA Dockerfile by muellerzr in 409
* Hotfix all failing GPU tests by muellerzr in 401
* improve metrics logged in examples by pacman100 in 399
* Refactor offload_state_dict and fix in offload_weight by sgugger in 398
* Refactor version checking into a utility by muellerzr in 395
* Include fastai in frameworks by muellerzr in 396
* Add packaging to requirements by muellerzr in 394
* Better dispatch for submodules by sgugger in 392
* Build Docker Images nightly by muellerzr in 391
* Small bugfix for the stalebot workflow by muellerzr in 390
* Introduce stalebot by muellerzr in 387
* Create Dockerfiles for Accelerate by muellerzr in 377
* Mix precision -> Mixed precision by muellerzr in 388
* Fix OneCycle step length when in multiprocess by muellerzr in 385

0.9.0

This release offers no significant new API; it is only needed to have access to some utils in Transformers.

* Handle deprication errors in launch by muellerzr in 360
* Update launchers.py by tmabraham in 363
* fix tracking by pacman100 in 361
* Remove tensor call by muellerzr in 365
* Add a utility for writing a barebones config file by muellerzr in 371
* fix deepspeed model saving by pacman100 in 370
* deepspeed save model temp fix by pacman100 in 374
* Refactor tests to use accelerate launch by muellerzr in 373
* fix zero stage-1 by pacman100 in 378
* fix shuffling for ShufflerIterDataPipe instances by loubnabnl in 376
* Better check for deepspeed availability by sgugger in 379
* Refactor some parts in utils by sgugger in 380

0.8.0

Big model inference

To handle very large models, new functionality has been added in Accelerate:
- a context manager to initialize empty models
- a function to load a sharded checkpoint directly on the right devices
- a set of custom hooks that allow execution of a model split on different devices, as well as CPU or disk offload
- a magic method that auto-determines a device map for a given model, making the most of GPU space and available RAM before using disk offload as a last resort
- a function that wraps the last three blocks in one simple call (`load_checkpoint_and_dispatch`)
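
The pieces listed above could be combined roughly as sketched below. `toy_device_map` is a made-up, plain-Python illustration of the "GPU first, then RAM, then disk" priority and not the library's real placement logic; `build_model` and `checkpoint_path` are hypothetical placeholders, and the second function assumes `accelerate` and PyTorch are installed.

```python
def toy_device_map(layer_sizes, gpu_budget, cpu_budget):
    """Assign layers in order: GPU 0 while it has room, then CPU RAM, then disk."""
    tiers = [0, "cpu", "disk"]
    budgets = {0: gpu_budget, "cpu": cpu_budget}
    device_map, tier = {}, 0
    for name, size in layer_sizes.items():
        # Move down a tier when the current one cannot hold this layer.
        while tiers[tier] != "disk" and size > budgets[tiers[tier]]:
            tier += 1
        device = tiers[tier]
        if device != "disk":
            budgets[device] -= size
        device_map[name] = device
    return device_map


def load_big_model(build_model, checkpoint_path):
    """Usage sketch of the one-call helper; build_model is any callable returning a torch module."""
    from accelerate import init_empty_weights, load_checkpoint_and_dispatch

    with init_empty_weights():
        model = build_model()  # parameters are created without allocating memory
    return load_checkpoint_and_dispatch(model, checkpoint_path, device_map="auto")


print(toy_device_map({"a": 4, "b": 4, "c": 4, "d": 4}, gpu_budget=8, cpu_budget=4))
# {'a': 0, 'b': 0, 'c': 'cpu', 'd': 'disk'}
```
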

See more in the [documentation](https://huggingface.co/docs/accelerate/main/en/big_modeling).

* Big model inference by sgugger in 345

What's new

* Create peak_memory_uasge_tracker.py by pacman100 in 336
* Fixed a typo to enable running accelerate correctly by Idodox in 339
* Introduce multiprocess logger by muellerzr in 337
* Refactor utils into its own module by muellerzr in 340
* Improve num_processes question in CLI by muellerzr in 343
* Handle Manual Wrapping in FSDP. Minor fix of fsdp example. by pacman100 in 342
* Better prompt for number of training devices by muellerzr in 344
* Fix prompt for num_processes by pacman100 in 347
* Fix sample calculation in examples by muellerzr in 352
* Fixing metric eval in distributed setup by pacman100 in 355
* DeepSpeed and FSDP plugin support through script by pacman100 in 356

0.7.1

- Fix fdsp config in cluster [331](https://github.com/huggingface/accelerate/pull/331)
- Add guards for batch size finder [334](https://github.com/huggingface/accelerate/pull/334)
- Patchfix infinite loop [335](https://github.com/huggingface/accelerate/pull/335)
