🚨 VRAM consumption 🚨
The `Llama`, `Cohere`, and `Gemma` models no longer cache the triangular causal mask unless the `static` cache is used. This caching was reverted by 29753, which fixes the backward-compatibility issues with respect to speed and memory consumption while still supporting compile and the static cache. Small note: `fx` is not yet supported for these models; a patch will follow very soon!
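To see why caching the full triangular causal mask matters for VRAM, a quick back-of-the-envelope sketch (the helper name is illustrative, not part of the library): a causal mask is a `(seq_len, seq_len)` matrix, so a dense fp32 copy grows quadratically with context length.

```python
def causal_mask_bytes(seq_len: int, bytes_per_elem: int = 4) -> int:
    """Memory of a dense (seq_len, seq_len) causal mask, fp32 by default."""
    return seq_len * seq_len * bytes_per_elem

# At a 4k context the mask is modest, but at a 128k context
# a dense fp32 mask alone would occupy 64 GiB:
print(causal_mask_bytes(4 * 1024) / 2**20)    # MiB at 4k context
print(causal_mask_bytes(128 * 1024) / 2**30)  # -> 64.0 GiB at 128k context
```

This is why the mask is now materialized only when needed (or with the bounded `static` cache) rather than kept resident.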
New model addition
Cohere open-source model
Command-R is a generative model optimized for long context tasks such as retrieval augmented generation (RAG) and using external APIs and tools. It is designed to work in concert with Cohere's industry-leading Embed and Rerank models to provide best-in-class integration for RAG applications and excel at enterprise use cases. As a model built for companies to implement at scale, Command-R boasts:
- Strong accuracy on RAG and Tool Use
- Low latency, and high throughput
- Longer 128k context and lower pricing
- Strong capabilities across 10 key languages
- Model weights available on HuggingFace for research and evaluation
- Cohere Model Release by saurabhdash2512 in 29622