MLX

Latest version: v0.13.1


Page 2 of 4

0.9.0

Highlights
- Fast partial RoPE (used by Phi-2)
- Fast gradients for RoPE, RMSNorm, and LayerNorm
- Up to 7x faster ([benchmarks](https://github.com/ml-explore/mlx/pull/883#issue-2204137982))

Core
- More overhead reductions
- Partial fast RoPE (speeds up Phi-2)
- Better buffer donation for copy
- Type hierarchy and issubdtype
- Fast VJPs for RoPE, RMSNorm, and LayerNorm

NN
- `Module.set_dtype`
- Chaining in `nn.Module` (`model.freeze().update(…)`)

Bugfixes
- Fix set item bugs
- Fix scatter vjp
- Check for shape integer overflow on array construction
- Fix bug with module attributes
- Fix two bugs for odd-shaped QMV
- Fix GPU sort for large sizes
- Fix bug in negative padding for convolutions
- Fix bug in multi-stream race condition for graph evaluation
- Fix random normal generation for half precision

0.8.0

Highlights

- More perf!
- `mx.fast.rms_norm` and `mx.fast.layer_norm`
- Switch to nanobind, which [substantially reduces overhead](https://github.com/ml-explore/mlx/pull/839#issuecomment-2002659144)
- Up to 4x faster `__setitem__` (e.g. `a[...] = b`)

Core

- `mx.inverse`, CPU only
- vmap over `mx.matmul` and `mx.addmm`
- Switch to nanobind from pybind11
- Faster `__setitem__` indexing
- [Benchmarks](https://github.com/ml-explore/mlx/pull/861#issuecomment-2010791492)
- `mx.fast.rms_norm`, [token generation benchmark](https://github.com/ml-explore/mlx/pull/862)
- `mx.fast.layer_norm`, [token generation benchmark](https://github.com/ml-explore/mlx/pull/870#issuecomment-2013707376)
- vmap for inverse and svd
- Faster non-overlapping pooling

Optimizers
- Set minimum value in cosine decay scheduler

Bugfixes
- Fix bug in multi-dimensional reduction

0.7.0

Highlights
- Perf improvements for attention ops:
- No-copy broadcast matmul ([benchmarks](https://github.com/ml-explore/mlx/pull/801#issuecomment-1989548617))
- Fewer copies in reshape

Core

- Faster broadcast + gemm
- [benchmarks](https://github.com/ml-explore/mlx/pull/801#issuecomment-1989548617)
- `mx.linalg.svd` (CPU only)
- Fewer copies in reshape
- Faster small reductions
- [benchmarks](https://github.com/ml-explore/mlx/pull/826#issue-2182833003)

NN
- `nn.RNN`, `nn.LSTM`, `nn.GRU`

Bugfixes
- Fix bug in depth traversal ordering
- Fix two edge case bugs in compilation
- Fix bug with modules with dictionaries of weights
- Fix bug with scatter which broke MOE training
- Fix bug with compilation kernel collision

0.6.0

Highlights
- Faster quantized matrix-vector multiplies
- [Benchmarks](https://github.com/ml-explore/mlx/pull/786#issue-2168529706)
- `mx.fast.scaled_dot_product_attention` fused op

Core
- Memory allocation API improvements
- Faster GPU reductions for smaller sizes (between 2 and 7x)
- [Benchmarks](https://github.com/ml-explore/mlx/pull/783#issuecomment-1977460059)
- `mx.fast.scaled_dot_product_attention` fused op
- Faster quantized matrix-vector multiplications
- Pickle support for `mx.array`

NN
- Dilation on convolution layers

Bugfixes
- Fix `mx.topk`
- Fix reshape for zero sizes

0.5.0

Highlights

- Faster convolutions.
- Up to 14x faster for some common sizes.
- See [benchmarks](https://github.com/ml-explore/mlx/pull/651#issuecomment-1969759609)

Core
- `mx.where` properly handles `inf`
- Faster and more general convolutions
- Input and kernel dilation
- Asymmetric padding
- Support for cross-correlation and convolution
- `atleast_{1,2,3}d` accept any number of arrays


NN
- `nn.Upsample` layer
- Supports nearest neighbor and linear interpolation
- Any number of dimensions

Optimizers
- Linear schedule and schedule joiner:
- Use for e.g. linear warmup + cosine decay

Bugfixes
- `arange` throws on `inf` inputs
- Fix CMake build with MLX
- Fix `logsumexp` `inf` edge case
- Fix grad of power w.r.t. the exponent edge case
- Fix compile with `inf` constants
- Fix temporary bug in convolution

0.4.0

Highlights

- Partial shapeless compilation
- Default shapeless compilation for all activations
- Can be more than 5x faster than uncompiled versions
- CPU kernel fusion
- Some functions can be up to [10x faster](https://github.com/ml-explore/mlx/pull/691#issuecomment-1947708207)

Core

- CPU compilation
- Shapeless compilation for some cases
- `mx.compile(function, shapeless=True)`
- Up to 10x faster scatter: [benchmarks](https://github.com/ml-explore/mlx/pull/709#issuecomment-1951733610)
- `mx.atleast_1d`, `mx.atleast_2d`, `mx.atleast_3d`

Bugfixes

- Fix `tolist` with `bfloat16` and `float16`
- Fix `argmax` on M3
