Accera

Latest version: v1.2.29


Page 3 of 6

1.2.17

What's Changed

- Merged PR 3029: Work around constraint resolution issues with dynamic
split size 1. [Mason Remy]

Work around constraint resolution issues with dynamic split size 1

**Full Changelog**: https://github.com/microsoft/Accera/compare/v1.2.16...v1.2.17

1.2.16

What's Changed
- Merged PR 3027: Hack required to use Array as output element argument
(Dimension) [Captain Jack Sparrow]

- Merged PR 3025: Add arg name and size string required for hat
metadata. [Captain Jack Sparrow]

Add arg name and size string required for hat metadata

- Merged PR 3017: Output array supports gather function. [Denny Sun]

Add the dsl test for gather function.

**Full Changelog**: https://github.com/microsoft/Accera/compare/v1.2.15...v1.2.16

1.2.15

What's Changed
- Merged PR 3018: Use VS 17.4.3-built binaries. This is in a separate
channel to allow older ve... [Mason Remy]

Use VS 17.4.3-built binaries. This is in a separate channel to allow older versions to keep working
- Merged PR 3012: Correctness check for output array support for range
node. [Denny Sun]

Successful correctness check means output array support can work end to end.
- Merged PR 3015: Update hatlib version to support floating type as
function arg. [Denny Sun]

Update hatlib version to support floating type as function arg
- Merged PR 3010: Disable BinOp simplification for floating types.
[Captain Jack Sparrow]

Disable BinOp simplification for floating types
- Merged PR 3013: Apply major version in docs. [Lisa Ong]

Removes the need to update docs versions every time we release
- Merged PR 2981: Prologue and Epilogue op support with tensorization
and caching. [Captain Jack Sparrow]

- Add optional prologue and epilogue support for tensorization
- Supported gemm parameters with fragment ops are: {alpha: 1, beta: any} and {alpha: >1, beta: 0}
- ReLU, SET, and SCALE added as standard fragment ops
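
The two supported parameter combinations above can be illustrated with a minimal pure-Python sketch. It does not use Accera's API: `gemm_with_fragment_ops`, `scale`, `set_to`, and `relu` are hypothetical helpers, and the mapping of `alpha`/`beta` onto SET and SCALE ops is an assumption about how the fragment ops compose, not something stated in the release notes.

```python
def gemm_with_fragment_ops(A, B, C, prologue=None, epilogue=None):
    """Compute epilogue(prologue(C) + A @ B) element by element on nested lists."""
    M, K, N = len(A), len(B), len(B[0])
    out = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = C[i][j]
            if prologue is not None:
                acc = prologue(acc)      # applied to the accumulator before the multiply-adds
            for k in range(K):
                acc += A[i][k] * B[k][j]
            if epilogue is not None:
                acc = epilogue(acc)      # applied to the accumulator after the multiply-adds
            out[i][j] = acc
    return out


# Hypothetical stand-ins for the SCALE, SET, and ReLU fragment ops.
def scale(factor):
    return lambda x: factor * x


def set_to(value):
    return lambda x: value


def relu(x):
    return max(x, 0.0)


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, -1.0]]
C = [[10.0, 10.0], [10.0, 10.0]]

# alpha = 1, beta = 0.5: scale the existing C by beta in the prologue.
r_beta = gemm_with_fragment_ops(A, B, C, prologue=scale(0.5))

# alpha = 2, beta = 0: reset the accumulator, then scale the product by alpha.
r_alpha = gemm_with_fragment_ops(A, B, C, prologue=set_to(0.0), epilogue=scale(2.0))

# ReLU as an epilogue on a plain matrix product.
r_relu = gemm_with_fragment_ops(A, B, C, prologue=set_to(0.0), epilogue=relu)
```

In this reading, a prologue touches the accumulator once before accumulation and an epilogue touches it once afterwards, which is why each supported combination needs at most one scaling step on either side.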

Related work items: 3704

**Full Changelog**: https://github.com/microsoft/Accera/compare/v1.2.14...v1.2.15

1.2.14

- Merged PR 3001: [test] Expect failures on macos for x86 intrinsics
tests. [Lisa Ong]

macOS does not support x86 and x86 AVX intrinsics
- Merged PR 3000: Expect failures for macos in vpmaddwd tests. [Lisa
Ong]
- Merged PR 2994: Bump hatlib to 0.0.32. [Lisa Ong]
- Merged PR 2997: Support more casting cases in vpmaddwd matcher. [Mason
Remy]

Support more casting cases in vpmaddwd matcher
- Merged PR 2996: [release] bump docs to 1.2.14 for next release. [Lisa
Ong]

bump docs to 1.2.14 for next release

**Full Changelog**: https://github.com/microsoft/Accera/compare/v1.2.13...v1.2.14

1.2.13

- Merged PR 2987: Add support for max/min/round ops and vectorizing
those ops. [Mason Remy]

Add support for max/min/round ops and vectorizing those ops
- Merged PR 2963: Control TEMP array allocation location. [Mason Remy]

Control TEMP array allocation location
- Merged PR 2962: Expand vpmaddwd matching and add intrinsic call.
[Mason Remy]

Expand vpmaddwd matching and add intrinsic call

Matches more vpmaddwd cases and creates a pathway to invoking the LLVM
intrinsic directly.
- Merged PR 2961: Match more vectorization patterns and support
vectorized cast. [Mason Remy]

Match more vectorization patterns and support vectorized cast

Tries to match and rewrite vectorization patterns:
- 2-loop interleaving store -> vector shuffle and store
- simple horizontal reductions (not always efficient currently)
- vectorized casts

Vectorizing a non-innermost loop now performs a per-op "in-place" unroll and
vectorizes the innermost loop.
TODO: update the documentation to describe this behavior better
- Merged PR 2960: Enable marking functions as no-inline-into. [Mason
Remy]

Enable marking functions as no-inline-into

Functions marked no-inline-into
won't inline calls to other functions within their body. This
is a useful compiler performance (not emitted code performance)
optimization when there are many nested function calls
- Merged PR 2986: [output array] Emit range function with input_output
type arguments. [Denny Sun]

Instead of the output type, input_output is used to generate two functions for the Range function.
With this change, Accera can successfully generate code for the Range function.

Generate functions like:
get_size(float start, float limit, float delta, int64_t* output_dim);
get_array(int64_t input_dim, float* output, float start, float delta);
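
The two-function pattern above (first ask for the size, then fill a buffer of that size) can be sketched in plain Python. This is an illustration only: the size formula `ceil((limit - start) / delta)` is assumed from the usual half-open range convention and is not taken from the generated code, and the sketch returns values instead of writing through output pointers.

```python
import math


def get_size(start, limit, delta):
    """Number of elements in the half-open range [start, limit) with step delta."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    return max(0, math.ceil((limit - start) / delta))


def get_array(input_dim, start, delta):
    """Produce the range values once the caller knows how many there are."""
    return [start + i * delta for i in range(input_dim)]


# The caller first queries the dynamic output dimension, then requests the data.
n = get_size(0.0, 5.0, 2.0)
values = get_array(n, 0.0, 2.0)
```

Splitting the work this way lets the caller allocate the output buffer between the two calls, which is what a C caller of the emitted functions would have to do.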

- Merged PR 2959: Improved affine for op range simplification. [Mason
Remy]

Improved affine for op range simplification

Add range value / constant-cmp-result patterns and affine for op range
simplifications to the affine simplification pass and run it after
inlining functions.
When inlining a dynamically-sized function into a statically-sized
function, this change is useful for resolving the dynamic ranges to
constants and pruning dynamic-range loops that are not needed given the
specific constant value being used.
- Merged PR 2958: Hack to erase loops in a nest to support nest-of-nest
or overfused. [Mason Remy]

Hack to erase loops in a nest to support nest-of-nest or overfused
scenarios

This change enables an action plan to erase loops. Typically this would
be used when an outer nest traverses tiles and invokes an inner nest (or
multiple nests) that operates within each tile. The outer nest still
needs to cover the full iteration space; however, after splitting by the
tile sizes, a user will not want the outer nest to perform the inner
loops
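
The nest-of-nest shape described here can be sketched in plain Python. This is not Accera code: `outer_nest` and `inner_nest` are hypothetical names, and the sketch only shows the loop structure that remains once the outer nest's inner loops are erased.

```python
def inner_nest(data, i0, j0, tile_i, tile_j):
    """Inner nest: visits only the elements of the tile that starts at (i0, j0)."""
    rows, cols = len(data), len(data[0])
    for i in range(i0, min(i0 + tile_i, rows)):
        for j in range(j0, min(j0 + tile_j, cols)):
            data[i][j] += 1


def outer_nest(data, tile_i, tile_j):
    """Outer nest: steps over tile origins across the full iteration space."""
    rows, cols = len(data), len(data[0])
    for i0 in range(0, rows, tile_i):
        for j0 in range(0, cols, tile_j):
            # The within-tile loops that a split would normally leave here
            # are erased; the inner nest owns them instead.
            inner_nest(data, i0, j0, tile_i, tile_j)


grid = [[0] * 5 for _ in range(4)]
outer_nest(grid, tile_i=2, tile_j=2)
```

Every element is visited exactly once, which is the property the outer nest must preserve even though it no longer runs the inner loops itself.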
- Merged PR 2985: [release] Rev docs to 1.2.13. [Lisa Ong]
- Merged PR 2983: Increase timeouts of GPU benchmarks. [Captain Jack
Sparrow]

Increase timeouts of GPU benchmarks
- Merged PR 2982: Work around bug with redundant splits of dynamic
dimensions. [Mason Remy]

Work around bug with redundant splits of dynamic dimensions
- Merged PR 2972: Build both static and dynamic binaries by default, put
both in aux dependencies. [Kern Handa]
- Merged PR 2975: Updates llc/opt build flags to enable more
optimizations by default. [Kern Handa]

Updates llc/opt build flags to enable more optimizations by default
- Merged PR 2977: Updates CMake to do FindPython before pybind11 config.
[Kern Handa]

Updates CMake to do FindPython before pybind11 config
- Merged PR 2955: Reduce Linux PR runtime to under 60mins. [Lisa Ong]

Filter DEV_MODE reruns to dsl_tests.py. This is not comprehensive; it is a best effort.

**Full Changelog**: https://github.com/microsoft/Accera/compare/v1.2.12...v1.2.13

1.2.12

What's Changed

- Merged PR 2953: Workaround debug mode failures with dimension argument
ordering. [Lisa Ong]

* Order dimension arguments after Array args to avoid this lowering issue in Debug mode (until Debug mode is fixed)

test_all_dynamic_sizes_static_unroll_matmul_llvm.mlir:236:28: error: use of value '%7' expects different type than prior uses: 'i64' vs '!llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>'
%42 = llvm.insertvalue %7, %41[3, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>
^
/Users/lisaong/work/staging/Accera/build/lib.macosx-11.1-arm64-3.10/test_acccgen/test_all_dynamic_sizes_static_unroll_matmul/_tmp/test_all_dynamic_sizes_static_unroll_matmul/test_all_dynamic_sizes_static_unroll_matmul_llvm.mlir:201:5: note: prior use here
%7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>
^

* Enable DEV_MODE tests in one CI pipeline so that we can catch these in the future
- Merged PR 2950: [Release] Rev docs to v1.2.12. [Lisa Ong]

In preparation for 1.2.12 release EOW
- Merged PR 2946: Fix hierarchical partial fusing. [Mason Remy]

Fix hierarchical partial fusing

Index attributes in fragment predicate ops weren't getting updated as
part of fusion mapping old indices to new fused indices. This fix is a
quick change to recursively walk predicates and update their index
attributes manually.
In the future we could use SymbolicIndexOps and rely on
BlockAndValueMapping replacements in clone. However, this would also
require that we don't create as many duplicate SymbolicIndexOps for the
same Index
- Merged PR 2942: Hold onto intermediate split indices when fusing.
[Mason Remy]

Hold onto intermediate split indices when fusing

When we split a loop multiple times, the outer index references the
inner intermediate split indices in affine expressions, even if those
indices get further split and are no longer loop indices. We have been
discarding them because they aren't loop indices or dimension indices,
but they wound up getting re-added to the transformed domain by
serialization and this led to fusion bugs.
- Merged PR 2834: match and rewrite a pattern to vectorize int16 matmul.
[JUBI TANEJA]

This rewrite rule matches the `jj` and `kk` loops in int16 matmul, where an outer loop `jj` `{0..8}` is followed by an inner loop `kk` `{0..2}`. It vectorizes the `jj` and `kk` loops and replaces each affine op with a vectorized op. At the end, it generates a `vpmaddwd` instruction for MatMul.
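
For readers unfamiliar with `vpmaddwd`, the following pure-Python sketch emulates what the instruction computes and how an 8-wide `jj` block with a 2-wide `kk` block maps onto it. The helper names and the data layout (A values broadcast, B values interleaved in pairs) are assumptions made for illustration; the actual rewrite operates on MLIR ops, not Python lists.

```python
def to_int32(x):
    """Wrap a Python int to signed 32-bit, as the hardware lanes do."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def pmaddwd(a, b):
    """Multiply int16 pairs and add adjacent products into int32 lanes."""
    assert len(a) == len(b) and len(a) % 2 == 0
    for v in a + b:
        assert -32768 <= v <= 32767, "inputs must fit in int16"
    return [to_int32(a[i] * b[i] + a[i + 1] * b[i + 1]) for i in range(0, len(a), 2)]


def matmul_row_int16(a_row, B, jj=8):
    """One output row of an int16 matmul, accumulated in int32 via pmaddwd."""
    K, N = len(a_row), len(B[0])
    assert K % 2 == 0 and N % jj == 0
    c = [0] * N
    for k in range(0, K, 2):                      # kk block of 2
        for j0 in range(0, N, jj):                # jj block of 8
            a_vec = [a_row[k], a_row[k + 1]] * jj  # broadcast the A pair
            b_vec = []
            for j in range(j0, j0 + jj):           # interleave B[k][j], B[k+1][j]
                b_vec += [B[k][j], B[k + 1][j]]
            partial = pmaddwd(a_vec, b_vec)        # 8 int32 partial sums
            for lane in range(jj):
                c[j0 + lane] = to_int32(c[j0 + lane] + partial[lane])
    return c


a_row = [1, -2, 3, 4]
B = [[k + j for j in range(8)] for k in range(4)]
c_row = matmul_row_int16(a_row, B)
```

Each `pmaddwd` call consumes sixteen int16 pairs and yields eight int32 sums, so one call covers the whole `jj` x `kk` block.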
- Merged PR 2918: Support vectorization and static size caching for
split dynamic range. [Mason Remy]

Support vectorization and static size caching for split dynamic range
loops
- Merged PR 2914: Support static loop splits of dynamic sized ranges.
[Mason Remy]

Support static loop splits of dynamic sized ranges

This change creates a specialization of the AffineConstraintsHelper that
works with Loopnest concepts and uses that in LoopNestBuilder to update
the loop split generation
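
The effect of a static split on a dynamically sized range can be sketched generically in Python. This is not the AffineConstraintsHelper or LoopNestBuilder logic; `split_dynamic_range` and `scaled_copy` are hypothetical names, and the main-plus-cleanup loop shape is the common textbook form, assumed here for illustration.

```python
def split_dynamic_range(n, split_size):
    """Return (main, cleanup) ranges for a run-time n and a compile-time split size."""
    main_end = (n // split_size) * split_size
    return range(0, main_end, split_size), range(main_end, n)


def scaled_copy(src, factor, split_size=4):
    n = len(src)                          # known only at run time
    dst = [0] * n
    main, cleanup = split_dynamic_range(n, split_size)
    for i in main:
        for ii in range(split_size):      # static trip count: can be unrolled or vectorized
            dst[i + ii] = src[i + ii] * factor
    for i in cleanup:                     # leftover iterations when n % split_size != 0
        dst[i] = src[i] * factor
    return dst


result = scaled_copy(list(range(10)), 3)
main, cleanup = split_dynamic_range(10, 4)
```

The inner loop has a fixed trip count even though the outer bound is dynamic, which is what makes later steps such as vectorization and static-size caching possible on the main part.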
- Merged PR 2911: Support dynamic ranges in ScheduledLoopOp. [Mason
Remy]

Support dynamic ranges in ScheduledLoopOp
- Merged PR 2907: Implement initial affine constraint helper for dynamic
size loop. [Mason Remy]

Implement initial affine constraint helper for dynamic size loop
handling

Implements a wrapper around mlir::FlatAffineValueConstraints and a set
of low-level tests using it that enable static-sized splitting of
dynamic loop ranges
- Merged PR 2935: Remove thread coarsening factor > 4 from GPU
benchmarks. [Captain Jack Sparrow]

Remove thread coarsening factor > 4 from GPU benchmarks
- Merged PR 2932: Upgrade to CUDA 11.8. [Captain Jack Sparrow]

Upgrade to CUDA 11.8
- Merged PR 2931: Update to ROCm 5.3. [Captain Jack Sparrow]

Update to ROCm 5.3
- Merged PR 2926: Plumb parameter usages to emitted HAT files. [Lisa
Ong]
- Merged PR 2927: Reduce benchmark configs using thread coarsening.
[Captain Jack Sparrow]

Reduce benchmark configs using thread coarsening
- Merged PR 2925: Add optional optimization hint for number of thread
blocks per SM. [Captain Jack Sparrow]

Add optional optimization hint for number of thread blocks per SM

Related work items: 3736
