This is the release note for v9.0.0a2. See [here](https://github.com/cupy/cupy/milestone/84?closed=1) for the complete list of solved issues and merged PRs.
**Update (2020-12-02):** Unfortunately, the Windows build of this release is not working. We have taken down the Windows wheels from PyPI, but if you need them for reference purposes, you can still download them from the Assets section below. We are working hard to resolve this issue for the upcoming v9.0.0b1 release.
Highlights
`cupy.vectorize` & Initial CUDA JIT support
With this release, we are including a very early version of a Python-to-CUDA transpiler that lets users write their own CUDA kernels in Python, similar to what Numba does. However, while Numba works on the bytecode and directly emits PTX code using LLVM, our approach uses the Python AST to translate the source code directly to CUDA C and compiles it with the NVIDIA toolchain, aiming for higher performance in the long run.
```py
import cupy

def f(x, y):
    # This code will be compiled to a CUDA kernel by our JIT
    return x * x + y * y

x = cupy.linspace(0, 10, 6)
y = cupy.linspace(0, 20, 6)
func = cupy.vectorize(f)
out = func(x, y)
# out is [  0.  20.  80. 180. 320. 500.]
```
The initial version provides only limited support for primitive operators, but we will keep extending it in the upcoming releases. Check out #4290 if you are interested.
Jitify for raw kernels and modules
Thanks to leofang, it is now possible to use headers and libraries in `RawKernel` or `RawModule` that could not be used before because of the reliance on NVRTC. With the new `jitify=True` option, [Jitify](https://github.com/NVIDIA/jitify) is applied to your code so that you can use libraries such as the cuRAND device API or CUB device routines in your raw kernels.
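As a rough illustration of the option described above, the following sketch compiles a raw kernel that includes the C++ standard header `<type_traits>`, which plain NVRTC is not expected to locate on its own. The kernel name `add_one` and the helper `add_one_on_gpu` are made up for this example, and the NumPy fallback is only there so that the snippet still runs on a machine without CuPy, a GPU, or Jitify support.

```python
import numpy as np

# CUDA source for the illustration. <type_traits> is a C++ standard header;
# the assumption here is that Jitify supplies it when jitify=True is set.
source = r'''
#include <type_traits>

extern "C" __global__
void add_one(float* x, int n) {
    static_assert(std::is_same<decltype(x[0] + 1.0f), float>::value,
                  "expected float arithmetic");
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        x[i] += 1.0f;
    }
}
'''


def add_one_on_gpu(host):
    """Hypothetical helper: run the kernel above on a float32 array."""
    import cupy

    kernel = cupy.RawKernel(source, 'add_one', jitify=True)
    x = cupy.asarray(host, dtype=cupy.float32)
    threads = 128
    blocks = (x.size + threads - 1) // threads
    kernel((blocks,), (threads,), (x, np.int32(x.size)))
    return cupy.asnumpy(x)


host = np.arange(4, dtype=np.float32)
expected = host + 1.0  # NumPy reference result

try:
    result = add_one_on_gpu(host)
except Exception as exc:
    # No CuPy, no GPU, or no Jitify support: keep the NumPy reference.
    print('GPU path unavailable, using NumPy reference:', exc)
    result = expected
```

The same `jitify=True` keyword is described above for `RawModule` as well; the one-kernel form is used here only to keep the sketch short.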
`cupyx.lapack` now as a public interface to cuBLAS
Until now, the cuBLAS & cuSOLVER bindings were not publicly exposed in the API. With the introduction of `cupyx.lapack` by anaruse, it is now possible to use LAPACK-compatible routines backed by cuBLAS & cuSOLVER through a much simpler interface.
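A minimal sketch of what calling such a routine might look like is shown below, using `cupyx.lapack.gesv` to solve a small linear system. It assumes that `gesv(a, b)` overwrites its inputs and leaves the solution in `b`, so both arrays are copied to the device first; the helper name `solve_with_cupyx_lapack` is hypothetical, and the NumPy fallback only keeps the snippet runnable without CuPy or a GPU.

```python
import numpy as np


def solve_with_cupyx_lapack(a_host, b_host):
    """Hypothetical helper: solve a @ x = b on the GPU via cupyx.lapack."""
    import cupy
    from cupyx import lapack

    # Copy to the device in Fortran order, since gesv is assumed to
    # overwrite both arguments.
    a = cupy.array(a_host, order='F')
    b = cupy.array(b_host, order='F')
    lapack.gesv(a, b)  # assumption: the solution is written into b
    return cupy.asnumpy(b)


a_host = np.array([[4.0, 1.0],
                   [2.0, 3.0]])
b_host = np.array([1.0, 2.0])

try:
    x = solve_with_cupyx_lapack(a_host, b_host)
except Exception as exc:
    # No CuPy or no GPU available: fall back to the NumPy reference solver.
    print('GPU path unavailable, using numpy.linalg.solve:', exc)
    x = np.linalg.solve(a_host, b_host)

# Either path should satisfy the original system.
residual = a_host @ x - b_host
```

Checking the residual rather than comparing against hard-coded values keeps the sketch independent of which backend produced `x`.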
Deprecations in upcoming releases
We are going to drop support for Python 3.5 and for obsolete libraries such as CUDA 9.0 and NumPy 1.16. Leave a comment in #4300 if you have any concerns about your use case.
Changes
New Features
- Add `cupy.cusolver.gesv` that uses `cusolverDn<t1><t2>gesv` (3917)
- Add `cupy.cusolver.gels` that uses `cusolverDn<t1><t2>gels` (4073)
- Add `cupy.vectorize` (4135)
- Support cuFFT callbacks (4141)
- Add `spline_filter1d` and `spline_filter` to `cupyx.scipy.ndimage.interpolation` (4145)
- Add `cupyx.scipy.sparse.linalg.svds` (4155)
- Improve coverage of cuBLAS L1 functions (4205)
- Change the wrapper arguments for cuBLAS L2 functions (4221)
- Add `cupyx.scipy.sparse.linalg.cg` (4222)
- Support Jitify (4228)
- Add `cupyx.lapack` (4235)
Enhancements
- Remove `cupy.testing.NumpyError` (4225)
- Fix an issue in `cupyx.scipy.sparse.linalg.eigsh` with CUDA 9.2 (4231)
- Add parallel build feature (4240)
- Support compile-time constants in CuPy JIT (4241)
- Detect the CC of the device when building (4242)
- Reduce the file size of cuFFT callback modules (4267)
- Bump cuDNN to v8.0.5 (4303)
- Detect Jitify version (4306)
Performance Improvements
- Improve `cupy.random.randint` (4160)
- Improve performance of `cupyx.scipy.sparse.linalg.eigsh` (4214)
- Improve `convolve`/`correlate` (4248)
- Improve Jitify performance (4277)
Bug Fixes
- Respect user-supplied output array in all binary morphology functions (4157)
- Refactor `AssertFunctionIsCalled` (4233)
- Fix possible redefinition of "-ccbin" in `cupy.fft._callback` (4276)
- Fix undefined symbols in `cupy.fft._callback` (4283)
- Fix issues when coo sparse matrix is created from dense matrix (4295)
- Fix `cupy.random.bytes` not working (4318)
Code Fixes
- Use `.imag = 0` at hipFFT workaround (4234)
- Use `assert` statement instead of `self.assert*` methods (4292)
Documentation
- Add a note in `free_all_blocks` reference (4196)
- Add cupy-cuda111 to README (4210)
- Add missing functions to the API reference (4215)
- Add description of env vars for parallel build and auto cc detection (4250)
- Add `spline_filter` functions to ndimage docs (4265)
- `cupy-cuda111` package now on PyPI (4333)
Installation
- Reset `extra_compile_args` for each module (4336)
- Tentatively hide `CUPY_NUM_BUILD_JOBS` option (4339)
HIP/ROCm
- ROCm: Fix filters in `cupyx.scipy.ndimage` - Part 1 (4271)
- ROCm: fix ndimage interpolation (4301)
Tests
- Remove unused features in `testing.parameterized` (4178)
- Add pytest backend implementation of `testing.parameterize` (4192)
- Use Python 3.6 in Travis CI (4206)
- Add repr of parameterized test and stop adding error message (4211)
- Fix `numpy_cupy_equal` for the case where both NumPy and CuPy raise errors (4244)
- Fix tests of `__bytes__` (4252)
- Use GitHub Actions (4261)
- Skip some failing tests for fp16 + CUDA 9.0 (4299)
- Assert the results are bool (4310)
- Add import test for ROCm (4326)
Others
- Bump version to v9.0.0a2 (4331)
- Add `equal_nan` toggle for `NaN` values in `array_equal` (4203)
- Fix output dtype of `linalg.norm` (4227)
- Warn non-tuple sequence for multidimensional indexing (4245)
- `DeprecationWarning` on truth value of an empty array (4308)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
grlee77 aitikgupta anaruse leofang