This is the release note for v9.0.0rc1. See [here](https://github.com/cupy/cupy/pulls?q=is%3Apr+milestone%3Av9.0.0rc1+is%3Aclosed) for the complete list of resolved issues and merged PRs.
We plan to release the final v9.0.0 on April 22nd. Please start testing your workloads with this release. See the [Upgrade Guide](https://docs.cupy.dev/en/latest/upgrade.html#cupy-v9) for the list of possible breaking changes.
We are running a [Gitter chat](https://gitter.im/cupy/community) for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
CuPy JIT (4774)
Raw kernels can now be created from Python functions using the new `cupyx.jit.rawkernel` decorator.
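```python
import cupy
import numpy
from cupyx import jit


@jit.rawkernel()
def f(x, y, z, n):
    # Grid-stride loop: each thread starts at its global index and
    # advances by the total number of threads in the grid.
    tid = jit.threadIdx.x + jit.blockIdx.x * jit.blockDim.x
    ntid = jit.blockDim.x * jit.gridDim.x
    for i in range(tid, n, ntid):
        z[i] = x[i] + y[i]


n = numpy.uint32(1024)
x = cupy.arange(n)
y = cupy.arange(n)
z = cupy.empty((n,), dtype='l')
f[16, 16](x, y, z, n)  # grid and block sizes are both 16
```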
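To see how the kernel above divides the work, the following is a minimal CPU-only sketch that emulates the same index arithmetic in plain Python. It does not use CuPy or a GPU. The helper name `grid_stride_indices` is hypothetical and exists only for this illustration.

```python
def grid_stride_indices(block_idx, thread_idx, block_dim, grid_dim, n):
    """Indices that one emulated thread visits in a grid-stride loop."""
    tid = thread_idx + block_idx * block_dim   # global thread index
    ntid = block_dim * grid_dim                # total number of threads
    return list(range(tid, n, ntid))


# Same sizes as the launch above: grid size 16, block size 16, n = 1024.
grid_dim, block_dim, n = 16, 16, 1024

visits = [0] * n
for b in range(grid_dim):
    for t in range(block_dim):
        for i in grid_stride_indices(b, t, block_dim, grid_dim, n):
            visits[i] += 1

# Every element is handled by exactly one emulated thread.
assert all(count == 1 for count in visits)
# Thread 5 of block 0 handles four elements, spaced 256 apart.
assert grid_stride_indices(0, 5, block_dim, grid_dim, n) == [5, 261, 517, 773]
```

With 256 threads and 1024 elements, each thread processes four elements, and the loop stays correct if `n` is not a multiple of the thread count.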
Support for Generalized Universal Functions (4675)
We have added an interface to support Generalized Universal Functions, based on the one in Dask. Currently, it is used in `matmul` to ensure compatibility with NumPy's `__array_ufunc__` dispatching.
cuTENSOR Support in Binary Packages (4600)
cuTENSOR support is now enabled in wheel packages. To use cuTENSOR features, you will need to install the shared library by running `python -m cupyx.tools.install_library --cuda 11.2 --library cutensor` after installing the wheels.
New Sphinx Theme in Documentation (4351)
Following NumPy, we have adopted the `pydata_sphinx_theme` in our [documentation site](https://docs.cupy.dev/en/latest/) starting from this release.
CUDA 11.0 and 11.1 wheels not available yet in PyPI (4971)
In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.
Changes without compatibility
`cupy.cuda.nccl` is hidden by default (4919)
The NCCL wrapper is no longer imported in `cupy/cuda/__init__.py`, so `cupy.cuda.nccl` must now be imported explicitly.
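As a minimal sketch of what this means for user code, the snippet below imports the wrapper explicitly and falls back gracefully when it cannot be loaded. The fallback to `None` is an illustrative choice, not something CuPy prescribes.

```python
try:
    # Explicit import: `import cupy` alone no longer loads the NCCL wrapper.
    from cupy.cuda import nccl
except ImportError:
    # CuPy is not installed, or the wrapper could not be loaded.
    nccl = None

if nccl is None:
    print("NCCL wrapper unavailable")
else:
    print("NCCL wrapper imported as", nccl.__name__)
```

Code that previously relied on `cupy.cuda.nccl` being available after a plain `import cupy` would need an import like the one above.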
Drop NCCL & cuDNN shared libraries from wheels (4850, 4932)
NCCL and cuDNN shared libraries are no longer bundled in the wheels. To activate features that use NCCL or cuDNN in CuPy v9, you will need to install these libraries using the `python -m cupyx.tools.install_library` tool after installing the CuPy wheels. See the [Installation Guide](https://docs.cupy.dev/en/latest/install.html#installing-cupy) for details.
By no longer bundling cuDNN and NCCL by default, we have further reduced the wheel size, by about 5x on average.
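The installer can also be scripted, for example in a provisioning step. The sketch below only builds the command line and runs nothing. The helper `install_library_command` is hypothetical, and the library names `cudnn` and `nccl` are assumptions modeled on the `cutensor` invocation shown in these notes, so check the tool's `--help` output before relying on them.

```python
import sys


def install_library_command(library, cuda_version):
    """Build the argv list for CuPy's library installer (nothing is executed)."""
    return [
        sys.executable, "-m", "cupyx.tools.install_library",
        "--cuda", cuda_version,
        "--library", library,
    ]


# "cutensor" appears in these notes; "cudnn" and "nccl" are assumed names.
for name in ("cudnn", "nccl", "cutensor"):
    print(" ".join(install_library_command(name, "11.2")))

# To actually run one of them:
#   import subprocess
#   subprocess.run(install_library_command("cudnn", "11.2"), check=True)
```

Using `sys.executable` makes sure the libraries are installed for the same interpreter that has the CuPy wheel.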
Deprecate `cupy.bool`, `cupy.int`, `cupy.float` and `cupy.complex` (4790)
Following the NumPy 1.20 API, these aliases for the Python scalar types have been deprecated.
Use `cupy.bool_`, `cupy.int_`, `cupy.float_` and `cupy.complex_` instead where required.
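For code bases with many uses of the deprecated aliases, a mechanical rewrite can help. The following is a rough sketch of such a migration aid using only the standard library. The helper `replace_deprecated_aliases` is hypothetical, it only handles the literal `cupy.` prefix (not `import cupy as cp`), and it assumes the aliases are used as dtype arguments.

```python
import re

# Matches cupy.bool / cupy.int / cupy.float / cupy.complex, but not names that
# merely start with them (cupy.bool_, cupy.int32, cupy.float64, ...).
_DEPRECATED_ALIAS = re.compile(r"\bcupy\.(bool|int|float|complex)\b")


def replace_deprecated_aliases(source):
    """Append an underscore to each deprecated alias found in `source`."""
    return _DEPRECATED_ALIAS.sub(r"cupy.\1_", source)


before = "x = cupy.zeros(4, dtype=cupy.float); m = x > cupy.float64(0)"
after = replace_deprecated_aliases(before)
print(after)
```

Only `cupy.float` is rewritten in this input, while `cupy.float64` is left untouched. Review the result by hand, since a call such as `cupy.float(3)` returns a different scalar type after the rewrite.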
Docker image updated to CUDA 11.2 and Python 3.8
[The official Docker image](https://hub.docker.com/r/cupy/cupy) is now updated to use CUDA 11.2 and Python 3.8.
Changes
New Features
- LOBPCG solver - `cupyx.scipy.sparse.linalg.lobpcg` (4281)
- Add diagonal and setdiag methods for COO sparse matrices (4664)
- Support for Generalized Universal Functions (4675)
- Support batched `pinv` (4686)
- Add CuPy JIT Kernel definition (4774)
- Add `cupy.random.Generator.standard_normal` (4885)
- Support tuple in CuPy JIT (4890)
- Add exponential distribution to random API (4915)
- Support tuple indexing in CuPy JIT (4939)
- Support `__syncthreads()` in CuPy JIT (4941)
Enhancements
- Support `nvrtcGetSupportedArchs` (4691)
- Update DLPack support (4695)
- Bump cuDNN to v8.1.1 in library installer tool (4780)
- Support `norm='forward'`/`'backward'` in `cupy.fft` functions (4797)
- Fix for flake8 F541 (4803)
- Complete build only when all of the essential modules are available (4815)
- Support `norm='forward'`/`'backward'` in `cupyx.scipy.fft` functions (4816)
- Support cuSparse functions for matrix conversion added in CUDA 11.2 (4844)
- Add NCCL to library installer (4848)
- Improve cuTENSOR installer (4852)
- Support `cupy.ndarray` type `shift` in `cupy.roll` (4884)
- Fix uniform random generation interval (4894)
- Use NVCC `--threads` option when building CuPy (4908)
- Bump headers to CUDA 11.2.2 (4911)
- Update preload to look for `lib` directory to support cuTENSOR/NCCL (4912)
- Move the NCCL module to `cupy_backends.cuda.libs` (4919)
- Add `cupy/cuda/cutensor.py` (4920)
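
The `norm='forward'` and `norm='backward'` options listed above for `cupy.fft` and `cupyx.scipy.fft` follow the NumPy convention: `'backward'` (the default) leaves the forward transform unscaled, while `'forward'` divides it by the number of samples. The sketch below illustrates the two scalings with a naive pure-Python DFT. It does not call CuPy, and the function `dft` is a stand-in written only for this illustration.

```python
import cmath


def dft(x, norm="backward"):
    """Naive O(n^2) forward DFT with the two scaling conventions."""
    n = len(x)
    scale = {"backward": 1.0, "forward": 1.0 / n}[norm]
    return [
        scale * sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


signal = [1.0, 2.0, 3.0, 4.0]
unscaled = dft(signal, norm="backward")  # default: forward transform unscaled
scaled = dft(signal, norm="forward")     # forward transform divided by n

# The zero-frequency bin is the sum under "backward" and the mean under "forward".
print(abs(unscaled[0]), abs(scaled[0]))
```

With CuPy itself, the corresponding call would be `cupy.fft.fft(x, norm='forward')`.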
Performance Improvements
- Improve batched SVD (4731)
- Avoid evaluating PTDS environment variable every time (4842)
Bug Fixes
- Fix dtypes in `cupy.linalg` (4363)
- Fix: avoid redeclaring attributes (4764)
- Windows: Fix compiler error for CUB block reduction kernels (4771)
- Support int argument for Dirichlet shape (4772)
- Windows: Fix `histogram` test failures (4777)
- Windows: fix sparse matrix indexing type (4778)
- Unify linux/windows `randint` with NumPy (4808)
- Improve/fix csc/csr argmax/argmin (4813)
- ROCm: Fix sorting bug (4823)
- Fixed choice function for 0 samples from 0 candidates (4830)
- Fix redeclaration of sparse warning classes (4837)
- Fix cuFFT callback compilations - v2 (4853)
- Solve `UnboundLocalError` on `copy_from_host_async` (4900)
- Add `out` arg verifier in new random interface (4904)
- Fix compilation error due to invalid complex-to-real casting in `_SimpleReductionKernel` (4909)
- Fix C++ compilation error (4922)
- Fix cutensor import (4933)
- Fix flaky `CUDAarray` tests (4946)
- Declare `CArray._indexing()` only in CuPy JIT mode (4951)
Code Fixes
- Rename submodules under `cupy.testing` package (3868)
- Fix: code quality issues (4587)
- Use newest versions of stylecheck packages (4694)
- Clean-up sparse max/min argmax/argmin (4860)
Documentation
- Use pydata_sphinx_theme in Sphinx (4351)
- Remove `cupy-cuda112` support from documentation (4761)
- Revert "Remove `cupy-cuda112` support from documentation" (4785)
- Fix broken Stream docs (4843)
- Reformat environment variables table (4845)
- Revert memory back to reference (4857)
- Update wheel list in README (4910)
- Merge ROCm installation guide (4928)
- Document that cuDNN and NCCL are no longer included (4932)
- Update install docs (4943)
Installation
- Support optional dependencies from Conda-Forge (4873)
- Bump version to v9.0.0rc1 (4953)
- Bump Docker image to use CUDA 11.2 (4972)
Tests
- Show config on Windows CI (4649)
- Windows: Fix test condition for CUB device kernels (4776)
- Xfail some tests for `cupyx.scipy.statistics.correlation` under ROCm/HIP (4781)
- Windows: fix vectorize tests (4794)
- Windows: fix OOM errors in the CI (4801)
- Windows: Fix `sepfir2d` tests (4804)
- Windows: Fix cuTENSOR tests (4806)
- Windows: Fix cuTENSOR tests (4818)
- Remove AppVeyor configurations (4836)
- Windows: Fix `test_poly1d_pow_scalar` (4854)
- Fix for flake8 E741 (4888)
- Windows: Skip failing cuDNN tests (4893)
- Add names for workflows (4913)
- Prioritize FlexCI daemon in Windows CI (4916)
- Fix to work with scheduled FlexCI job (4929)
- Change irfft tests tolerance (4937)
- Xfail tests for ndarray indexing under HIP (4653)
- Adjust tolerance of `TestPolyArithmeticDiffTypes` under HIP/ROCm (4657)
- Xfail tests in polynomial roots (4658)
- Xfail tests for manipulation dims under HIP/ROCm (4662)
- Xfail `TestPolyfitParametersCombinations` when `deg == 0` under ROCm/HIP (4758)
- Xfail `TestPolyfitCovMode` when `deg == 0` under ROCm/HIP (4759)
- Xfail `TestInvh` under ROCm/HIP (4760)
- ROCm: remove the need to set `HCC_AMDGPU_TARGET` at runtime (4766)
- Assert `MT19937` not implemented in `hipRAND` (4769)
- Xfail chi-squared test for some random functions under ROCm/HIP (4770)
- Remove duplicated typedef in example when HIP (4782)
- Xfail cuDNN version check test under ROCm/HIP (4791)
- Remove solved xfail mark for msort (4792)
- Fix to test checking HIP version (4859)
- Xfail test on sparse handle under ROCm/HIP (4861)
- Xfail some tests under ROCm/HIP (4868)
- Xfail some conditions of ndimage filter under ROCm/HIP (4877)
- Xfail some conditions of ndimage interpolation tests under ROCm/HIP (4878)
- Xfail some conditions of ndimage measurements under ROCm/HIP (4879)
- Xfail some conditions of signal tests under ROCm/HIP (4880)
Others
- Add `CODEOWNERS` file (4757)
- Add GitHub Actions workflow for automatic backport (4812)
- Fix pytest opts for Windows CI (4820)
- Use access token for automated backport (4833)
- Fix automated backport workflow (4835)
- Use pull_request_target trigger in backport automation (4841)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
anaruse aryamccarthy grlee77 leofang mattvend povinsahu1909 venkywonka viantirreau withshubh