This is the release note of v10.0.0a2. See [here](https://github.com/cupy/cupy/pulls?q=is%3Apr+is%3Aclosed+milestone%3Av10.0.0a2) for the complete list of solved issues and merged PRs.
We are running a [Gitter chat](https://gitter.im/cupy/community) for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
* CuPy now supports CUDA 11.3 (`cupy-cuda113`) and AMD ROCm 4.2 (`cupy-rocm-4-2`). Binary wheels for both are available on PyPI.
* The following Python syntax and new APIs can now be used in JIT target functions.
    - Calling `len`, `min`, `max` Python built-ins.
        - `len(arr)`: Equivalent to `arr.shape[0]`.
        - `min(scalar1, scalar2, ...)`: Returns the minimum value of the inputs.
        - `max(scalar1, scalar2, ...)`: Returns the maximum value of the inputs.
    - Accessing `.ndim`, `.size` attributes of `ndarray`.
    - Unpacking nested tuples.
        - `(x, y), z = ...`
    - `jit.grid()` API, similar to [`numba.cuda.grid`](https://numba.pydata.org/numba-doc/latest/cuda-reference/kernel.html#numba.cuda.grid).
        - `x, y, z = cupyx.jit.grid(3)` (`x` is equal to `threadIdx.x + blockIdx.x * blockDim.x`.)
    - Warp shuffle and sync functions.
        - `cupyx.jit.shfl_down_sync(mask, var, val_id)` (`__shfl_down_sync(mask, var, val_id)`)
* `cupyx.scipy.sparse.{coo,csr,csc}_matrix` now provides the `reshape` method.
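
The JIT additions listed above can be combined roughly as follows. This is a minimal sketch, not taken from the CuPy documentation: `square_kernel`, `square`, and `HAVE_GPU` are hypothetical names, and the NumPy fallback is added here only so that the snippet still runs on a machine without CuPy or a CUDA device.

```python
import numpy as np

try:
    import cupy
    from cupyx import jit
    HAVE_GPU = cupy.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is not installed or no usable device was found
    HAVE_GPU = False

if HAVE_GPU:
    @jit.rawkernel()
    def square_kernel(x, y):
        # jit.grid(1) is the global thread index along the x axis.
        i = jit.grid(1)
        # .size can be read inside JIT target functions.
        if i < x.size:
            y[i] = x[i] * x[i]


def square(values):
    """Square a 1-D float32 sequence, on the GPU when one is available."""
    host = np.asarray(values, dtype=np.float32)
    if not HAVE_GPU:
        return host * host  # CPU reference path (assumption for portability)
    x = cupy.asarray(host)
    y = cupy.empty_like(x)
    threads = 128
    # Launch enough blocks to cover every element, and at least one block.
    blocks = max(1, (x.size + threads - 1) // threads)
    square_kernel((blocks,), (threads,), (x, y))
    return cupy.asnumpy(y)
```

Calling `square([1, 2, 3])` should give the element-wise squares as a NumPy array on either path. The bounds check against `x.size` matters because the grid is rounded up to a whole number of blocks.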
Changes without compatibility
Drop CUDA 9.2 & NCCL 2.4 Support (5214)
CUDA 9.2 and NCCL 2.4 are no longer supported in CuPy v10.
Changes in Stream behavior (5251)
The same `cupy.cuda.Stream` instance can now safely be shared between multiple threads. To achieve this, CuPy v10 will not destroy the stream (i.e., call `cudaStreamDestroy`) if the stream is the current stream of any thread.
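
As an illustration of the behavior described above, the following sketch makes one stream current in several threads at once. It is only an assumption-laden example: `sum_on_shared_stream` and `HAVE_GPU` are hypothetical names, and the NumPy fallback exists only so the snippet can run without a CUDA device.

```python
import contextlib
import threading

try:
    import cupy as xp
    HAVE_GPU = xp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is not installed or no usable device was found
    HAVE_GPU = False
if not HAVE_GPU:
    import numpy as xp  # CPU stand-in (assumption for portability)


def sum_on_shared_stream(n_threads=2, n=1000):
    """Compute sum(range(n)) in several threads that share one stream."""
    # A single Stream instance is created once and shared by all workers.
    stream = xp.cuda.Stream() if HAVE_GPU else None
    results = [None] * n_threads

    def worker(idx):
        # On the GPU path, `with stream:` makes the shared stream the
        # current stream of this thread for the duration of the block.
        ctx = stream if HAVE_GPU else contextlib.nullcontext()
        with ctx:
            # int() copies the scalar result back to the host.
            results[idx] = int(xp.arange(n).sum())

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker enters and leaves the stream context independently, which is the usage pattern this change is meant to make safe.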
Known Issues
* `cupy-cuda111` wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (5313).
* `cupy-cuda110` and `cupy-cuda111` wheels are not yet available on PyPI. In the meantime, they can be downloaded from the Assets section below. See 4971 for detailed instructions.
Changes
New Features
- Add reshape method for COO, CSR and CSC matrices (5301)
- Support `len`, `min`, `max`, `.ndim`, `.size` in jit (5319)
- Support nested tuple unpack in CuPy JIT (5332)
- Support Numba-like `jit.grid()` syntax in CuPy JIT (5334)
- Support warp shuffle and sync functions in CuPy JIT (5335)
Enhancements
- Do not use handles unless requested in `cupy.show_config()` (5073)
- Fix to allow sharing a Stream instance between threads (5251)
- Add GUFunc `order`, `dtype` and `casting` kwarg support (5260)
- Support `nan`, `posinf`, `neginf` in `cupy.nan_to_num` (5295)
- Use independent version of hipFFT for ROCm 4.1 and later (5318)
- Support cuTENSOR v1.3.1 (5338)
- Support cuDNN v8.2.1 (5357)
Performance Improvements
- Make cuTENSOR available in `cupy.einsum` (5203)
Bug Fixes
- Fix `check_availability` for `cupy.cusolver` (5207)
- Fix `MemoryAsync` to keep a weakref to stream (5264)
- Fix cuFFT callback for `sm_61` etc (5304)
- Fix cuDNN preloading (5327)
- Fix large arrays assignment (5330)
- Ensure source array is C-contiguous before copying to `CUDAArray` (5342)
- Increase test coverage for Generalized Universal Functions (5344)
- Remove unnecessary print (5374)
Code Fixes
- Fix cub repository url (5236)
- Code and comment fixes for stream (5243)
- Use `cdef` instead of `cpdef` where appropriate (5274)
Documentation
- Fix `matmul` docstring (5174)
- Update list of wheels in README (5267)
- Add user guide for FFT (5272)
- Bump CuPy version in docs (5277)
- Add user guide for streams & events (5283)
- Fix dead link to tutorial and reorder in README (5287)
- Document `ExternalStream` (5305)
- Add ROCm 4.2 support to install docs (5354)
- `user_guide/basic.rst`: various improvements (5356)
Installation
- Drop support for CUDA 9.2 & NCCL 2.4 (5214)
- Add upper restrictions to NumPy/SciPy versions (5225)
- Exclude Cython 3 from `setup_requires` (5273)
Tests
- Fix threading memory pool tests (5263)
- Temporarily remove the async pool test from `TestAllocator` (5308)
- Fix Windows CI kernel cache (5310)
- Tentatively skip unstable `MemoryPoolAsync` tests (5350)
- Xfail random generator tests for HIP (5355)
- Tentatively pin to SciPy 1.6 in Windows CI (5366)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
anaruse eternalphane leofang maxim-belkin povinsahu1909