This is the release note of v11.0.0a1. See [here](https://github.com/cupy/cupy/pulls?q=is%3Apr+is%3Aclosed+milestone%3Av11.0.0a1) for the complete list of solved issues and merged PRs.
We are running a [Gitter chat](https://gitter.im/cupy/community) for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Improved NumPy function coverage (6078)
A series of NumPy routines has been proposed as good first issues, and as a result, an increasing number of contributors have sent pull requests to expand the set of available APIs. An issue tracking the routines implemented so far is available at 6078.
Add `cupyx.scipy.special` functions (5687)
Spherical harmonics, Legendre and Gamma functions are implemented using specialized, high-performance CUDA kernels. Thanks to grlee77!
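A minimal usage sketch of two of these functions, assuming CuPy v11 with a CUDA device and that `cupyx.scipy.special.gamma` and `cupyx.scipy.special.lpmv` follow the `scipy.special` signatures. Without a GPU the sketch falls back to NumPy and SciPy (an illustrative choice, not CuPy behavior), and the helper name `special_demo` is hypothetical.

```py
try:
    import cupy as xp
    from cupyx.scipy import special
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
except Exception:
    # Assumed CPU fallback: SciPy exposes the same function names.
    import numpy as xp
    from scipy import special


def special_demo():
    """Evaluate Gamma and associated Legendre functions (hypothetical helper)."""
    # Gamma(1) = 1, Gamma(5) = 4! = 24, Gamma(1/2) = sqrt(pi)
    g = special.gamma(xp.array([1.0, 5.0, 0.5]))
    # lpmv(m, v, x): associated Legendre function of order m and degree v.
    # With m = 0 it reduces to the Legendre polynomial P_2(x) = (3x^2 - 1) / 2.
    p = special.lpmv(0, 2, xp.array([0.0, 0.5]))
    return g.tolist(), p.tolist()
```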
Initial support for the CUDA Graph API via the stream capture API (4567)
This PR adds the ability to use the CUDA Graph API to greatly reduce kernel launch overhead. This is done through the stream capture API, as shown in the following example.
Thanks to leofang!
```py
import numpy as np
import cupy as cp

s = cp.cuda.Stream(non_blocking=True)

with s:
    # Create the input on s so it is ordered before the captured work
    a = cp.random.randint(0, 10, 100, dtype=np.int32)
    s.begin_capture()
    a += 3
    a = cp.abs(a)
    g = s.end_capture()  # work is recorded in g, but not yet launched
    g.launch()
s.synchronize()
```
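CUDA graphs pay off when a captured sequence is replayed many times. A minimal sketch, assuming CuPy v11 and a CUDA device (the hypothetical helper `replay_demo` returns `None` otherwise): `y = x * 2` is captured once, and every `g.launch()` recomputes `y` from the current contents of `x`, because the graph keeps the device buffers that were bound during capture.

```py
try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    HAS_GPU = False


def replay_demo(n_replays=3):
    """Capture y = x * 2 once, then replay it after updating x in place."""
    if not HAS_GPU:
        return None
    s = cp.cuda.Stream(non_blocking=True)
    with s:
        x = cp.arange(4, dtype=cp.float32)  # [0, 1, 2, 3], created on s
        x * 2  # warm-up: compile the kernel and prime the memory pool before capture
        s.begin_capture()
        y = x * 2  # recorded, not executed; y's buffer is fixed in the graph
        g = s.end_capture()
        for _ in range(n_replays):
            x += 1      # ordinary (uncaptured) work on s
            g.launch()  # replay y = x * 2 on the current stream
    s.synchronize()
    return y.tolist()  # [6.0, 8.0, 10.0, 12.0] after three replays
```

Only the captured kernels are replayed; the Python-level dispatch for `x * 2` is paid once, at capture time.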
Support `__device__` function in CuPy JIT (6265)
The new interface `cupyx.jit.rawkernel(device=True)` can be used to define a CUDA device function.
```py
from cupyx import jit

@jit.rawkernel(device=True)
def getitem(x, tid):
    return x[tid]

@jit.rawkernel()
def elementwise_copy(x, y):
    tid = jit.threadIdx.x + jit.blockDim.x * jit.blockIdx.x
    y[tid] = getitem(x, tid)
```
The following CUDA code is generated from the Python code above when the kernel is called with `int32` arrays.
```cpp
__device__ int getitem_1(CArray<int, 1, true, true> x, unsigned int tid) {
    return x[tid];
}

extern "C" __global__ void elementwise_copy(CArray<int, 1, true, true> x, CArray<int, 1, true, true> y) {
    unsigned int tid;
    tid = (threadIdx.x + (blockDim.x * blockIdx.x));
    y[tid] = getitem_1(x, tid);
}
```
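The kernels above still need to be launched. A minimal sketch, assuming CuPy v11 and a CUDA device, using the `kernel[grid, block](args)` launch syntax of `cupyx.jit.rawkernel`; the helper `copy_demo` and its sizes are hypothetical, and it returns `None` when no GPU is available. Because `elementwise_copy` has no bounds check, the array length must be a multiple of the block size.

```py
try:
    import cupy as cp
    from cupyx import jit
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    HAS_GPU = False

if HAS_GPU:
    @jit.rawkernel(device=True)
    def getitem(x, tid):
        return x[tid]

    @jit.rawkernel()
    def elementwise_copy(x, y):
        tid = jit.threadIdx.x + jit.blockDim.x * jit.blockIdx.x
        y[tid] = getitem(x, tid)


def copy_demo(n=1024, threads=128):
    """Copy an int32 array with the JIT kernel; returns None without a GPU."""
    if not HAS_GPU:
        return None
    # The kernel writes y[tid] unconditionally, so every thread needs a valid index.
    assert n % threads == 0
    x = cp.arange(n, dtype=cp.int32)
    y = cp.zeros_like(x)
    elementwise_copy[n // threads, threads](x, y)  # kernel[grid, block](args)
    return bool((x == y).all())
```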
Changes
New Features
- Support stream capture (4567)
- Add additional special functions (spherical harmonics, Legendre, Gamma functions) (5687)
- Add `cupy.asfarray` (6085)
- Add `cupy.trapz` (6107)
- Add `cupy.array_api.linalg` (6131)
- Add `cupy.mask_indices` (6156)
- Add `cupy.array_equiv` API. (6254)
- Add `cupy.cublas.syrk` and `cupy.cublas.sbmv` (6278)
- Add `cupy.vander` API. (6279)
- Add `cupy.ediff1d` API. (6280)
- Add `cupy.fabs` API. (6282)
- Add discrete cosine and sine transforms to `cupyx.scipy.fft` (6288)
- Add `logit`, `expit` and `log_expit` to `cupyx.scipy.special` (6300)
- Add `xlogy` and `xlog1py` to `cupyx.scipy.special` (6301)
- Add `tril_indices` and `tril_indices_from` API. (6305)
- Add `cupy.format_float_positional` (6308)
- Add `cupy.row_stack` API. (6312)
- Add `triu_indices` and `triu_indices_from` API. (6316)
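A few of the newly added routines, sketched under the assumption that they follow their NumPy counterparts. The sketch uses CuPy when a CUDA device is present and otherwise falls back to NumPy (an illustrative choice, not part of CuPy); `new_api_demo` is a hypothetical helper.

```py
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
except Exception:
    import numpy as xp  # assumed CPU fallback with the same API names


def new_api_demo():
    """Exercise a few of the newly added NumPy-compatible routines."""
    x = xp.array([1, 2, 3])
    return {
        # Vandermonde matrix; columns are x**2, x**1, x**0 by default
        "vander": xp.vander(x, 3).tolist(),
        # Differences between consecutive elements of the flattened input
        "ediff1d": xp.ediff1d(xp.array([1, 2, 4, 7])).tolist(),
        # Row/column indices of the lower triangle, diagonal included
        "tril": [idx.tolist() for idx in xp.tril_indices(3)],
        # mask_indices with triu yields the same indices as triu_indices
        "mask": [idx.tolist() for idx in xp.mask_indices(3, xp.triu)],
        "triu": [idx.tolist() for idx in xp.triu_indices(3)],
        # True: x broadcasts to the shape of the stacked (2, 3) array
        "equiv": bool(xp.array_equiv(x, xp.stack([x, x]))),
        "fabs": xp.fabs(xp.array([-1.5, 2.0])).tolist(),
    }
```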
Enhancements
- Raise better message when importing CPU array via DLPack (6051)
- Borrow more non-GPU APIs from NumPy (6074)
- Add more aliases for compatibility with NumPy (6075)
- Import more dtype aliases from NumPy (6076)
- Borrow indexing APIs from NumPy (6077)
- Apply upstream patch to `cupy.array_api` (6086)
- Compile cub/thrust with no unique symbol (6106)
- Support cuDNN 8.3.0 (6108)
- Support all advanced indexing (6127)
- Support CUDA 11.5.1 (6166)
- Support lambda function in `cupy.vectorize` (6170)
- Support eigenvalue solver 64bit API (6178)
- Support cuTENSOR 1.4.0 (6187)
- Make `matmul` support ufunc kwargs (6195)
- Alias NumPy error classes (6212)
- Support comparison to `None` and `Ellipsis` (6222)
- JIT: Fix if expr typing rule (6234)
- Support comparison with more objects (6250)
- JIT: Support `__device__` function (6265)
- Clearer warning message (6283)
- Make streams hashable (6285)
- Check isinstance before comparison in `__eq__` (6287)
- Support cuDNN 8.3.2 (6314)
- Deprecate MachAr (support NumPy 1.22) (6188)
- Fix `cupy.linalg.qr` to align with NumPy 1.22 (6225)
- Change a parameter name in `percentile` and `quantile` to support NumPy 1.22 (6228)
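Two of these enhancements sketched side by side: `cupy.vectorize` with a lambda (6170) and elementwise comparison with `None` (6222). This assumes CuPy v11 matches NumPy's semantics here; without a CUDA device the sketch falls back to NumPy, and `enhancements_demo` is a hypothetical helper.

```py
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
except Exception:
    import numpy as xp  # assumed CPU fallback with the same API names


def enhancements_demo():
    # vectorize accepts a lambda and applies it elementwise
    affine = xp.vectorize(lambda a, b: 2 * a + b)
    out = affine(xp.array([1, 2, 3]), xp.array([10, 20, 30]))
    # Comparing an array to None is elementwise and yields all False
    is_none = xp.array([1, 2]) == None  # noqa: E711 (intentional)
    return out.tolist(), is_none.tolist()
```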
Performance Improvements
- Avoid 64-bit division to reduce register consumption (6019)
- Remove memory copy in matmul (6179)
Bug Fixes
- Detect repeated axis in reduction (5964)
- Fix `__all__` in `cupyx.scipy.fft` (6071)
- Fix `__getitem__` on Ellipsis and advanced indexing dimension (6081)
- Allow leading unit dimensions in copy source (6118)
- Always test broadcast in `copyto` (6121)
- Fix overloading ambiguity in ndimage filters (6162)
- Fix empty Cholesky (6164)
- Fix empty `solve` (6167)
- Allow `flip` on ()-shaped arrays (6169)
- Handle infinities of the same sign in `logaddexp` and `logaddexp2` (6172)
- Fix 4675 by resolving the TODO in 4198 (6197)
- Eigenvalue solver 64bit API on CUDA 11.1 (6201)
- Fix edge case compatibility in `cupy.eye()` (6208)
- Fix `linalg.eigh` and `linalg.eigvalsh` on empty inputs (6210)
- Fix overlapping `out` in `matmul` and `(tensor)dot` (6216)
- Fix `compile_with_cache` returning None (6232)
- Fix index calculation for random constructor (6257)
- BUG: Fix the .T attribute in the `array_api` namespace (6289)
- Fix stream capture in ROCm (6296)
- Fix cuDNN installer not working (6337)
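Several of these fixes concern edge cases whose expected results follow NumPy. A minimal sketch, assuming CuPy v11 and falling back to NumPy without a CUDA device; `edge_case_demo` is a hypothetical helper, and the PR numbers in the comments refer to the entries above.

```py
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
except Exception:
    import numpy as xp  # assumed CPU fallback with the same API names


def edge_case_demo():
    inf = float("inf")
    a = xp.array([inf, -inf])
    return {
        # Same-signed infinities: log(e^inf + e^inf) = inf, log(0 + 0) = -inf (6172)
        "logaddexp": xp.logaddexp(a, a).tolist(),
        "logaddexp2": xp.logaddexp2(a, a).tolist(),
        # flip accepts a ()-shaped array (6169)
        "flip": int(xp.flip(xp.array(5))),
        # Empty square inputs to linalg routines give empty results (6164, 6210)
        "cholesky": xp.linalg.cholesky(xp.empty((0, 0))).shape,
        "eigvalsh": xp.linalg.eigvalsh(xp.empty((0, 0))).shape,
    }
```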
Code Fixes
- Remove `__all__` from `cupyx/scipy/*` (6149)
- Delete `from os import path` (6152)
- Remove legacy `cp.linalg.solve()` implementation (6161)
Documentation
- Add link to compatibility matrix (6055)
- Update upgrade guide (6058)
- Add v11 to compatibility matrix (6067)
- Exclude `kernel_version` from comparison table (6072)
- Doc: Add more footnotes to comparison table (6073)
- Add polynomial modules to comparison table (6082)
- Add CITATION.bib and update README (6091)
- Remove LLVM_PATH note on document (6093)
- Docs: Update linkcode implementation (6126)
- Update footnotes in comparison table (6142)
- Update conda-forge installation guide (6186)
- Revise Overview for CuPy v10 (6209)
- Docs: CentOS installation from source (6218)
- Fix `cupy.trapz` docstring (6239)
- Fix `eigsh` doc (6266)
- Add `cupy.positive` in API Reference (6274)
Installation
- Replace `distutils` with `setuptools` in Windows `cl.exe` detection (6025)
- Fix for cuDNN directory structure in Windows (6342)
Tests
- Fix `testing.multi_gpu` to add pytest marker (6015)
- CI: add link to ROCm projects in CI coverage matrix (6037)
- CI: use separate project for multi-GPU tests (6050)
- Fix CI result notification message format (6066)
- Fix CI being unable to override preinstalled cuSPARSELt/cuTENSOR versions (6084)
- Workaround DeprecationWarning raised from pkg_resources (6094)
- Fix missing `multi_gpu` annotation in tests (6098)
- Fix exception handling in cupyx.distributed (6114)
- Improve FlexCI test scripts (6117)
- CI: Add timeout to show_config (6120)
- Trigger FlexCI from GitHub Actions (6130)
- CI: Fix package override that sometimes fails on CentOS (6141)
- CI: Need to update CUDA driver in cuda115.multi (6144)
- Add tests for `convolve2d` (6171)
- CI: Update limits to reduce cache size (6174)
- CI: Fix unquoted specifiers (6175)
- Support pre-release NumPy version in tests (6190)
- Remove XFAIL for XPASS tests on ROCm (6259)
- Tentatively pin to `setuptools<60` in Windows CI (6260)
- Fix cache key for GitHub Actions (6281)
- Use NVIDIA docker images for CUDA 11.5 (6303)
- Tentatively pin to CUDA Driver 495 (6310)
- Remove unused dtype parameterizing in `tril_indices` test (6322)
- Use `get_include` instead of `array_equiv` for fallback test (6333)
- CI: Add `cuda-slow` test in FlexCI (6335)
- CI: use CUDA docker images for CUDA Python CI (6336)
Others
- Add doc issue template (6294)
- Bump version to v11.0.0a1 (6344)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
akochepasov amanchhaparia asi1024 ColmTalbot emcastillo eternalphane grlee77 haesleinhuepf khushi-411 kmaehashi leofang okuta ptim0626 SauravMaheshkar shwina takagi thomasjpfan tom24d toslunar twmht WiseroOrb Yutaro-Sanada