These are the release notes for v8.0.0a1. See [here](https://github.com/cupy/cupy/milestone/63?closed=1) for the complete list of solved issues and merged PRs.
Known packaging issues:
* CuPy build fails when using CUDA 8.0 on Windows (3076). Due to this issue, `cupy-cuda80` wheel packages for Windows are unavailable for this version. Linux or CUDA 9.0+ users are unaffected.
* ~Wheel packages for CUDA 10.2 (`cupy-cuda102`) are currently unavailable on PyPI. Packages will be published after getting [approval of the file size limit increase](https://github.com/pypa/pypi-support/issues/191).~ (resolved on 2020-02-21)
Highlights
This release adds support for CUDA 10.2 and NumPy 1.18.
CuPy 8.0.0a1 comes with several new features, such as better sparse matrix support. For users who write their own CUDA kernels, grid synchronization is now available in `RawKernel` and `RawModule`, and the block size of `ElementwiseKernel` can be tuned. There are also noticeable performance improvements thanks to extended CUB support in several CuPy functions.
Changes without compatibility
- Update slicing of CSR and CSC matrices for compatibility with SciPy 1.4.0 (2776)
  - Changed to follow SciPy's behavior: empty slices are returned for such cases.
- Separate code and path arguments in `RawModule` (2784)
- Avoid device synchronization in `cupy.allclose` (2799)
  - Changed `cupy.allclose` to return a 0-dim `cupy.ndarray` instead of a `bool` value to avoid device synchronization.
- Remove `dtype` argument from `min`/`max` (2875)
- Rename arg of `isscalar` (2974)
  - Renamed the argument of `cupy.isscalar` from `num` to `element`.
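
As a minimal sketch of how calling code might adapt to the `allclose` and `isscalar` changes listed above: the NumPy fallback and the variable names are assumptions made for illustration so that the snippet also runs on machines without a GPU. They are not part of the release.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises if no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs without CUDA

a = xp.array([1.0, 2.0, 3.0])
b = a + 1e-9

# With CuPy v8 this is a 0-dim array rather than a Python bool;
# with the NumPy fallback it is already a bool.
close = xp.allclose(a, b)

# bool() works in both cases. On CuPy this conversion is where the
# host has to wait for the device result.
all_close = bool(close)

# The argument of isscalar is named `element`, matching NumPy.
scalar_check = xp.isscalar(element=3.0)
array_check = xp.isscalar(element=a)
```

Code that previously relied on `allclose` returning a plain `bool` can keep working by converting explicitly, at the cost of a synchronization at that point.
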
New Features
- Add `min`, `max`, `argmin`, `argmax` to sparse CSR and CSC matrices (2711, thanks dloney!)
- Add helpers to measure execution times (2740)
- Add `digitize` (2758)
- Support loading PTX in `cupy.RawModule` (2782, thanks leofang!)
- Fix `cupyx.scipy.ndimage.map_coordinates` for cases with coords > 2d (2813, thanks grlee77!)
- Detect synchronization (2819)
- Add `ptp` ndarray method and function (2859, thanks grlee77!)
- Add convex analysis ufuncs to `cupyx.scipy.special` (2861, thanks grlee77!)
- Allow `ElementwiseKernel` to set the block_size (2914)
- Support grid synchronization in `RawKernel` and `RawModule` (2925)
- Add `cupy.conjugate` and make `cupy.conj` its alias (2982)
- Add a keyword-only `plan` argument to `cupyx.scipy.fft.*` (2998, thanks leofang!)
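
A few of the NumPy-compatible additions above (`digitize`, `ptp`, and `conjugate`) can be tried with a short sketch like the following. The input values are arbitrary, and the NumPy fallback is an assumption added so the snippet runs without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises if no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs without CUDA

x = xp.array([0.2, 6.4, 3.0, 1.6])
bins = xp.array([0.0, 1.0, 2.5, 4.0, 10.0])

# digitize: index of the bin that each value of x falls into.
bin_indices = xp.digitize(x, bins).tolist()

# ptp: "peak to peak", i.e. maximum minus minimum.
spread = float(xp.ptp(x))

# conjugate and its alias conj give the same result.
z = xp.array([1 + 2j, 3 - 4j])
conjugated = xp.conjugate(z).tolist()
conjugated_alias = xp.conj(z).tolist()
```
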
Enhancements
- Support sorting complex arrays (2745, thanks leofang!)
- Fix slow import of cupy (2759, thanks cgohlke!)
- Update slicing of CSR and CSC matrices for compatibility with SciPy 1.4.0 (2776, thanks grlee77!)
- Add `nogil` to CUB (2787, thanks y1r!)
- Avoid device synchronization in `cupy.allclose` (2799)
- Skip zero-valued coefficients in `cupyx.scipy.ndimage.convolve` (2846, thanks grlee77!)
- Add CUB reduction support to `mean` (2860, thanks grlee77!)
- Sort type map in `_kernel.pyx` (2881)
- Make test helper decorators pdb-friendly (2888)
- Declare device synchronization at `runtime.free()` (2898)
- Ignore error when peer access is already enabled (2901, thanks leofang!)
- Add CUDA 10.2 support (2910, thanks ksangeek!)
- Show warning for cuFFT bug in `irfftn` (2922)
- Use cuTensor for `einsum` (2928)
- Improve error message for wrong number of arguments in elementwise kernels (2932)
- Use asynchronous copy in `cupy.copyto` (2942)
- `MemoryPointer.__repr__` (2981)
- Allow multiple axes in `expand_dims` (2992)
- Check size before accessing an empty vector's data pointer (3025)
- Improve compatibility of `random.randint` (2828)
- Support 64 bit extent `randint` (2829)
- Disallow boolean subtraction (2874)
- Remove `dtype` argument from `min`/`max` (2875)
- Fix handling of dtypes in `cupy.mean` (2903, thanks grlee77!)
- Disallow boolean `negative` (2973)
- Rename arg of `isscalar` (2974)
- Fix `linspace(..., num=1, endpoint=False, retstep=True)` (2975)
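
Some of the compatibility enhancements above follow NumPy semantics and can be illustrated with a brief sketch. The array contents are arbitrary, and the NumPy fallback is an assumption for machines without a GPU. It needs NumPy 1.18 or later for the tuple form of `axis`.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises if no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs without CUDA

# expand_dims accepts a tuple of axes.
a = xp.zeros((2, 3))
expanded_shape = xp.expand_dims(a, axis=(0, 2)).shape

# Boolean subtraction is rejected, as in NumPy.
mask = xp.array([True, False, True])
try:
    mask - mask
    subtraction_rejected = False
except TypeError:
    subtraction_rejected = True

# Complex arrays can be sorted (by real part, then imaginary part).
z = xp.array([2 + 1j, 1 + 3j, 1 + 2j])
sorted_z = xp.sort(z).tolist()
```

For boolean arrays, `logical_xor` or the `^` operator is the usual replacement for subtraction.
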
Performance Improvements
- Avoid `numpy.can_cast` call to improve guess routine (2673)
- Improve caching in `ElementwiseKernel` (2688)
- Remove memory copy to improve memory range checking (2699)
- Avoid calling `can_cast` to reduce overhead (2704)
- Use `getrfBatched` in `linalg.slogdet` (2735)
- Reduce overhead in calls to multi-dimensional FFTs (2746, thanks grlee77!)
- Allow squashing f-contiguous axes for faster reduction (2822)
- Support CUB prefix sum & product (2919, thanks leofang!)
- Improve performance of element-wise `einsum` where no contraction is necessary (2960)
Bug Fixes
- Fix `true_divide` with dtype argument (2076)
- `keepdims` should always preserve all dimensions in CUB-based reductions (2725, thanks grlee77!)
- Update thrust::complex headers with a bug fix (2741, thanks leofang!)
- Separate code and path arguments in `RawModule` (2784)
- Avoid looking up null pointers' attributes (2802, thanks leofang!)
- Fix range used in `cupyx.scipy.ndimage` filter origin check (2805, thanks grlee77!)
- Detect interpreter shutdown for proper `__del__` behavior (2809)
- Fix `split` and `array_split` with indices overrun (2814)
- Fix `split` and `array_split` with unordered indices supplied (2815)
- Fix compilation error caused when Thrust is enabled (2838)
- Fix `testing.shaped_random` for shape `()` (2870)
- Fix `argmin`/`argmax` `dtype` argument (2872)
- Fix `imag` for 0-size array (2886)
- Fix logic to check explicit `size` argument in `ElementwiseKernel` (2909)
- Set the default value for `thread_local.linalg` if not defined (2915)
- Fix `cupy.cuda.cub.device_segmented_reduce()` not being used (2921, thanks leofang!)
- Fix complex type checks in `_correlate_or_convolve` (2923)
- Fix `ParameterInfo` as a cache key (2941)
- Avoid invalid in-place division in CUB-based mean (2943, thanks grlee77!)
- Fix empty vector access (3020)
- Fix `nvcc` command lookup (3028)
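
The `split` fix for indices that overrun the array can be checked against NumPy with a sketch along these lines. It assumes the fixed behavior matches NumPy's, which is the reference the comparison uses. The NumPy fallback is likewise an assumption for machines without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises if no usable GPU is present
except Exception:
    xp = np  # assumption: CPU fallback so the sketch runs without CUDA

# Split points: 2 is inside the array, 7 overruns its length of 5.
indices = [2, 7]

# Reference result computed with NumPy on the host.
expected = [p.tolist() for p in np.split(np.arange(5), indices)]

# Same call through xp (CuPy if available, otherwise NumPy again).
parts = [p.tolist() for p in xp.split(xp.arange(5), indices)]

matches_numpy = parts == expected
```
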
Code Fixes
- Use `intptr_t` for cuSOLVER handles (2718)
- Merge reduction implementations (2732)
- Rename and reorder private functions in `reduction.pxi` (2767)
- Avoid using PyThread API (2769)
- Remove unused `cuParamSetTexRef()` (2770, thanks leofang!)
- Separate reduction code from `_kernel.pyx` (2785)
- Refactor reduction code (2801)
- Refactor ops (2817)
- Separate `CArray` and family from `core.pyx` (2831)
- Add missing blank lines (2887)
- Readability fix in `memory.pyx` (2899)
- Clean up `_scalar.pyx` (2917)
- Enhance type and argument manipulation in elementwise and reduction kernels (2940)
- Remove intermediate aliases of `cupy.sort` (2944, thanks rushabh-v!)
- Silence sign comparison warnings (2949, thanks leofang!)
- Fix typos in comments (2978)
- Remove dependency on `six` (2980)
- A nit-picking code fix (2988)
- Rename `_op` variable in `cub.pyx` (3002)
- Remove code paths for unsupported Python versions (3004)
Documentation
- Fix docs of options argument in `RawKernel` and `RawModule` (2643)
- Document device synchronization (2798)
- Fix typo in scipy.fft docs (2804, thanks grlee77!)
- Fix the docstring format of `cupy.asarray` (2821, thanks leofang!)
- Update cuTENSOR version in docs (2948)
- Document `get_allocator` function (2953, thanks jakirkham!)
- Add NumPy 1.18 to installation guide (3005)
- Fix typo in note (3012, thanks Schoyen!)
- Add `cupy-cuda102` (3057)
Installation
- Do not let Python 2 users build CuPy v7+ (2766, thanks leofang!)
- Fix an issue that `cuComplex_bridge.h` is not installed (2984)
- Fix ROCm build errors (3071)
Examples
- Fix GMM example for matplotlib 3 (2996)
- Use `cupy.random` in kmeans example (3026)
Tests
- Test cuTENSOR v1.0.0 (2727)
- Use more stable input to test `linalg.matrix_power` (2788)
- Remove Python 3.4 matrix from Travis CI (2794)
- Drop ChainerCV's test in master branch (2803)
- Refactor array testing decorators (2818)
- Fix decorator usage in tests (2820)
- Add f-contiguous reduction tests (2830)
- Test `ifloordiv` with numpy 1.18 (2852)
- Fix `test_helper.py` for NumPy 1.18 (2883)
- Avoid 0s in the diagonal of `TestSolveTriangular` inputs (2927)
- Add tests for size argument with no input (2931)
- Print installed packages in pytest (2979)
- Make `testing.parameterize` pdb-friendly (3024)
- Require `scipy` in `test_gmm` (3048)
Others
- Allow install without thrust (2730)
- Add Mergify configuration file (2894)
- Make `cupyx.time.repeat` experimental (2897)
- Make `cupyx.allow_synchronize` experimental (2947)
- Some fixes to `.pfnci/script.sh` (3041)
- Set `CUPY_CI` environment variable in Travis CI and AppVeyor (3058)
- Bump version to v8.0.0a1 (3069)