This is the release of CuPy v4.0.0b1. See the [milestone page](https://github.com/cupy/cupy/milestone/14?closed=1) for the complete list of resolved issues and merged PRs.
Announcements
As the version number indicates, we have decided to name the next major version of CuPy **v4** instead of v3 to align its versioning with Chainer.
Starting with this version, you can install compatible versions of Chainer and CuPy by specifying the same version number for both.
New features
- Add FFT functions under `cupy.fft` (477)
  - Standard FFTs: `fft`, `ifft`, `fft2`, `ifft2`, `fftn`, `ifftn`
  - Real FFTs: `rfft`, `irfft`, `rfft2`, `irfft2`, `rfftn`, `irfftn`
  - Hermitian FFTs: `hfft`, `ihfft`
  - Helper routines: `fftfreq`, `rfftfreq`, `fftshift`, `ifftshift`
- Add `random.RandomState.tomaxint` (389)
- Add `sparse.csr_matrix.eliminate_zeros` and `sparse.coo_matrix.eliminate_zeros` (398)
- Add `linalg.tensorinv` (464)
- Add `unravel_index` (632, thanks Hakuyume!)
- Add `percentile` (643)
- Add `random.set_random_state` (704)
- Support ellipsis in `einsum` (410, thanks fukatani!)
- Support `dtype` argument in `random.randint` (567)
- Support `sparse.coo_matrix` initialization with other types of sparse matrices (573)
- Better CUDA support
  - Raise the maximum CUDA grid dimension size to make use of Compute Capability >= 3 (616, thanks anaruse!)
  - Support CUDA streams with the stream memory pool (306, 732)
  - Support cuDNN grouped convolution (581, thanks anaruse!)
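The new FFT helper routines follow NumPy's `numpy.fft` conventions. To illustrate the frequency ordering that `fftfreq` and `rfftfreq` produce, here is a minimal pure-Python sketch of NumPy's documented definition. It is not CuPy's implementation: the real `cupy.fft.fftfreq` and `cupy.fft.rfftfreq` return `cupy.ndarray` objects, while this sketch returns plain lists only so that it runs without a GPU.

```python
def fftfreq(n, d=1.0):
    """Sample frequencies for a length-n FFT with sample spacing d (NumPy ordering)."""
    if n <= 0:
        # Sketch-only guard; not meant to mirror NumPy's exact error behavior.
        raise ValueError("n must be positive")
    scale = 1.0 / (n * d)
    # Non-negative bins first: 0, 1, ..., ceil(n / 2) - 1
    positive = list(range(0, (n - 1) // 2 + 1))
    # Then the negative bins: -floor(n / 2), ..., -1
    negative = list(range(-(n // 2), 0))
    return [k * scale for k in positive + negative]


def rfftfreq(n, d=1.0):
    """Sample frequencies for rfft output: only the n // 2 + 1 non-negative bins."""
    if n <= 0:
        raise ValueError("n must be positive")
    scale = 1.0 / (n * d)
    return [k * scale for k in range(n // 2 + 1)]


print(fftfreq(8, d=0.1))   # [0.0, 1.25, 2.5, 3.75, -5.0, -3.75, -2.5, -1.25]
print(rfftfreq(8, d=0.1))  # [0.0, 1.25, 2.5, 3.75, 5.0]
```

The non-negative frequencies come first and the negative ones follow; `fftshift` rearranges this ordering so that the zero frequency sits in the center.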
Bug fixes
- Fix indexing zero-dimensional array with boolean mask (580)
- Fix memory pool for multi-threaded applications (606)
- Set up Python’s built-in random state in `testing.fix_random` (640)
- Use v6 RNN API when using cuDNN7 to avoid incompatibility (660, thanks anaruse!)
- Set arch option for NVRTC, as the option is necessary on some GPUs (687, thanks grafi-tt!)
- Fix `var` and `std` to correctly handle `ddof` argument (693, thanks stevendbrown!)
- Fix advanced indexing to not alter the indices (713, thanks yuyu2172!)
- Fix bit-width issue in `random.RandomState.tomaxint` for Windows (658)
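For reference, `ddof` ("delta degrees of freedom") changes the divisor used by `var` and `std` from N to N - ddof, so `ddof=0` gives the population variance and `ddof=1` the unbiased sample variance. Below is a minimal pure-Python sketch of that definition, not CuPy's GPU implementation. Unlike NumPy, which warns and returns a non-finite value when N - ddof <= 0, this sketch simply raises `ValueError`.

```python
import math


def var(values, ddof=0):
    """Variance of a sequence, dividing by N - ddof as NumPy does."""
    n = len(values)
    if n - ddof <= 0:
        # Sketch-only behavior: NumPy warns and returns nan/inf instead.
        raise ValueError("N - ddof must be positive")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def std(values, ddof=0):
    """Standard deviation: square root of var() with the same ddof."""
    return math.sqrt(var(values, ddof))


print(var([1, 2, 3, 4]))              # 1.25 (divides by N = 4)
print(var([1, 2, 3, 4], ddof=1))      # 1.6666666666666667 (divides by N - 1 = 3)
print(std([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```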
Improvements
- Performance improvements
  - Improve performance of `concatenate` by using continuous copies (452, thanks uchida!)
  - Optimize `sparse.csc_matrix.__mul__` (572)
  - Cythonize cuDNN wrapper (512)
  - Cythonize memory hook (722)
  - Avoid implicit conversion into PyInt in `linear_launch` (673)
  - Eliminate a redundant check in memory pool (731)
- Support uint32 sampling up to 0xffffffff in `random.RandomState.interval` (583)
- Fix `random.RandomState.seed` to only accept integer types (688)
- Fix typo in `IndexError` message (681)
- Fix interface for cuDNN find algorithm APIs (624)
Examples
- Add an example of option pricing using Monte-Carlo simulation (493)
Documentation
- Update testing section in the contribution guide (671)
- Write note about environment variables for installation (534)
- Remove unrelated “see also” from `testing.numpy_cupy_raises` (634, thanks Hakuyume!)
- Fix reference page of `linalg` (650)
- Fix typo and heading in documentation (654)
- Add intersphinx mapping to Chainer (655)
- Fix a link in README.md to the contribution guide (628)
- Fix a link in README.md to the forum (752, thanks muupan!)
- Fix incorrect heading “CuPy” instead of “NumPy” in license page (656)
Test
- Move to PyTest
  - Move to PyTest (623)
  - Remove nose dependency in tests (672)
  - Use pytest-warnings to check deprecation warnings (675)
- Fix NumPy warning for bool and complex operations (496)
- Use the latest Cython in Travis CI (597)
- Fix typo (631, thanks Hakuyume!)
- Fix doctest for Python 3.5 (644)
- Allow filtering test cases by number of GPUs with `CUPY_TEST_GPU_LIMIT` environment variable (662)
- Fix `test_einsum` (679)
- Ignore `ComplexWarning` in `numpy.pad` for NumPy 1.11 or older (689)
- Fix test of `where` to use different seeds for different arrays (703)
- Avoid deprecation warnings (718)
- Skip some dtypes in `test_einsum` (726)
- Skip `test_fft` for NumPy 1.9 or older (727)
- Skip some tests for old NumPy (744)
Others
- Improve version embedding (639)