Mxnet

Latest version: v1.9.1

1.8.0

Not secure

Features
CUDA Graphs
- Enable CUDA Graphs for TRT (19184)
- CUDA graphs support (19142)
- Update cudnn version. (19375)
CUDA 11 Support
- Update CUB and include it only for CUDA < 11 18799 (18975)
- Add new CI pipeline for building and testing with cuda 11.0. (19149)
- Enable CUDA 11.0 on nightly development builds (19314)
TensorRT
- TensorRT: add int8 with calibration (19011)
- Add TRT verbose mode (19100)
- Backporting TensorRT-Gluon Partition API (and TensorRT 7 support) (18916)
- Backport TRT test update 19296 (19298)
OneDNN
- Upgrade to oneDNN v1.6.3 (19153) (19161)
- Update oneDNN to official v1.6 release (18867) (18867)
- Upgrade to oneDNN v1.6 (18822)
- bumped version to v1.6.5 (19437)
- Upgrade to oneDNN v1.7 (19560)
IntGemm
- Backport of intgemm 17559 (19099)
- Change intgemm to a submodule instead of fetch. (19406)
Subgraph API
- Backport Fix for duplicate subgraph inputs/outputs (16131) (19112)
Extensions
- Backport 19103 (19117)
- Backporting 19016 (19069)
- Backport: Change Partition API's options_map to std::unordered_map 18929 (18964)
- Backporting 18779 to v1.x (18894)
- Backport extension bug fixes to v1.8.x (19469) (19504)
- fix for MX_ERROR_MSG namespace (19756)
ONNX
- Update onnx support to work with onnx 1.7.0 with most CV models (19017)
Large Tensor
- Fix linalg_potri and linalg_potrf operators for large tensor. (18752)
- Add forward, backward test for linalg.gemm2 (18784)
- Add large matrix tests for linalg ops: det, inverse, trsm, trmm (18744)
- Add Large Tensor Test for linalg_syrk (18782)
- Add Large Dim Checks for linalg Operators (18816)
- Add forward & backward linalg.gemm test for large size (18825)
- Adding error message when attempting to use Large tensor with linalg_syevd (18807)
Website Improvements
- v1.8 website patch (19212)
- Automate website artifacts uploading (19244)
Documentation
- Fix mxnet.test_utils.check_numeric_gradient documentation (19060)
- Update windows_setup.md (18874)
License
- Stop packaging GPL libquadmath.so (19055)
- Remove mention of nightly in pypi (18635) (18884)
- Mkldnn header fix v1x for nightly binaries (18797)
- Update LICENSE for all submodules. (19440)
- LICENSE update (19443)
- Update LICENSE (19704) (19707)
CI Improvements
- Upgrade unix gpu toolchain (18186) (18785)
- Fix CI in v1.x branch (18907)
- Remove extra --build-arg causing docker command to fail. (19412)
- Fix CI builds failing due to invalid GPG keys. (19377) (19388)
Bug Fixes
- Backport 19656 - fix R builds (19658)
- remove cleanup on side threads (19557)
- Don't use namespace for pow() function, since it is built into cuda math library, and cast the second argument so it will find an acceptable form. (19533)
- Remove temporary fix for RNN (19451)
- backport 19393 to v1.8.x (19398)
- Fix SoftReLU fused operator numerical stability (17849) (19390)
- Temporary fix for RNN with oneDNN seg faults/core dumps (19308)
- Fix MKLDNN BatchNorm with even number of channels (19150) 19299 19425 (19428)
- Relaxing type requirements for broadcast_like (17977) (19448)
- Backporting: Fixed setting attributes in reviewSubgraph (19278)
- Include oneDNN gemm fix (19251)
- Fix for breaking change introduced in 17123 when batch_axis=0 (19283)
- Backport PR 19272 to v1.8.x (19273)
- Backport PRs in v1.7.x missing from v1.x to v1.8.x (19262)
- Delete executor before reallocating its memory (19222)
- Nightly Large Tensor test cherrypicks (19194) (19215)
- Tweaking syntax to be closer to other tests (19186) (19206)
- ElementWiseSum fix for oneDNN (18777) (19200)
- Fix flaky intgemm test in v1.8.x too (19204)
- Revert "Fix memory leaks in Gluon (18328) (18359)" (19181)
- Improve environment variable handling in unittests (18424) (19173)
- Backport Unittest tolerance handling improvements (18694). Also test seeding (18762). (19148)
- Fix the error of gradient of np.pad (19044) (19167)
- Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (19123) (19158)
- SymbolBlock.imports ignore_extra & allow_missing (19156)
- Fix race condition in NaiveEngine::PushAsync (19108) (19122)
- Empty list cannot be cleared issue fixed. (14882)
- Update base_module.py (19096)
- Fix block.export (17970) (19075)
- Support for fp16 in SpM x DnsM on GPU (18930) (19074)
- Backport of Fix LeakyRelu behaviour on empty input (18934) (19009)
- Get rid of monkey patching in LossScaler overflow handling (18959) (18973)
- Remove upper bound (18857) (18910)
- Fix gelu to use erf based algorithm (18827) (18946)
- Cherry-pick 18635 to v1.7.x (18935) (18945)
- Backporting backward inference from 2.x 18348 and 18378 (18895)
- Backport Invoke mkldnn and cudnn BatchNorm when axis != 1 to v1.7.x (18676) (18890)
- Bump version to 1.8.0 (18899)
- Fixing ONNX spatial export for batchnorm (17711) (18846)
- Fix softmax, logsoftmax failed on empty ndarray (18602) (18708)
- Add unit tests for potri and potrf backward and check output shape in unit tests. (18803)
- Add syrk test shape check (18812)
- Back port optimization to broadcast_axis to MXNet1.x (18773)
- Fix crash when accessing already destructed static variables (18768) (18778)
- Cherrypick 18677 18713 (18742)

1.7.0

New features
MXNet Extensions: custom operators, partitioning, and graph passes

Adds support for extending MXNet with custom operators, partitioning strategies, and graph passes. These are all implemented in a library that can be compiled separately from the MXNet codebase and dynamically loaded at runtime into any prebuilt installation of MXNet.

- fix for number of inputs/outputs for backward custom ops (17069)
- Enhancements for custom subgraph op (17194)
- Disable flaky test_custom_op_fork (17481)
- fix custom op makefile (17516)
- Update CustomOp doc with changes for GPU support (17486)
- [WIP] MXNet Extensions enhancements (17885) (18128)
- Dynamic subgraph property (17034)
- Dynamic subgraph property doc (17585)
- [1.7] Backport MXNet Extension PRs (17623, 17569, 17762) 18063 (18069)

OpPerf utility enabled in the binary distribution
- [OpPerf] Add Neural network loss ops (17482)
- [OpPerf] Fixes the issue when you pass NDArray to run_perf_test (17508)
- [OpPerf] Fix markdown for native profile and add profile param in function desc (17494)
- [OpPerf] Add Indexing ops (16253)
- [OpPerf] Implement remaining random sampling ops (17502)
- [OpPerf] Implement remaining GEMM ops (17501)
- [OpPerf] Implement all linalg ops (17528)
- [OpPerf] Fixed native output ordering, added warmup & runs command line args (17571)
- [OpPerf] Add norm, cast ops, remaining optimizer ops (17542)
- [OpPerf] Fixed Python profiler bug (17642)

MKL-DNN
MKL-DNN as the default CPU backend in binary distribution
Branding change to DNNL
- Upgrade MKL-DNN dependency to v1.1 (16823)

Support bfloat16 datatype
- Add bfloat16 floating-point format support based on AMP (17265)

New operators
- [New Op] Add deformable conv v2 (16341)
- Add MXNet Ops for fast multihead attention (16408)
- Support boolean elemwise/broadcast binary add, multiply and true_divide (16728)
- add gammaln, erf, erfinv (16811)
- add aligned roi introduced in Detectron2 (16619)
- Implement atleast_1d/2d/3d (17099)
- Interleaved MHA for CPU path (17138)
- Lamb optimizer update (16715)
- Quantized Embedding (16691)
- Add gelu fuse ops (18082) (18092)

Feature improvements
Numpy compatible interface(experimental)
- [NumPy] NumPy support for linalg.inv (16730)
- add numpy op nan_to_num (16717)
- [Numpy] Add sampling method for bernoulli (16638)
- Fix numpy-compatible mean output type for integer inputs (16792)
- [Numpy] Fix collect_params().zero_grad() in gluon numpy interface (16716)
- [Numpy][Operator] 'where' Implementation in MXNet (16829)
- [Numpy] Random.normal() with backward (16330)
- Add OP diag [numpy] (16786)
- Mixed precision binary op backward (use in) for numpy (16791)
- add numpy op diagflat [numpy] (16813)
- add op bitwise_or [numpy] (16801)
- [Numpy] Implementation npx.{sample}_n (16876)
- [Numpy] Add NumPy support for np.linalg.det and np.linalg.slogdet (16800)
- Op Unravel_index PR [Numpy] (16862)
- [Numpy] Fix imperative basic indexing in numpy (16902)
- [Numpy] Basic indexing in symbolic interface of DeepNumpy (16621)
- [Numpy] add op full_like, c++ impl, fix zeros_like, ones_like type inference (16804)
- [Numpy] Implement numpy operator 'average' (16720)
- [Bugfix] [Numpy] Add `kAddTo` and kNullOp to Transpose (16979)
- set rtol = 1e-2 and atol = 1e-4 when dtype == np.float32 in test_numpy_op.py:test_np_linalg_solve (17025)
- Op_Diagonal [Numpy] (16989)
- numpy bincount (16965)
- [numpy] add op bitwise_not (16947)
- [Numpy] Modify np.random.shuffle to enable inplace by default (17133)
- [numpy] fix argsort typo (17150)
- [numpy] add op round (17175)
- [numpy]Add op delete (17023)
- [numpy] add op flipud, fliplr (17192)
- [CI] Re-enable testing with numpy 1.18 (17200)
- [Numpy] Add broadcast_to scalar case (17233)
- [Numpy] Random.gamma() implemented (16152)
- [Numpy] add row_stack (=vstack) (17171)
- [Numpy] Add infra for performing constraint check (17272)
- porting numpy-compatible hstack to master and add dstack for interoperability (17030)
- adding asnumpy() to output of gather(implicitly called) to fix gather test in large vector and tensor tests (17290)
- [numpy] add op random.exponential (17280)
- [NumPy] Add NumPy support for norm (17014)
- [numpy]add op random.lognormal (17415)
- Add numpy random weibull operator (17505)
- [numpy] Add np.random.pareto and np.random.power (17517)
- [Numpy] Add sort op (17393)
- [numpy]implement exponential backward (17401)
- [Numpy] Where operator scalar version (17249)
- [numpy] add op matmul (16990)
- [numpy]add op random.logistic, random.gumbel (17302)
- [numpy][Do Not Review]add op insert (16865)
- [numpy] add op random.rayleigh (17541)
- [numpy] add fallback ops (17609)
- [numpy] add op pad (17328)
- [numpy] add op fabs, sometrue, round_ (17619)
- Add arange_like to npx (16883)
- try to move shape_array to npx (16897)
- support np.argsort (16949)
- np.broadcast_to extension (17358)
- support bitwise_and (16861)
- fix np.argmax/argmin output data type (17476)
- add op random.beta (17390)
- add op isnan isinf (17535)
- array_split pr (17032)
- Mixed data type binary ops (16699)
- randn implemented (17141)
- refactor and reduce float types for some functions, also add bitwise_xor (16827)
- any/all (17087)
- amax (17176)
- fix format (17100)
- add op empty_like, add nan_to_num to dispatch (17169)
- handle array_like fill_value for np.full; add unit test coverage (17245)
- add np.amin (17538)
- add npx.gather_nd (17477)
- add np.random.chisquare (17524)
- add polyval (17416)
- add isposinf isneginf isfinite (17563)
- Support broadcast assign for `npi_boolean_mask_assign_tensor` (17131)
- Implement Weibull backward (17590)
- support np.dsplit, fix some error msgs and corner cases for hsplit and vsplit, add interoperability tests for h/v/dsplit (17478)
- add np.product (17489)
- Implement np.random.pareto backward (17607)
- add np.ediff1d (17624)
- more support for boolean indexing and assign (18352)
- Fix einsum gradient (18482)
- [v1.7.x] Backport PRs of numpy features (18653)
- [v1.7.x] backport mixed type binary ops to v1.7.x (18649)
- revise activations (18700)

Large tensor support
- [Large Tensor] Add support to Random Sample & Pdf ops (17445)
- [Large Tensor] Add LT support for NN optimizers and 1 activation function (17444)
- [Large Tensor] Fixed SoftmaxActivation op (17634)
- [Large Tensor] Fixed col2im op (17622)
- [Large Tensor] Fixed Spatial Transformer op (17617)
- [Large Tensor] Fix ravel_multi_index op (17644)
- Sparse int64 Large tensor support (16898)
- Re-Enabling Large Tensor Nightly on GPU (16164)
- enabling build stage gpu_int64 to enable large tensor nightly runs (17546)
- [Large Tensor] Fixed Embedding op (17599)

MKL-DNN enhancement
- MKLDNN FC : Add error info when mkldnn fc bias dimension is wrong (16692)
- [MKLDNN] support mkldnn gelu (16710)
- [MKLDNN] Fix int8 convolution/fc bias overflow (16734)
- [MKLDNN] use dim_t instead of int in slice/transpose operators (16737)
- Mkldnn fullyConnect bwd bug fix (16890)
- Revert Mkldnn fullyConnect bwd bug fix (16890) (16907)
- [MKLDNN] Use MKLDNNRun (16772)
- [MKLDNN] mkldnn RNN operator enhancement (17075)
- [MKLDNN] enable MaxPooling with full pooling convention (16860)
- update mkldnn to v1.1.2 (17165)
- improve mkldnn doc (17198)
- [MKLDNN] Fix _copyto (17173)
- [MKLDNN] Support channel wise quantization for FullyConnected (17187)
- fixed seed for mkldnn test (17386)
- add mkldnn softmax backward (17170)
- cmake: copy dnnl headers to include/mkldnn (17647)
- [mkldnn]Mkldnn bn opt backport from master to 1.7x (18009)
- [v1.x] Update 3rdparty/mkldnn remote URL and pin to v1.3 (17972) (18033)
- [v1.x] backport 17900 [MKLDNN] support using any format in pooling backward (18067)
- Static link MKL-DNN library (16731)
- Add large tensor nightly tests for MKL-DNN operators (16184)
- [MKL-DNN] Enable and Optimization for s8 eltwise_add (16931)
- [MKL-DNN] Enhance Quantization Method (17161)
- Static Build and CD for mxnet-cu102/mxnet-cu102mkl (17074)
- MKL-DNN RNN backward path enhancement (17183)
- cmake: check USE_OPENMP and pass proper MKL-DNN build flags (17356)
- update mkl to 2020.0 (17355)
- Enable MKL-DNN by default in pip packages (16899)
- Enable MKL-DNN FullyConnected backward (17318)
- Softmax primitive cache and in-place computation (17152)
- boolean_mask_assign with start_axis (16886)
- use identity_with_cast (16913)
- change error tolerance for bf16 bn (18110)
- [v1.x] Backport 17689 and 17884 to v1.x branch (18064)
- refactor codes and add an option to skip/check weight's version to reduce overhead (17707) (18039)
- [v1.x] Backport 17702 and 17872 to v1.x branch (18038)

TensorRT integration
- Update TensorRT tutorial to build-from-source. (14860)
- Minor fix, use RAII for TensorRT builder and network object (17189)

Quantization
- Add silent option to quantization script (17094)

Profiler
- Implemented final two binary ops, added default params for functionality (17407)
- Implement remaining nn_activation ops in opperf (17475)
- Implement all miscellaneous ops (17511)
- Implement remaining nn_basic ops in opperf (17456)

ONNX
- Fix memory leak reported by ASAN in NNVM to ONNX conversion (15516)
- ONNX export: Gather (15995)
- ONNX export: Slice op - Handle None value for ends (14942)

New models
- [Model] Implement Neural Collaborative Filtering with MXNet (16689)
- Further optimization for NCF model (17148)
- HMM Model (17120)

Operator improvements
- Faster GPU NMS operator (16542)
- [MXNET-1421] Added (CuDNN)BatchNorm operator to the list of mirrored operators (16022)
- dynamic custom operator support (15921)
- Multi Precision Lamb Update operator (16885)
- Add im2col and col2im operator (16502)
- Quantized Elemwise Mul Operator (17147)
- Enhancements for MXTensor for custom operators (17204)
- Enabling large tensor support for binary broadcast operators (16755)
- Fix operators lying about their number of inputs (17049)
- [WIP] Fallback mechanism for mx.np operators (16923)
- Dynamic custom operator GPU support (17270)
- Fix flaky - test_operator_gpu.test_np_insert (17620)
- MXNet FFI for Operator Imperative Invocation (17510)
- [MXNET-978] Higher Order Gradient Support `logp1`, `expm1`, `square`. (15416)
- [MXNET-978] Higher Order Gradient Support `arcsin`, `arccos`. (15515)
- [MXNET-978] Higher Order Gradient Support `rsqrt`, `rcbrt`. (15476)
- gather_nd: check bound and wrap negative indices (17208)
- Remove dilation restriction for conv3d (17491)
- Fix storage type infer of softmax backward (17576)
- Fix and optimize handling of vectorized memory accesses (17767) (18113)
- Cherry-pick of 17995 and 17937 to 1.x branch (18041)
- No tensor cores for fp32 interleaved attention, remove div by 8 restriction (17994) (18085)
- GPU gemms true fp16 (17466) (18023)
- Add support for boolean inputs to FusedOp (16796)

Bug fixes
- [BUG FIX] Always preserve batch dimension in batches returned from dataloader (16233)
- Fix SliceChannel Type inference (16748)
- Change _generate_op_module_signature and get_module_file to open files with encoding=utf-8, fixing encoding errors on Chinese Windows systems. (16738)
- Fix rtrue_divide grad (16769)
- fix inv test flakiness using random matrices generated by SVD (16782)
- [MXNET-1426] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan (16234)
- Fix (16781)
- fix expand_dims fall back when input's ndim is 0 (16837)
- [fix] missing input log higher order. (15331)
- Fix IndentationError in setup.py (16857)
- Fix a few np issues (16849)
- Fix InferAttr/InferShapeAttr not calling inference for all nodes in a graph (16836)
- fix for enable model parallelism for non-fp32 data (16683)
- Fix NDArrayIter iteration bug when last_batch_handle='pad' (16166)
- Fix crashing on Windows in ObjectPool ~ctor (16941)
- Fix NDArrayIter can't pad when size is large (17001)
- fix axis=-1 bug (17016)
- Fix CUDNN detection for CMake build (17019)
- Fix omp assert issue (17039)
- mshadow: fix vector access (17021)
- [BUGFIX] Fix race condition in kvstore.pushpull (17007)
- [BUGFIX] Fix trainer param order (17068)
- [BugFix] fix filter channel calculation in ModulatedDeformableConvV2 (17070)
- Fix reshape interoperability test (17155)
- fix norm sparse fallback (17149)
- fix py27 quantization (17153)
- fix int8 add ut (17166)
- Fix and clean up Ubuntu build from source instructions (17229)
- fix lstm layer with projection save params (17266)
- Fix rendering of ubuntu_setup.md codeblocks (17294)
- Fix 17267, add expected and got datatype for concat error msgs (17271)
- [BUGFIX] fix model zoo parallel download (17372)
- fix use int8, uint8, int32, int64 (17188)
- [Fix] Add ctx to the original ndarray and revise the usage of context to ctx (16819)
- Fix ndarray indexing bug (16895)
- fix requantize flaky test (16709)
- Initial checkin (16856)
- Fix flaky test_ndarray.py:test_reduce (17312)
- fix flaky test: boolean index and fix bugs (17222)
- Fix IOT Devices section of Get Started page (17326)
- add logic for no batch size while getting data arrays from executors (17772) (18122)
- Fix reverse shape inference in LayerNorm (17683)
- fix full and full_like when input is boolean (17668)
- Fix MBCC inference (17660)
- Additional fix for vector access. (17230)
- Cherrypick Fix nightly large_vector test caused by incorrect with_seed path (18178) (18220)
- [1.7] Pass args fix3 (18237)
- fixing batch_norm and layer_norm for large tensors (17805) (18261)
- [1.7.x] Backport of LSTM and GRU fix (17898) and RNN op (17632) (18316)
- [v1.7.x] backport 18500 - [Bug Fixed] Fix batch norm when grad_req is `add` (18517)
- Fix the monitor_callback invalid issue during calibration with variable input shapes (18632) (18703)

Front end API
- Fix the problem in printing feature in c++ API examples : feature_extract (15686)
- updating MXNet version to 1.6.0 in base.h for C APIs (16905)
- [API] unified API for custom kvstores (17010)
- fix parameter names in the estimator api (17051)
- adding docs for 64bit C APIs of large tensor (17309)
- Add API docs to INT64 APIs (16617)

Gluon
- [Quantization] Enhance gluon quantization API (16695)
- [Gluon] Improve estimator usability and fix logging logic (16810)
- Fix test_gluon.py:test_sync_batchnorm when number of GPUS > 4 (16834)
- [Gluon] Update contrib.Estimator LoggingHandler to support logging per batch interval (16922)
- Include eval_net the validation model in the gluon estimator api (16957)
- Fix Gluon Estimator nightly test (17042)
- [MXNET-1431] Multiple channel support in Gluon PReLU (16262)
- Fix gluon.Trainer regression if no kvstore is used with sparse gradients (17199)
- refactor gluon.utils.split_data() following np.array_split() (17123)
- Add RandomApply in gluon's transforms (17242)
- Partitioning Gluon HybridBlocks (15969)
- Random rotation (16794)
- bump up atol for gradient check (16843)
- Extend estimator.evaluate() to support event handlers (16971)
- [MXNET-1438] Adding SDML loss function (17298)

Symbol
- Add unoptimized symbol to executor for sharing (16798)
- Enforces NDArray type in get_symbol (16871)
- Fix 17164 symbolblock with BatchNorm inside during cast to fp16 (17212)
- autograd video and image link fixes and removing symbol tutorials (17227)
- Fix CosineEmbeddingLoss when symbol API is used (17308)
- Fix Horovod build error due to missing exported symbols (17348)
- Update symbol.py (17408)
- update symbol to json (16948)

Language Bindings
Python
- Python 2 compatibility fix in base.py
- adding stacktrace in Jenkinsfile_utils.groovy to inspect Python2 failure cause in CI (17065)
- Fix image display in python autograd tutorial (17243)
- Fix Python 3 compatibility in example/speech_recognition (17354)
- Stop testing Python 2 on CI (15990)
- Docs: Python tutorials doc fixes (17435)
- pin python dependencies (17556)
- Python 2 cleanup (17583)

C/C++
- Simplify C++ flags (17413)

R
- fix R docs (16733)
- [R package] Make R package compilation support opencv 4.0 (16934)
- Support R-package with cmake build and fix installation instructions (17228)
- Fix R-package/src/Makevars for OpenCV 4 (17404)
- Fix typo in Install the MXNet Package for R (17340)

Clojure

Julia
- [MXNET-1440] julia: porting `current_context` (17142)
- julia: porting `context.empty_cache` (17172)
- pin Markdown version to 3.1 in Julia doc build (17549)

Perl
- [Perl] - ndarray operator overloading enhancements (16779)
- MXNET-1447 [Perl] Runtime features and large tensor support. (17610)

Scala
- Fix scala publish & nvidia-docker cublas issue (16968)
- Fix publishing scala gpu with cpu instance (16987)
- swap wget to curl in Scala scripts (17041)
- [Scala/Java] Remove unnecessary data slicing (17544)
- quantile_scalar (17572)
- Fix get_started scala gpu (17434)
- Fix MBCC & scala publish pipeline (17643)
- Bump up additional scala 1.x branch to 1.7.0 (17765)

Performance improvements
- Build.py improvement (16976)
- Improvements to config.cmake (17639)
- [Done] BilinearResize2D optimized (16292)
- Speed fused_op compilation by caching ptx and jit-compiled functions (16783)
- Improve the speed of the pointwise fusion graph pass (17114)
- broadcast_axis optimization (17091)
- Optimize AddTakeGrad Tensor Sum (17906) (18045)

Example and tutorials
- Add CustomOp tutorial doc (17241)
- Correct the grammar in 1-ndarray tutorial (17513)

Website and documentation
- Website edits (17050)
- [Website 2.0] Nightly Build for v1.x (17956)
- [docs] Fix runtime feature detection documentation (16746)
- Adding user guidelines for using MXNet built with Large Tensor Support (16894)
- fix typo and doc (16921)
- large tensor faq doc fix (16953)
- [DOC] Add a few tips for running horovod (17235)
- Update NOTICE to fix copyright years (17330)
- [DOC] Fix tutorial link, and better error msg (17057)
- doc fix for argmax & argmin (17604)

CI/CD
- support mixed-precision true_divide (16711)
- Try to fix CI (16908)
- mixed precision for power (16859)
- Fix desired precision for test_ndarray.py:test_reduce (16992)
- [reproducibility] multi_sum_sq review, AtomicAdd removal (17002)
- fix precision problem in linalg_solve, linalg_tensorinv, linalg_cholesky op test (16981)
- grouping large array tests based on type and updating nightly CI function (17305)
- [LICENSE] fix cpp predict license (17377)
- [CI] Fix static build pipeline (17474)
- skipping tests that cannot fit in nightly CI machine corrected imports (17450)
- Update Windows CI scripts to use syntax compatible with Win 2019 server powershell. (17526)
- Fix Non-ASCII character in docstring (17600)
- [CI] Follow redirects when downloading apache-maven-3.3.9-bin.tar.gz (17608)
- [CI] Upgrade sphinx and autodocsumm (17594)
- Reduce load on CI due to excessive log flood (17629)
- Enable users to specify BLAS (17648)
- [CI] Add AMI id to instance info on builds (17649)
- [v1.7.x] Backport staggered CI builds (17999 & 18119) (18142)
- [v1.7.x] Backport 17177 to 1.7.x (Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator) (18147)
- Fix formatting and typos in CD README.md (16703)
- [CD] dynamic libmxnet pipeline fix + small fixes (16966)
- [CD] enable s3 publish for nightly builds in cd (17112)
- [CD] fix CD pipeline (17259)
- [CD] update publish path (17453)
- fix CD and remove leftover from 15990 (17551)
- Fix nightly build (16773)
- Update pypi_publish.py to disable nightly build upload to Pypi (17082)
- [v1.7.x] update jetson dockerfile to support CUDA 10.0 (18339)
- Remove manually created symbolic link to ninja-build (18437) (18456)
- Increase staggered build timeout to 180 min (18568) (18585)

License
- Don't relicense FindCUDAToolkit.cmake (17334)
- fix license and copyright issues (17364)
- Update ps-lite LICENSE (17351)
- remove unused file with license issue (17371)
- Update LICENSE for fonts (17365)
- license np_einsum file under bsd (17367)
- Update Apache License for mshadow (18109) (18134)
- Julia: remove downloading of the non-ASF binary build (18489) (18502)
- Add missing license header for md files (18541)
- [v1.7.x]License checker enhancement (18478)

Miscellaneous changes
- Link fixes4 (16764)
- Refactoring names for mxnet version of nnvm to avoid conflicting with the original tvm/nnvm. (15303)
- minor typo fix (17008)
- Add micro averaging strategy to pearsonr metric (16878)
- introduce gradient update handler to the base estimator (16900)
- fix latency calculation and print issue (17217)
- add inference benchmark script (16978)
- change the wording and log level to be more in line with the general use (16626)
- Updated logos. (16719)
- Pinning rvm version to satisfy Jekyll build (18016)
- Workaround gnu_tls handshake error on Ubuntu 14.04 Nvidia Docker (18044)

1.6.0

Not secure

Deprecation of Python 2

The MXNet community [voted](https://lists.apache.org/thread.html/r3a2db0f22a1680cc56804191446fef2289595798ca19fd17de1ff03e%40%3Cdev.mxnet.apache.org%3E) to no longer support Python 2 in future releases of MXNet. Therefore, the MXNet 1.6 release will be the last MXNet release to support Python 2.

New features

NumPy compatible interface and using TVM to generate operators

NumPy has long been established as the standard math library in Python, the most prevalent language in the deep learning community. With this library as the cornerstone, Python now has the largest ecosystem and community for scientific computing. The popularity of NumPy comes from its flexibility and generality.

In 14253, the MXNet community reached consensus on moving towards a NumPy-compatible programming experience and committed to a major endeavor on providing NumPy-compatible operators.

The primary goal of the projects below is to provide the equivalent usability and expressiveness of NumPy in MXNet to facilitate Deep Learning model development, which not only helps existing deep learning practitioners but also provides people in the existing NumPy community with a shortcut for getting started in Deep Learning. The efforts towards this goal would also help a secondary goal, which is to enable the existing NumPy ecosystem to utilize GPUs and accelerators to speed up large scale computation.

- Infra to use tvm write op kernels (15550)
- fix boolean_mask for 0-size output (15731)
- fix tvm cmake (15781)
- Numpy-compatible Infra (15581)
- [MXNET-1206] Support NDArray indexing with None and Ellipsis (13143)
- numpy-compatible sum (15810)
- [Numpy] Numpy compatible slicing (15798)
- Numpy Tensordot and Dot Operator (15820)
- numpy linspace (15852)
- tvm infra for op attrs (15854)
- Port several np ops to master (15867)
- numpy-compatible split upstream (15841)
- Numpy-compatible concatenate upstream (15894)
- Numpy-compatible stack upstream (15842)
- [Numpy] Numpy behavior random.uniform() (15858)
- Tvm broadcast backward (15938)
- np elemwise unary ops upstream (15831)
- [Numpy] random.randint() implemented (15956)
- Refines NDArray indexing and adds numpy ndarray indexing [READY FOR REVIEW] (15942)
- Port ops from np branch (16018)
- numpy-compatible cumsum upstream (15924)
- NumPy-compatible infrastructure on Gluon (16024)
- [OP] Support range as advanced index for ndarrays (16047)
- Numpy compatible max min (16046)
- NumPy-compatible Mean, Std and Var (16014)
- Add fluent methods mean, std, var for ndarray (16077)
- numpy multinomial op (15878)
- add numpy operator remainder (16080)
- [Numpy] Random.choice implemented (16089)
- Fix sample.normal shape inference
- Numpy add numpy op indices (15837)
- [Numpy] Numpy copysign (15851)
- numpy operator ravel, derive from reshape (16016)
- Add __array_function__
- Improved error messages
- Fix np.choice
- add exception check for numpy reshape (16180)
- [Numpy] Numpy behavior normal distribution (16109)
- fix multinomial bug on gpu (16204)
- [Numpy] Differentiable svd (15795)
- add epsilon to sum(pvalue) upperbound (16211)
- np compatible vstack (15850)
- Numpy add numpy op roll (15902)
- add numpy compatible trace (16008)
- add numpy op hanning, hamming, blackman (15815)
- [Numpy]flip (15819)
- numpy operator around (16126)
- numpy operator arctan2 (15890)
- numpy operator nonzero (15838)
- numpy operator hypot (15901)
- tvm numpy operator deg2rad && rad2deg (16015)
- numpy op unique
- try to fix bug
- fix memory bug and disable some test
- fix according to review
- Numpy operators: `lcm`, `tril`, `identity` and `take` (16264)
- [numpy] Cosmetic improvement on mxnet.numpy builtin op signature in documentation (16305)
- Disable Pylint false error in numpy_op_signature (16370)
- boolean_mask_assign operator for future boolean indexing (16361)
- Implements ldexp. (15845)
- Numpy Operators: Inner, Outer, vdot (15846)
- Numpy det and slogdet operators (15861)
- Fix random op signature
- fix choice signature
- add raise test for shape
- Add boolean ndarray (15940)
- global numpy shape flag (16335)
- numpy-compatible histogram (16266)
- [Numpy] Numpy compatible dstack (15871)
- numpy eye op (16132)
- Numpy compatible vsplit; minor changes to split (15983)
- add numpy op logspace (15825)
- add numpy op bitwise_xor, hsplit, moveaxis, rot90 (16257)
- Fix optimizer bug for np attribute (16494)
- Tests of NumPy interoperability (16469)
- improve unary and binary operator handling and refactor tests (16423)
- [DOC] Fix numpy op doc (16504)
- [Numpy] More numpy dispatch tests (16426)
- [Numpy] einsum (15911)
- Add test pipeline for USE_TVM_OP=OFF on Unix (16450)
- Numpy dispatch test of ...... (16422)
- setup and concatenate, copy, expand_dims, expm1 (16493)
- add sum for boolean type in mainline (16436)
- [Numpy] SVD outputs tuple (16530)
- numpy op doc: max, min, prod (16506)
- add interface for rand
- Fix numpy bugs (16537)
- pickler override for np ndarrays (16561)
- [numpy]op test in new pattern (16556)
- Enforce adding documentation for builtin numpy operators (16575)
- [Numpy] Support N_D(N>=3) batch_dot (16586)
- [Numpy] Loading numpy-incompatible NDArray in numpy-compatible mode (16597)
- Fix index overflow bug in einsum (16589)
- add npx reshape (16640)
- add type switch to weight tensor (16543)
- numpy doc enhancement (16637)
- Infra for tvm op runtime dispatch (16100)
- [NumPy][Operator] NumPy operator `may_share_memory` and `shares_memory` (16533)
- [Numpy] Numpy operator diff (15906)
- Miscellaneous fix for several numpy issues (16664)
- [Numpy] implement np.column_stack (16594)
- [numpy] add numpy operator : append (16564)
- Backport of 16711, 16737, 16408 to 1.6 branch (16763)
- Backport to 1.6 (16773, 16781, 16783, 16716, 16699, 16728, 16769, 16792) (16832)
- [Backport][v1.6.x] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan (16884)
- Backport of 16827, 16791 and 16888 to 1.6 branch (16901)
- port shape op to 1.6.x (16912)
- [Numpy] Fix imperative basic indexing in numpy (16902) (16919)
- Backport 16895, 16922, 16878, 16979 and 16900 to 1.6 (17029)

Graph optimizations

Pointwise fusion for GPU

Besides compute-intensive operations such as convolutions and fully connected layers, DL models contain many simple pointwise (aka elementwise) operations, such as elementwise addition. These operations are entirely memory-bandwidth bound, which limits the speedup they gain from newer GPU hardware, where the ratio of compute to memory bandwidth is typically high. When several such operations are chained one after another, each one stores its result and the next loads it back, causing a series of unnecessary stores and loads and potentially higher memory usage for the intermediate results. Pointwise fusion alleviates these problems by generating fused operators just in time; the fused operators do not store intermediate results in memory, which improves both performance and memory usage.

- Pointwise fusion for GPU (15167)
- Backport 16798, 16836 and 16838 to 1.6 (16874)
- Add support for boolean inputs to FusedOp (16796) (16892)
- Workaround problem with fusion in CUDA 9 (17028) (17035)
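
The effect of pointwise fusion described above can be illustrated with a small conceptual sketch. Plain Python lists stand in for GPU tensors, and the names `unfused` and `fused` are hypothetical; this is not MXNet's fusion implementation, which generates fused GPU operators just in time.

```python
# Conceptual sketch only: plain Python lists stand in for GPU tensors.


def unfused(a, b, c):
    """Compute relu(a * b + c) as three separate elementwise passes."""
    tmp1 = [x * y for x, y in zip(a, b)]       # pass 1: store an intermediate
    tmp2 = [t + z for t, z in zip(tmp1, c)]    # pass 2: load tmp1, store tmp2
    return [max(t, 0.0) for t in tmp2]         # pass 3: load tmp2, store result


def fused(a, b, c):
    """Same computation in one pass, with no intermediate buffers."""
    return [max(x * y + z, 0.0) for x, y, z in zip(a, b, c)]


a = [1.0, -2.0, 3.0]
b = [0.5, 4.0, -1.0]
c = [0.25, 1.0, 2.0]

# Both variants produce the same values; only the memory traffic differs.
assert unfused(a, b, c) == fused(a, b, c)
```

The unfused variant materializes two temporary lists that the fused variant never creates, which mirrors the extra stores and loads that fusion removes.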

Eliminate common subexpressions

- Eliminate common expressions (15657)

Default MKLDNN Subgraph fusion

- [MKLDNN] Enable subgraph backend mkldnn by default. (15518)

New operators

- [OP] Add a new arange_like operator to contrib (15400)
- PDF operators for each distribution for which we have a random sampler (plus also the PDF of the Dirichlet). Supports probabilities and log-probabilities, as well as gradients. (14617)
- Group Normalization (14959)
- Add RROIAlign (16017)
- Add fast implementation of LARS (16122)
- Round and sign straight-through-estimators C operators. (16373)
- New ops for RCNN + old ops improvements for RCNN (16215)
- Comparison ops implemented using mshadow (16414)
- Add mask target generator operator for Mask-RCNN (16268)
- Move MRCNNMaskTarget op to contrib (16486)
- Mxnet allclose (14443)
- Aggregated adamw update (16398)
- Make mrcnn_mask_target arg mask_size a 2d tuple (16567)
- Dgl ops 2 (16416)
- Lamb optimizer update (16715)
- [OP] changing data type of 't' to int in lamb_update_phase1 (16903)
- Multi Precision Lamb Update operator (16885)
- Interleaved MHA for CPU path (17138) (17211)

Feature improvements

Automatic Mixed Precision

- [AMP] Move topk from FP16_FP32_FUNCS to FP32_FUNCS (15342)
- Conversion from FP32 model to Mixed Precision model (15118)
- Update fp16 docs: Block.cast is inplace (15458)
- FP16 Support for C Predict API (15245)
- Add AMP Conversion support for BucketingModule (15528)

Gluon Fit API

- Fixing build for gluon estimator test, including libtvm in pack libs (16148)
- [Estimator] handle composite metrics in estimator (16676)
- [Estimator] refactor estimator to allow overriding evaluate/fit of a batch (16678)
- [Estimator] refactor estimator and clarify docs (16694)
- [Gluon] Improve estimator usability and fix logging logic (16810) (16846)
- Backport Gluon estimator changes to 1.6 (17048)
- fix parameter names in the estimator api (17051) (17162)

MKLDNN

- Upgrade MKL-DNN submodule to v0.20 release (15422)
- Fix quantized concat when inputs are mixed int8 and uint8 (15693)
- [MKLDNN]Enhance Quantization APIs and Tutorial (15448)
- Add quantization support for GluonCV (15754)
- add int8 bn mkldnn implementation and test (15664)
- [Quantization]support exclude operators while quantization (15910)
- [MKLDNN]Support fullyconnected and element-wise ops fusion (15950)
- Disable test coverage for Clang MKLDNN (15977)
- update support MKLDNN BN conditions (15870)
- [MKLDNN] Fix out of bound access of req vector (16000)
- add uint8 bn mkldnn implementation (16003)
- Improve quantization flow (15961)
- [MKLDNN] fix uint8 batch norm memory misuse (16034)
- MKL-DNN RNN checks NDArray version (16071)
- Float64 fallback for mkldnn subgraph and rnn op (15853)
- Update MKL-DNN dependency (16073)
- Integrate MKL-DNN leakyrelu (16075)
- [MKLDNN] NDArray reorder in C API and deconv (16265)
- Fix mkldnn reshape (16455)
- [MKLDNN] Fix uint quantized fc when not fusing with requantize (16523)
- [MKLDNN]Fix reorder2default (16602)
- Upgrade MKL-DNN dependency to v1.0 (16555)
- Revert "[MKLDNN]Fix reorder2default (16602)" (16697)
- [v1.6.x] Backport 16837 into v1.6.x (16847)
- Initial checkin (16856) (16872)

Large tensor support

- [MXNET-1413] Adding Large Tensor support for sort operators (15170)
- Large Index Support for Slice (15593)
- Add large tensor support binary arithmetic (15785)
- Large tensor support for random ops (15783)
- Add Large Tensor Support for Sequence, NN Ops (15807)
- Add power, exponent, log ops large tensor support (15794)
- removing unnecessary int64 C apis that were added to support Large Tensors and Vectors (15944)
- creating ndarray directly using mxnet ndarray primitives to reduce memory footprint of tests for topk, sort and argsort (15900)
- Adding tests to verify support for Large Tensors in additional Ops along with new C_Apis supporting 64bit indexing (15895)
- Added tests to verify Large Vector Support for initial set of ops (15943)
- Added more tests for Large Indices (15960)
- Add Large tensor vector test cases (15941)
- Test large vector mean operator and fix a few bugs (16079)
- Reducing memory footprint of one_hot for Large Array Testing (16136)
- removing MXNDArrayLoadFromBuffer64 and MXNDArrayLoad64 (16203)
- Fix large array tests (16328)
- added more tests to verify support for large vector (16477)
- added support for large tensors for Dropout operator and tests to verify support for more operators (16409)
- adding large tensor support for add_n and tests for more ops (16476)
- adding large tensor support for pad operator (15126)
- Added large tensor support and test for gather_nd (16371)
- Large Vector tests for DGL Ops Part 2 (16497)
- Showing proper error message when an attempt is made to create large tensor but MXNet is not built with it (16570)

TensorRT integration

- enable TensorRT integration with cpp api (15335)
- Add unit tests for TensorRT integration and fix some bugs (15399)

Higher order gradient support

- [MXNET-978] Higher order gradient for sigmoid (15288)
- [MXNET-978] Higher Order Gradient Support `reciprocal`, `abs`. (15413)
- [MXNET-978] Add higher order gradient support `tan`, `tanh` (15253)
- [MXNET-978] Higher Order Gradient Support `arctan`, `arctanh`, `radians`. (15531)
- [MXNET-978] Higher Order Gradient Support `sqrt`, `cbrt`. (15474)
- [MXNET-978] Higher Order Gradient Support `clip`, `dropout`. (15746)
- [MXNET-978] Higher Order Gradient Support `sinh`, `cosh`. (15412)
- [MXNET-978] n-th order gradient test support. (15611)
- [MXNET-978] Fully connected, higher order grad (14779)
- [MXNET-978] Higher Order Gradient Support `arcsinh`, `arccosh`. (15530)

Operator improvements

- broadcast axis is alias to broadcast axes; doc fix (15546)
- Utility to help developers debug operators: Tensor Inspector (15490)
- Softmax with length (15169)
- in-place reshape ops (14053)
- Add missing default axis value to symbol.squeeze op (15707)
- Add matrix determinant operator in linalg (15007)
- Add fp16 support for topk (15560)
- [MXNET-1399] multiclass-mcc metric enhancements (14874)
- new raise mode for nd.take and fix backward for wrap mode (15887)

Profiler

- Fixing duplication in operator profiling (15240)
- Custom Operator Profiling Enhancement (15210)
- [Opperf] Make module/namespace of the operator parameterized (15226)
- Opperf: Support Python<3.6 (15487)
- Add transpose_conv, sorting and searching operator benchmarks to Opperf (15475)
- Deprecate USE_PROFILER flag (15595)
- Update profiler.md (15477)
- [Opperf] Add array rearrange operators to opperf (15606)
- [OpPerf] PDF Random ops fix (15661)
- [Opperf] Add optimizer update operator benchmarks to opperf (15522)
- fix broadcast op param (15714)
- [OpPerf] Profiler flag for Python, Cpp (15881)
- [Opperf] Filter out deprecated ops (15541)
- [OpPerf] Handle positional arguments (15761)
- [OpPerf] Take care of 4d param (15736)
- Add Median,p50,p99 to python profiler (15953)
- adding "total" (total time) to profiler aggregate stats sorting criteria (16055)

ONNX import/export

- Correct ONNX documentation (15914)
- [MXNET-895] ONNX import/export: TopK (13627)

Runtime discovery of features

- Making Features as a singleton for improved caching (15835)

Bug fixes

- [bug] fix higher grad log (15120)
- Showing proper error when csr array is not 2D in shape. (15242)
- add 'asnumpy' dtype option to check_symbolic_backward (15186)
- point fix the vector declaration in MultiBoxDetection (15300)
- Temporarily Commenting out Flaky Test (15436)
- Fix memory leak in NaiveEngine (15405)
- fix nightly CI failure (15452)
- Small typo fixes in batch_norm-inl.h (15527)
- Bypass cuda/cudnn checks if no driver. (15551)
- Julia path patch (15561)
- Fix AMP Tutorial failures (15526)
- Fix warnings in CLang: (15270)
- Fix dumps for Constant initializer (15150)
- fix normalize mean error bug (15539)
- [fix] print `self` in warning. (15614)
- [MXNET-1411] solve pylint error issue14851 (15113)
- [Flaky test] Skip test_operator_gpu.test_convolution_independent_gradients (15631)
- Fix subgraph with custom_op (15671)
- Fix USE_BLAS == openblas check (15691)
- update previous flaky naive engine test (15651)
- make TransposeShape infer shape from both sides (15713)
- Skip Flaky Test (15722)
- Revert "Dynamic Library Loading Support" (15755)
- Fix flaky test test_global_metric (15756)
- Fix PR 15489 (Dynamic Library Loading Support) (15760)
- Refactor LibraryInitializer so it's thread safe. Fixes sporadic concurrency crashes. (15762)
- Fix backward_clip num inputs and type of clip params (15688)
- fixing problem with existing Singleton Caching (15868)
- Allow operators with multiple outputs in get_atomic_symbol (15740)
- Fix ConcatType backward type inference (15829)
- Add disable attr to subgraph property (15926)
- Re-enable flaky test_prelu (15777)
- declare explicitly the tblob default assign operator and copy constructor (15937)
- Discard needless test cases in `test_convolution_independent_gradients` (15939)
- fix naive engine for multi-threaded inference (15574)
- Fix get_rows_per_block (15979)
- Fix a memory misalignment in topk operator (15948)
- Decouple dtype from shape for Random multinomial (15980)
- Fix dtype inference in arange_like operator (15930)
- Disable laop_6 (15976)
- Fix flaky clojure profile test (16058)
- fix test_pick taking too long to run (16066)
- [fix] Support nullop in `transpose` (15865)
- fix flaky test (16074)
- fix some test files taking too long to run (16067)
- Fix gradient tensor mutate in `{adam/ftrl/rmsprop/rmspropalex}_update`. (15768)
- Fix unary operator ceil/floor/trunc when data type is integer (14251)
- Fix failing tests (16117)
- Fixes NAG optimizer 15543 (16053)
- avoid test relu at the origin due to discontinuous gradient (16133)
- Fix remaining errors reported by D2L (16157)
- use 1E-4 in groupnorm test (16169)
- Sequence last fix (16156)
- fixing test for model compatibility checker (16159)
- assert_allclose -> rtol=1e-10 (16198)
- [MEMORY] retry GPU memory allocation if fragmented (16194)
- improve dataloader signals and messages (16114)
- Update ndarray.py (16205)
- fix flaky test (16191)
- Solve 14116, 15143 (15144)
- [MXNET-1422] Fix wrong results of min([inf, inf]) and max([-inf,-inf]) (16226)
- Fix inconsistent interpolation method values (16212)
- set fixed seed for profiler (16155)
- Fix MXNDArrayGetData (16289)
- fix atol for test_preloaded_multi_sgd (16356)
- Fix windows flakiness (16415)
- cuDNN non-persistent bidirectional RNN dgrad sync fix (16391)
- [BUGFIX] Minor type issues in Squeeze (16448)
- Fix Nightly Tests for Binaries (16451)
- Fix dtype bug (16467)
- Fix flaky pylint CI failures (16462)
- Load NDArray only to GPU if GPU is present (16432)
- Bug fix for the input of same axes of the swapaxes operator (16513)
- Fix learning rate scheduler being unexpectedly overwritten by optimizer's default value (16487)
- disable tests (16536)
- fix pylint in CI (16540)
- image crop gpu (16464)
- Build dmlc-core with old thread_local implementation (16526)
- fix doc for topk (16571)
- RNNOp to call cudaEventCreate lazily (16584)
- add encoding to the stub files for potential utf8 char in doc strings (16580)
- Suppress subgraph log in CI (16607)
- Fix dequantize memory corruption (16606)
- Fix for wrong reqs set after switching from training to inference (16553)
- Disables test_bulking_operator_gpu due to flakiness (16611)
- Imagenet inference to nightly fix (16599)
- Move some subgraph verbose to MXNET_SUBGRAPH_VERBOSE=2 (16622)
- RNNOp only call cuda/cudnn if GPU ctx is requested (16632)
- fix bad encode (16641)
- Disable float16 test (16643)
- Fix GetMKLDNNData for delay alloc (16618)
- Move ops which don't support FP16 dtype to FP32 list (16668)
- no such method => modified function args (16610)
- fix cuDNN RNN dtype_with_fallback_ bug (16671)
- Add check if scipy is imported in sparse.py (16574)
- Added launch bounds to the reduce kernels (16397)
- fix install dir (16690)
- fix binary dependencies in CD and nightly (16693)
- Fix SliceChannel Type inference (16748) (16797)
- fix flakiness of test_np_mixed_precision_binary_funcs (16873)
- Fix test_gluon.py:test_sync_batchnorm when number of GPUS > 4 (16835)
- Omp fork numthreads fix 1.6 (17000)
- [BUGFIX] Fix race condition in kvstore.pushpull (17007) (17052)
- Backport 17002, 17068 and 17114 to 1.6 branch (17137)
- Backport 3rdparty/openmp fixes (17193)
- fix norm sparse fallback (17149)

Front end API

- Expose get_all_registered_operators and get_operator_arguments in the… (15364)
- Add magic method `abs` to NDArray and Symbol. (15680)
- Dynamic Library Loading Support (15489)
- [MXNET-1294] Add KVSTORE PushPull API (15559)

Gluon

- [Dataset] Add take, filter, sample API to dataset (16078)
- Add register_op_hook for gluon (15839)
- [Dataset] add shard API (16175)
- Add list_ctx to ParameterDict (16185)
- [Gluon] Support None argument in HybridBlock (16280)
- Aggregated zero grad (16446)
- try to fix block (16465)
- [Gluon] Don't serialize shared parameters twice (16582)
- Initializer.__eq__ (16680)

Symbol

- Add symbol api for randn and fix shape issue for randn ndarray and symbol api (15772)
- Graph Partition API (15886)

Language Bindings

Python

The MXNet community [voted](https://lists.apache.org/thread.html/r3a2db0f22a1680cc56804191446fef2289595798ca19fd17de1ff03e%40%3Cdev.mxnet.apache.org%3E) to no longer support Python 2 in future releases of MXNet. MXNet 1.6 is therefore the last release to support Python 2.

C/C++

- [C++] Improve inference script to support benchmark on Imagenet (15164)
- C Api for simplebind, fix comment for trigoops, add atol to assert (16585)

Clojure

- Extend Clojure BERT example (15023)
- [Clojure] Add fastText example (15340)
- make clojure api generator tests less brittle (15579)

Julia

- add julia env settings (15523)
- julia: bump Windows prebuilt binary version to v1.5.0 (15608)
- julia: remove Travis CI related files (15616)
- julia: bump binding version to v1.6.0 (15607)
- julia: rename build env var `MXNET_HOME` to `MXNET_ROOT` (15568)
- Revert "julia: rename build env var `MXNET_HOME` to `MXNET_ROOT` (15568)" (16147)
- julia: fix `mx.forward` kwargs checking (16138)
- julia: implement `context.num_gpus` (16236)
- julia: add `AbstractMXError` as parent type (16235)
- [MXNET-1430] julia: implement context.gpu_memory_info (16324)
- julia/docs: more DRY on page rendering (16396)

Perl

- [Perl] - simplify aliasing strategy (15395)
- [Perl] - ndarray to native array conversion fix (16635)

Scala

- Add Sparse NDArray support for Scala (15378)
- fix the bug on Scala Sparse (15500)
- fix heap-use-after-free in scala (15503)
- Bump Scala version to 1.6 (15660)
- Fix Scala Symbolic API some/Some typo (15687)
- Faster Scala NDArray to BufferedImage function (16219)

Performance improvements

- Proper bulking of ops not using FCompute (15272)
- improve layernorm CPU performance (15313)
- Efficient MXNet sampling in the multinomial distribution (15311)
- Revert default return type for indices in argsort() and topk() back to float32 (15360)
- Use omp threads for cpu data loader (15379)
- Accelerate ROIPooling layer (14894)
- Avoid memory copy for dropout inference (15521)
- Add omp parallel optimization for _contrib_BilinearResize2D (15584)
- Softmax optimization for GPU (15545)
- Speed up group executor (16069)
- FullyConnected Bias performance improvement on GPU (16039)
- Embedding gradient performance optimization on GPU (16355)
- Faster Transpose 2D (16104)
- Pseudo 2D transpose kernel (16229)
- Faster general take (16615)

Examples and tutorials

- [TUTORIAL] Gluon performance tips and tricks (15427)
- Updating profiler tutorial to include new custom operator profiling (15403)
- [TUTORIAL] Gluon and Sparse NDArray (15396)
- [TUTORIAL] Revise Naming tutorial (15365)
- Revise Symbol tutorial (15343)
- Two fixes for info_gan.md example Code (15323)
- Rebase 13757 to master (15189)
- Tensor Inspector Tutorial (15517)
- logging (15106)
- update profiler tutorial (15580)
- [MXNET-1358] Fit api tutorial (15353)
- Tutorials nightly fix (16179)
- Update add_op_in_backend.md (16403)
- typo fix in r doc lstm tutorial (16546)
- [MKL-DNN] Add mxnet mkldnn cmake tutorial (16688)

Website and documentation

- [DOC] Clarify that global pooling is going to reset padding (15269)
- Update sparse_retain Documentation (15394)
- nano instructions (15117)
- remove comments from nano instructions (15433)
- README MTCNN Link URL Error in original website (15020)
- Update Horovod docs links in README (15366)
- fix doc for sort and argsort (15317)
- fix comment (15481)
- Improve docs for AMP (15455)
- [Doc] Add MKL install method apt/yum into tutorial (15491)
- Julia docs (15454)
- Docs: Fix misprints (15505)
- website build for julia: fix path to be static (15554)
- some minor typos/clarifications (15538)
- refine Nano setup directions (15524)
- [Doc] add squeeze to Array change shape (15549)
- fix typo (15648)
- Fix url (404 error) (15683)
- update julia install doc (15609)
- [DOC] refine autograd docs (15109)
- [DOC] Fix many arguments in the doc: reshape_like, arange_like, shape_array (15752)
- Add Gather_nd Scatter_nd to NDArray API category doc (15689)
- [Dependency Update] [Doc] move the general prerequisite software to the top (15896)
- typo in docs (16094)
- [WIP] New Website: New Docs [1/3] (15884)
- [DOC] Fix doc for nn.Embedding, nn.Dense and nd.Embedding (15869)
- [DOC] Consistent capitalization: mxnet -> MXNet, scala -> Scala (16041)
- New Website: Remove Old Content [2/3] (15885)
- New Website: New Pipeline [3/3] (15883)
- Update KL Divergence formula (16170)
- fix broken links (16255)
- redirect to the 404 page (16287)
- add google-analytics config (16271)
- Fixing links for website + Fixing search (16284)
- Minor fix in ToTensor documentation. (16299)
- adding redirects so that old website API links surfaced from searches (16342)
- Fix code block formatting in Why MXNet doc page (16334)
- Julia: add API docs back (16363)
- Change mailing list url in footer to point to instructions about how to subscribe instead (16384)
- Add instructions to report a security vulnerability (16383)
- [DOC] fix installation selector wrong history (16381)
- Beta build (16411)
- [WIP] Improving Python Docs API (16392)
- fix autodoc for spurious toggles (16452)
- [Doc] Update the download page with 1.5.1 release (16442)
- Fixing broken links (16500)
- add binary and docs build command options (16514)
- add option to remove indexes (16525)
- Correct Google Analytics Tracker (16490)
- [Doc] Use mirror link in the download page (16501)
- checking broken link fixes work (16538)
- detect number of procs during sphinx build (16512)
- fixed broken links across multiple files (16581)
- fix missing docs due to git add issues (16496)
- second round of fixing broken links in multiple files (16598)
- Python Docstring Convention (16550)
- [MXNET-1434] Fix a broken link for basic C++ tutorial (16461)
- Fix python doc build issue (16630)
- fixing broken links in multiple files - round 3 (16634)

CI/CD

- Fix build_ccache_wrappers: (14631)
- Remove mhard-float option. This is already deprecated by Google. (15435)
- CI: upgrade Julia version from 1.0.3 to 1.0.4 (15502)
- Add -R option to ci/build.py to avoid rebuilding containers (15426)
- [Dependency Update] Bump up the CI Nvidia docker to CUDA 10.1 (14986)
- fixed config.mk and Makefile bugs for installing mkl (15424)
- Add -DMXNET_USE_OPENMP to Makefiles so libinfo gets updated accordingly (15498)
- [Dependency Update] Dependency update doc (15045)
- Remove Scala package test on build (15915)
- Refactor for windows CI 'out of heap space' errors (15922)
- Fix Nightly Maven GPU (15989)
- Windows cmake flags cleanup (16013)
- Disable flaky test in test_amp_conversion (16031)
- Updates git_init Jenkins utility function to support checking out a particular commit id
- Adds artifact repository scripts
- Adds CD pipeline framework
- Adds static libmxnet release pipeline
- Updates CD pipeline
- Adds documentation
- Updates kvstore functions to use pushd and popd
- Throws exceptions instead of magic numbers
- Updates artifact repository cli to use --libtype instead of --static or --dynamic
- Clarifies ci_utils and cd_utils origin remark
- Adds clarifying note on why ubuntu 14.04 is being used for compilation
- Removes MXNET_SHA
- Removes set_release_job_name
- Adds license headers
- Updates artifact repository to expect licenses
- Moves ci/cd to cd directory
- Takes downstream job name from environment
- Updates order of parameters
- Updates job type parameter to dropdown
- Adds libmxnet feature extraction code comments
- Removes ccache setup from static build
- Disable test coverage of C++ codebase on CI (15981)
- Update readme and project.clj comment (16084)
- Enable tvm_op for ci (15889)
- Not to search for coverage files when none exist (16107)
- Fixes openblas installation for static build
- Update python dependencies (16105)
- CD Fixes (16127)
- Adds dynamic libmxnet to CD pipeline (16163)
- Fix README Build Status (16183)
- subscribe to build and CD changes (16192)
- [CD] Add COMMIT_ID param to release job (16202)
- Fix lack of dylib support in Makefile when use lapack (15813)
- Removes git status update stop gap solution (16285)
- add mkl installation temp fix (16304)
- add 'Release' cmake flag (16294)
- S3 upload artifacts (16336)
- Fix nightly scala pipeline (16362)
- remove redundant branch name (16372)
- Skipping installing nightly test (16418)
- Adds PyPI CD Pipeline (16190)
- upgrade the pytest version (16429)
- Revert "add mkl installation temp fix (16304)" (16369)
- increase docker cache timeout (16430)
- Adds pip requirements file to nightly gpu ci image (16472)
- [CD] Adds python docker pipeline (16547)
- Move imagenet inference to nightly (16577)
- Backport 16980 17031 17018 17019 to 1.6 branch (17213)

Misc

- update committer info (15289)
- Typo fix in plan_memory relase -> release. (15299)
- indent changes (15321)
- Had a few PRs merged. Hope to become an official contributor and potentially a committer. (15451)
- cuda/cuDNN lib version checking. Force cuDNN v7 usage. (15449)
- Improve diagnose.py, adding build features info and binary library path. (15499)
- update ratcheck for apache-rat 0.13 release (15417)
- add myself to interested modules (15590)
- 1.5.0 news (15137)
- bump up version from 1.5.0 to 1.6.0 on master (15072)
- Remove myself from CODEOWNERS (15617)
- remove mshadow submodule
- import mshadow source tree
- cuDNN support cleanup (15812)
- Remove requests_failed_to_import handling
- Update CODEOWNERS. (15972)
- Improve diagnose.py to display environment variables (15715)
- Update README.md (16035)
- [Dev] update ps-lite dependency (15936)
- Typedef cleanup (15899)
- add KEY for Tao Lv (16081)
- remove 'foo' and other print msg from test (16088)
- Revert accidental change to CMakelists (16040)
- Update env_var.md (16145)
- Update dmlc-core (16149)
- adding codeowners (16165)
- Factorize CUDA_KERNEL_LOOP used in CUDA kernels (16197)
- add code of conduct and conflict resolution (16343)
- simple typo error in NEWS.md (16344)
- update NEWS.md and README.md (16385)
- split issue templates (16558)
- Create SECURITY.md (16573)

1.5.1

Not secure

Apache MXNet (incubating) 1.5.1 is a maintenance release incorporating important bug fixes and performance improvements. All users of Apache MXNet (incubating) 1.5.0 are advised to upgrade. You can install Apache MXNet (incubating) 1.5.1 at the usual place. Review these release notes to learn which bugs were fixed.

Bug-fixes
* add deconv in TRT subgraph (15666) (16043)
* Update TRT tutorial with new APIs (16044)
* Fix _copy_to on MKLDNN backend (15637) (15803)
* Benchmark doc fix (15769) (16029)
* remove Julia cat image for license issue (15964) (16026)
* added check for empty params file and unknown param (not arg/aux) (15917)
* fix license issues (15806) (15860)
* prevent TRT_Logger to be destroyed before TRT engine (14898) (15877)
* [MXNET-1086] added sub and mul to ONNX->TensorRT conversion (15344) (15875)
* handle fix_gamma in tensorrt subgraph conversion correctly (15645) (15874)
* fix LinearRegressionOutput with empty label (15620) (15873)
* [v1.5.x] [MKLDNN] Independent gradients requests check with respect to weights… (15805)
* fix dropout mask output (15697) (15804)
* fix fp32 flatten issue (15351) (15802)
* Clojure package remove source images (15828)
* changed constructor args (15601) (15827)
* Add MKLDNN 4c layout to fix gluoncv se_resnext101_64x4d (15692) (15801)
* Fix the bug of `MXEnginePushAsyncND` and `MXEnginePushSyncND` (15751) (15792)

1.5.0

Not secure

New Features

Automatic Mixed Precision (experimental)
Training deep learning networks is very computationally intensive. Novel model architectures tend to have ever more layers and parameters, which slows down training. Fortunately, software optimizations and new generations of training hardware keep it feasible.
Most of these hardware and software optimization opportunities, however, lie in exploiting lower precision (e.g. FP16), for example to utilize the Tensor Cores available on new Volta and Turing GPUs. While FP16 training has shown great success in image classification tasks, other, more complicated neural networks have typically stayed in FP32 because the FP16 training guidelines are difficult to apply.
That is where AMP (Automatic Mixed Precision) comes into play. It applies the FP16 training guidelines automatically, using FP16 where it provides the most benefit while conservatively keeping operations that are unsafe in FP16 in full FP32 precision. To learn more about AMP, check out this [tutorial](https://github.com/apache/incubator-mxnet/blob/master/docs/tutorials/amp/amp_tutorial.md).
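
As a rough illustration of why some operations are considered unsafe in FP16, the sketch below rounds values to IEEE 754 half precision with the standard library's `struct` module. It does not use MXNet or its AMP API; the helpers `to_fp16` and `accumulate` are hypothetical, and Python floats (double precision) stand in for FP32.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate(values, use_fp16):
    """Sum values, optionally rounding the running total to FP16 each step."""
    total = 0.0
    for v in values:
        total = total + v
        if use_fp16:
            total = to_fp16(total)
    return total


values = [1.0] * 4096

full_precision_sum = accumulate(values, use_fp16=False)
half_precision_sum = accumulate(values, use_fp16=True)

# In FP16 the running total stops growing once adding 1.0 no longer
# changes the rounded result, so the two sums differ.
print(full_precision_sum, half_precision_sum)

# Very small values (for example tiny gradients) round to zero in FP16.
print(to_fp16(1e-8))
```

Long accumulations and very small magnitudes are the kind of cases where staying in higher precision is the conservative choice.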

MKL-DNN Reduced precision inference and RNN API support
A recent version of MKL-DNN introduces two advanced features: fused computation and reduced-precision kernels. These features can significantly speed up inference on CPU for a broad range of deep learning topologies. The MXNet MKL-DNN backend provides optimized implementations for various operators covering a broad range of applications, including image classification, object detection, and natural language processing. Refer to the [MKL-DNN operator documentation](https://github.com/apache/incubator-mxnet/blob/v1.5.x/docs/tutorials/mkldnn/operator_list.md) for more information.

Dynamic Shape (experimental)
MXNet now supports Dynamic Shape in both imperative and symbolic mode. MXNet used to require that operators statically infer the output shapes from the input shapes. However, there exist some operators that don't meet this requirement. Examples are:
* while_loop: its output size depends on the number of iterations in the loop.
* boolean indexing: its output size depends on the value of the input data.
* many operators can be extended to take a shape symbol as input and the shape symbol can determine the output shape of these operators (with this extension, the symbol interface of MXNet can fully support shape).
To support dynamic shape and such operators, we have modified the MXNet backend. MXNet now supports operators with dynamic shape such as [`contrib.while_loop`](https://mxnet.apache.org/api/python/ndarray/contrib.html#mxnet.ndarray.contrib.while_loop), [`contrib.cond`](https://mxnet.apache.org/api/python/ndarray/contrib.html#mxnet.ndarray.contrib.cond), and [`mxnet.ndarray.contrib.boolean_mask`](https://mxnet.apache.org/api/python/ndarray/contrib.html#contrib).
Note: Currently dynamic shape does not work with Gluon deferred initialization.
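
The boolean indexing case above can be sketched in plain Python to show why the output shape cannot be inferred from the input shapes alone. The `boolean_mask` function here is a hypothetical stand-in written for illustration; it is not the MXNet operator and only mimics the idea on nested lists.

```python
def boolean_mask(data, mask):
    """Keep the rows of `data` whose corresponding mask entry is truthy."""
    if len(data) != len(mask):
        raise ValueError("data and mask must have the same length")
    return [row for row, keep in zip(data, mask) if keep]


data = [[1, 2], [3, 4], [5, 6]]   # "shape" (3, 2)

# Both masks have the same shape (3,), yet the outputs differ in shape.
kept_two = boolean_mask(data, [1, 0, 1])
kept_one = boolean_mask(data, [0, 0, 1])

print(len(kept_two), len(kept_one))
```

Because the number of output rows depends on the mask values, a static shape-inference pass that sees only input shapes cannot determine it in advance.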

Large Tensor Support
Currently, MXNet supports a maximal tensor size of around 4 billion (2^32) elements. This is because uint32_t is used as the default data type for tensor size as well as for variable indexing.
This limitation has created many problems when larger tensors are used in a model.
A naive solution to this problem is to replace every uint32_t in the MXNet backend source code with int64_t.
This solution is not viable, however, for three reasons. First, many data structures use uint32_t as the data type for their members.
Unnecessarily replacing these variables with int64_t would increase memory consumption and so create another limitation. Second, MXNet has many submodule dependencies.
Updating the variable types in the MXNet repository is not enough. We also need to make sure that the different libraries, such as MKLDNN and MShadow, support the int64_t integer data type.
Third, many front-end APIs assume an unsigned 32-bit integer interface. Updating the interface only in C/C++ would cause all the language bindings to fail.
Therefore, we need a systematic approach to enhance MXNet to support large tensors.
You can now enable large tensor support by setting the following build flag to 1: `USE_INT64_TENSOR_SIZE = 1`. Note that it is set to 0 by default.
For more details please refer to the [design document](https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support).
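
The 2^32 limit can be made concrete with a short arithmetic sketch. It uses only the Python standard library (`ctypes`) to mimic how an element count behaves when stored in a 32-bit unsigned or a 64-bit signed integer; the helper names are hypothetical and nothing here calls into MXNet.

```python
import ctypes


def num_elements(shape):
    """Total number of elements in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def as_uint32(n):
    """Value of n after being stored in a uint32_t (wraps modulo 2**32)."""
    return ctypes.c_uint32(n).value


def as_int64(n):
    """Value of n after being stored in an int64_t."""
    return ctypes.c_int64(n).value


shape = (70000, 70000)
true_size = num_elements(shape)          # 4,900,000,000 elements

# A 32-bit size silently wraps around; a 64-bit size holds the real count.
print(true_size > 2 ** 32)
print(as_uint32(true_size))
print(as_int64(true_size))
```

A wrapped size like this is why indexing and size computations go wrong once a tensor exceeds the 32-bit range.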

Dependency Update
MXNet has added support for CUDA 10, CUDA 10.1, cuDNN 7.5, NCCL 2.4.2, and numpy 1.16.0.
These updates are available through the PyPI packages and when building from source; refer to the [installation guide](https://mxnet.apache.org/versions/master/install/index.html) for more details.

Gluon Fit API (experimental)
Training a model in Gluon requires users to write the training loop. This is useful because of its imperative nature; however, repeating the same code across multiple models can become tedious.
The training loop can also be overwhelming for users new to deep learning. We have introduced an Estimator and Fit API to simplify the training loop.
Note: this feature is still experimental. For more details, refer to the [design document](https://cwiki.apache.org/confluence/display/MXNET/Gluon+Fit+API+-+Tech+Design).
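
The idea of hiding the training loop behind a single `fit` call can be sketched in plain Python. The `ToyEstimator` class below is a hypothetical illustration and not MXNet's Estimator: it fits a one-parameter linear model with per-sample gradient descent, and its constructor arguments and `history` attribute are assumptions made for this example.

```python
class ToyEstimator:
    """Hypothetical stand-in that wraps a training loop behind fit()."""

    def __init__(self, weight=0.0, learning_rate=0.1):
        self.weight = weight
        self.learning_rate = learning_rate
        self.history = []   # mean squared error recorded per epoch

    def fit(self, train_data, epochs):
        """Run the training loop the caller would otherwise write by hand."""
        for _ in range(epochs):
            epoch_loss = 0.0
            for x, y in train_data:
                error = self.weight * x - y
                epoch_loss += error * error
                # Gradient of the squared error w.r.t. weight is 2 * error * x.
                self.weight -= self.learning_rate * 2.0 * error * x
            self.history.append(epoch_loss / len(train_data))
        return self


# Data generated from y = 2 * x, so the weight should approach 2.0.
train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
est = ToyEstimator(learning_rate=0.05).fit(train_data, epochs=50)
print(est.weight)
```

The caller supplies only the data and the number of epochs, so the loop, the loss bookkeeping, and the update step live in one reusable place instead of being repeated for every model.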

New Operators
* split_v2 (13687)
* Gradient multiplier (contrib) operator (13632)
* Image normalize operator - GPU support, 3D/4D inputs (13802)
* Image ToTensor operator - GPU support, 3D/4D inputs (13837)
* Add Gluon Transformer Crop (14259)
* GELU (14449)
* AdamW operator (Fixing Weight Decay Regularization in Adam) (13728)
* [MXNET-1382] Add the index_array operator (14638)
* add an operator for computing the likelihood of a Hawkes self-exciting process (14683)
* Add numpy linspace (14927)

Feature Improvements

Operators
* make ROIAlign support position-sensitive pooling (13088)
* Add erfinv operator for calculating inverse error function (13811)
* Added optional parameters to BilinearResize2D to do relative scaling (13985)
* MXNET-1295 Adding integer index support to Sequence* family of operators. (13880)
* Export resize and support batch size (14014)
* CUDNN dropout (13896)
* Relaxing type requirements for slice_like op (14097)
* Relaxing type requirements for reshape_like op (14325)
* Parallelize CPU version and add GPU version of boolean_mask op (14090)
* Add NHWC layout support to Pooling (cpu, gpu cuda, gpu cuDNN) (13749)
* Multi-precision AdamW update op (14171)
* [op] add back support for scalar type rescale_grad argument for adamw_update/mp_adamw_update (14221)
* move choose_element_0index to operator (14273)
* Optimize NMS (14290)
* Optimize NMS part 2 (14352)
* add background class in box_nms (14058)
* Use cudnn for dropout by default (14278)
* In-place updates for Nadam, Adadelta, Adamax and SGLD (13960)
* Aggregate SGD (13346)
* Add proper exception message for negative shape in array creation routines (14362)
* Support multi-threading for Custom Operator (14363)
* moveaxis operator now accepts negative indices and sequence of ints as well. (14321)
* Support SyncBatchNorm5D (14542)
* Add nd.power and sym.pow (14606)
* Change RNN OP to stateful (14476)
* Add imresize and copyMakeBorder to mx.image (13357)
* add ctx for rand_ndarray and rand_sparse_ndarray (14966)
* Add cpu implementation for Deformable PSROIPooling (14886)
* Add warning for fp16 inputs with MXNET_SAFE_ACCUMULATION=0 (15046)
* Safe LayerNorm (15002)
* use MXNET_SAFE_ACCUMULATION for softmax accumulator (15037)
* LayerNorm acceleration on GPU (14935)
* Add matrix inversion operator in linalg (14963)
* implementation for equivalence of tf.moments (14842)
* Use env var to enforce safe accumulation in ReduceAxesCompute (14830)
* [MXNet-1211] Factor and "Like" modes in BilinearResize2D operator (13226)
* added extraction/generation of diagonal and triangular matrices to linalg (14501)
* [Mxnet-1397] Support symbolic api for requantize and dequantize (14749)
* [MXNET-978] Support higher order gradient for `log`. (14992)
* Add cpu implementation for Deformable Convolution (14879)

MKLDNN
* Feature/mkldnn static (13628)
* Feature/mkldnn static 2 (13503)
* support mkl log when dtype is fp32 or fp64 (13150)
* Add reshape op supported by MKL-DNN (12980)
* Move the debug output message into MXNET_MKLDNN_DEBUG (13662)
* Integrate MKLDNN Conv1d and support 3d layout (13530)
* Making MKL-DNN default on MXNet master (13681)
* Add mkldnn OP for slice (13730)
* mkldnn s8 conv API change for master (13903)
* [MKLDNN] Enable signed int8 support for convolution. (13697)
* add mkldnn softmax_output (13699)
* MKLDNN based Quantized FullyConnected Operator and its fusion (14128)
* Fix entropy for uint8 (14150)
* Update MKL-DNN to v0.18 release (was: fix the Dense layer issue) (13668)
* [MKL-DNN] Enable s8 support for inner product and 3d input with flatten=false (14466)
* Optimize transpose operator with MKL-DNN (14545)
* [MKLDNN] Remove repeat parts in MKLDNN.md (14995)
* [MKLDNN] Enable more convolution + activation fusion (14819)
* Update MKL-DNN submodule to v0.19 (14783)
* Add mkldnn_version.h to pip package (14899)
* [MKLDNN] add quantized sum (14614)
* [MKLDNN]Refactor requantize to speed up execution (14608)
* [MKLDNN]Add quantized relu (14604)
* Add MKLDNN headers to pip package (14339)
* add symbolic link to mkldnn header files in include (14300)
* disable default MKLDNN for cross compilation (13893)
* Update MKLDNN_README.md (13653)
* [Quantization] Support zero-size tensor input for quantization flow (15031)
* Support 3D input for MKL-DNN softmax operator (14818)
* Add primitive cache for MKL-DNN sum (elemwise_add) operator (14914)
* Fix reshape to add in-place back (14903)
* [int8] Add MobileNetV2_1.0 & ResNet18 Quantization (14823)
* [MKLDNN]Improve quantizeV2 and dequantize latency (14641)
* added mkldnn dependency for plugin compile target (14274)
* Support Quantized Fully Connected by INT8 GEMM (12922)

ONNX
* ONNX export: Instance normalization, Shape (12920)
* ONNX export: Logical operators (12852)
* ONNX import/export: Size (13112)
* ONNX export: Add Flatten before Gemm (13356)
* ONNX import/export: Add missing tests, ONNX export: LogSoftMax (13654)
* ONNX import: Hardmax (13717)
* [MXNET-898] ONNX import/export: Sample_multinomial, ONNX export: GlobalLpPool, LpPool (13500)
* ONNX ops: norm exported and lpnormalization imported (13806)
* [MXNET-880] ONNX export: Random uniform, Random normal, MaxRoiPool (13676)
* ONNX export: Add Crop, Deconvolution and fix the default stride of Pooling to 1 (12399)
* onnx export ops (13821)
* ONNX export: broadcast_to, tile ops (13981)
* ONNX export: Support equal length splits (14121)

TensorRT
* [MXNET-1252][1 of 2] Decouple NNVM to ONNX from NNVM to TensorRT conversion (13659)
* [MXNET-703] Update to TensorRT 5, ONNX IR 3. Fix inference bugs. (13310)
* [MXNET-703] Minor refactor of TensorRT code (13311)
* reformat trt to use subgraph API, add fp16 support (14040)

FP16 Support
* Update mshadow to support batch_dot with fp16. (13716)
* float32 → float16 cast consistency across implementations (13857)
* modifying SyncBN doc for FP16 use case (14041)
* support dot(vector, vector) for fp16 inputs on GPU (14102)
* softmax for fp16 with fp32 accumulator (14098)
* [MXNET-1327] Allow RNN Layers to be initialized to fp16 (14219)
* fp16 safe norm operator (14616)
* NAG Optimizer with multi-precision support (14568)
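
Several of the entries above, along with the `MXNET_SAFE_ACCUMULATION` entries in the Operators list, concern accumulating fp16 values in a wider type. The sketch below shows why that matters, using only the Python standard library to emulate half-precision rounding. It does not call MXNet; the function names are made up for this example, and a Python float stands in for the fp32 accumulator.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def sum_fp16_accumulator(values):
    """Accumulate in fp16: the running total is rounded after every addition."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


def sum_wide_accumulator(values):
    """Accumulate in a wider type and round to fp16 only once at the end."""
    total = 0.0
    for v in values:
        total += to_fp16(v)
    return to_fp16(total)


ones = [1.0] * 4096
naive = sum_fp16_accumulator(ones)  # stalls at 2048.0: 2049 is not representable in fp16
safe = sum_wide_accumulator(ones)   # 4096.0
```

Above 2048, adjacent fp16 values are 2 apart, so adding 1.0 to the fp16 running total rounds back to the same value and the sum stops growing.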

Deep Graph Library (DGL) support
* Add graph_compact operator. (13436)
* Accelerate DGL csr neighbor sampling (13588)

Horovod Integration
* Add extra header file to export for error checking (13795)
* whitelist symbols for using MXNet error handling externally (13812)
* Use CPUPinned context in ImageRecordIOParser2 (13980)
* Add pin_device_id option to Gluon DataLoader (14136)

Dynamic Shape
* [MXNET-1315] Add checks for dynamic-shaped operators in CachedOp (14018)
* [MXNET-1325] Make InferShapeAttr a standalone pass (14193)
* [MXNET-1324] Add NaiveRunGraph to imperative utils (14192)
* [MXNET-1352] Allow dynamic shape in while_loop and if conditionals (14393)

Backend Engine
* Add infer_type_partial (14214)
* Tidy up storage allocation and deallocation (14480)
* Add MXEnginePushAsync and MXEnginePushSync C APIs (14615)
* Enhance subgraph API (14113)
* Enhance PartitionGraph (14277)
* Allow clearing gpu cache (14252)
* Fix warning / static function in header. (14900)
* Simplify creation of NodeEntry instances and use emplace_back (14095)
* Add unpooled gpu memory type (14716)
* [MXNET-1398] Enable zero-copy from numpy to MXNet NDArray (14733)
* Use DEFAULT macro in C APIs (14767)
* Avoid unnecessary vector copies in imperative_utils.cc (14665)
* Support populating errors back to MXNet engine in callback (13922)
* Restore save/load ndarray to 1.4.1 (15073)
* Enable serializing/deserializing ndarrays in np_shape semantics (15090)
* [numpy] Support zero-dim and zero-size tensors in MXNet (14661)
* Rename np_compat to np_shape (15063)
* [MXNET-1330] Bring nnvm::Tuple to mxnet::Tuple (14270)

Large Tensor Support
* Large array support for randint (14242)
* [MXNET-1185] Support large array in several operators (part 1) (13418)
* [MXNET-1401] adding more operators to test support for Large Tensor (14944)
* [MXNET-1410]Adding Large Tensor Support for tensor transpose (15059)

Quantization
* Exclude concat layer for gpu quantization (14060)
* Enhance gpu quantization (14094)
* Register fake grad to subgraph and quantized operators (14275)
* Add int8 data loader (14123)

Profiler
* [MXNET-857] Add initial NVTX profiler implementation (12328)

CoreML
* Add more support for mxnet_to_coreml (14222)

Front End API

Gluon
* Add pixelshuffle layers (13571)
* [MXNET-766] add dynamic_unroll RNN for HybridBlock (11948)
* add pos_weight for SigmoidBinaryCrossEntropyLoss (13612)
* Rewrite dataloader with process pool, improves responsiveness and reliability (13447)
* Complementary gluon DataLoader improvements (13606)
* [Fit-API] Address PR comments (14885)
* [Fit API] update estimator (14849)
* [MXNET-1396][Fit-API] Update default handler logic (14765)
* [Fit API] improve event handlers (14685)
* move to gluon contrib (14635)
* move estimator to contrib (14633)
* [MXNet-1340][Fit API]Update train stats (14494)
* [MXNet-1334][Fit API]base class for estimator and eventhandler (14346)
* [MXNET-1333] Estimator and Fit API (14629)
* Add support for fast variable-length LSTM (14208)
* Add the Gluon Implementation of Deformable Convolution (14810)
* hybridize rnn and add model graph (13244)

Python
* Python BucketingModule bind() with grad_req = 'add' (13984)
* Refine runtime feature discovery python API and add documentation to ... (14130)
* Runtime feature detection (13549)
* Add dtype visualization to plot_network (14066)
* [MXNET-1359] Adds a multiclass-MCC metric derived from Pearson (14461)
* support long for mx.random.seed (14314)
* Optimization of metric evaluation (13471)
* [MXNET-1403] Disable numpy's writability of NDArray once it is zero-copied to MXNet (14948)
* Refactor ImageRecordIter (14824)

Language Bindings

Scala
* [MXNET-1260] Float64 DType computation support in Scala/Java (13678)
* [MXNET-1000] get Ndarray real value and form it from a NDArray (12690)
* Now passing DType of Label downstream to Label's DataDesc object (14038)
* Scala interpreter instructions (14169)
* Add default parameters for Scala NDArray.arange (13816)
* [MXNET-1287] Up scala comp (14667)
* [MXNET-1385] Improved Scala Init and Macros warning messages (14656)
* Remove all usages of makefile for scala (14013)
* Update scala-package gitignore configuration. (13962)
* [MXNET-1177]Adding Scala Demo to be run as a part of Nightly CI (13823)
* [MXNET-1287] Miscellaneous Scala warning fixes (14658)
* Fix jar path and add missing ones for spark jobs (14020)
* [MXNET-1155] Add scala packageTest utility (13046)
* [MXNET-1195] Cleanup Scala README file (13582)
* Add scalaclean to make clean (14322)
* Add maven wrapper to scala project. (13702)
* Add new Maven build for Scala package (13819)
* [MXNET-1287] Feat dep (14668)
* add Apache header on all XML (14138)
* update the version name (14076)
* change to compile time (13835)
* [MXNET-918] Random module (13039)
* Avoid secondary deployment of package to local (14647)

Java
* [MXNET-1180] Java Image API (13807)
* [MXNET-1285] Draw bounding box with Scala/Java Image API (14474)
* Add BERT QA Scala/Java example (14592)
* [MXNET-1232] fix demo and add Eclipse support (13979)
* [MXNET-1331] Removal of non-MXNET classes from JAR (14303)
* Java install info update (13912)
* [MXNET-1226] add Docs update for MXNet Java (14395)
* [MXNET-1383] Java new use of ParamObject (14645)
* MXNET-1302 Exclude commons-codec and commons-io from assembled JAR (14000)

C++
* print error message for mxnet::cpp::Operator::Invoke when failed (14318)
* build docs with CPP package (13983)
* Update inception_inference.cpp (14674)
* Optimize C++ API (13496)

Clojure
* [Clojure] - Add Spec Validations to the Optimizer namespace (13499)
* [Clojure] Add Spec Validations for the Random namespace (13523)
* [Clojure] Correct the versions in the README so they correspond to the latest maven.org release (13507)
* Port of scala infer package to clojure (13595)
* Clojure example for fixed label-width captcha recognition (13769)
* Update project.clj file to use the snapshots repo to be able to pull (13935)
* [Clojure] Add resource scope to clojure package (13993)
* [clojure-package] improve docstrings in image.clj (14307)
* [Clojure] Helper function for n-dim vector to ndarray (14305)
* [clojure]: add comp-metric based on CompositeEvalMetric (14553)
* [Clojure] enhance draw bounding box (14567)
* [Clojure] Add methods based on NDArrayAPI/SymbolAPI (14195)
* [Clojure] Clojure BERT QA example (14691)
* [clojure-package][wip] add ->nd-vec function in ndarray.clj (14308)
* Update version to v1.5.0 including clojure package (13566)
* [clojure][generator] ndarray/symbol api random merged (14800)
* upgrade codox to work with lein 2.9.0 (14133)
* [clojure] fix: image test does not rely on s3 to run (15122)

Julia
* Julia v0.7/1.0 support and drop v0.6 support (12845)
* Julia: split ndarray.jl into several snippets (14001)
* Julia: split symbolic-node.jl into several snippets (14024)
* Julia: rename mx.clip to clamp for NDArray (14027)
* Julia: add binding for runtime feature detection (13992)

Perl
* Two more gluon loss classes. (14194)

R
* add NAG optimizer to r api (14023)
* R-Package Makefile (14068)

Performance Improvements

* Less cudaGet/SetDevice calls in Gluon execution (13764)
* Improve bulking in Gluon (13890)
* Increase performance of BulkAppend and BulkFlush (14067)
* Performance improvement in ToTensor GPU Kernel (14099)
* Performance improvement in Normalize GPU Kernel (14139)
* Bulked op segments to allow Variable nodes (14200)
* Performance improving for MKL-DNN Quantized FullyConnected (14528)
* speedup SequenceMask on GPU (14445)
* Dual stream cudnn Convolution backward() with MXNET_GPU_WORKER_NSTREAMS=2. (14006)
* Speedup `_contrib_index_copy` (14359)
* use mkl sparse matrix to improve performance (14492)
* Re-enable static cached_op optimization (14931)
* Speed up SequenceReverse (14627)
* Improve FC perf when no_bias=False (15033)
* Improve cached_op performance for static mode (14785)

Example and Tutorials

* [MXNET-949] Module API to Gluon API tutorial (12542)
* Support SSD f32/int8 evaluation on COCO dataset (14646)
* [MXNET-1209] Tutorial transpose reshape (13208)
* [Clojure] Add Fine Tuning Sentence Pair Classification BERT Example (14769)
* example/ssd/evaluate/eval_metric.py (14561)
* Add examples of running MXNet with Horovod (14286)
* Added link to landing page for Java examples (14481)
* Update lip reading example (13647)
* [MXNET-1121] Example to demonstrate the inference workflow using RNN (13680)
* [MXNET-1301] Remove the unnecessary WaitAll statements from inception_inference example (13972)
* Modifying clojure CNN text classification example (13865)
* [MXNET-1210 ] Gluon Audio - Example (13325)
* add examples and fix the dependency problem (13620)
* add quantization example to readme (14186)
* Add an inference script providing both accuracy and benchmark result for original wide_n_deep example (13895)
* Update autoencoder example (12933)
* 13813 examples with opencv4/origami (13813)
* [MXNET-1083] Add the example to demonstrate the inference workflow using C++ API (13294)
* Add tutorial on how to use build from source jar (14197)
* Gluon end to end tutorial (13411)
* Update MXNetTutorialTemplate.ipynb (13568)
* Simplifications and some fun stuff for the MNIST Gluon tutorial (13094)
* Clarify dependency on OpenCV in CNN Visualization tutorial. (13495)
* Update row_sparse tutorial (13414)
* add clojure tutorials to index (14814)
* Update lstm_crf.py (14865)

Website

* Version switching user experience improvements (13921)
* fix toctree Sphinx errors (13489)
* fix link (15036)
* fix website build (14148)
* Fixed mailing list addresses (13766)
* website publish updates (14015)
* use relative links; update links (13741)
* update social media section (13705)
* [MXNET] Updated http://data.dmlc.ml/ links to http://data.mxnet.io/ (15065)

Documentation
* [MXNET-1402] MXNet docs change for 1.4.1 release (14949)
* Add API documentation for upsampling operator with examples (14919)
* Make docblocks for Gluon BatchNorm and SyncBatchNorm consistent with the code (14840)
* [DOC] Update ubuntu install instructions from source (14534)
* [Clojure] Better api docstrings by replacing newlines (14752)
* Fix documentation for bilinear upsampling and add unit test (14035)
* Updated docs for R-package installation (14269)
* [docstring] improve docstring and indentation in `module.clj` (14705)
* The folder python-howto was removed in an earlier commit. The reference to that folder was not removed. Making a PR to remove the reference to this folder to keep documents consistent (14573)
* Updated documentation about nightly tests (14493)
* [Doc] Start the tutorials for MKL-DNN backend (14202)
* [DOC] fix sym.arange doc (14237)
* fix render issue in NDArray linalg docs (14258)
* [clojure-package] fix docstrings in `normal.clj` (14295)
* [DOC] Refine documentation of runtime feature detection (14238)
* [MXNET-1178] updating scala docs (14070)
* Fix website scala doc (14065)
* Return value docs for nd.random.* and sym.random.* (13994)
* Fixing the doc for symbolic version of rand_zipfian (13978)
* fix doc of take operator (13947)
* beta doc fixes (13860)
* [MXNET-1255] update hybridize documentation (13597)
* Update Adam optimizer documentation (13754)
* local docs build feature (13682)
* gluon docfix (13631)
* Added javadocs and improved example instructions (13711)
* [MXNET-1164] Generate the document for cpp-package using Doxygen (12977)
* Fix warning in waitall doc (13618)
* Updated docs for randint operator (13541)
* Update java setup docs for 1.4.0 (13536)
* clarify ops faq regarding docs strings (13492)
* [MXNET-1158] JVM Memory Management Documentation (13105)
* Fixing a 404 in the ubuntu setup doc (13542)
* Fix READMEs for examples (14179)
* [Doc] Add MKL-DNN operator list (14891)
* Fixed some typos in AvgPooling Docs (14324)
* doc fix (13465)
* Change Straight Dope to Dive into Deep Learning (14465)
* [DEV] update code owner (14862)
* Add notes about debug with libstdc++ symbols (13533)
* Mention additional language bindings and add links (14798)
* add contributors from intel (14455)
* what's new - add 1.4.0 release (14435)
* added note about cuda9.2 requirement (14140)
* Remove unnecessary "also" in README.md (14543)
* Updated news.md with the latest mkldnn submodule version (14298)
* add new cloud providers to install page (14039)
* Update NOTICE (14043)
* Update README.md (13973)
* Update profiler doc (13901)
* Add CODEOWNERS for Julia package (13872)
* update code owner (13737)
* Update git clone location to apache github (13706)
* NEWS.md backport from v1.4.x to master (13693)
* Update CODEOWNERS, add Pedro Larroy. (13579)
* [MXNET-1225] Always use config.mk in make install instructions (13364)
* Docs & website sphinx errors squished 🌦 (13488)
* add Qing's Key to master (14180)
* add KEY for zachgk (14965)
* corrected a spelling (14247)

1.4.1

Not secure

Apache MXNet (incubating) 1.4.1 is a maintenance release incorporating important bug fixes and performance improvements. All users of Apache MXNet (incubating) 1.4.0 are advised to upgrade. You can install Apache MXNet (incubating) 1.4.1 from the usual place. Please review these release notes to learn about the bug fixes.

Bug-fixes
* Java bug-fix cherry pick (14834)
* Use DEFAULT macro in C APIs (14767) (14789)
* Set idx2name for Optimizer object (14703) (14772)
* Add pin_device_id option to Gluon DataLoader (14136) (14771)
* Tidy up storage allocation and deallocation (14480) (14768)
* Add MXEnginePushAsync and MXEnginePushSync C APIs (14615) (14770)
* Less cudaGet/SetDevice calls in Gluon execution (13764)
* Fix nightly build of 1.4.x (14556)
* Memory fixes. Resolves 10867, and resolves 14080 (14372) (14586)
* Fixes for data links (14526)
* Backport of Windows CI Fixes (14420)
