
Numba

3.6

`NEP 29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_.)

Highlights of core feature changes include:

* Full support for Python 3.8 (Siu Kwan Lam)
* Opt-in bounds checking (Aaron Meurer)
* Support for ``map``, ``filter`` and ``reduce`` (Stuart Archibald)

Intel also kindly sponsored research and development that led to some exciting
new features:

* Initial support for basic ``try``/``except`` use (Siu Kwan Lam)
* The ability to pass functions created from closures/lambdas as arguments
(Stuart Archibald)
* ``sorted`` and ``list.sort()`` now accept the ``key`` argument (Stuart
Archibald and Siu Kwan Lam)
* A new compiler pass triggered through the use of the function
``numba.literal_unroll`` which permits iteration over heterogeneous tuples
and constant lists of constants. (Stuart Archibald)
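
As a rough illustration of two of the features above, the sketch below uses
``numba.literal_unroll`` to loop over a heterogeneous tuple and a basic
``try``/``except`` block inside a jitted function. The function names and
values are made up for illustration. The ``ImportError`` fallback is an
assumption added so the sketch also runs as plain Python where Numba is not
installed; it is not part of Numba.

```python
try:
    from numba import njit, literal_unroll
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func

    def literal_unroll(container):
        return container


@njit
def sum_mixed():
    # The tuple mixes int and float elements, so the loop body has to be
    # typed separately for each element type; literal_unroll requests that.
    items = (1, 2.5, 3)
    total = 0.0
    for item in literal_unroll(items):
        total += item
    return total


@njit
def checked_reciprocal(x):
    result = 0.0
    try:
        if x == 0.0:
            raise ValueError("cannot divide by zero")
        result = 1.0 / x
    except Exception:
        # The basic support is understood to cover bare ``except`` and
        # ``except Exception`` only.
        result = -1.0
    return result
```

Calling ``sum_mixed()`` adds the three tuple elements, and
``checked_reciprocal(0.0)`` takes the ``except`` branch instead of propagating
the error.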

Enhancements from user contributed PRs (with thanks!):

* Ankit Mahato added a reference to a new talk on Numba at PyCon India 2019 in
4862
* Brian Wignall kindly fixed some spelling mistakes and typos in 4909
* Denis Smirnov wrote numerous methods to considerably enhance string support
including:

* ``str.rindex()`` in 4861
* ``str.isprintable()`` in 4836
* ``str.index()`` in 4860
* ``start/end`` parameters for ``str.find()`` in 4866
* ``str.isspace()`` in 4835
* ``str.isidentifier()`` in 4837
* ``str.rpartition()`` in 4841
* ``str.lower()`` and ``str.islower()`` in 4651

* Elena Totmenina implemented ``str.isalnum()``, ``str.isalpha()`` and
``str.isascii()`` in 4839, 4840 and 4847 respectively.
* Eric Larson fixed a bug in literal comparison in 4710
* Ethan Pronovost updated the ``np.arange`` implementation in 4770 to allow
the use of the ``dtype`` key word argument and also added ``bool``
implementations for several types in 4715.
* Graham Markall fixed some issues with the CUDA target, namely:

* 4931: Added physical limits for CC 7.0 / 7.5 to CUDA autotune
* 4934: Fixed bugs in TestCudaWarpOperations
* 4938: Improved errors / warnings for the CUDA vectorize decorator

* Guilherme Leobas fixed a typo in the ``urem`` implementation in 4667
* Isaac Virshup contributed a number of patches that fixed bugs, added support
for more NumPy functions and enhanced Python feature support. These
contributions included:

* 4729: Allow array construction with mixed type shape tuples
* 4904: Implementing ``np.lcm``
* 4780: Implement np.gcd and math.gcd
* 4779: Make slice constructor more similar to python.
* 4707: Added support for slice.indices
* 4578: Clarify numba ufunc supported features

* James Bourbeau fixed some issues with tooling: 4794 added ``setuptools`` as a
dependency and 4501 added pre-commit hooks for ``flake8`` compliance.
* Leo Fang made ``numba.dummyarray.Array`` iterable in 4629
* Marc Garcia fixed the ``numba.jit`` parameter name signature_or_function in
4703
* Marcelo Duarte Trevisani patched the llvmlite requirement to ``>=0.30.0`` in
4725
* Matt Cooper fixed a long standing CI problem in 4737 by removing maxParallel
from Azure Pipelines.
* Matti Picus fixed an issue with ``collections.abc`` in 4734
* Rob Ennis patched a bug in ``np.interp`` ``float32`` handling in 4911
* VDimir fixed a bug in array transposition layouts in 4777 and re-enabled and
fixed some idle tests in 4776.
* Vyacheslav Smirnov enabled support for ``str.istitle()`` in 4645
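
A minimal sketch of how a few of the string methods listed above might be
used inside a jitted function. The function name and input are illustrative
only, and the ``ImportError`` fallback is an assumption added so the sketch
still runs as plain Python without Numba.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def summarize(text):
    # Positions of the first and last "a", plus two simple string queries.
    first = text.index("a")
    last = text.rindex("a")
    return first, last, text.lower(), text.isspace()
```

For example, ``summarize("Banana")`` reports the first and last position of
``"a"`` together with the lower-cased string and the ``isspace()`` result.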

General Enhancements:

* PR 4432: Bounds checking
* PR 4501: Add pre-commit hooks
* PR 4536: Handle kw args in inliner when callee is a function
* PR 4599: Permits closures to become functions, enables map(), filter()
* PR 4611: Implement method title() for unicode based on Cpython
* PR 4645: Enable support for istitle() method for unicode string
* PR 4651: Implement str.lower() and str.islower()
* PR 4652: Implement str.rfind()
* PR 4695: Refactor `overload*` and support `jit_options` and `inline`
* PR 4707: Added support for slice.indices
* PR 4715: Add `bool` overload for several types
* PR 4729: Allow array construction with mixed type shape tuples
* PR 4755: Python3.8 support
* PR 4756: Add parfor support for ndarray.fill.
* PR 4768: Update typeconv error message to ask for sys.executable.
* PR 4770: Update `np.arange` implementation with `overload`
* PR 4779: Make slice constructor more similar to python.
* PR 4780: Implement np.gcd and math.gcd
* PR 4794: Add setuptools as a dependency
* PR 4802: put git hash into build string
* PR 4803: Better compiler error messages for improperly used reduction
variables.
* PR 4817: Typed list implement and expose allocation
* PR 4818: Typed list faster copy
* PR 4835: Implement str.isspace() based on CPython
* PR 4836: Implement str.isprintable() based on CPython
* PR 4837: Implement str.isidentifier() based on CPython
* PR 4839: Implement str.isalnum() based on CPython
* PR 4840: Implement str.isalpha() based on CPython
* PR 4841: Implement str.rpartition() based on CPython
* PR 4847: Implement str.isascii() based on CPython
* PR 4851: Add graphviz output for FunctionIR
* PR 4854: Python3.8 looplifting
* PR 4858: Implement str.expandtabs() based on CPython
* PR 4860: Implement str.index() based on CPython
* PR 4861: Implement str.rindex() based on CPython
* PR 4866: Support params start/end for str.find()
* PR 4874: Bump to llvmlite 0.31
* PR 4896: Specialise arange dtype on arch + python version.
* PR 4902: basic support for try except
* PR 4904: Implement np.lcm
* PR 4910: loop canonicalisation and type aware tuple unroller/loop body
versioning passes
* PR 4961: Update hash(tuple) for Python 3.8.
* PR 4977: Implement sort/sorted with key.
* PR 4987: Add `is_internal` property to all Type classes.

Fixes:

* PR 4090: Update to LLVM8 memset/memcpy intrinsic
* PR 4582: Convert sub to add and div to mul when doing the reduction across
the per-thread reduction array.
* PR 4648: Handle 0 correctly as slice parameter.
* PR 4660: Remove multiply defined variables from all blocks' equivalence sets.
* PR 4672: Fix pickling of dufunc
* PR 4710: BUG: Comparison for literal
* PR 4718: Change get_call_table to support intermediate Vars.
* PR 4725: Requires  llvmlite >=0.30.0
* PR 4734: prefer to import from collections.abc
* PR 4736: fix flake8 errors
* PR 4776: Fix and enable idle tests from test_array_manipulation
* PR 4777: Fix transpose output array layout
* PR 4782: Fix issue with SVML (and knock-on function resolution effects).
* PR 4785: Treat 0d arrays like scalars.
* PR 4787: fix missing incref on flags
* PR 4789: fix typos in numba/targets/base.py
* PR 4791: fix typos
* PR 4811: fix spelling in now-failing tests
* PR 4852: windowing test should check equality only up to double precision
errors
* PR 4881: fix refining list by using extend on an iterator
* PR 4882: Fix return type in arange and zero step size handling.
* PR 4885: suppress spurious RuntimeWarning about ufunc sizes
* PR 4891: skip the xfail test for now.  Py3.8 CFG refactor seems to have
changed the test case
* PR 4892: regex needs to accept singular form of "argument"
* PR 4901: fix typed list equals
* PR 4909: Fix some spelling typos
* PR 4911: np.interp bugfix for float32 handling
* PR 4920: fix creating list with JIT disabled
* PR 4921: fix creating dict with JIT disabled
* PR 4935: Better handling of prange with multiple reductions on the same
variable.
* PR 4946: Improve the error message for `raise <string>`.
* PR 4955: Move overload of literal_unroll to avoid circular dependency that
breaks Python 2.7
* PR 4962: Fix test error on windows
* PR 4973: Fixes a bug in the relabelling logic in literal_unroll.
* PR 4978: Fix overload_method problem with stararg
* PR 4981: Add ind_to_const to enable fewer equivalence classes.
* PR 4991: Continuation of 4588 (Let dead code removal handle removing more of
the unneeded code after prange conversion to parfor)
* PR 4994: Remove xfail for test which has since had underlying issue fixed.
* PR 5018: Fix 5011.
* PR 5019: skip pycc test on Python 3.8 + macOS because of distutils issue

CUDA Enhancements/Fixes:

* PR 4629: Make numba.dummyarray.Array iterable
* PR 4675: Bump cuda array interface to version 2
* PR 4741: Update choosing the "CUDA_PATH" for windows
* PR 4838: Permit ravel('A') for contig device arrays in CUDA target
* PR 4931: Add physical limits for CC 7.0 / 7.5 to autotune
* PR 4934: Fix fails in TestCudaWarpOperations
* PR 4938: Improve errors / warnings for cuda vectorize decorator

Documentation Updates:

* PR 4418: Directed graph task roadmap
* PR 4578: Clarify numba ufunc supported features
* PR 4655: fix sphinx build warning
* PR 4667: Fix typo on urem implementation
* PR 4669: Add link to ParallelAccelerator paper.
* PR 4703: Fix numba.jit parameter name signature_or_function
* PR 4862: Addition of PyCon India 2019 talk on Numba
* PR 4947: Document jitclass with numba.typed use.
* PR 4958: Add docs for `try..except`
* PR 4993: Update deprecations for 0.47

CI Updates:

* PR 4737: remove maxParallel from Azure Pipelines
* PR 4767: pin to 2.7.16 for py27 on osx
* PR 4781: WIP/runtest cf pytest

Authors:

* Aaron Meurer
* Ankit Mahato
* Brian Wignall
* Denis Smirnov
* Ehsan Totoni (core dev)
* Elena Totmenina
* Eric Larson
* Ethan Pronovost
* Giovanni Cavallin
* Graham Markall
* Guilherme Leobas
* Isaac Virshup
* James Bourbeau
* Leo Fang
* Marc Garcia
* Marcelo Duarte Trevisani
* Matt Cooper
* Matti Picus
* Rob Ennis
* Rujal Desai
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Todd A. Anderson (core dev)
* VDimir
* Valentin Haenel (core dev)
* Vyacheslav Smirnov

3.5.1

Updated to 3.5.1 with the same ELF relocation patched for v0.2.2.

3.5

The binaries from the numba binstar channel use a patched LLVM3.5 to fix
an LLVM ELF relocation bug that is caused by the use of 32-bit relative offsets
in 64-bit binaries.  The problem appears to occur more often on hardened
kernels, such as those in CentOS.  The patched source code is available at:
https://github.com/numba/llvm-mirror/releases/tag/3.5p1

1.17

* PR 4325: accept scalar/0d-arrays
* PR 4338: Fix 4299. Parfors reduction vars not deleted.
* PR 4350: Use process level locks for fork() only.
* PR 4354: Try to fix 4352.
* PR 4357: Fix np1.17 isnan, isinf, isfinite ufuncs
* PR 4363: Fix np.interp for np1.17 nan handling
* PR 4371: Fix nump1.17 random function non-aliasing

Contributors:

* Siu Kwan Lam (core dev)
* Stuart Archibald (core dev)
* Valentin Haenel (core dev)

0.50.0

--------------

In development

0.49.1

----------------------------

This is a bugfix release for 0.49.0. It fixes some residual issues with SSA
form, a critical bug in the branch pruning logic and a number of other smaller
issues:

* PR 5587: Fixed 5586 Threading Implementation Typos
* PR 5592: Fixes 5583 Remove references to cffi_support from docs and examples
* PR 5614: Fix invalid type in resolve for comparison expr in parfors.
* PR 5624: Fix erroneous rewrite of predicate to bit const on prune.
* PR 5627: Fixes 5623, SSA local def scan based on invalid equality
assumption.
* PR 5629: Fixes naming error in array_exprs
* PR 5630: Fix 5570. Incorrect race variable detection due to SSA naming.
* PR 5638: Make literal_unroll function work as a freevar.
* PR 5648: Unset the memory manager after EMM Plugin tests
* PR 5651: Fix some SSA issues
* PR 5652: Pin to sphinx=2.4.4 to avoid problem with C declaration
* PR 5658: Fix unifying undefined first class function types issue
* PR 5669: Update example in 5m guide WRT SSA type stability.
* PR 5676: Restore ``numba.types`` as public API

Authors:

* Graham Markall
* Juan Manuel Cruz Martinez
* Pearu Peterson
* Sean Law
* Stuart Archibald (core dev)
* Siu Kwan Lam (core dev)

0.49.0

-----------------------------

This release is very large in terms of code changes. Large scale removal of
unsupported Python and NumPy versions has taken place along with a significant
amount of refactoring to simplify the Numba code base to make it easier for
contributors. Numba's intermediate representation has also undergone some
important changes to solve a number of long standing issues. In addition some
new features have been added and a large number of bugs have been fixed!

IMPORTANT: In this release Numba's internals have moved about a lot. A backwards
compatibility "shim" is provided for this release so as to not immediately break
projects using Numba's internals. If a module is imported from a moved location
the shim will issue a deprecation warning and suggest how to update the import
statement for the new location. The shim will be removed in 0.50.0!
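
The behaviour described above can be pictured with a small, purely
hypothetical sketch. It is not Numba's shim: the mapping, the
``import_moved`` helper and the ``oldpkg.jsonlib`` name are invented for
illustration, and the standard library ``json`` module stands in for a
relocated module.

```python
import importlib
import warnings

# Hypothetical table of old module paths and their new locations.
MOVED = {"oldpkg.jsonlib": "json"}


def import_moved(old_name):
    """Warn about a moved module, then return it from its new location."""
    new_name = MOVED[old_name]
    warnings.warn(
        f"'{old_name}' has moved; update the import to '{new_name}'",
        DeprecationWarning,
        stacklevel=2,
    )
    return importlib.import_module(new_name)
```

Code importing through the old path keeps working but sees a
``DeprecationWarning`` naming the replacement, which is the cue to update the
import before the shim is removed.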

Highlights of core feature changes include:

* Removal of all Python 2 related code and also updating the minimum supported
Python version to 3.6, the minimum supported NumPy version to 1.15 and the
minimum supported SciPy version to 1.0. (Stuart Archibald).
* Refactoring of the Numba code base. The code is now organised into submodules
by functionality. This cleans up Numba's top level namespace.
(Stuart Archibald).
* Introduction of an ``ir.Del`` free static single assignment form for Numba's
intermediate representation (Siu Kwan Lam and Stuart Archibald).
* An OpenMP-like thread masking API has been added for use with code using the
parallel CPU backends (Aaron Meurer and Stuart Archibald).
* For the CUDA target, all kernel launches now require a configuration, which
prevents accidental launches of kernels with the old default of a single
thread in a single block. The hard-coded autotuner has also been removed; such
tuning is deferred to CUDA API calls that provide the same functionality
(Graham Markall).
* The CUDA target also gained an External Memory Management plugin interface to
allow Numba to use another CUDA-aware library for all memory allocations and
deallocations (Graham Markall).
* The Numba Typed List container gained support for construction from iterables
(Valentin Haenel).
* Experimental support was added for first-class function types
(Pearu Peterson).
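
As a small sketch of the typed list highlight, the snippet below builds a
``numba.typed.List`` directly from an iterable. The variable name and values
are illustrative, and the plain ``list`` fallback is an assumption added so
the snippet also runs where Numba is not installed.

```python
try:
    from numba.typed import List
except ImportError:
    # Assumption: use a plain list as a stand-in when Numba is unavailable.
    List = list

# Construct the container from an existing iterable instead of appending
# the items one at a time.
squares = List([x * x for x in range(4)])
squares.append(16)
```

The element type is inferred from the items, so the iterable should contain
values of a single type.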

Enhancements from user contributed PRs (with thanks!):

* Aaron Meurer added support for thread masking at runtime in 4615.
* Andreas Sodeur fixed a long standing bug that was preventing ``cProfile`` from
working with Numba JIT compiled functions in 4476.
* Arik Funke fixed error messages in ``test_array_reductions`` (5278), fixed
an issue with test discovery (5239), made it so the documentation would build
again on windows (5453) and fixed a nested list problem in the docs in 5489.
* Antonio Russo fixed a SyntaxWarning in 5252.
* Eric Wieser added support for inferring the types of object arrays (5348) and
iterating over 2D arrays (5115), and also fixed some compiler warnings due to
missing (void) in 5222. He also helped improve the "shim" and associated
warnings in 5485, 5488, 5498 and partly 5532.
* Ethan Pronovost fixed a problem with the shim erroneously warning for jitclass
use in 5454 and also prevented illegal return values in jitclass ``__init__``
in 5505.
* Gabriel Majeri added SciPy 2019 talks to the docs in 5106.
* Graham Markall changed the Numba HTML documentation theme to resolve a number
of long standing issues in 5346. Also contributed were a large number of CUDA
enhancements and fixes, namely:

* 5519: CUDA: Silence the test suite - Fix 4809, remove autojit, delete
prints
* 5443: Fix 5196: Docs: assert in CUDA only enabled for debug
* 5436: Fix 5408: test_set_registers_57 fails on Maxwell
* 5423: Fix 5421: Add notes on printing in CUDA kernels
* 5400: Fix 4954, and some other small CUDA testsuite fixes
* 5328: NBEP 7: External Memory Management Plugin Interface
* 5144: Fix 4875: Make 2655 test with debug expect to pass
* 5323: Document lifetime semantics of CUDA Array Interface
* 5061: Prevent kernel launch with no configuration, remove autotuner
* 5099: Fix 5073: Slices of dynamic shared memory all alias
* 5136: CUDA: Enable asynchronous operations on the default stream
* 5085: Support other itemsizes with view
* 5059: Docs: Explain how to use Memcheck with Numba, fixups in CUDA
documentation
* 4957: Add notes on overwriting gufunc inputs to docs

* Greg Jennings fixed an issue with ``np.random.choice`` not acknowledging the
RNG seed correctly in 3897/5310.
* Guilherme Leobas added support for ``np.isnat`` in 5293.
* Henry Schreiner made the llvmlite requirements more explicit in
requirements.txt in 5150.
* Ivan Butygin helped fix an issue with parfors sequential lowering in
5114/5250.
* Jacques Gaudin fixed a bug for Python >= 3.8 in ``numba -s`` in 5548.
* Jim Pivarski added some hints for debugging entry points in 5280.
* John Kirkham added ``numpy.dtype`` coercion for the ``dtype`` argument to CUDA
device arrays in 5253.
* Leo Fang added a list of libraries that support ``__cuda_array_interface__``
in 5104.
* Lucio Fernandez-Arjona added ``getitem`` for the NumPy record type when the
index is a ``StringLiteral`` type in 5182 and improved the documentation
rendering via additions to the TOC and removal of numbering in 5450.
* Mads R. B. Kristensen fixed an issue with ``__cuda_array_interface__`` not
requiring the context in 5189.
* Marcin Tolysz added support for nested modules in AOT compilation in 5174.
* Mike Williams fixed some issues with NumPy records and ``getitem`` in the CUDA
simulator in 5343.
* Pearu Peterson added experimental support for first-class function types in
5287 (and fixes in 5459, 5473/5429, and 5557).
* Ravi Teja Gutta added support for ``np.flip`` in 4376/5313.
* Rohit Sanjay fixed an issue with type refinement for unicode input supplied to
typed-list ``extend()`` (5295) and fixed unicode ``.strip()`` to strip all
whitespace characters in 5213.
* Vladimir Lukyanov fixed an awkward bug in ``typed.dict`` in 5361, added a fix
to ensure the LLVM and assembly dumps are highlighted correctly in 5357 and
implemented a Numba IR Lexer and added highlighting to Numba IR dumps in
5333.
* hdf fixed an issue with the ``boundscheck`` flag in the CUDA jit target in
5257.

General Enhancements:

* PR 4615: Allow masking threads out at runtime
* PR 4798: Add branch pruning based on raw predicates.
* PR 5115: Add support for iterating over 2D arrays
* PR 5117: Implement ord()/chr()
* PR 5122: Remove Python 2.
* PR 5127: Calling convention adaptor for boxer/unboxer to call jitcode
* PR 5151: implement None-typed typed-list
* PR 5174: Nested modules https://github.com/numba/numba/issues/4739
* PR 5182: Add getitem for Record type when index is StringLiteral
* PR 5185: extract code-gen utilities from closures
* PR 5197: Refactor Numba, part I
* PR 5210: Remove more unsupported Python versions from build tooling.
* PR 5212: Adds support for viewing the CFG of the ELF disassembly.
* PR 5227: Immutable typed-list
* PR 5231: Added support for ``np.asarray`` to be used with
``numba.typed.List``
* PR 5235: Added property ``dtype`` to ``numba.typed.List``
* PR 5272: Refactor parfor: split up ParforPass
* PR 5281: Make IR ir.Del free until legalized.
* PR 5287: First-class function type
* PR 5293: np.isnat
* PR 5294: Create typed-list from iterable
* PR 5295: refine typed-list on unicode input to extend
* PR 5296: Refactor parfor: better exception from passes
* PR 5308: Provide ``numba.extending.is_jitted``
* PR 5320: refactor array_analysis
* PR 5325: Let literal_unroll accept types.Named*Tuple
* PR 5330: refactor common operation in parfor lowering into a new util
* PR 5333: Add: highlight Numba IR dump
* PR 5342: Support for tuples passed to parfors.
* PR 5348: Add support for inferring the types of object arrays
* PR 5351: SSA again
* PR 5352: Add shim to accommodate refactoring.
* PR 5356: implement allocated parameter in njit
* PR 5369: Make test ordering more consistent across feature availability
* PR 5428: Wip/deprecate jitclass location
* PR 5441: Additional changes to first class function
* PR 5455: Move to llvmlite 0.32.*
* PR 5457: implement repr for untyped lists

Fixes:

* PR 4476: Another attempt at fixing frame injection in the dispatcher tracing
path
* PR 4942: Prevent some parfor aliasing.  Rename copied function var to prevent
recursive type locking.
* PR 5092: Fix 5087
* PR 5150: More explicit llvmlite requirement in requirements.txt
* PR 5172: fix version spec for llvmlite
* PR 5176: Normalize kws going into fold_arguments.
* PR 5183: pass 'inline' explicitly to overload
* PR 5193: Fix CI failure due to missing files when installed
* PR 5213: Fix ``.strip()`` to strip all whitespace characters
* PR 5216: Fix namedtuple mistreated by dispatcher as simple tuple
* PR 5222: Fix compiler warnings due to missing (void)
* PR 5232: Fixes a bad import that breaks master
* PR 5239: fix test discovery for unittest
* PR 5247: Continue PR 5126
* PR 5250: Part fix/5098
* PR 5252: Trivially fix SyntaxWarning
* PR 5276: Add prange variant to has_no_side_effect.
* PR 5278: fix error messages in test_array_reductions
* PR 5310: PR 3897 continued
* PR 5313: Continues PR 4376
* PR 5318: Remove AUTHORS file reference from MANIFEST.in
* PR 5327: Add warning if FNV hashing is found as the default for CPython.
* PR 5338: Remove refcount pruning pass
* PR 5345: Disable test failing due to removed pass.
* PR 5357: Small fix to have llvm and asm highlighted properly
* PR 5361: 5081 typed.dict
* PR 5431: Add tolerance to numba extension module entrypoints.
* PR 5432: Fix code causing compiler warnings.
* PR 5445: Remove undefined variable
* PR 5454: Don't warn for numba.experimental.jitclass
* PR 5459: Fixes issue 5448
* PR 5480: Fix for 5477, literal_unroll KeyError searching for getitems
* PR 5485: Show the offending module in "no direct replacement" error message
* PR 5488: Add missing ``numba.config`` shim
* PR 5495: Fix missing null initializer for variable after phi strip
* PR 5498: Make the shim deprecation warnings work on python 3.6 too
* PR 5505: Better error message if __init__ returns value
* PR 5527: Attempt to fix 5518
* PR 5529: PR 5473 continued
* PR 5532: Make ``numba.<mod>`` available without an import
* PR 5542: Fixes RC2 module shim bug
* PR 5548: Fix 5537 Removed reference to ``platform.linux_distribution``
* PR 5555: Fix 5515 by reverting changes to ArrayAnalysis
* PR 5557: First-class function call cannot use keyword arguments
* PR 5569: Fix RewriteConstGetitems not registering calltype for new expr
* PR 5571: Pin down llvmlite requirement

CUDA Enhancements/Fixes:

* PR 5061: Prevent kernel launch with no configuration, remove autotuner
* PR 5085: Support other itemsizes with view
* PR 5099: Fix 5073: Slices of dynamic shared memory all alias
* PR 5104: Add a list of libraries that support __cuda_array_interface__
* PR 5136: CUDA: Enable asynchronous operations on the default stream
* PR 5144: Fix 4875: Make 2655 test with debug expect to pass
* PR 5189: __cuda_array_interface__ not requiring context
* PR 5253: Coerce ``dtype`` to ``numpy.dtype``
* PR 5257: boundscheck fix
* PR 5319: Make user facing error string use abs path not rel.
* PR 5323: Document lifetime semantics of CUDA Array Interface
* PR 5328: NBEP 7: External Memory Management Plugin Interface
* PR 5343: Fix cuda spoof
* PR 5400: Fix 4954, and some other small CUDA testsuite fixes
* PR 5436: Fix 5408: test_set_registers_57 fails on Maxwell
* PR 5519: CUDA: Silence the test suite - Fix 4809, remove autojit, delete
prints

Documentation Updates:

* PR 4957: Add notes on overwriting gufunc inputs to docs
* PR 5059: Docs: Explain how to use Memcheck with Numba, fixups in CUDA
documentation
* PR 5106: Add SciPy 2019 talks to docs
* PR 5147: Update master for 0.48.0 updates
* PR 5155: Explain what inlining at Numba IR level will do
* PR 5161: Fix README.rst formatting
* PR 5207: Remove AUTHORS list
* PR 5249: fix target path for See also
* PR 5262: fix typo in inlining docs
* PR 5270: fix 'see also' in typeddict docs
* PR 5280: Added some hints for debugging entry points.
* PR 5297: Update docs with intro to {g,}ufuncs.
* PR 5326: Update installation docs with OpenMP requirements.
* PR 5346: Docs: use sphinx_rtd_theme
* PR 5366: Remove reference to Python 2.7 in install check output
* PR 5423: Fix 5421: Add notes on printing in CUDA kernels
* PR 5438: Update package deps for doc building.
* PR 5440: Bump deprecation notices.
* PR 5443: Fix 5196: Docs: assert in CUDA only enabled for debug
* PR 5450: Docs: remove numbers and add titles to TOC
* PR 5453: fix building docs on windows
* PR 5489: docs: fix rendering of nested bulleted list

CI updates:

* PR 5314: Update the image used in Azure CI for OSX.
* PR 5360: Remove Travis CI badge.

Authors:

* Aaron Meurer
* Andreas Sodeur
* Antonio Russo
* Arik Funke
* Eric Wieser
* Ethan Pronovost
* Gabriel Majeri
* Graham Markall
* Greg Jennings
* Guilherme Leobas
* hdf
* Henry Schreiner
* Ivan Butygin
* Jacques Gaudin
* Jim Pivarski
* John Kirkham
* Leo Fang
* Lucio Fernandez-Arjona
* Mads R. B. Kristensen
* Marcin Tolysz
* Mike Williams
* Pearu Peterson
* Ravi Teja Gutta
* Rohit Sanjay
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Todd A. Anderson (core dev)
* Valentin Haenel (core dev)
* Vladimir Lukyanov

0.48.0

-----------------------------

This release is particularly small as it exists to catch anything that
missed the 0.47.0 deadline (the deadline deliberately coincided with the end of
support for Python 2.7). The next release will be considerably larger.

The core changes in this release are dominated by the start of the clean up
needed for the end of Python 2.7 support, improvements to the CUDA target and
support for numerous additional unicode string methods.

Enhancements from user contributed PRs (with thanks!):

* Brian Wignall fixed more spelling typos in 4998.
* Denis Smirnov added support for string methods ``capitalize`` (4823),
``casefold`` (4824), ``swapcase`` (4825), ``rsplit`` (4834), ``partition``
(4845) and ``splitlines`` (4849).
* Elena Totmenina extended support for string methods ``startswith`` (4867) and
added ``endswith`` (4868).
* Eric Wieser made ``type_callable`` return the decorated function itself in
4760
* Ethan Pronovost added support for ``np.argwhere`` in 4617
* Graham Markall contributed a large number of CUDA enhancements and fixes,
namely:

* 5068: Remove Python 3.4 backports from utils
* 4975: Make ``device_array_like`` create contiguous arrays (Fixes 4832)
* 5023: Don't launch ForAll kernels with 0 elements (Fixes 5017)
* 5016: Fix various issues in CUDA library search (Fixes 4979)
* 5014: Enable use of records and bools for shared memory, remove ddt, add
additional transpose tests
* 4964: Fix 4628: Add more appropriate typing for CUDA device arrays
* 5007: test_consuming_strides: Keep dev array alive
* 4997: State that CUDA Toolkit 8.0 required in docs

* James Bourbeau added the Python 3.8 classifier to setup.py in 5027.
* John Kirkham added a clarification to the ``__cuda_array_interface__``
documentation in 5049.
* Leo Fang fixed an indexing problem in ``dummyarray`` in 5012.
* Marcel Bargull fixed a build and test issue for Python 3.8 in 5029.
* Maria Rubtsov added support for string methods ``isdecimal`` (4842),
``isdigit`` (4843), ``isnumeric`` (4844) and ``replace`` (4865).

General Enhancements:

* PR 4760: Make type_callable return the decorated function
* PR 5010: merge string prs

This merge PR included the following:

* PR 4823: Implement str.capitalize() based on CPython
* PR 4824: Implement str.casefold() based on CPython
* PR 4825: Implement str.swapcase() based on CPython
* PR 4834: Implement str.rsplit() based on CPython
* PR 4842: Implement str.isdecimal
* PR 4843: Implement str.isdigit
* PR 4844: Implement str.isnumeric
* PR 4845: Implement str.partition() based on CPython
* PR 4849: Implement str.splitlines() based on CPython
* PR 4865: Implement str.replace
* PR 4867: Functionality extension str.startswith() based on CPython
* PR 4868: Add functionality for str.endswith()

* PR 5039: Disable help messages.
* PR 4617: Add coverage for ``np.argwhere``

Fixes:

* PR 4724: Only use lives (and not aliases) to create post parfor live set.
* PR 4998: Fix more spelling typos
* PR 5024: Propagate semantic constants ahead of static rewrites.
* PR 5027: Add Python 3.8 classifier to setup.py
* PR 5046: Update setup.py and buildscripts for dependency requirements
* PR 5053: Convert from arrays to names in define() and don't invalidate for
multiple consistent defines.
* PR 5058: Permit mixed int types in wrap_index
* PR 5078: Catch the use of global typed-list in JITed functions
* PR 5092: Fix 5087, bug in bytecode analysis.

CUDA Enhancements/Fixes:

* PR 4964: Fix 4628: Add more appropriate typing for CUDA device arrays
* PR 4975: Make ``device_array_like`` create contiguous arrays (Fixes 4832)
* PR 4997: State that CUDA Toolkit 8.0 required in docs
* PR 5007: test_consuming_strides: Keep dev array alive
* PR 5012: Fix IndexError when accessing the "-1" element of dummyarray
* PR 5014: Enable use of records and bools for shared memory, remove ddt, add
additional transpose tests
* PR 5016: Fix various issues in CUDA library search (Fixes 4979)
* PR 5023: Don't launch ForAll kernels with 0 elements (Fixes 5017)
* PR 5068: Remove Python 3.4 backports from utils

Documentation Updates:

* PR 5049: Clarify what dictionary means
* PR 5062: Update docs for updated version requirements
* PR 5090: Update deprecation notices for 0.48.0

CI updates:

* PR 5029: Install optional dependencies for Python 3.8 tests
* PR 5040: Drop Py2.7 and Py3.5 from public CI
* PR 5048: Fix CI py38

Authors:

* Brian Wignall
* Denis Smirnov
* Elena Totmenina
* Eric Wieser
* Ethan Pronovost
* Graham Markall
* James Bourbeau
* John Kirkham
* Leo Fang
* Marcel Bargull
* Maria Rubtsov
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Todd A. Anderson (core dev)
* Valentin Haenel (core dev)

0.47.0

-----------------------------

This release expands the capability of Numba in a number of important areas and
is also significant as it is the last major point release with support for
Python 2 and Python 3.5 included. The next release (0.48.0) will be for Python

0.46.0

--------------

This release significantly reworked one of the main parts of Numba, the compiler
pipeline, to make it more extensible and easier to use. The purpose of this was
to continue enhancing Numba's ability for use as a compiler toolkit. In a
similar vein, Numba now has an extension registration mechanism to allow other
Numba-using projects to automatically have their Numba JIT compilable functions
discovered. There were also a number of other related compiler toolkit
enhancements added along with some more NumPy features and a lot of bug fixes.
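
A sketch of what registering such an extension might look like, assuming the
``numba_extensions`` entry point group with an ``init`` entry as described in
Numba's extension documentation. The package name ``my_numba_ext`` and its
``_init_extension`` function are hypothetical.

```python
# Fragment of a setup.py for a hypothetical package "my_numba_ext".
SETUP_KWARGS = {
    "name": "my_numba_ext",
    "entry_points": {
        # Numba looks up this group and calls the function named by "init".
        "numba_extensions": [
            "init = my_numba_ext:_init_extension",
        ],
    },
}

# In a real setup.py this would be passed on as:
#     setuptools.setup(**SETUP_KWARGS)
```

The ``_init_extension`` function would typically just import the modules that
register the package's Numba-compilable functions.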

This release has updated the CUDA Array Interface specification to version 2,
which clarifies the ``strides`` attribute for C-contiguous arrays and specifies
the treatment for zero-size arrays. The implementation in Numba has been
changed and may affect downstream packages relying on the old behavior
(see issue 4661).

Enhancements from user contributed PRs (with thanks!):

* Aaron Meurer fixed some Python issues in the code base in 4345 and 4341.
* Ashwin Srinath fixed a CUDA performance bug via 4576.
* Ethan Pronovost added support for triangular indices functions in 4601 (the
NumPy functions ``tril_indices``, ``tril_indices_from``, ``triu_indices``, and
``triu_indices_from``).
* Gerald Dalley fixed a tear down race occurring in Python 2.
* Gregory R. Lee fixed the use of deprecated ``inspect.getargspec``.
* Guilherme Leobas contributed five PRs, adding support for ``np.append`` and
``np.count_nonzero`` in 4518 and 4386. The typed List was fixed to accept
unsigned integers in 4510. 4463 made a fix to NamedTuple internals and 4397
updated the docs for ``np.sum``.
* James Bourbeau added a new feature to permit the automatic application of the
``jit`` decorator to a whole module in 4331. Also some small fixes to the docs
and the code base were made in 4447 and 4433, and a fix to inplace array
operation in 4228.
* Jim Crist fixed a bug in the rendering of patched errors in 4464.
* Leo Fang updated the CUDA Array Interface contract in 4609.
* Pearu Peterson added support for Unicode based NumPy arrays in 4425.
* Peter Andreas Entschev fixed a CUDA concurrency bug in 4581.
* Lucio Fernandez-Arjona extended Numba's ``np.sum`` support to now accept the
``dtype`` kwarg in 4472.
* Pedro A. Morales Maries added support for ``np.cross`` in 4128 and also added
the necessary extension ``numba.numpy_extensions.cross2d`` in 4595.
* David Hoese, Eric Firing, Joshua Adelman, and Juan Nunez-Iglesias all made
documentation fixes in 4565, 4482, 4455, 4375 respectively.
* Vyacheslav Smirnov and Rujal Desai enabled support for ``count()`` on unicode
strings in 4606.
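
As a small illustration of the ``count()`` support for unicode strings listed
above, here is a minimal sketch. The function name ``gc_count`` and the sample
input are made up for this example, and the ``try``/``except`` import is only
an assumption-friendly fallback so the snippet also runs where Numba is not
installed.

```python
try:
    from numba import njit
except ImportError:
    # Fallback for this sketch only: run the same code as plain Python.
    def njit(func):
        return func


@njit
def gc_count(seq):
    # str.count() on unicode strings inside nopython-mode code.
    return seq.count("G") + seq.count("C")


print(gc_count("GATTACAGC"))
```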

General Enhancements:

* PR 4113: Add rewrite for semantic constants.
* PR 4128: Add np.cross support
* PR 4162: Make IR comparable and legalize it.
* PR 4208: R&D inlining, jitted and overloaded.
* PR 4331: Automatic JIT of called functions
* PR 4353: Inspection tool to check what numba supports
* PR 4386: Implement np.count_nonzero
* PR 4425: Unicode array support
* PR 4427: Entrypoints for numba extensions
* PR 4467: Literal dispatch
* PR 4472: Allow dtype input argument in np.sum
* PR 4513: New compiler.
* PR 4518: add support for np.append
* PR 4554: Refactor NRT C-API
* PR 4556: 0.46 scheduled deprecations
* PR 4567: Add env var to disable performance warnings.
* PR 4568: add np.array_equal support
* PR 4595: Implement numba.cross2d
* PR 4601: Add triangular indices functions
* PR 4606: Enable support for count() method for unicode string

Fixes:

* PR 4228: Fix inplace operator error for arrays
* PR 4282: Detect and raise unsupported on generator expressions
* PR 4305: Don't allow the allocation of mutable objects written into a
container to be hoisted.
* PR 4311: Avoid deprecated use of inspect.getargspec
* PR 4328: Replace GC macro with function call
* PR 4330: Loosen up typed container casting checks
* PR 4341: Fix some coding lines at the top of some files (utf8 -> utf-8)
* PR 4345: Replace "import \*" with explicit imports in numba/types
* PR 4346: Fix incorrect alg in isupper for ascii strings.
* PR 4349: test using jitclass in typed-list
* PR 4361: Add allocation hoisting info to LICM section at diagnostic L4
* PR 4366: Offset search box to avoid wrapping on some pages with Safari.
Fixes 4365.
* PR 4372: Replace all "except BaseException" with "except Exception".
* PR 4407: Restore the "free" conda channel for NumPy 1.10 support.
* PR 4408: Add lowering for constant bytes.
* PR 4409: Add exception chaining for better error context
* PR 4411: Name of type should not contain user facing description for debug.
* PR 4412: Fix 4387. Limit the number of return types for recursive functions
* PR 4426: Fixed two module teardown races in py2.
* PR 4431: Fix and test numpy.random.random_sample(n) for np117
* PR 4463: NamedTuple - Raises an error on non-iterable elements
* PR 4464: Add a newline in patched errors
* PR 4474: Fix liveness for remove dead of parfors (and other IR extensions)
* PR 4510: Make List.__getitem__ accept unsigned parameters
* PR 4512: Raise specific error at typing time for iteration on >1D array.
* PR 4532: Fix static_getitem with Literal type as index
* PR 4547: Update to inliner cost model information.
* PR 4557: Use specific random number seed when generating arbitrary test data
* PR 4559: Adjust test timeouts
* PR 4564: Skip unicode array tests on ppc64le that trigger an LLVM bug
* PR 4621: Fix packaging issue due to missing numba/cext
* PR 4623: Fix issue 4520 due to storage model mismatch
* PR 4644: Updates for llvmlite 0.30.0

CUDA Enhancements/Fixes:

* PR 4410: Fix 4111. cudasim mishandling recarray
* PR 4576: Replace use of `np.prod` with `functools.reduce` for computing size
from shape
* PR 4581: Prevent taking the GIL in ForAll
* PR 4592: Fix 4589.  Just pass NULL for b2d_func for constant dynamic
sharedmem
* PR 4609: Update CUDA Array Interface & Enforce Numba compliance
* PR 4619: Implement math.{degrees, radians} for the CUDA target.
* PR 4675: Bump cuda array interface to version 2

Documentation Updates:

* PR 4317: Add docs for ARMv8/AArch64
* PR 4318: Add supported platforms to the docs.  Closes 4316
* PR 4375: Add docstrings to inspect methods
* PR 4388: Update Python 2.7 EOL statement
* PR 4397: Add note about np.sum
* PR 4447: Minor parallel performance tips edits
* PR 4455: Clarify docs for typed dict with regard to arrays
* PR 4482: Fix example in guvectorize docstring.
* PR 4541: fix two typos in architecture.rst
* PR 4548: Document numba.extending.intrinsic and inlining.
* PR 4565: Fix typo in jit-compilation docs
* PR 4607: add dependency list to docs
* PR 4614: Add documentation for implementing new compiler passes.

CI Updates:

* PR 4415: Make 32bit incremental builds on linux not use free channel
* PR 4433: Removes stale azure comment
* PR 4493: Fix Overload Inliner wrt CUDA Intrinsics
* PR 4593: Enable Azure CI batching

Contributors:

* Aaron Meurer
* Ashwin Srinath
* David Hoese
* Ehsan Totoni (core dev)
* Eric Firing
* Ethan Pronovost
* Gerald Dalley
* Gregory R. Lee
* Guilherme Leobas
* James Bourbeau
* Jim Crist
* Joshua Adelman
* Juan Nunez-Iglesias
* Leo Fang
* Lucio Fernandez-Arjona
* Pearu Peterson
* Pedro A. Morales Marie
* Peter Andreas Entschev
* Rujal Desai
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Todd A. Anderson (core dev)
* Valentin Haenel (core dev)
* Vyacheslav Smirnov

0.45.1

--------------

This patch release addresses some regressions reported in the 0.45.0 release and

0.45.0

--------------

In this release, Numba gained an experimental :ref:`numba.typed.List
<feature-typed-list>` container as a future replacement of the :ref:`reflected
list <feature-reflected-list>`. In addition, functions decorated with
``parallel=True`` can now be cached to reduce compilation overhead associated
with the auto-parallelization.


Enhancements from user contributed PRs (with thanks!):

* James Bourbeau added the Numba version to reportable error messages in 4227,
added the ``signature`` parameter to ``inspect_types`` in 4200, improved the
docstring of ``normalize_signature`` in 4205, and fixed 3658 by adding
reference counting to ``register_dispatcher`` in 4254

* Guilherme Leobas implemented the dominator tree and dominance frontier
algorithms in 4216 and 4149, respectively.

* Nick White fixed the issue with ``round`` in the CUDA target in 4137.

* Joshua Adelman added support for determining if a value is in a `range`
(i.e.  ``x in range(...)``) in 4129, and added windowing functions
(``np.bartlett``, ``np.hamming``, ``np.blackman``, ``np.hanning``,
``np.kaiser``) from NumPy in 4076.

* Lucio Fernandez-Arjona added support for ``np.select`` in 4077

* Rob Ennis added support for ``np.flatnonzero`` in 4157

* Keith Kraus extended the ``__cuda_array_interface__`` with an optional mask
attribute in 4199.

* Gregory R. Lee replaced deprecated use of ``inspect.getargspec`` in 4311.
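
The ``x in range(...)`` support mentioned above can be sketched as follows.
The function name ``count_on_grid`` and its arguments are hypothetical, and the
import fallback is only there so the snippet runs without Numba as well.

```python
try:
    from numba import njit
except ImportError:
    # Fallback for this sketch only: run the same code as plain Python.
    def njit(func):
        return func


@njit
def count_on_grid(values, start, stop, step):
    # Count how many values fall on the integer grid described by a range.
    grid = range(start, stop, step)
    hits = 0
    for v in values:
        if v in grid:  # membership test on a range object
            hits += 1
    return hits


# 4 and 10 lie on range(0, 12, 2); 3 and 7 do not, and 12 is excluded.
print(count_on_grid((3, 4, 10, 12, 7), 0, 12, 2))
```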


General Enhancements:

* PR 4328: Replace GC macro with function call
* PR 4311: Avoid deprecated use of inspect.getargspec
* PR 4296: Slacken window function testing tol on ppc64le
* PR 4254: Add reference counting to register_dispatcher
* PR 4239: Support len() of multi-dim arrays in array analysis
* PR 4234: Raise informative error for np.kron array order
* PR 4232: Add unicodetype db, low level str functions and examples.
* PR 4229: Make hashing cacheable
* PR 4227: Include numba version in reportable error message
* PR 4216: Add dominator tree
* PR 4200: Add signature parameter to inspect_types
* PR 4196: Catch missing imports of internal functions.
* PR 4180: Update use of unlowerable global message.
* PR 4166: Add tests for PR 4149
* PR 4157: Support for np.flatnonzero
* PR 4149: Implement dominance frontier for SSA for the Numba IR
* PR 4148: Call branch pruning in inline_closure_call()
* PR 4132: Reduce usage of inttoptr
* PR 4129: Support contains for range
* PR 4112: better error messages for np.transpose and tuples
* PR 4110: Add range attrs, start, stop, step
* PR 4077: Add np select
* PR 4076: Add numpy windowing functions support (np.bartlett, np.hamming,
np.blackman, np.hanning, np.kaiser)
* PR 4095: Support ir.Global/FreeVar in find_const()
* PR 3691: Make TypingError abort compiling earlier
* PR 3646: Log internal errors encountered in typeinfer

Fixes:

* PR 4303: Work around scipy bug 10206
* PR 4302: Fix flake8 issue on master
* PR 4301: Fix integer literal bug in np.select impl
* PR 4291: Fix pickling of jitclass type
* PR 4262: Resolves 4251 - Fix bug in reshape analysis.
* PR 4233: Fixes issue revealed by 4215
* PR 4224: Fix 4223. Looplifting error due to StaticSetItem in objectmode
* PR 4222: Fix bad python path.
* PR 4178: Fix unary operator overload, check with unicode impl
* PR 4173: Fix return type in np.bincount with weights
* PR 4153: Fix slice shape assignment in array analysis
* PR 4152: fix status check in dict lookup
* PR 4145: Use callable instead of checking __module__
* PR 4118: Fix inline assembly support on CPU.
* PR 4088: Resolves 4075 - parfors array_analysis bug.
* PR 4085: Resolves 3314 - parfors array_analysis bug with reshape.

CUDA Enhancements/Fixes:

* PR 4199: Extend `__cuda_array_interface__` with optional mask attribute,
bump version to 1
* PR 4137: CUDA - Fix round Builtin
* PR 4114: Support 3rd party activated CUDA context

Documentation Updates:

* PR 4317: Add docs for ARMv8/AArch64
* PR 4318: Add supported platforms to the docs. Closes 4316
* PR 4295: Alter deprecation schedules
* PR 4253: fix typo in pysupported docs
* PR 4252: fix typo on repomap
* PR 4241: remove unused import
* PR 4240: fix typo in jitclass docs
* PR 4205: Update return value order in normalize_signature docstring
* PR 4237: Update doc links to point to latest not dev docs.
* PR 4197: hyperlink repomap
* PR 4170: Clarify docs on accumulating into arrays in prange
* PR 4147: fix docstring for DictType iterables
* PR 3951: A guide to overloading

CI Updates:

* PR 4300: AArch64 has no faulthandler package
* PR 4273: pin to MKL BLAS for testing to get consistent results
* PR 4209: Revert previous network tol patch and try with conda config
* PR 4138: Remove tbb before Azure test only on Python 3, since it was already
removed for Python 2

Contributors:

* Ehsan Totoni (core dev)
* Gregory R. Lee
* Guilherme Leobas
* James Bourbeau
* Joshua L. Adelman
* Keith Kraus
* Lucio Fernandez-Arjona
* Nick White
* Rob Ennis
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Todd A. Anderson (core dev)
* Valentin Haenel (core dev)

0.44.1

--------------

This patch release addresses some regressions reported in the 0.44.0 release:

- PR 4165: Fix 4164 issue with NUMBAPRO_NVVM.
- PR 4172: Abandon branch pruning if an arg name is redefined. (Fixes 4163)
- PR 4183: Fix 4156. Problem with defining in-loop variables.

0.44.0

--------------

IMPORTANT: In this release a few significant deprecations (and some less
significant ones) are being made. Users are encouraged to read the related
documentation.

General enhancements in this release include:

- Numba is backed by LLVM 8 on all platforms apart from ppc64le, which, due to
bugs, remains on the LLVM 7.x series.
- Numba's dictionary support now includes type inference for keys and values.
- The .view() method now works for NumPy scalar types.
- Newly supported NumPy functions added: np.delete, np.nanquantile, np.quantile,
np.repeat, np.shape.
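
To illustrate the type-inferred dictionary support, here is a minimal sketch
in which an empty ``{}`` gets its key and value types from later use. The
function name ``tally`` and the sample words are invented for this example,
and the import fallback only exists so the snippet runs without Numba.

```python
try:
    from numba import njit
except ImportError:
    # Fallback for this sketch only: run the same code as plain Python.
    def njit(func):
        return func


@njit
def tally(words):
    # No explicit key/value types: they are inferred from the assignments.
    counts = {}
    for w in words:
        if w in counts:
            counts[w] += 1
        else:
            counts[w] = 1
    return counts


counts = tally(("a", "b", "a"))
print(counts["a"], counts["b"], len(counts))
```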

In addition, considerable effort has been made to fix some long-standing bugs
and a large number of other bugs, so the "Fixes" section is very large this
time!

Enhancements from user contributed PRs (with thanks!):

- Max Bolingbroke added support for the selective use of ``fastmath`` flags in
3847.
- Rob Ennis made min() and max() work on iterables in 3820 and added
np.quantile and np.nanquantile in 3899.
- Sergey Shalnov added numerous unicode string related features, zfill in 3978,
ljust in 4001, rjust and center in 4044 and strip, lstrip and rstrip in
4048.
- Guilherme Leobas added support for np.delete in 3890
- Christoph Deil exposed the Numba CLI via ``python -m numba`` in 4066 and made
numerous documentation fixes.
- Leo Schwarz wrote the bulk of the code for jitclass default constructor
arguments in 3852.
- Nick White enhanced the CUDA backend to use min/max PTX instructions where
possible in 4054.
- Lucio Fernandez-Arjona implemented the unicode string ``__mul__`` function in
3952.
- Dimitri Vorona wrote the bulk of the code to implement getitem and setitem for
jitclass in 3861.

General Enhancements:

* PR 3820: Min max on iterables
* PR 3842: Unicode type iteration
* PR 3847: Allow fine-grained control of fastmath flags to partially address 2923
* PR 3852: Continuation of PR 2894
* PR 3861: Continuation of PR 3730
* PR 3890: Add support for np.delete
* PR 3899: Support for np.quantile and np.nanquantile
* PR 3900: Fix 3457 :: Implements np.repeat
* PR 3928: Add .view() method for NumPy scalars
* PR 3939: Update icc_rt clone recipe.
* PR 3952: __mul__ for strings, initial implementation and tests
* PR 3956: Type-inferred dictionary
* PR 3959: Create a view for string slicing to avoid extra allocations
* PR 3978: zfill operation implementation
* PR 4001: ljust operation implementation
* PR 4010: Support `dict()` and `{}`
* PR 4022: Support for llvm 8
* PR 4034: Make type.Optional str more representative
* PR 4041: Deprecation warnings
* PR 4044: rjust and center operations implementation
* PR 4048: strip, lstrip and rstrip operations implementation
* PR 4066: Expose numba CLI via python -m numba
* PR 4081: Impl `np.shape` and support function for `asarray`.
* PR 4091: Deprecate the use of iternext_impl without RefType

CUDA Enhancements/Fixes:

* PR 3933: Adds `.nbytes` property to CUDA device array objects.
* PR 4011: Add .inspect_ptx() to cuda device function
* PR 4054: CUDA: Use min/max PTX Instructions
* PR 4096: Update env-vars for CUDA libraries lookup

Documentation Updates:

* PR 3867: Code repository map
* PR 3918: adding Joris' Fosdem 2019 presentation
* PR 3926: order talks on applications of Numba by date
* PR 3943: fix two small typos in vectorize docs
* PR 3944: Fixup jitclass docs
* PR 3990: mention preprint repo in FAQ. Fixes 3981
* PR 4012: Correct runtests command in contributing.rst
* PR 4043: fix typo
* PR 4047: Ambiguous Documentation fix for guvectorize.
* PR 4060: Remove remaining mentions of autojit in docs
* PR 4063: Fix annotate example in docstring
* PR 4065: Add FAQ entry explaining Numba project name
* PR 4079: Add Documentation for atomicity of typed.Dict
* PR 4105: Remove info about CUDA ENVVAR potential replacement

Fixes:

* PR 3719: Resolves issue 3528.  Adds support for slices when not using parallel=True.
* PR 3727: Remove dels for known dead vars.
* PR 3845: Fix mutable flag transmission in .astype
* PR 3853: Fix some minor issues in the C source.
* PR 3862: Correct boolean reinterpretation of data
* PR 3863: Comments out the appveyor badge
* PR 3869: fixes flake8 after merge
* PR 3871: Add assert to ir.py to help enforce correct structuring
* PR 3881: fix preparfor dtype transform for datetime64
* PR 3884: Prevent mutation of objmode fallback IR.
* PR 3885: Updates for llvmlite 0.29
* PR 3886: Use `safe_load` from pyyaml.
* PR 3887: Add tolerance to network errors by permitting conda to retry
* PR 3893: Fix casting in namedtuple ctor.
* PR 3894: Fix array inliner for multiple array definition.
* PR 3905: Cherrypick 3903 to main
* PR 3920: Raise better error if unsupported jump opcode found.
* PR 3927: Apply flake8 to the numpy related files
* PR 3935: Silence DeprecationWarning
* PR 3938: Better error message for unknown opcode
* PR 3941: Fix typing of ufuncs in parfor conversion
* PR 3946: Return variable renaming dict from inline_closurecall
* PR 3962: Fix bug in alignment computation of `Record.make_c_struct`
* PR 3967: Fix error with pickling unicode
* PR 3964: Unicode split algo versioning
* PR 3975: Add handler for unknown locale to numba -s
* PR 3991: Permit Optionals in ufunc machinery
* PR 3995: Remove assert in type inference causing poor error message.
* PR 3996: add is_ascii flag to UnicodeType
* PR 4009: Prevent zero division error in np.linalg.cond
* PR 4014: Resolves 4007.
* PR 4021: Add a more specific error message for invalid write to a global.
* PR 4023: Fix handling of titles in record dtype
* PR 4024: Do a check if a call is const before saying that an object is multiply defined.
* PR 4027: Fix issue 4020.  Turn off no_cpython_wrapper flag when compiling for…
* PR 4033: [WIP] Fixing wrong dtype of array inside reflected list 4028
* PR 4061: Change IPython cache dir name to numba_cache
* PR 4067: Delete examples/notebooks/LinearRegr.py
* PR 4070: Catch writes to global typed.Dict and raise.
* PR 4078: Check tuple length
* PR 4084: Fix missing incref on optional return None
* PR 4089: Make the warnings fixer flush work for warning comparing on type.
* PR 4094: Fix function definition finding logic for commented def
* PR 4100: Fix alignment check on 32-bit.
* PR 4104: Use PEP 508 compliant env markers for install deps

Contributors:

* Benjamin Zaitlen
* Christoph Deil
* David Hirschfeld
* Dimitri Vorona
* Ehsan Totoni (core dev)
* Guilherme Leobas
* Leo Schwarz
* Lucio Fernandez-Arjona
* Max Bolingbroke
* NanduTej
* Nick White
* Ravi Teja Gutta
* Rob Ennis
* Sergey Shalnov
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Todd A. Anderson (core dev)
* Valentin Haenel (core dev)

0.43.1

--------------

This is a bugfix release that provides minor changes to fix a bug in branch
pruning, fix bugs in `np.interp` functionality, and fully accommodate the
NumPy 1.16 release series.

* PR 3826: NumPy 1.16 support
* PR 3850: Refactor np.interp
* PR 3883: Rewrite pruned conditionals as their evaluated constants.

Contributors:

* Rob Ennis
* Siu Kwan Lam (core dev)
* Stuart Archibald (core dev)

0.43.0

--------------

In this release, the major new features are:

- Initial support for statically typed dictionaries
- Improvements to `hash()` to match Python 3 behavior
- Support for the heapq module
- Ability to pass C structs to Numba
- More NumPy functions: asarray, trapz, roll, ptp, extract
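
As a hedged sketch of the ``heapq`` support, the function below pops the ``k``
smallest items from a heap inside jitted code. The name ``k_smallest`` and the
sample data are made up for this example, and the import fallback only exists
so the snippet runs without Numba.

```python
import heapq

try:
    from numba import njit
except ImportError:
    # Fallback for this sketch only: run the same code as plain Python.
    def njit(func):
        return func


@njit
def k_smallest(values, k):
    # Copy the input into a list, turn it into a heap, then pop k items.
    heap = [v for v in values]
    heapq.heapify(heap)
    out = []
    for _ in range(k):
        out.append(heapq.heappop(heap))
    return out


print(k_smallest((5, 1, 4, 2, 3), 2))
```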


NOTE:

The vast majority of NumPy 1.16 behaviour is supported, however
``datetime`` and ``timedelta`` use involving ``NaT`` matches the behaviour
present in earlier releases. The ufunc suite has not been extended to
accommodate the two new time computation related additions present in NumPy
1.16. In addition the functions ``ediff1d`` and ``interp`` have known minor
issues in replicating outputs exactly when ``NaN``'s occur in certain input
patterns.

General Enhancements:

* PR 3563: Support for np.roll
* PR 3572: Support for np.ptp
* PR 3592: Add dead branch prune before type inference.
* PR 3598: Implement np.asarray()
* PR 3604: Support for np.interp
* PR 3607: Some simplification to lowering
* PR 3612: Exact match flag in dispatcher
* PR 3627: Support for np.trapz
* PR 3630: np.where with broadcasting
* PR 3633: Support for np.extract
* PR 3657: np.max, np.min, np.nanmax, np.nanmin - support for complex dtypes
* PR 3661: Access C Struct as Numpy Structured Array
* PR 3678: Support for str.split and str.join
* PR 3684: Support C array in C struct
* PR 3696: Add intrinsic to help debug refcount
* PR 3703: Implementations of type hashing.
* PR 3715: Port CPython3.7 dictionary for numba internal use
* PR 3716: Support inplace concat of strings
* PR 3718: Add location to ConstantInferenceError exceptions.
* PR 3720: improve error msg about invalid signature
* PR 3731: Support for heapq
* PR 3754: Updates for llvmlite 0.28
* PR 3760: Overloadable operator.setitem
* PR 3775: Support overloading operator.delitem
* PR 3777: Implement compiler support for dictionary
* PR 3791: Implement interpreter-side interface for numba dict
* PR 3799: Support refcount'ed types in numba dict

CUDA Enhancements/Fixes:

* PR 3713: Fix the NvvmSupportError message when CC too low
* PR 3722: Fix 3705: slicing error with negative strides
* PR 3755: Make cuda.to_device accept readonly host array
* PR 3773: Adapt library search to accommodate multiple locations

Documentation Updates:

* PR 3651: fix link to berryconda in docs
* PR 3668: Add Azure Pipelines build badge
* PR 3749: DOC: Clarify when prange is different from range
* PR 3771: fix a few typos
* PR 3785: Clarify use of range as function only.
* PR 3829: Add docs for typed-dict

Fixes:

* PR 3614: Resolve 3586
* PR 3618: Skip gdb tests on ARM.
* PR 3643: Remove support_literals usage
* PR 3645: Enforce and fix that AbstractTemplate.generic must be returning a Signature
* PR 3648: Fail on overload signature mismatch.
* PR 3660: Added Ignore message to test numba.tests.test_lists.TestLists.test_mul_error
* PR 3662: Replace six with numba.six
* PR 3663: Removes coverage computation from travisci builds
* PR 3672: Avoid leaking memory when iterating over uniform tuple
* PR 3676: Fixes constant string lowering inside tuples
* PR 3677: Ensure all referenced compiled functions are linked properly
* PR 3692: Fix test failure due to overly strict test on floating point values.
* PR 3693: Intercept failed import to help users.
* PR 3694: Fix memory leak in enumerate iterator
* PR 3695: Convert return of None from intrinsic implementation to dummy value
* PR 3697: Fix for issue 3687
* PR 3701: Fix array.T analysis (fixes 3700)
* PR 3704: Fixes for overload_method
* PR 3706: Don't push call vars recursively into nested parfors. Resolves 3686.
* PR 3710: Set as non-hoistable if a mutable variable is passed to a function in a loop. Resolves 3699.
* PR 3712: parallel=True to use better builtin mechanism to resolve call types. Resolves issue 3671
* PR 3725: Fix invalid removal of dead empty list
* PR 3740: add uintp as a valid type to the tuple operator.getitem
* PR 3758: Fix target definition update in inlining
* PR 3782: Raise typing error on yield optional.
* PR 3792: Fix non-module object used as the module of a function.
* PR 3800: Bugfix for np.interp
* PR 3808: Bump macro to include VS2014 to fix py3.5 build
* PR 3809: Add debug guard to debug only C function.
* PR 3816: Fix array.sum(axis) 1d input return type.
* PR 3821: Replace PySys_WriteStdout with PySys_FormatStdout to ensure no truncation.
* PR 3830: Getitem should not return optional type
* PR 3832: Handle single string as path in find_file()

Contributors:

* Ehsan Totoni
* Gryllos Prokopis
* Jonathan J. Helmus
* Kayla Ngan
* lalitparate
* luk-f-a
* Matyt
* Max Bolingbroke
* Michael Seifert
* Rob Ennis
* Siu Kwan Lam
* Stan Seibert
* Stuart Archibald
* Todd A. Anderson
* Tao He
* Valentin Haenel

0.42.1

--------------

Bugfix release to fix the incorrect hash in OSX wheel packages.
No change in source code.

0.42.0

--------------

In this release the major features are:

- The capability to launch and attach the GDB debugger from within a jitted
function.
- The upgrading of LLVM to version 7.0.0.

We added a draft of the project roadmap to the developer manual. The roadmap is
for informational purposes only as priorities and resources may change.

Here are some enhancements from contributed PRs:

- 3532. Daniel Wennberg improved the ``cuda.{pinned, mapped}`` API so that
the associated memory is released immediately at the exit of the context
manager.
- 3531. Dimitri Vorona enabled the inlining of jitclass methods.
- 3516. Simon Perkins added support for passing NumPy dtypes (e.g.
``np.dtype("int32")``) and their type constructors (e.g. ``np.int32``) into
a jitted function.
- 3509. Rob Ennis added support for ``np.corrcoef``.

A regression issue (3554, 3461) relating to making an empty slice in parallel
mode is resolved by 3558.

General Enhancements:

* PR 3392: Launch and attach gdb directly from Numba.
* PR 3437: Changes to accommodate LLVM 7.0.x
* PR 3509: Support for np.corrcoef
* PR 3516: Typeof dtype values
* PR 3520: Fix stencil ignoring cval if out kwarg supplied.
* PR 3531: Fix jitclass method inlining and avoid unnecessary increfs
* PR 3538: Avoid future C-level assertion error due to invalid visibility
* PR 3543: Avoid implementation error being hidden by the try-except
* PR 3544: Add `long_running` test flag and feature to exclude tests.
* PR 3549: ParallelAccelerator caching improvements
* PR 3558: Fixes array analysis for inplace binary operators.
* PR 3566: Skip alignment tests on armv7l.
* PR 3567: Fix unifying literal types in namedtuple
* PR 3576: Add special copy routine for NumPy out arrays
* PR 3577: Fix example and docs typos for `objmode` context manager.
* PR 3580: Use alias information when determining whether it is safe to
reorder statements.
* PR 3583: Use `ir.unknown_loc` for unknown `Loc`, as 3390 with tests
* PR 3587: Fix llvm.memset usage changes in llvm7
* PR 3596: Fix Array Analysis for Global Namedtuples
* PR 3597: Warn users if threading backend init unsafe.
* PR 3605: Add guard for writing to read only arrays from ufunc calls
* PR 3606: Improve the accuracy of error message wording for undefined type.
* PR 3611: gdb test guard needs to ack ptrace permissions
* PR 3616: Skip gdb tests on ARM.

CUDA Enhancements:

* PR 3532: Unregister temporarily pinned host arrays at once
* PR 3552: Handle broadcast arrays correctly in host->device transfer.
* PR 3578: Align cuda and cuda simulator kwarg names.

Documentation Updates:

* PR 3545: Fix njit description in 5 min guide
* PR 3570: Minor documentation fixes for numba.cuda
* PR 3581: Fixing minor typo in `reference/types.rst`
* PR 3594: Changing `stencil` docs to correctly reflect `func_or_mode` param
* PR 3617: Draft roadmap as of Dec 2018

Contributors:

* Aaron Critchley
* Daniel Wennberg
* Dimitri Vorona
* Dominik Stańczak
* Ehsan Totoni (core dev)
* Iskander Sharipov
* Rob Ennis
* Simon Muller
* Simon Perkins
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Todd A. Anderson (core dev)

0.41.0

--------------

This release adds the following major features:

* Diagnostics showing the optimizations done by ParallelAccelerator
* Support for profiling Numba-compiled functions in Intel VTune
* Additional NumPy functions: partition, nancumsum, nancumprod, ediff1d, cov,
conj, conjugate, tri, tril, triu
* Initial support for Python 3 Unicode strings
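
The initial Unicode string support can be sketched with a few basic
operations (concatenation, ``len``, ``startswith``, ``in`` and ``==``). The
function name ``describe`` and the inputs are hypothetical, and the import
fallback only exists so the snippet runs without Numba.

```python
try:
    from numba import njit
except ImportError:
    # Fallback for this sketch only: run the same code as plain Python.
    def njit(func):
        return func


@njit
def describe(name, suffix):
    # Basic string operations inside nopython-mode code.
    label = name + suffix
    return len(label), label.startswith(name), suffix in label, label == name


print(describe("numba", "-jit"))
```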

General Enhancements:

* PR 1968: armv7 support
* PR 2983: invert mapping b/w binop operators and the operator module 2297
* PR 3160: First attempt at parallel diagnostics
* PR 3307: Adding NUMBA_ENABLE_PROFILING envvar, enabling jit event
* PR 3320: Support for np.partition
* PR 3324: Support for np.nancumsum and np.nancumprod
* PR 3325: Add location information to exceptions.
* PR 3337: Support for np.ediff1d
* PR 3345: Support for np.cov
* PR 3348: Support user pipeline class in with lifting
* PR 3363: string support
* PR 3373: Improve error message for empty imprecise lists.
* PR 3375: Enable overload(operator.getitem)
* PR 3402: Support negative indexing in tuple.
* PR 3414: Refactor Const type
* PR 3416: Optimized usage of alloca out of the loop
* PR 3424: Updates for llvmlite 0.26
* PR 3462: Add support for `np.conj/np.conjugate`.
* PR 3480: np.tri, np.tril, np.triu - default optional args
* PR 3481: Permit dtype argument as sole kwarg in np.eye

CUDA Enhancements:

* PR 3399: Add max_registers Option to cuda.jit

Continuous Integration / Testing:

* PR 3303: CI with Azure Pipelines
* PR 3309: Workaround race condition with apt
* PR 3371: Fix issues with Azure Pipelines
* PR 3362: Fix 3360: `RuntimeWarning: 'numba.runtests' found in sys.modules`
* PR 3374: Disable openmp in wheel building
* PR 3404: Azure Pipelines templates
* PR 3419: Fix cuda tests and error reporting in test discovery
* PR 3491: Prevent faulthandler installation on armv7l
* PR 3493: Fix CUDA test that used negative indexing behaviour that's fixed.
* PR 3495: Start Flake8 checking of Numba source

Fixes:

* PR 2950: Fix dispatcher to only consider contiguous-ness.
* PR 3124: Fix 3119, raise for 0d arrays in reductions
* PR 3228: Reduce redundant module linking
* PR 3329: Fix AOT on windows.
* PR 3335: Fix memory management of __cuda_array_interface__ views.
* PR 3340: Fix typo in error name.
* PR 3365: Fix the default unboxing logic
* PR 3367: Allow non-global reference to objmode() context-manager
* PR 3381: Fix global reference in objmode for dynamically created function
* PR 3382: CUDA_ERROR_MISALIGNED_ADDRESS Using Multiple Const Arrays
* PR 3384: Correctly handle very old versions of colorama
* PR 3394: Add 32bit package guard for non-32bit installs
* PR 3397: Fix with-objmode warning
* PR 3403: Fix label offset in call inline after parfor pass
* PR 3429: Fixes raising of user defined exceptions for exec(<string>).
* PR 3432: Fix error due to function naming in CI in py2.7
* PR 3444: Fixed TBB's single thread execution and test added for 3440
* PR 3449: Allow matching non-array objects in find_callname()
* PR 3455: Change getiter and iternext to not be pure. Resolves 3425
* PR 3467: Make ir.UndefinedType singleton class.
* PR 3478: Fix np.random.shuffle sideeffect
* PR 3487: Raise unsupported for kwargs given to `print()`
* PR 3488: Remove dead script.
* PR 3498: Fix stencil support for boolean as return type
* PR 3511: Fix handling make_function literals (regression of 3414)
* PR 3514: Add missing unicode != unicode
* PR 3527: Fix complex math sqrt implementation for large -ve values
* PR 3530: This adds an arg check for the pattern supplied to Parfors.
* PR 3536: Sets list dtor linkage to `linkonce_odr` to fix visibility in AOT

Documentation Updates:

* PR 3316: Update 0.40 changelog with additional PRs
* PR 3318: Tweak spacing to avoid search box wrapping onto second line
* PR 3321: Add note about memory leaks with exceptions to docs. Fixes 3263
* PR 3322: Add FAQ on CUDA + fork issue. Fixes 3315.
* PR 3343: Update docs for argsort, kind kwarg partially supported.
* PR 3357: Added mention of njit in 5minguide.rst
* PR 3434: Fix parallel reduction example in docs.
* PR 3452: Fix broken link and mark up problem.
* PR 3484: Size Numba logo in docs in em units. Fixes 3313
* PR 3502: just two typos
* PR 3506: Document string support
* PR 3513: Documentation for parallel diagnostics.
* PR 3526: Fix 5 min guide with respect to njit decl

Contributors:

* Alex Ford
* Andreas Sodeur
* Anton Malakhov
* Daniel Stender
* Ehsan Totoni (core dev)
* Henry Schreiner
* Marcel Bargull
* Matt Cooper
* Nick White
* Nicolas Hug
* rjenc29
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Todd A. Anderson (core dev)

0.40.1

--------------

This is a PyPI-only patch release to ensure that PyPI wheels can enable the
TBB threading backend, and to disable the OpenMP backend in the wheels.
Limitations of manylinux1 and variation in user environments can cause
segfaults when OpenMP is enabled on wheel builds.  Note that this release has
no functional changes for users who obtained Numba 0.40.0 via conda.

Patches:

* PR 3338: Accidentally left Anton off contributor list for 0.40.0
* PR 3374: Disable OpenMP in wheel building
* PR 3376: Update 0.40.1 changelog and docs on OpenMP backend

0.40.0

--------------

This release adds a number of major features:

* A new GPU backend: kernels for AMD GPUs can now be compiled using the ROCm
driver on Linux.
* The thread pool implementation used by Numba for automatic multithreading
is configurable to use TBB, OpenMP, or the old "workqueue" implementation.
(TBB is likely to become the preferred default in a future release.)
* New documentation on thread and fork-safety with Numba, along with overall
improvements in thread-safety.
* Experimental support for executing a block of code inside a nopython mode
function in object mode.
* Parallel loops now allow arrays as reduction variables
* CUDA improvements: FMA, faster float64 atomics on supporting hardware,
records in const memory, and improved datetime dtype support
* More NumPy functions: vander, tri, triu, tril, fill_diagonal


General Enhancements:

* PR 3017: Add facility to support with-contexts
* PR 3033: Add support for multidimensional CFFI arrays
* PR 3122: Add inliner to object mode pipeline
* PR 3127: Support for reductions on arrays.
* PR 3145: Support for np.fill_diagonal
* PR 3151: Keep a queue of references to last N deserialized functions.  Fixes 3026
* PR 3154: Support use of list() if typeable.
* PR 3166: Objmode with-block
* PR 3179: Updates for llvmlite 0.25
* PR 3181: Support function extension in alias analysis
* PR 3189: Support literal constants in typing of object methods
* PR 3190: Support passing closures as literal values in typing
* PR 3199: Support inferring stencil index as constant in simple unary expressions
* PR 3202: Threading layer backend refactor/rewrite/reinvention!
* PR 3209: Support for np.tri, np.tril and np.triu
* PR 3211: Handle unpacking in building tuple (BUILD_TUPLE_UNPACK opcode)
* PR 3212: Support for np.vander
* PR 3227: Add NumPy 1.15 support
* PR 3272: Add MemInfo_data to runtime._nrt_python.c_helpers
* PR 3273: Refactor. Removing thread-local-storage based context nesting.
* PR 3278: compiler threadsafety lockdown
* PR 3291: Add CPU count and CFS restrictions info to numba -s.

CUDA Enhancements:

* PR 3152: Use cuda driver api to get best blocksize for best occupancy
* PR 3165: Add FMA intrinsic support
* PR 3172: Use float64 add Atomics, Where Available
* PR 3186: Support Records in CUDA Const Memory
* PR 3191: CUDA: fix log size
* PR 3198: Fix GPU datetime timedelta types usage
* PR 3221: Support datetime/timedelta scalar argument to a CUDA kernel.
* PR 3259: Add DeviceNDArray.view method to reinterpret data as a different type.
* PR 3310: Fix IPC handling of sliced cuda array.

ROCm Enhancements:

* PR 3023: Support for AMDGCN/ROCm.
* PR 3108: Add ROC info to `numba -s` output.
* PR 3176: Move ROC vectorize init to npyufunc
* PR 3177: Add auto_synchronize support to ROC stream
* PR 3178: Update ROC target documentation.
* PR 3294: Add compiler lock to ROC compilation path.
* PR 3280: Add wavebits property to the HSA Agent.
* PR 3281: Fix ds_permute types and add tests

Continuous Integration / Testing:

* PR 3091: Remove old recipes, switch to test config based on env var.
* PR 3094: Add higher ULP tolerance for products in complex space.
* PR 3096: Set exit on error in incremental scripts
* PR 3109: Add skip to test needing jinja2 if no jinja2.
* PR 3125: Skip cudasim only tests
* PR 3126: add slack, drop flowdock
* PR 3147: Improve error message for arg type unsupported during typing.
* PR 3128: Fix recipe/build for jetson tx2/ARM
* PR 3167: In build script activate env before installing.
* PR 3180: Add skip to broken test.
* PR 3216: Fix libcuda.so loading in some container setup
* PR 3224: Switch to new Gitter notification webhook URL and encrypt it
* PR 3235: Add 32bit Travis CI jobs
* PR 3257: This adds scipy/ipython back into windows conda test phase.

Fixes:

* PR 3038: Fix random integer generation to match results from NumPy.
* PR 3045: Fix 3027 - Numba reassigns sys.stdout
* PR 3059: Handler for known LoweringErrors.
* PR 3060: Adjust attribute error for NumPy functions.
* PR 3067: Abort simulator threads on exception in thread block.
* PR 3079: Implement +/-(types.boolean) Fix 2624
* PR 3080: Compute np.var and np.std correctly for complex types.
* PR 3088: Fix 3066 (array.dtype.type in prange)
* PR 3089: Fix invalid ParallelAccelerator hoisting issue.
* PR 3136: Fix 3135 (lowering error)
* PR 3137: Fix for issue3103 (race condition detection)
* PR 3142: Fix Issue 3139 (parfors reuse of reduction variable across prange blocks)
* PR 3148: Remove dead array equal infer code
* PR 3153: Fix canonicalize_array_math typing for calls with kw args
* PR 3156: Fixes issue with missing pygments in testing and adds guards.
* PR 3168: Py37 bytes output fix.
* PR 3171: Fix 3146.  Fix CFUNCTYPE void* return-type handling
* PR 3193: Fix setitem/getitem resolvers
* PR 3222: Fix 3214.  Mishandling of POP_BLOCK in while True loop.
* PR 3230: Fixes liveness analysis issue in looplifting
* PR 3233: Fix return type difference for 32bit ctypes.c_void_p
* PR 3234: Fix types and layout for `np.where`.
* PR 3237: Fix DeprecationWarning about imp module
* PR 3241: Fix 3225.  Normalize 0nd array to scalar in typing of indexing code.
* PR 3256: Fix 3251: Move imports of ABCs to collections.abc for Python >= 3.3
* PR 3292: Fix issue3279.
* PR 3302: Fix error due to mismatching dtype

Documentation Updates:

* PR 3104: Workaround for 3098 (test_optional_unpack Heisenbug)
* PR 3132: Adds an ~5 minute guide to Numba.
* PR 3194: Fix docs RE: np.random generator fork/thread safety
* PR 3242: Page with Numba talks and tutorial links
* PR 3258: Allow users to choose the type of issue they are reporting.
* PR 3260: Fixed broken link
* PR 3266: Fix cuda pointer ownership problem with user/externally allocated pointer
* PR 3269: Tweak typography with CSS
* PR 3270: Update FAQ for functions passed as arguments
* PR 3274: Update installation instructions
* PR 3275: Note pyobject and voidptr are types in docs
* PR 3288: Do not need to call parallel optimizations "experimental" anymore
* PR 3318: Tweak spacing to avoid search box wrapping onto second line

Contributors:

* Anton Malakhov
* Alex Ford
* Anthony Bisulco
* Ehsan Totoni (core dev)
* Leonard Lausen
* Matthew Petroff
* Nick White
* Ray Donnelly
* rjenc29
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Stuart Reynolds
* Todd A. Anderson (core dev)

0.39.0

--------------

Here are the highlights for the Numba 0.39.0 release.

* This is the first version that supports Python 3.7.
* With help from Intel, we have fixed the issues with SVML support (related
issues 2938, 2998, 3006).
* List has gained support for containing reference-counted types like NumPy
arrays and `list`.  Note, list still cannot hold heterogeneous types.
* We have made a significant change to the internal calling-convention,
which should be transparent to most users, to allow for a future feature that
will permit jumping back into python-mode from a nopython-mode function.
This also fixes a limitation to `print` that disabled its use from nopython
functions that were deep in the call-stack.
* For CUDA GPU support, we added a `__cuda_array_interface__` following the
NumPy array interface specification to allow Numba to consume externally
defined device arrays.  We have opened a corresponding pull request to CuPy to
test out the concept and be able to use a CuPy GPU array.
* The Numba dispatcher `inspect_types()` method now supports the kwarg `pretty`
which if set to `True` will produce ANSI/HTML output, showing the annotated
types, when invoked from ipython/jupyter-notebook respectively.
* The NumPy functions `ndarray.dot`, `np.percentile` and `np.nanpercentile`, and
`np.unique` are now supported.
* Numba now supports the use of a per-project configuration file to permanently
set behaviours typically set via `NUMBA_*` family environment variables.
* Support for the `ppc64le` architecture has been added.

Enhancements:

* PR 2793: Simplify and remove javascript from html_annotate templates.
* PR 2840: Support list of refcounted types
* PR 2902: Support for np.unique
* PR 2926: Enable fence for all architecture and add developer notes
* PR 2928: Making error about untyped list more informative.
* PR 2930: Add configuration file and color schemes.
* PR 2932: Fix encoding to 'UTF-8' in `check_output` decode.
* PR 2938: Python 3.7 compat: _Py_Finalizing becomes _Py_IsFinalizing()
* PR 2939: Comprehensive SVML unit test
* PR 2946: Add support for `ndarray.dot` method and tests.
* PR 2953: percentile and nanpercentile
* PR 2957: Add new 3.7 opcode support.
* PR 2963: Improve alias analysis to be more comprehensive
* PR 2984: Support for namedtuples in array analysis
* PR 2986: Fix environment propagation
* PR 2990: Improve function call matching for intrinsics
* PR 3002: Second pass at error rewrites (interpreter errors).
* PR 3004: Add numpy.empty to the list of pure functions.
* PR 3008: Augment SVML detection with llvmlite SVML patch detection.
* PR 3012: Make use of the common spelling of heterogeneous/homogeneous.
* PR 3032: Fix pycc ctypes test due to mismatch in calling-convention
* PR 3039: Add SVML detection to Numba environment diagnostic tool.
* PR 3041: This adds needs_blas to tests that use BLAS
* PR 3056: Require llvmlite>=0.24.0

CUDA Enhancements:

* PR 2860: __cuda_array_interface__
* PR 2910: More CUDA intrinsics
* PR 2929: Add Flag To Prevent Unnecessary D->H Copies
* PR 3037: Add CUDA IPC support on non-peer-accessible devices

CI Enhancements:

* PR 3021: Update appveyor config.
* PR 3040: Add fault handler to all builds
* PR 3042: Add catchsegv
* PR 3077: Adds optional number of processes for `-m` in testing

Fixes:

* PR 2897: Fix line position of delete statement in numba ir
* PR 2905: Fix for 2862
* PR 3009: Fix optional type returning in recursive call
* PR 3019: workaround and unittest for issue 3016
* PR 3035: [TESTING] Attempt delayed removal of Env
* PR 3048: [WIP] Fix cuda tests failure on buildfarm
* PR 3054: Make test work on 32-bit
* PR 3062: Fix cuda.In freeing devary before the kernel launch
* PR 3073: Workaround 3072
* PR 3076: Avoid ignored exception due to missing globals at interpreter teardown

Documentation Updates:

* PR 2966: Fix syntax in env var docs.
* PR 2967: Fix typo in CUDA kernel layout example.
* PR 2970: Fix docstring copy paste error.

Contributors:

The following people contributed to this release.

* Anton Malakhov
* Ehsan Totoni  (core dev)
* Julia Tatz
* Matthias Bussonnier
* Nick White
* Ray Donnelly
* Siu Kwan Lam  (core dev)
* Stan Seibert  (core dev)
* Stuart Archibald  (core dev)
* Todd A. Anderson  (core dev)
* Rik-de-Kort
* rjenc29

0.38.1

--------------

This is a critical bug fix release addressing:
https://github.com/numba/numba/issues/3006

The bug does not impact users using conda packages from Anaconda or Intel Python
Distribution (but it does impact conda-forge). It does not impact users of pip
using wheels from PyPI.

This only impacts a small number of users where:

* The ICC runtime (specifically libsvml) is present in the user's environment.
* The user is using an llvmlite statically linked against a version of LLVM
that has not been patched with SVML support.
* The platform is 64-bit.

The release fixes a code generation path that could lead to the production of
incorrect results under the above situation.

Fixes:

* PR 3007: Augment SVML detection with llvmlite SVML patch detection.

Contributors:

The following people contributed to this release.

* Stuart Archibald (core dev)

0.38.0

--------------

Following on from the bug fix focus of the last release, this release swings
back towards the addition of new features and usability improvements based on
community feedback. This release is comparatively large! Three key features/
changes to note are:

* Numba (via llvmlite) is now backed by LLVM 6.0; general vectorization is
improved as a result. A significant long standing LLVM bug that was causing
corruption was also found and fixed.
* Further considerable improvements in vectorization are made available as
Numba now supports Intel's short vector math library (SVML).
Try it out with `conda install -c numba icc_rt`.
* CUDA 8.0 is now the minimum supported CUDA version.

Other highlights include:

* Bug fixes to `parallel=True` have enabled more vectorization opportunities
when using the ParallelAccelerator technology.
* Much effort has gone into improving error reporting and the general usability
of Numba. This includes highlighted error messages and performance tips
documentation. Try it out with `conda install colorama`.
* A number of new NumPy functions are supported: `np.convolve`, `np.correlate`,
`np.reshape`, `np.transpose`, `np.random.permutation`, `np.real` and `np.imag`.
`np.searchsorted` now supports the `side` kwarg. Further, `np.argsort` now
supports the `kind` kwarg with `quicksort` and `mergesort` available.
* The Numba extension API has gained the ability to operate more easily with
functions from Cython modules through the use of
`numba.extending.get_cython_function_address` to obtain function addresses
for direct use in `ctypes.CFUNCTYPE`.
* Numba now allows the passing of jitted functions (and containers of jitted
functions) as arguments to other jitted functions.
* The CUDA functionality has gained support for a larger selection of bit
manipulation intrinsics, also SELP, and has had a number of bugs fixed.
* Initial work to support the PPC64LE platform has been added; full support is
however waiting on the LLVM 6.0.1 release as it contains critical patches
not present in 6.0.0.
It is hoped that any remaining issues will be fixed in the next release.
* The capacity for advanced users/compiler engineers to define their own
compilation pipelines.

Enhancements:

* PR 2660: Support bools from cffi in nopython.
* PR 2741: Enhance error message for undefined variables.
* PR 2744: Add diagnostic error message to test suite discovery failure.
* PR 2748: Added Intel SVML optimizations as opt-out choice working by default
* PR 2762: Support transpose with axes arguments.
* PR 2777: Add support for np.correlate and np.convolve
* PR 2779: Implement np.random.permutation
* PR 2801: Passing jitted functions as args
* PR 2802: Support np.real() and np.imag()
* PR 2807: Expose `import_cython_function`
* PR 2821: Add kwarg 'side' to np.searchsorted
* PR 2822: Adds stable argsort
* PR 2832: Fixups for llvmlite 0.23/llvm 6
* PR 2836: Support `index` method on tuples
* PR 2839: Support for np.transpose and np.reshape.
* PR 2843: Custom pipeline
* PR 2847: Replace signed array access indices in unsigned prange loop body
* PR 2859: Add support for improved error reporting.
* PR 2880: This adds a github issue template.
* PR 2881: Build recipe to clone Intel ICC runtime.
* PR 2882: Update TravisCI to test SVML
* PR 2893: Add reference to the data buffer in array.ctypes object
* PR 2895: Move to CUDA 8.0

Fixes:

* PR 2737: Fix 2007 (part 1). Empty array handling in np.linalg.
* PR 2738: Fix install_requires to allow pip getting pre-release version
* PR 2740: Fix 2208. Generate better error message.
* PR 2765: Fix Bit-ness
* PR 2780: PowerPC reference counting memory fences
* PR 2805: Fix six imports.
* PR 2813: Fix 2812: gufunc scalar output bug.
* PR 2814: Fix the build post 2727
* PR 2831: Attempt to fix 2473
* PR 2842: Fix issue with test discovery and broken CUDA drivers.
* PR 2850: Add rtsys init guard and test.
* PR 2852: Skip vectorization test with targets that are not x86
* PR 2856: Prevent printing to stdout in `test_extending.py`
* PR 2864: Correct C code to prevent compiler warnings.
* PR 2889: Attempt to fix 2386.
* PR 2891: Removed test skipping for inspect_cfg
* PR 2898: Add guard to parallel test on unsupported platforms
* PR 2907: Update change log for PPC64LE LLVM dependency.
* PR 2911: Move build requirement to llvmlite>=0.23.0dev0
* PR 2912: Fix random permutation test.
* PR 2914: Fix MD list syntax in issue template.

Documentation Updates:

* PR 2739: Explicitly state default value of error_model in docstring
* PR 2803: DOC: parallel vectorize requires signatures
* PR 2829: Add Python 2.7 EOL plan to docs
* PR 2838: Use automatic numbering syntax in list.
* PR 2877: Add performance tips documentation.
* PR 2883: Fix 2872: update rng doc about thread/fork-safety
* PR 2908: Add missing link and ref to docs.
* PR 2909: Tiny typo correction

ParallelAccelerator enhancements/fixes:

* PR 2727: Changes to enable vectorization in ParallelAccelerator.
* PR 2816: Array analysis for transpose with arbitrary arguments
* PR 2874: Fix dead code eliminator not to remove a call with side-effect
* PR 2886: Fix ParallelAccelerator arrayexpr repr

CUDA enhancements:

* PR 2734: More Constants From cuda.h
* PR 2767: Add len(..) Support to DeviceNDArray
* PR 2778: Add More Device Array API Functions to CUDA Simulator
* PR 2824: Add CUDA Primitives for Population Count
* PR 2835: Emit selp Instructions to Avoid Branching
* PR 2867: Full support for CUDA device attributes

CUDA fixes:

* PR 2768: Don't Compile Code on Every Assignment
* PR 2878: Fixes a Win64 issue with the test in Pr/2865

Contributors:

The following people contributed to this release.

* Abutalib Aghayev
* Alex Olivas
* Anton Malakhov
* Dong-hee Na
* Ehsan Totoni (core dev)
* John Zwinck
* Josh Wilson
* Kelsey Jordahl
* Nick White
* Olexa Bilaniuk
* Rik-de-Kort
* Siu Kwan Lam (core dev)
* Stan Seibert (core dev)
* Stuart Archibald (core dev)
* Thomas Arildsen
* Todd A. Anderson (core dev)

0.37.0

--------------

This release focuses on bug fixing and stability but also adds a few new
features including support for NumPy 1.14. The key change for Numba core was the
long awaited addition of the final tranche of thread safety improvements that
allow Numba to be run concurrently on multiple threads without hitting known
thread safety issues inside LLVM itself. Further, a number of fixes and
enhancements went into the CUDA implementation and ParallelAccelerator gained
some new features and underwent some internal refactoring.

Misc enhancements:

* PR 2627: Remove hacks to make llvmlite threadsafe
* PR 2672: Add ascontiguousarray
* PR 2678: Add Gitter badge
* PR 2691: Fix 2690: add intrinsic to convert array to tuple
* PR 2703: Test runner feature: failed-first and last-failed
* PR 2708: Patch for issue 1907
* PR 2732: Add support for array.fill

Misc Fixes:

* PR 2610: Fix 2606 lowering of optional.setattr
* PR 2650: Remove skip for win32 cosine test
* PR 2668: Fix empty_like from readonly arrays.
* PR 2682: Fixes 2210, remove _DisableJitWrapper
* PR 2684: Fix 2340, generator error yielding bool
* PR 2693: Add travis-ci testing of NumPy 1.14, and also check on Python 2.7
* PR 2694: Avoid type inference failure due to a typing template rejection
* PR 2695: Update llvmlite version dependency.
* PR 2696: Fix tuple indexing codegeneration for empty tuple
* PR 2698: Fix 2697 by deferring deletion in the simplify_CFG loop.
* PR 2701: Small fix to avoid tempfiles being created in the current directory
* PR 2725: Fix 2481, LLVM IR parsing error due to mutated IR
* PR 2726: Fix 2673: incorrect fork error msg.
* PR 2728: Alternative to 2620.  Remove dead code ByteCodeInst.get.
* PR 2730: Add guard for test needing SciPy/BLAS

Documentation updates:

* PR 2670: Update communication channels
* PR 2671: Add docs about diagnosing loop vectorizer
* PR 2683: Add docs on const arg requirements and on const mem alloc
* PR 2722: Add docs on numpy support in cuda
* PR 2724: Update doc: warning about unsupported arguments

ParallelAccelerator enhancements/fixes:

Parallel support for `np.arange`, `np.linspace`, `np.mean`, `np.std` and
`np.var` is added. This was performed as part of a general refactor and
cleanup of the core ParallelAccelerator code.

* PR 2674: Core pa
* PR 2704: Generate Dels after parfor sequential lowering
* PR 2716: Handle matching directly supported functions

CUDA enhancements:

* PR 2665: CUDA DeviceNDArray: Support numpy transpose API
* PR 2681: Allow Assigning to DeviceNDArrays
* PR 2702: Make DummyArray do High Dimensional Reshapes
* PR 2714: Use CFFI to Reuse Code

CUDA fixes:

* PR 2667: Fix CUDA DeviceNDArray slicing
* PR 2686: Fix 2663: incorrect offset when indexing cuda array.
* PR 2687: Ensure Constructed Stream Bound
* PR 2706: Workaround for unexpected warp divergence due to exception raising
code
* PR 2707: Fix regression: cuda test submodules not loading properly in
runtests
* PR 2731: Use more challenging values in slice tests.
* PR 2720: A quick testsuite fix to not run the new cuda testcase in the
multiprocess pool

Contributors:

The following people contributed to this release.

* Coutinho Menezes Nilo
* Daniel
* Ehsan Totoni
* Nick White
* Paul H. Liu
* Siu Kwan Lam
* Stan Seibert
* Stuart Archibald
* Todd A. Anderson

0.36.2

--------------

This is a bugfix release that provides minor changes to address:

* PR 2645: Avoid CPython bug with ``exec`` in older 2.7.x.
* PR 2652: Add support for CUDA 9.

0.36.1

--------------

This release continues to add new features to the work undertaken in partnership
with Intel on ParallelAccelerator technology. Other changes of note include the
compilation chain being updated to use LLVM 5.0 and the production of conda
packages using conda-build 3 and the new compilers that ship with it.

NOTE: A version 0.36.0 was tagged for internal use but not released.

ParallelAccelerator:

NOTE: The ParallelAccelerator technology is under active development and should
be considered experimental.

New features relating to ParallelAccelerator, from work undertaken with Intel,
include the addition of the `stencil` decorator for ease of implementation of
stencil-like computations, support for general reductions, and slice and
range fusion for parallel slice/bit-array assignments. Documentation on both the
use and implementation of the above has been added. Further, a new debug
environment variable `NUMBA_DEBUG_ARRAY_OPT_STATS` is made available to give
information about which operators/calls are converted to parallel for-loops.

ParallelAccelerator features:

* PR 2457: Stencil Computations in ParallelAccelerator
* PR 2548: Slice and range fusion, parallelizing bitarray and slice assignment
* PR 2516: Support general reductions in ParallelAccelerator

ParallelAccelerator fixes:

* PR 2540: Fix bug 2537
* PR 2566: Fix issue 2564.
* PR 2599: Fix nested multi-dimensional parfor type inference issue
* PR 2604: Fixes for stencil tests and cmath sin().
* PR 2605: Fixes issue 2603.

Additional features of note:

This release of Numba (and llvmlite) is updated to use LLVM version 5.0 as the
compiler back end. The main change to Numba to support this was the addition of
a custom symbol tracker to avoid the calls to LLVM's `ExecutionEngine` that were
crashing when asking for non-existent symbol addresses. Further, the conda
packages for this release of Numba are built using conda build version 3 and the
new compilers/recipe grammar that are present in that release.

* PR 2568: Update for LLVM 5
* PR 2607: Fixes abort when getting address to "nrt_unresolved_abort"
* PR 2615: Working towards conda build 3

Thanks to community feedback and bug reports, the following fixes were also
made.

Misc fixes/enhancements:

* PR 2534: Add tuple support to np.take.
* PR 2551: Rebranding fix
* PR 2552: relative doc links
* PR 2570: Fix issue 2561, handle missing successor on loop exit
* PR 2588: Fix 2555. Disable libpython.so linking on linux
* PR 2601: Update llvmlite version dependency.
* PR 2608: Fix potential cache file collision
* PR 2612: Fix NRT test failure due to increased overhead when running in coverage
* PR 2619: Fix dubious pthread_cond_signal not in lock
* PR 2622: Fix `np.nanmedian` for all NaN case.
* PR 2633: Fix markdown in CONTRIBUTING.md
* PR 2635: Make the dependency on compilers for AOT optional.

CUDA support fixes:

* PR 2523: Fix invalid cuda context in memory transfer calls in another thread
* PR 2575: Use CPU to initialize xoroshiro states for GPU RNG. Fixes 2573
* PR 2581: Fix cuda gufunc mishandling of scalar arg as array and out argument

0.35.0

--------------

This release includes some exciting new features as part of the work
performed in partnership with Intel on ParallelAccelerator technology.
There are also some additions made to Numpy support and small but
significant fixes made as a result of considerable effort spent chasing bugs
and implementing stability improvements.


ParallelAccelerator:

NOTE: The ParallelAccelerator technology is under active development and should
be considered experimental.

New features relating to ParallelAccelerator, from work undertaken with Intel,
include support for a larger range of `np.random` functions in `parallel`
mode, printing NumPy arrays in nopython mode, the capacity to initialize NumPy
arrays directly from list comprehensions, and the axis argument to `.sum()`.
Documentation on the ParallelAccelerator technology implementation has also
been added. Further, a large amount of work on equivalence relations was
undertaken to enable runtime checks of broadcasting behaviours in parallel mode.

ParallelAccelerator features:

* PR 2400: Array comprehension
* PR 2405: Support printing Numpy arrays
* PR 2438: Support more np.random functions in ParallelAccelerator
* PR 2482: Support for sum with axis in nopython mode.
* PR 2487: Adding developer documentation for ParallelAccelerator technology.
* PR 2492: Core PA refactor adds assertions for broadcast semantics

ParallelAccelerator fixes:

* PR 2478: Rename cfg before parfor translation (2477)
* PR 2479: Fix broken array comprehension tests on unsupported platforms
* PR 2484: Fix array comprehension test on win64
* PR 2506: Fix for 32-bit machines.


Additional features of note:

Support for `np.take`, `np.finfo`, `np.iinfo` and `np.MachAr` in nopython
mode is added. Further, three new environment variables are added, two for
overriding CPU target/features and another to warn if `parallel=True` was set
but no such transform was possible.

* PR 2490: Implement np.take and ndarray.take
* PR 2493: Display a warning if parallel=True is set but not possible.
* PR 2513: Add np.MachAr, np.finfo, np.iinfo
* PR 2515: Allow environ overriding of cpu target and cpu features.


Due to expansion of the test farm and a focus on fixing bugs, the following
fixes were also made.

Misc fixes/enhancements:

* PR 2455: add contextual information to runtime errors
* PR 2470: Fixes 2458, poor performance in np.median
* PR 2471: Ensure LLVM threadsafety in {g,}ufunc building.
* PR 2494: Update doc theme
* PR 2503: Remove hacky code added in 2482 and feature enhancement
* PR 2505: Serialise env mutation tests during multithreaded testing.
* PR 2520: Fix failing cpu-target override tests

CUDA support fixes:

* PR 2504: Enable CUDA toolkit version testing
* PR 2509: Disable tests generating code unavailable in lower CC versions.
* PR 2511: Fix Windows 64 bit CUDA tests.

0.34.0

--------------

This release adds a significant set of new features arising from combined work
with Intel on ParallelAccelerator technology. It also adds list comprehension
and closure support, support for NumPy 1.13 and a new, faster, CUDA reduction
algorithm. For Linux users this release is the first to be built on CentOS 6,
which will be the new base platform for future releases. Finally, a number of
thread-safety, type inference and other smaller enhancements have been made
and bugs fixed.


ParallelAccelerator features:

NOTE: The ParallelAccelerator technology is under active development and should
be considered experimental.

The ParallelAccelerator technology is accessed via a new "nopython" mode option
"parallel". The ParallelAccelerator technology attempts to identify operations
which have parallel semantics (for instance adding a scalar to a vector), fuse
together adjacent such operations, and then parallelize their execution across
a number of CPU cores. This is essentially auto-parallelization.

In addition to the auto-parallelization feature, explicit loop based
parallelism is made available through the use of `prange` in place of `range`
as a loop iterator.

More information and examples on both auto-parallelization and `prange` are
available in the documentation and examples directory respectively.

As part of the necessary work for ParallelAccelerator, support for closures
and list comprehensions is added:

* PR 2318: Transfer ParallelAccelerator technology to Numba
* PR 2379: ParallelAccelerator Core Improvements
* PR 2367: Add support for len(range(...))
* PR 2369: List comprehension
* PR 2391: Explicit Parallel Loop Support (prange)

The ParallelAccelerator features are available on all supported platforms and
Python versions, with the following exceptions (support is planned for a future
release):

* The combination of Windows operating systems with Python 2.7.
* Systems running 32 bit Python.


CUDA support enhancements:

* PR 2377: New GPU reduction algorithm


CUDA support fixes:

* PR 2397: Fix 2393, always set alignment of cuda static memory regions


Misc Fixes:

* PR 2373, Issue 2372: 32-bit compatibility fix for parfor related code
* PR 2376: Fix 2375 missing stdint.h for py2.7 vc9
* PR 2378: Fix deadlock in parallel gufunc when kernel acquires the GIL.
* PR 2382: Forbid unsafe casting in bitwise operation
* PR 2385: docs: fix Sphinx errors
* PR 2396: Use 64-bit RHS operand for shift
* PR 2404: Fix threadsafety logic issue in ufunc compilation cache.
* PR 2424: Ensure consistent iteration order of blocks for type inference.
* PR 2425: Guard code to prevent the use of 'parallel' on win32 + py27
* PR 2426: Basic test for Enum member type recovery.
* PR 2433: Fix up the parfors tests with respect to windows py2.7
* PR 2442: Skip tests that need BLAS/LAPACK if scipy is not available.
* PR 2444: Add test for invalid array setitem
* PR 2449: Make the runtime initialiser threadsafe
* PR 2452: Skip CFG test on 64bit windows


Misc Enhancements:

* PR 2366: Improvements to IR utils
* PR 2388: Update README.rst to indicate the proper version of LLVM
* PR 2394: Upgrade to llvmlite 0.19.*
* PR 2395: Update llvmlite version to 0.19
* PR 2406: Expose environment object to ufuncs
* PR 2407: Expose environment object to target-context inside lowerer
* PR 2413: Add flags to pass through to conda build for buildbot
* PR 2414: Add cross compile flags to local recipe
* PR 2415: A few cleanups for rewrites
* PR 2418: Add getitem support for Enum classes
* PR 2419: Add support for returning enums in vectorize
* PR 2421: Add copyright notice for Intel contributed files.
* PR 2422: Patch code base to work with np 1.13 release
* PR 2448: Adds in warning message when using 'parallel' if cache=True
* PR 2450: Add test for keyword arg on .sum-like and .cumsum-like array
methods

0.33.0

--------------

This release resolved several performance issues caused by atomic
reference counting operations inside loop bodies.  New optimization
passes have been added to reduce the impact of these operations.  We
observe speed improvements between 2x-10x in affected programs due to
the removal of unnecessary reference counting operations.

There are also several enhancements to the CUDA GPU support:

* A GPU random number generator based on `xoroshiro128+ algorithm <http://xoroshiro.di.unimi.it/>`_ is added.
See details and examples in :ref:`documentation <cuda-random>`.
* ``cuda.jit`` CUDA kernels can now call ``jit`` and ``njit``
CPU functions and they will automatically be compiled as CUDA device
functions.
* CUDA IPC memory API is exposed for sharing memory between processes.
See usage details in :ref:`documentation <cuda-ipc-memory>`.

Reference counting enhancements:

* PR 2346, Issue 2345, 2248: Add extra refcount pruning after inlining
* PR 2349: Fix refct pruning not removing refct op with tail call.
* PR 2352, Issue 2350: Add refcount pruning pass for function that does not need refcount

CUDA support enhancements:

* PR 2023: Supports CUDA IPC for device array
* PR 2343, Issue 2335: Allow CPU jit decorated function to be used as cuda device function
* PR 2347: Add random number generator support for CUDA device code
* PR 2361: Update autotune table for CC: 5.3, 6.0, 6.1, 6.2

Misc fixes:

* PR 2362: Avoid test failure due to typing to int32 on 32-bit platforms
* PR 2359: Fixed nogil example that threw a TypeError when executed.
* PR 2357, Issue 2356: Fix fragile test that depends on how the script is executed.
* PR 2355: Fix cpu dispatcher referenced as attribute of another module
* PR 2354: Fixes an issue with caching when function needs NRT and refcount pruning
* PR 2342, Issue 2339: Add warnings to inspection when it is used on unserialized cached code
* PR 2329, Issue 2250: Better handling of missing op codes

Misc enhancements:

* PR 2360: Adds missing values in error message interp.
* PR 2353: Handle when get_host_cpu_features() raises RuntimeError
* PR 2351: Enable SVML for erf/erfc/gamma/lgamma/log2
* PR 2344: Expose error_model setting in jit decorator
* PR 2337: Align blocking terminate support for fork() with new TBB version
* PR 2336: Bump llvmlite version to 0.18
* PR 2330: Core changes in PR 2318

0.33.0dev

----------

In development

0.32.1

---------------------

This is a small patch release that addresses some packaging issues:

Pull requests:

* PR 580: Trove classifiers may be out of date.
* PR 581: Add FAQ entry on LLVM version support.
* PR 582: Adds override for LLVM version check, re-formats docs.

Authors:

* Stuart Archibald (core dev)
* Valentin Haenel (core dev)

0.32.0

--------------

In this release, we are upgrading to LLVM 4.0.  A lot of work has been done
to fix many race-condition issues inside LLVM when the compiler is
used concurrently, which is likely when Numba is used with Dask.

Improvements:

* PR 2322: Suppress test error due to unknown but consistent error with tgamma
* PR 2320: Update llvmlite dependency to 0.17
* PR 2308: Add details to error message on why cuda support is disabled.
* PR 2302: Add os x to travis
* PR 2294: Disable remove_module on MCJIT due to memory leak inside LLVM
* PR 2291: Split parallel tests and recycle workers to tame memory usage
* PR 2253: Remove the pointer-stuffing hack for storing meminfos in lists

Fixes:

* PR 2331: Fix a bug in the GPU array indexing
* PR 2326: Fix 2321 docs referring to non-existing function.
* PR 2316: Fixing more race-condition problems
* PR 2315: Fix 2314.  Relax strict type check to allow optional type.
* PR 2310: Fix race condition due to concurrent compilation and cache loading
* PR 2304: Fix intrinsic 1st arg not a typing.Context as stated by the docs.
* PR 2287: Fix int64 atomic min-max
* PR 2286: Fix 2285 `overload_method` not linking dependent libs
* PR 2303: Missing import statements to interval-example.rst

0.31.0

--------------

In this release, we added preliminary support for debugging with GDB
version >= 7.0. The feature is enabled by setting the ``debug=True`` compiler
option, which causes GDB compatible debug info to be generated.
The CUDA backend also gained limited debugging support so that source locations
are shown in memory-checking and profiling tools.
For details, see :ref:`numba-troubleshooting`.

Also, we added the ``fastmath=True`` compiler option to enable unsafe
floating-point transformations, which allows LLVM to auto-vectorize more code.
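
As a minimal sketch of the ``fastmath=True`` option described above, the snippet below assumes Numba is installed and exposes ``njit``. The function name ``sum_sqrt`` is hypothetical, and the fallback decorator is only an assumption so that the snippet still runs as plain Python when Numba is absent. Because fast-math relaxes strict IEEE 754 semantics, the result may differ in the last bits from a build without the option.

```python
import math

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is not installed.
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit(fastmath=True)
def sum_sqrt(n):
    # A simple floating-point reduction; fast-math lets LLVM reassociate
    # the additions, which is what makes auto-vectorization possible.
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


result = sum_sqrt(5)
```

Comparing the result against a reference with a small tolerance, rather than exact equality, is the safer choice when this option is enabled.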

Other important changes include upgrading to LLVM 3.9.1 and adding support for
Numpy 1.12.

Improvements:

* PR 2281: Update for numpy1.12
* PR 2278: Add CUDA atomic.{max, min, compare_and_swap}
* PR 2277: Add about section to conda recipes to identify license and other
metadata in Anaconda Cloud
* PR 2271: Adopt itanium C++-style mangling for CPU and CUDA targets
* PR 2267: Add fastmath flags
* PR 2261: Support dtype.type
* PR 2249: Changes for llvm3.9
* PR 2234: Bump llvmlite requirement to 0.16 and add install_name_tool_fixer to
mviewbuf for OS X
* PR 2230: Add python3.6 to TravisCi
* PR 2227: Enable caching for gufunc wrapper
* PR 2170: Add debugging support
* PR 2037: inspect_cfg() for easier visualization of the function operation

Fixes:

* PR 2274: Fix nvvm ir patch in mishandling "load"
* PR 2272: Fix breakage to cuda7.5
* PR 2269: Fix caching of copy_strides kernel in cuda.reduce
* PR 2265: Fix 2263: error when linking two modules with dynamic globals
* PR 2252: Fix path separator in test
* PR 2246: Fix overuse of memory in some system with fork
* PR 2241: Fix 2240: __module__ in dynamically created function not a str
* PR 2239: Fix fingerprint computation failure preventing fallback

0.30.1

--------------

This is a bug-fix release to enable Python 3.6 support.  In addition,
there is now early Intel TBB support for parallel ufuncs when building from
source with TBBROOT defined.  The TBB feature is not enabled in our official
builds.

Fixes:

* PR 2232: Fix name clashes with _Py_hashtable_xxx in Python 3.6.

Improvements:

* PR 2217: Add Intel TBB threadpool implementation for parallel ufunc.

0.30.0

--------------

This release adds preliminary support for Python 3.6, but no official build is
available yet.  A new system reporting tool (``numba --sysinfo``) is added to
provide system information to help core developers in replication and debugging.
See below for other improvements and bug fixes.

Improvements:

* PR 2209: Support Python 3.6.
* PR 2175: Support ``np.trace()``, ``np.outer()`` and ``np.kron()``.
* PR 2197: Support ``np.nanprod()``.
* PR 2190: Support caching for ufunc.
* PR 2186: Add system reporting tool.

Fixes:

* PR 2214, Issue 2212: Fix memory error with ndenumerate and flat iterators.
* PR 2206, Issue 2163: Fix ``zip()`` consuming extra elements in early
exhaustion.
* PR 2185, Issue 2159, 2169: Fix rewrite pass affecting objmode fallback.
* PR 2204, Issue 2178: Fix annotation for liftedloop.
* PR 2203: Fix Appveyor segfault with Python 3.5.
* PR 2202, Issue 2198: Fix target context not initialized when loading from
ufunc cache.
* PR 2172, Issue 2171: Fix optional type unpacking.
* PR 2189, Issue 2188: Disable freezing of big (>1MB) global arrays.
* PR 2180, Issue 2179: Fix invalid variable version in looplifting.
* PR 2156, Issue 2155: Fix divmod, floordiv segfault on CUDA.

0.29.0

--------------

This release extends the support of recursive functions to include direct and
indirect recursion without explicit function type annotations.  See new example
in `examples/mergesort.py`.  Newly supported numpy features include array
stacking functions, np.linalg.eig* functions, np.linalg.matrix_power, np.roots
and array to array broadcasting in assignments.
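
The type-inferred recursion mentioned above can be sketched as follows. This is only an illustration, assuming Numba is installed; the function ``fib`` is a hypothetical example (not the ``mergesort.py`` example from the repository), and the fallback decorator is an assumption that lets the snippet run as plain Python without Numba. Note that the non-recursive return path comes first, which gives type inference a starting point.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is not installed.
    def njit(func):
        return func


@njit
def fib(n):
    # No explicit signature is given: the return type is inferred from
    # the base case and then reused for the recursive calls.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


value = fib(10)
```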

This release depends on llvmlite 0.14.0 and supports CUDA 8 but it is not
required.

Improvements:

* PR 2130, 2137: Add type-inferred recursion with docs and examples.
* PR 2134: Add ``np.linalg.matrix_power``.
* PR 2125: Add ``np.roots``.
* PR 2129: Add ``np.linalg.{eigvals,eigh,eigvalsh}``.
* PR 2126: Add array-to-array broadcasting.
* PR 2069: Add hstack and related functions.
* PR 2128: Allow for vectorizing a jitted function. (thanks to dhirschfeld)
* PR 2117: Update examples and make them test-able.
* PR 2127: Refactor interpreter class and its results.

Fixes:

* PR 2149: Workaround MSVC9.0 SP1 fmod bug kb982107.
* PR 2145, Issue 2009: Fixes kwargs for jitclass ``__init__`` method.
* PR 2150: Fix slowdown in objmode fallback.
* PR 2050, Issue 1259: Fix liveness problem with some generator loops.
* PR 2072, Issue 1995: Right shift of unsigned LHS should be logical.
* PR 2115, Issue 1466: Fix inspect_types() error due to mangled variable name.
* PR 2119, Issue 2118: Fix array type created from record-dtype.
* PR 2122, Issue 1808: Fix returning a generator due to datamodel error.

0.28.1

--------------

This is a bug-fix release to resolve packaging issues with setuptools
dependency.

0.28.0

--------------

Amongst other improvements, this version again improves the level of
support for linear algebra functions from the :mod:`numpy.linalg`
module.  Also, our random generator is now guaranteed to be thread-safe
and fork-safe.

Improvements:

* PR 2019: Add the ``intrinsic`` decorator to define low-level
subroutines callable from JIT functions (this is considered
a private API for now).
* PR 2059: Implement ``np.concatenate`` and ``np.stack``.
* PR 2048: Make random generation fork-safe and thread-safe, producing
independent streams of random numbers for each thread or process.
* PR 2031: Add documentation of floating-point pitfalls.
* Issue 2053: Avoid polling in parallel CPU target (fixes severe performance
regression on Windows).
* Issue 2029: Make default arguments fast.
* PR 2052: Add logging to the CUDA driver.
* PR 2049: Implement the built-in ``divmod()`` function.
* PR 2036: Implement the ``argsort()`` method on arrays.
* PR 2046: Improving CUDA memory management by deferring deallocations
until certain thresholds are reached, so as to avoid breaking asynchronous
execution.
* PR 2040: Switch the CUDA driver implementation to use CUDA's
"primary context" API.
* PR 2017: Allow ``min(tuple)`` and ``max(tuple)``.
* PR 2039: Reduce fork() detection overhead in CUDA.
* PR 2021: Handle structured dtypes with titles.
* PR 1996: Rewrite looplifting as a transformation on Numba IR.
* PR 2014: Implement ``np.linalg.matrix_rank``.
* PR 2012: Implement ``np.linalg.cond``.
* PR 1985: Rewrite even trivial array expressions, which opens the door
for other optimizations (for example, ``array ** 2`` can be converted
into ``array * array``).
* PR 1950: Have ``typeof()`` always raise ValueError on failure.
Previously, it would either raise or return None, depending on the input.
* PR 1994: Implement ``np.linalg.norm``.
* PR 1987: Implement ``np.linalg.det`` and ``np.linalg.slogdet``.
* Issue 1979: Document integer width inference and how to work around it.
* PR 1938: Numba is now compatible with LLVM 3.8.
* PR 1967: Restrict ``np.linalg`` functions to homogeneous dtypes.  Users
wanting to pass mixed-typed inputs have to convert explicitly, which
makes the performance implications more obvious.
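
Two of the built-in additions listed above, ``divmod()`` and ``min(tuple)`` / ``max(tuple)``, can be combined in a short sketch. It assumes Numba is installed; the function name ``split_and_bounds`` is hypothetical, and the fallback decorator is an assumption so the snippet also runs as plain Python.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is not installed.
    def njit(func):
        return func


@njit
def split_and_bounds(a, b):
    # divmod() returns the quotient and remainder in one call.
    q, r = divmod(a, b)
    # min() and max() accept a homogeneous tuple directly.
    t = (a, b, q, r)
    return q, r, min(t), max(t)


result = split_and_bounds(17, 5)
```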

Fixes:

* PR 2006: ``array(float32) ** int`` should return ``array(float32)``.
* PR 2044: Allow reshaping empty arrays.
* Issue 2051: Fix refcounting issue when concatenating tuples.
* Issue 2000: Make Numpy optional for setup.py, to allow ``pip install``
to work without Numpy pre-installed.
* PR 1989: Fix assertion in ``Dispatcher.disable_compile()``.
* Issue 2028: Ignore filesystem errors when caching from multiple processes.
* Issue 2003: Allow unicode variable and function names (on Python 3).
* Issue 1998: Fix deadlock in parallel ufuncs that reacquire the GIL.
* PR 1997: Fix random crashes when AOT compiling on certain Windows platforms.
* Issue 1988: Propagate jitclass docstring.
* Issue 1933: Ensure array constants are emitted with the right alignment.

0.27.1

---------------------

Bugfix release for invalid wheel hash for OSX packages.
No change to source code.

0.27.0

--------------

Improvements:

* Issue 1976: improve error message when non-integral dimensions are given
to a CUDA kernel.
* PR 1970: Optimize the power operator with a static exponent.
* PR 1710: Improve contextual information for compiler errors.
* PR 1961: Support printing constant strings.
* PR 1959: Support more types in the print() function.
* PR 1823: Support ``compute_50`` in CUDA backend.
* PR 1955: Support ``np.linalg.pinv``.
* PR 1896: Improve the ``SmartArray`` API.
* PR 1947: Support ``np.linalg.solve``.
* Issue 1943: Improve error message when an argument fails typing.
* PR 1927: Support ``np.linalg.lstsq``.
* PR 1934: Use system functions for hypot() where possible, instead of our
own implementation.
* PR 1929: Add cffi support to ``cfunc`` objects.
* PR 1932: Add user-controllable thread pool limits for parallel CPU target.
* PR 1928: Support self-recursion when the signature is explicit.
* PR 1890: List all lowering implementations in the developer docs.
* Issue 1884: Support ``np.lib.stride_tricks.as_strided()``.

Fixes:

* Issue 1960: Fix sliced assignment when source and destination areas are
overlapping.
* PR 1963: Make CUDA print() atomic.
* PR 1956: Allow 0d array constants.
* Issue 1945: Allow using Numpy ufuncs in AOT compiled code.
* Issue 1916: Fix documentation example for ``generated_jit``.
* Issue 1926: Fix regression when caching functions in an IPython session.
* Issue 1923: Allow non-intp integer arguments to carray() and farray().
* Issue 1908: Accept non-ASCII unicode docstrings on Python 2.
* Issue 1874: Allow ``del container[key]`` in object mode.
* Issue 1913: Fix set insertion bug when the lookup chain contains deleted
entries.
* Issue 1911: Allow function annotations on jitclass methods.

0.26.0

--------------

This release adds support for the ``cfunc`` decorator for exporting Numba-jitted
functions to third-party APIs that take C callbacks.  Most of the overhead of
using jitclasses inside the interpreter is eliminated.  Support for
decompositions in ``numpy.linalg`` is added.  Finally, Numpy 1.11 is
supported.

Improvements:

* PR 1889: Export BLAS and LAPACK wrappers for pycc.
* PR 1888: Faster array power.
* Issue 1867: Allow "out" keyword arg for dufuncs.
* PR 1871: ``carray()`` and ``farray()`` for creating arrays from pointers.
* PR 1855: ``cfunc`` decorator for exporting as ctypes function.
* PR 1862: Add support for ``numpy.linalg.qr``.
* PR 1851: jitclass support for '_' and '__' prefixed attributes.
* PR 1842: Optimize jitclass in Python interpreter.
* Issue 1837: Fix CUDA simulator issues with device function.
* PR 1839: Add support for decompositions from ``numpy.linalg``.
* PR 1829: Support Python enums.
* PR 1828: Add support for ``numpy.random.rand()`` and
``numpy.random.randn()``
* Issue 1825: Use of 0-darray in place of scalar index.
* Issue 1824: Scalar arguments to object mode gufuncs.
* Issue 1813: Let bitwise bool operators return booleans, not integers.
* Issue 1760: Optional arguments in generators.
* PR 1780: Numpy 1.11 support.

0.25.0

--------------

This release adds support for ``set`` objects in nopython mode.  It also
adds support for many missing Numpy features and functions.  It improves
Numba's compatibility and performance when using a distributed execution
framework such as dask, distributed or Spark.  Finally, it removes
compatibility with Python 2.6, Python 3.3 and Numpy 1.6.
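
As a small illustration of ``set`` support in nopython mode, the sketch below builds a homogeneous set of integers inside a compiled function. It assumes Numba is installed; the function name ``count_unique_residues`` is hypothetical, and the fallback decorator is an assumption so the snippet runs as plain Python too.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is not installed.
    def njit(func):
        return func


@njit
def count_unique_residues(n, modulus):
    # The set only ever holds integers, so its element type is homogeneous.
    seen = set()
    for i in range(n):
        seen.add((i * i) % modulus)
    return len(seen)


count = count_unique_residues(10, 7)
```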

Improvements:

* Issue 1800: Add erf(), erfc(), gamma() and lgamma() to CUDA targets.
* PR 1793: Implement more Numpy functions: np.bincount(), np.diff(),
np.digitize(), np.histogram(), np.searchsorted() as well as NaN-aware
reduction functions (np.nansum(), np.nanmedian(), etc.)
* PR 1789: Optimize some reduction functions such as np.sum(), np.prod(),
np.median(), etc.
* PR 1752: Make CUDA features work in dask, distributed and Spark.
* PR 1787: Support np.nditer() for fast multi-array indexing with
broadcasting.
* PR 1799: Report JIT-compiled functions as regular Python functions
when profiling (making it possible to see the filename and line number
where a function is defined).
* PR 1782: Support np.any() and np.all().
* Issue 1788: Support the iter() and next() built-in functions.
* PR 1778: Support array.astype().
* Issue 1775: Allow the user to set the target CPU model for AOT compilation.
* PR 1758: Support creating random arrays using the ``size`` parameter
to the np.random APIs.
* PR 1757: Support len() on array.flat objects.
* PR 1749: Remove Numpy 1.6 compatibility.
* PR 1748: Remove Python 2.6 and 3.3 compatibility.
* PR 1735: Support the ``not in`` operator as well as operator.contains().
* PR 1724: Support homogeneous sets in nopython mode.
* Issue 875: make compilation of array constants faster.

Fixes:

* PR 1795: Fix a massive performance issue when calling Numba functions
with distributed, Spark or a similar mechanism using serialization.
* Issue 1784: Make jitclasses usable with NUMBA_DISABLE_JIT=1.
* Issue 1786: Allow using linear algebra functions when profiling.
* Issue 1796: Fix np.dot() memory leak on non-contiguous inputs.
* PR 1792: Fix static negative indexing of tuples.
* Issue 1771: Use fallback cache directory when __pycache__ isn't writable,
such as when user code is installed in a system location.
* Issue 1223: Use Numpy error model in array expressions (e.g. division
by zero returns ``inf`` or ``nan`` instead of raising an error).
* Issue 1640: Fix np.random.binomial() for large n values.
* Issue 1643: Improve error reporting when passing an invalid spec to
``jitclass``.
* PR 1756: Fix slicing with a negative step and an omitted start.

0.24.0

--------------

This release introduces several major changes, including the ``generated_jit``
decorator for flexible specializations, as with Julia's "``@generated``" macro,
and the SmartArray array wrapper type that allows seamless transfer of array
data between the CPU and the GPU.

This will be the last version to support Python 2.6, Python 3.3 and Numpy 1.6.

Improvements:

* PR 1723: Improve compatibility of JIT functions with the Python profiler.
* PR 1509: Support array.ravel() and array.flatten().
* PR 1676: Add SmartArray type to support transparent data management in
multiple address spaces (host & GPU).
* PR 1689: Reduce startup overhead of importing Numba.
* PR 1705: Support registration of CFFI types as corresponding to known
Numba types.
* PR 1686: Document the extension API.
* PR 1698: Improve warnings raised during type inference.
* PR 1697: Support np.dot() and friends on non-contiguous arrays.
* PR 1692: cffi.from_buffer() improvements (allow more pointer types,
allow non-Numpy buffer objects).
* PR 1648: Add the ``generated_jit`` decorator.
* PR 1651: Implementation of np.linalg.inv using LAPACK.  Thanks to
Matthieu Dartiailh.
* PR 1674: Support np.diag().
* PR 1673: Improve error message when looking up an attribute on an
unknown global.
* Issue 1569: Implement runtime check for the LLVM locale bug.
* PR 1612: Switch to LLVM 3.7 in sync with llvmlite.
* PR 1624: Allow slice assignment of sequence to array.
* PR 1622: Support slicing tuples with a constant slice.

Fixes:

* Issue 1722: Fix returning an optional boolean (bool or None).
* Issue 1734: NRT decref bug when variable is del'ed before being defined,
leading to a possible memory leak.
* PR 1732: Fix tuple getitem regression for CUDA target.
* PR 1718: Mishandling of optional to optional casting.
* PR 1714: Fix .compile() on a JIT function not respecting ._can_compile.
* Issue 1667: Fix np.angle() on arrays.
* Issue 1690: Fix slicing with an omitted stop and a negative step value.
* PR 1693: Fix gufunc bug in handling scalar formal arg with non-scalar
input value.
* PR 1683: Fix parallel testing under Windows.
* Issue 1616: Use system-provided versions of C99 math where possible.
* Issue 1652: Reductions of bool arrays (e.g. sum() or mean()) should
return integers or floats, not bools.
* Issue 1664: Fix regression when indexing a record array with a constant
index.
* PR 1661: Disable AVX on old Linux kernels.
* Issue 1636: Allow raising an exception looked up on a module.

0.23.2

---------------------

This is a bug fix release to assist in addressing a critical Numba issue that
can affect users who download llvmlite packages from sources other than PyPI
(pip), Anaconda, or Intel Python: https://github.com/numba/numba/issues/3006

Support for SVML is now detected at compile time and baked into a function that
is exposed by llvmlite. This function can be queried at runtime to find out if
SVML is supported by the LLVM that llvmlite was compiled against, code
generation paths can then be adjusted accordingly.

The following PRs are closed in this release:

* PR 361: Add SVML detection and a function to declare support.

0.23.1

--------------

This is a bug-fix release to address several regressions introduced
in the 0.23.0 release, and a couple of other issues.

Fixes:

* Issue 1645: CUDA ufuncs were broken in 0.23.0.
* Issue 1638: Check tuple sizes when passing a list of tuples.
* Issue 1630: Parallel ufunc would keep eating CPU even after finishing
under Windows.
* Issue 1628: Fix ctypes and cffi tests under Windows with Python 3.5.
* Issue 1627: Fix xrange() support.
* PR 1611: Rewrite variable liveness analysis.
* Issue 1610: Allow nested calls between explicitly-typed ufuncs.
* Issue 1593: Fix `*args` in object mode.

0.23.0

--------------

This release introduces JIT classes using the new ``jitclass`` decorator,
allowing user-defined structures for nopython mode.  Other improvements
and bug fixes are listed below.

Improvements:

* PR 1609: Speed up some simple math functions by inlining them
in their caller
* PR 1571: Implement JIT classes
* PR 1584: Improve typing of array indexing
* PR 1583: Allow printing booleans
* PR 1542: Allow negative values in np.reshape()
* PR 1560: Support vector and matrix dot product, including ``np.dot()``
and the ``@`` operator in Python 3.5
* PR 1546: Support field lookup on record arrays and scalars (i.e.
``array['field']`` in addition to ``array.field``)
* PR 1440: Support the HSA wavebarrier() and activelanepermute_wavewidth()
intrinsics
* PR 1540: Support np.angle()
* PR 1543: Implement CPU multithreaded gufuncs (target="parallel")
* PR 1551: Allow scalar arguments in np.where(), np.empty_like().
* PR 1516: Add some more examples from NumbaPro
* PR 1517: Support np.sinc()

Fixes:

* Issue 1603: Fix calling a non-cached function from a cached function
* Issue 1594: Ensure a list is homogeneous when unboxing
* Issue 1595: Replace deprecated use of get_pointer_to_function()
* Issue 1586: Allow tests to be run by different users on the same machine
* Issue 1587: Make CudaAPIError picklable
* Issue 1568: Fix using Numba from inside Visual Studio 2015
* Issue 1559: Fix serializing a jit function referring a renamed module
* PR 1508: Let reshape() accept integer argument(s), not just a tuple
* Issue 1545: Improve error checking when unboxing list objects
* Issue 1538: Fix array broadcasting in CUDA gufuncs
* Issue 1526: Fix a reference count handling bug

0.22.1

--------------

This is a bug-fix release to resolve some packaging issues and other
problems found in the 0.22.0 release.

Fixes:

* PR 1515: Include MANIFEST.in in MANIFEST.in so that sdist still works from
source tar files.
* PR 1518: Fix reference counting bug caused by hidden alias
* PR 1519: Fix erroneous assert when passing nopython=True to guvectorize.
* PR 1521: Fix cuda.test()

0.22.0

--------------

This release features several highlights: Python 3.5 support, Numpy 1.10
support, Ahead-of-Time compilation of extension modules, additional
vectorization features that were previously only available with the
proprietary extension NumbaPro, and improvements in array indexing.

Improvements:

* PR 1497: Allow scalar input type instead of size-1 array to guvectorize
* PR 1480: Add distutils support for AOT compilation
* PR 1460: Create a new API for Ahead-of-Time (AOT) compilation
* PR 1451: Allow passing Python lists to JIT-compiled functions, and
reflect mutations on function return
* PR 1387: Numpy 1.10 support
* PR 1464: Support cffi.FFI.from_buffer()
* PR 1437: Propagate errors raised from Numba-compiled ufuncs; also,
let "division by zero" and other math errors produce a warning instead
of exiting the function early
* PR 1445: Support a subset of fancy indexing
* PR 1454: Support "out-of-line" CFFI modules
* PR 1442: Improve array indexing to support more kinds of basic slicing
* PR 1409: Support explicit CUDA memory fences
* PR 1435: Add support for vectorize() and guvectorize() with HSA
* PR 1432: Implement numpy.nonzero() and numpy.where()
* PR 1416: Add support for vectorize() and guvectorize() with CUDA,
as originally provided in NumbaPro
* PR 1424: Support in-place array operators
* PR 1414: Python 3.5 support
* PR 1404: Add the parallel ufunc functionality originally provided in
NumbaPro
* PR 1393: Implement sorting on arrays and lists
* PR 1415: Add functions to estimate the occupancy of a CUDA kernel
* PR 1360: The JIT cache now stores the compiled object code, yielding
even larger speedups.
* PR 1402: Fixes for the ARMv7 (armv7l) architecture under Linux
* PR 1400: Add the cuda.reduce() decorator originally provided in NumbaPro

Fixes:

* PR 1483: Allow np.empty_like() and friends on non-contiguous arrays
* Issue 1471: Allow caching JIT functions defined in IPython
* PR 1457: Fix flat indexing of boolean arrays
* PR 1421: Allow calling Numpy ufuncs, without an explicit output, on
non-contiguous arrays
* Issue 1411: Fix crash when unpacking a tuple containing a Numba-allocated array
* Issue 1394: Allow unifying range_state32 and range_state64
* Issue 1373: Fix code generation error on lists of bools

0.21.0

--------------

This release introduces support for AMD's Heterogeneous System Architecture,
which allows memory to be shared directly between the CPU and the GPU.
Other major enhancements are support for lists and the introduction of
an opt-in compilation cache.

Improvements:

* PR 1391: Implement print() for CUDA code
* PR 1366: Implement integer typing enhancement proposal (NBEP 1)
* PR 1380: Support the one-argument type() builtin
* PR 1375: Allow boolean evaluation of lists and tuples
* PR 1371: Support array.view() in CUDA mode
* PR 1369: Support named tuples in nopython mode
* PR 1250: Implement numpy.median().
* PR 1289: Make dispatching faster when calling a JIT-compiled function
from regular Python
* Issue 1226: Improve performance of integer power
* PR 1321: Document features supported with CUDA
* PR 1345: HSA support
* PR 1343: Support lists in nopython mode
* PR 1356: Make Numba-allocated memory visible to tracemalloc
* PR 1363: Add an environment variable NUMBA_DEBUG_TYPEINFER
* PR 1051: Add an opt-in, per-function compilation cache

Fixes:

* Issue 1372: Some array expressions would fail rewriting when they involved
the same variable more than once, or a unary operator
* Issue 1385: Allow CUDA local arrays to be declared anywhere in a function
* Issue 1285: Support datetime64 and timedelta64 in Numpy reduction functions
* Issue 1332: Handle the EXTENDED_ARG opcode.
* PR 1329: Handle the ``in`` operator in object mode
* Issue 1322: Fix augmented slice assignment on Python 2
* PR 1357: Fix slicing with some negative bounds or step values.

0.20.0

--------------

This release updates Numba to use LLVM 3.6 and CUDA 7 for CUDA support.
Following the platform deprecation in CUDA 7, Numba's CUDA feature is no
longer supported on 32-bit platforms.  The oldest supported version of
Windows is Windows 7.

Improvements:

* Issue 1203: Support indexing ndarray.flat
* PR 1200: Migrate cgutils to llvmlite
* PR 1190: Support more array methods: .transpose(), .T, .copy(), .reshape(), .view()
* PR 1214: Simplify setup.py and avoid manual maintenance
* PR 1217: Support datetime64 and timedelta64 constants
* PR 1236: Reload environment variables when compiling
* PR 1225: Various speed improvements in generated code
* PR 1252: Support cmath module in CUDA
* PR 1238: Use 32-byte aligned allocator to optimize for AVX
* PR 1258: Support numpy.frombuffer()
* PR 1274: Use TravisCI container infrastructure for lower wait time
* PR 1279: Micro-optimize overload resolution in call dispatch
* Issue 1248: Improve error message when return type unification fails

Fixes:

* Issue 1131: Handling of negative zeros in np.conjugate() and np.arccos()
* Issue 1188: Fix slow array return
* Issue 1164: Avoid warnings from CUDA context at shutdown
* Issue 1229: Respect the writeable flag in arrays
* Issue 1244: Fix bug in refcount pruning pass
* Issue 1251: Fix partial left-indexing of Fortran contiguous array
* Issue 1264: Fix compilation error in array expression
* Issue 1254: Fix error when yielding array objects
* Issue 1276: Fix nested generator use

0.19.2

--------------

This release fixes the source distribution on pypi.  The only change is in the
setup.py file.  We do not plan to provide a conda package as this release is
essentially the same as 0.19.1 for conda users.

0.19.1

--------------

* Issue 1196:

* fix double-free segfault due to redundant variable deletion in the
Numba IR (1195)
* fix use-after-delete in array expression rewrite pass

0.19.0

--------------

This version introduces memory management in the Numba runtime, allowing
new arrays to be allocated inside Numba-compiled functions.  There is also a
rework of the ufunc infrastructure, and an optimization pass to collapse
cascading array operations into a single efficient loop.

.. warning::
Support for Windows XP and Vista with all compiler targets and support
for 32-bit platforms (Win/Mac/Linux) with the CUDA compiler target are
deprecated.  In the next release of Numba, the oldest version of Windows
supported will be Windows 7.  CPU compilation will remain supported
on 32-bit Linux and Windows platforms.

Known issues:

* There are some performance regressions in very short running ``nopython``
functions due to the additional overhead incurred by memory management.
We will work to reduce this overhead in future releases.

Features:

* Issue 1181: Add a Frequently Asked Questions section to the documentation.
* Issue 1162: Support the ``cumsum()`` and ``cumprod()`` methods on Numpy
arrays.
* Issue 1152: Support the ``*args`` argument-passing style.
* Issue 1147: Allow passing character sequences as arguments to
JIT-compiled functions.
* Issue 1110: Shortcut deforestation and loop fusion for array expressions.
* Issue 1136: Support various Numpy array constructors, for example
numpy.zeros() and numpy.zeros_like().
* Issue 1127: Add a CUDA simulator running on the CPU, enabled with the
NUMBA_ENABLE_CUDASIM environment variable.
* Issue 1086: Allow calling standard Numpy ufuncs without an explicit
output array from ``nopython`` functions.
* Issue 1113: Support keyword arguments when calling numpy.empty()
and related functions.
* Issue 1108: Support the ``ctypes.data`` attribute of Numpy arrays.
* Issue 1077: Memory management for array allocations in ``nopython`` mode.
* Issue 1105: Support calling a ctypes function that takes ctypes.py_object
parameters.
* Issue 1084: Environment variable NUMBA_DISABLE_JIT disables compilation
of ``jit`` functions, instead calling into the Python interpreter
when called.  This allows easier debugging of multiple jitted functions.
* Issue 927: Allow gufuncs with no output array.
* Issue 1097: Support comparisons between tuples.
* Issue 1075: Numba-generated ufuncs can now be called from ``nopython``
functions.
* Issue 1062: ``vectorize`` now allows omitting the signatures, and will
compile the required specializations on the fly (like ``jit`` does).
* Issue 1027: Support numpy.round().
* Issue 1085: Allow returning a character sequence (as fetched from a
structured array) from a JIT-compiled function.

Fixes:

* Issue 1170: Ensure ``ndindex()``, ``ndenumerate()`` and ``ndarray.flat``
work properly inside generators.
* Issue 1151: Disallow unpacking of tuples with the wrong size.
* Issue 1141: Specify install dependencies in setup.py.
* Issue 1106: Loop-lifting would fail when the lifted loop does not
produce any output values for the function tail.
* Issue 1103: Fix mishandling of some inputs when a JIT-compiled function
is called with multiple array layouts.
* Issue 1089: Fix range() with large unsigned integers.
* Issue 1088: Install entry-point scripts (numba, pycc) from the conda
build recipe.
* Issue 1081: Constant structured scalars now work properly.
* Issue 1080: Fix automatic promotion of booleans to integers.

0.18.2

--------------

Bug fixes:

* Issue 1073: Fixes missing template file for HTML annotation
* Issue 1074: Fixes CUDA support on Windows machine due to NVVM API mismatch

0.18.1

--------------

Version 0.18.0 is not officially released.

This version removes the old deprecated and undocumented ``argtypes`` and
``restype`` arguments to the ``jit`` decorator.  Function signatures
should always be passed as the first argument to ``jit``.
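
A minimal sketch of passing signatures as the first argument to ``jit`` follows, covering both a single signature and a list of signatures (the list form is described in the feature list of this release). It assumes Numba is installed; the function names ``squared_norm`` and ``add`` are hypothetical, and the fallback decorator is an assumption so the snippet runs as plain Python without Numba.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: fall back to plain Python when Numba is not installed.
    def jit(*args, **kwargs):
        return lambda func: func


# A single explicit signature, passed as the first argument.
@jit("float64(float64, float64)", nopython=True)
def squared_norm(x, y):
    return x * x + y * y


# Several explicit signatures, passed as a list.
@jit(["int64(int64, int64)", "float64(float64, float64)"], nopython=True)
def add(x, y):
    return x + y


norm = squared_norm(3.0, 4.0)
int_sum = add(1, 2)
float_sum = add(1.5, 2.5)
```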

Features:

* Issue 960: Add inspect_llvm() and inspect_asm() methods to JIT-compiled
functions: they output the LLVM IR and the native assembler source of the
compiled function, respectively.
* Issue 990: Allow passing tuples as arguments to JIT-compiled functions
in ``nopython`` mode.
* Issue 774: Support two-argument round() in ``nopython`` mode.
* Issue 987: Support missing functions from the math module in nopython
mode: frexp(), ldexp(), gamma(), lgamma(), erf(), erfc().
* Issue 995: Improve code generation for round() on Python 3.
* Issue 981: Support functions from the random and numpy.random modules
in ``nopython`` mode.
* Issue 979: Add cuda.atomic.max().
* Issue 1006: Improve exception raising and reporting.  It is now allowed
to raise an exception with an error message in ``nopython`` mode.
* Issue 821: Allow ctypes- and cffi-defined functions as arguments to
``nopython`` functions.
* Issue 901: Allow multiple explicit signatures with ``jit``.  The
signatures must be passed in a list, as with ``vectorize``.
* Issue 884: Better error message when a JIT-compiled function is called
with the wrong types.
* Issue 1010: Simpler and faster CUDA argument marshalling thanks to a
refactoring of the data model.
* Issue 1018: Support arrays of scalars inside Numpy structured types.
* Issue 808: Reduce Numba import time by half.
* Issue 1021: Support the buffer protocol in ``nopython`` mode.
Buffer-providing objects, such as ``bytearray``, ``array.array`` or
``memoryview`` support array-like operations such as indexing and iterating.
Furthermore, some standard attributes on the ``memoryview`` object are
supported.
* Issue 1030: Support nested arrays in Numpy structured arrays.
* Issue 1033: Implement the inspect_types(), inspect_llvm() and inspect_asm()
methods for CUDA kernels.
* Issue 1029: Support Numpy structured arrays with CUDA as well.
* Issue 1034: Support for generators in nopython and object mode.
* Issue 1044: Support default argument values when calling Numba-compiled
functions.
* Issue 1048: Allow calling Numpy scalar constructors from CUDA functions.
* Issue 1047: Allow indexing a multi-dimensional array with a single integer,
to take a view.
* Issue 1050: Support len() on tuples.
* Issue 1011: Revive HTML annotation.
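
As a rough illustration of Issue 901, the sketch below passes two explicit signatures to ``jit`` in a list. The function name ``add`` and the chosen signatures are illustrative assumptions, and the ``ImportError`` fallback (a no-op decorator) is there only so the snippet still runs where Numba is not installed.

```python
try:
    from numba import jit
except ImportError:
    # Assumption: Numba may be absent; fall back to a no-op decorator
    # so the example still runs as plain Python.
    def jit(*args, **kwargs):
        return lambda func: func


# Two explicit signatures, passed in a list as with ``vectorize``.
@jit(["int64(int64, int64)", "float64(float64, float64)"], nopython=True)
def add(a, b):
    return a + b


int_sum = add(1, 2)        # matches the integer signature
float_sum = add(1.5, 2.5)  # matches the floating-point signature
```

With explicit signatures, compilation is expected to happen at decoration time rather than on the first call.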

Fixes:

* Issue 977: Assignment optimization was too aggressive.
* Issue 561: One-argument round() now returns an int on Python 3.
* Issue 1001: Fix an unlikely bug where two closures with the same name
and id() would compile to the same LLVM function name, despite different
closure values.
* Issue 1006: Fix reference leak when a JIT-compiled function is disposed of.
* Issue 1017: Update instructions for CUDA in the README.
* Issue 1008: Generate shorter LLVM type names to avoid segfaults with CUDA.
* Issue 1005: Properly clean up references when raising an exception from
object mode.
* Issue 1041: Fix incompatibility between Numba and the third-party
library "future".
* Issue 1053: Fix the size attribute of CUDA shared arrays.
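
For reference, the ``round()`` entries above (Issues 774 and 561) track the behaviour of the Python 3 builtin. The sketch below uses plain Python only, with no Numba involved, to show the semantics that compiled code is meant to match; the variable names are illustrative.

```python
# One-argument round() returns an int on Python 3 and rounds halves to even.
one_arg = round(2.5)

# Two-argument round() keeps the float type and rounds to the given digits.
two_arg = round(2.567, 2)

print(type(one_arg).__name__, one_arg)
print(type(two_arg).__name__, two_arg)
```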

0.18.0

---------------------

This is a minor release that fixes several issues (263, 262, 258, 237) with
the wheel build.  In addition, we have minor fixes for running on PPC64LE
platforms (261).  We also added CI testing against PyPy (253).

0.17.1

----------------------

This is a bugfix release that addresses issue 258: our LLVM
binding shared library was missing from the wheel builds.

0.17.0

--------------

The major focus in this release has been a rewrite of the documentation.
The new documentation is better structured and has more detailed coverage
of Numba features and APIs.  It can be found online at
http://numba.pydata.org/numba-doc/dev/index.html

Features:

* Issue 895: LLVM can now inline nested function calls in ``nopython`` mode.
* Issue 863: CUDA kernels can now infer the types of their arguments
("autojit"-like).
* Issue 833: Support numpy.{min,max,argmin,argmax,sum,mean,var,std}
in ``nopython`` mode.
* Issue 905: Add a ``nogil`` argument to the ``jit`` decorator, to
release the GIL in ``nopython`` mode.
* Issue 829: Add an ``identity`` argument to ``vectorize`` and
``guvectorize``, to set the identity value of the ufunc.
* Issue 843: Allow indexing 0-d arrays with the empty tuple.
* Issue 933: Allow named arguments, not only positional arguments, when
calling a Numba-compiled function.
* Issue 902: Support numpy.ndenumerate() in ``nopython`` mode.
* Issue 950: AVX is now enabled by default except on Sandy Bridge and
Ivy Bridge CPUs, where it can produce slower code than SSE.
* Issue 956: Support constant arrays of structured type.
* Issue 959: Indexing arrays with floating-point numbers isn't allowed
anymore.
* Issue 955: Add support for 3D CUDA grids and thread blocks.
* Issue 902: Support numpy.ndindex() in ``nopython`` mode.
* Issue 951: Numpy number types (``numpy.int8``, etc.) can be used as
constructors for type conversion in ``nopython`` mode.
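
A minimal sketch of the ``nogil`` argument from Issue 905, assuming a small counting function (``count_multiples``, a hypothetical name) called from two threads. The ``ImportError`` fallback is a no-op decorator so that the snippet also runs without Numba, in which case the GIL is of course not released.

```python
import threading

try:
    from numba import jit
except ImportError:
    # Assumption: Numba may be absent; fall back to a no-op decorator.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True, nogil=True)
def count_multiples(limit, k):
    # Count the integers in [0, limit) that are divisible by k.
    count = 0
    for i in range(limit):
        if i % k == 0:
            count += 1
    return count


results = {}


def worker(k):
    results[k] = count_multiples(1000, k)


threads = [threading.Thread(target=worker, args=(k,)) for k in (2, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Releasing the GIL only helps when several threads spend most of their time inside compiled code; the function body must not touch Python objects.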

Fixes:

* Issue 889: Fix ``NUMBA_DUMP_ASSEMBLY`` for the CUDA backend.
* Issue 903: Fix calling of stdcall functions with ctypes under Windows.
* Issue 908: Allow lazy-compiling from several threads at once.
* Issue 868: Wrong error message when multiplying a scalar by a non-scalar.
* Issue 917: Allow vectorizing with datetime64 and timedelta64 in the
signature (only with unit-less values, though, because of a Numpy limitation).
* Issue 431: Allow overloading of CUDA device functions.
* Issue 917: Print out errors that occurred in object mode ufuncs.
* Issue 923: Numba-compiled ufuncs now inherit the name and doc of the
original Python function.
* Issue 928: Fix boolean return value in nested calls.
* Issue 915: ``jit`` called with an explicit signature with a mismatching
type of arguments now raises an error.
* Issue 784: Fix the truth value of NaNs.
* Issue 953: Fix using shared memory in more than one function (kernel or
device).
* Issue 970: Fix an uncommon double to uint64 conversion bug on CentOS5
32-bit (C compiler issue).

0.16.0

--------------

This release contains a major refactor to switch from llvmpy to `llvmlite <https://github.com/numba/llvmlite>`_
as our code generation backend.  The switch is necessary to reconcile
different compiler requirements for LLVM 3.5 (needs C++11) and Python
extensions (need specific compiler versions on Windows). As a bonus, we have
found the use of llvmlite speeds up compilation by a factor of 2!

Other Major Changes:

* Faster dispatch for numpy structured arrays
* Optimized array.flat()
* Improved CPU feature selection
* Fix constant tuple regression in macro expansion code

Known Issues:

* AVX code generation is still disabled by default due to performance
regressions when operating on misaligned NumPy arrays.  We hope to have a
workaround in the future.
* In *extremely* rare circumstances, a `known issue with LLVM 3.5 <http://llvm.org/bugs/show_bug.cgi?id=21423>`_
code generation can cause an ELF relocation error on 64-bit Linux systems.

0.15.1

--------------

(This was a bug-fix release that superseded version 0.15 before it was
announced.)

Fixes:

* Workaround for missing __ftol2 on Windows XP.
* Do not lift loops for compilation that contain break statements.
* Fix a bug in loop-lifting when multiple values need to be returned to
the enclosing scope.
* Handle the loop-lifting case where an accumulator needs to be updated when
the loop count is zero.

0.15

------------

Features:

* Support for the Python ``cmath`` module.  (NumPy complex functions were
already supported.)
* Support for ``.real``, ``.imag``, and ``.conjugate()`` on non-complex
numbers.
* Add support for ``math.isfinite()`` and ``math.copysign()``.
* Compatibility mode: If enabled (off by default), a failure to compile in
object mode will fall back to using the pure Python implementation of the
function.
* *Experimental* support for serializing JIT functions with cloudpickle.
* Loop-jitting in object mode now works with loops that modify scalars that
are accessed after the loop, such as accumulators.
* ``vectorize`` functions can be compiled in object mode.
* Numba can now be built using the `Visual C++ Compiler for Python 2.7 <http://aka.ms/vcpython27>`_
on Windows platforms.
* CUDA JIT functions can be returned by factory functions with variables in
the closure frozen as constants.
* Support for "optional" types in nopython mode, which allow ``None`` to be a
valid value.

Fixes:

* If nopython mode compilation fails for any reason, automatically fall back
to object mode (unless nopython=True is passed to jit) rather than raise
an exception.
* Allow function objects to be returned from a function compiled in object
mode.
* Fix a linking problem that caused slower platform math functions (such as
``exp()``) to be used on Windows, leading to performance regressions against
NumPy.
* ``min()`` and ``max()`` no longer accept scalar arguments in nopython mode.
* Fix handling of ambiguous type promotion among several compiled versions of a
JIT function.  The dispatcher will now compile a new version to resolve the
problem.  (issue 776)
* Fix float32 to uint64 casting bug on 32-bit Linux.
* Fix type inference to allow forced casting of return types.
* Allow the shape of a 1D ``cuda.shared.array`` and ``cuda.local.array`` to be
a one-element tuple.
* More correct handling of signed zeros.
* Add custom implementation of ``atan2()`` on Windows to handle special cases
properly.
* Eliminated race condition in the handling of the pagelocked staging area
used when transferring CUDA arrays.
* Fix non-deterministic type unification leading to varying performance.
(issue 797)

0.15.0

----------------------

Enhancements:

* PR 213: Add partial LLVM bindings for ObjectFile.
* PR 215: Add inline assembly helpers in the builder.
* PR 216: Allow specifying alignment in alloca instructions.
* PR 219: Remove unnecessary verify in module linkage.

Fixes:

* PR 209, Issue 208: Fix overly restrictive test for library filenames.

0.14

------------

Features:

* Support for nearly all the Numpy math functions (including comparison,
logical, bitwise and some previously missing float functions) in nopython mode.
* The Numpy datetime64 and timedelta64 dtypes are supported in nopython mode
with Numpy 1.7 and later.
* Support for Numpy math functions on complex numbers in nopython mode.
* ndarray.sum() is supported in nopython mode.
* Better error messages when unsupported types are used in Numpy math functions.
* Set NUMBA_WARNINGS=1 in the environment to see which functions are compiled
in object mode vs. nopython mode.
* Add support for the two-argument pow() builtin function in nopython mode.
* New developer documentation describing how Numba works, and how to
add new types.
* Support for Numpy record arrays on the GPU. (Note: Improper alignment of dtype
fields will cause an exception to be raised.)
* Slices on GPU device arrays.
* GPU objects can be used as Python context managers to select the active
device in a block.
* GPU device arrays can be bound to a CUDA stream.  All subsequent operations
(such as memory copies) will be queued on that stream instead of the default.
This can prevent unnecessary synchronization with other streams.

Fixes:

* Generation of AVX instructions has been disabled to avoid performance bugs
when calling external math functions that may use SSE instructions,
especially on OS X.
* JIT functions can be removed by the garbage collector when they are no
longer accessible.
* Various other reference counting fixes to prevent memory leaks.
* Fixed handling of exception when input argument is out of range.
* Prevent autojit functions from making unsafe numeric conversions when
called with different numeric types.
* Fix a compilation error when an unhashable global value is accessed.
* Gracefully handle failure to enable faulthandler in the IPython Notebook.
* Fix a bug that caused loop lifting to fail if the loop was inside an
``else`` block.
* Fixed a problem with selecting CUDA devices in multithreaded programs on
Linux.
* The ``pow()`` function (and ``**`` operation) applied to two integers now
returns an integer rather than a float.
* Numpy arrays using the object dtype no longer cause an exception in the
autojit.
* Attempts to write to a global array will cause compilation to fall back
to object mode, rather than attempt and fail at nopython mode.
* ``range()`` works with all negative arguments (ex: ``range(-10, -12, -1)``)
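
The last two fixes above bring compiled code in line with ordinary Python semantics. The sketch below shows that reference behaviour in plain Python, with no Numba involved; the variable names are illustrative.

```python
# range() with all-negative arguments and a negative step counts downwards.
values = list(range(-10, -12, -1))

# pow() and ** applied to two (non-negative) integers yield an integer.
power = 2 ** 10
same_power = pow(2, 10)

print(values, power, same_power)
```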

0.14.0

----------------------

Enhancements:

* PR 104: Add binding to get and view function control-flow graph.
* PR 210: Improve llvmdev recipe.
* PR 212: Add initializer for the native assembly parser.

0.13.4

--------------

Features:

* Setting and deleting attributes in object mode
* Added documentation of supported and currently unsupported numpy ufuncs
* Assignment to 1-D numpy array slices
* Closure variables and functions can be used in object mode
* All numeric global values in modules can be used as constants in JIT
compiled code
* Support for the start argument in enumerate()
* Inplace arithmetic operations (+=, -=, etc.)
* Direct iteration over a 1D numpy array (e.g. "for x in array: ...")
in nopython mode

Fixes:

* Support for NVIDIA compute capability 5.0 devices (such as the GTX 750)
* Vectorize no longer crashes/gives an error when bool\_ is used as return type
* Return the correct dictionary when globals() is used in JIT functions
* Fix crash bug when creating dictionary literals in object mode
* Report more informative error message on import if llvmpy is too old
* Temporarily disable pycc --header, which generates incorrect function
signatures.

0.13.3

--------------

Features:

* Support for enumerate() and zip() in nopython mode
* Increased LLVM optimization of JIT functions to -O1, enabling automatic
vectorization of compiled code in some cases
* Iteration over tuples and unpacking of tuples in nopython mode
* Support for dict and set (Python >= 2.7) literals in object mode

Fixes:

* JIT functions have the same __name__ and __doc__ as the original function.
* Numerous improvements to better match the data types and behavior of Python
math functions in JIT compiled code on different platforms.
* Importing Numba will no longer throw an exception if the CUDA driver is
present, but cannot be initialized.
* guvectorize now properly supports functions with scalar arguments.
* CUDA driver is lazily initialized

0.13.2

--------------

Features:

* The vectorize ufunc can now generate a SIMD fast path for unit-strided arrays
* Added cuda.gridsize
* Added preliminary exception handling (raise exception class)

Fixes:

* UNARY_POSITIVE
* Handling of closures and dynamically generated functions
* Global None value

0.13.1

--------------

Features:

* Initial support for CUDA array slicing

Fixes:

* Indirectly fixes numbapro when the system has an incompatible CUDA driver
* Fix numba.cuda.detect
* Export numba.intp and numba.intc

0.13

------------

Features:

* Open-sourced NumbaPro CUDA Python support in ``numba.cuda``
* Add support for ufunc array broadcasting
* Add support for mixed input types for ufuncs
* Add support for returning tuple from jitted function

Fixes:

* Fix store slice bytecode handling for Python2
* Fix inplace subtract
* Fix pycc so that correct header is emitted
* Allow vectorize to work on functions with jit decorator

0.13.0

----------------------

Enhancements:

* PR 176: Switch from LLVM 3.7 to LLVM 3.8.
* PR 191: Allow setting the alignment of a global variable.
* PR 198: Add missing function attributes.
* PR 160: Escape the contents of metadata strings, to allow embedding
any characters.
* PR 162: Add support for creating debug information nodes.
* PR 200: Improve the usability of metadata emission APIs.
* PR 200: Allow calling functions with metadata arguments
(such as ``llvm.dbg.declare``).

Fixes:

* PR 190: Suppress optimization remarks printed out in some cases by LLVM.
* PR 200: Allow attaching metadata to a ``ret`` instruction.

0.12.2

--------------

Fixes:

* Improved NumPy ufunc support in nopython mode
* Misc bug fixes

0.12.1

--------------

This version fixes many regressions reported by users for the 0.12 release.
This release contains a new loop-lifting mechanism that specializes certain
loop patterns for nopython mode compilation.  This avoids the need for direct
support of heap allocation and other very dynamic operations.

Improvements:

* Add loop-lifting: JIT-compiling loops in nopython mode for object mode code. This allows
functions to allocate NumPy arrays and use Python objects, while the tight
loops in the function can still be compiled in nopython mode. Any arrays that
the tight loop uses should be created before the loop is entered.

Fixes:

* Add support for majority of "math" module functions
* Fix for...else handling
* Add support for builtin round()
* Fix ternary if...else support
* Revive "numba" script
* Fix problems with some boolean expressions
* Add support for more NumPy ufuncs

0.12

The main objective of this refactor was to simplify the code base to create a better foundation for
further work. A secondary objective was to improve the worst case performance
to ensure that compiled functions in object mode never run slower than pure
Python code (this was a problem in several cases with the old code base). This
refactor is still a work in progress and further testing is needed.

Main improvements:

* Major refactor of compiler for performance and maintenance reasons
* Better fallback to object mode when native mode fails
* Improved worst case performance in object mode

The public interface of numba has been slightly changed. The idea is to
make it cleaner and more rational:

* jit decorator has been modified, so that it can be called without a signature.
When called without a signature, it behaves as the old autojit. Autojit
has been deprecated in favour of this approach.
* Jitted functions can now be overloaded.
* Added a "njit" decorator that behaves like "jit" decorator with nopython=True.
* The numba.vectorize namespace is gone. The vectorize decorator will
be in the main numba namespace.
* Added a guvectorize decorator in the main numba namespace. It is
similar to numba.vectorize, but takes a dimension signature. It
generates gufuncs. This is a replacement for the GUVectorize gufunc
factory which has been deprecated.
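
The decorator changes listed above can be sketched as follows. The function names are illustrative assumptions, and the ``ImportError`` fallback (no-op decorators) exists only so the snippet runs without Numba installed.

```python
try:
    from numba import jit, njit
except ImportError:
    # Assumption: Numba may be absent; fall back to no-op decorators that
    # accept both the bare form (@jit) and the called form (@jit(...)).
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func
    njit = jit


@jit  # no signature: argument types are inferred at call time
def sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


@njit  # behaves like jit with nopython=True
def sum_cubes(n):
    total = 0
    for i in range(n):
        total += i * i * i
    return total


squares = sum_squares(4)
cubes = sum_cubes(4)
```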

Main regressions (will be fixed in a future release):

* Creating new NumPy arrays is not supported in nopython mode
* Returning NumPy arrays is not supported in nopython mode
* NumPy array slicing is not supported in nopython mode
* lists and tuples are not supported in nopython mode
* string, datetime, cdecimal, and struct types are not implemented yet
* Extension types (classes) are not supported in nopython mode
* Closures are not supported
* Raise keyword is not supported
* Recursion is not supported in nopython mode

0.12.0

---------------------

Enhancements:

* PR 179: Let llvmlite build on armv7l Linux.
* PR 161: Allow adding metadata to functions.
* PR 163: Allow emitting fast-math ``fcmp`` instructions.
* PR 159: Allow emitting verbose assembly in TargetMachine.

Fixes:

* Issue 177: Make setup.py compatible with ``pip install``.

0.11

------------
* Experimental support for NumPy datetime type

0.11.0

----------------------

Enhancements:

* PR 175: Check LLVM version at build time
* PR 169: Default initializer for non-external global variable
* PR 168: add ir.Constant.literal_array()

0.10

------------
* Annotation tool (./bin/numba --annotate --fancy) (thanks to Jay Bourque)
* Open sourced prange
* Support for raise statement
* Pluggable array representation
* Support for enumerate and zip (thanks to Eugene Toder)
* Better string formatting support (thanks to Eugene Toder)
* Builtins min(), max() and bool() (thanks to Eugene Toder)
* Fix some code reloading issues (thanks to Björn Linse)
* Recognize NumPy scalar objects (thanks to Björn Linse)

0.10.0

----------------------

Enhancements:

* PR 146: Improve ``setup.py clean`` to wipe more leftovers.
* PR 135: Remove some llvmpy compatibility APIs.
* PR 151: Always copy TargetData when adding to a pass manager.
* PR 148: Make errors more explicit on loading the binding DLL.
* PR 144: Allow overriding ``-flto`` in Linux builds.
* PR 136: Remove Python 2.6 and 3.3 compatibility.
* Issue 131: Allow easier creation of constants by making type instances
callable.
* Issue 130: The test suite now ensures the runtime DLL dependencies
are within a certain expected set.
* Issue 121: Simplify build process on Unix and remove hardcoded linking
with LLVMOProfileJIT.
* Issue 125: Speed up formatting of raw array constants.

Fixes:

* PR 155: Properly emit IR for metadata null.
* PR 153: Remove deprecated uses of ``TargetMachine::getDataLayout()``.
* PR 156: Move personality from LandingPadInstr to FunctionAttributes.
It was moved in LLVM 3.7.
* PR 149: Implement LLVM scoping correctly.
* PR 141: Ensure no CMakeCache.txt file is included in sdist.
* PR 132: Correct constant in ``llvmir.py`` example.

0.9

-----------
* Improved math support
* Open sourced generalized ufuncs
* Improved array expressions

0.9.0

---------------------

Enhancements:

* PR 73: Add get_process_triple() and get_host_cpu_features()
* Switch from LLVM 3.6 to LLVM 3.7.  The generated IR for some memory
operations has changed.
* Improved performance of IR serialization.
* Issue 116: improve error message when the operands of a binop have
differing types.
* PR 113: Let Type.get_abi_{size,alignment} not choke on identified types.
* PR 112: Support non-alphanumeric characters in type names.

Fixes:

* Remove the libcurses dependency on Linux.

0.8

-----------
* Support for autojit classes (inheritance not yet supported)
* Python 3 support for pycc
* Allow retrieval of ctypes function wrapper, and hence of a pointer to
the function
* Fixed a memory leak of array slicing views

0.8.0

---------------------

* Update LLVM to 3.6.2
* Add an *align* parameter to IRBuilder.load() and IRBuilder.store().
* Allow setting visibility, DLL storageclass of ValueRef
* Support profiling with OProfile

0.7.2

-------------
* Official Python 3 support (python 3.2 and 3.3)
* Support for intrinsics and instructions
* Various bug fixes (see https://github.com/numba/numba/issues?milestone=7&state=closed)

0.7.1

-------------
* Various bug fixes

0.7

-----------
* Open sourced single-threaded ufunc vectorizer
* Open sourced NumPy array expression compilation
* Open sourced fast NumPy array slicing
* Experimental Python 3 support
* Support for typed containers (typed lists and tuples)
* Support for iteration over objects
* Support object comparisons
* Preliminary CFFI support
* Jit calls to CFFI functions (passed into autojit functions)
* TODO: Recognize ffi_lib.my_func attributes
* Improved support for ctypes
* Allow declaring extension attribute types through class attributes
* Support for type casting in Python
* Get the same semantics with or without numba compilation
* Support for recursion
* For jit methods and extension classes
* Allow jit functions as C callbacks
* Friendlier error reporting
* Internal improvements
* A variety of bug fixes

0.7.0

---------------------

* PR 88: Provide hooks into the MCJIT object cache
* PR 87: Add indirect branches and exception handling APIs to ir.Builder.
* PR 86: Add ir.Builder APIs for integer arithmetic with overflow
* Issue 76: Fix non-Windows builds when LLVM was built using CMake
* Deprecate .get_pointer_to_global() and add .get_function_address() and
.get_global_value_address() in ExecutionEngine.

0.6.1

--------------
* Support for bitwise operations

0.6

--------------
* Python 2.6 support
* Programmable typing
* Allow users to add type inference for external code
* Better NumPy type inference: outer, inner, dot, vdot, tensordot, nonzero,
where, binary ufuncs + methods (reduce, accumulate, reduceat, outer)
* Type based alias analysis
* Support for strict aliasing
* Much faster autojit dispatch when calling from Python
* Faster numerical loops through data and stride pre-loading
* Integral overflow and underflow checking for conversions from objects
* Make Meta dependency optional

0.6.0

---------------------

Enhancements:

* Switch from LLVM 3.5 to LLVM 3.6.  The generated IR for metadata nodes
has slightly changed, and the "old JIT" engine has been removed (only
MCJIT is now available).
* Add an optional flags argument to arithmetic instructions on IRBuilder.
* Support attributes on the return type of a function.

0.5.1

--------------------

Fixes:

* Fix implicit termination of basic block in nested if_then()

0.5

--------------
* SSA-based type inference
* Allows variable reuse
* Allow referring to variables before lexical definition
* Support multiple comparisons
* Support for template types
* List comprehensions
* Support for pointers
* Many bug fixes
* Added user documentation

0.5.0

---------------------

New documentation hosted at http://llvmlite.pydata.org

Enhancements:

* Add code-generation helpers from numba.cgutils
* Support for memset, memcpy, memmove intrinsics

Fixes:

* Fix string encoding problem when round-tripping parse_assembly()

0.4

--------------

0.4.0

---------------------

Enhancements:

* Add Module.get_global()
* Rename Module.global_variables to Module.global_values
* Support loading a library permanently
* Add Type.get_abi_alignment()

Fixes:

* Expose LLVM version as a tuple

0.3.2

--------------

* Add support for object arithmetic (issue 56).
* Bug fixes (issue 55).

0.3

--------------
* Changed default compilation approach to ast
* Added support for cross-module linking
* Added support for closures (can jit inner functions and return them) (see examples/closure.py)
* Added support for dtype structures (can access elements of structure with attribute access) (see examples/structures.py)
* Added support for extension types (numba classes) (see examples/numbaclasses.py)
* Added support for general Python code (use nopython to raise an error if the Python C-API is used, which avoids unexpected slowness when an unimplemented feature falls back to generic Python)
* Fixed many bugs
* Added support to detect math operations.
* Added with python and with nopython contexts
* Added more examples

Many features still need to be documented.  Look at the examples and tests for more information.

0.2.2

---------------------

Enhancements:

* Support for addrspacecast
* Support for tail call, calling convention attribute
* Support for IdentifiedStructType

Fixes:

* GEP addrspace propagation
* Various installation process fixes

0.2

--------------
* Added an ast approach to compilation
* Removed d, f, i, b from numba namespace (use f8, f4, i4, b1)
* Changed function to autojit2
* Added autojit function to decorate calls to the function and use types of the variable to create compiled versions.
* Changed the keyword arguments of the jit and autojit functions to restype and argtypes, for consistency with the ctypes module.
* Added pycc -- a python to shared library compiler

0.2.0

---------------------

This is the first official release. It contains a few feature additions
and bug fixes. It meets all requirements to replace llvmpy in numba and
numbapro.

0.1.0

---------------------

This is the first release. This is released for beta testing llvmlite
and numba before the official release.