Blis

Latest version: v0.9.1

0.1.8

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 29 13:31:09 2015 -0500

Version file update (0.1.8)

commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e
Merge: fdfe14f1 d4b89136
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 9 13:54:54 2015 -0500

Merge branch 'master' of github.com:flame/blis

commit fdfe14f1e17ba5a2f8dfa0bdb799c6b0e730211b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 9 13:52:39 2015 -0500

Added support for Intel Haswell/Broadwell.

Details:
- Added sgemm and dgemm micro-kernels, which employ 256-bit AVX vectors
and FMA instructions. (Complex support is currently provided by default
induced method, 4m1a.)
- Added a 'haswell' configuration, which uses the aforementioned kernels.
- Inserted auto-detection support for haswell configuration in
build/auto-detect/cpuid_x86.c.
- Modified configure script to explicitly echo when automatic or manual
configuration is in progress.
- Changed beta scalar in test_gemm.c module of test suite from -1.0 to 0.9.
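
For readers unfamiliar with the term, a gemm micro-kernel updates one small
block of C from a packed micro-panel of A and a packed micro-panel of B. The
following is a portable scalar sketch in plain C, not the AVX/FMA kernel this
commit added. The name demo_dgemm_ukr, the 4 x 4 register blocksizes, and the
packing layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register blocksizes, chosen for illustration only. */
enum { DEMO_MR = 4, DEMO_NR = 4 };

/*
 * Reference micro-kernel: C := beta * C + alpha * A * B, where
 *   a holds k columns of DEMO_MR contiguous elements (micro-panel of A),
 *   b holds k rows of DEMO_NR contiguous elements (micro-panel of B),
 *   c is DEMO_MR x DEMO_NR with row stride rs_c and column stride cs_c.
 */
void demo_dgemm_ukr(size_t k, double alpha, const double *a, const double *b,
                    double beta, double *c, size_t rs_c, size_t cs_c)
{
    double ab[DEMO_MR][DEMO_NR] = {{0.0}};

    assert(a != NULL && b != NULL && c != NULL);

    /* Accumulate k rank-1 updates into a local accumulator. */
    for (size_t p = 0; p < k; ++p)
        for (size_t i = 0; i < DEMO_MR; ++i)
            for (size_t j = 0; j < DEMO_NR; ++j)
                ab[i][j] += a[p * DEMO_MR + i] * b[p * DEMO_NR + j];

    /* Scale the accumulator by alpha and the old C by beta, then store. */
    for (size_t i = 0; i < DEMO_MR; ++i)
        for (size_t j = 0; j < DEMO_NR; ++j) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = beta * (*cij) + alpha * ab[i][j];
        }
}
```

An optimized kernel would keep the accumulator in vector registers and replace
the inner multiply-add with FMA instructions; the arithmetic is the same.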

commit d4b891369c1eb0879ade662ff896a5b9a7fca207
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 7 10:06:53 2015 -0500

Added 'carrizo' configuration.

Details:
- Added a new configuration for AMD Excavator-based hardware, also known
as Carrizo when referring to the entire APU. This configuration uses
the same micro-kernels as the piledriver configuration, but with
different cache blocksizes.

commit 0b7255a642d56723f02d7ca1f8f21809967b8515
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 19 12:01:50 2015 -0500

CHANGELOG update (0.1.7)

0.1.7

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 19 12:01:49 2015 -0500

Version file update (0.1.7)

commit 7cd01b71b5e757a6774625b3c9f427f5e7664a76
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 19 11:31:53 2015 -0500

Implemented dynamic allocation for packing buffers.

Details:
- Replaced the old memory allocator, which was based on statically-
allocated arrays, with one based on a new internal pool_t type, which,
combined with a new bli_pool_*() API, provides a new abstract data
type that implements the same memory pool functionality but with blocks
from the heap (i.e., malloc() or equivalent). Hiding the details of the
pool in a separate API also allows for a much simpler bli_mem.c family
of functions.
- Added a new internal header, bli_config_macro_defs.h, which enables
sane defaults for the values previously found in bli_config. Those
values can be overridden by defining them in bli_config.h the same
way kernel defaults can be overridden in bli_kernel.h. This file most
resembles what was previously a typical configuration's bli_config.h.
- Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
blocks in the memory pool. Also added a corresponding query routine to
the bli_info API.
- Deprecated (once again) the micro-panel alignment feature. Upon further
reflection, it seems that the goal of more predictable L1 cache
replacement behavior is outweighed by the harm caused by non-contiguous
micro-panels when k % kc != 0. I honestly don't think anyone will even
miss this feature.
- Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
bli_cntl_init() instead of bli_init().
- Removed query functions from bli_info.c that are no longer applicable
given the dynamic memory allocator.
- Removed unnecessary definitions from configurations' bli_config.h files,
which are now pleasantly sparse.
- Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
modules. Thanks to Devangi Parikh for pointing out these
miscalculations.
- Comment, whitespace changes.
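
The idea of a pool of aligned heap blocks that are checked out and returned
can be sketched as follows. This is a minimal illustration, not the bli_pool_*()
API: the demo_pool_t type, its fields, and all demo_pool_*() function names are
hypothetical, and alignment is obtained here by over-allocating with malloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pool type; these fields are not those of BLIS's pool_t. */
typedef struct {
    void  **raw;     /* pointers returned by malloc(), kept for free() */
    void  **blocks;  /* aligned addresses; entries [top, n) are available */
    size_t  n;       /* total number of blocks */
    size_t  top;     /* index of the next available block */
} demo_pool_t;

/* Release every block and the bookkeeping arrays. */
void demo_pool_finalize(demo_pool_t *pool)
{
    if (pool->raw != NULL)
        for (size_t i = 0; i < pool->n; ++i)
            free(pool->raw[i]);
    free(pool->raw);
    free(pool->blocks);
    pool->raw = pool->blocks = NULL;
    pool->n = pool->top = 0;
}

/* Allocate n blocks of block_size bytes, each aligned to align
   (which must be a power of two). Returns 0 on success, -1 on failure. */
int demo_pool_init(demo_pool_t *pool, size_t n, size_t block_size, size_t align)
{
    assert(pool != NULL && n > 0);
    assert(align > 0 && (align & (align - 1)) == 0);

    pool->n      = n;
    pool->top    = 0;
    pool->raw    = malloc(n * sizeof *pool->raw);
    pool->blocks = malloc(n * sizeof *pool->blocks);
    if (pool->raw != NULL)
        for (size_t i = 0; i < n; ++i)
            pool->raw[i] = NULL;
    if (pool->raw == NULL || pool->blocks == NULL) {
        demo_pool_finalize(pool);
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        /* Over-allocate so that an aligned address always fits. */
        pool->raw[i] = malloc(block_size + align - 1);
        if (pool->raw[i] == NULL) {
            demo_pool_finalize(pool);
            return -1;
        }
        uintptr_t addr = ((uintptr_t)pool->raw[i] + align - 1)
                         & ~(uintptr_t)(align - 1);
        pool->blocks[i] = (void *)addr;
    }
    return 0;
}

/* Check out one block, or return NULL if the pool is exhausted. */
void *demo_pool_checkout(demo_pool_t *pool)
{
    return (pool->top < pool->n) ? pool->blocks[pool->top++] : NULL;
}

/* Return a previously checked-out block to the pool. */
void demo_pool_checkin(demo_pool_t *pool, void *block)
{
    assert(pool->top > 0);
    pool->blocks[--pool->top] = block;
}
```

A caller would initialize the pool once, check blocks out for packing buffers,
check them back in when done, and finalize the pool at shutdown. A real
allocator shared between threads would also need a lock around checkout and
checkin, which this sketch omits.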

commit 9848f255a3bab17d1139c391cca13ff3f1ffe6ed
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 11 19:14:22 2015 -0500

Added early return to API-level _init() routines.

Details:
- Added conditional code that returns early from the API-level _init()
routines if the API is already initialized. Actually meant for this to
be included in 5f93cbe8.

commit 5f93cbe870f3478870e15581e7fd450dad5bba1e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 11 18:52:12 2015 -0500

Introduced API-level initialization.

Details:
- Added API-level initialization state to _const, _error, _mem, _thread,
_ind, and _cntl APIs. While this functionality will mostly go unused,
adding minuscule overhead at init-time, there will be at least one
instance in the near future where, in order to avoid an infinite loop,
a certain portion of the initialization will call a query function that
itself attempts to call bli_init(). API-level initialization will allow
this later stage to verify that an earlier stage of initialization has
completed, even if the overall call to bli_init() has not yet returned.
- Added _is_initialized() functions for each API, setting the underlying
bool_t during _init() and unsetting it during _finalize().
- Comment, whitespace changes.
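
A minimal sketch of the per-API initialization state described above, together
with an idempotent _init() that returns early when the API is already
initialized. All demo_* names and the block count are hypothetical; only the
pattern (a flag set in _init(), cleared in _finalize(), and exposed through
_is_initialized()) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for one sub-API. */
static bool demo_mem_initialized = false;
static int  demo_mem_init_count  = 0;  /* counts real initializations */
static int  demo_mem_blocks      = 0;  /* stand-in for state set up by init */

bool demo_mem_is_initialized(void)
{
    return demo_mem_initialized;
}

void demo_mem_init(void)
{
    /* Return early if this API has already been initialized. */
    if (demo_mem_is_initialized())
        return;

    demo_mem_blocks = 4;
    ++demo_mem_init_count;
    demo_mem_initialized = true;
}

void demo_mem_finalize(void)
{
    demo_mem_blocks = 0;
    demo_mem_initialized = false;
}

/* A query that is only meaningful once this API has been initialized. */
int demo_mem_block_count(void)
{
    assert(demo_mem_is_initialized());
    return demo_mem_blocks;
}

/* Reports how many times initialization actually ran (for illustration). */
int demo_mem_init_calls(void)
{
    return demo_mem_init_count;
}
```

With this pattern, a later initialization stage can check
demo_mem_is_initialized() instead of re-entering the top-level init routine.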

commit ee129c6b028bc5ac88da7c74fde72c49803742ff
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 10 12:53:28 2015 -0500

Fixed bugs in _get_range(), _get_range_weighted().

Details:
- Fixed some bugs that only manifested in multithreaded instances of
some (non-gemm) level-3 operations. The bugs were related to invalid
allocation of "edge" cases to thread subpartitions. (Here, we define
an "edge" case to be one where the dimension being partitioned for
parallelism is not a whole multiple of whatever register blocksize
is needed in that dimension.) In BLIS, we always require edge cases
to be part of the bottom, right, or bottom-right subpartitions.
(This is so that zero-padding only has to happen at the bottom, right,
or bottom-right edges of micro-panels.) The previous implementations
of bli_get_range() and _get_range_weighted() did not adhere to this
implicit policy and thus produced bad ranges for some combinations of
operation, parameter cases, problem sizes, and n-way parallelism.
- As part of the above fix, the functions bli_get_range() and
_get_range_weighted() have been renamed to use _l2r, _r2l, _t2b,
and _b2t suffixes, similar to the partitioning functions. This is
an easy way to make sure that the variants are calling the right
version of each function. The function signatures have also been
changed slightly.
- Comment/whitespace updates.
- Removed unnecessary '/' from macros in bli_obj_macro_defs.h.
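
The edge-case policy described above can be illustrated with a simplified,
unweighted, left-to-right range computation. This is a sketch under stated
assumptions, not the BLIS implementation: the function name, signature, and
the way whole blocks are distributed among workers are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split [0, n) among n_way workers in whole multiples of the register
 * blocksize bf. Any leftover "edge" of n % bf elements is given to the
 * last (right-most) worker only.
 */
void demo_get_range_l2r(size_t tid, size_t n_way, size_t n, size_t bf,
                        size_t *start, size_t *end)
{
    assert(n_way > 0 && tid < n_way && bf > 0);
    assert(start != NULL && end != NULL);

    size_t n_full = n / bf;          /* number of whole bf-sized blocks */
    size_t base   = n_full / n_way;  /* whole blocks every worker gets */
    size_t extra  = n_full % n_way;  /* the first `extra` workers get one more */

    size_t blocks_before = tid * base + (tid < extra ? tid : extra);
    size_t my_blocks     = base + (tid < extra ? 1 : 0);

    *start = blocks_before * bf;
    *end   = *start + my_blocks * bf;

    /* Only the last worker absorbs the edge case. */
    if (tid == n_way - 1)
        *end = n;
}
```

For example, with n = 22, bf = 4, and three workers, the ranges are [0, 8),
[8, 16), and [16, 22): the two leftover elements land in the right-most
subpartition, so zero-padding is only ever needed there.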

commit 9135dfd69d39f3bbd75034f479f27a78dbfebcce
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 5 13:37:44 2015 -0500

Minor updates to test/3m4m files.

commit d62ceece943b20537ec4dd99f25136b9ba2ae340
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 3 12:56:45 2015 -0500

Minor update to test/3m4m/runme.sh.

Details:
- Removed some stale script code that should have been removed
during 590bb3b8c.

commit b6ee82a3d421c9c4f1eb6848c7c6e37aa46de799
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 3 12:14:23 2015 -0500

Minor cleanup to bli_init() and friends.

Details:
- Spun-off initialization of global scalar constants to bli_const_init()
and of threading stuff to bli_thread_init().
- Added some missing _finalize() functions, even when there is nothing
to do.

commit 1213f5cebabc1637ce9dd45c4bfa87bb93677c29
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 2 13:27:47 2015 -0500

POSIX thread bugfixes/edits to bli_init.c, _mem.c.

Details:
- Fixed a sort-of bug in bli_init.c whereby the wrong pthread mutex
was used to lock access to initialization/finalization actions.
But everything worked out okay as long as bli_init() was called by
single-threaded code.
- Changed to static initialization for memory allocator mutex in
bli_mem.c, and moved mutex to that file (from bli_init.c).
- Fixed some type mismatches in bli_threading_pthreads.c that resulted
in compiler warnings.
- Fixed a small memory leak with allocated-but-never-freed (and unused)
pthread_attr_t objects.
- Whitespace changes to bli_init.c and bli_mem.c.

commit 590bb3b8c5c0389159c5a9451b6c156c5f237e8a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun May 24 16:02:53 2015 -0500

Backed-out adjusted dim changes to test/3m4m.

Details:
- Reverted most changes applied during commit ec25807b.

commit ec25807b26da943868f0d0517c3720e50181b8f9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 10 13:23:50 2015 -0500

Tweaks to test/3m4m to test with adjusted dims.

Details:
- Updated test/3m4m driver files to build test drivers that allow
comparison of real "asm_blis" results to complex "asm_blis" results,
except with the latter's problem sizes adjusted so that problems are
generated with equal flop counts.

commit 426b6488580a92bf071a62dc319a9c837ce39821
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 8 15:12:21 2015 -0500

Fixed a packing bug that manifested in trsm_r.

Details:
- Fixed a bug that caused a memory leak in the contiguous memory
allocator. Because packm_init() was using simple aliasing when
a subpartition object was marked as zeros by bli_acquire_mpart_*(),
the "destination" pack object's mem_t entry was being overwritten
by the corresponding field of the "source" object (which was likely
NULL). This prevented the block from being released back to the
memory allocator. But this bug only manifested when changing the
location of packing B from outside the var1 loop to inside the
var3 loop, and only for trsm with triangular B (side = right). The
bug was fixed by changing the type of alias used in packm_init()
when handling zero partition cases. Specifically, we now use
bli_obj_alias_for_packing(), which does not clobber the destination
(pack) object's mem_t field. Thanks to Devangi Parikh for this bug
report.

commit c84286d5cef48f16d83831baac1f46b9856b9a36
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 4 15:39:14 2015 -0500

More minor tweaks to test/3m4m.

Details:
- Added a line of output that forces matlab to allocate the entire array
up-front.
- Re-enabled real domain benchmarks in runme.sh, which were temporarily
disabled.

commit 309717c8ebf4ef1369f15cf41340e13c25b41573
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 3 19:28:49 2015 -0500

More tweaks to test/3m4m, configurations.

Details:
- Fixed incorrect number of mc_x_kc memory blocks in
sandybridge/bli_config.h.
- Enabled OpenMP multithreading in piledriver/bli_config.h.
- More updates to test/3m4m driver files.

commit 4baf3b9c69b2f648be9e46e07ccc9859dd675828
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 3 16:44:32 2015 -0500

Tweaked test/3m4m driver, including acml support.

Details:
- Added ACML support to test/3m4m driver Makefile and runme.sh script.

commit a32f7c49ca4ea869d2a6c66818780f4321743d67
Merge: 349e075a 4bfd1ce8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 3 08:28:11 2015 -0500

Merge pull request #23 from xianyi/master

Add auto-detecting CPU on configure stage.

commit 349e075ad6a8e2a1211d94f36d24828c9d44b052
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 2 18:12:28 2015 -0500

Tweaks to sandybridge config, test/3m4m driver.

Details:
- Enable OpenMP support by default in sandybridge's bli_config.h.
- Reorganized sandybridge's bli_kernel.h.
- Updated 3m4m Makefile, runme.sh to also test MKL implementation.

commit 4bfd1ce8ca93f93d170dd2715f0a32027b417b46
Author: Zhang Xianyi <traits.zhanggmail.com>
Date: Thu Apr 2 16:40:21 2015 -0500

Detect NEON for cortex-a9 and cortex-a15.

commit aa6eec4f43137057276fe6119bdbfb5c52682527
Author: Zhang Xianyi <traits.zhanggmail.com>
Date: Thu Apr 2 16:03:44 2015 -0500

Detect the CPU architecture. Support ARM cores.

Detect the CPU architecture by compiler's predefined macros.
Then, detect the CPU cores.

Support detecting x86 and ARM architectures.

commit 2947cfb749c937b0f62fac36cc92f123bd45b53c
Author: Zhang Xianyi <traits.zhanggmail.com>
Date: Wed Apr 1 12:24:00 2015 -0500

Add auto-detecting CPU on configure stage.
e.g. /Path_to_BLIS/configure auto

Now, it only supports detecting x86 CPUs.

commit 26a4b8f6f985597f80e0174990bf541f1d9bafac
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 1 10:44:54 2015 -0500

Implemented 3m2, 3m3 induced algorithms (gemm only).

Details:
- Defined a new "3ms" (separated 3m) pack schema and added appropriate
support in packm_init(), packm_blk_var2().
- Generalized packm_struc_cxk_3mi to take the imaginary stride (is_p)
as an argument instead of computing it locally. Exception: for trmm,
is_p must be computed locally, since it changes for triangular
packed matrices. Also exposed is_p in interface to dt-specific
packm_blk_var2 (and _var1, even though it does not use imaginary
stride).
- Renamed many functions/variables from _3mi to _3mis to indicate that
they work for either interleaved or separated 3m pack schemas.
- Generalized gemm and herk macro-kernels to pass in imaginary stride
rather than compute them locally.
- Added support for 3m2 and 3m3 algorithms to frame/ind, including 3m2-
and 3m3-specific virtual micro-kernels.
- Added special gemm macro-kernels to support 3m2 and 3m3.
- Added support for 3m2 and 3m3 to testsuite.
- Corrected the type of the panel dimension (pd_) in various macro-
kernels from inc_t to dim_t.
- Renamed many functions defined in bli_blocksize.c.
- Moved most induced-related macro defs from frame/include to
frame/ind/include.
- Updated the _ukernel.c files so that the micro-kernel function pointers
are obtained from the func_t objects rather than the cpp macros that
define the function names.
- Updated test/3m4m driver, Makefile, and run script.
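
As background, the "3m" family of induced methods rests on the identity that
a complex product can be formed from three real products instead of four. The
scalar sketch below shows only that identity; it does not reproduce how 3m2 or
3m3 organize packing and macro-kernels. The type and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double r, i; } demo_dcomplex;

/* c := a * b using three real multiplications. */
void demo_mul_3m(const demo_dcomplex *a, const demo_dcomplex *b,
                 demo_dcomplex *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    double t_rr  = a->r * b->r;                    /* real * real */
    double t_ii  = a->i * b->i;                    /* imag * imag */
    double t_sum = (a->r + a->i) * (b->r + b->i);  /* summed parts */

    double re = t_rr - t_ii;
    double im = t_sum - t_rr - t_ii;

    c->r = re;
    c->i = im;
}
```

The trade-off is one fewer multiplication at the cost of extra additions and
somewhat different rounding behavior than the conventional four-product form.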

commit ddf62ba7d2da08225b201585b85e06c967767dea
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Mar 27 14:27:51 2015 -0500

Refuse to free the packm thread info if it uses the single threaded version

commit 016fc587584d958a0e430a56a5e2c05022ac2f17
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Mar 27 14:23:02 2015 -0500

Don't free packm thread info if it is null

commit 00a443c529a60862a57b93e303a0b3212c9b1df4
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Mar 27 14:11:07 2015 -0500

Use bli_malloc instead of malloc for the thread info paths

commit f1a6b7d02861ccebdc500ea98778cc0f6cddad17
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 18 15:37:10 2015 -0500

Reorganized code for induced complex methods.

Details:
- Consolidated most of the code relating to induced complex methods
(e.g. 4mh, 4m1, 3mh, 3m1, etc.) into frame/ind. Induced methods
are now enabled on a per-operation basis. The current "available"
(enabled and implemented) implementation can then be queried on
an operation basis. Micro-kernel func_t objects as well as blksz_t
objects can also be queried in a similar manner.
- Redefined several micro-kernel and operation-related functions in
bli_info_*() API, in accordance with above changes.
- Added mr and nr fields to blksz_t object, which point to the mr
and nr blksz_t objects for each cache blocksize (and are NULL for
register blocksizes). Renamed the sub-blocksize field "sub" to
"mult" since it is really expressing a blocksize multiple.
- Updated bli_*_determine_kc_[fb]() for gemm/hemm/symm, trmm, and
trsm to correctly query mr and nr (for purposes of nudging kc).
- Introduced an enumerated opid_t in bli_type_defs.h that uniquely
identifies an operation. For now, only level-3 id values are defined,
along with a generic, catch-all BLIS_NOID value.
- Reworked testsuite so that all induced methods that are enabled
are tested (one at a time) rather than only testing the first
available method.
- Reformatted summary at the beginning of testsuite output so that
blocksize and micro-kernel info is shown for each induced method
that was requested (as well as native execution).
- Reduced the number of columns needed to display non-matlab
testsuite output (from approx. 90 to 80).

commit 8d5169ccda954e5f72944308a036dcb7ebfc9097
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 18 11:38:08 2015 -0500

Fixed bug in release of mem_t buffer.

Details:
- Fixed a bug that affects all level-2 and level-3 blocked variants. The
bug only manifested, however, if the packing of operands (A and B in
gemm, for example) spanned multiple nodes in the control tree. Until
recently, the main consumers of packm were level-3 operations, all of
which packed both input operands from blocked variant 1 (B outside of
the loop, and A within the loop). This particular usage masked a flaw
in the code whereby bli_obj_release_pack() would always release the
underlying mem_t buffer (provided it was allocated), even if the buffer
was not allocated in the current variant. This has been fixed by
replacing all calls to bli_obj_release_pack() with calls to a new
function, bli_packm_release(), which takes the same control tree node
argument passed into the object's corresponding call to packm_init()
or packv_init(). bli_packm_release() then proceeds to invoke
bli_obj_release_pack() only if the control tree node indicates that
packing was requested. Thanks to Devangi Parikh for identifying this
bug.

commit c0acca0f5182ba96fd39c9d10b34a896a6e74206
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 3 10:56:22 2015 -0600

Clarified comments in testsuite input.operations.

commit 03ba9a6b17861d9e1adc0cf924439c4d7e860d19
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 24 10:33:28 2015 -0600

Removed some 'old' directories.

commit a86db60ee270cdeb745ae7cf68f9e0becc9f522d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Feb 23 18:42:39 2015 -0600

Extensive renaming of 3m/4m-related files, symbols.

Details:
- Renamed all remaining 3m/4m packing files and symbols to 3mi/4mi
('i' for "interleaved"). Similar changes to 3M/4M macros.
- Renamed all 3m/4m files and functions to 3m1/4m1.
- Whitespace changes.

commit 8cf8da291a0fb2f491f410969a76ec0fbda47faf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 20 15:24:27 2015 -0600

Minor updates to induced complex mode management.

Details:
- Relocated bli_4mh.c, bli_4mb.c, bli_4m.c, bli_3mh.c, bli_3m.c (and
associated headers) from frame/base to frame/base/induced.
- Added bli_xm.? to frame/base/induced, which implements
bli_xm_is_enabled(), which detects whether ANY induced complex method
is currently enabled.
- The new function bli_xm_is_enabled() is now used in bli_info.c to
detect when an induced complex method is used, so we know when to
return blocksizes from one of the induced methods' blocksize objects.

commit 411e637ee7d1083a84f58f08938d51e63d7c3c9a
Merge: c2569b88 fc0b7712
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Fri Feb 20 20:39:25 2015 -0600

Merge branch 'master' of http://github.com/flame/blis

commit c2569b8803d4ccc1d7b6f391713461b51443601d
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Fri Feb 20 20:38:19 2015 -0600

Fixed a memory leak in freeing the thread infos

commit fc0b771227abf86d81f505b324f69f6e83db1d8f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 20 11:47:44 2015 -0600

Added max(mr,nr) to kc in static mem pools.

Details:
- Changed the static memory definitions to compute the maximum register
blocksize for each datatype and add it to kc when computing the size
of blocks of A and B. This formally accounts for the nudging of kc
up to a multiple of mr or nr at runtime for triangular operations
(e.g. trmm).
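
The arithmetic behind this change can be sketched as follows: rounding kc up
to a multiple of mr or nr adds fewer than max(mr, nr) elements, so padding kc
by max(mr, nr) when sizing a block is always sufficient. The helper names and
the example blocksizes are hypothetical and are not taken from the BLIS macro
definitions.

```c
#include <assert.h>
#include <stddef.h>

size_t demo_max(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Round kc up to the next multiple of mult. */
size_t demo_nudge_up(size_t kc, size_t mult)
{
    assert(mult > 0);
    return ((kc + mult - 1) / mult) * mult;
}

/* Elements reserved for an mc x kc block of A once kc is padded by
   max(mr, nr) to leave room for run-time nudging. */
size_t demo_block_a_elems(size_t mc, size_t kc, size_t mr, size_t nr)
{
    return mc * (kc + demo_max(mr, nr));
}
```

For instance, with kc = 250, mr = 8, and nr = 4, nudging to a multiple of mr
gives 256, which fits within the padded value of 250 + 8 = 258.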

commit af32e3a608631953ef770341df10a14a991bf290
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Thu Feb 19 22:51:11 2015 -0600

Fixed a bug where get_range_weighted would return end = 0 for small problem sizes

commit 441d47542a64e131578d00da7404c1ed387a721c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 19 17:06:10 2015 -0600

Renamed 3m and 4m symbols/macros to 3mi and 4mi.

Details:
- Renamed several variables and macros from 3m/4m to 3mi/4mi. This is
because those packing schemas were always implicitly "interleaved".
This new naming scheme will make way for new schemas that separate
instead of interleave the real and imaginary (and summed) parts.
- Expanded the pack format sub-field of the pack schema field of the
info_t to 4 bits (from 3). This will allow for more schema types
going forward.
- Removed old _cntl.c files for herk3m, herk4m, trmm3m, trmm4m.

commit 518a1756ccf02122b96fc437b538604a597df42a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 19 14:27:09 2015 -0600

Fixed indexing bug for trmm3 via 3mh, 4mh.

Details:
- Fixed a bug that only affected trmm3 when performed via 3mh or 4mh,
whereby micro-panels of the triangular matrix were packed with "dead
space" between them due to failing to adjust for the fact that pointer
arithmetic was occurring in units of complex elements while the data
being packed consisted of real elements. It turns out that the macro-
kernel suffered from the same bug, meaning the panels were actually
being packed and read consistently. The only way I was able to
discover the bug in the first place was because the packed block of A
was overflowing into the beginning of the packed row panel of B using
the sandybridge configuration.

commit 493087d730f01d5169434f461644e5633f48a42f
Merge: 650d2a6f 25021299
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 18 09:45:51 2015 -0600

Merge branch 'master' of github.com:flame/blis

commit 25021299b670775df8ca9c87910c63d7e74ed946
Merge: fe2b8d39 f05a5763
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 11 20:03:21 2015 -0600

Merge branch 'master' of github.com:flame/blis

commit fe2b8d39a445ac848686e78c7540fd046cb95492
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 11 19:33:10 2015 -0600

Fixed an obscure bug in 3mh/3m/4mh/4m packing.

Details:
- Modified bli_packm_blk_var1.c and _var2.c to increase the triangular
case's panel increment by 1 if it would otherwise be odd. This is
particularly necessary in _var2.c when handling the interleaved 3m
or ro/io/rpi pack schemas, since division of an odd number by 2 can
happen if both the panel length and the panel packing dimension
(register packing blocksize) are odd, thus making their product odd.
- Modified bli_packm_init.c so that panel strides are increased by 1
if they would otherwise be odd, even for non-3m related packing.
- Modified the trmm and trsm macro-kernels so that triangular packed
micro-panels are traversed with this new "increment by 1 if odd"
policy.
- Added sanity checks in trmm and trsm macro-kernels that would result
in an abort() if the conditions that would lead to a "divide odd
integer by 2" scenario ever manifest.
- Defined bli_is_odd(), _is_even() macros in bli_scalar_macro_defs.h.
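
The "increment by 1 if odd" policy can be sketched in a few lines. The macro
and function names below are hypothetical stand-ins, not the definitions in
bli_scalar_macro_defs.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parity macros. */
#define DEMO_IS_ODD(x)  (((x) % 2) != 0)
#define DEMO_IS_EVEN(x) (!DEMO_IS_ODD(x))

/*
 * Panel stride = panel length * packing dimension, increased by 1 if the
 * product would be odd, so that a later division by 2 is exact.
 */
size_t demo_panel_stride(size_t panel_len, size_t pack_dim)
{
    size_t ps = panel_len * pack_dim;

    if (DEMO_IS_ODD(ps))
        ps += 1;

    assert(DEMO_IS_EVEN(ps));
    return ps;
}
```

With a panel length of 3 and a packing dimension of 5, the product 15 is odd
and the stride becomes 16; with a panel length of 4 the product 20 is already
even and is left unchanged.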

commit 650d2a6ff2e593151a296ca86b5214afcc747afc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Feb 9 14:59:20 2015 -0600

Added initial support for imaginary stride.

Details:
- Added an imaginary stride field ("is") to obj_t.
- Renamed bli_obj_set_incs() macro to bli_obj_set_strides().
- Defined bli_obj_imag_stride() and bli_obj_set_imag_stride() and
added invocations in key locations.
- Added some basic error-checking related to imaginary stride.
- For now, imaginary stride will not be exposed into the most-used
BLIS APIs such as bli_obj_create(), and certainly not the
computational APIs such as bli_dgemm().

commit f05a57634a7c8e3864b25b3335d1194c1ea1aeb9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Feb 8 19:40:34 2015 -0600

Defined gemm cntl function to query ukrs func_t.

Details:
- Added a new function, bli_gemm_cntl_ukrs(), that returns the func_t*
for the gemm micro-kernels from the leaf node of the control tree.
This allows all the func_t* fields from higher-level nodes in the tree
to be NULL, which makes the function that builds the control trees
slightly easier to read.
- Call bli_gemm_cntl_ukrs() instead of the cntl_gemm_ukrs() macro in
all bli_*_front() functions (which is needed to apply the row/column
preference optimization).
- In all level-3 bli_*_cntl_init() functions, changed the _obj_create()
function arguments corresponding to the gemm_ukrs fields in higher-
level cntl tree nodes to NULL.
- Removed some old her2k macro-kernels.

commit cefd3d5d2001264de17cf63dae541f890cb9daaf
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Feb 5 11:09:12 2015 -0600

A couple of functions were incorrectly ifdeffed away on Xeon Phi. Fixed this

commit 7574c9947d57a19f613880e3b9f62f8c8f6df4ec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 4 12:11:55 2015 -0600

Added basic flop-counting mechanism (level-3 only).

Details:
- Added optional flop counting to all level-3 front-ends, which is
enabled via BLIS_ENABLE_FLOP_COUNT. The flop count can be
reset at any time via bli_flop_count_reset() and queried via
bli_flop_count(). Caveats:
- flop counts are approximate for her[2]k, syr[2]k, trmm, and
trsm operations;
- flop counts ignore extra flops due to non-unit alpha;
- flop counts do not account for situations where beta is zero.
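
A minimal sketch of such a counter is shown below. It assumes the conventional
2*m*n*k count for a real gemm and, matching the caveats above, ignores alpha
and beta. The demo_* names are hypothetical; the counter in BLIS is accessed
through bli_flop_count_reset() and bli_flop_count(), whose internals are not
shown here.

```c
#include <assert.h>

/* Running total, kept as a double so large counts do not overflow. */
static double demo_flops = 0.0;

void demo_flop_count_reset(void)
{
    demo_flops = 0.0;
}

double demo_flop_count(void)
{
    return demo_flops;
}

/* Record one real gemm of size m x n x k as 2*m*n*k flops. */
void demo_flop_count_gemm(long m, long n, long k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    demo_flops += 2.0 * (double)m * (double)n * (double)k;
}
```

A front-end would call the recording function once per operation, and a
benchmark would reset the counter before a timed region and query it after.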

commit ceda4f27d1f1bcf19320e09848e0f2e3b9941e6c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jan 29 13:22:54 2015 -0600

Implemented bli_obj_imag_equals().

Details:
- Implemented a new function, bli_obj_imag_equals(), which compares the
imaginary part of the first argument to the second argument, which may
be a BLIS_CONSTANT or of a regular real datatype.

commit 81114824a05a9053229efd577a8a94a856deda93
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jan 6 12:15:21 2015 -0600

Minor 4m/3m consolidation to mem_pool_macro_defs.h.

Details:
- Merged the 4m and 3m definitions in bli_mem_pool_macro_defs.h to
reduce code and improve readability.

commit 36a9b7b7436d9423ba4de2a9f85cfcd43577b783
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Wed Dec 17 21:53:50 2014 +0000

reduced the default number of MC by KC blocks for bgq

commit c60619c7c3568f044a849abbab60209aa7455423
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 16 17:08:22 2014 -0600

Minor tweaks for 3m4m test drivers.

Details:
- Changed gemm_kc blocksizes to be reduced by two-thirds instead of
half.
- Changed 3m4m/test_gemm.c driver to divide by 3 instead of 2 when
computing the fixed k dimension.
- Fixed runme.sh so that it would use multiple threads for s/dgemm
cases.

commit c6929ba6a5e6f633a7295e979a2b8df8c7ecdb1b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 16 11:27:50 2014 -0600

Added 4m_1b to test/3m4m test driver and script.

commit 785d480805fc0d6f4251b5499933515740b6b2a7
Merge: 9456f330 4156c088
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Dec 12 14:34:19 2014 -0600

Merge branch 'master' of github.com:flame/blis

commit 9456f330af4617f9ee32972d51f974aa2d84f97b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Dec 12 14:31:57 2014 -0600

Added 4m_1b implementation for gemm.

Details:
- Added yet another 4m-based implementation for complex domain level-3
operations. This method, which the 3m/4m paper identifies as Algorithm
"4m_1b" fissures the first loop around the micro-kernel so that the
real sub-panel of the current micro-panel of B is multiplied against
(both sub-panels of) all micro-panels of A, before doing the same for
the imaginary sub-panel of the micro-panel of B. For now, only gemm is
supported, and 4m_1b (labeled "4mb" within the framework) is not yet
integrated into the test suite.
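
The ordering described above, in which the real part of B is consumed against
both parts of A before the imaginary part of B is touched, can be mimicked at
the scalar level. The sketch below accumulates a complex dot product with four
real products per element in two passes. It illustrates the ordering only and
is not the 4m_1b macro-kernel; the function name and the split real/imaginary
storage are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Accumulate c += sum over p of a[p] * b[p], with a, b, and c stored as
 * separate real and imaginary parts.
 */
void demo_dot_4m_two_pass(size_t k,
                          const double *a_r, const double *a_i,
                          const double *b_r, const double *b_i,
                          double *c_r, double *c_i)
{
    assert(c_r != NULL && c_i != NULL);

    /* Pass 1: real part of b against both parts of a. */
    for (size_t p = 0; p < k; ++p) {
        *c_r += a_r[p] * b_r[p];
        *c_i += a_i[p] * b_r[p];
    }

    /* Pass 2: imaginary part of b against both parts of a. */
    for (size_t p = 0; p < k; ++p) {
        *c_r -= a_i[p] * b_i[p];
        *c_i += a_r[p] * b_i[p];
    }
}
```

Each pass reads only one half of b, which is the property the commit message
attributes to fissuring the first loop around the micro-kernel.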

commit 4156c0880d9aea4ff04a9c4fa139ba8c437d8bfb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 9 16:03:14 2014 -0600

Fixed obscure level-2 packing / general stride bug.

Details:
- Fixed a bug in certain structured level-2 operations that manifested
only when the structured matrix was provided to BLIS as a matrix stored
with general stride. The bug was introduced in c472993b when the
densify field was removed from the packm control tree node and
associated APIs. Since then, the packed object was unconditionally
marked with an uplo field of BLIS_DENSE. This is fine for level-3
operations where micro-panels are always densified, but in level-2
contexts, the underlying unblocked variant (fused or unfused) of
structured operations (e.g. trmv) still needs to know whether to
execute its "lower" or "upper" branches of code. Since this field
was unconditionally being set to BLIS_DENSE, the unblocked variants
always executed the "else" branch, which happened to be the
"lower" case code. Thus, running an upper case produced the wrong
answer. This most obviously manifested in the form of failures for
trmm, trmm3, and trsm in the test suite.
The bug was fixed by setting the packed object's uplo field to
BLIS_DENSE only if the schema indicated that micro-panels were to be
packed. Otherwise, we can assume we are packing to regular row or
column storage, as is the case with level-2 packing. Thanks to
Francisco Igual for reporting the testsuite failures and ultimately
leading us to this bug.

commit 689f60a578b461119e9ea90c74f642b9eb79addb
Merge: bef24e67 483e4d6a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Dec 7 14:03:30 2014 -0600

Merge pull request #21 from figual/master

Adding armv8a configuration and micro-kernels.

commit 483e4d6a3fdbef9d9ab47fb674c9476c70ca9f0f
Author: Francisco D. Igual <figualucm.es>
Date: Sun Dec 7 20:27:49 2014 +0100

Adding armv8a configuration and micro-kernels.

Only sgemm micro-kernel is fully functional at this point.

commit bef24e67e0f93579c2a80315348dc2e227f72a72
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Nov 26 18:00:56 2014 -0600

Fixed a type of race condition exposed by pthreads implementation.
The lead thread of the inner thread communicator could exit the subproblem, move on to the next iteration of the loop, and modify a1_pack, b1_pack, or c1_pack while other threads were still using them.

Barriers were inserted to fix this.

commit 76bde44411f0e34266bab9d666a54ef22be97320
Merge: e56e6143 f3d729e5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 26 17:25:24 2014 -0600

Merge branch 'master' of github.com:flame/blis

commit f3d729e504ec012e7dc7e02b2ecd42e004c6894d
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Wed Nov 26 22:25:24 2014 -0600

Added static mutex to bli_init and bli_finalize

commit d71cc797866ff502ad1127527016f463267eef80
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Wed Nov 26 21:35:39 2014 -0600

Refactored bli_threading files and added support for pthreads

commit e56e61438ff7fcf25a48c0b7603f18df782b50b6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 26 17:20:35 2014 -0600

Minor cleanups to bli_threading.h and friends.

Details:
- No longer need to define BLIS_ENABLE_MULTITHREADING manually in
bli_config.h; it now gets defined when BLIS_ENABLE_OPENMP or
BLIS_ENABLE_PTHREADS is defined.
- Added sanity check to prevent both BLIS_ENABLE_OPENMP and
BLIS_ENABLE_PTHREADS from being enabled simultaneously.
- Reorganization of bli_threading*.h header files, which led to
simplification of threading-related part of blis.h.
- Added "-fopenmp -lpthread" to LDFLAGS of sandybridge make_defs.mk
file.
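
The macro logic described in this commit can be sketched with the C
preprocessor. The three BLIS_ENABLE_* macro names come from the commit message;
the #error wording, the demo_* helpers, and the choice to define
BLIS_ENABLE_OPENMP at the top (standing in for a configuration header) are
assumptions made for illustration.

```c
#include <assert.h>

/* For this sketch, pretend the configuration enabled OpenMP. */
#define BLIS_ENABLE_OPENMP

/* Refuse to build if both threading models are requested at once. */
#if defined(BLIS_ENABLE_OPENMP) && defined(BLIS_ENABLE_PTHREADS)
#error "BLIS_ENABLE_OPENMP and BLIS_ENABLE_PTHREADS may not both be defined."
#endif

/* Derive the generic macro from whichever model is enabled. */
#if defined(BLIS_ENABLE_OPENMP) || defined(BLIS_ENABLE_PTHREADS)
#define BLIS_ENABLE_MULTITHREADING
#endif

/* Hypothetical run-time view of the compile-time choice. */
enum demo_threading { DEMO_SINGLE, DEMO_OPENMP, DEMO_PTHREADS };

enum demo_threading demo_threading_model(void)
{
#if defined(BLIS_ENABLE_OPENMP)
    return DEMO_OPENMP;
#elif defined(BLIS_ENABLE_PTHREADS)
    return DEMO_PTHREADS;
#else
    return DEMO_SINGLE;
#endif
}

int demo_multithreading_enabled(void)
{
#ifdef BLIS_ENABLE_MULTITHREADING
    return 1;
#else
    return 0;
#endif
}

/* Consistency check: the generic macro must agree with the chosen model. */
void demo_check_config(void)
{
    assert(demo_multithreading_enabled() ==
           (demo_threading_model() != DEMO_SINGLE));
}
```

Defining both model macros in this sketch stops compilation at the #error
line, which is the behavior the sanity check is meant to provide.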

commit 3be2744cbe2c56d38c23fd818aa5c1f10cc7ea51
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 21 12:28:08 2014 -0600

Update to template gemm ukernel comments.

Details:
- Updated comments on alignment of a1 and b1 to match wiki.

commit 994429c6881b2ade92d9d7949bcaebfbf2cc65eb
Merge: 58796abd 694029d9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Nov 20 13:55:35 2014 -0600

Merge pull request #20 from TimmyLiu/master

define PASTEF773 required by cblas compatibility layer

commit 694029d9d7db857d642ab536955c0621791108c8
Author: Timmy <timmy.liuamd.com>
Date: Wed Nov 19 15:25:14 2014 -0600

define PASTEF773 required by cblas compatibility layer

commit 58796abda66b133346f8d523b39178afc336351f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Nov 6 14:31:52 2014 -0600

Removed KC constraint comments from _kernel.h files.

Details:
- Since 4674ca8c, the constraint that KC be a multiple of both MR and
NR has been relaxed, and thus it was time to remove the comments
from the top of the bli_kernel.h files of all configurations.

commit 7bbc95a54f706d43c7f7951f0e5995f86130cd52
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 29 10:52:23 2014 -0500

Added new piledriver micro-kernels.

Details:
- Added new micro-kernels for the AMD piledriver architecture (one
for each datatype).
- Updates and tweaks to piledriver configuration.
- Added 3xk packm micro-kernel support.
- Explicitly unrolled some of the smaller packm micro-kernels.
- Added notes to avx/sandybridge and piledriver micro-kernel files
acknowledging the influence of the corresponding kernel code in
OpenBLAS.

commit 59613f1d5500f6279963327db2fbc84bc9135183
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 23 17:21:37 2014 -0500

Added separate micro-panel alignment for A and B.

Details:
- Changed the recently-added micro-panel alignment macros so that we now
have two sets--one for micro-panels of matrix A and one for micro-
panels of matrix B: BLIS_UPANEL_[AB]_ALIGN_SIZE_?.
- Store each set of alignment values into a separate blksz_t object in
bli_gemm_cntl_init().
- Adjusted packm_init() to use the separate alignment values.
- Added query routines for the new alignment values to bli_info.c.
- Modified test suite output accordingly.

commit a8e12884ee1fddd3fd77ca5a68aa0cb857f3af57
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 23 11:35:48 2014 -0500

CHANGELOG update (0.1.6)

0.1.6

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 23 11:35:45 2014 -0500

Version file update (0.1.6)

commit a3e6341bdb0e28411f935d6b4708a6389663e004
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 23 11:13:28 2014 -0500

Factored common code from blocksize functions.

Details:
- Split bli_determine_blocksize_[fb]() into two functions each, the
newer ones ending with the _sub suffix. These new sub-functions are
now called from bli_[gemm|trmm|trsm]_determine_kc_[fb](), which
eliminates redundant code and will allow any future tweaks to the
core sub-functions to automatically be inherited by the operation-
specific versions.

commit 4674ca8cffb58331ff7edf23bbe0e3f6a7558489
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 23 10:50:59 2014 -0500

Extended newly relaxed KC to hemm, symm.

Details:
- These changes were intended for the previous commit.
- Defined bli_gemm_determine_kc_[fb](), which determine blocksizes for
gemm-based operations, taking special care to "nudge" the kc dimension
up to a multiple of MR or NR for hemm and symm operations, as needed.
- Changed bli_gemm_blk_var3f.c to call bli_gemm_determine_kc_f()
instead of bli_determine_blocksize_f().
- Comment updates to bli_trmm_blocksize.c, bli_trsm_blocksize.c.

commit ab954ba6f874eaca7b001804491f866ef6b9b327
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 22 17:21:58 2014 -0500

Relaxed constraint that KC be multiple of MR, NR.

Details:
- Relaxed a long-held requirement in register blocksizes that required
the kernel programmer to choose a KC that was divisible by both MR
and NR. This was very constraining on some architectures that did not
use register blocksizes that were powers of two. The constraint is
now enforced only for trmm and trsm, where it is needed, and it is
now handled by "nudging" kc upward at runtime, if necessary, to be a
multiple of MR or NR, as needed.
- Defined bli_trmm_determine_kc_[fb]() and bli_trsm_determine_kc_[fb](),
which determine blocksizes for trmm and trsm, taking special care to
"nudge" the kc dimension up to a multiple of MR or NR, as needed.
- Changed bli_trmm_blk_var3[fb].c to call bli_trmm_determine_kc_[fb]()
instead of bli_determine_blocksize_[fb]().
- Added safeguard to bli_align_dim_to_mult() that returns the dimension
unmodified if the dimension multiple is zero (to avoid division by
zero).
- Removed cpp guard/check for KC % MR == 0 and KC % NR == 0 from
bli_kernel_macro_defs.h.
- Whitespace, variable name changes to bli_blocksize.c.
- Removed old commented code from bli_gemm_cntl.c.
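
Illustration: the "nudging" of kc and the zero-multiple safeguard described in this commit can be sketched as follows. The function names are hypothetical stand-ins (prefixed with sketch_), and the bodies are a minimal reading of the commit message rather than the BLIS implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for bli_align_dim_to_mult(): rounds dim up to the
   nearest multiple of dim_mult. A zero multiple returns dim unmodified,
   which avoids the division by zero mentioned in the commit. */
size_t sketch_align_dim_to_mult(size_t dim, size_t dim_mult)
{
    if (dim_mult == 0) return dim;
    return ((dim + dim_mult - 1) / dim_mult) * dim_mult;
}

/* Hypothetical kc selection for trmm/trsm: start from the configured KC and
   nudge it upward at runtime to a multiple of the relevant register
   blocksize (MR or NR). */
size_t sketch_determine_kc(size_t kc_default, size_t reg_blocksize)
{
    return sketch_align_dim_to_mult(kc_default, reg_blocksize);
}
```

With KC = 256 and a register blocksize of 6, the sketch yields 258; a KC that is already a multiple of the register blocksize is left unchanged.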

commit 95cdae65d6b88e043ee14bcd53cd2e800d7aecb4
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Oct 22 16:30:16 2014 -0500

Fixed bug in KNC microkernel where k=0 and beta != 1

commit e64dba5633fc49b768b5edc7762f2b5d8a4d0588
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 20 19:23:06 2014 -0500

Re-implemented micro-panel alignment.

Details:
- This commit re-implements a feature that was removed in commit
c2b2ab62. It was removed because, at the time, I wasn't sure how the
micro-panel alignment feature would interact with the 4m method (when
applied at the micro-kernel level), and so it seemed safer to disable
the feature entirely rather than allow possible breakage. This commit
revisits the issue and safely re-implements the feature in a way that
is compatible with 4m, 3m, 4mh, and 3mh (and native execution).
- Modified the static memory pool to account for micro-panel alignment
space.
- Modified packm_init and blocked variants to align whole micro-panels
by a datatype-specific alignment value that may be set by the
configuration. (If it is not set by the configuration, it will default
to BLIS_SIZEOF_?.)
- Modified macro-kernels so that:
- storage stride is handled properly given the new micro-panel
alignment behavior;
- indexing through 3m/4m/rih-type sub-panels, as is done by trmm and
trsm, is more robust (e.g. will work if the applicable packing
register blocksize is odd);
- imaginary strides are computed and stored within auxinfo_t structs,
which allows the virtual micro-kernels to more easily determine how
to index into the micro-panel operands.
- Modified virtual 3m and 4m micro-kernels to use the imaginary strides
within the auxinfo_t structs instead of panel strides.
- Deprecated the panel stride fields from the auxinfo_t structs.
- Updated test suite to print out the micro-panel alignment values.

commit add16b0e5402924301e7078e4ca5e3ef725bff0b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 17 11:49:24 2014 -0500

Added 3m4m test driver subdir of 'test'.

Details:
- Added a modified test driver for [cz]gemm that will test all 3m/4m
as well as assembly-based and OpenBLAS implementations of gemm
in single and multithreaded modes.

commit e171504a72406c61a173241d8bccf0a5ceb10582
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 17 11:25:59 2014 -0500

Use correct definition of bli_is_last_iter().

Details:
- As intended for previous commit, the new definition of
bli_is_last_iter() is now disabled in favor of the old
definition.

commit 0d954087b2b55d2f5f3c5e57d702b318ca2300f6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 17 11:19:34 2014 -0500

Minor changes and fixes.

Details:
- Redefined bli_is_last_iter() to take thread_id and num_thread
arguments, which allows the macro to correctly compute whether a
given iteration is the last that the thread will compute in that
particular loop. The new definition, however, remains disabled
(commented out) until someone can look at this more closely, as
the new definition seems to actually hurt performance slightly.
- Whitespace and related updates to level-3 macro-kernels.
- Updated test suite so that performance results in the hundreds of
gigaflops do not disrupt the column alignment of the output.
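
Illustration: one way a thread-aware "last iteration" test could look. This sketch ASSUMES that thread_id executes iterations thread_id, thread_id + num_threads, and so on (round-robin); the commit message does not state the distribution, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old-style check: true only for the final iteration of the whole loop. */
bool sketch_is_last_iter_old(size_t i, size_t n_iter)
{
    return i + 1 == n_iter;
}

/* Thread-aware check under the round-robin assumption: iteration i is the
   thread's last one if the next iteration it would take falls past the
   end of the loop. num_threads must be nonzero. */
bool sketch_is_last_iter(size_t i, size_t n_iter,
                         size_t thread_id, size_t num_threads)
{
    return (i % num_threads == thread_id) && (i + num_threads >= n_iter);
}
```

For a loop of 10 iterations split over 4 threads, thread 2 runs iterations 2 and 6, so iteration 6 is its last even though it is not the last iteration of the loop.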

commit d1e86e1876e433f54b501ec5a005b4ba7c5ce4e6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Oct 12 13:43:47 2014 -0500

More minor tweaks to sandybridge/avx micro-kernel.

Details:
- Re-enabled use of b_next for dgemm and cgemm micro-kernels.

commit 7b6fe4cae57cb22c09c1a97595e1a201a02cbcd2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Oct 12 12:01:51 2014 -0500

Minor tweaks to sandybridge/avx micro-kernels.

Details:
- Changed the MC blocksize for zgemm micro-kernel from 128 to 64.
- Removed usage of b_next in all x86_64/avx gemm micro-kernels.

commit a6a156e9feec47154e7a0fd43bcc006b1fc04aba
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 10 14:26:41 2014 -0500

Added cgemm ukernel for avx/sandybridge.

Details:
- Implemented AVX-based cgemm micro-kernel (via GNU extended inline
assembly syntax).
- Updated sandybridge configuration accordingly.

commit 6f8575ab2580e167a022293b76ddf0514f71b613
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 10 10:01:45 2014 -0500

Added zgemm ukernel for avx/sandybridge.

Details:
- Implemented AVX-based zgemm micro-kernel (via GNU extended inline
assembly syntax).
- Updated sandybridge configuration accordingly.

commit 23ce7ee542a12ca40b4b6090ad2558d180e16d37
Merge: 99fd9a39 7a8ad47f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 9 16:41:22 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit 99fd9a39718cb7281f6fb23f9fef7cca4fe514f4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 9 16:38:04 2014 -0500

Fixed two minor bugs.

Details:
- Fixed a bug in the test suite for the trsm_ukr and gemmtrsm_ukr test
modules whereby the uplo bits of some packed matrix objects were not
being set properly, resulting in false FAILURE results for those
tests. Thanks to Tyler Smith for bringing this issue to my attention.
- Fixed a bug in bli_obj_alloc_buffer() that caused an unnecessary
"not yet implemented" abort() when creating a 1x1 object with non-unit
strides.

commit 7a8ad47fb2d100a9da93aa8cab774fcceeaab733
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Oct 8 15:52:13 2014 -0500

Minor changes to knc configuration, including a preference for row-major storage.
Also fixed a bug in the knc micro-kernel where it would fail if k == 0.

commit 76b7c34af0c09f47d9615b18857a356acddc788a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 2 14:15:38 2014 -0500

Fixed a bug in the pack schema-related bit macros.

Details:
- Expanded the BLIS_PACK_SCHEMA_BITS value in bli_type_defs.h to
include all six bits presently used in the pack schema bitfield of
the info field of obj_t structs. Prior to this commit, the macro
constant only included the lowest five bits, which excluded the
"is or is not packed" bit. This manifested as a strange bug in
probably many level-2 codes that invoked packing, though we only
observed it in ger before fixing. Thanks to Devin Matthews for
finding and reporting this bug.
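
Illustration: why a mask that is one bit too narrow hides the "is or is not packed" bit. The bit positions and names below are HYPOTHETICAL and chosen only to show the effect; they are not the values used in bli_type_defs.h.

```c
#include <stdint.h>

/* Hypothetical layout: five low bits describe the packing format and a
   sixth bit records whether the object is packed at all. */
#define SKETCH_PACK_FORMAT_BITS      0x1Fu  /* lowest five bits */
#define SKETCH_PACKED_BIT            0x20u  /* sixth bit */
#define SKETCH_PACK_SCHEMA_BITS_OLD  SKETCH_PACK_FORMAT_BITS
#define SKETCH_PACK_SCHEMA_BITS_NEW  (SKETCH_PACK_FORMAT_BITS | SKETCH_PACKED_BIT)

/* Extract the pack schema from an info word using the given mask. */
uint32_t sketch_pack_schema(uint32_t info, uint32_t mask)
{
    return info & mask;
}
```

Under the narrow mask, an info word with the packed bit set and one without it yield the same schema, so code comparing schemas cannot tell them apart; the six-bit mask keeps them distinct.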

commit a5763e332226598d70c47dfa9cad4578e15ef5f4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 2 13:28:17 2014 -0500

Added extra output to bli_obj_print().

Details:
- Print extra values from info field of obj_t struct within
bli_obj_print().

commit 9bba209fc44fbfce943ba6a51cd8278a0cb6b159
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Sep 29 14:56:36 2014 -0500

Fixed bug when packing anywhere besides in blk_var_1 for gemm.

commit 614a4afc9272adb47e5a8b83b39d56c2804d95d6
Merge: b541b667 4a7df04e
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Sep 26 10:49:57 2014 -0500

Merge branch 'master' of http://github.com/flame/blis

commit 4a7df04e8a4ffdb9561d26426afd35e4fe15b013
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 22 16:06:15 2014 -0500

Added 30xk support for packm ukernels.

Details:
- Updated bli_kernel_*_macro_defs.h headers to include default
definitions for 30xk packm kernels.
- Extended function pointer arrays in bli_packm_cxk_*() out to 31 and
included 30xk kernels.
- Added 30xk kernels to frame/1m/packm/ukernels/bli_packm_ref_cxk_*.c.

commit b6d4bd792e0d44ce4b28afef343f5ff3ba89c285
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 22 16:02:37 2014 -0500

Fixed missing tabs from Makefile patch.

commit 32630f9b6f0d5ba28d5b56dae4c7288a37158743
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 19 17:18:20 2014 -0500

Comment update to virtual micro-kernels.

commit 13447cffead7c6d137a7a3ccbf9e552ed0477467
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 19 13:00:48 2014 -0500

Minor bugfix to top-level Makefile.

Details:
- Applied a patch that allows the top-level Makefile to work on certain
systems. The patch simply separates out the source-to-object code
generation rules for .c and .S files into two separate rules. Thanks
to Devin Matthews for submitting this patch.

commit e80a4537846416719c067ae08a53aeda978c572d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 18 10:24:20 2014 -0500

Fixed bug introduced by bugfix in 25b258d.

Details:
- We actually need to check alignment of lda*sizeof(double) and NOT
a+lda because in the latter case, alignment could cancel out and
still allow the optimized code to run when it shouldn't. Thanks
to Devin for pointing this out.
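
Illustration: how checking a+lda can let misalignment "cancel out". The sketch works on integer byte addresses so the arithmetic is explicit. The 16-byte alignment used in the example and all function names are assumptions made for illustration, not taken from the kernels.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if value is a multiple of align (align must be nonzero). */
bool sketch_is_aligned(uintptr_t value, size_t align)
{
    return value % align == 0;
}

/* Check described for 25b258d: test the address of a + lda, where addr_a
   is the byte address of a and lda counts doubles. */
bool sketch_check_a_plus_lda(uintptr_t addr_a, size_t lda, size_t align)
{
    return sketch_is_aligned(addr_a + lda * sizeof(double), align);
}

/* Check described for this commit: test the byte size of the leading
   dimension on its own. */
bool sketch_check_lda_bytes(size_t lda, size_t align)
{
    return sketch_is_aligned(lda * sizeof(double), align);
}
```

With a at byte address 0x1008 and lda = 3, a+lda lands on 0x1020, which is 16-byte aligned, even though lda*sizeof(double) = 24 is not a multiple of 16. The first check would accept this case; the second rejects it.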

commit 25b258d61f9c8cee64e922f4131784b6edb196dd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 18 10:10:49 2014 -0500

Fixed a non-fatal problem with bugfix in a68b316c.

Details:
- The bugfix in a68b316c was inadvertently checking the alignment of the
leading dimension itself, rather than the byte size of the leading
dimension. Now, we simply check alignment of a+lda.

commit 96302d4fc81363410e41c3a3c43a65df44d97ad9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 18 09:43:40 2014 -0500

Renamed bli_info_get_*_ukr_type() functions.

Details:
- Added _string() suffix to bli_info_get_*_ukr_type() function names.
This makes them consistent with the bli_info_get_*_impl_string()
functions.

commit a68b316ca4852509f84ed50e01afac486bf70f58
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Sep 17 11:10:07 2014 -0500

Fixed alignment bugs in level-1f kernels.

Details:
- Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels
were attempting to compute problems with unaligned leading dimensions
with optimized code, rather than (correctly) using the reference
implementations. Thanks to Devin Matthews for reporting this bug.

commit 870761eb902e4866090d1d3446a345df3d6d4599
Merge: e9899be0 a2b59a37
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 16 18:20:49 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit e9899be09044829e23386bd73e394f1dd7778210
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 16 18:19:32 2014 -0500

Added high-level implementations of 4m, 3m.

Details:
- Added "4mh" and "3mh" APIs, which implement the 4m and 3m methods at
high levels, respectively. APIs for trmm and trsm were NOT added due
to the fact that these approaches are inherently incompatible with
implementing 4m or 3m at high levels (because the input right-hand
side matrix is overwritten).
- Added 4mh, 3mh virtual micro-kernels, and updated the existing 4m and
3m so that all are stylistically consistent.
- Added new "rih" packing kernels (both low-level and structure-aware)
to support both 4mh and 3mh.
- Defined new pack_t schemas to support real-only, imaginary-only, and
real+imaginary packing formats.
- Added various level0 scalar macros to support the rih packm kernels.
- Minor tweaks to trmm macro-kernels to facilitate 4mh and 3mh.
- Added the ability to enable/disable 4mh, 3m, and 3mh, and adjusted
level-3 front-ends to check enabledness of 3mh, 3m, 4mh, and 4m (in
that order) and execute the first one that is enabled, or the native
implementation if none are enabled.
- Added implementation query functions for each level-3 operation so
that the user can query a string that describes the implementation
that is currently enabled.
- Updated test suite to output implementation types for each level-3
operation, as well as micro-kernel types for each of the five micro-
kernels.
- Renamed BLIS_ENABLE_?COMPLEX_VIA_4M macros to _ENABLE_VIRTUAL_?COMPLEX.
- Fixed an obscure bug when packing Hermitian matrices (regular packing
type) whereby the diagonal elements of the packed micro-panels could
get tainted if the source matrix's imaginary diagonal part contained
garbage.
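
Illustration: the selection order described in this commit (3mh, 3m, 4mh, 4m, then native) can be sketched as a simple cascade. The struct and function names are hypothetical; the real front-ends and query functions are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical runtime switches for the induced methods. */
typedef struct {
    bool enable_3mh;
    bool enable_3m;
    bool enable_4mh;
    bool enable_4m;
} sketch_induced_state_t;

/* Return a string naming the implementation a level-3 front-end would
   pick: the first enabled method in the order 3mh, 3m, 4mh, 4m, or the
   native implementation if none are enabled. */
const char *sketch_select_impl(const sketch_induced_state_t *s)
{
    if (s->enable_3mh) return "3mh";
    if (s->enable_3m)  return "3m";
    if (s->enable_4mh) return "4mh";
    if (s->enable_4m)  return "4m";
    return "native";
}
```

Returning a descriptive string mirrors the query functions mentioned in the commit, which let the user ask which implementation is currently enabled.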

commit a2b59a37f166f70a6dd5793db2530823ef590c2b
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Sep 15 10:44:44 2014 -0500

Fixed make defs so that they actually compile for bulldozer

commit 86fc7e40764f78ec217f50216ef4fa5b57dbfbc7
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Sep 15 10:35:46 2014 -0500

Added bulldozer configuration and updated piledriver micro-kernel

commit 0644e61a79a57f136be5f4c47b9099cff2af06e0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 11 12:55:34 2014 -0500

Minor updates to bli_packm_init.c.

commit 9dc9b44a057a08e20ad4d423344f0ecad54c1eb2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 11 12:03:28 2014 -0500

Renamed bli_obj_pack_status() to _pack_schema().

Details:
- Renamed the bli_obj_pack_status() macro to bli_obj_pack_schema() in
order to help avoid confusion as to what the macro returns.

commit cf5efdde0588a0d5b6ea57fe7d7be5000be06f8e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 11 11:47:56 2014 -0500

Pass pack_t schemas into ukernels via auxinfo_t.

Details:
- Modified macro-kernels to pass the pack_t schema values for matrices
A and B into the datatype-specific functions, where they are now
inserted into a newly-expanded auxinfo_t struct. This gives the
micro-kernels access to the pack_t schema values embedded in the
control trees, which determine the precise format into which the
matrix elements are packed.
- Updated a call to bli_packm_init_pack() in src/test_libblis.c to
remove densify argument. Meant to include this in commit c472993b.

commit cc8d2b82775cca3c2d51bf427f4e77c8024a6d15
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 9 13:48:22 2014 -0500

Updated old test drivers in 'test'.

commit c472993bbccb69e9ffc409c79b742426c8ad2ad4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 9 13:42:04 2014 -0500

Removed densify argument to packm_cntl_obj_create().

Details:
- Removed the "densify" bool_t argument to bli_packm_cntl_obj_create().
This argument was inserted very early in BLIS's development, when it
was anticipated that the developer may sometimes wish to pack a
Hermitian, symmetric, or triangular matrix without making it dense.
But as it turns out, if we are packing a matrix, we always want to
make it dense in some way or another due to the fact that the micro-
kernel only multiplies dense micro-panels. Thus, unless/until there
is a real need for the feature, it seems reasonable to remove it from
the packm_cntl API.

commit 5c43ee387146cd76dc59b730dac6683a8446b834
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 8 15:19:29 2014 -0500

Moved trmm4m/3m_cntl files to 'old' directory.

Details:
- Meant to include this in previous commit.

commit 7b2f469d5465ed73b1ca88124bc9a1987388aa27
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 8 14:49:50 2014 -0500

Retired trmm_t control tree definitions, usage.

Details:
- Replaced all trmm_t control tree instances and usage with that of
gemm_t. This change is similar to the recent retirement of the herk_t
control tree.
- Tweaked packm blocked variants so that the triangular code does NOT
assume that k is a multiple of MR (when A is triangular) or NR (when
B is triangular). This means that bottom-right micro-panels packed for
trmm will have different zero-padding when k is not already a multiple
of the relevant register blocksize. While this creates a seemingly
arbitrary and unnecessary distinction between trmm and trsm packing,
it actually allows trmm to be handled with one control tree, instead
of one for left and one for right side cases. Furthermore, since only
one tree is required, it can now be handled by the gemm tree, and thus
the trmm control tree definitions can be disposed of entirely.
- Tweaked trmm macro-kernels so that they do NOT inflate k up to a
multiple of MR (when A is triangular) or NR (when B is triangular).
- Misc. tweaks and cleanups to bli_packm_struc_cxk_4m.c and _3m.c, some
of which are to facilitate above-mentioned changes whereby k is no
longer required to be a multiple of register blocksize when packing
triangular micro-panels.
- Adjusted trmm3 according to above changes.
- Retired trmm_t control tree creation/initialization functions.

commit 576e9e9255a79dba9cd3c804267f51e0b4aa6e8a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Sep 7 16:12:52 2014 -0500

Retired herk_t control tree definitions, usage.

Details:
- Replaced all herk_t control tree instances and usage with that of
gemm_t, since the two types presently have the same fields. This means
that herk, her2k, syrk, and syr2k can simply use the gemm control tree
as-is, just as hemm and symm have been doing for some time now.
- Retired herk_t control tree creation/initialization functions.
- Retired many _target.c and .h files into 'old' directories.

commit b2fed052c9a23d858ef0afbe220b342bce9aa7f7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Sep 3 17:07:25 2014 -0500

Minor code cleanup to bli_packm_struc_cxk*.c

Details:
- Realized that we don't need to track rs_p11 and cs_p11 for
Hermitian/symmetric case of bli_packm_struc_cxk*(). They are always
equal to rs_p and cs_p.

commit 023ce770966b3b5a98bba729c5af1f45e15ebb97
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Sep 3 10:47:53 2014 -0500

Minor update to packm_cxk kernels.

Details:
- Changed m and n dimension parameter names to panel_dim and panel_len,
respectively, in packm_cxk, packm_cxk_3m, packm_cxk_4m kernel wrapper
functions. This makes the code a little easier to read since "m" and
"n" have connotations that are not applicable here.
- Comment updates.

commit 189def3667d9218adbeec45e2801fd074341a679
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 1 16:23:17 2014 -0500

Retired portions of bli_kernel_3m/4m_macro_defs.h.

Details:
- Removed sections of bli_kernel_[4m|3m]_macro_defs.h that defined
4m/3m-specific blocksizes after realizing that this can be done in
bli_gemm[4m|3m]_cntl.c, since that is (mostly) the only place they
are used.
- The maximum cache values for 4m/3m are still needed when computing mem
pool dimensions in bli_mem_pool_macro_defs.h. As a workaround, "local"
definitions in terms of the regular cache blocksizes are now in place.
- Similarly, the register blocksizes for 4m/3m are still needed in
bli_kernel_post_macro_defs.h. As a workaround, "local" definitions in
terms of the regular register blocksizes are now in place.

commit af521ee6f2a77d61c98b833e85c09969987bc00d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 1 14:06:46 2014 -0500

Changed semantics of blocksize extensions.

Details:
- Changed semantics of cache and register blocksize extensions so that
the extended values are tracked, rather than just the marginal
extensions.
- BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
- BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
- bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
that these "max" query routines grab the maximum value for cache
blocksizes and the packdim value for register blocksizes.
- bli_info_*() API has been updated accordingly.
- All configurations have been updated accordingly.

commit 07f23aefd52f5ba4960dbd46e59b180a2136b8e9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 31 11:58:50 2014 -0500

Pass pack schema into packm_struc_cxk*().

Details:
- Changed the interface to the packm_struc_cxk*() kernels to include
the pack_t schema. This allows the implementation to more easily
determine how the micro-panel is stored (row-stored column panel
or column-stored row panel).
- Updated packm blocked variants to pass in the schema.
- Updated packm_ker_t function pointer definition accordingly.

commit f032ba9b1186cb02184574d339565f53d733aa42
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 30 16:21:20 2014 -0500

Reorganized packm implementation.

Details:
- Reorganized packm variants and structure-aware kernels so that all
routines for a given pack format (4m, 3m, regular) reside in a single
file.
- Renamed _blk_var4 to _blk_var2 and generalized so that it will work for
both 4m and 3m, and adjusted 4m/3m _cntl_init() functions accordingly.
- Added a new packm_ker_t function pointer type to bli_kernel_type_defs.h
to facilitate function pointer typecasting in the datatype-specific
packm_blk_var2() functions.
- Deprecated _blk_var3.
- Fixed a bug in the triangular micro-panel packing facility that
affected trmm and trmm3 with unit diagonals.

commit c6793cecb70788bdf2c76ab8102504ea97be9d2a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 28 17:14:48 2014 -0500

Reorganized includes for scalar macro headers.

Details:
- Reordered the include statements in bli_scalar_macro_defs.h so that
conventional, ri-, and ri3-based macros are grouped together.
- Renamed bli_eqri.h (and macros within) to end with 'ris' suffix.

commit b4da8907284345be4374f87a88679c4886ab866e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 28 14:10:32 2014 -0500

Whitespace, comments updates on packm_blk_var?.c.

commit 46e46a1d83da586c3dd9fd7a01eb16067abbaee1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 28 12:05:45 2014 -0500

Minor updates to packm blocked, cxk_3m/4m code.

Details:
- Added 'const' qualifier to inlined packing code that handles
micro-panel packing that is too large for an existing packm ukernel.
- Comment updates.

commit 908dc688b5979995eaacb3aa937f241551a8df00
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 28 11:55:12 2014 -0500

Pass pack schema into blocked packm routines.

Details:
- Rather than passing the packm blocked routines a boolean value that
represents whether the matrix is being packed to row or column storage,
we now pass in the pack schema itself.

commit a0ff6066e06075ab5f92b19247b39b92ed15f1bf
Merge: c4c99c48 d40b32bc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 24 15:56:21 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit c4c99c4813bf9817592a7899c5d33412fe22313f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 24 15:52:22 2014 -0500

Renamed packm scalar from beta to kappa.

Details:
- The packm implementation (i.e. source files in frame/1m/packm and
frame/1m/packm/ukernels) interchangeably used the names "beta" and
"kappa" to refer to the optional scalar to be applied during packing.
This commit renames all uses of "beta" to be "kappa", since "beta"
sometimes evokes the scalar specifically on the output matrix of a
level-2 or level-3 operation.
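
Illustration: where the optional packing scalar enters a reference-style packing loop. This is a minimal sketch for real doubles with hypothetical names; it is not the BLIS packm kernel and it omits conjugation, zero-padding, and structure handling.

```c
#include <stddef.h>

/* Hypothetical reference packing routine: copies a panel_dim x panel_len
   panel of A (element stride inca within a column of the panel, lda
   between columns) into the buffer p with leading dimension ldp, scaling
   every element by kappa along the way. */
void sketch_packm_cxk(size_t panel_dim, size_t panel_len, double kappa,
                      const double *a, size_t inca, size_t lda,
                      double *p, size_t ldp)
{
    for (size_t l = 0; l < panel_len; ++l)
        for (size_t i = 0; i < panel_dim; ++i)
            p[l * ldp + i] = kappa * a[l * lda + i * inca];
}
```

Because kappa is applied while packing, it is distinct from the beta that scales the output matrix of a level-2 or level-3 operation, which is the confusion the rename is meant to avoid.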

commit d40b32bc24ffbae24123e054307b3138969bb095
Merge: 9331f794 6c25c379
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 24 13:46:36 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit 6c25c379fadb50834146e1614f7b80c093c2aad0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 24 13:44:10 2014 -0500

Consolidated unpackm ukernels into single file.

Details:
- Reorganized unpackm ukernels into a single file,
bli_unpackm_ref_cxk.c, in a manner similar to what was done for packm
ukernels in commit 4cc2b46.

commit 9331f79443223fe267676ee54c439e1ed320380c
Merge: 7fc48a7d 670b6392
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 24 10:54:21 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit 670b63926a7f4fc694abc5b1582ef8a4f367f5a8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 24 10:46:27 2014 -0500

Added whitespace to bli_obj_scalar_ routine calls.

Details:
- Added extra spaces to align arguments of
bli_obj_scalar_init_detached_copy_of(). This misalignment was due to
the fact that the function was previously named
bli_obj_init_scalar_copy_of() and the name change, performed in
b444489f, was done via recursive sed commands which left subsequent
lines untouched.

commit 7fc48a7d920e07fd8e9528ab2565123f8f4e67f9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 23 16:50:58 2014 -0500

Combined 4m/3m bits into an expanded bitfield.

Details:
- Combined the 4m/3m bits into an expanded bitfield, which will encode
the packing "format" of the micro-panels. This will allow for more
easily and compactly encoding additional formats.
- Other minor comment/whitespace updates to bli_type_defs.h.
- Updated bli_obj_macro_defs.h and bli_param_macro_defs.h to use the new
format bitfield.
- Comment update to bli_kernel_post_macro_defs.h.
- Whitespace changes to bli_kernel_3m_macro_defs.h, _4m_macro_defs.h.

commit ef0143cc1417e4815e4cafd5a464cc83fe7a1e86
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 23 14:02:27 2014 -0500

Renamed _ri, _ri3 packm ukernels to _4m, _3m.

Details:
- Renamed packm ukernels, _cxk dispatcher, and structure-aware _cxk
helper functions to use _4m and _3m instead of _ri and _ri3 suffixes.
- Updated names of cpp macros that correspond to packm ukernels.

commit b0ccac116158b5ed3316d34798748ba0c6d78672
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 21 19:21:52 2014 -0500

Cleaned up front-end layering for 4m/3m.

Details:
- Added an extra layer to level-3 front-ends (examples: bli_gemm_entry()
and bli_gemm4m_entry()) to hide the control trees from the code that
decides whether to execute native or 4m-based implementations. The
layering was also applied to 3m.
- Branch to 4m code based on the return value of bli_4m_is_enabled(),
rather than the cpp macros BLIS_ENABLE_?COMPLEX_VIA_4M. This lays
the groundwork for users to be able to change at runtime which
implementation is called by the main front-ends (e.g. bli_gemm()).
- Retired some experimental gemm code that hadn't been touched in
months.

commit bedec95451cabfa7a8906b51018a5e0572998a5e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 21 18:25:48 2014 -0500

Added bli_4m API for querying 4m enabled state.

Details:
- Added bli_4m.c (and header), which defines a simple API that can be
used to query, enable, and disable 4m-based complex support in BLIS.
The macros BLIS_ENABLE_?COMPLEX_VIA_4M are now used to initialize
the variable that determines the state (enabled or disabled).
- Changed bli_info*() API so that all cache and register blocksize-
related query routines return the blksz_t objects' values as they
exist at runtime, rather than return the values as determined by the
configuration system (e.g. bli_kernel.h, or defaults for those values
not specified). This sets the foundation for being able to change
those blocksizes at runtime.
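
Illustration: a query/enable/disable API whose state is initialized from a configuration-time macro. Only bli_4m_is_enabled() is named in this changelog (in the later front-end commit); the sketch_ names, the enable/disable signatures, and the stand-in macro below are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for a configuration macro in the spirit of
   BLIS_ENABLE_?COMPLEX_VIA_4M; defined here only so the sketch is
   self-contained. */
#define SKETCH_ENABLE_COMPLEX_VIA_4M

/* Runtime state: the macro only supplies the initial value. */
#ifdef SKETCH_ENABLE_COMPLEX_VIA_4M
static bool sketch_4m_state = true;
#else
static bool sketch_4m_state = false;
#endif

bool sketch_4m_is_enabled(void) { return sketch_4m_state; }
void sketch_4m_enable(void)     { sketch_4m_state = true; }
void sketch_4m_disable(void)    { sketch_4m_state = false; }
```

Keeping the state in a variable rather than testing the macro directly is what allows the choice to change at runtime.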

commit b541b667cabfa6d41b50ad1e49209651ee6812cc
Merge: 699a8151 dd61307f
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Aug 20 14:44:51 2014 -0500

Merge branch 'master' of http://github.com/flame/blis

Conflicts:
frame/3/trsm/bli_trsm_blk_var2b.c
frame/3/trsm/bli_trsm_blk_var2f.c

commit 699a8151ca3d5021e834a1784ef45dcc3a3d17cd
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Aug 20 14:43:17 2014 -0500

Some improvements to trsm parallelism

commit dd61307f55bb6bc762fe0ef0446479d6c0536723
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 20 09:52:16 2014 -0500

Minor update to sandybridge MC_S, KC_S.

Details:
- Changed sandybridge MC and KC for single-precision real to 128 and 384,
respectively.
- Updated comments in template configuration's gemm micro-kernel file
to document the new "contiguous row preference" macro.

commit d0eec4bddd740ce360d0f655362c551287cf925b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 19 15:49:19 2014 -0500

Added optional row preference to ukernel config.

Details:
- Added the ability for the kernel developer to indicate the gemm micro-
kernel as having a preference for accessing the micro-tile of C via
contiguous rows (as opposed to contiguous columns). This property may
be encoded in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS,
which may be defined or left undefined. Leaving it undefined leads to
the default assumption of column preference.
- Changed conditionals in frame/3/*/*_front.c that induce transposition
of the operation so that the transposition is induced only if there
is disagreement between the storage of C and the preference of the
micro-kernel. Previously, the only conditional that needed to be met
was that C was row-stored, which is to say that we assumed the micro-
kernel preferred column-contiguous access on C.
- Added a "prefers_contig_rows" property to func_t objects, and updated
calls to bli_func_obj_create() in _cntl.c files in order to support
the above changes.
- Removed the row-storage optimization from bli_trsm_front.c because
it is actually ineffective. This is because the right-side case of
trsm flips the A and B micro-panel operands (since BLIS only requires
left-side gemmtrsm/trsm kernels), meaning any transposition done
at the high level is then undone at the low level.
- Tweaked trmm, trmm3 _front.c files to eliminate a possible redundant
invocation of the bli_obj_swap() macro.
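
The revised transposition rule from the second bullet can be sketched as a small predicate. This is only an illustration: the storage_t type and both function names are hypothetical and are not taken from the BLIS sources.

```c
#include <stdbool.h>

/* Hypothetical storage tags; these are not BLIS's actual types. */
typedef enum { STORED_BY_COLS, STORED_BY_ROWS } storage_t;

/* New rule: induce a transposition only when the storage of C disagrees
   with the access pattern that the gemm micro-kernel prefers. */
bool should_induce_transpose(storage_t c_storage, bool ukr_prefers_rows)
{
    bool c_is_row_stored = (c_storage == STORED_BY_ROWS);
    return c_is_row_stored != ukr_prefers_rows;
}

/* Old rule: transpose whenever C is row-stored, which implicitly assumes
   a column-preferring micro-kernel. */
bool should_induce_transpose_old(storage_t c_storage)
{
    return c_storage == STORED_BY_ROWS;
}
```

The two rules agree whenever the micro-kernel prefers columns; they differ only for a row-preferring micro-kernel.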

commit 4cc2b464f29cafbfef9295b073b857fe0752f710
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 15 11:49:15 2014 -0500

Reorganized packm ukernels.

Details:
- Previously, packm micro-kernels were organized by the implied register
blocksize (panel dimension) assumed by the kernel, meaning conventional,
ri, and ri3 variations of some micro-kernel size were housed in the same
file. This commit reorganizes the micro-kernels so that all sizes reside
in the same file for each format type (conventional, ri, and ri3).

commit fcc10054a11b6fc3976986f57feccf741596cbf6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 13 12:32:06 2014 -0500

Tweaks to gemm4m, gemm3m virtual ukernels.

Details:
- Fixed a potential, but as-yet unobserved bug in gemm3m that would
allow undesirable inf/NaN propagation, since C was being scaled by
beta even if it was equal to zero.
- In gemm3m micro-kernel, we now avoid copying C to the temporary
micro-tile if beta is zero.
- Rearranged computation in gemm4m so that the temporary C micro-tile
is accessed less, and C is accessed only after the micro-kernel
calls. This improves performance marginally in most situations.
- Comment updates to both gemm4m and gemm3m micro-kernels.

commit cdcbacc2fa871317c8e7ef961ecc6d70ab22dc34
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 12 12:45:38 2014 -0500

Removed redundant redef of packm ukr prototypes.

Details:
- Removed redundant macro code that redefined packm ukernel prototypes
when the previous macro was already sufficient. This helps de-clutter
the packm ukernel prototyping headers a little bit.

commit 82dac98d9032ccb598068a55ddf23d7898491e9e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 12 12:36:25 2014 -0500

Relocated packm ukernel includes.

Details:
- Consolidated the include statements for packm ukernel headers from
bli_packm_cxk.h, bli_packm_cxk_ri.h, and bli_packm_cxk_ri3.h to
bli_packm.h.
- Comment/whitespace updates to bli_packm_blk_var3.c, _var4.c.

commit 7f77856e25aad5fc6f172ed3e57b6351804e31a4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 12 12:20:15 2014 -0500

Removed unused 4m/3m-related packm macro defs.

Details:
- Removed unused and unneeded s- and d-flavored macro definitions for
packm ukernels related to the complex 4m and 3m methods, as
implemented in BLIS.

commit bc1d86b2d4d436b1dfba2d0098501aaca9cbb8b5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 7 19:01:20 2014 -0500

Sandy Bridge configuration, micro-kernel update.

Details:
- Minor updates to bli_config and bli_kernel.h for sandybridge
configuration.
- Renamed existing AVX intrinsic-based micro-kernel file to
bli_gemm_int_d8x4.c.
- Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based
gemm micro-kernels for single- and double-precision real.

commit 98ec95877a95242e159b2bf0c879115a59e4c6e2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 7 18:28:32 2014 -0500

Corrected comment for _obj_is_[row|col]_stored().

Details:
- Fixed a mistake in the comments introduced in the previous commit for
bli_obj_is_row_stored() and bli_obj_is_col_stored().

commit 43d5e419e1b424d2143817103dbee8ead797e8aa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 7 18:20:40 2014 -0500

Reverted _obj_is_[row|col]_stored() macros.

Details:
- Rolled back recent changes to bli_obj_is_row_stored() and
bli_obj_is_col_stored() so that those macros now only inspect the
strides (row or column). It turns out that the more sophisticated
definitions introduced in a51e32e are not necessary, because these
"obj" macros are virtually never used on packed matrices, and when
they are, they can use bli_obj_is_[row|col]_packed() macros, which
inspect the info bitfield.

commit 45692e3ad4b7e1d05ac4302398df4efce04b4284
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 7 13:21:15 2014 -0500

Reverted some accidental changes.

Details:
- Reverted some changes that were unintentionally included in the
previous commit (9526ce98). Thanks to Tony Kelman for pointing
this out. (Note: a few select changes were not reverted.)

commit 9526ce98812be908bc4915f2849b657fb6ce1b49
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 6 14:13:46 2014 -0500

Updated copyright headers of emscripten configuration files.

commit 30833ed71d56f231ddba21e632bcbbc90b12a97c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 6 12:12:03 2014 -0500

Minor edits to configurations' make_defs.mk files.

Details:
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
is defined first and then the other two are defined in terms of
CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
little easier to read.

commit 9d61afeae2ba70fe1df07e7546f6954ea83aed12
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Aug 4 16:01:59 2014 -0500

CHANGELOG update (0.1.5)

0.1.5

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Aug 4 16:01:58 2014 -0500

Version file update (0.1.5)

commit 4c6ceea4be35d089630986eb5b959b9e97214077
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Aug 4 15:49:59 2014 -0500

Added CBLAS compatibility layer.

Details:
- Added a new section in bli_config.h files of all configurations for
enabling CBLAS support. (Currently, the default is for the CBLAS layer
to be disabled.)
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
subdirectory 'f77_sub' holds subroutine wrappers corresponding to
subroutines found in CBLAS that allow calling some BLAS routines with
the return value passed as the last argument rather than as an actual
(function) return value. This was probably intended to allow CBLAS to
avoid the whole f2c debacle altogether. However, since BLIS does not
assume the presence of a Fortran compiler, we had to provide similar
routines in C.
- A script, integrate-cblas-tarball.sh, is included to streamline the
integration of future revisions of the CBLAS source code.
- The current tarball, cblas.tgz, that was used with the above script to
generate the present set of CBLAS source code is also included.
- Updated blis.h to include necessary CBLAS-related headers.

commit caab62dac0fb0bd0d674118f409c81680db94d29
Merge: 383631b5 db97ce97
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 3 14:36:18 2014 -0500

Merge pull request 19 from kevinoid/fix-install-perms-error

Fix permissions error installing to non-owned directory

commit db97ce979b88c051922c2f946ce52d523c7a12c6
Author: Kevin Locke <kevinkevinlocke.name>
Date: Sun Aug 3 12:48:04 2014 -0600

Fix permissions error installing to non-owned directory

When installing to a directory which is not owned by the installing
user, even when the user has write permission for the directory, the
installation can fail with an error similar to the following:

Installing libblis-0.1.4-7-sandybridge.a into /usr/local/lib/
install: cannot change permissions of ‘/usr/local/lib’: Operation not permitted
Makefile:658: recipe for target '/usr/local/lib/libblis-0.1.4-7-sandybridge.a' failed
make: *** [/usr/local/lib/libblis-0.1.4-7-sandybridge.a] Error 1

In the example case, the error occurred because the user attempted to
install to /usr/local and /usr/local/lib is owned by root with mode 2755
which the Makefile unsuccessfully attempted to change to 0755.

Given that installing to /usr/local is likely to be quite common and the
ownership/permissions are the default for Debian and Debian-derived
Linux distributions (perhaps others as well), this commit attempts to
support that use case by using mkdir rather than install to create the
directory (which is the same approach as Automake).

Signed-off-by: Kevin Locke <kevinkevinlocke.name>

commit 383631b514c3d42b724640f57644eea276cc418c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 31 14:51:48 2014 -0500

Redefined bit field macros with bitshift operator.

Details:
- Redefined many of the macros that define bit fields and bit values in
the obj_t info field using the bitshift operator (<<). This makes it
easier to reorder bit fields, or expand existing bit fields, or add
new fields. The bitshifting should be evaluated by the compiler at
compile-time.

commit 137143345dc93cc9a83da5ba88b25bac7502de86
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 31 12:12:45 2014 -0500

Reimplemented unit blocksize fix in prev commit.

Details:
- Instead of inferring the storage format of the micro-panels from within
the packm variants, we now pass in a bool_t value that denotes whether
the packed matrix contains row-stored column panels or column-stored
row panels. This value can then be tested more easily inside the main
packm variant loop.
- Renumbered pack_t schema values in bli_type_defs.h so that there are
now five bits, each with different meaning:
- 4: packed or not packed?
- 3: packed for 3m?
- 2: packed for 4m?
- 1: packed to panels?
- 0: stored by rows or columns?
- Added new macros that test for status of above bits in schema bit
subfield, and renamed some existing macros related to 4m/3m.
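
As a rough sketch of the five-bit layout listed above, the bits can be written with the shift operator and tested with simple predicates. The bit positions follow the list; every macro and function name below is hypothetical and does not reproduce bli_type_defs.h.

```c
#include <stdbool.h>

/* Bit positions follow the list above; all names are hypothetical. */
#define SCHEMA_BIT_ROWCOL  ( 1u << 0 )  /* stored by rows or columns? */
#define SCHEMA_BIT_PANELS  ( 1u << 1 )  /* packed to panels?          */
#define SCHEMA_BIT_4M      ( 1u << 2 )  /* packed for 4m?             */
#define SCHEMA_BIT_3M      ( 1u << 3 )  /* packed for 3m?             */
#define SCHEMA_BIT_PACKED  ( 1u << 4 )  /* packed or not packed?      */

/* Each predicate inspects exactly one bit of the schema value. */
bool schema_is_packed(unsigned schema)
{
    return (schema & SCHEMA_BIT_PACKED) != 0;
}

bool schema_is_panel_packed(unsigned schema)
{
    return (schema & SCHEMA_BIT_PANELS) != 0;
}

bool schema_is_4m_packed(unsigned schema)
{
    return (schema & SCHEMA_BIT_4M) != 0;
}

bool schema_is_3m_packed(unsigned schema)
{
    return (schema & SCHEMA_BIT_3M) != 0;
}
```

Because each bit has one meaning, a test such as "packed for 4m?" does not depend on how the other four bits are set.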

commit a51e32ec061941cd10119ea80115c82a40b1673f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 30 10:41:48 2014 -0500

Fixed unit register blocksize brokenness.

Details:
- Fixed a breakdown in BLIS's ability to differentiate between row-stored
and column-stored micro-panels when MR or NR is unit. When either
register blocksize (or both) is equal to one, inspecting the strides of
the affected packed micro-panel is no longer sufficient to determine
whether the micro-panel is a row-stored column panel or a column-stored
row panel (because both strides are unit). At that point, dimension
information is necessary when invoking the bli_is_row_stored_f() and
bli_is_col_stored_f() macros (and their "obj" counterparts). Thanks to
Ilya Polkovnichenko for reporting this bug.
- Added panel dimensions (m and n) to obj_t, which are set in
packm_init() and then passed into the blocked variants to support the
aforementioned update.
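
The ambiguity described in the first bullet can be reproduced with stride-only tests. The sketch below assumes a column-stored micro-panel has row stride 1 and column stride equal to the register blocksize; the function names are hypothetical and are not the BLIS macros.

```c
#include <stdbool.h>

/* Hypothetical stride-only tests, not the actual BLIS macros. */
bool looks_col_stored(long rs, long cs) { (void)cs; return rs == 1; }
bool looks_row_stored(long rs, long cs) { (void)rs; return cs == 1; }

/* Strides of a column-stored micro-panel with register blocksize mr:
   elements within a column are adjacent, and columns are mr apart. */
void col_stored_panel_strides(long mr, long *rs, long *cs)
{
    *rs = 1;
    *cs = mr;
}

/* Strides of a row-stored micro-panel with register blocksize nr:
   elements within a row are adjacent, and rows are nr apart. */
void row_stored_panel_strides(long nr, long *rs, long *cs)
{
    *rs = nr;
    *cs = 1;
}
```

With a register blocksize of 1, both helpers produce unit strides in both directions, so both tests report true and the strides alone cannot tell the two layouts apart.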

commit c2732272f0ac680a0ad19fa9db5d587398a1479a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 29 16:37:18 2014 -0500

Removed old/unused packm variants.

commit b97fa9a5a70fe0123e5eebd999b947461d38445f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 27 18:54:09 2014 -0500

Minor usage update to build/bump-version.sh.

commit b18ba5f62d98629cdd519ff4c96fc67ec1a62fb9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 27 18:52:05 2014 -0500

Added missing 'bla_' prefix to r_imag(), d_imag().

Details:
- Added "bla_" to f2c functions r_imag() and d_imag(). Thanks to Murtaza
Ali for pointing out the mis-named functions.

commit af7a8e6c042cade452130a6729377f1a3ef4e19e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 27 18:20:13 2014 -0500

CHANGELOG update (0.1.4)

0.1.4

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 27 18:20:12 2014 -0500

Version file update (0.1.4)

commit acff74041bf02c7b9fdfa24b507bca782a4c5fce
Merge: cdb9413e 47b243ef
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Jul 23 15:07:30 2014 -0500

Merge branch 'master' of https://github.com/flame/blis

commit cdb9413e140f8a198666250ec88fa34b5425a9c3
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Jul 23 15:05:15 2014 -0500

Enabled threading for a couple more loops in TRSM

JC loop is now enabled for the left-sided case
IC loop is now enabled for the right-sided case

commit 47b243ef08f4101de3d936f2373343e67eaa4dd5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 23 13:41:13 2014 -0500

Call setid for early return from herk/her2k.

Details:
- Added setid call (to zero imaginary parts of diagonal elements) to
early return branches of herk_front() and her2k_front() for cases
where alpha is zero. Thanks to Murtaza Ali for suggesting this fix.
- Comment update.

commit 3e7b0db5b0e24f5fd66c60bacabc019885ddbec5
Merge: 2f8a357d ed3e33d5
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Jul 23 13:40:44 2014 -0500

Merge branch 'master' of https://github.com/flame/blis

commit 2f8a357de5fb55163a969d888cf059f24b78125c
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Jul 23 13:40:12 2014 -0500

Some TRSM threading fixes/additions

commit ed3e33d548047be3283ff41268fdf716563bc542
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 22 14:40:43 2014 -0500

Tweaked behavior of herk, her2k for BLAS compat.

Details:
- Updated herk_front() and her2k_front() to explicitly set the imaginary
components of the diagonal entries of C to zero after the computation
is complete. This is needed in case downstream applications read the
full diagonal entries (i.e., including imaginary part), which could, in
the absence of this modification, accumulate numerical error from
subsequent rank-k/rank-2k updates.
- Updated BLAS compatibility wrappers for herk and her2k to return early
if:
n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 )
This also results in the imaginary components of diagonal entries NOT
being set to zero (see above), which is consistent with BLAS.
- Updated mkherm to use setid instead of an inlined loop over the
diagonal.
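
The early-return condition quoted above can be written as a predicate. This sketch assumes real-valued alpha and beta, as in herk (her2k takes a complex alpha); the function name is hypothetical.

```c
#include <stdbool.h>

/* Mirrors the condition
     n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 )
   for real alpha and beta. The function name is hypothetical. */
bool herk_compat_returns_early(int n, int k, double alpha, double beta)
{
    return n == 0 || ((alpha == 0.0 || k == 0) && beta == 1.0);
}
```

Note that a zero alpha alone is not enough: with beta different from 1, C must still be scaled, so the wrapper proceeds.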

commit ea59a5c93cde1467a3715abc53dda4aecf961873
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 22 14:36:02 2014 -0500

Added new level-1d operation: setid.

Details:
- Defined a new level-1d operation, setid, which sets the imaginary
elements of an object's diagonal to a single scalar. This can be
useful, for example, when trying to make the diagonal of a Hermitian
matrix real-valued.
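
A minimal sketch of what such an operation does, assuming a column-major matrix of real/imaginary pairs. The zelem_t type and the setid_sketch() signature are hypothetical and are not the BLIS API.

```c
#include <stddef.h>

/* Hypothetical complex element type with separate real and imaginary
   parts. */
typedef struct { double real; double imag; } zelem_t;

/* Set the imaginary part of every diagonal element of an m x n
   column-major matrix (leading dimension lda) to imag_value. Real parts
   and off-diagonal elements are left untouched. */
void setid_sketch(zelem_t *a, size_t m, size_t n, size_t lda,
                  double imag_value)
{
    size_t len = (m < n) ? m : n;
    for (size_t i = 0; i < len; ++i)
        a[i + i * lda].imag = imag_value;
}
```

Calling setid_sketch() with imag_value equal to 0.0 gives the "make the diagonal real-valued" use mentioned above.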

commit 8965a965931318619ceaebd7c32edccf3022d0c7
Merge: 1785efb5 5b73e80b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 22 14:34:32 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit 1785efb5420bc7b9c850a068cb5d99837071e877
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 22 14:33:01 2014 -0500

Minor improvements to invertd and setd.

Details:
- Added missing call to invertd_check() from front-end.
- Changed setd front-end call of scald_check() to setd_check().

commit 5b73e80b71c054c1945a06aff044ef629bc1a9a0
Merge: a41e68e0 20690fe3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 18 12:21:20 2014 -0500

Merge pull request 16 from Maratyszcza/emscripten

Emscripten port

commit a41e68e09e73b999fab0bb430a43dccfc63aab45
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 17 13:25:56 2014 -0500

Reimplemented BLIS initialization/finalization.

Details:
- Rewrote bli_init() and bli_finalize() with OpenMP critical sections
for thread-safety. Also added lots of explanatory comments.
- Renamed bli_init_safe() and bli_finalize_safe() with the _auto()
suffix, and reimplemented for simplicity. Updated all invocations
in BLAS compatibility layer to use _auto() suffix.

commit 36358948ea75074bda32a9f8c008f835b87d21db
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 17 10:58:10 2014 -0500

Retired frame/3/gemm/other directory.

Details:
- Removed frame/3/gemm/other directory, which contained some outdated
and/or experimental variants.

commit c73261f17edf589e76bdbe297702a1fbbd69275f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 14 16:23:51 2014 -0500

More minor cleanups post-copyright update.

commit 2a09d24463d358be6243b24f112fad057c2aefe0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 14 16:17:09 2014 -0500

Reverted power7 symlinks destroyed by sed script.

Details:
- Reverted two symlinks, in kernels/power7/3/test, back to being symlinks
after recursive-sed.sh mistakenly replaced them with copies of the
actual files to which they referred. Meant to include this in previous
commit.

commit 7ed415824d3b2e78541b6f64e404ca5347c06d3d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 14 16:14:33 2014 -0500

Updated copyright headers (continued).

Details:
- Inserted "at Austin" into third clause of license declarations.
Meant to include this change in previous commit.

commit 5c2c6c85616834ff2716ece083118201d9df6dde
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 14 16:05:03 2014 -0500

Updated copyright headers to contain "at Austin".

Details:
- Updated copyright headers to include "at Austin" in the name of the
University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
2012).

commit fcec68cda3f6e90ae055e7304e6674c1c5c8d010
Merge: 94c0df79 4a20ed1a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 14 11:35:34 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit 94c0df797eda377931f29a41ba6a89c0ed58daca
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 14 11:24:36 2014 -0500

Changed order of zero dim / error checking.

Details:
- Updated level-2 and level-3 internal back-ends so that the operation's
_check() function is called BEFORE any attempt to return early due to
the presence of zero dimensions. This ordering makes more sense because
(for example) object dimensions should match even if one of them is
zero. Previously, a dimension mismatch could result in an early return
with no error message.
- Updated bli_check_object_buffer() so that NULL buffers result in an
error only if the object is dimensionally non-empty (i.e., only if both
of the object's dimensions are non-zero). This allows BLIS operations
to be performed on dimensionally empty objects (i.e., where at least one
dimension is zero).
- Updated the error message associated with bli_check_object_buffer()
to mention the newly relaxed constraint mentioned above, vis-a-vis
non-zero dimensions.
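
The relaxed buffer constraint from the second bullet amounts to the following check. This is a sketch with a hypothetical function name, not the body of bli_check_object_buffer().

```c
#include <stdbool.h>
#include <stddef.h>

/* A NULL buffer is an error only when the object actually has elements,
   that is, when both of its dimensions are non-zero. */
bool buffer_is_acceptable(const void *buf, size_t m, size_t n)
{
    bool is_empty = (m == 0 || n == 0);
    return buf != NULL || is_empty;
}
```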

commit 20690fe3018ce17c8df61ce0bffecaa7911dc3a5
Author: Marat Dukhan <maratekgmail.com>
Date: Sun Jul 13 22:50:56 2014 -0700

Emscripten port

commit 4a20ed1a3f5e9e5232df30aa0e568e6c00c56ce1
Merge: 6a515e98 8ccdfaef
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 13 17:45:01 2014 -0500

Merge pull request 14 from Maratyszcza/master

Support "make test" for PNaCl configuration

commit 6a515e988f2ae1628258a6dec2c0e9cf2d04790f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 13 17:38:33 2014 -0500

Implemented dsdot() and sdsdot() in compat layer.

Details:
- Replaced "not yet implemented" error messages in dsdot() and sdsdot()
with actual implementations. (These routines are so rarely used that
this log message will probably lead to some people learning of their
existence for the first time.)

commit 255668ddd1004552c6cc65035ec6486671ce99bb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 13 17:30:44 2014 -0500

Inserted gemv beta-scaling bug into compat layer.

Details:
- BLAS has a peculiar bug (or feature) whereby calling gemv on a vector
y of non-zero length and a vector x of zero length results in no action.
Given that the operation is y := beta*y + A*x, many (most?) individuals
would expect vector y to still be scaled by beta. BLIS, when called
natively, handles these cases intuitively (with beta scaling).
Unfortunately, many BLAS test suites actually check for the way this
situation is handled. Therefore, we have decided to implement this "bug"
in the compatibility layer so as to provide "bug-for-bug" compatibility
with BLAS.
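
The difference between the two behaviors can be sketched with a toy gemv for the operation y := beta*y + A*x as written above. The function and its blas_compat flag are hypothetical; neither BLIS nor BLAS exposes this signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* y := beta*y + A*x for a column-major m x n matrix A with leading
   dimension lda. When blas_compat is set, the routine returns before
   touching y if either dimension is zero; otherwise y is always scaled
   by beta. */
void gemv_sketch(size_t m, size_t n, const double *a, size_t lda,
                 const double *x, double beta, double *y, bool blas_compat)
{
    if (blas_compat && (m == 0 || n == 0))
        return;

    /* Scale y by beta, even when x is empty. */
    for (size_t i = 0; i < m; ++i)
        y[i] *= beta;

    /* Accumulate A*x one column at a time. */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            y[i] += a[i + j * lda] * x[j];
}
```

With m = 2 and n = 0, the native-style call scales y by beta, while the compat-style call leaves y unchanged.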

commit 570a154581bdb353fa13a219c7cb3c81d3dceffd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 12 17:51:05 2014 -0500

Comment/formatting updates to build scripts.

Details:
- Minor updates to comments and formatting in bump-version.sh and
update-version-file.sh scripts.

commit 26cd81990631ff799791629206e068126ff9e3a1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 10 13:16:07 2014 -0500

Added bli_info_*() query functions.

Details:
- Added a new API family, bli_info_*(), which can be used to query
information about how BLIS was configured. Most of these values are
returned as gint_t, with the exception of the version string which
is char*.
- Changed how the testsuite driver queries information about how BLIS
was configured (from using macro constants directly to using the
new bli_info API).
- Removed bli_version.c and its header file.
- Added STRINGIFY_INT() macro to bli_macro_defs.h.
- Renamed info_t type in bli_type_defs.h to objbits_t (not because of
an actual naming conflict, but because the name 'info_t' would now be
somewhat misleading in the presence of the new bli_info API, as the
two are unrelated).

commit 970b43141697d8c31a033f59513bb59d7cc78ab0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 10 09:30:00 2014 -0500

Minor bugfixes to BLAS compatibility layer.

Details:
- Changed bla_amax.c so that i?amax() routines now correctly return 0
if ( n < 1 || incx <= 0 ).
- Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of
f2c's abs() macro for float and double cases.
- Thanks to Murtaza Ali for suggesting the two fixes above.
- Updated label of fnormv to normfv in testsuite/input.operations.
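
The corrected i?amax() guard from the first bullet can be illustrated as follows. This sketch assumes the usual BLAS convention of a 1-based result that identifies the first element of largest absolute value; the function name is hypothetical.

```c
/* Absolute value helper, written out to avoid depending on libm. */
static double abs_value(double v)
{
    return (v < 0.0) ? -v : v;
}

/* Returns the 1-based position of the first element with the largest
   absolute value, or 0 if ( n < 1 || incx <= 0 ). */
int idamax_sketch(int n, const double *x, int incx)
{
    if (n < 1 || incx <= 0)
        return 0;

    int best = 1;
    double best_abs = abs_value(x[0]);
    for (int i = 1; i < n; ++i) {
        double v = abs_value(x[i * incx]);
        if (v > best_abs) {
            best_abs = v;
            best = i + 1;
        }
    }
    return best;
}
```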

commit 8ccdfaef4c42ad8957af8607a1a9ee29b9277d4b
Author: Marat Dukhan <maratekgmail.com>
Date: Tue Jul 8 23:14:36 2014 -0700

Replicated logic from testsuite/Makefile in top-level Makefile to support make test

commit caa6507ff3724c80d60987f309b8bbc5b50a9841
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 8 10:25:27 2014 -0500

Minor cleanup to standalone test drivers.

Details:
- Very minor code changes to standalone test drivers in 'test' directory.
- Added *.so files to '.gitignore'.

commit 6c65e9a58fe55990ebb99ec3986443e18af35338
Merge: cb12e456 daca500d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 8 10:13:49 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit cb12e456f94c196c093e52f02a7cbca0032fc86e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 8 10:07:46 2014 -0500

Fixed possible level-3 inf/NaN issue when beta=0.

Details:
- Redefined xpbys_mxn and xpbys_mxn_u/_l macros to employ a copy
(instead of scaling by beta) when beta is zero. This will stamp out
any possible infs or NaNs in the output matrix, if it happens to be
uninitialized. Thanks to Tony Kelman for isolating this bug.
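
The reason a copy is needed is that 0 * NaN is NaN (and 0 * inf is NaN) in IEEE arithmetic, so scaling an uninitialized output by a zero beta does not clear it. A minimal element-wise sketch of the idea follows; the function is hypothetical and is not the xpbys_mxn macro itself.

```c
#include <stddef.h>

/* y := x + beta*y, element by element. When beta is zero the old
   contents of y are ignored entirely, so any inf or NaN already stored
   in y cannot leak into the result. */
void xpby_sketch(size_t n, const double *x, double beta, double *y)
{
    if (beta == 0.0) {
        for (size_t i = 0; i < n; ++i)
            y[i] = x[i];
    } else {
        for (size_t i = 0; i < n; ++i)
            y[i] = x[i] + beta * y[i];
    }
}
```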

commit daca500db5e2448ba0da8047b75eb0f88d9f40e3
Merge: ab3bc915 47023502
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Jul 3 12:52:52 2014 -0500

Merge branch 'master' of http://github.com/flame/blis

commit 4702350278af31f662b458127777dd4d85a3192f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 3 11:48:23 2014 -0500

Defined _ukernel_void() wrappers to micro-kernels.

Details:
- Added wrappers for micro-kernels so that users may invoke the
micro-kernels without knowing what the function names actually are.
This is useful when an application wishes to call the micro-kernel
from a shared library instance of BLIS, where the application may not
necessarily have the luxury of grabbing the micro-kernel name(s) from
C preprocessor macros at compile-time. Also, since the wrappers use
void* pointers, one's environment does not need to be aware of some
BLIS types such as scomplex and dcomplex. These wrappers now join the
level-1 and level-1f kernel wrappers, which pre-dated this commit.
- Removed the wrapper definitions and prototypes from the micro-kernel
test suite modules, and replaced calls to them with calls to the new
wrappers mentioned above.
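
The wrapper idea can be sketched with a simple dot-product kernel standing in for a micro-kernel. The DDOT_KERNEL macro, the kernel, and the wrapper name are all hypothetical; they only illustrate a fixed-name, void*-based entry point that forwards to a kernel whose name is chosen by a preprocessor macro.

```c
#include <stddef.h>

/* A configuration would select the kernel name through a macro. Both
   the macro and the kernel here are hypothetical. */
#define DDOT_KERNEL ddot_ref_kernel

/* Typed kernel whose name the application may not know. */
void ddot_ref_kernel(size_t n, const double *x, const double *y,
                     double *rho)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    *rho = acc;
}

/* Fixed-name wrapper taking void* arguments. A caller can look this
   symbol up in a shared library without access to the macro above or to
   the library's own type definitions. */
void ddot_kernel_void(size_t n, const void *x, const void *y, void *rho)
{
    DDOT_KERNEL(n, (const double *)x, (const double *)y, (double *)rho);
}
```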

commit ab3bc9153b914fbaf259e15b66c91d628e7c8661
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Jul 3 11:19:43 2014 -0500

Fixed a bug for TRSM when BLIS_ENABLE_MULTITHREADING is not set but the multithreading environment variables are turned on

commit b8134b720b985783ee6a582a3eb5d6c51f00d051
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Jul 2 16:02:39 2014 -0500

Quick and dirty multithreading for TRSM

Should work fine for a small number of threads (up to 8 or maybe even 16).
However, performance is as yet untested.
This parallelizes the "JR" loop for the left sided cases
and the "IR" loop for the right sided cases.

Future work is to parallelize the outer loops as well.

commit e8ef69692831db07ddbe9485a5e504ac3f03e496
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 2 14:59:27 2014 -0500

Added shared library support to build system.

Details:
- Modified top-level Makefile to support building shared (dynamic)
libraries.
- Updated most configurations' make_defs.mk files to include necessary
compiler/linker flags needed by top-level Makefile.
- Note that by default, all configurations presently do NOT build
shared libraries. To enable, one must change the value of
BLIS_ENABLE_DYNAMIC_BUILD to 'yes'.

commit b80df0f2cffb015da02e70a82b8512da9891ab67
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 23 13:52:39 2014 -0500

Added bump-version.sh script to 'build' directory.

Details:
- Added a bash script, bump-version.sh, to aid in incrementing the BLIS
version string.

commit 9ef1f1e21d083697fc730e48d7d9169c201f3da2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 23 13:48:17 2014 -0500

CHANGELOG update (0.1.3)

0.1.3

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 23 13:48:17 2014 -0500

Version file update (0.1.3)

commit 09d9a3bf6763932d9f571085b2cfd1b8631eccba
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 23 13:43:26 2014 -0500

Reverting version file to test new version script.

Details:
- Changed version file contents to 0.1.2 so that I can test out a new
version file bumping script.

commit ebb33965981dcb2b0bdee5fc7fdf6c959420f311
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 23 11:22:50 2014 -0500

Added 'version' file.

commit 2cb9a5501a3cbeb6692cf68e896087ba73b6af69
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 23 10:42:29 2014 -0500

Removed 'version' from .gitignore file.

commit b40dcefc5ee31f67aa3990e2e9d2ef8ed1386a25
Merge: 7101a8ee b693b0cd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 23 10:39:05 2014 -0500

Merge pull request 11 from Maratyszcza/stable

[sc]axpy kernels for PNaCl

commit b693b0cddcfb41450e3c09a3ab97acb44c1ccdec
Author: Marat Dukhan <maratekgmail.com>
Date: Sun Jun 22 13:44:25 2014 -0700

[SC]AXPY kernels for PNaCl

commit 7101a8eec0327d6c3a7eb36eb4b0fd45c1c6d162
Merge: ad48dca2 020a831b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 19 21:46:50 2014 -0500

Merge pull request 10 from Maratyszcza/stable

Portable Native Client port

commit 020a831bc5f61744cb8354886aa679b99b1285f6
Author: Marat Dukhan <maratekgmail.com>
Date: Thu Jun 19 00:58:26 2014 -0700

Code clean-up in PNaCl port

commit 491be4f91ed725522f5cc7184053857c6c376ada
Author: Marat Dukhan <maratekgmail.com>
Date: Thu Jun 19 00:45:44 2014 -0700

Optimized dot product kernels for PNaCl

commit 4b8e71aab80182873a2e138eb07902b8d8fd5480
Author: Marat Dukhan <maratekgmail.com>
Date: Thu Jun 19 00:43:25 2014 -0700

Use AR rcs flags for PNaCl target to avoid warning

commit 031deb2a5c718d569bde842590a791b812f4cf1d
Author: Marat Dukhan <maratekgmail.com>
Date: Wed Jun 18 03:11:34 2014 -0700

PNaCl configuration: use pnacl-ar instead of ar (fixes build issue on Mac)

commit 68a02976e3c3638f0a9821342e269a1743e3ace3
Author: Marat Dukhan <maratekgmail.com>
Date: Wed Jun 18 03:10:25 2014 -0700

Compile pnacl configuration in GNU11 mode to avoid warning about non-standard features

commit 6f8462eb0ec278b89731e73ef583386a3371d095
Author: Marat Dukhan <maratekgmail.com>
Date: Wed Jun 18 03:08:46 2014 -0700

Fix inconsistent VERBOSE macro in Makefile

commit b2ffb4de8b6872cb23537ad282e557d11dcd9c8b
Author: Marat Dukhan <maratekgmail.com>
Date: Sun Jun 15 18:41:30 2014 -0400

Reformatted PNaCl GEMM kernels

commit 6de2d472d98baa215264a776f3d5291780a6a085
Author: Marat Dukhan <maratekgmail.com>
Date: Sun Jun 15 08:44:31 2014 -0400

CGEMM and ZGEMM kernels for PNaCl

commit f064711a5e6fb3852c17c7520909b09dc27665f2
Author: Marat Dukhan <maratekgmail.com>
Date: Sun Jun 15 06:27:37 2014 -0400

SGEMM and DGEMM kernels for PNaCl

commit ad48dca22913a363899f0bef45553898718eebb1
Merge: ee2b6792 7118f87e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 14 15:10:13 2014 -0500

Merge pull request 9 from tkelman/memalign_windows

Use _aligned_malloc instead of posix_memalign on Windows

commit 7118f87e18b4941423472afc00215c1d1f2a1fcd
Author: Tony Kelman <tonykelman.net>
Date: Sat Jun 14 06:53:20 2014 -0700

Use _aligned_malloc instead of posix_memalign on Windows

commit ee2b679281ca45fb40b2198e293bc3bc3d446632
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Jun 6 12:41:55 2014 -0500

Only include omp.h if BLIS_ENABLE_OPENMP is set

commit 19c05dfaac43c627f86e897c8c00f1f9440754aa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 5 10:54:16 2014 -0500

CHANGELOG update (for 0.1.2).
