Blis

Latest version: v0.9.1

0.1.2

Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Jun 2 13:40:57 2014 -0500

Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi

commit 3fc60e491426f6248c0feae88d971e4d1f88fb95
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 21 11:34:42 2014 -0500

Fixed ldim alignment bug in core2 gemm ukernel.

Details:
- Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
a segmentation fault if a column-stored matrix's starting address was
aligned, but its leading dimension was such that its second column was
unaligned. Basically, the micro-kernel was assuming that aligned load
instructions were safe when they actually were not. An extra condition
that checks the alignment of cs_c (ie: the leading dimension in the
column storage case) has now been added. Thanks to Michael Lehn for
reporting this bug.
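
A minimal sketch of the kind of check described above, assuming 16-byte aligned loads and a column-stored matrix of doubles. The helper name and signature are hypothetical and are not taken from the BLIS source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a BLIS function: returns 1 when aligned vector
   loads are safe for every column of a column-stored matrix of doubles
   that starts at c and has column stride cs_c (in elements). */
int can_use_aligned_access(const double *c, size_t cs_c, size_t align)
{
    /* align must be a nonzero power of two (for example 16). */
    assert(align != 0 && (align & (align - 1)) == 0);

    int base_aligned = ((uintptr_t)c % align) == 0;

    /* Column j starts at c + j*cs_c, so the byte stride must also be a
       multiple of the alignment, or the second column may be misaligned. */
    int stride_aligned = ((cs_c * sizeof(double)) % align) == 0;

    return base_aligned && stride_aligned;
}
```

When the function returns 0, a kernel would fall back to unaligned loads for that matrix.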

commit 77a2d8dac8b242d7a202c9aabda3927ab68cf987
Merge: 8c5d6071 21fb0893
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 20 09:53:19 2014 -0500

Merge pull request 8 from tlrmchlsmth/master

Added multithreading to most level-3 operations.

commit 21fb089387ee7c87f6dc53b0f60f68b48d3ff3e8
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon May 19 20:38:55 2014 -0700

Reverting changes to dunnington and reference configs

Now they are unchanged from the main branch of BLIS

commit 8a0ef0e0db5880730425926f8ba56b457a2ba764
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri May 16 13:44:14 2014 -0500

Fixed rounding error in bli_get_range_weighted

commit 0b4b1680334528b1b60bc696537600f763198e92
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri May 16 12:23:37 2014 -0500

Fixed bug with disabling JC loop threading for right sided trmm

commit 5c048a90d8dfa1dbde4e45fbc10ffcbdfe59d960
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed May 14 16:20:06 2014 -0500

Disabled parallelism for right-sided TRMM JC loop

The loop has dependent iterations.

commit 13a4c717ed0e273359dbaf5554cc4fa70b087d71
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed May 14 14:59:04 2014 -0500

Fixed bug with bli_get_range_weighted

commit 45957cc7745e9bb1698408d72f53ef192e960820
Author: Tyler Smith <tmscs.utexas.edu>
Date: Tue May 13 17:14:46 2014 -0500

Allowed threading to be turned off

No longer requires OpenMP to compile
Define the following in bli_config.h in order to enable multithreading:
BLIS_ENABLE_MULTITHREADING
BLIS_ENABLE_OPENMP

Also fixes a bug with bli_get_range_weighted

commit bd1dc98ce599d74513a553fe3b37a2ebca1c3812
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon May 12 17:26:19 2014 -0500

Disabled multithreading of the kc loop

commit 456df0372170bd7ca2c7e2d85365a69f1f04de88
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Apr 30 12:28:00 2014 -0500

Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity

commit f4fdfe8fc573553eb36795b79cdf681270dab71b
Merge: 31bb065b 8c5d6071
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Apr 30 11:46:35 2014 -0500

Merge http://github.com/flame/blis

commit 8c5d6071e24ba10a53669390a47287e86ff354ce
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 29 12:26:12 2014 -0500

Added _check() routines for fprint[mv], rand[mv].

Details:
- Added _check() routines for fprintm, fprintv, randm, and randv.
- Added invocations to the above routines from their respective
front-ends.

commit 262cdabcc885bcf6636f4d8bb7d320f95e81d820
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 28 16:48:25 2014 -0500

Changed treatment of NULL object buffers.

Details:
- Relaxed the constraint in bli_obj_attach_buffer_check(), which required
the buffer address being attached to be non-NULL. This is acceptable
because the user was already able to create and use objects with NULL
buffers (via bli_obj_create_without_buffer(), which initializes the
buffer to NULL).
- Inserted calls to newly defined function, bli_check_object_buffer(),
into nearly all operations' _check() or _int_check() functions. This
allows BLIS to abort peacefully if a computational routine is called
with an object containing a NULL buffer. By contrast, under such
conditions, BLAS would typically fail with a segmentation fault.
- Within operation front-ends, moved the calls to _check()/_int_check()
so that zero dimensions are checked first (and if found, execution
returns with trivial or no computation). This resolves issue 7. Thanks
to Jack Poulson for reporting this bug.
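
The ordering described in the last item can be sketched as follows. The types, status codes, and function are hypothetical stand-ins rather than BLIS APIs, and the sketch returns a status code where BLIS itself would abort:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not BLIS types. */
typedef struct {
    size_t  m, n;    /* dimensions */
    double *buffer;  /* may be NULL for an object created without a buffer */
} matrix_sketch_t;

typedef enum { OP_OK, OP_NOTHING_TO_DO, OP_ERR_NULL_BUFFER } op_status_t;

/* Zero dimensions are handled first; only then is the buffer checked. */
op_status_t scale_matrix(matrix_sketch_t *a, double alpha)
{
    assert(a != NULL);

    if (a->m == 0 || a->n == 0)
        return OP_NOTHING_TO_DO;     /* trivial case: no buffer is needed */

    if (a->buffer == NULL)
        return OP_ERR_NULL_BUFFER;   /* report instead of dereferencing NULL */

    for (size_t i = 0; i < a->m * a->n; ++i)
        a->buffer[i] *= alpha;
    return OP_OK;
}
```

With this ordering, an empty object with a NULL buffer is accepted as a no-op rather than rejected.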

commit 31bb065ba40ae0c5a614e743b8025abca012b99e
Merge: 20e24430 7c619599
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Apr 23 12:30:19 2014 -0500

Merge http://github.com/flame/blis

commit 7c61959955c8ba78160d0ed4d1979022029d963b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 10 17:18:36 2014 -0500

Can now query register blocksizes from blk algs.

Details:
- Added a new field to blksz_t objects that allows one to attach a
sub-object. Doing this allows us to associate a register blocksize with
any given cache blocksize. That way, the register blocksize can be
queried wherever the cache blocksize would normally be accessible
(e.g. a blocked algorithm).
- Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
blocksizes are attached to the cache blocksizes after they are created.
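
One way to picture the sub-object idea is the sketch below. The struct layout, names, and fallback argument are assumptions made for illustration; the actual blksz_t definition is not shown in this log and may differ (for instance, it may hold one value per datatype):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, not the BLIS blksz_t definition. */
typedef struct blksz_sketch {
    size_t               value;  /* the blocksize itself */
    struct blksz_sketch *sub;    /* optional attached sub-object, or NULL */
} blksz_sketch_t;

/* Returns the register blocksize attached to a cache blocksize, or
   fallback when nothing has been attached. */
size_t register_blocksize_of(const blksz_sketch_t *cache, size_t fallback)
{
    assert(cache != NULL);
    return cache->sub != NULL ? cache->sub->value : fallback;
}
```

Code that already receives the cache blocksize can then reach the register blocksize without any extra parameter.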

commit 58671597d3d450817b2eda576c05ed6dadd8af6d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 10 15:35:30 2014 -0500

Minor cleanups to level-2 _cntl.c files.

Details:
- Changed level-2 _cntl.c files so that the blocksizes for gemv are
imported and used, rather than blocksizes being declared locally.
- Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
4m/3m variants).
- Removed test/old/test_blis2.c.

commit 20e24430a772bc0fbaf24dec2f8c544096fd3f4e
Author: Tyler Michael Smith <tmsmithvestalac1.ftd.alcf.anl.gov>
Date: Tue Apr 8 17:50:44 2014 +0000

Some fixes for the bgq kernels

commit bde697f75ec1e7f2decebee0c9bd620b4c134cd5
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 16:43:44 2014 -0500

Add -openmp to ldflags as well

commit c332be8cd471eeace7b4fa4ae7443088b6a68ec3
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 16:37:50 2014 -0500

Added -openmp flag to Xeon Phi build for convenience

commit e7ca9e4b4a24d585c9aec8293fc7bb79e4171ad0
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 16:31:15 2014 -0500

Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC

commit 7b9b228c6fa4cfb70b1ebb855b009a036e85fac3
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 16:29:10 2014 -0500

Fix for tree barrier freeing bug

commit 5ec93bd9a76096312d51c326ccde1e9bd0a436ab
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 15:09:10 2014 -0500

Bunch of minor fixes

Removed barrier after unpackm in all level3 blocked variants
Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)

Moved the enabling of the tree barriers into bli_config.h
Fed the default MR and NR for double precision into bli_get_range instead of the number 8
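
As a rough illustration of why a blocking factor is passed to the range computation, the sketch below splits an index range among threads so that every boundary falls on a multiple of that factor. It is a hypothetical helper written for this note, not the bli_get_range implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: splits [0, n) among n_threads so that each thread's
   start index is a multiple of block (for example a register blocksize). */
void thread_range(size_t tid, size_t n_threads, size_t n, size_t block,
                  size_t *begin, size_t *end)
{
    assert(n_threads > 0 && tid < n_threads && block > 0);

    size_t n_blocks   = (n + block - 1) / block;               /* ceil(n/block) */
    size_t per_thread = (n_blocks + n_threads - 1) / n_threads; /* blocks each */

    size_t b = tid * per_thread * block;
    size_t e = b + per_thread * block;

    /* Clamp to n; trailing threads may receive a short or empty range. */
    *begin = b < n ? b : n;
    *end   = e < n ? e : n;
}
```

With n = 100, block = 8 and four threads, the ranges are [0, 32), [32, 64), [64, 96) and [96, 100), so only the last thread handles a partial block.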

commit 575fb9b0b08f3bdb56ccde056da619d1585617c1
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 12:13:29 2014 -0500

Changed default blocking factor to default double precision MR and NR

commit ab9c7880335c281432d5809fe0dec46753d22569
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 11:38:11 2014 -0500

Added faster tree barriers necessary for performance for Xeon Phi

Fixed up some stuff in the thread info free functions
Disabled threading for TRSM so that it actually works when threading environment variables are set

commit ec58a7923cccac08632670caadf3cf6ff5dce766
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 10:22:48 2014 -0500

Freeing thread info paths.

Also made herk IC and JC loops do weighted partitioning

commit 2b6848b2397d6d84ca4e5f792fc51ad05e351a36
Merge: 4e3eb39a 21a0efb3
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Apr 4 09:54:54 2014 -0500

Merge http://github.com/flame/blis

Conflicts:
kernels/bgq/1/bli_axpyv_opt_var1.c
kernels/bgq/1/bli_dotv_opt_var1.c

commit 4e3eb39aca4df0b9fdc003d468f368a2f2ba597d
Author: Tyler Michael Smith <tmsmithvestalac1.ftd.alcf.anl.gov>
Date: Fri Apr 4 14:50:03 2014 +0000

Some fixes to the bgq config
MR and NR for double complex were wrong
Default fusing factor for double precision was wrong as well

commit 21a0efb33d7435139e9c43c1a4787a6bff533e26
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 3 16:38:44 2014 -0500

Fixed follow-up to issue 6.

commit c318157a9bee8ea6e59be16f99f65d9271fe0d27
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 3 16:24:34 2014 -0500

Fixed issue 6 (incorrect 'restrict' usage).

Details:
- Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
(However, there may be other instances of similar misuse elsewhere in
BLIS.) Thanks to Jeff Hammond for reporting this issue.

commit b5150a1bf3bd89598e2b3aeac110eb5b44ac6c12
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 3 12:25:45 2014 -0500

Added include "arm_neon.h" to ARM gemm ukernel.

Details:
- Inserted include "arm_neon.h" into gemm ukernel source file for
arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.

commit 2041c264517b6c590fd4f7e8253e6911b622d1c3
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Apr 3 10:30:03 2014 -0500

Added barriers needed prior to doing scalar reset for rank-k updates.

commit 47a90e69dfde3f4f8fdf90654248a6b499fbadbc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 1 14:34:31 2014 -0500

Attempted to fix uninitialized variable warnings.

Details:
- Added initialization statements to various macros used in level 1m and
1m-like operations. I wasn't able to reproduce the reported behavior,
so hopefully this takes care of it. Thanks to Jeff Hammond for the
report.

commit d27b4f690c14b1f836f8c7a3c0e91e09d852f02e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 1 12:57:24 2014 -0500

Use generic paths for toolchain in POWER7.

Details:
- Fixed issue 4. Thanks to Jeff Hammond for contributing changes.

commit 1584ae1c83c3a8c1af76acb46404747507650f19
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Mar 28 15:15:48 2014 -0500

Fixed race condition involving scalar reset

commit 459dde4acc09e49380da58fb7b246db488884ad9
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Mar 27 17:06:45 2014 -0500

Made barrier after packing implicit.

This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
but not after the outer packing routines.
This allowed, for instance, computation to begin before the block of B had finished being packed.

commit 9f78ec6e7e95fcad89a167b27cad7e2d74b6d122
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Mar 27 14:18:46 2014 -0500

Some fixes for the internal functions,
which were inappropriately having only the thread chief do some things.

commit a6fd48345424e097f71652be013aa897e098b41e
Author: Tyler Michael Smith <tmsmithvestalac1.ftd.alcf.anl.gov>
Date: Wed Mar 26 17:19:46 2014 +0000

Added test drivers for level 3 BLAS that run tests in parallel using MPI

commit 73b3db594864be0f9be9a0eb29bf961fa9c95f29
Author: Tyler Michael Smith <tmsmithvestalac1.ftd.alcf.anl.gov>
Date: Wed Mar 26 15:39:05 2014 +0000

Some fixes for the bgq configuration

commit f0824a04fc75e231c3a3d7757fa4e7294173282f
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Mar 24 15:21:42 2014 -0500

Initial commit to enable threading in TRSM,

Also enabled weighted partitioning for herk, trmm
Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
Correctly computed a_next and b_next for gemm, herk macrokernels
a_next and b_next point to the current micropanels in trmm

commit 23d9eab354fbc88165889832955e126772bf8488
Merge: 5d5dc2ee fd3e32a5
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Mar 20 16:54:35 2014 -0500

Merge https://github.com/flame/blis

commit 5d5dc2eedef2f7c90d61371a1b457be5c06cf583
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Mar 20 16:43:36 2014 -0500

Parallelized trmm and trmm3

Also fixed bugs in packm

commit fd3e32a5f419fa412f46afe4dd1c3a26e15f3eb4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 20 13:59:48 2014 -0500

Refined INSERT_GENTFUNC macro usage.

Details:
- Defined new INSERT_GENTFUNC macros so that the macro always takes
exactly the number of arguments needed for the particular operation or
variant being defined. Many operations were using INSERT_GENTFUNC
macros that expected one auxiliary argument even though none were
needed. Those instances have now been updated. Most of these instances
were in the level-0 and -1v operations, as well as some operations
defined in frame/util.

commit 9b0e715f29338a1a1d6445907d2445c35f011121
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 19 15:47:54 2014 -0500

Minor simplifications to trmm, trsm macro-kernels.

Details:
- Simplified some code that would have allowed the diagonal of a trmm
or trsm triangular matrix to intersect the short end of a micro-panel.
This is disallowed via higher-level constraints on cache blocksizes, so
this code was never needed and only served to obfuscate.
- Updated some comments in trmm, trsm macro-kernels.

commit a3902750b9ab4923433f7e353f3669c3c419f8e4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 19 12:35:17 2014 -0500

Reorganized norm operations.

Details:
- Completely reorganized norm operations:
- Renames:
- fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
- absumv -> norm1v (vector 1-norm)
- New operations:
- norm1m (matrix 1-norm)
- normiv, normim (infinity-norm)
- amaxv (BLAS-like absolute maximum value index)
- asumv (BLAS-like absolute sum)
- Deprecated absumm, as it did not correspond to any actual norm.
(However, an inlined version now exists in the testsuite module for
randm.)

commit c0140cb752f27e99742f85d23be2181c00a1335e
Author: Tyler Smith <tmscs.utexas.edu>
Date: Wed Mar 19 11:21:16 2014 -0500

Fixed packm variants 3 and 4 where every thread was trying to manipulate the same state

Now just performed by the master thread.

commit fb42983bd9943711baa7d1c6496de1215bb816ef
Author: Tyler Smith <tmscs.utexas.edu>
Date: Tue Mar 18 16:37:28 2014 -0500

Fixed a barrier bug and a thread decorator bug

commit aa2405f8b23d0f8d2ec04790882f2176ef2e8fd8
Author: Tyler Smith <tmscs.utexas.edu>
Date: Tue Mar 18 15:23:09 2014 -0500

Fixing function pointer issues with thread decorator

commit ec8b88f93533942d3711191873310e7ff281bda6
Author: Tyler Smith <tmscs.utexas.edu>
Date: Tue Mar 18 14:35:37 2014 -0500

Enabled threading for packm blocked variants 3 and 4

commit 0ac534cdf657bbf04601abfe719ba2887aab5da7
Author: Tyler Smith <tmscs.utexas.edu>
Date: Tue Mar 18 13:26:27 2014 -0500

Added decorator for calling parallelized internal functions

Will allow for easy support for different threading models

commit 5296f58975f7d351f88909cc80b6d0cffd73def7
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Mar 17 17:15:35 2014 -0500

Fixing some bugs with herk parallelization

commit c51d0110831eb89361b4720bf7ed75edbd26ebce
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Mar 17 15:00:47 2014 -0500

Initial multithreading support for HERK

commit c720b141568d1f289146bf34ded08001f2c0dfbb
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Mar 17 11:39:32 2014 -0500

Switched to using environment variables to control threading.

The environment variables all follow the format BLIS_X_NT,
where X is the index of the loop as described in our paper
Anatomy of High Performance Many-Threaded Matrix Multiplication.
These indices are IR, JR, IC, KC, and JC.

Also enabled parallelism for hemm and symm, but these are currently untested.
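
A minimal sketch of reading variables of the form BLIS_X_NT. The helper names are hypothetical, and treating an unset or invalid variable as one thread is an assumption rather than documented behavior:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses a thread count, returning 1 when the text
   is missing, not a whole number, or less than 1. */
long parse_thread_count(const char *value)
{
    if (value == NULL)
        return 1;

    char *end;
    long nt = strtol(value, &end, 10);
    return (end != value && *end == '\0' && nt >= 1) ? nt : 1;
}

/* Hypothetical helper: looks up BLIS_<loop>_NT, where loop is one of
   "IR", "JR", "IC", "KC" or "JC". */
long threads_for_loop(const char *loop)
{
    char name[32];
    int len = snprintf(name, sizeof name, "BLIS_%s_NT", loop);
    assert(len > 0 && (size_t)len < sizeof name);

    return parse_thread_count(getenv(name));
}
```

For example, with BLIS_JC_NT=4 set in the environment, threads_for_loop("JC") returns 4.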

commit 92233cf64274b27b2217c5cfffe75443ff6137a4
Author: Tyler Smith <tmscs.utexas.edu>
Date: Tue Mar 11 14:16:08 2014 -0500

Some fixes to gemm thread info tree creation,
Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
instead of BLIS_SINGLE_THREADED

commit 020f80c30289d8bcaa688bf600b01fae9b23b54f
Author: Tyler Smith <tmscs.utexas.edu>
Date: Tue Mar 11 12:08:17 2014 -0500

Added files specific to threading for gemm and packm operations

commit 8d8f4352a41926bc923e47be836365b6b726aff2
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Mar 10 15:47:28 2014 -0500

Added single threaded thread info data structures specifically for gemm and packm

commit 0e8677761175189583ca7d855e24b2bbdd2dada8
Merge: 2e727a02 b3bff631
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Mar 10 15:16:21 2014 -0500

Merge branch 'master' of https://github.com/tlrmchlsmth/blis

commit 2e727a025a8f796d2b6bd14f489d0ee72e7d1fc7
Author: Tyler Smith <tmscs.utexas.edu>
Date: Mon Mar 10 15:14:33 2014 -0500

Modifying the thread info data structures

This change makes each operation have its own thread info type,
allowing more fine control of threading in operations that have different types of suboperations

commit a770590cf21a459f04bf941c58ee2afd272cc441
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 3 14:31:44 2014 -0600

Minor fixes to sumsqv, abmaxv.

Details:
- Minor update to bli_sumsqv_unb_var1() to bring it up-to-date with
LAPACK 3.5.0's zlassq.f, which, starting with 3.4.2, returns NaN when
the vector (or matrix) contains a NaN.
- Minor change to bli_abmaxv_unb_var1() to more closely mimic the
behavior of netlib BLAS's izamax(). There, a "less than or equal to"
operator is used in the search instead of "less than", which would
change the element index returned if there were multiple maximum values.
- Added macro function definitions for bli_isinf() and bli_isnan(), which
are currently implemented in terms of isinf() and isnan() from math.h.
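
The tie-breaking effect mentioned in the second item can be illustrated with the sketch below. It is a hypothetical helper over real values, not the BLIS or netlib routine, and it makes no claim about which comparison either library uses:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the element of x with the largest absolute
   value. With keep_last_on_tie == 0 the update uses a strict comparison,
   so the first maximum wins; otherwise a non-strict comparison is used
   and the last maximum wins. */
size_t index_of_abs_max(const double *x, size_t n, int keep_last_on_tie)
{
    assert(x != NULL && n > 0);

    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        double cur = x[i]    < 0.0 ? -x[i]    : x[i];
        double max = x[best] < 0.0 ? -x[best] : x[best];
        if (keep_last_on_tie ? (max <= cur) : (max < cur))
            best = i;
    }
    return best;
}
```

For x = {1, -3, 2, 3}, the strict form returns index 1 and the non-strict form returns index 3.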

commit b3bff631eadf98b15cb422fb4a8e2f855c23e8a7
Merge: 2c158fb8 e8757b03
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Feb 27 16:53:24 2014 -0600

Merge https://github.com/flame/blis

commit 2c158fb885c27f7b599dc1e85b57edd684f19223
Merge: e4738c48 c2b2ab62
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Feb 27 16:46:23 2014 -0600

Merge https://github.com/flame/blis

Conflicts:
frame/1m/packm/bli_packm_blk_var1.c

commit e8757b03a74f9891632242e9a90efb32150826f5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 27 16:40:07 2014 -0600

Use "%ld" as int format specifier in fprintm.

Details:
- Changed "%d" to "%ld" when printing integers via bli_fprintm().
- Meant to include this in previous commit.

commit c663ce3b5170fee7dfb5b528b650d70c8e932cac
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 27 16:32:57 2014 -0600

Fixed various bugs when C99 complex is enabled.

Details:
- Fixed various bugs in packm_*_cxk(), the 4m/3m micro-kernels, and
elsewhere in the framework that were not yet set up to work properly
when BLIS_ENABLE_C99_COMPLEX is defined in bli_config.h
- Extensive changes to f2c-derived files in frame/compat/f2c to allow
C99 complex storage. Most of these changes center around accessing
real and imaginary components via bli_?real()/bli_?imag() accessor
macros, and setting of values via bli_?sets() assignment macros.
(Thanks to Vladimir Sukarev for pointing out that _ENABLE_C99_COMPLEX
was broken.)

commit e4738c48e00b89391d9baa1fd0aa62d1ea2f95e6
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Feb 27 16:29:46 2014 -0600

Added support for parallelism in gemm micro-kernel

commit bfe214b633765ed40b57b330fbb84c332663aa40
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Feb 27 15:53:10 2014 -0600

Fixed bug with parallel packing, and bug with allocating an array of thread infos

In packm variant 1, the variable p_begin was incremented each iteration, causing a dependency.
This dependency was removed, allowing each iteration to be executed in parallel.

Somewhere in bli_threading.c, I was allocating an array of pointers instead of an array of structs.
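
The kind of loop-carried dependency described for packm variant 1 can be sketched as follows. This is a hypothetical copy routine, not the BLIS packing code: the destination of each iteration is derived from the loop index instead of a pointer that is advanced every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: copies n_panels panels of panel_len doubles from
   src to dst. Each iteration computes its own offsets from i, so no
   iteration depends on the previous one and the outer loop could be
   divided among threads. */
void pack_panels(double *dst, const double *src,
                 size_t n_panels, size_t panel_len)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n_panels; ++i) {
        double       *p = dst + i * panel_len;  /* derived from i only */
        const double *s = src + i * panel_len;
        for (size_t j = 0; j < panel_len; ++j)
            p[j] = s[j];
    }
}
```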

commit 6193d9ceea552e67170dba45abde04c64271c705
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Feb 27 14:09:19 2014 -0600

Fixed bug in thread trees

commit ac5a2de1d17ffd460b00fee9757898525a09abae
Merge: 01b125e8 bd3c7ecf
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Feb 27 11:59:33 2014 -0600

Merge branch 'master' of https://github.com/tlrmchlsmth/blis

commit 01b125e815f19410e8e0611d088b84570e499e93
Author: Tyler Smith <tmscs.utexas.edu>
Date: Thu Feb 27 11:55:45 2014 -0600

First pass at adding parallelism to BLIS.

Added a multithreading infrastructure that should be independent of multithreading implementation in the future.
Currently, gemm blocked variants 1f and 2f, and packm blocked variant 1, are parallelized.

commit c2b2ab62707e4174892aff3ce65f36f54878fae5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 26 12:46:45 2014 -0600

Deprecated panel stride alignment in bli_config.h.

Details:
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
configurations. It was already going unused in packm_init() since the
recent 4m/3m commit. This setting was rarely, if ever, useful, and its
existence only posed a potential risk for 4m/3m-based implementations.
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
- Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
micro-kernels.

commit f18aee83a5ac1b14808686fc3c5a3c846a1d99b9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 25 17:58:42 2014 -0600

CHANGELOG update (for 0.1.1).

0.1.1

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 25 13:34:56 2014 -0600

Added extensive support for configuration defaults.

Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
macro constants. Examples:
BLIS_SAXPYV_KERNEL_REF
BLIS_DDOTXF_KERNEL_REF
BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
with a common base name; [sdcz] datatype flavors of each kernel or
micro-kernel (level-1v, -1f, or 3) may now be named independently.
This means you can now, if you wish, encode the datatype-specific
register blocksizes in the name of the micro-kernel functions.
- Any datatype instances of any kernel (1v, 1f, or 3) that is left
undefined in bli_kernel.h will default to the corresponding reference
implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
datatype chars to match the number of types the kernel WOULD take in
a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
your level-1v/-1f kernels. The framework now provides a _kernel()
function which serves as the obj_t wrapper for whatever kernels are
specified (or defaulted to) via bli_kernel.h
- Developers no longer need to prototype their kernels, and thus no
longer need to include any prototyping headers from within
bli_kernel.h. The framework now generates kernel prototypes, with the
proper type signature, based on the kernel names defined (or defaulted
to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
kernel is left undefined by bli_kernel.h, but its same-precision real
domain equivalent IS defined, BLIS will use a 4m-based implementation
for the datatype x implementations of all level-3 operations, using
only the real gemm micro-kernel.
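
The fallback rule for undefined kernels can be sketched with the preprocessor as shown below. The macro names come from the entry above; the placeholder function and its signature are assumptions, since real micro-kernels take matrix arguments:

```c
#include <assert.h>
#include <string.h>

/* Placeholder reference kernel. It only reports which implementation
   was selected. */
const char *dgemm_ukernel_ref_sketch(void) { return "reference"; }

#define BLIS_DGEMM_UKERNEL_REF dgemm_ukernel_ref_sketch

/* bli_kernel.h could define BLIS_DGEMM_UKERNEL before this point. It is
   left undefined here, so the default applies. */
#ifndef BLIS_DGEMM_UKERNEL
#define BLIS_DGEMM_UKERNEL BLIS_DGEMM_UKERNEL_REF
#endif

const char *selected_dgemm_ukernel(void)
{
    const char *name = BLIS_DGEMM_UKERNEL();
    assert(name != NULL);
    return name;
}
```

Defining BLIS_DGEMM_UKERNEL to an optimized kernel ahead of the #ifndef would select that kernel instead.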

commit 15b51e990f1d21333b5f7af97c211756247336e5
Merge: 6363a9f6 fc04b5eb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 21 09:04:32 2014 -0600

Merge branch 'master' of github.com:fgvanzee/blis

commit fc04b5eb69868c341ce03f5ef1f02de4b8c121b0
Merge: b29e1c2b d1813c9d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 21 09:04:13 2014 -0600

Merge pull request 3 from figual/master

New ARM armv7a kernels and Assembly file consideration in Makefile

commit d1813c9dee34410833db5061e6588ec1a6c9ecd4
Author: Francisco Igual <figualpandaboard.(none)>
Date: Fri Feb 21 15:14:31 2014 +0100

Added new armv7a micro-kernels and configuration files from Werner Saar.

commit 0cd098c03a000ed9426a7e9135190696da8cadbc
Author: Francisco Igual <figualpandaboard.(none)>
Date: Fri Feb 21 15:12:30 2014 +0100

o Modified Makefile to consider .S assembly microkernels.

commit 6363a9f658257fe3d814a3dce5308f807adb54a2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 19 17:00:52 2014 -0600

Added level-3 support for complex via 4m-/3m.

Details:
- Added the ability to induce complex domain level-3 operations via new
virtual complex micro-kernels which are implemented via only real
domain micro-kernels. Two new implementations are provided: 4m and 3m.
4m implements complex matrix multiplication in terms of four real
matrix multiplications, whereas 3m uses only three and thus is
capable of even higher (than peak) performance. However, the 3m method
has somewhat weaker numerical properties, making it less desirable
in general.
- Further refined packing routines, which were recently revamped, and
added packing functionality for 4m and 3m.
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
into micro-panels which were packed for 4m/3m virtual kernels.
- Added 4m and 3m interfaces for each level-3 operation.
- Various other minor changes to facilitate 4m/3m methods.
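
The difference between the 4m and 3m approaches can be seen on a single complex product. The sketch below works on scalars for brevity, whereas the methods above apply the same identities to whole matrix products; the type and function names are illustrative only:

```c
#include <assert.h>

typedef struct { double re, im; } cplx_sketch_t;  /* illustrative type */

/* 4m-style product: four real multiplications. */
cplx_sketch_t cmul_4m(cplx_sketch_t x, cplx_sketch_t y)
{
    cplx_sketch_t z;
    z.re = x.re * y.re - x.im * y.im;
    z.im = x.re * y.im + x.im * y.re;
    return z;
}

/* 3m-style product: three real multiplications and more additions. */
cplx_sketch_t cmul_3m(cplx_sketch_t x, cplx_sketch_t y)
{
    double p1 = x.re * y.re;
    double p2 = x.im * y.im;
    double p3 = (x.re + x.im) * (y.re + y.im);

    cplx_sketch_t z;
    z.re = p1 - p2;
    z.im = p3 - p1 - p2;  /* imaginary part recovered by subtraction */
    return z;
}
```

For x = 1 + 2i and y = 3 + 4i both functions return -5 + 10i. The imaginary part in the 3m form is obtained by subtracting products, which is where its rounding behavior can differ from the 4m form.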

commit b29e1c2b278c177e104c84ba462820ee8296df6c
Merge: ee60377e bd3c7ecf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 14 14:11:54 2014 -0600

Merge pull request 2 from tlrmchlsmth/master

Fixes and improvements to xeon phi implementation.

commit bd3c7ecfb54a9b9851c7d364f41c21e4cff52f6f
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Feb 14 14:05:57 2014 -0600

Removing changes to input.general and input.operations

commit ce066863683cb4e910270cf8ab8e138b01ff3358
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Feb 14 13:40:24 2014 -0600

Fixed more Xeon Phi bugs, especially with scattered update

commit 31134b5c7076423aee1b4f494e925f27171d97e6
Author: Tyler Smith <tmscs.utexas.edu>
Date: Fri Feb 14 11:19:44 2014 -0600

Some fixes, changes, and improvements to the microkernel for the Xeon Phi

commit ee60377e467862b9d8a7205c45dce5cf66c78c46
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 13 14:03:31 2014 -0600

Shifted some fields in info_t.

Details:
- Shifted the pack order, pack buffer type, and structure type fields
to make room for an extra bit in the pack type/status field.

commit bd3ab1ad4cf42f8bc30ab262acf8eccb49bb1a08
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 13 09:29:55 2014 -0600

Minor fixes to trsm consistent with prev on trmm.

Details:
- Removed use of bli_min() and bli_max() that were only being used to
try to support situations where the diagonal would intersect the
short end of some micro-panels, which is a situation that is disallowed
at a higher level by various constraints on the register and cache
blocksize. This only affected trsm_ll and trsm_lu.
- Use panel stride as passed into the macro-kernel rather than compute
it via k and PACKMR/PACKNR. This affects all macro-kernels of trsm.

commit 6260b0b5f8bd248f3f66e5a1c6854bdbd9d02ad0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 13 09:19:56 2014 -0600

Fixed obscure bug in trmm_ll, trmm_lu.

Details:
- Fixed an obscure bug in left-hand trmm that would only manifest when
non-zero register blocksize extensions (PACKMR > MR or PACKNR > NR)
are used.
- Removed use of bli_min() and bli_max() that were only being used to
try to support situations where the diagonal would intersect the
short end of some micro-panels, which is a situation that is disallowed
at a higher level by various constraints on the register and cache
blocksize. This only affected trmm_ll and trmm_lu.
- Use panel stride as passed into the macro-kernel rather than compute
it via k and PACKMR/PACKNR. This affects all macro-kernels of trmm.

commit 16915c1c1e55c660bf82141cdadf7c0860d5b464
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 11 10:54:19 2014 -0600

Fixed an obscure bug in packm_cxk().

Details:
- Fixed a bug in packm_cxk() whereby the packm ukernel was being chosen
from ldp, which is always equal to PACKMR or PACKNR. The problem with
this is that the pack ukernels were implicitly assuming that the
panel dimension of the panel being packed was equal to ldp, which
is not the case when the register blocksizes extensions are non-zero
(ie: when PACKMR > MR or PACKNR > NR, whichever is applicable). This
problem has been fixed by passing ldp into the pack ukernels, which
now walk through the packed micro-panel region by incrementing by this
value, rather than incrementing by the inherent panel dimension value
assumed by each packm ukernel (e.g. 4 in the case of packm_ref_4xk).
- Also fixed a very minor edge case inefficiency whereby pack ukernels
smaller than the default were not being used in edge cases, and instead
those situations were being handled by scal2m. This is related to the
issue above, because the pack ukernel itself was being chosen based on
ldp instead of the panel dimension.
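
The fix in the first item amounts to keeping two quantities apart: the panel dimension being packed and the stride ldp of the packed buffer. A minimal sketch follows, assuming a column-stored source panel of doubles; the routine is hypothetical, and zero-filling the extra rows is an assumption made so the example is fully defined:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pack routine, not a BLIS kernel. Copies an m-by-k
   column-stored panel of a (leading dimension lda) into p, whose columns
   are ldp elements apart. Rows m..ldp-1 of each packed column are set
   to zero. */
void pack_panel(size_t m, size_t k,
                const double *a, size_t lda,
                double *p, size_t ldp)
{
    assert(m <= lda && m <= ldp);

    for (size_t j = 0; j < k; ++j) {
        for (size_t i = 0; i < m; ++i)
            p[i] = a[i];
        for (size_t i = m; i < ldp; ++i)
            p[i] = 0.0;
        a += lda;
        p += ldp;  /* advance by ldp, not by the panel dimension m */
    }
}
```

When ldp equals m the two strides coincide, which is why the original bug only appeared with non-zero register blocksize extensions.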

commit b7da57b282c5a5e2208946e60309d2352f55351d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 11 10:28:23 2014 -0600

Updated calls to packm_blk_var2() in testsuite.

Details:
- In ukernel testsuite modules, replaced calls to packm_blk_var2() with
_var1(). Meant to include this in previous commit.

commit c255a293e25b2223c88e8800267cd06ad2a90041
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Feb 10 14:31:24 2014 -0600

Consolidated packm_blk_var2 and var3.

Details:
- Consolidated the functionality previously supported by packm_blk_var2()
and packm_blk_var3() into a new variant, packm_blk_var1().
- Updates to packm_gen_cxk(), packm_herm_cxk(), and packm_tri_cxk()
to accommodate above changes.
- Removed packm_blk_var3() and retired packm_blk_var2() to
frame/1m/packm/old.
- Updated all level-3 _cntl_init() functions so that the new, more
versatile packm_blk_var1 is used for all level-3 matrix packing.

commit 32d8f264ae7b28155f5d7b21dcc5ecb78da2e0ab
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Feb 9 10:07:37 2014 -0600

Refactored packm variants.

Details:
- Revised packm_blk_var2() and _var3() by encapsulating the general,
hermitian/symmetric, and triangular panel-packing subproblems into
separate functions: packm_gen_cxk(), packm_herm_cxk(), and
packm_tri_cxk(), respectively. Also, homogenized the packm code as
well as the new specialized packm_*_cxk() code to further improve
readability.

commit 6c8067028707947fcdf4f856a272e15bb9ed91e3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Feb 7 11:27:15 2014 -0600

Renamed enumerated type in testsuite and modules.

Details:
- Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
renamed all corresponding "impl" variables to "iface".

commit 6c12598b1bc567f0b08f58aebdc753a1c1390378
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 6 18:26:35 2014 -0600

Employ simpler INSERT_ macro for ref ukernels.

Details:
- Defined a new macro, INSERT_GENTFUNC_BASIC0, which takes only one
argument--the base name of the function--and employed this macro
in the reference micro-kernel files instead of the _BASIC macro,
which takes one auxiliary argument. That argument was not being
used and probably just acted to unnecessarily obfuscate.

commit 32cae66326b68706d0e695cfd60c9ca5bc32c534
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 6 18:06:42 2014 -0600

Fixed some instances of sloppy 'restrict' usage.

Details:
- Fixed some technical incorrectness with some usage of the 'restrict'
keyword in the reference trsm micro-kernels.
- Tweak to testsuite/Makefile that causes rebuild if libblis was
touched.

commit 7aceef7683e2a2aff3c7ec2a73508036af2e19e2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 6 17:31:19 2014 -0600

Updated comments in macro-kernels.

Details:
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
section of macro-kernels.
- Changed register blocksizes of reference configuration to MR = 8 and
NR = 4. It's always good for MR != NR in the reference configuration
since it may help uncover bugs related to non-square micro-kernels.

commit 8fd292aa78950bcdf556605718f09d13f9575abc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Feb 6 14:32:21 2014 -0600

Pass panel dimensions into macro-kernels.

Details:
- Modified the interfaces to the datatype-specific macro-kernels so that:
- pd_a and pd_b are passed in (which contain the panel dimensions of
packed panels of a and b).
- rs_a and cs_b are no longer passed in (they were guaranteed to be 1).
- Modified implementations of datatype-specific macro-kernels so pd_a,
pd_b, cs_a, and rs_b are used instead of cpp macros for MR, NR, PACKMR,
and PACKNR, respectively.
- Declare temporary c matrices (ct) as being maxmr-by-maxnr, which for now
is equivalent to being mr-by-nr. maxmr and maxnr are declared in a new
header file bli_kernel_post_macro_defs.h.

commit 3404e6657eabb017cd1580a2f1dd8e6fb13df923
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Feb 5 11:19:10 2014 -0600

Deprecated incremental blocksize macro const defs.

Details:
- Removed macro constant definitions related to incremental blocksizes
from all configurations' bli_kernel.h files. This change is minor and
is mostly a cleanup related to a previous commit.

commit 1e9afd39a63e0a58167d4439c1a0a880a4a35657
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 4 20:15:19 2014 -0600

Comment updates (removed vestiges of "bd").

commit 5cf58f7c2d5bc0d2d94d9576f7158d8f133b7aac
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Feb 4 09:15:19 2014 -0600

Added early returns for "object is zeros" case.

Details:
- Added some logic to packm_init(), pack_int() and gemm_int() so that
(a) objects marked as BLIS_ZEROS are not packed, and (b) those
objects are not computed with. This functionality is not currently
needed by any existing implementations, but may be used in the
future.

commit 6bbd4be769a9b344a55abe5ddaca1a99fd29f7b4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Feb 3 13:15:25 2014 -0600

Added 'f' on some gemm and trmm blocked variants.

Details:
- Added 'f' to some block variant files/functions to be consistent with
other file/functions' naming convention. Here, the f indicates
partitioning in the "forward" direction.

commit eb13cb2c6b182df5e2a9b88c76f50e2cee25b9e0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Feb 3 11:07:01 2014 -0600

Removed redundant non-gemm blksz_t creation.

Details:
- Removed code that creates duplicate blksz_t objects for herk, trmm,
and trsm. Instead, the gemm blksz_t objects are accessed via extern
and used directly. This reduces the amount of code associated with
each of the three _cntl_init() and _cntl_finalize() functions.

commit 0a023a7d9e58e53b8c204a5f49aa8ca9afeba938
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jan 29 14:02:08 2014 -0600

Introduced new level-3 front-end layer.

Details:
- Added new _front() functions for each level-3 operation. This is done
so that the choosing of the control tree (and *only* the choosing of
the control tree) happens in what was previously the "front end"
(e.g. bli_gemm()). That control tree is then passed into the _front()
function, which then performs up-front tasks such as parameter
checking.

commit 251c5d112196d37b183e554bc9d406104aed65fb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jan 28 19:40:29 2014 -0600

Removed redundant hemm, her2k control trees.

Details:
- Removed code that generated a control tree specifically for hemm and
symm. Instead, the gemm control tree is now configured so that it
works for gemm, hemm, or symm.
- Retired most her2k code, as it was not being used. (Currently, her2k is
implemented as two invocations of herk.) I couldn't think of many
situations where her2k variants were needed.
- Removed some older her2k code.

commit 5a36e5bf2f59d1e85d6dbce32a07d604c5e82d11
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jan 27 11:13:00 2014 -0600

Embed func_t microkernel objects in control trees.

Details:
- Modified all control tree node definitions to include a new field of
type func_t*, which is similar to a blksz_t except that it contains
one function pointer (each typed simply as void*) for each datatype.
We use the func_t* to embed pointers to the micro-kernels to use for
the leaf-level nodes of each control tree. This change is a natural
extension of control trees and will allow more flexibility in the
future.
- Modified all macro-kernel wrappers to obtain the micro-kernel pointers
from the incoming (previously ignored) control tree node and then pass
the queried pointer into the datatype-specific macro-kernel code, which
then casts the pointer to the appropriate type (new typedefs residing
in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
is, determined when the datatype-specific macro-kernel functions are
instantiated by the C preprocessor).
- Added macros to bli_kernel_macro_defs.h that build datatype-specific
base names if they do not exist already, and then use those to build
datatype-specific micro-kernel function names. This will allow
developers extra flexibility if they wanted to, for example, name each
of their datatype-specific micro-kernels differently (e.g. double
real might be named bli_dgemm_opt_4x4() while double complex might be
named bli_zgemm_opt_2x2()).
- Inserted appropriate code into _cntl_init() functions that allocates
and initializes a func_t object for the corresponding micro-kernels.
The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
and then reused via extern wherever possible.
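
As a rough illustration of the func_t idea described above, the sketch below stores one type-erased kernel pointer per datatype and casts it back to a typed pointer at the call site. Every name in it (sk_func_t, sk_num_t, the toy dot-product kernels) is hypothetical and is not the BLIS definition. The changelog says the pointers are typed as void*; this sketch assumes a generic function-pointer type instead, because ISO C does not guarantee that a function pointer survives a round trip through void*.

```c
#include <stddef.h>

/* Hypothetical datatype indices (not the BLIS num_t enum). */
typedef enum { SK_FLOAT = 0, SK_DOUBLE = 1, SK_NUM_TYPES = 2 } sk_num_t;

/* Generic function-pointer type used for type-erased storage. */
typedef void (*sk_void_fp)(void);

/* One kernel pointer per datatype, similar in spirit to func_t. */
typedef struct { sk_void_fp ptr[SK_NUM_TYPES]; } sk_func_t;

/* Typed signatures that the caller casts back to. */
typedef void (*sk_sdot_fp)(size_t, const float *, const float *, float *);
typedef void (*sk_ddot_fp)(size_t, const double *, const double *, double *);

/* Toy "kernels" standing in for datatype-specific micro-kernels. */
static void sk_sdot(size_t n, const float *x, const float *y, float *rho)
{
    *rho = 0.0f;
    for (size_t i = 0; i < n; ++i) *rho += x[i] * y[i];
}

static void sk_ddot(size_t n, const double *x, const double *y, double *rho)
{
    *rho = 0.0;
    for (size_t i = 0; i < n; ++i) *rho += x[i] * y[i];
}

/* Fill the table once, as an init routine might. */
void sk_func_init(sk_func_t *f)
{
    f->ptr[SK_FLOAT]  = (sk_void_fp)sk_sdot;
    f->ptr[SK_DOUBLE] = (sk_void_fp)sk_ddot;
}

/* Wrapper: query the pointer for the datatype, cast it, then call it. */
double sk_ddot_via_func(const sk_func_t *f, size_t n,
                        const double *x, const double *y)
{
    sk_ddot_fp ker = (sk_ddot_fp)f->ptr[SK_DOUBLE];
    double rho;
    ker(n, x, y, &rho);
    return rho;
}
```

Because the kernel is looked up at run time, a different table can be supplied without regenerating the wrapper code.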

commit 6cbd6f1c7f1915180aa28939833afde48665c5ae
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jan 24 10:38:29 2014 -0600

Removed commented mixed domain macro-kernel code.

Details:
- Removed commented-out code from macro-kernels that was supposed to
facilitate implementing mixed domain (complex times real) matrix
multiplication. This functionality is still (probably) possible,
but I'm getting tired of looking at the code every time I edit
a macro-kernel. Plus, there are probably ways of doing it at a
higher level, via control trees.

commit 29778be1119f1a884330d7f8dc424a2df4101d58
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jan 22 16:03:11 2014 -0600

Removed b_aux field from cntl nodes.

Details:
- Removed b_aux field from all control tree node definitions. This field
was being used in certain optimizations (incremental blocking) that were
not actually being employed within BLIS, and are probably not employed
by others.
- Updated all _cntl_obj_create() function definitions and invocations
according to above change.
- Retired bli_gemm_blk_var4.c, which was one such function that employed
incremental blocking, but which was never called by BLIS itself.

commit 06ac727a42ec9e832c7832745036702014638f99
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jan 15 16:44:52 2014 -0600

Updated some comments in level-3 front ends.

commit d628bf1da1560f1f5126a1ddfed8714f0a4b8da3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jan 15 11:40:12 2014 -0600

Consolidated pack_t enums; retired VECTOR value.

Details:
- Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
makes room in the three pack_t bits of the info field of obj_t so that
two values are now unused, and may be used for other future purposes.
- Updated sloppy terminology usage in comments in level-2 front-ends.
(Replaced "is contiguous" with more accurate "has unit stride".)

commit ddc8c1c379b4787be5954802906593d7ea144452
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jan 13 14:55:43 2014 -0600

Suppress warning in Makefile (UNINSTALL_LIBS).

Details:
- Redirect errors to /dev/null when using 'find' to locate libraries that
would be uninstalled upon executing "make uninstall-old". Before, if the
Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
or directory" message was emitted. This message was harmless, but is now
suppressed in this situation.

commit f8f67d7251bffc05020e20527c100c8115fd5e55
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jan 10 09:06:11 2014 -0600

Typecast bli_getopt() return value in testsuite.

Details:
- In the test suite driver, inserted an explicit typecast of the return
value of bli_getopt() prior to parsing. The lack of typecast caused a
problem on at least one system whereby a return value of -1 was
interpreted as a garbage character. Thanks to Francisco Igual for finding
and submitting this fix.
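
The changelog does not show the actual cast that was added, so the following is only a sketch of the general pitfall, assuming a platform where the value ends up in an unsigned character type. The parser sk_next_opt() is a hypothetical stand-in and is not bli_getopt().

```c
/* Hypothetical stand-in for an option parser: returns the next option
   character, or -1 once the option string is exhausted. */
static int sk_next_opt(const char *opts, int *pos)
{
    if (opts[*pos] == '\0') return -1;
    return (unsigned char)opts[(*pos)++];
}

/* Counts options, holding each return value in an int so that the -1
   sentinel cannot be confused with a character. */
int sk_count_opts(const char *opts)
{
    int pos = 0, count = 0, opt;
    while ((opt = sk_next_opt(opts, &pos)) != -1) ++count;
    return count;
}

/* Shows what goes wrong with an unsigned char: -1 is stored as 255 and,
   after integer promotion, no longer compares equal to -1.
   Returns 1 if the sentinel survived, 0 otherwise. */
int sk_sentinel_survives_unsigned_char(void)
{
    unsigned char c = (unsigned char)-1;
    return c == -1;
}
```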

commit e7f154fe2ed3e10e2323cefe5d25c2c23ac902c4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jan 10 08:48:07 2014 -0600

Applied edge case fix to arm/neon microkernel.

Details:
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
double precision real gemm microkernel in kernels/arm/neon/3.

commit 89c76a8a51d070d263c13bfa5ace65769509f2b4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jan 9 12:08:37 2014 -0600

Allow building outside source distribution.

Details:
- Modified build system (mostly configure and top-level Makefile) so that
a user can build a BLIS library outside of the top-level directory of
the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
which will compile, link, and run the testsuite binary. This works even
if the build directory is externally located, thanks to the test suite
binary's new -g and -o command-line options. Also, when creating the
test suite via the top-level Makefile, the linking is against the
local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.

commit 12fa82ec12cc340ab28552997d9d50f7c98691f8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jan 8 16:09:26 2014 -0600

Implemented bli_getopt().

Details:
- Added bli_getopt.c and .h files to frame/base. These files implement
a custom version of getopt(), which may be used to parse command line
options passed into a program via argc/argv. I am implementing this
function myself, as opposed to using the version available via unistd.h,
for portability reasons, as the only requirement is string.h (which
is available via the standard C library).
- Modified test suite to allow the user to specify the file name (and/or
path) to the parameters and operations input files: -g may be used to
specify the general input file and -o to specify the operations input
file). If -g or -o or both are not given, default filenames are assumed
(as well as their existence in the current directory).

commit cafb58e86ea5cfb21b9eedc57ca8ebbf24252098
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jan 6 13:28:36 2014 -0600

Updated template micro-kernels to use auxinfo_t.

Details:
- Updated template micro-kernel implementations (located in
config/template/kernels), to adhere to the new auxinfo_t interface.
Meant to include this change in a0331fb1.
- Changed template configuration to use 64-bit integers (for both BLIS
and the BLAS compatibility layer).

commit 9ab126b499c3805045020cb89a8a5848e28d3bf5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jan 6 12:13:26 2014 -0600

Removed error checks in netlib->BLIS param mapping

Details:
- Disabled error checking in netlib-to-BLIS parameter mapping functions.
If the char value input to these functions was not one of the defined
values, bli_check_error_code() with the appropriate error code value
would be called, resulting in an abort(). This was unnecessary and
redundant since these routines are currently only used within the
BLAS compatibility layer, and they are only called AFTER parameter
checking has already been performed on the original BLAS char values.
If the application tried to override xerbla() to prevent an abort()
from being called, this error checking would still get in the way.
Thus, instead of reporting the error situation to the framework (ie:
calling abort()), an arbitrary BLIS parameter value is now chosen and
the function returns normally. Thanks to Jeff Hammond for finding and
reporting this issue.
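
A minimal sketch of the behavior described above, assuming a hypothetical enum and mapping function (neither is the BLIS API): an unrecognized character no longer aborts, and an arbitrary value is returned instead, on the assumption that the caller has already validated the original BLAS parameter.

```c
/* Hypothetical transposition enum (not the BLIS trans_t). */
typedef enum {
    SK_NO_TRANSPOSE,
    SK_TRANSPOSE,
    SK_CONJ_TRANSPOSE
} sk_trans_t;

/* Maps a netlib-style character to the enum. No error is raised for an
   unknown character; the default case picks an arbitrary value and
   returns normally, leaving error reporting to the earlier check. */
sk_trans_t sk_map_netlib_trans(char trans)
{
    switch (trans) {
    case 'n': case 'N': return SK_NO_TRANSPOSE;
    case 't': case 'T': return SK_TRANSPOSE;
    case 'c': case 'C': return SK_CONJ_TRANSPOSE;
    default:            return SK_NO_TRANSPOSE; /* arbitrary fallback */
    }
}
```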

commit 2cb13600f9f9601c60e7f96f4ca159d169ade9cb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jan 3 12:29:13 2014 -0600

Updated year in copyright headers to 2014.

commit 290fa54e0083c9c837188b8321b13b1b282e7b0c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Dec 20 14:10:26 2013 -0600

Store variable panel strides in trmm/trsm auxinfo.

Details:
- Changed the value being stored into the auxinfo_t structure in trmm
and trsm macro-kernels. Whereas before we stored whatever value was
provided to the macro-kernel implementation via ps_a/ps_b, now we
store the stride that will advance to the next variable-length
micro-panel of the triangular matrix A (left) or B (right).
- Whitespace changes to the files affected above.

commit e3a6c7e77667fd749248df3f75f880266c3136ec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 19 16:29:31 2013 -0600

Macroized conditionals for a2/b2 in macro-kernels.

Details:
- Replaced conditional expressions in macro-kernels related to computing
the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
invocation, bli_is_last_iter(), that tests the same condition.
- Updated gemm_ukr module to use auxinfo_t argument.
- Whitespace changes in test suite ukr modules.

commit a0331fb10a50393e31d16339053b75b944132da1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 19 14:50:11 2013 -0600

Introduced auxinfo_t argument to micro-kernels.

Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
with a pointer to a new datatype, auxinfo_t, which is simply a struct
that holds a_next and b_next. The struct may hold other auxiliary
information that may be useful to a micro-kernel, such as micro-panel
stride. Micro-kernels may access struct fields via accessor macros
defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
as well as macro-kernels (for declaring and initializing the structs)
according to above change.
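
The struct-plus-accessors arrangement described above might look roughly like this. The field names a_next and b_next come from the commit message; the type name sk_auxinfo_t, the accessor macro names, and the toy kernel are hypothetical and are not copied from bli_auxinfo_macro_defs.h.

```c
#include <stddef.h>

/* Auxiliary information handed to a micro-kernel through one pointer. */
typedef struct {
    const void *a_next; /* address of the next micro-panel of A */
    const void *b_next; /* address of the next micro-panel of B */
} sk_auxinfo_t;

/* Accessor macros, so kernels do not touch the fields directly. */
#define sk_auxinfo_set_next_a(aux, p) ((aux)->a_next = (p))
#define sk_auxinfo_set_next_b(aux, p) ((aux)->b_next = (p))
#define sk_auxinfo_next_a(aux)        ((aux)->a_next)
#define sk_auxinfo_next_b(aux)        ((aux)->b_next)

/* Toy 1x1 "micro-kernel": c += sum over p of a[p] * b[p]. */
void sk_ukr(size_t k, const double *a, const double *b, double *c,
            const sk_auxinfo_t *aux)
{
    /* A real kernel might prefetch from these addresses. This sketch
       only reads them to show how the accessors are used. */
    const void *a_next = sk_auxinfo_next_a(aux);
    const void *b_next = sk_auxinfo_next_b(aux);
    (void)a_next;
    (void)b_next;

    for (size_t p = 0; p < k; ++p) *c += a[p] * b[p];
}
```

Adding a field later (such as a panel stride) changes the struct and its accessors but not the kernel signature.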

commit 392428dea4001fe4384efe29f6cde32f8abeeb35
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 12 19:01:47 2013 -0600

Added "ri" scalar macros.

Details:
- Added set of basic scalar macros that take arguments' real and
imaginary components separately, named like the previous set except
with the "ris" (instead of "s") suffix.
- Redefined the previous set of scalar macros (those that take arguments
"whole") in terms of the new "ri" set.
- Renamed setris and getris macros to sets and gets.
- Renamed setimag0 macros to seti0s.
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.

commit f60c8adc2f61eaba06b892f4e73000159de93056
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 10 14:39:56 2013 -0600

Minor updates to dunnington configuration.

Details:
- Added commented alternatives to dunnington configuration's bli_kernel.h.
- Minor reformatting of optimization flag variables in make_defs.mk.

commit 4ef20150492db254b5baf2368add62e19b0ac11b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 9 18:53:03 2013 -0600

Tweaks to dunnington configuration (x86_64/core2).

Details:
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
- Enabled cache blocksize extension of up to 25% for MC and KC (for
double-precision real).

commit 5ad2ce7bf5ba3ea955e6d517bfd270e02820263b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 9 18:30:49 2013 -0600

Minor x86_64 (core2) kernel fixes.

Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
for x86_64/core2 were calling the wrong reference code (l instead
of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.

commit d289f5d3a9c0e1a68a17c1c32b736e282a289c4c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 5 10:56:13 2013 -0600

Whitespace changes to level-2 blocked variants.

Details:
- Joined some lines in level-2 blocked variants to match formatting used
in level-3 blocked variants.
- Streamlined implementation of bli_obj_equals() in bli_query.c.

commit b444489f100d218bc8ef29b01ff8489c358559f9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 3 16:08:30 2013 -0600

Added new "attached" scalar representation.

Details:
- Added infrastructure to support a new scalar representation, whereby
every object contains an internal scalar that defaults to 1.0. This
facilitates passing scalars around without having to house them in
separate objects. These "attached" scalars are stored in the internal
atom_t field of the obj_t struct, and are always stored to be the same
datatype as the object to which they are attached. Level-3 variants no
longer take scalar arguments; however, level-3 internal back-ends still
do; this is so that the calling function can perform subproblems such
as C := C - alpha * A * B on-the-fly without needing to change either
of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):

bli_obj_init_scalar_copy_of()
-> bli_obj_scalar_init_detached_copy_of()
bli_obj_init_scalar() -> bli_obj_scalar_init_detached()
bli_obj_create_scalar_with_attached_buffer()
-> bli_obj_create_1x1_with_attached_buffer()
bli_obj_scalar_equals() -> bli_obj_equals()

- Defined new functions:

bli_obj_scalar_detach()
bli_obj_scalar_attach()
bli_obj_scalar_apply_scalar()
bli_obj_scalar_reset()
bli_obj_scalar_has_nonzero_imag()
bli_obj_scalar_equals()

- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:

bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
bli_obj_is_scalar() -> bli_obj_is_1x1()

- Defined new macros to set and copy internal scalars between objects:

bli_obj_set_internal_scalar()
bli_obj_copy_internal_scalar()

- In level-3 internal back-ends, added conditional blocks where alpha and
beta are checked for non-unit-ness. Those values for alpha and beta are
applied to the scalars attached to aliases of A/B/C, as appropriate,
before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
attached to A and B are multiplied together to obtain alpha, while beta
is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
future support for mixed domain/precision. These can be added back later
once that functionality is given proper treatment. Also, removed the
creating of copy-casts of alpha and beta since typecasting of scalars
is now implicitly handled in the internal back-ends when alpha and
beta are applied to the attached scalars.

commit 992de486d6f23e69a623abd15ae77d7881d13871
Merge: 9552e6ee fd4ac636
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 2 13:58:46 2013 -0600

Unimplemented kernels now call reference.

Details:
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
datatypes call the corresponding reference kernel. Previously, these
kernel functions called abort() with a "not yet implemented" error
message.

commit fd4ac636d9a55cec1476a444bd4e70def219dc8f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 2 13:50:36 2013 -0600

Unimplemented kernels now call reference.

Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
unimplemented kernel functions simply call the corresponding reference
implementation. (Previously, these unimplemented functions would
abort() with a "not yet implemented" message.)

commit 9552e6ee824d4345d5e908e869e071d19829819a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Nov 24 11:40:31 2013 -0600

Removed optional scaling from packm control tree.

Details:
- Removed does_scale field from packm control tree node and
bli_packm_cntl_obj_create() interface. Adjusted all invocations of
_cntl_obj_create() accordingly.
- Redefined/renamed macros that are used in aliasing so that now,
bli_obj_alias_to() does a full alias (shallow copy) while
bli_obj_alias_for_packing() does a partial alias that preserves the
pack_mem-related fields of the aliasing (destination) object.
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
will work just fine for bli_trmm3().
- Removed some commented vestiges of the typecasting functionality needed
to support heterogeneous datatypes.

commit e65c476284db9ef64b23191a21c2584b1083342f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 19 10:05:35 2013 -0600

Minor updates to packm_blk_var2.c and _blk_var3.c.

Details:
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
instead of setm(), scal2m().

commit 9e1d0d4bca48eda54301d8976f203e2544c9df3a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 18 18:11:07 2013 -0600

Added trsm_l, trsm_u ukernels for x86_64/core2.

Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
that already existed in kernels/x86_64/core2-sse3/3.

commit 85e7e02ea3a9190b6fcff5d46b00d41c79cb1242
Merge: 67761e22 70720054
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 18 12:02:00 2013 -0600

Merge branch 'master'. Forgot to git-pull.

commit 67761e224c92500eecf9c1540cc72bdd2fb27679
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 18 11:57:40 2013 -0600

Attempting to fix errors in bgq build.

Details:
- Removed restrict declaration from b_cast and c_cast from
bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
are causing problems for xlc only in those two files and no other
macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.

commit 707200541d344f98cf34c9801954dbb36fbe0447
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 18 11:17:31 2013 -0600

Syntax error fix in x86_64/core2 gemmtrsm_u ukr.

commit bbe2b84a49e7785d4d0c514cda34adfbe66478b0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 18 11:11:06 2013 -0600

Updated Makefile in test, testsuite.

Details:
- Updated Makefiles in test and testsuite directories to use the new
BLIS header installation directory scheme, which is to compile with
-I<PREFIX>/include/blis instead of -I<PREFIX>/include.

commit 9bd7fcfd436625ca2108128086671319362f4d92
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 18 10:58:09 2013 -0600

Outer-to-inner 'restrict' fix in macro-kernels.

Details:
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
macro-kernels. Previously, all restricted pointers were being declared
at the outer-most function scope level. While this violates the C99
standard, very few of the compilers used with BLIS so far have seemed
to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
for identifying this bug (and suggesting the fix).
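
The commit message does not include the affected code, so the following is only a general illustration of the outer-to-inner idea: restrict-qualified pointers are declared inside the innermost block in which the no-aliasing promise is meant to hold, rather than once at function scope. The function and its arguments are hypothetical.

```c
#include <stddef.h>

/* Writes b(:,j) = 2 * a(:,j) for each of n_cols columns of length len.
   The caller is assumed to pass non-overlapping a and b. */
void sk_scale_cols(size_t n_cols, size_t len,
                   const double *a, size_t lda,
                   double *b, size_t ldb)
{
    for (size_t j = 0; j < n_cols; ++j) {
        /* Declared per iteration: the restrict promise covers only the
           accesses made through aj and bj within this block. */
        const double *restrict aj = a + j * lda;
        double *restrict bj = b + j * ldb;

        for (size_t i = 0; i < len; ++i) bj[i] = 2.0 * aj[i];
    }
}
```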

commit 50549a6a31dd26cf63a013e0ede16b2c7ce835b6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Nov 17 18:31:27 2013 -0600

Changed header install directory to include/blis.

Details:
- Changed top-level Makefile so that headers are installed to
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.

commit d70733abddfb9a95661897e1e4f3c1f3cfa7cbaa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 16 17:34:25 2013 -0600

Added ARM kernels, configurations.

Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
Thanks to Francisco Igual for contributing these kernels and
configurations.

commit d37c2cff62089c86983c2f79762f4b5329037373
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 13 10:47:11 2013 -0600

Minor comment and Makefile changes.

Details:
- Added missing 'check-config' and 'check-make-defs' targets to
testsuite/Makefile.
- Removed unused 'test' target from top-level Makefile.
- Comment changes to testsuite input files.

commit 19885f893a17b91ee79bead0620d0f913392d4c5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 11 12:09:21 2013 -0600

Updated some kernel comment headers.

Details:
- Updated bgq and piledriver comment headers to use BLIS copyright header
instead of libflame.

commit 1a4d698f42981d74fe5f29b980031e1ee7dc42d5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 11 10:15:40 2013 -0600

CHANGELOG update (for 0.1.0).

0.1.0

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 9 17:18:00 2013 -0600

Added object wrappers to 1f test suite modules.

Details:
- Added missing object wrappers to level-1f test suite modules. This was
only apparent if you were configuring with something other than the
reference configuration.
- Commented out object-wrappers in level-1f front-ends. These were not
working as intended unless the reference configuration was selected, because
most kernel sets, such as those in the template set, do not have object
wrappers.
- Whitespace changes to template micro-kernels.
- Comment changes to template level-1f kernel headers.

commit 9ef3752079de10124bed906b5d28479d04aa8187
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 8 17:20:47 2013 -0600

Updated template kernels wrt KernelsHowTo wiki.

Details:
- Merged latest state of KernelsHowTo wiki into template micro-kernels
located in config/template/kernels/3.

commit 376bbb59c8944e29c5c1ff6637920d8451370afa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 8 11:17:34 2013 -0600

Removed support for duplication.

Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
and all framework code.
- Updated test suite modules according to above changes.

commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Nov 7 11:36:11 2013 -0600

Added comments to testsuite/input.operations.

Details:
- Added extensive comments to the top of testsuite/input.operations,
which describe how to edit the file.
- Removed input.operations.0 and input.operations.1.
- Changed input.general to test all datatypes ("sdcz") by default.

commit a98f78b715fb256a519870071bb5266130d70b21
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 6 15:32:47 2013 -0600

Changed dim_t and inc_t to be signed integers.

Details:
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
This will facilitate interoperability with Fortran in the future.
(Fortran does not support unsigned integers.)
- Redefined many instances of stride-related macros so that they return
or use the absolute value of the strides, rather than the raw strides
which may now be signed. Added new macros bli_is_row_stored_f() and
bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
and changed the packm_blk_var[23] variants to use these macros instead
of the existing bli_is_row_stored(), bli_is_col_stored().
- Added/adjusted typecasting to various functions/macros, including
bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
related macros in bli_param_macro_defs.h.
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
layer properly handles situations where vector increments are negative.
Thanks to Vladimir Sukharev for pointing out this issue.
- Changed type of increment parameters in bli_adjust_strides() from dim_t
to inc_t. Likewise in bli_check_matrix_strides().
- Defined bli_check_matrix_object(), which checks for negative strides.
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
that they also check for negative stride.
- Added instances of bli_check_matrix_object() to various operations'
_check routines.
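
Two consequences of signed strides can be sketched as follows. The names are hypothetical and the exact BLIS macro definitions are not reproduced here. The first pair of functions assumes that column storage means a unit row stride and contrasts an absolute-value test with a "forward-only" test; the last function applies the standard BLAS convention that a negative increment traverses the vector backward, starting from the far end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t sk_inc_t; /* signed stride type, as after this change */

static sk_inc_t sk_abs(sk_inc_t a) { return a < 0 ? -a : a; }

/* Column-stored test that tolerates a negative row stride. */
int sk_is_col_stored(sk_inc_t rs, sk_inc_t cs)
{
    (void)cs; /* unused in this simplified test */
    return sk_abs(rs) == 1;
}

/* Variant that assumes forward-oriented (positive) strides. */
int sk_is_col_stored_f(sk_inc_t rs, sk_inc_t cs)
{
    (void)cs; /* unused in this simplified test */
    return rs == 1;
}

/* BLAS convention: for incx < 0, logical element 0 lives at offset
   (n-1)*|incx|, so that element i is found at start[i*incx]. */
const double *sk_blas_vec_start(const double *x, int n, int incx)
{
    if (n <= 0 || incx >= 0) return x;
    return x + (ptrdiff_t)(n - 1) * (-(ptrdiff_t)incx);
}
```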

commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 6 10:09:10 2013 -0600

Minor comment update to BLAS compat files.

commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 4 15:50:00 2013 -0600

Fixed bugs in scalv and setv.

Details:
- Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
a segmentation fault may occur if beta is not the same type as
the vector operand for scalv and setv.
- Changed axpyv and scal2v front-ends in a similar fashion.

commit f5953259a1842ee48e5833c22ac86e68a337bfe1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Nov 4 14:43:55 2013 -0600

Fixed a bug related to Hermitian matrix diagonals.

Details:
- Fixed a bug whereby BLIS assumed that the imaginary components of the
diagonal elements of Hermitian matrices were already zero. This property
is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
to Vladimir Sukharev for reporting this bug.
- Minor comment updates to template kernels.
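
A simplified sketch of enforcing a real diagonal while copying a Hermitian matrix. This is a plain dense column-major copy with hypothetical names, not the micro-panel packing performed by bli_packm_blk_var2, and it assumes a compiler with C99 complex support.

```c
#include <complex.h>
#include <stddef.h>

/* Copies an n-by-n column-major matrix a (leading dimension lda) into
   the contiguous buffer p, zeroing the imaginary part of each diagonal
   element so the copy is Hermitian regardless of the input diagonal. */
void sk_pack_herm(size_t n, const double complex *a, size_t lda,
                  double complex *p)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            double complex v = a[i + j * lda];
            /* creal(v) converts back to complex with a zero imaginary part. */
            p[i + j * n] = (i == j) ? creal(v) : v;
        }
    }
}
```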

commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 2 17:19:40 2013 -0500

Added scaling to abval2s, sqrt2s macros.

Details:
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
and overflow from squaring the real and imaginary components. (This is
the same technique used to fix recent bugs in invscals/invscaljs and
inverts.)

commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 1 10:28:04 2013 -0500

Added new dotxaxpyf variant 2.

Details:
- Added a new variant for dotxaxpyf that is based on dotxf and axpyf
kernels. By default, this variant is not used by any other operation.

commit 97f89fbcf202d72fc440b614708e352ea31633e2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 1 10:16:39 2013 -0500

Fixed bug in complex invscals.

Details:
- Fixed complex inversion in invscals and invscaljs whereby the
imaginary component was being computed incorrectly.
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
in inverts, invscals, and invscaljs.
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
operator instead of "<".

commit eda42a21d17a2742eab69ab801ed530b82488c8a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 31 18:00:44 2013 -0500

Defined missing symbols in bla_rotg.c

Details:
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
these bugs.

commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 30 14:39:01 2013 -0500

Fixed bugs in scalm and setm.

Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
beta is not the same type as the matrix operand. Thanks to Vladimir
Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
and setm; namely, the alpha scalar is copy-cast to the type of the first
matrix operand.
- Changed the template and reference configurations' bli_config.h files
so that the number of memory allocator blocks of A and B are set based
on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.

commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 24 14:32:20 2013 -0500

Fixed over/under-flow in complex inversion.

Details:
- Fixed the complex bli_?inverts() macros, which were inverting elements
in an "unsafe" manner, such that very large and very small values were
unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
reporting this bug.
- Comment update to bli_sumsqv_unb_var1.c.
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
- Changed 1.0F to 1.0 for bli_drands() macro.
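
Note (illustrative sketch, assuming a scaling approach like the one described for these fixes): a "safe" complex reciprocal can divide both components by the larger of their magnitudes before forming the denominator, so that no intermediate value is squared at full magnitude. The type ex_dcomplex and the function ex_zinvert are hypothetical names; the actual bli_?inverts() macros may differ in detail.

```c
#include <math.h>

/* Hypothetical stand-in for a double-precision complex element. */
typedef struct { double real; double imag; } ex_dcomplex;

/* In-place reciprocal 1/x using scaling. Assumes x is nonzero. */
static void ex_zinvert(ex_dcomplex *x)
{
    double ar = fabs(x->real);
    double ai = fabs(x->imag);
    double s  = (ar > ai ? ar : ai);            /* largest component magnitude */
    double rs = x->real / s;                    /* scaled parts lie in [-1, 1] */
    double is = x->imag / s;
    double d  = rs * x->real + is * x->imag;    /* equals |x|^2 / s            */

    x->real =  rs / d;
    x->imag = -is / d;
}
```

The naive formula divides by real*real + imag*imag, which overflows to infinity for components near 1e200 even though the true reciprocal is representable.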

commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 23 12:15:25 2013 -0500

Fixed parameter checking issue in BLAS syr[2]k.

Details:
- Fixed a minor parameter checking bug in the BLAS compatibility layer
for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
trans parameter of either operation, it is (a) allowed, and (b) treated
as 'T' (whereas previously it was disallowed). Thanks to Vladimir
Sukharev for finding and reporting this bug.
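
Note (illustrative sketch): for real datatypes a conjugate-transpose is the same as a plain transpose, so the check can map 'C' onto 'T' rather than reject it. The enum and function names below are hypothetical and are not taken from the compatibility layer.

```c
#include <ctype.h>

/* Hypothetical result codes for this example. */
typedef enum {
    EX_NO_TRANSPOSE,
    EX_TRANSPOSE,
    EX_INVALID_TRANS
} ex_trans_t;

/* Map the BLAS trans character for a real syrk/syr2k call.
   'C' is accepted and treated exactly like 'T'. */
static ex_trans_t ex_map_real_syrk_trans(char trans)
{
    switch (toupper((unsigned char)trans)) {
        case 'N': return EX_NO_TRANSPOSE;
        case 'T': /* fall through */
        case 'C': return EX_TRANSPOSE;
        default:  return EX_INVALID_TRANS;
    }
}
```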

commit a091a219bda55e56817acd4930c2aa4472e53ba5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 14 10:11:29 2013 -0500

Minor fixes to piledriver configuration, ukernel.

Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
configuration and gemm micro-kernel.
- Very minor changes to test suite input files.

commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 11 11:37:19 2013 -0500

Added Fran's Sandy Bridge kernels/configuration.

Details:
- Added a kernel directory for kernels developed by Francisco Igual for
the Sandy Bridge architecture, including a dgemm ukernel coded with
AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.

commit 03106d650e4030d4c9831683448376f92fc52d41
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 11 10:40:38 2013 -0500

Fixed minor perf bug in gemm_ker_var2.

Details:
- Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
computed correctly (ie: do not wrap around) at the edge cases. Thanks to
Tze Meng for helping me identify this bug.

commit b053337387dbdef9035be03538222670a21707ca
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 10 18:26:55 2013 -0500

Added fusing factors, MR/NR to test suite output.

Details:
- Updated the test suite driver (and modules where appropriate) so that
the level-1f fusing factors are output along with the variable dimension.
While this is not strictly necessary, since the fusing factors are output
in the initial parameter summary, it allows extra reassurance to the user
since the fusing factors appear alongside the variable dimension, which
together give a complete picture of the problem size. Similar changes were
made for outputting the register blocksizes when reporting results for the
micro-kernel test modules.

commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 10 14:20:06 2013 -0500

Added test suite modules for level-1f, 3 kernels.

Details:
- Added test modules in test suite for level-1f kernels and level-3
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
facilitate the level-1f test modules. Also added front-end for dupl
operation.
- Added obj_t-based check routines for level-1f operations, which are
called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
factors as a function of datatype, which is needed by their respective
test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.

commit 680188d46bb15b9a1a2867638104939dc77ca2a1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 10 13:23:37 2013 -0500

Cleaned up old test drivers.

Details:
- Minor updates to old test drivers in preparation for our participation
in ACM TOMS's replicated results initiative.

commit 3690bdd4f95769c935c410414112102cc3e108b1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 10 11:45:33 2013 -0500

More updates to level-1f kernels for core2-sse3.

Details:
- Changed types in function signatures to match new prototypes. Meant to
include this in previous commit.

commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 10 11:27:27 2013 -0500

Fixed outdated fusing factor macros in 1f kernels.

Details:
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
this out.

commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 1 17:01:18 2013 -0500

Added section overrides to test suite.

Details:
- Added new lines of input to the test suite's input.operations file, which
allows the user to disable entire sections (levels) of tests. Before this
change, the user had to manually disable each operation test's "master
switch". (This is why input.operations.0 existed: to allow a more
convenient starting point for someone who only wanted to test one or a
few operations.)

commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 30 12:58:18 2013 -0500

Added template implementations and other tweaks.

Details:
- Added a 'template' configuration, which contains stub implementations of the
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
to bli_kernel.h.
- Modified test suite to print out fusing factors for axpyf, dotxf, and
dotxaxpyf, as well as the default fusing factor (which are all equal
in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
reference variants were implemented in terms of front-end routines rather
than directly in terms of the kernels. (For example, axpy2v was implemented
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
frame/3/gemm/ukernels and frame/3/trsm/ukernels.

commit 97aaf220a847363b4da35935eca17790c0ef71f6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 17 10:51:36 2013 -0500

Added new kernels, configurations.

Details:
- Added various micro-kernels for the following architectures:
Intel MIC
IBM BG/Q
IBM Power7
AMD Piledriver
Loongson 3A
and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
"clarksville" configuration to "dunnington".

commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 13 14:31:53 2013 -0500

Removed default configuration behavior.

Details:
- Changed the configure script so that it no longer defaults to the
reference configuration. This change is being made so that the
developer has a firm awareness of which configuration is being used
to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
suggested change.

commit da77e9614f54f92f703f01e3b9bd67a83280150c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 13 12:00:37 2013 -0500

Minor improvements to static memory allocator.

Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
functionality includes computing the pool size for each datatype (using
that datatype's cache blocksizes) and using the maximum to size the
actual pool array. This addresses the somewhat common pitfall whereby a
developer updates cache blocksizes in bli_kernel.h for only one datatype
(say, single-precision real), while the memory pools are sized using the
double-precision real values. Then, when the developer attempts to link
to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
a message saying the static memory pool was exhausted. Clearly, this
message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
meant to check for size consistency among the various cache blocksizes.
(Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
reasonable place to put these constants, rather than further crowd up
bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
I hadn't used it in months. We can always re-create it later.
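
Note (illustrative sketch of the sizing idea, not the contents of bli_mem_pool_macro_defs.h): the pool for packed blocks of A can be sized per datatype from that datatype's cache blocksizes, with the largest result used for the actual array. All EX_ names and blocksize values below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical cache blocksizes (in elements) for the s, d, c, z datatypes.
   Only the single-precision MC has been enlarged, mimicking the pitfall
   described above. */
#define EX_MC_S 512
#define EX_MC_D 128
#define EX_MC_C 128
#define EX_MC_Z 64
#define EX_KC   256

/* Element sizes in bytes, in the spirit of the BLIS_SIZEOF_? constants. */
#define EX_SIZEOF_S 4
#define EX_SIZEOF_D 8
#define EX_SIZEOF_C 8
#define EX_SIZEOF_Z 16

/* Bytes needed for one packed block of A, per datatype. */
#define EX_POOL_A_S (EX_MC_S * EX_KC * EX_SIZEOF_S)
#define EX_POOL_A_D (EX_MC_D * EX_KC * EX_SIZEOF_D)
#define EX_POOL_A_C (EX_MC_C * EX_KC * EX_SIZEOF_C)
#define EX_POOL_A_Z (EX_MC_Z * EX_KC * EX_SIZEOF_Z)

#define EX_MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Use the maximum over all datatypes to size the pool. */
#define EX_POOL_A_BLOCK_SIZE \
    EX_MAX2(EX_MAX2(EX_POOL_A_S, EX_POOL_A_D), \
            EX_MAX2(EX_POOL_A_C, EX_POOL_A_Z))

static char ex_pool_a[EX_POOL_A_BLOCK_SIZE];
```

Because every term is an integer constant expression, the maximum is resolved at compile time and costs nothing at run time.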

commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 10 17:17:28 2013 -0500

Added ESSL and Accelerate targets to test drivers.

Details:
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
/ providing this patch.

commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 10 16:35:12 2013 -0500

Various changes to treatment of integers.

Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
assigned values of 32, 64, or some other value. The former two result in
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
with format specifiers not matching the types of their arguments to printf()
and scanf().
- Inserted missing include "bli_system.h" into blis.h (which was slated for
inclusion in d141f9eeb6d1).
- Added explicit typecasting of dim_t and inc_t to macros in
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
Windows-style CRLF newlines for certain files.
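
Note (illustrative sketch): the selection described in the first item above could be expressed with a preprocessor block along these lines. BLIS_INT_TYPE_SIZE, gint_t, and guint_t are named in the commit; the exact layout of the block, and the choice of 64 here, are assumptions made for the example.

```c
#include <stdint.h>

/* Normally set in bli_config.h; 64 is chosen here only for illustration. */
#define BLIS_INT_TYPE_SIZE 64

#if   BLIS_INT_TYPE_SIZE == 32
typedef int32_t           gint_t;
typedef uint32_t          guint_t;
#elif BLIS_INT_TYPE_SIZE == 64
typedef int64_t           gint_t;
typedef uint64_t          guint_t;
#else
/* Any other value falls back to a default C integer type. */
typedef signed long int   gint_t;
typedef unsigned long int guint_t;
#endif
```

When printing such values, casting to a fixed type (for example, to long with a %ld specifier) is one way to keep format specifiers and arguments in agreement regardless of the configured size.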

commit 068437736b41d51a1f5ec47839f059bf58a20413
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 9 14:07:58 2013 -0500

Fixed set-but-not-used compiler (gcc) warnings.

Details:
- Used void-casts of certain variables to appease gcc (and perhaps other
compilers) when such variables are only used in the complex instances of
the functions. Special thanks to Karl Rupp for suggesting a portable fix
for these warnings.
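
Note (illustrative sketch): a void-cast counts as a use of a variable without generating any code, which silences set-but-not-used warnings in instantiations that never read it. The function below is a hypothetical real-domain example; in a complex instantiation the flag would actually be consulted.

```c
/* Hypothetical real-domain product term. The conjugation flag is set
   but has no effect on real data. */
static double ex_real_dot_term(double x, double y, int conjx)
{
    int do_conj = (conjx != 0);   /* set here ...                        */

    (void)do_conj;                /* ... and marked as used to appease
                                     the compiler in the real case.      */
    return x * y;
}
```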

commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 9 13:48:52 2013 -0500

Small fix to Windows defs.mk makefile fragment.

Details:
- Commented out a !include statement that was attempting to include a
version file that does not yet exist. For now, the version string is
hard-coded into defs.mk.

commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 9 13:09:16 2013 -0500

Added Windows build system.

Details:
- Added a 'windows' directory, which contains a Windows build system
similar to that of libflame's. Thanks to Martin for getting this up
and running.
- Spun off system header includes into bli_system.h, which is included
in blis.h
- Added a Windows section to bli_clock.c (similar to libflame's).

commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 9 11:04:46 2013 -0500

Edited bli_?lamch.c to avoid Windows keyword.

Details:
- Renamed "small" variable to "smnum" to avoid collision with Windows type
by the same name. This change is needed in advance of the upcoming Windows
build system.

commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Sep 4 13:36:07 2013 -0500

Switched integer typedefs (again) to C types.

Details:
- Redefined gint_t and guint_t in terms of the standard C types long int
and unsigned long int, respectively.
- Changed testsuite default max problem size to 500.
- Changed testsuite input.operations to use square problems for level-3
operation tests.

commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Sep 4 12:09:11 2013 -0500

Falling back to 32-bit integers for dim_t, etc.

Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
I've changed the default typedef for gint_t and guint_t from int64_t and
uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
to introductory output of the testsuite.

commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 3 21:58:07 2013 -0500

Applied temp fix to typecasting bug in testsuite.

Details:
- Applied a temporary fix to the typecasting bug in the testsuite driver.
The fix involves casting both numerator and denominator to unsigned long.
This fix is more voodoo than science, as I can't be sure why it even
works.

commit 9ee6e125373869c4213c017ce772c38ecefba103
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Sep 3 21:53:27 2013 -0500

Changed dimension spec for gemm in testsuite.

Details:
- Encountered a bizarre typecasting bug whereby the test suite was not
computing the proper dimension from the problem size and dimension
specification when the latter was set to -3. Will investigate.
Thanks to Fran for finding this "bug".

commit e8be081e68c385ab44d0fea8dade21d40c200b79
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 28 15:52:34 2013 -0500

Generalized matlab and file output in testsuite.

Details:
- Added a new option in input.general that allows outputting in
matlab/octave format so that one can output in matlab format
independently from outputting to files.
- Adjusted input.operations according to above.
- Added input.operations.0 and input.operations.1 with all options
disabled and enabled, respectively.

commit d352c746e5683037d41b5061dfb5ce08e1d0843b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 27 13:41:46 2013 -0500

Added single/real gemm micro-kernel for x86_64.

Details:
- Added a single-precision real gemm micro-kernel in
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
- Adjusted the single-precision real register blocksizes in
config/clarksville/bli_kernel.h to be 8x4.
- Added a missing comment to bli_packm_blk_var2.c that was present in
bli_packm_blk_var3.c

commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Aug 19 12:07:41 2013 -0500

Fixed bug in bli_acquire_mpart_t2b(), _l2r().

Details:
- Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
that caused incorrect partitioning when SUBPART0 was requested. This
bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
this bug.
- Removed dupl kernels from kernels/x86_64/3 directory.
- Uncommented beta == 0 optimization code in
kernels/x86_64/3/bli_gemm_opt_d4x4.c.

commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 8 14:39:35 2013 -0500

Moved init_safe(), finalize_safe() to BLAS compat.

Details:
- Moved the bli_init_safe() and bli_finalize_safe() function calls from the
BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
initializers in the BLIS layer wasn't buying us anything because the user
could still call the library with uninitialized global scalar constants,
for example. Thus, we will just have to live with the constraint that
bli_init() MUST be called before calling ANY routine with a bli_ prefix.
- Added the missing bli_init_safe() and bli_finalize_safe() calls to the level-1
BLAS compatibility wrappers.

commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 8 13:30:19 2013 -0500

Miscellaneous updates.

Details:
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
BLIS_CACHE_LINE_SIZE (typically 64).
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
kernels.
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
kernels, in that the interior and edge-case handling is expressed once
inside the loops in the n and m dimensions, rather than the edge-case
handling being "unrolled" and expressed as distinct code regions. The
previous macro-kernel now lives in retired form in the subdirectory
other/bli_gemm_ker_var2.c.old.
- Updated experimental gemm_ker_var5 according to above change.
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
applied to optimize the macro-kernel access pattern on C when C is
row-stored.
- Various updates inside of test/exec_sizes.

commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 7 12:27:04 2013 -0500

Fixed bug in interface of bla_ger_check().

Details:
- Fixed the misplaced lda parameter in the function signature of
bla_ger_check(). Thanks to Tyler for finding this bug.

commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 6 12:25:51 2013 -0500

Fixed cpp guard typos in frame/compat/check files.

Details:
- Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
- Fixed various syntax errors in the code that had yet to be compiled
due to the aforementioned bug.

commit f4ec28e723d28d998f1038f82da6986e44320ef6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 1 11:24:23 2013 -0500

Added basic OpenMP-based gemm and packm files.

Details:
- Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
into the following auxiliary files

frame/1m/packm/other/bli_packm_blk_var2.c
frame/3/gemm/other/bli_gemm_ker_var2.c

The routine in the first file uses a basic OpenMP parallel region to
parallelize the packing of blocks of A and panels of B, while the
second uses a similar parallel region to parallelize along the n
dimension of the gemm macro-kernel.
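
Note (illustrative sketch, not the code in the auxiliary files above): parallelizing along the n dimension means that each thread updates a disjoint set of column panels of C, so no two threads write the same element. The example uses an OpenMP "parallel for" and a hypothetical panel width EX_NR; the actual files may structure the parallel region differently. If OpenMP is not enabled at compile time, the pragma is ignored and the loop runs serially.

```c
/* Hypothetical panel width in columns. */
#define EX_NR 4

/* C += A * B, all column-major: A is m x k, B is k x n, C is m x n. */
static void ex_gemm_n_parallel(int m, int n, int k,
                               const double *a, const double *b, double *c)
{
    int j0;

    /* Each iteration owns columns [j0, j0 + EX_NR) of C. */
    #pragma omp parallel for
    for (j0 = 0; j0 < n; j0 += EX_NR) {
        int j_end = (j0 + EX_NR < n ? j0 + EX_NR : n);   /* edge case */
        int i, j, p;

        for (j = j0; j < j_end; ++j)
            for (p = 0; p < k; ++p)
                for (i = 0; i < m; ++i)
                    c[i + j * m] += a[i + p * m] * b[p + j * k];
    }
}
```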

commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
Merge: 67a8b949 6e7e4523
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 26 11:14:27 2013 -0500

Merge branch 'master' of https://code.google.com/p/blis

commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 26 11:12:37 2013 -0500

Added missing cpp kernel blocksize constraints.

Details:
- Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
constraints on the register blocksizes relative to the cache blocksizes.
Thanks to Tyler for helping me stumble across this issue.
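
Note (illustrative sketch): compile-time guards of this kind typically take the form of #if / #error blocks. The EX_ macro names, the values, and the specific divisibility constraints shown here are assumptions for the example; the commit does not list which constraints bli_kernel_macro_defs.h enforces.

```c
/* Hypothetical double-precision blocksizes. */
#define EX_MC 256
#define EX_NC 4096
#define EX_MR 8
#define EX_NR 4

/* Stop the build if a cache blocksize is not a whole multiple of the
   corresponding register blocksize. */
#if ( EX_MC % EX_MR != 0 )
  #error "EX_MC must be a whole multiple of EX_MR."
#endif

#if ( EX_NC % EX_NR != 0 )
  #error "EX_NC must be a whole multiple of EX_NR."
#endif
```

Failing at preprocessing time surfaces an inconsistent configuration before any kernel is ever run.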

commit 6e7e452343014e8f86640874dc1dbadca4a642a1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 22 14:50:57 2013 -0500

Fixed minor warnings and misc issues.

Details:
- Fixed various warnings output by gcc 4.6.3-1, including removing some
set-but-not-used variables and addressing some instances of typecasting
of pointer types to integer types of different sizes.

commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 22 12:54:32 2013 -0500

Tightened some macros that detect datatypes.

Details:
- Modified the definitions of some macros, such as bli_is_real(), so that
the "special" bit is taken into account so that BLIS_INT is differentiated
from BLIS_FLOAT.
- Whitespace changes to bli_obj_macro_defs.h.
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
being used.

commit b33e2f4443b9043b554963320280ff7783773652
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 19 17:15:03 2013 -0500

CHANGELOG update (for 0.0.9).

0.0.9

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 18 18:04:34 2013 -0500

Added BLAS error checking to compatibility layer.

Details:
- Added frame/compat/check directory, which now houses companion _check()
routines for each of the BLAS wrappers in frame/compat. These _check()
routines are called from the compatibility wrappers and mimic the
error-checking present in the netlib BLAS.
- Edited bla_xerbla.c so that xerbla() translates the operation string to
uppercase before printing.
- Redefined util routines in frame/compat/f2c/util in terms of level0
macros.
- Added prototypes for util routines, f2c routines, lsame(), and xerbla().
- Commented out prototypes in test/test_*.c since Fortran integers are now
int64_t by default (and the prototypes that were present in the files
used int).
- Removed redundant include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
since blis.h was already being included.
- Other minor changes to code in frame/compat/f2c.

commit 4e80ad28c97273db3366428ec44020da7944964d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 18 17:53:31 2013 -0500

Added support for C99 complex types/arithmetic.

Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
complex arithmetic to the scalar-level macros in include/level0. This
includes a somewhat substantial reorganization and re-layering of much
of the existing machinery present in the level0 macros.
- Added new define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
commented-out by default, which optionally enables the use of built-in
C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed a macro definition from bli_param_macro_defs.h that was not being
used (bli_proj_dt_to_real_if_imag_eq0).
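
Note (illustrative sketch): one way to let scalar-level macros work with either representation is to select the typedefs and the macro bodies under a single switch. BLIS_ENABLE_C99_COMPLEX and dcomplex are named in this log; the ex_ macros below are hypothetical and much simpler than the actual level0 machinery.

```c
/* Commented out by default, as in the bli_config.h files. */
/* #define BLIS_ENABLE_C99_COMPLEX */

#ifdef BLIS_ENABLE_C99_COMPLEX

#include <complex.h>
typedef double complex dcomplex;

#define ex_zreal(x)        creal(x)
#define ex_zimag(x)        cimag(x)
#define ex_zsets(r, i, x)  ((x) = (r) + (i) * I)
#define ex_zscals(a, y)    ((y) *= (a))          /* y := a * y */

#else

typedef struct { double real; double imag; } dcomplex;

#define ex_zreal(x)        ((x).real)
#define ex_zimag(x)        ((x).imag)
#define ex_zsets(r, i, x)  do { (x).real = (r); (x).imag = (i); } while (0)
#define ex_zscals(a, y) \
    do { \
        double yr_ = (a).real * (y).real - (a).imag * (y).imag; \
        double yi_ = (a).imag * (y).real + (a).real * (y).imag; \
        (y).real = yr_; \
        (y).imag = yi_; \
    } while (0)                                  /* y := a * y */

#endif
```

Code written against the ex_ macros compiles unchanged whichever branch is active.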

commit 6072d7c848e837ba20d607f7b727438ada31bdcf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 17 12:27:45 2013 -0500

Fixed bugs in trsm, trmm macro-kernels.

Details:
- Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
- Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
incorrectly being adjusted upward by MR, instead of NR. The rl and ru
trmm macro-kernels were updated in a similar fashion.
- Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
diagoffb when recomputing k to skip a zero region below where the
diagonal intersects the right side of the block. The corresponding
trmm macro-kernel was also updated.
- Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR)
needed to be placed AFTER the block that recomputes k to skip the zero
region (if present). The other three trsm macro-kernels, as well as the
trmm macro-kernels, were updated in the same manner, for consistency.
- Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
being updated to skip a zero region to the left of where the diagonal
of A intersects the top edge of the block.
- Comment updates to all trsm and trmm macro-kernels.
- Comment updates to bli_packm_init.c.

commit 47410a48f9b91e94ce4c67633686ffd1f2ad0275
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 10 14:53:59 2013 -0500

Added f2c'ed Givens rotation wrappers.

Details:
- Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
along with other wrappers for which no BLIS implementation exists.
- Added f2c-generated codes for applicable datatype flavors of rot, rotg,
rotm, and rotmg operations.

commit e5f90f3a8dbe671104bcb9d8b4e3409de01805da
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 10 13:40:12 2013 -0500

Removed copynz defs from bli_kernel.h files.

Details:
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
configuration. (Meant to include this in previous commit.)

commit aec12d90f596e8c04b1ad178258a1cd38108f59d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 10 13:33:30 2013 -0500

Removed copynzv, copynzm and related codes.

Details:
- Removed copynzv and copynzm operation directories. These operations
implemented a variation of copyv/m that, in the case of real source
and complex destination operands, leaves the imaginary component
untouched (rather than setting it to zero). I realize now that the
special case(s) (e.g. gemm with real A and B but complex C) that I
thought required this operation actually can be handled more simply.
- Removed level0 scalar macros implementing copynzs, copynzjs.

commit b0a0a0f274a761788531b5d281cc3b411b7124ed
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 9 17:15:38 2013 -0500

Added handling of restrict, stdint.h for non-C99.

Details:
- Removed the include <stdint.h> from blis.h and inserted a cpp macro block
in bli_type_defs.h that includes <stdint.h> for C++ and C99, and otherwise
manually typedefs the types we need (which, for now, are unconditionally
int64_t and uint64_t).
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
- Added cpp macro block to bli_macro_defs.h that defines restrict as
nothing for C++ and non-C99.
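
Note (illustrative sketch): the two macro blocks described above might look roughly as follows. The language-detection conditions and the fallback typedefs are assumptions for the example (in particular, "long long" is only one plausible choice for a 64-bit type on a pre-C99 compiler); ex_axpy is a hypothetical function showing that code using restrict still compiles either way.

```c
/* Use <stdint.h> for C++ and C99; otherwise define the needed types by hand. */
#if defined(__cplusplus) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
  #include <stdint.h>
#else
  typedef long long          int64_t;    /* assumption: 64-bit long long */
  typedef unsigned long long uint64_t;
#endif

/* Define restrict as nothing for C++ and non-C99 compilers. */
#if defined(__cplusplus) || \
    !defined(__STDC_VERSION__) || (__STDC_VERSION__ < 199901L)
  #define restrict
#endif

/* y := y + alpha * x, with x and y promised not to overlap. */
static void ex_axpy(int64_t n, double alpha,
                    const double *restrict x, double *restrict y)
{
    int64_t i;
    for (i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```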

commit 4b7e7970f1af4a1ab121e07657e2b78b9fcd7671
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 8 15:20:34 2013 -0500

Migrated integer usage to stdint.h types.

Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Define f77_int as int64_t if a new configuration variable,
BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
of char.
- Updated bla_amax() wrappers so that the return type is defined directly
as f77_int, rather than letting the prototype-generating macro decide
the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
so I removed them. Also, changed the body of the wrapper so that a
gint_t is passed into abmaxv, which is THEN typecast to an f77_int
before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
doublecomplex types so that they use .real and .imag instead (now that
we are using scomplex and dcomplex).

commit 372501398564fdba3d5a3db86c30bc1039b185ff
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 8 11:24:18 2013 -0500

Added experimental bli_gemm_ker_var5().

Details:
- Added support for an experimental gemm macro-kernel that incrementally
packs one micro-panel of B at a time. This is useful for certain
special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.

commit 9915d667a79f23e3a2a2516247c560e9063a1646
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 7 13:28:39 2013 -0500

Defined "total" blocksize query functions.

Details:
- Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
the default blocksize plus blocksize extension (using the type or the type
of an object).
- Comment update in bli_packm_cxk.c.

commit 46d3d09d49aded1d9f1b468c83fce75e07d631dc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 27 13:19:56 2013 -0500

Consolidated lower/upper her[2]k blocked variants.

Details:
- Consolidated lower and upper blocked variants for herk and her2k, and
renamed the resulting variants, according to the same changes recently
made to trmm and trsm.
- Implemented support for four new subpartition types:
BLIS_SUBPART1T
BLIS_SUBPART1B
BLIS_SUBPART1L
BLIS_SUBPART1R
which correspond to "merged" partitions that include the middle "1"
partition as well as either the neighboring "0" or "2" partition. This is
used to clean up code in herk/her2k var2 that attempts to partition away
the strictly zero region above or below the diagonal of a matrix operand
that is being marched through diagonally.
- Added safeguards to herk macro-kernels that skip any leading or trailing
zero region in the panel of C that is passed in. This is now needed given
that herk/her2k var1 no longer partitions off this zero region before
calling the macro-kernel (via bli_her[2]k_int()).
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.

commit 02002ef6f3d2746665982793db36714bd69bccc9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 24 17:08:14 2013 -0500

Added row-storage optimizations for trmm, trsm.

Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
side case is now handled explicitly, rather than induced indirectly by
transposing and swapping strides on operands. This allows us to walk through
the output matrix with favorable access patterns no matter how it is stored,
for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
lower/upper distinction. Instead, we simply label the variants by which
dimension is partitioned and whether the variant marches forwards or
backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
(as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
blocksize extensions (if non-zero) were not being used to appropriately size
the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
whole multiples of MR AND NR. This is needed for the case of trsm_r where,
in order to reuse existing left-side gemmtrsm fused micro-kernels, the
packing of A (left-hand operand) and B (right-hand operand) is done with
NR and MR, respectively (instead of MR and NR).

commit d1e81ddc848ee47bc188735883d14582bdd0cabc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 13 11:14:21 2013 -0500

Minor generalizing tweaks to trmm blk var1, var2.

commit 0efb7974f104206ba3985276f2180a9b14fe9f9b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 12 16:40:04 2013 -0500

CHANGELOG update.

0.0.8

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 12 16:02:12 2013 -0500

Use separate CFLAGS for "kernels" directories.

Details:
- Added a new "special" directory type: any source code within directories
named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
compiler flags. This allows the developer to specify a separate set of
flags (e.g. optimization flags) for compiling kernels while maintaining a
standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
according to above changes.

commit 08475e7c7653ba598665071a617d10f0d8f763c2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 11 12:18:39 2013 -0500

Various level-3 optimizations for row storage.

Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
packing from a lower or upper-stored symmetric/Hermitian matrix to column
panels (which are row-stored). Previously one could only pack to row panels
(which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
favorable access through row-stored matrices for gemm, hemm, herk, her2k,
symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
execution datatypes.

commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 7 11:04:10 2013 -0500

Added beta == 0 optimization to x86_64 ukernel.

Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
schemes?" switch is disabled, which would result in redundant tests being
executed for matrix-only (e.g. level-1m, level-3) operations if multiple
vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.
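
The beta == 0 idea can be illustrated in portable C. This is only a sketch of the technique, not the x86_64 micro-kernel itself; update_tile and its parameters are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-style tile update: C := beta*C + alpha*AB, where
   ab holds an already-computed m x n product in column-major order. */
static void update_tile(size_t m, size_t n, double alpha, const double *ab,
                        double beta, double *c, size_t rs_c, size_t cs_c)
{
    assert(ab != NULL && c != NULL);

    if (beta == 0.0) {
        /* C is overwritten and never loaded from memory. */
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < m; ++i)
                c[i * rs_c + j * cs_c] = alpha * ab[i + j * m];
    } else {
        /* General case: read C, scale it by beta, then accumulate. */
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < m; ++i) {
                double *cij = &c[i * rs_c + j * cs_c];
                *cij = beta * (*cij) + alpha * ab[i + j * m];
            }
    }
}
```

Besides saving the loads, overwriting means that whatever C held beforehand (including uninitialized data) cannot leak into the result.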

commit f1aa6b81cc421516dd77dd0f18f7c432724e6ef2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 6 13:36:06 2013 -0500

Whitespace changes to old test drivers.

Details:
- Replaced tabs with four spaces in places where indentation was already
in place.

commit 9feb4c23d2e36f3d8b5417a3802c69f94b29f749
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 4 14:57:46 2013 -0500

Fixed unaligned handling in axpyf, dotxaxpyf.

Details:
- Fixed over-cautious handling of unaligned operands in vector intrinsic
implementation of axpyf kernel.
- Fixed over- and under-cautious handling of unaligned operands in vector
intrinsic implementation of dotxaxpyf kernel.

commit 22b06cfcd2e3205c8325a246c2279e4b1047c066
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 3 16:54:52 2013 -0500

Updated level-1/-1f [vector intrinsic] kernels.

Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
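
A sketch of what an alignment-offset helper might look like, and how a vector kernel could use it to decide how many leading elements to handle with scalar code. The function names here are hypothetical and this is not the definition of bli_offset_from_alignment():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of p past the previous align-byte
   boundary (0 means p is already aligned). */
static size_t offset_from_alignment(const void *p, size_t align)
{
    assert(align > 0);
    return (size_t)((uintptr_t)p % align);
}

/* Number of leading doubles to process before x reaches an align-byte
   boundary. Assumes x is at least sizeof(double)-aligned and that align
   is a multiple of sizeof(double). */
static size_t elems_until_aligned(const double *x, size_t align)
{
    size_t off = offset_from_alignment(x, align);
    return off == 0 ? 0 : (align - off) / sizeof(double);
}
```

A kernel could run a scalar loop over the first elems_until_aligned(x, 16) elements and then switch to aligned vector loads, or fall back to a reference implementation when the operands cannot be aligned together.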

commit 0288c827d3659bb225ac9c10f168b623ed0106a2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 1 08:02:23 2013 -0500

Updated ukernels for x86_64.

Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
recently).

commit 85a6d1c9a52c2b27c71a3a3e341c51d7ba263749
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 6 11:05:08 2013 -0500

Replaced axpys usage with subs in trsv.

Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
sizeof(dcomplex).

commit 2d9c667f3c48a12cab64e5ad09d5fcb9f4c19d78
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 24 16:28:10 2013 -0500

Fixed x86_64 kernel bugs and other minor issues.

Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
unaligned subpartitions. We were already going out of our way a bit to
handle edge cases in the first iteration for blocked variants, and this
was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
into account how the choice of variant needed to be altered for
upper-stored matrices (given that only lower-stored algorithms are
explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
macros to provide inlined versions of bli_determine_blocksize_[fb]() for
use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
consistency with that of the bugfix for trmv/trsv (both of which now
use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
vector of length 1 (ie: 1 x 1). This fixes a bug whereby, under certain
conditions (e.g. dotv_opt_var1), the code expected an increment of 1 (for
purposes of performing contiguous vector loads) but got a value greater
than 1 because the column stride of the object (e.g. rho) was inflated for
alignment purposes (albeit unnecessarily, since there is only one element
in the object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
internal back-ends to correctly handle cases where output operand still
needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
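
The zero-dimension case in the last item can be seen in a plain reference gemm. The sketch below is illustrative only (gemm_ref is a hypothetical name, with column-major operands and no leading-dimension padding); it shows why k = 0 must not trigger an early return:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference gemm: C := beta*C + alpha*A*B, with A (m x k),
   B (k x n) and C (m x n) stored column-major without padding. */
static void gemm_ref(size_t m, size_t n, size_t k, double alpha,
                     const double *a, const double *b,
                     double beta, double *c)
{
    /* Only an empty C allows returning early; k == 0 does not. */
    if (m == 0 || n == 0)
        return;
    assert(c != NULL);

    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)   /* skipped when k == 0 */
                acc += a[i + p * m] * b[p + j * k];
            c[i + j * m] = beta * c[i + j * m] + alpha * acc;
        }
}
```

With k = 0 the product term is empty, so the call reduces to C := beta*C; A and B are never dereferenced.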

commit d57ec42b34f8447c88adeffa95cf22f8c115ad51
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 3 17:35:32 2013 -0500

Renamed _trans_status() macro.

Details:
- Mistakenly forgot to rename the _trans_status() macro and instances in
previous commit.

commit 9e2b227866af429a4a6fb7dbb8c457bbdda2f136
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 3 17:24:58 2013 -0500

Renamed _set_trans(), _trans_status() macros.

Details:
- Renamed the following macros:
bli_obj_set_trans() -> bli_obj_set_onlytrans()
bli_obj_trans_status() -> bli_obj_onlytrans_status()
to remove ambiguity as to which bits are read/updated.

commit 2f8174509ea9f844db11ebd9389de5168e85b132
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 1 15:06:30 2013 -0500

Unconditionally check memory pool(s) for errors.

Details:
- Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
memory pool is exhausted before checking out and returning a block, even
if BLIS error checking has been disabled. These errors are useful because
they likely indicate that BLIS was improperly configured for the code
being run.

commit 75405a2b83679b6aff38d7e7425199d623a7b0a9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 1 15:00:30 2013 -0500

CHANGELOG update.

0.0.7

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 30 19:35:54 2013 -0500

Absorbed blocksize extensions into main objects.

Details:
- Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
happens if and only if cache blocksizes are created with non-zero
extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
blocksize extension mechanism is now available for use.
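
One way to picture the edge-case mechanism for a forward-marching variant is sketched below. The semantics are an assumption made for illustration (absorb a small remainder into the final block when it fits within the extension); blocksize_fwd is a hypothetical name, not the actual bli_determine_blocksize_f():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: choose the blocksize for the iteration starting at
   index i of a dimension of size dim. b_alg is the nominal cache blocksize
   and b_ext its extension. With b_ext == 0 this is min(dim - i, b_alg). */
static size_t blocksize_fwd(size_t i, size_t dim, size_t b_alg, size_t b_ext)
{
    assert(i <= dim && b_alg > 0);
    size_t left = dim - i;

    /* If everything that remains fits in an extended block, take it all
       now instead of leaving a tiny edge-case iteration. */
    if (left <= b_alg + b_ext)
        return left;
    return b_alg;
}
```

With dim = 1030, b_alg = 256 and b_ext = 8, the last iteration starts at i = 768 and processes 262 rows at once; with b_ext = 0 the same problem ends with a 256-row block followed by a 6-row edge case.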

commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 25 17:16:59 2013 -0500

Added option to disable err checking in testsuite.

Details:
- Added a new line to input.general that allows one to specify the error-
checking level to use for each BLIS experiment. The only two levels
supported for now are "no error checking" and "full error checking".

commit 096b366ddcfe386f44419ef84d8df8be13825f86
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 25 16:43:43 2013 -0500

Use cntl trees that block in n dimension.

Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
algorithms that first partition in the n dimension with a blocksize
of NC. Typically this is not an issue since only very large problems
exceed that of NC. But developers often run very large problems, and
so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
bli_param_macro_defs.h.
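
The outermost n-dimension blocking can be sketched as a loop over column panels of width at most NC. This is a simplified illustration using plain loops rather than control trees; gemm_panel and gemm_blocked_n are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inner routine: C += A*B on column-major operands. */
static void gemm_panel(size_t m, size_t n, size_t k,
                       const double *a, size_t lda,
                       const double *b, size_t ldb,
                       double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t p = 0; p < k; ++p)
            for (size_t i = 0; i < m; ++i)
                c[i + j * ldc] += a[i + p * lda] * b[p + j * ldb];
}

/* Partition the n dimension first, in steps of nc, so that each call
   below sees at most nc columns of B and C. */
static void gemm_blocked_n(size_t m, size_t n, size_t k,
                           const double *a, const double *b, double *c,
                           size_t nc)
{
    assert(nc > 0);
    for (size_t jc = 0; jc < n; jc += nc) {
        size_t nb = (n - jc < nc) ? n - jc : nc;   /* edge case */
        gemm_panel(m, nb, k, a, m, b + jc * k, k, c + jc * m, m);
    }
}
```

When n does not exceed nc the loop runs once and the extra level of blocking costs essentially nothing.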

commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 25 12:06:12 2013 -0500

Use PASTEMAC in macro-kernels (over MAC2 or MAC3).

Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.

commit df80acf517dde180ddcc5835c6136b2fa7556d4b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 19:43:23 2013 -0500

Fixed computation of b_next in L3 macro-kernels.

Details:
- Restructured herk_l and herk_u macro-kernels in the image of trmm
and trsm, in that the edge cases are captured by the main loop, rather
than trying to have "cleanup" sections that result in four distinct
parts (interior, bottom edge, right edge, bottom-right edge) of the
code.
- Fixed the way b_next was being computed in the non-gemm level-3
macro-kernels (herk, trmm, trsm). The way they are computed now matches
that of gemm.

commit 3671528cf8efe4b445d196665143a5c50c2c6048
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 19:12:14 2013 -0500

Fixed minor bug in computing b_next in gemm.

commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 17:49:10 2013 -0500

Fixed rare edge case bug in herk_l macro-kernel.

Details:
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
chosen to be much larger than NR, then one could encounter edge cases
in the MC dimension that fall entirely below the diagonal, which
the previous implementation of the herk_l macro-kernel was not allowing
for.

commit 1dab11e37d1cb403cbe75b73a644c00de534f104
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 17:17:11 2013 -0500

Updated x86 gemmtrsm ukernels to use alpha.

commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 16:00:18 2013 -0500

Added a_next, b_next arguments to micro-kernels.

Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
addresses of the next micro-panels of A and B. By passing these
pointers into the micro-kernel, we allow the micro-kernel author to
prefetch micro-panels of A and B as necessary (though this is
completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
a_next and b_next. Note that ONLY the gemm macro-kernel computes
a_next and b_next with the precise semantics we want. I will go back
and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
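
A sketch of how a micro-kernel author might use the two extra pointers. The signature, the name gemm_ukr_sketch, and the 4x4 register blocksizes are assumptions for illustration, not the actual BLIS micro-kernel interface; __builtin_prefetch is a GCC/Clang builtin and is compiled out elsewhere:

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 4, NR = 4 };   /* illustrative register blocksizes */

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)(p))
#endif

/* Hypothetical micro-kernel: C += A*B for one MR x NR tile. a is a packed
   MR x k micro-panel (column-stored), b a packed k x NR micro-panel
   (row-stored). a_next and b_next are only hints and may be ignored. */
static void gemm_ukr_sketch(size_t k,
                            const double *restrict a,
                            const double *restrict b,
                            double *restrict c, size_t rs_c, size_t cs_c,
                            const double *a_next, const double *b_next)
{
    double ab[MR * NR] = { 0.0 };
    assert(c != NULL);

    /* Start pulling in the micro-panels the next call will use. */
    PREFETCH(a_next);
    PREFETCH(b_next);

    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[i + j * MR] += a[i + p * MR] * b[j + p * NR];

    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            c[i * rs_c + j * cs_c] += ab[i + j * MR];
}
```

Because the hints never affect the arithmetic, a kernel that ignores them produces the same result.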

commit f3815dc84d385c514a5acaf1e925424a57be2f51
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 23 11:12:33 2013 -0500

Added code for backward edge-case blocking.

Details:
- Edited bli_determine_blocksize_b() to include experimental (and
currently disabled) code that computes extended blocks.
- Updated comments related to above changes.
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.

commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 22 19:00:43 2013 -0500

Updated dupl implementation to use PACKNR and NR.

Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
explicitly to navigate b1 so that situations where PACKNR > NR are
supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.

commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Apr 21 15:10:34 2013 -0500

Disabled blocksize checks for memory pools.

Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
by the contiguous memory allocator for all types, given that the values for
double precision real are the ones used to allocate the space. These checks
can easily go awry in certain situations, especially if you are developing for
only one datatype. So for now, they are probably more trouble than they are
worth.

commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Apr 21 15:00:24 2013 -0500

Allow ldim of packed micro-panels != MR, NR.

Details:
- Made substantial changes throughout the framework to decouple the leading
dimension (row or column stride) used within each packed micro-panel from
the corresponding register blocksize. It appears advantageous on some
systems to use, for example, packed micro-panels of A where the column
stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
- Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
to use when packing micro-panels of A and B.
- Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
where appropriate, instead of MR and NR.
- Added pd field (panel dimension) to obj_t.
- New interface to bli_packm_cntl_obj_create().
- Renamed bli_obj_packed_length()/_width() macros to
bli_obj_padded_length()/_width().
- Removed local defines for cache/register blocksizes in level-3 *_cntl.c.
- Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
blocksize for edge cases, which can improve performance at the margins.
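
The decoupling of the packed leading dimension from the register blocksize can be sketched as follows. pack_micropanel is a hypothetical routine, and zero-filling the padding rows is a choice made for this example rather than something the commit states:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packing routine: copy an m x k slice of A (general row and
   column strides) into a micro-panel whose column stride is packmr, where
   m <= mr <= packmr. Rows m..packmr-1 of each column are zero-filled. */
static void pack_micropanel(size_t m, size_t k, size_t mr, size_t packmr,
                            const double *a, size_t rs_a, size_t cs_a,
                            double *p)
{
    assert(m <= mr && mr <= packmr);
    for (size_t j = 0; j < k; ++j) {
        for (size_t i = 0; i < m; ++i)
            p[i + j * packmr] = a[i * rs_a + j * cs_a];
        for (size_t i = m; i < packmr; ++i)
            p[i + j * packmr] = 0.0;
    }
}
```

A micro-kernel consuming such a panel would step through it with a stride of packmr per column while still computing on only mr rows.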

commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Apr 19 15:26:29 2013 -0500

Fixed bug in compatibility layer (her2k/syr2k).

Details:
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
interface caller requests the [conjugate-]transpose case. Thanks to Bryan
Marker for reporting the behavior that led to this bug.

commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 18 19:39:13 2013 -0500

Changed old level3 test drivers to call front-ends.

Details:
- Changed old level-3 test drivers, in 'test' directory, to always call the
front-end object API instead of the internal back-end with the locally
defined control tree.

commit 83e45de23e565138b8fde06fb11cfedc973b7246
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 18 18:33:03 2013 -0500

Allow packm_init() to reacquire a too-small mem_t.

Details:
- Changed bli_packm_init() to react differently to a situation where a pack
obj_t has an already-allocated mem_t entry that has a buffer that is smaller
than what will be needed to hold the block/panel that now needs to be
packed. Previously, this situation was treated with an abort() since I
assumed something was horribly wrong. I have changed the code so that it now
reacts by releasing the previous mem_t and re-acquires a new mem_t with the
new information. (This change was done at the request of Bryan Marker to
facilitate code generation via DxT.)
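
The release-and-reacquire behavior can be sketched with a small buffer wrapper. This example substitutes malloc()/free() for the BLIS memory pool, and pack_mem_t and ensure_capacity are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mem_t entry: a buffer and its size. */
typedef struct {
    void  *buf;
    size_t size;
} pack_mem_t;

/* Make sure mem can hold needed bytes. An existing buffer that is large
   enough is reused; one that is too small is released and replaced rather
   than treated as a fatal error. Returns 0 on success, -1 on failure. */
static int ensure_capacity(pack_mem_t *mem, size_t needed)
{
    assert(mem != NULL && needed > 0);

    if (mem->buf != NULL && mem->size >= needed)
        return 0;                     /* reuse the existing buffer */

    free(mem->buf);                   /* release the too-small buffer */
    mem->buf  = malloc(needed);       /* reacquire with the new size */
    mem->size = (mem->buf != NULL) ? needed : 0;
    return (mem->buf != NULL) ? 0 : -1;
}
```

A caller would invoke ensure_capacity() before every pack, so a later, larger block simply triggers a fresh allocation.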

commit a6990434173b0cf651f8521194f3aef738deb7d2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 18 13:52:47 2013 -0500

Fixed bug in packing block of A for hemm/symm.

Details:
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
symmetric matrix where the block of A being packed intersects the diagonal,
but some of its micro-panels do not intersect the diagonal and lie completely
in the unstored region. Thanks to Francisco Igual for reporting this bug.
- Comment updates to both _blk_var2.c and _blk_var3.c.
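
The situation behind this fix can be illustrated with a simplified packing loop for a real symmetric matrix stored in its lower triangle. This is a sketch of the general technique (reflecting reads across the diagonal), not the code in bli_packm_blk_var2(); both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j) of a real symmetric matrix whose lower triangle is stored
   column-major with leading dimension lda. References to the unstored
   upper triangle are reflected across the diagonal. */
static double symm_get_lower(const double *a, size_t lda, size_t i, size_t j)
{
    return (i >= j) ? a[i + j * lda] : a[j + i * lda];
}

/* Pack the m x n block starting at (i0, j0) into p, column-major. The
   block may straddle the diagonal or lie entirely in the unstored region;
   either way each element comes from the stored triangle. */
static void pack_symm_block(const double *a, size_t lda,
                            size_t i0, size_t j0, size_t m, size_t n,
                            double *p)
{
    assert(a != NULL && p != NULL);
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            p[i + j * m] = symm_get_lower(a, lda, i0 + i, j0 + j);
}
```

A Hermitian matrix would additionally need the reflected elements conjugated, which this real-valued sketch omits.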

commit c92e7590e1934f830814ab614c794215ebe0c415
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 17 20:53:29 2013 -0500

Activated bli_packm_acquire_mpart_t2b().

Details:
- Removed the overly-paranoid bli_abort() from the end of
bli_packm_acquire_mpart_t2b(), to allow others to experiment with
partitioning through packed blocks of A. Also, and more importantly,
changed an earlier check that was causing an erroneous (but
coincidentally redundant) abort(). Also, updated some of the comments
in bli_packm_part.c.

commit bea579e9f009a44e08008eb14d09f38748ab2b53
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 16 19:43:14 2013 -0500

Allow creation of "empty" objects.

Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.

commit 7904e20f2e6908571ee5008da2a08084198eefae
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 16 17:37:16 2013 -0500

Fixed "root" object bug in bli_her[2]k/syr[2]k.

Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
that manifested as the incorrect triangle being updated. It occurred when
the user would pass in a matrix object that was correctly marked as
symmetric/Hermitian and lower-stored, but whose root object was never marked
as lower (or upper). We now alias and re-assign root status for matrix C
within the front-ends. Note that trmm and trsm were already doing this,
albeit for a slightly different reason (to allow the internal back-end to
choose which algorithm to run--lower or upper--based on the uplo of the root
object for both left and right side cases). Thanks to Bryan Marker for
leading me to this bug.

commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 16 11:24:03 2013 -0500

Fixed overzealous type-checking in bli_getsc().

Details:
- Relaxed type checking in getsc so that the input object could be a constant
and not just a proper floating-point type. (If it is a constant, default to
extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.

commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 19:27:57 2013 -0500

Fixed bug in bli_obj_is_packed() and renamed.

Details:
- This macro is used to determine whether the partitioning routines should
call a corresponding packm_part routine instead. However, it was
unintentionally catching matrices that were marked as "packed" by virtue
of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
The macro has now been renamed to bli_obj_is_panel_packed(), and now only
checks for row or column panel packing. (Note that I first attempted to
fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
erroneous behavior that led me to this bug.

commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 17:54:43 2013 -0500

Removed local reference ukernel blocksize macros.

Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
reference microkernel definition and header. Meant to include this in
a recent/previous commit (0020ef7c8271).

commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 14:40:31 2013 -0500

Formatting change to mods in previous commit.

commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 14:31:40 2013 -0500

Set structure of objects in level-2 BLIS APIs.

Details:
- Added missing statement to set structure field of local objects in
top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
reporting this bug.

commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 10:21:26 2013 -0500

Tweak to test suite function string construction.

Details:
- Fixed a minor bug in the way that the test suite would construct function
name strings when the user anchored all parameters in input.operations.
In this case, the test driver would mistake this situation for one where
the operation simply had no parameters to begin with, and thus would not
include the parameter string in the function string that is output for
every result.

commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 15 09:59:46 2013 -0500

Fixed a bug in reference implementation of dupl.

Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
creation interface.
- Added 'restrict' to x86 gemm microkernel interface.

commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Apr 14 19:05:33 2013 -0500

Modified bli_kernel.h include order in blis.h.

Details:
- Delayed include of bli_kernel.h in blis.h to prevent a situation where
_kernel.h includes an optimized microkernel header, which uses BLIS types
such as dim_t and inc_t, which would precede the definition of those types
in bli_type_defs.h.
- Moved the include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
(immediately after that of bli_kernel.h).

commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 13 16:53:16 2013 -0500

CHANGELOG update.
