Blis

Latest version: v0.9.1

0.6.0

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 3 18:37:19 2019 -0500

Version file update (0.6.0)

commit 0f1b3bf49eb593ca7bb08b68a7209f7cd550f912
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 3 18:35:19 2019 -0500

ReleaseNotes.md update in advance of next version.

Details:
- Updated ReleaseNotes.md in preparation for next version.
- CREDITS file update.

commit 27da2e8400d900855da0d834b5417d7e83f21de1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 3 17:14:56 2019 -0500

Minor edits to docs/PerformanceSmall.md.

Details:
- Added performance analysis to "Comments" section of both Kaby Lake and
Epyc sections.
- Added emphasis to certain passages.

commit 09ba05c6f87efbaadf085497dc137845f16ee9c5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 3 16:53:19 2019 -0500

Added sup performance graphs/document to 'docs'.

Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
publishes new performance graphs for Kaby Lake and Epyc showcasing
the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
document.

commit 6bf449cc6941734748034de0e9af22b75f1d6ba1
Merge: abd8a9fa a4e8801d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 31 17:42:40 2019 -0500

Merge branch 'amd'

commit a4e8801d08d81fa42ebea6a05a990de8dcedc803
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 31 17:30:51 2019 -0500

Increased MT sup threshold for double to 201.

Details:
- Fine-tuned the double-precision real MT threshold (which controls
whether the sup implementation kicks in for smaller m dimension values)
from 180 to 201 for haswell and 180 to 256 for zen.
- Updated octave scripts in test/sup/octave to include a seventh column
to display performance for m = n = k.
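
The threshold test that these values feed into can be sketched as follows. This is a minimal illustration and not BLIS code: the struct and function names are hypothetical, and only the haswell m threshold of 201 is taken from the entry above. The strictly-less-than comparison and the default threshold of zero follow the sup bugfix entry elsewhere in this log.

```c
#include <stdbool.h>

/* Hypothetical container for the three dimension thresholds. */
typedef struct {
    long mt; /* threshold for the m dimension */
    long nt; /* threshold for the n dimension */
    long kt; /* threshold for the k dimension */
} demo_sup_thresh_t;

/* Returns true if the sup path would take the problem: at least one
 * dimension must be strictly less than its threshold. With all
 * thresholds at zero, no problem (not even m = n = k = 0) qualifies. */
bool demo_sup_accepts(long m, long n, long k, demo_sup_thresh_t t)
{
    return m < t.mt || n < t.nt || k < t.kt;
}
```

With an m threshold of 201 and the other thresholds left at zero, a problem with m = 200 is accepted while m = 201 falls through to the conventional implementation.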

commit 3a45ecb15456249c30ccccd60e42152f355615c1
Merge: 3f867c96 b69fb0b7
Author: Kiran Devrajegowda <Kiran.Devrajegowdaamd.com>
Date: Fri May 31 06:47:02 2019 -0400

Merge "Added back the BLIS_ENABLE_ZEN_BLOCK_SIZES macro to the zen configuration, matching release 1.3. The macro was originally added to improve multithreaded DGEMM scalability on Naples when the number of threads is greater than 16. It was deleted by mistake among the many changes made for the 2.0 release, so this change adds it back. Also includes code cleanup in bli_gemm_front.c." into amd-staging-rome2.0

commit b69fb0b74a4756168de270fc9b18f7cf7aa57f17
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Fri May 31 15:14:22 2019 +0530

Added back the BLIS_ENABLE_ZEN_BLOCK_SIZES macro to the zen configuration, matching release 1.3. The macro was originally added to improve multithreaded DGEMM scalability on Naples when the number of threads is greater than 16. It was deleted by mistake among the many changes made for the 2.0 release, so this change adds it back. Also includes code cleanup in bli_gemm_front.c.

Change-Id: I9f5d8225254676a99c6f2b09a0825e545206d0fc

commit 3f867c96caea3bbbbeeff1995d90f6cf8c9895fb
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Fri May 31 12:22:44 2019 +0530

When running HPL with pure MPI and without DGEMM threading (single-threaded BLIS), setting this macro to 1 gives the best performance.

Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d

commit abd8a9fa7df4569aa2711964c19888b8e248901f (origin/pfhp)
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 28 12:49:44 2019 -0500

Inadvertently hidden xerbla_() in blastest (313).

Details:
- Attempted a fix to issue 313, which reports that when building only
a shared library (ie: static library build is disabled), running the
BLAS test drivers can fail because those drivers provide their own
local version of xerbla_() as a clever (albeit still rather hackish)
way of checking the error codes that result from the individual tests.
This local xerbla_() function is never found at link-time because the
BLAS test drivers' Makefile imports BLIS compilation flags via the
get-user-cflags-for() function, which currently conveys the
-fvisibility=hidden flag, which hides symbols unless they are
explicitly annotated for export. The -fvisibility=hidden flag was
only ever intended for use when building BLIS (not for applications),
and so the attempted solution here is to omit the symbol export
flag(s) from get-user-cflags-for() by storing the symbol export
flag(s) to a new BUILD_SYMFLAGS variable instead of appending it
to the subconfigurations' CMISCFLAGS variable (which is returned by
every get-*-cflags-for() function). Thanks to M. Zhou for reporting
this issue and also to Isuru Fernando for suggesting the fix.
- Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
created BUILD_SYMFLAGS.
- Fixed typo in entry for --export-shared flag in 'configure --help'
text.
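
As a rough illustration of why -fvisibility=hidden matters here, the sketch below shows the usual GCC/Clang annotation pattern. The DEMO_EXPORT macro and both function names are hypothetical and are not part of BLIS; the point is only that, under -fvisibility=hidden, an unannotated function (such as a test driver's local xerbla_()) is not exported unless it is explicitly marked.

```c
/* Hypothetical export macro; real projects define something similar. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
#define DEMO_EXPORT
#endif

/* Explicitly annotated: remains an exported symbol even when the
 * file is compiled with -fvisibility=hidden. */
DEMO_EXPORT int demo_annotated(void)
{
    return 42;
}

/* Not annotated: compiled with -fvisibility=hidden, this symbol is
 * hidden from other shared objects, although calls from within the
 * same executable or library still work. */
int demo_unannotated(void)
{
    return 7;
}
```

Both functions remain callable from the same program; the difference only shows up when another shared object tries to resolve the symbol at link or load time.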

commit 13806ba3b01ca0dd341f4720fb930f97e46710b0
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Mon May 27 16:24:43 2019 +0530

This check-in updates the copyright information, which now reads (start year) - 2019.

Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041

commit ee123f535872510f77100d3d55a43d4ca56047d5
Author: Meghana <meghana.vankadariamd.com>
Date: Mon May 27 15:36:44 2019 +0530

Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead added threshold conditions in zen kernels both for ROME and NAPLES

Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5

commit 9d93a4caa21402d3a90aac45d7a1603736c9fd63
Author: prangana <pradeep.raoamd.com>
Date: Fri May 24 17:59:13 2019 +0530

update version 2.0

commit 755730608d923538273a90c48bfdf77571f86519
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 23 17:34:36 2019 -0500

Minor rewording of language around mt env. vars.

commit ba31abe73c97c16c78fffc59a215761b8d9fd1f6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 23 14:59:53 2019 -0500

Added BLIS threading info to Performance.md.

Details:
- Documented the BLIS environment variables that were set
(e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
threading configuration in order to achieve the parallelism reported
on in docs/Performance.md.
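
To make the role of these variables concrete, here is a small sketch of how per-loop ways of parallelism combine into a total. It is not BLIS's own parser: the function names are hypothetical, and treating an unset or invalid variable as 1 is an assumption of this sketch. Only the three variable names come from the entry above.

```c
#include <stdlib.h>

/* Parse one "ways of parallelism" value. A missing, non-numeric, or
 * non-positive value falls back to 1 (an assumption of this sketch). */
long demo_parse_ways(const char *s)
{
    if (s == NULL) return 1;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    return (end != s && v > 0) ? v : 1;
}

/* Multiply the ways requested for the three loops named in the log
 * entry to get the implied degree of parallelism. */
long demo_total_ways(void)
{
    return demo_parse_ways(getenv("BLIS_JC_NT"))
         * demo_parse_ways(getenv("BLIS_IC_NT"))
         * demo_parse_ways(getenv("BLIS_JR_NT"));
}
```

For example, setting BLIS_JC_NT=2 and BLIS_IC_NT=3 while leaving BLIS_JR_NT unset makes demo_total_ways() return 6.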

commit cb788ffc89cac03b44803620412a5e83450ca949
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 23 13:00:53 2019 -0500

Increased MT sup threshold for double to 180.

Details:
- Increased the double-precision real MT threshold (which controls
whether the sup implementation kicks in for smaller m dimension values)
from 80 to 180, and this change was made for both haswell and zen
subconfigurations. This is less about the m dimension in particular
and more about facilitating a smoother performance transition when
m = n = k.

commit 057f5f3d211e7513f457ee6ca6c9555d00ad1e57
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 23 12:51:17 2019 -0500

Minor build system housekeeping.

Details:
- Commented out redundant setting of LIBBLIS_LINK within all driver-
level Makefiles. This variable is already set within common.mk, and
so the only time it should be overridden is if the user wants to link
to a different copy of libblis.
- Very minor changes to build/gen-make-frags/gen-make-frag.sh.
- Whitespace and inconsequential quoting change to configure.
- Moved top-level 'windows' directory into a new 'attic' directory.

commit e05171118c377f356f89c4daf8a0d5ddc5a4e4f7
Author: Meghana <meghana.vankadariamd.com>
Date: Thu May 23 16:15:27 2019 +0530

Implemented TRSM for small matrices for cases where A is on the right

Added separate kernels for zen and zen2

Change-Id: I6318ddc250cf82516c1aa4732718a35eae0c9134

commit 02920f5c480c42706b487e37b5ecc96c3555b851
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Thu May 23 15:29:59 2019 +0530

make checkblis fails for the matrix dimension check at the beginning, hence reverting it

Change-Id: Ibd2ee8c2d4914598b72003fbfc5845be9c9c1e87

commit 84215022f29fb3bfedd254d041635308d177e6c0
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Thu May 23 11:08:41 2019 +0530

Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration

Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5

commit a3554eb1dcc1b5b94d81c60761b2f01c3d827ffa
Merge: ea082f83 17b878b6
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Thu May 23 11:51:07 2019 +0530

Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis to configure zen2

Change-Id: I97e17bca9716b80b862925f97bb513c07b4b0cae

commit ea082f839071dd9ec555062dc3851c31d12f00e4
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Thu May 23 10:38:29 2019 +0530

adding empty zen2 directory with .gitignore file

Change-Id: Ifa37cf54b2578aa19ad335372b44bca17043fe4b

commit b80bd5bcb2be8551a9a21fafc8e6c8b6336c99b5
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Tue May 21 15:11:47 2019 +0530

config/zen/bli_cntx_init_zen.c: removed the BLIS_ENABLE_ZEN_BLOCK_SIZES macro. We have different configurations for zen and zen2.
config/zen/bli_family_zen.h: deleted the macro BLIS_ENABLE_ZEN_BLOCK_SIZES.
config/zen/make_defs.mk: removed the compiler flag -mno-avx256-split-unaligned-store.
frame/base/bli_cpuid.c: the ROME family is 17h, but its models start at 0x30.
test/test_gemm.c: commented out the define of FILE_IN_OUT (it caused a compilation error when BLIS is configured as amd64).
A single configuration can now be used, as in ./configure amd64; this works for both ROME and Naples.

Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
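
The family/model distinction mentioned above could be expressed roughly as below. This is a simplified sketch with a hypothetical function name: it uses only the two facts stated in the entry (family 17h, models from 0x30) and ignores the CPU feature checks that a real detection routine would also perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified check based only on the log entry: ROME reports family
 * 0x17 (shared with Naples) with model numbers starting at 0x30. */
bool demo_looks_like_rome(uint32_t family, uint32_t model)
{
    return family == 0x17 && model >= 0x30;
}
```

Under this sketch, a family 0x17 part with a model number below 0x30 is not classified as ROME and would be left to the existing zen detection.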

commit a042db011df9a1c3e7c7ac546541f4746b176ea5
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Mon May 20 14:17:32 2019 +0530

Modified make_defs.mk for zen2 to get compiled by gcc version less than gcc9.0

Change-Id: I8fcac30538ee39534c296932639053b47b9a2d43

commit a23f92594cf3d530e5794307fe97afc877d853b7
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Mon May 20 10:48:06 2019 +0530

config_registry: New AMD zen2 architecture configuration added.
frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20

Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866

commit 17b878b66d917d50b6fe23721d8579e826cb3e8c
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Wed May 22 14:02:53 2019 +0530

adding license same as in ut-austin-amd-branch

Change-Id: I6790768d2bf5d42369d304ef93e34701f95fbaff

commit df755848b8a271323e007c7a628c64af63deab00
Merge: ca4b33c0 c72ae27a
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Wed May 22 13:30:07 2019 +0530

Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis into rome2.0

Change-Id: Ie8aad1ab810f0f3c0b90ec67f9dd3dfb8dcc74cc

commit c72ae27adee4726679ee004d02c972582b5285b4
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Mar 19 12:49:26 2018 +0530

Re-enabling the small matrix gemm optimization for target zen

Change-Id: I13872784586984634d728cd99a00f71c3f904395

commit ab0818af80f7f683080873f3fa24734b65267df2
Author: sraut <Biplab.Rautamd.com>
Date: Wed Oct 3 15:30:33 2018 +0530

Review comments incorporated for small TRSM.

Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9

commit 32392cfc72af7f42da817a129748349fb1951346
Author: Jeff Hammond <jeff.r.hammondintel.com>
Date: Tue May 14 15:52:30 2019 -0400

add info about CXX in configure (311)

commit fa7e6b182b8365465ade178b0e4cd344ff6f6460
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 1 19:13:00 2019 -0500

Define _POSIX_C_SOURCE in bli_system.h.

Details:
- Added
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
to bli_system.h so that an application that uses BLIS (specifically,
an application that includes blis.h) does not need to remember to
define the macro itself (either on the command line or in the code
that includes blis.h) in order to activate things like the pthreads.
Thanks to Christos Psarras for reporting this issue and suggesting
this fix.
- Commented out #include <sys/time.h> in bli_system.h, since I don't
think this header is used/needed anymore.
- Comment update to function macro for bli_?normiv_unb_var1() in
frame/util/bli_util_unb_var1.c.
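
The effect of defining _POSIX_C_SOURCE before any system header can be seen in a small standalone example. This is only a sketch: the function name is hypothetical, and clock_gettime() is used here simply as one familiar POSIX interface whose declaration depends on the feature test macro when compiling in a strict ISO C mode.

```c
/* The guard must appear before the first system header is included. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <time.h>

/* Return a monotonic timestamp in seconds, or -1.0 on failure.
 * clock_gettime() and CLOCK_MONOTONIC are POSIX, not ISO C, so their
 * declarations rely on the feature test macro above in strict modes. */
double demo_monotonic_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return -1.0;
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}
```

Placing the guard inside the library's own system header, as the entry describes, spares the including application from having to do this itself.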

commit 3df84f1b5d5e1146bb01bfc466ac20c60a9cc859
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 27 21:27:32 2019 -0500

Minor bugfixes in sup dgemm implementation.

Details:
- Fixed an obscure bug in the bli_dgemmsup_rv_haswell_asm_5x8n() kernel
that only affected the beta == 0, column-storage output case. Thanks
to the BLAS test drivers for catching this bug.
- Previously, bli_gemmsup_ref_var1n() and _var2m() were returning if
k = 0, when the correct action would be to scale by beta (and then
return). Thanks to the BLAS test drivers for catching this bug.
- Changed the sup threshold behavior such that the sup implementation
only kicks in if a matrix dimension is strictly less than (rather than
less than or equal to) the threshold in question.
- Initialize all thresholds to zero (instead of 10) by default in
ref_kernels/bli_cntx_ref.c. This, combined with the above change to
threshold testing means that calls to BLIS or BLAS with one or more
matrix dimensions of zero will no longer trigger the sup
implementation.
- Added disabled debugging output to frame/3/bli_l3_sup.c (for future
use, perhaps).
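
The k = 0 behavior described above can be illustrated with a plain reference gemm that reads its operands in place through row and column strides. This is a sketch under stated assumptions, not the BLIS reference kernel: the function name and signature are hypothetical, and the beta == 0 branch follows the common BLAS convention that C is overwritten rather than multiplied by zero.

```c
#include <stddef.h>

/* C := beta * C + alpha * A * B for an m x k matrix A, a k x n matrix
 * B, and an m x n matrix C, each addressed via (row, column) strides. */
void demo_gemm_unpacked(size_t m, size_t n, size_t k,
                        double alpha,
                        const double *a, size_t rs_a, size_t cs_a,
                        const double *b, size_t rs_b, size_t cs_b,
                        double beta,
                        double *c, size_t rs_c, size_t cs_c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double ab = 0.0;
            /* When k == 0 this loop body never runs and ab stays 0,
             * so the update below reduces to scaling C by beta
             * instead of returning early and leaving C untouched. */
            for (size_t p = 0; p < k; ++p)
                ab += a[i * rs_a + p * cs_a] * b[p * rs_b + j * cs_b];

            double *cij = &c[i * rs_c + j * cs_c];
            /* beta == 0: overwrite C so stale contents are ignored. */
            *cij = (beta == 0.0) ? alpha * ab
                                 : beta * (*cij) + alpha * ab;
        }
    }
}
```

Calling it with k = 0 and beta = 2 on a 2x2 matrix C doubles every element of C, which is the behavior the fix restores.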

commit ecbdd1c42dcebfecd729fe351e6bb0076aba7d81
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 27 19:38:11 2019 -0500

Ceased use of BLIS_ENABLE_SUP_MR/NR_EXT macros.

Details:
- Removed already limited use of the BLIS_ENABLE_SUP_MR_EXT and
BLIS_ENABLE_SUP_NR_EXT macros in bli_gemmsup_ref_var1n() and
bli_gemmsup_ref_var2m(). Their purpose was merely to avoid a long
conditional that would determine whether to allow the last iteration
to be merged with the second-to-last iteration. Functionally, the
macros were not needed, and they ended up causing problems when
building configuration families such as intel64 and x86_64.

commit aa8a6bec3036a41e1bff2034f8ef6766a704ec49
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 27 18:53:33 2019 -0500

Fixed typo in --disable-sup-handling macro guard.

Details:
- Fixed an incorrectly-named macro guard that is intended to allow
disabling of the sup framework via the configure option
--disable-sup-handling. In this case, the preprocessor macro,
BLIS_DISABLE_SUP_HANDLING, was still named by its name from an older
uncommitted version of the code (BLIS_DISABLE_SM_HANDLING).

commit b9c9f03502c78a63cfcc21654b06e9089e2a3822
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 27 18:44:50 2019 -0500

Implemented gemm on skinny/unpacked matrices.

Details:
- Implemented a new sub-framework within BLIS to support the management
of code and kernels that specifically target matrix problems for which
at least one dimension is deemed to be small, which can result in long
and skinny matrix operands that are ill-suited for the conventional
level-3 implementations in BLIS. The new framework tackles the problem
in two ways. First the stripped-down algorithmic loops forgo the
packing that is famously performed in the classic code path. That is,
the computation is performed by a new family of kernels tailored
specifically for operating on the source matrices as-is (unpacked).
Second, these new kernels will typically (and in the case of haswell
and zen, do in fact) include separate assembly sub-kernels for
handling of edge cases, which helps smooth performance when performing
problems whose m and n dimension are not naturally multiples of the
register blocksizes. In a reference to the sub-framework's purpose of
supporting skinny/unpacked level-3 operations, the "sup" operation
suffix (e.g. gemmsup) is typically used to denote a separate namespace
for related code and kernels. NOTE: Since the sup framework does not
perform any packing, it targets row- and column-stored matrices A, B,
and C. For now, if any matrix has non-unit strides in both dimensions,
the problem is computed by the conventional implementation.
- Implemented the default sup handler as a front-end to two variants.
bli_gemmsup_ref_var2() provides a block-panel variant (in which the
2nd loop around the microkernel iterates over n and the 1st loop
iterates over m), while bli_gemmsup_ref_var1() provides a panel-block
variant (2nd loop over m and 1st loop over n). However, these variants
are not used by default and provided for reference only. Instead, the
default sup handler calls _var2m() and _var1n(), which are similar
to _var2() and _var1(), respectively, except that they defer to the
sup kernel itself to iterate over the m and n dimension, respectively.
In other words, these variants rely not on microkernels, but on
so-called "millikernels" that iterate along m and k, or n and k.
The benefit of using millikernels is a reduction of function call
and related (local integer typecast) overhead as well as the ability
for the kernel to know which micropanel (A or B) will change during
the next iteration of the 1st loop, which allows it to focus its
prefetching on that micropanel. (In _var2m()'s millikernel, the upanel
of A changes while the same upanel of B is reused. In _var1n()'s, the
upanel of B changes while the upanel of A is reused.)
- Added a new configure option, --[en|dis]able-sup-handling, which is
enabled by default. However, the default thresholds at which the
default sup handler is activated are set to zero for each of the m, n,
and k dimensions, which effectively disables the implementation. (The
default sup handler only accepts the problem if at least one dimension
is smaller than or equal to its corresponding threshold. If all
dimensions are larger than their thresholds, the problem is rejected
by the sup front-end and control is passed back to the conventional
implementation, which proceeds normally.)
- Added support to the cntx_t structure to track new fields related to
the sup framework, most notably:
- sup thresholds: the thresholds at which the sup handler is called.
- sup handlers: the address of the function to call to implement
the level-3 skinny/unpacked matrix implementation.
- sup blocksizes: the register and cache blocksizes used by the sup
implementation (which may be the same or different from those used
by the conventional packm-based approach).
- sup kernels: the kernels that the handler will use in implementing
the sup functionality.
- sup kernel prefs: the IO preference of the sup kernels, which may
differ from the preferences of the conventional gemm microkernels'
IO preferences.
- Added a bool_t to the rntm_t structure that indicates whether sup
handling should be enabled/disabled. This allows per-call control
of whether the sup implementation is used, which is useful for test
drivers that wish to switch between the conventional and sup codes
without having to link to different copies of BLIS. The corresponding
accessor functions for this new bool_t are defined in bli_rntm.h.
- Implemented several row-preferential gemmsup kernels in a new
directory, kernels/haswell/3/sup. These kernels include two general
implementation types--'rd' and 'rv'--for the 6x8 base shape, with
two specialized millikernels that embed the 1st loop within the kernel
itself.
- Added ref_kernels/3/bli_gemmsup_ref.c, which provides reference
gemmsup microkernels. NOTE: These microkernels, unlike the current
crop of conventional (pack-based) microkernels, do not use constant
loop bounds. Additionally, their inner loop iterates over the k
dimension.
- Defined new typedef enums:
- stor3_t: captures the effective storage combination of the level-3
problem. Valid values are BLIS_RRR, BLIS_RRC, BLIS_RCR, etc. A
special value of BLIS_XXX is used to denote an arbitrary combination
which, in practice, means that at least one of the operands is
stored according to general stride.
- threshid_t: captures each of the three dimension thresholds.
- Changed bli_adjust_strides() in bli_obj.c so that bli_obj_create()
can be passed "-1, -1" as a lazy request for row storage. (Note that
"0, 0" is still accepted as a lazy request for column storage.)
- Added support for various instructions to bli_x86_asm_macros.h,
including imul, vhaddps/pd, and other instructions related to integer
vectors.
- Disabled the older small matrix handling code inserted by AMD in
bli_gemm_front.c, since the sup framework introduced in this commit
is intended to provide a more generalized solution.
- Added test/sup directory, which contains standalone performance test
drivers, a Makefile, a runme.sh script, and an 'octave' directory
containing scripts compatible with GNU Octave. (They also may work
with matlab, but if not, they are probably close to working.)
- Reinterpret the storage combination string (sc_str) in the various
level-3 testsuite modules (e.g. src/test_gemm.c) so that the order
of each matrix storage char is "cab" rather than "abc".
- Comment updates in level-3 BLAS API wrappers in frame/compat.
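
Two of the storage-related points in this entry, the lazy "0, 0" and "-1, -1" stride requests and the rule that general-stride operands fall back to the conventional implementation, can be sketched together. All names below are hypothetical and the logic is deliberately simplified; it is not the code in bli_obj.c or the sup front-end.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pair of row and column strides. */
typedef struct {
    ptrdiff_t rs; /* distance between consecutive rows    */
    ptrdiff_t cs; /* distance between consecutive columns */
} demo_strides_t;

/* Resolve lazy stride requests for an m x n matrix:
 * (0, 0) asks for column storage, (-1, -1) asks for row storage,
 * and anything else is passed through unchanged. */
demo_strides_t demo_resolve_strides(ptrdiff_t m, ptrdiff_t n,
                                    ptrdiff_t rs, ptrdiff_t cs)
{
    demo_strides_t s = { rs, cs };
    if (rs == 0 && cs == 0)        { s.rs = 1; s.cs = m; }
    else if (rs == -1 && cs == -1) { s.rs = n; s.cs = 1; }
    return s;
}

/* Classify storage: 'c' for column-stored (unit row stride), 'r' for
 * row-stored (unit column stride), 'x' for general stride. */
char demo_storage_char(demo_strides_t s)
{
    if (s.rs == 1) return 'c';
    if (s.cs == 1) return 'r';
    return 'x';
}

/* The unpacked path only applies when no operand has non-unit
 * strides in both dimensions. */
bool demo_sup_storage_ok(demo_strides_t a, demo_strides_t b,
                         demo_strides_t c)
{
    return demo_storage_char(a) != 'x'
        && demo_storage_char(b) != 'x'
        && demo_storage_char(c) != 'x';
}
```

For a 3x4 matrix, a "0, 0" request resolves to strides (1, 3) and a "-1, -1" request to (4, 1); an operand with strides such as (2, 8) is classified as general stride and would be routed to the conventional implementation.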

commit 0d549ceda822833bec192bbf80633599620c15d9
Author: Isuru Fernando <isurufgmail.com>
Date: Sat Apr 27 22:56:02 2019 +0000

make unix friendly archives on appveyor (310)

commit ca4b33c001f9e959c43b95a9a23f9df5adec7adf
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Wed Apr 24 15:02:39 2019 +0530

Added compiler option (-mno-avx256-split-unaligned-store) in the file config/zen/make_defs.mk to improve performance of intrinsic codes, this flag ensures compiler generates 256-bit stores for the equivalent intrinsics code.

Change-Id: I8f8cd81a3604869df18d38bc42097a04f178d324

commit 945928c650051c04d6900c7f4e9e29cd0e5b299f
Merge: 663f6629 74e513eb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 17 15:58:56 2019 -0500

Merge branch 'amd' of github.com:flame/blis into amd

commit 74e513eb6a6787a925d43cd1500277d54d86ab8f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Apr 17 13:34:44 2019 -0500

Support row storage in Eigen gemm test/3 driver.

Details:
- Added preprocessor branches to test/3/test_gemm.c to explicitly
support row-stored matrices. Column-stored matrices are also still
supported (and are the default for now). (This is mainly residual work
leftover from initial integration of Eigen into the test drivers, so
if we ever want to test Eigen with row-stored matrices, the code will
be ready to use, even if it is not yet integrated into the Makefile
in test/3.)

commit b5d457fae9bd75c4ca67f7bc7214e527aa248127
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 16 12:50:01 2019 -0500

Applied forgotten variable rename from 89a70cc.

Details:
- Somehow the variable name change (root_file_name -> root_inputname)
in flatten-headers.py mentioned in the commit log entry for 89a70cc
didn't make it into the actual commit. This commit applies that
change.

commit 89a70cccf869333147eb2559cdfa5a23dc915824
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Apr 11 18:33:08 2019 -0500

GNU-like handling of installation prefix et al.

Details:
- Changed the default installation prefix from $HOME/lib to /usr/local.
- Modified the way configure internally handles the prefix, libdir,
includedir, and sharedir (and also added an --exec-prefix option).
The defaults to these variables are set as follows:
prefix: /usr/local
exec_prefix: ${prefix}
libdir: ${exec_prefix}/lib
includedir: ${prefix}/include
sharedir: ${prefix}/share
The key change, aside from the addition of exec_prefix and its use to
define the default to libdir, is that the variables are substituted
into config.mk with quoting that delays evaluation, meaning the
substituted values may contain unevaluated references to other
variables (namely, ${prefix} and ${exec_prefix}). This more closely
follows GNU conventions, including those used by GNU autoconf, and
also allows make to override any one of the variables *after*
configure has already been run (e.g. during 'make install').
- Updates to build/config.mk.in pursuant to above changes.
- Updates to output of 'configure --help' pursuant to above changes.
- Updated docs/BuildSystem.md to reflect the new default installation
prefix, as well as mention EXECPREFIX and SHAREDIR.
- Changed the definitions of the UNINSTALL_OLD_* variables in the
top-level Makefile to use $(wildcard ...) instead of 'find'. This
was motivated by the new way of handling prefix and friends, which
leads to the 'find' command being run on /usr/local (by default),
which can take a while and almost never yields any benefit (since the
user will very rarely use the uninstall-old targets).
- Removed periods from the end of descriptive output statements (i.e.,
non-verbose output) since those statements often end with file or
directory paths, which get confusing to read when punctuated by a
period.
- Trivial change to 'make showconfig' output.
- Removed my name from 'configure --help'. (Many have contributed to it
over the years.)
- In configure script, changed the default state of threading_model
variable from 'no' to 'off' to match that of debug_type, where there
are similarly more than two valid states. ('no' is still accepted
if given via the --enable-debug= option, though it will be
standardized to 'off' prior to config.mk being written out.)
- Minor variable name change in flatten-headers.py that was intended for
32812ff.
- CREDITS file update.

commit 9d76688ad90014a11ddc0c2f27253d62806216b1
Author: kdevraje <Kiran.Devrajegowdaamd.com>
Date: Thu Apr 11 10:22:48 2019 +0530

Fix for a single-rank crash with the HPL application. When computing the offset into the C buffer, integer variables are used for the row and column indices, so the intermediate result overflows and a negative value gets added to the buffer pointer. When that negative value is large enough, it indexes the buffer out of range, resulting in a segmentation fault. Although the crash occurred in the dgemm kernel, similar code was also added to the sgemm kernel.

Change-Id: I171119b0ec0dfbd8e63f1fcd6609a94384aabd27
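
The kind of overflow described in this entry is typically avoided by widening the operands before multiplying. The sketch below assumes a column-major offset formula and uses hypothetical names; it is not the code from the dgemm or sgemm kernels.

```c
#include <stdint.h>

/* Compute the element offset of C(row, col) for a column-major matrix
 * with leading dimension ldc. Each operand is widened to 64 bits
 * before the multiplication, so the product cannot wrap around the
 * 32-bit int range and turn into a negative offset. */
int64_t demo_c_offset(int row, int col, int ldc)
{
    return (int64_t)col * (int64_t)ldc + (int64_t)row;
}
```

With col = 50000 and ldc = 50000, the offset is 2500000000, which exceeds INT32_MAX and would overflow if the multiplication were done in plain int arithmetic.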

commit 32812ff5aba05d34c421fe1024a61f3e2d5e7052
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 9 12:20:19 2019 -0500

Minor bugfix to flatten-headers.py.

Details:
- Fixed a minor bug in flatten-headers.py whereby the script, upon
encountering an include directive for the root header file, would
erroneously recurse and inline the contents of that root header.
The script has been modified to avoid recursion into any headers
that share the same name as the root-level header that was passed
into the script. (Note: this bug didn't actually manifest in BLIS,
so it's merely a precaution for usage of flatten-headers.py in other
contexts.)

commit bec90e0b6aeb3c9b19589c2b700fda2d66f6ccdf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 2 17:45:13 2019 -0500

Minor update to docs/HardwareSupport.md document.

Details:
- Added more details and clarifying language to implications of 1m and
the recycling of microkernels between microarchitectures.

commit 89cd650e7be01b59aefaa85885a3ea78970351e4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 2 17:23:55 2019 -0500

Use void_fp for function pointers instead of void*.

Change void*-typed function pointers to void_fp.
- Updated all instances of void* variables that store function pointers
to variables of a new type, void_fp. Originally, I wanted to define
the type of void_fp as "void (*void_fp)( void )"--that is, a pointer
to a function with no return value and no arguments. However, once
I did this, I realized that gcc complains with incompatible pointer
type (-Wincompatible-pointer-types) warnings every time any such
pointer is assigned to its final, type-accurate function
pointer type. That is, gcc will silently typecast a void* to
another defined function pointer type (e.g. dscalv_ker_ft) during
an assignment from the former to the latter, but the same statement
will trigger a warning when typecasting from a void_fp type. I suspect
an explicit typecast is needed in order to avoid the warning, which
I'm not willing to insert at this time.
- Added a typedef to bli_type_defs.h defining void_fp as void*, along
with a commented-out version of the aborted definition described
above. (Note that POSIX requires that void* and function pointers
be interchangeable; it is the C standard that does not provide this
guarantee.)
- Comment updates to various _oapi.c files.
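
The trade-off discussed in this entry can be illustrated with a generic function pointer type and the explicit casts it requires. The names below are hypothetical and do not appear in BLIS; the sketch shows the stricter "void (*)( void )" style that the entry describes as aborted, not the void* typedef that was actually adopted.

```c
/* Generic function pointer type in the style the entry describes. */
typedef void (*demo_void_fp)(void);

/* A specific, type-accurate function pointer type. */
typedef double (*demo_scal_ft)(double alpha, double x);

double demo_scal(double alpha, double x)
{
    return alpha * x;
}

/* Store a function in the generic type, then recover and call it.
 * ISO C allows converting between function pointer types and back,
 * but both conversions need explicit casts to avoid
 * incompatible-pointer-type diagnostics. */
double demo_call_via_generic(double alpha, double x)
{
    demo_void_fp stored = (demo_void_fp)demo_scal;
    demo_scal_ft typed = (demo_scal_ft)stored;
    return typed(alpha, x);
}
```

The function is only ever called through its original type, which is what keeps the round trip well defined.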

commit ffce3d632b284eb52474036096815ec38ca8dd5f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Apr 2 14:40:50 2019 -0500

Renamed armv8a gemm kernel filename.

Details:
- Renamed
kernels/armv8a/3/bli_gemm_armv8a_opt_4x4.c
to
kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c.
This follows the naming convention used by other kernel sets, most
notably haswell.

commit 77867478af02144544b4e7b6df5d54d874f3f93b
Author: Isuru Fernando <isurufgmail.com>
Date: Tue Apr 2 13:33:11 2019 -0500

Use pthreads on MinGW and Cygwin (307)

commit 7bc75882f02ce3470a357950878492e87e688cec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 28 17:40:50 2019 -0500

Updated Eigen results in docs/graphs with 3.3.90.

Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
results, this time using a development version cloned from their git
mirror on March 27, 2019 (version 3.3.90). Performance is improved
over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
test/3/matlab.

commit 20ea7a1217d3833db89a96158c42da2d6e968ed8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 27 18:09:17 2019 -0500

Minor text updates (Eigen) to docs/Performance.md.

Details:
- Added/updated a few more details, mostly regarding Eigen.

commit bfb7e1bc6af468e4ff22f7e27151ea400dcd318a
Merge: 044df950 2c85e1dd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 27 17:58:19 2019 -0500

Merge branch 'dev'

commit 2c85e1dd9d5d84da7228ea4ae6deec56a89b3a8f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 27 16:29:51 2019 -0500

Added Eigen results to performance graphs.

Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
docs/graphs to report on Eigen implementations, where applicable.
Specifically, Eigen implements all level-3 operations sequentially;
however, of those operations it only provides multithreaded gemm.
Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
omitted. Thanks to Sameer Agarwal for his help configuring and
using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.

commit bfac7e385f8061f2e6591de208b0acf852f04580
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 27 16:04:48 2019 -0500

Added ability to plot with Eigen in test/3/matlab.

Details:
- Updated matlab scripts in test/3/matlab to optionally plot/display
Eigen performance curves. Whether Eigen is plotted is determined by
a new boolean function parameter, with_eigen.
- Updated runme.m scratchpad to reflect the latest invocations of the
plot_panel_4x5() function (with Eigen plotting enabled).

commit 67535317b9411c90de7fa4cb5b0fdb8f61fdcd79
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 27 13:32:18 2019 -0500

Fixed mislabeled eigen output from test/3 drivers.

Details:
- Fixed the Makefile in test/3 so that it no longer incorrectly labels
the matlab output variables from Eigen-linked hemm, herk, trmm, and
trsm driver output as "vendor". (The gemm drivers were already
correctly outputting matlab variables containing the "eigen" label.)

commit 044df9506f823643c0cdd53e81ad3c27a9f9d4ff
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Mar 27 12:39:31 2019 -0500

Test with shared on windows (306)

Export macros can't support both shared and static at the same time.
When blis is built with both shared and static, headers assume that
shared is used at link time and dllimports the symbols with __imp_
prefix.

To use the headers with static libraries a user can give
-DBLIS_EXPORT= to import the symbol without the __imp_ prefix

commit 5e6b160c8a85e5e23bab0f64958a8acf4918a4ed
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 26 19:10:59 2019 -0500

Link to Eigen BLAS for non-gemm drivers in test/3.

Details:
- Adjusted test/3/Makefile so that the test drivers are linked against
Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do
this since Eigen's headers don't define implementations to the
standard BLAS APIs.
- Simplified included headers in hemm, herk, trmm, and trsm source
driver files, since nothing specific to Eigen is needed at
compile-time for those operations.

commit e593221383aae19dfdc3f30539de80ed05cfec7f
Merge: 92fb9c87 c208b9dc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 26 15:51:45 2019 -0500

Merge branch 'master' into dev

commit 92fb9c87bf88b9f9c401eeecd9aa9c3521bc2adb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 26 15:43:23 2019 -0500

Add more support for Eigen to drivers in test/3.

Details:
- Use compile-time implementations of Eigen in test_gemm.c via new
EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS
library is not necessary.) However, as of Eigen 3.3.7, Eigen only
parallelizes the gemm operation and not hemm, herk, trmm, trsm, or
any other level-3 operation.
- Fixed a bug in trmm and trsm drivers whereby the wrong function
(bli_does_trans()) was being called to determine whether the object
for matrix A should be created for a left- or right-side case. This
was corrected by changing the function to bli_is_left(), as is done
in the hemm driver.
- Added support for running Eigen test drivers from runme.sh.
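
The left-/right-side fix described above can be illustrated with a minimal sketch. It assumes the usual trmm/trsm convention that the triangular matrix A is m x m for the left-side case and n x n for the right-side case of an m x n matrix B. All type and function names below are hypothetical stand-ins, not the BLIS functions themselves, and the buggy variant only mimics the kind of mistake described; the exact original code is not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-ins for the side and transposition parameters. */
typedef enum { SIDE_LEFT, SIDE_RIGHT } side_sketch_t;
typedef enum { NO_TRANS, TRANS } trans_sketch_t;

/* Correct choice: the dimension of the square matrix A depends only on
   which side of the m x n matrix B it is applied to. */
long tri_dim_by_side(side_sketch_t side, long m, long n)
{
    assert(m >= 0 && n >= 0);
    return (side == SIDE_LEFT) ? m : n;
}

/* Buggy variant (assumed shape of the mistake): the decision is keyed
   on transposition, which says nothing about the side. */
long tri_dim_by_trans(trans_sketch_t trans, long m, long n)
{
    assert(m >= 0 && n >= 0);
    return (trans == TRANS) ? m : n;
}
```

With m = 3 and n = 5, a left-side, non-transposed problem needs a 3 x 3 matrix A; the side-based query returns 3, while the transposition-based query returns 5.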

commit c208b9dc46852c877197d53b6dd913a046b6ebb6
Author: Isuru Fernando <isurufgmail.com>
Date: Mon Mar 25 13:03:44 2019 -0500

Fix clang version detection (305)

clang -dumpversion gives 4.2.1 for all clang versions as clang was
originally compatible with gcc 4.2.1

Apple clang version and clang version are two different things,
and the real clang version cannot be deduced from the Apple clang version
programmatically. Rely on Wikipedia to map Apple clang to clang version.

Also fixes assembly detection with clang

clang 3.8 can't build knl as it doesn't recognize zmm0

commit 53842c7e7d530cb2d5609d6d124ae350fc345c32
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Fri Mar 22 13:57:14 2019 +0530

Removed printing alpha and beta values

Change-Id: I49102db510311a30f6a936f9d843f35838f50d23

commit 6805db45e343d83d1adaf9157cf0b841653e9ede
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Fri Mar 22 12:55:35 2019 +0530

Corrected the alpha and beta values (alpha = -1, beta = 1): bli_setc(-1.0, 0, &alpha) should be used rather than bli_setc(0.0, -1.0, &alpha). This is corrected now.

Change-Id: Ic1102dfd6b50ccf212386a1211c6f31e8d987ef9

commit feefcab4427a75b0b55af215486b85abcda314f7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 21 18:11:20 2019 -0500

Allow disabling of BLAS prototypes at compile-time.

Details:
- Modified bli_blas.h so that:
- By default, if the BLAS layer is enabled at configure-time, BLAS
prototypes are also enabled within blis.h;
- But if the user defines BLIS_DISABLE_BLAS_DEFS prior to including
blis.h, BLAS prototypes are skipped over entirely so that, for
example, the application or some other header pulled in by the
application may prototype the BLAS functions without causing any
duplication.
- Updated docs/BuildSystem.md to document the feature above, and
related text.

commit 20153cd4b594bc34f860c381ec18de3a6cc743c7
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Thu Mar 21 16:23:53 2019 +0530

Modified test_gemm.c file in test folder
A macro 'FILE_IN_OUT' is defined to read input parameters from a csv file.
Format for input file:
Each line defines a gemm problem with following parameters: m k n cs_a cs_b cs_c
The operation always implemented is C = C - A*B and column-major format.
When macro is disabled - it reverts back to original implementation.
Usage: ./test_gemm_<mkl/blis/openblas>.x input.csv output.csv
GEMM is called through BLAS interface
For BLIS - the test application also prints either 'S' indicating small gemm routine or 'N' - conventional BLIS gemm
for MKL/OpenBLAS - ignore this character
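
As a rough illustration of the operation and parameters described above, here is a minimal reference sketch. It assumes that A is m x k, B is k x n, and C is m x n, and that cs_a, cs_b, and cs_c are column strides (leading dimensions) of column-major matrices. The function name ref_gemm_sub is hypothetical and is not part of the test driver, which calls GEMM through the BLAS interface instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine: computes C = C - A*B for column-major
   A (m x k), B (k x n), and C (m x n). Element (i,j) of A is assumed to
   live at a[i + j*cs_a], and likewise for B and C. */
void ref_gemm_sub(size_t m, size_t k, size_t n,
                  const double *a, size_t cs_a,
                  const double *b, size_t cs_b,
                  double *c, size_t cs_c)
{
    const double alpha = -1.0; /* scales the product A*B */
    const double beta  =  1.0; /* scales the existing contents of C */

    /* Column strides must be at least the number of rows. */
    assert(cs_a >= m && cs_b >= k && cs_c >= m);

    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double ab = 0.0;
            for (size_t p = 0; p < k; ++p)
                ab += a[i + p * cs_a] * b[p + j * cs_b];
            c[i + j * cs_c] = beta * c[i + j * cs_c] + alpha * ab;
        }
    }
}
```

Choosing alpha = -1 and beta = 1 in the general form C = beta*C + alpha*A*B is what yields C = C - A*B.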

Change-Id: I0924ef2c1f7bdea48d4cdb230b888e2af2c86a36

commit 288843b06d91e1b4fade337959aef773090bd1c9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 20 17:52:23 2019 -0500

Added Eigen support to test/3 Makefile, runme.sh.

Details:
- Added targets to test/3/Makefile that link against a BLAS library
built by Eigen. It appears, however, that Eigen's BLAS library does
not support multithreading. (It may be that multithreading is only
available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.

commit 153e0be21d9ff413e370511b68d553dd02abada9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 17:53:18 2019 -0500

More minor tweaks to docs/Performance.md.

Details:
- Defined GFLOPS as billions of floating-point operations per second,
and reworded the sentence after about normalization.
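
The GFLOPS definition above can be sketched for gemm as follows. This assumes the conventional count of 2*m*n*k floating-point operations for an m x n x k gemm; the helper name gemm_gflops is hypothetical, and any normalization applied in the document's graphs is not modeled here.

```c
#include <assert.h>

/* Hypothetical helper: billions of floating-point operations per second
   for an m x n x k gemm that ran for `seconds` seconds, assuming the
   conventional operation count of 2*m*n*k. */
double gemm_gflops(double m, double n, double k, double seconds)
{
    assert(seconds > 0.0);
    return (2.0 * m * n * k) / (seconds * 1.0e9);
}
```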

commit 05c4e42642cc0c8dbfa94a6c21e975ac30c0517a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 17:07:20 2019 -0500

CHANGELOG update (0.5.2)

0.5.2

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 17:07:18 2019 -0500

Version file update (0.5.2)

commit 64560cd9248ebf4c02c4a1eeef958e1ca434e510
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 17:04:20 2019 -0500

ReleaseNotes.md update in advance of next version.

Details:
- Updated ReleaseNotes.md in preparation for next version.

commit ab5ad557ea69479d487c9a3cb516f43fa1089863
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 16:50:41 2019 -0500

Very minor tweaks to Performance.md.

commit 03c4a25e1aa8a6c21abbb789baa599ac419c3641
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 16:47:15 2019 -0500

Minor fixes to docs/Performance.md.

Details:
- Fixed some incorrect labels associated with the pdf/png graphs,
apparently the result of copy-pasting.

commit fe6dd8b132f39ecb8893d54cd8e75d4bbf6dab83
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 16:30:23 2019 -0500

Fixed broken section links in docs/Performance.md.

Details:
- Fixed a few broken section links in the Contents section.

commit 913cf97653f5f9a40aa89a5b79e2b0a8882dd509
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 16:15:24 2019 -0500

Added docs/Performance.md and docs/graphs subdir.

Details:
- Added a new markdown document, docs/Performance.md, which reports
performance of a representative set of level-3 operations across a
variety of hardware architectures, comparing BLIS to OpenBLAS and a
vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.

commit 9945ef24fd758396b698b19bb4e23e53b9d95725
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 19 15:28:44 2019 -0500

Adjusted cache blocksizes for zen subconfig.

Details:
- Adjusted the zen sub-configuration's cache blocksizes for float,
scomplex, and dcomplex based on the existing values for double.
(The previous values were taken directly from the haswell subconfig,
which targets Intel Haswell/Broadwell/Skylake systems.)

commit d202d008d51251609d08d3c278bb6f4ca9caf8e4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 18 18:18:25 2019 -0500

Renamed --enable-export-all to --export-shared=[].

Details:
- Replaced the existing --enable-export-all / --disable-export-all
configure option with --export-shared=[public|all], with the 'public'
instance of the latter corresponding to --disable-export-all and the
'all' instance corresponding to --enable-export-all. Nothing else
semantically about the option, or its default, has changed.

commit ff78089870f714663026a7136e696603b5259560
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Mar 18 13:22:55 2019 -0500

Updates to docs/Multithreading.md.

Details:
- Made extra explicit the fact that: (a) multithreading in BLIS is
disabled by default; and (b) even with multithreading enabled, the
user must specify multithreading at runtime in order to observe
parallelism. Thanks to M. Zhou for suggesting these clarifications
in 292.
- Also made explicit that only the environment variable and global
runtime API methods are available when using the BLAS API. If the
user wishes to use the local runtime API (specify multithreading on
a per-call basis), one of the native BLIS APIs must be used.

commit 3a929a3d0ba0353159a6d4cd188f01b7a390ccfc
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Mon Mar 18 10:51:41 2019 +0530

Fixed code merging: bli_gemm_small.c - missed conditional checks for L!=0 && K!=0. Now they are added. This fix is done to pass blastest

Change-Id: Idc9c9a04d2015a68a19553c437ecaf8f1584026c

commit 663f662932c3f182fefc3c77daa1bf8c3394bb8b
Merge: 938c05ef 6bfe3812
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Mar 16 16:17:12 2019 -0500

Merge branch 'amd' of github.com:flame/blis into amd

commit 938c05ef8654e2fc013d39a57f51d91d40cc40fb
Merge: 4ed39c09 5a5f494e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Mar 16 16:01:43 2019 -0500

Merge branch 'amd' of github.com:flame/blis into amd

commit 6bfe3812e29b86c95b828822e4e5473b48891167
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 15 13:57:49 2019 -0500

Use -fvisibility=[...] with clang on Linux/BSD/OSX.

Details:
- Modified common.mk to use the -fvisibility=[hidden|default] option
when compiling with clang on non-Windows platforms (Linux, BSD, OS X,
etc.). Thanks to Isuru Fernando for pointing out this option works
with clang on these OSes.

commit 809395649c5bbf48778ede4c03c1df705dd49566
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 13 18:21:35 2019 -0500

Annotated additional symbols for export.

Details:
- Added export annotations to additional function prototypes in order to
accommodate the testsuite.
- Disabled calling bli_amaxv_check() from within the testsuite's
test_amaxv.c.

commit e095926c643fd9c9c2220ebecd749caae0f71d42
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Mar 13 17:35:18 2019 -0500

Support shared lib export of only public symbols.

Details:
- Introduced a new configure option, --enable-export-all, which will
cause all shared library symbols to be exported by default, or,
alternatively, --disable-export-all, which will cause all symbols to
be hidden by default, with only those symbols that are annotated for
visibility, via BLIS_EXPORT_BLIS (and BLIS_EXPORT_BLAS for BLAS
symbols), to be exported. The default for this configure option is
--disable-export-all. Thanks to Isuru Fernando for consulting on
this commit.
- Removed BLIS_EXPORT_BLIS annotations from frame/1m/bli_l1m_unb_var1.h,
which was intended for 5a5f494.
- Relocated BLIS_EXPORT-related cpp logic from bli_config.h.in to
frame/include/bli_config_macro_defs.h.
- Provided appropriate logic within common.mk to implement variable
symbol visibility for gcc, clang, and icc (to the extent that each of
these compilers allow).
- Relocated --help text associated with debug option (-d) to configure
slightly further down in the list.

commit 5a5f494e428372c7c27ed1f14802e15a83221e87
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 12 18:45:09 2019 -0500

Removed export macros from all internal prototypes.

Details:
- After merging PR 303, at Isuru's request, I removed the use of
BLIS_EXPORT_BLIS from all function prototypes *except* those that we
potentially wish to be exported in shared/dynamic libraries. In other
words, I removed the use of BLIS_EXPORT_BLIS from all prototypes of
functions that can be considered private or for internal use only.
This is likely the last big modification along the path towards
implementing the functionality spelled out in issue 248. Thanks
again to Isuru Fernando for his initial efforts of sprinkling the
export macros throughout BLIS, which made removing them where
necessary relatively painless. Also, I'd like to thank Tony Kelman,
Nathaniel Smith, Ian Henriksen, Marat Dukhan, and Matthew Brett for
participating in the initial discussion in issue 37 that was later
summarized and restated in issue 248.
- CREDITS file update.

commit 3dc18920b6226026406f1d2a8b2c2b405a2649d5
Merge: b938c16b 766769ee
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 12 11:20:25 2019 -0500

Merge branch 'master' into dev

commit 766769eeb944bd28641a6f72c49a734da20da755
Author: Isuru Fernando <isurufgmail.com>
Date: Mon Mar 11 19:05:32 2019 -0500

Export functions without def file (303)

* Revert "restore bli_extern_defs exporting for now"

This reverts commit 09fb07c350b2acee17645e8e9e1b8d829c73dca8.

* Remove symbols not intended to be public

* No need of def file anymore

* Fix whitespace

* No need of configure option

* Remove export macro from definitions

* Remove blas export macro from definitions

commit 4ed39c0971c7917e2675cf5449f563b1f4751ccc
Merge: 540ec1b4 b938c16b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Mar 8 11:56:58 2019 -0600

Merge branch 'amd' of github.com:flame/blis into amd

commit b938c16b0c9e839335ac2c14944b82890143d02f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 7 16:40:39 2019 -0600

Renamed test/3m4m to test/3.

Details:
- Renamed '3m4m' directory to '3', which captures the directory nicely
since it builds test drivers to test level-3 operations.
- These test drivers ceased to be used to test the 3m and 4m (or even
1m) induced methods long ago, hence the name change.

commit ab89a40582ec7acf802e59b0763bed099a02edd8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 7 16:26:12 2019 -0600

More minor updates and edits to test/3m4m.

Details:
- Further updates to matlab scripts, mostly for compatibility with
GNU Octave.
- More tweaks to runme.sh.
- Updates to runme.m that allow copy-paste into matlab interactive
session to generate graphs.

commit f0e70dfbf3fee4c4e382c2c4e87c25454cbc79a1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Mar 7 01:04:05 2019 +0000

Very minor updates to test/3m4m for ul252.

Details:
- Very minor updates to the newly revamped test/3m4m drivers when used
on a Xeon Platinum (SkylakeX).

commit 7fe44748383071f1cbbc77d904f4ae5538e13065
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Wed Mar 6 16:23:31 2019 +0530

Disabled BLIS_ENABLE_ZEN_BLOCK_SIZES in bli_family_zen.h for ROME tuning

Change-Id: Iec47fcf51f4d4396afef1ce3958e58cf02c59a57

commit 9f1dbe572b1fd5e7dd30d5649bdf59259ad770d5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Mar 5 17:47:55 2019 -0600

Overhauled test/3m4m Makefile and scripts.

Details:
- Rewrote much of Makefile to generate executables for single- and dual-
socket multithreading as well as single-threaded. Each of the three
can also use a different problem size range/increment, as is often
appropriate when doubling/halving the number of threads.
- Rewrote runme.sh script to flexibly execute as many threading
parameter scenarios as is given in the input parameter string
(currently set within the script itself). The string also encodes
the maximum problem size for each threading scenario, which is used
to identify the executable to run. Also improved the "progress" output
of the script to reduce redundant info and improve readability in
terminals that are not especially wide.
- Minor updates to test_*.c source files.
- Updated matlab scripts according to changes made to the Makefile,
test drivers, and runme.sh script, and renamed 'plot_all.m' to
'runme.m'.

commit f5ed95ecd7d5eb4a63e1333ad5cc6765fc8df9fe
Author: Kiran Varaganti <Kiran.Varagantiamd.com>
Date: Tue Mar 5 15:01:57 2019 +0530

0.5.1

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 18 14:56:16 2018 -0600

Version file update (0.5.1)

commit 3ab231afc9f69d14493908c53c85a84c5fba58aa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 18 14:53:37 2018 -0600

ReleaseNotes.md update in advance of next version.

Details:
- Updated ReleaseNotes.md in preparation for next version.

commit d1aa87164e1e82347d62aa98793963c5265ef7e7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 18 14:52:40 2018 -0600

README.md update (External packages section).

Details:
- Updated External packages section in anticipation of introducing BLIS
into Debian package universe. Thanks to M. Zhou for sponsoring BLIS in
Debian.

commit 7bf901e9265a1acd78e44c06f7178c8152c7e267
Author: sraut <Biplab.Rautamd.com>
Date: Tue Dec 18 14:39:16 2018 +0530

Fix for multi-instance performance issue on EPYC machines.
Issue: With the default values of mc, kc, and nc, performance across the cores dips drastically in multi-instance mode.
Fix: Experimentation found a different set of values (mc, kc, and nc) that fits in the cache, so that performance remains the same across all the cores.

Change-Id: I98265e3b7e61cd7602a0cc5596240e86c08c03fe

commit d2b2a0819a2fccad9165bc48c0e172d79a87542c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 17 19:26:35 2018 -0600

Removed stray sections from Multithreading.md.

Details:
- Removed unintended section headers from before table of contents.

commit 93d56319f2953cf0e9df1ff2cda90b8e41351b2c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 17 19:17:30 2018 -0600

Added missing bli_init_once() in bli_thread API.

Details:
- Fixed an issue with specifying threading globally at runtime via
bli_thread_set_num_threads() (the automatic way) or via
bli_thread_set_ways() (the manual way), with bli_thread_init_rntm()
also affected. These functions were not calling bli_init_once() prior
to acting, and therefore their effects on the global rntm_t structure
were being wiped out by the eventual call to bli_init_once(), by some
other BLIS function. Thanks to Ali Emre Gülcü for reporting the
behavior associated with this bug.
- Added additional content to docs/Multithreading.md covering topics of
choosing between OpenMP and pthreads, and specifying affinity via
OpenMP.
- CREDITS file update.
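
The ordering problem described in the first item above can be shown with a minimal sketch. It assumes a single global setting and a one-time initializer that resets it to a default. Every name below is a hypothetical stand-in for the pattern, not the BLIS implementation, and the plain flag used here is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the library's global runtime state. */
static int  g_num_threads = 0;
static bool g_initialized = false;

/* One-time initialization: sets the global state to its defaults. */
void lib_init_once(void)
{
    if (g_initialized) return;
    g_num_threads = 1;          /* default: single-threaded */
    g_initialized = true;
}

/* Buggy setter: writes the global state without initializing first, so
   the first later call to lib_init_once() overwrites the value. */
void set_num_threads_buggy(int n)
{
    assert(n >= 1);
    g_num_threads = n;
}

/* Fixed setter: initializes first, then writes. */
void set_num_threads_fixed(int n)
{
    assert(n >= 1);
    lib_init_once();
    g_num_threads = n;
}

/* Any other entry point: initializes, then reads the setting. */
int lib_query_num_threads(void)
{
    lib_init_once();
    return g_num_threads;
}

/* Demo helper: return to the never-initialized state. */
void lib_reset_for_demo(void)
{
    g_initialized = false;
    g_num_threads = 0;
}
```

Starting from the uninitialized state, a value set through the buggy setter is lost by the time it is queried, whereas the fixed setter's value survives.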

commit 76016691e2c514fcb59f940c092475eda968daa2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Dec 13 17:23:09 2018 -0600

Improvements to bli_pool; malloc()/free() tracing.

Details:
- Added malloc_ft and free_ft fields to pool_t, which are provided when
the pool is initialized, to allow bli_pool_alloc_block() and
bli_pool_free_block() to call bli_fmalloc_align()/bli_ffree_align()
with arbitrary align_size values (according to how the pool_t was
initialized).
- Added a block_ptrs_len argument to bli_pool_init(), which allows the
caller to specify an initial length for the block_ptrs array, which
previously suffered the cost of being reallocated, copied, and freed
each time a new block was added to the pool.
- Consolidated the "buf_sys" and "buf_align" pointer fields in pblk_t
into a single "buf" field. Consolidated the bli_pblk API accordingly
and also updated the bli_mem API implementation. This was done
because I'd previously already implemented opaque alignment via
bli_malloc_align(), which allocates extra space and stores the
original pointer returned by malloc() one element before the element
whose address is aligned.
- Tweaked bli_membrk_acquire_m() and bli_membrk_release() to call
bli_fmalloc_align() and bli_ffree_align(), which required adding an
align_size field to the membrk_t struct.
- Pass the pack schemas directly into bli_l3_cntl_create_if() rather
than transmit them via objects for A and B.
- Simplified bli_l3_cntl_free_if() and renamed to bli_l3_cntl_free().
The function had not been conditionally freeing control trees for
quite some time. Also, removed obj_t* parameters since they aren't
needed anymore (or never were).
- Spun-off OpenMP nesting code in bli_l3_thread_decorator() to a
separate function, bli_l3_thread_decorator_thread_check().
- Renamed:
bli_malloc_align() -> bli_fmalloc_align()
bli_free_align() -> bli_ffree_align()
bli_malloc_noalign() -> bli_fmalloc_noalign()
bli_free_noalign() -> bli_ffree_noalign()
The 'f' is for "function" since they each take a malloc_ft or free_ft
function pointer argument.
- Inserted various printf() calls for the purposes of tracing memory
allocation and freeing, guarded by cpp macro ENABLE_MEM_DEBUG, which,
for now, is intended to be a "hidden" feature rather than one hooked
up to a configure-time option.
- Defined bli_rntm_equals(), which compares two rntm_t for equality.
(There are no use cases for this function yet, but there may be soon.)
- Whitespace changes to function parameter lists in bli_pool.c, .h.
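
The "opaque alignment" technique mentioned above (allocate extra space and store the original malloc() pointer one element before the aligned address) can be sketched as follows. The function-pointer typedefs and the sketch_ names are assumptions made for illustration; they only imitate the idea of passing in a malloc-like or free-like function and are not the BLIS signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed shapes for the allocator function pointers. */
typedef void *(*sketch_malloc_ft)(size_t);
typedef void  (*sketch_free_ft)(void *);

/* Over-allocate, round the address up to a multiple of `align`, and
   stash the pointer returned by the allocator in the pointer-sized slot
   just before the aligned address. `align` must be a power of two. */
void *sketch_fmalloc_align(sketch_malloc_ft malloc_fp, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Keep the stash slot itself suitably aligned for a pointer. */
    if (align < sizeof(void *)) align = sizeof(void *);

    /* Room for the payload, worst-case padding, and the stashed pointer. */
    void *raw = malloc_fp(size + align + sizeof(void *));
    if (raw == NULL) return NULL;

    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    ((void **)aligned)[-1] = raw;   /* remembered for the matching free */
    return (void *)aligned;
}

/* Recover the stashed pointer and hand it to the free-like function. */
void sketch_ffree_align(sketch_free_ft free_fp, void *p)
{
    if (p != NULL) free_fp(((void **)p)[-1]);
}
```

Because the original pointer travels with the block, the caller needs to keep only the aligned address, which is why a single "buf" field suffices.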

commit f808d829c58dc4194cc3ebc3825fbdde12cd3f93
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 12 15:22:59 2018 -0600

Handle edge cases, zero-filling in packm kernels.

Details:
- Updated the API and semantics of packm kernels such that they must now
handle edge cases, meaning that a c-by-k packm kernel must be able to
pack edge cases that are fewer than c rows/columns and be able to
zero-fill the remaining elements. They must also be able to zero-fill
the equivalent region when copying fewer than k columns/rows (which is
needed by trsm). The new packm kernel API is generally:

void packm_kernel
(
conj_t conja,
dim_t cdim,
dim_t n,
dim_t n_max,
ctype* restrict kappa,
ctype* restrict a, inc_t inca, inc_t lda,
ctype* restrict p, inc_t ldp,
cntx_t* restrict cntx
);

where cdim and n are the dimensions (short and long, respectively) of
the submatrix being copied from the source matrix A, and n_max is the
"full" long dimension (corresponding to the k dimension in gemm) of
the micropanel. The "full" short dimension (corresponding to the
register blocksize MR or NR) is not part of the API because it is
known intrinsically by the packm kernel implementation. Thanks to
Devin Matthews for prompting us to make this change (282).
- Updated all reference packm kernels in ref_kernels/1m according to
above changes, as well as all optimized packm kernels (which only
consisted of those for knl).
- Bumped the major soname version number in 'so_version' to 2. At first
I was considering leaving it unchanged, but I couldn't escape the
reality that the packm kernel API is much closer to an expert API
than it is some obscure helper function interface within the framework
that nobody would ever notice.
- Removed reference packm kernels for mr/nr = 30. The only sub-config
that would have been using those kernels is knc, which is likely no
longer being used by very many people (if any). (This also mostly
offset the larger object code footprint incurred by moving the edge-
case handling into the individual packm kernels.)
- Fixed an obscure race condition for 3mh and 4mh induced methods in
which those implementations were modifying the contexts stored in the
gks rather than a local copy.
- Fixed a minor bug in the testsuite that prevented non-1m-based induced
method implementations of trsm from executing.
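
The edge-case and zero-filling semantics of the new packm kernel API can be sketched for the real double-precision case. This is a simplified illustration under stated assumptions: the conja and cntx parameters are omitted, dim_t and inc_t are replaced by stand-in typedefs, the "full" short dimension is fixed at 4, and the packed micropanel is assumed to store its short dimension contiguously with columns ldp elements apart. The function name sketch_dpackm_4xk is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t sk_dim_t;   /* stand-in for dim_t */
typedef ptrdiff_t sk_inc_t;   /* stand-in for inc_t */

enum { SKETCH_MR = 4 };       /* "full" short dimension, known to the kernel */

/* Packs a cdim x n block of A, scaled by *kappa, into micropanel p and
   zero-fills the rest of the SKETCH_MR x n_max region. */
void sketch_dpackm_4xk(sk_dim_t cdim, sk_dim_t n, sk_dim_t n_max,
                       const double *kappa,
                       const double *a, sk_inc_t inca, sk_inc_t lda,
                       double *p, sk_inc_t ldp)
{
    assert(cdim <= SKETCH_MR && n <= n_max && ldp >= SKETCH_MR);

    for (sk_dim_t j = 0; j < n_max; ++j) {
        for (sk_dim_t i = 0; i < SKETCH_MR; ++i) {
            /* Only elements inside the cdim x n block are read from A;
               everything else in the micropanel becomes zero. */
            int inside = (i < cdim) && (j < n);
            p[i + j * ldp] = inside ? (*kappa) * a[i * inca + j * lda] : 0.0;
        }
    }
}
```

For example, packing a 3 x 2 block with n_max = 3 fills the fourth row of the first two packed columns, and the entire third packed column, with zeros.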

commit 02ec0be3ba0b0d6b4186386ae140906a96de919b
Merge: e275def3 c534da62
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 5 19:33:53 2018 -0600

Merge branch 'master' into amd

commit c534da62c0015f91391983da5376c9e091378010
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 5 15:51:05 2018 -0600

Disabled ARM configuration families in registry.

Details:
- Disabled (commented out) the arm32 and arm64 configuration families
in the config_registry file. Having a configuration family registered
only makes sense if BLIS is currently outfitted with runtime hardware
detection logic to choose the appropriate sub-configuration. That
logic is currently missing for ARM architectures, and thus having the
ARM configuration families in the configuration registry only serves
to confuse people. Thanks to Devangi Parikh for suggesting this
change.

commit 6885051a164628904fad0d8a3b39c82f9a7b193c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 5 14:45:39 2018 -0600

Generalizations/cleanup to mixeddt matlab scripts.

Details:
- Parameterized, reorganized, and added comments to matlab scripts in
test/mixeddt/matlab.
- Reordered some lines of code and added comments to plot_l3_perf.m in
test/3m4m/matlab.

commit cbdb0566bf3201a495bbdcb8cb50342fa0098649
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Dec 5 20:06:32 2018 +0000

Updates to 3m4m, mixeddt test driver files.

Details:
- Updated 3m4m and mixeddt Makefiles and runme.sh scripts, mostly to
port recent changes to the former to the latter.
- Disabled (for now) code in 3m4m/test_*.c files that disables all
induced methods except for the one that is requested from the
Makefile via the IND macro. This is done because usually, we want to
test whatever method is enabled automatically for complex datatypes.
(That is, when native complex microkernels are missing, we usually
want to test performance of 1m.)

commit 0645f239fbdf37ee9d2096ee3bb0e76b3302cfff
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 4 14:31:06 2018 -0600

Remove UT-Austin from copyright headers' clause 3.

Details:
- Removed explicit reference to The University of Texas at Austin in the
third clause of the license comment blocks of all relevant files and
replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
with format of all other comment blocks.

commit 9b688a2d69dd420f4d2582827c5ac87e422cd3bc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 4 13:30:25 2018 -0600

Refer to color mm algorithm in Multithreading.md.

commit 22384fd2b749aa8cfdfad1084ce5e7dbd4ad2d64
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Dec 4 13:09:04 2018 -0600

Minor updates to test_gemm.c in test/mixeddt.

commit 2ba3b1780cbca58e43a3948d67bd07e637036125
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 3 19:40:39 2018 -0600

Removed symbols from libblis-symbols.def.

Details:
- Removed bli_gemm_md_front() and bli_gemm_md_zgemm() symbols from
build/libblis-symbols.def, which will hopefully appease AppVeyor.

commit dcb38c4e59c3395c258799e69bfe2104c578c528
Merge: dc184095 375eb30b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 3 18:06:19 2018 -0600

Merge branch 'dev'

commit 375eb30b0a63ac06a363a5f75f283584258db48b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Dec 3 17:49:52 2018 -0600

Added mixed-precision support to 1m method.

Details:
- Lifted the constraint that 1m only be used when all operands' storage
datatypes (along with the computation datatype) are equal. Now, 1m may
be used as long as all operands are stored in the complex domain. This
change largely consisted of adding the ability to pack to 1e and 1r
formats from one precision to another. It also required adding logic
for handling complex values of alpha to bli_packm_blk_var1_md()
(similar to the logic in bli_packm_blk_var1()).
- Fixed a bug in several virtual microkernels (bli_gemm_md_c2r_ref.c,
bli_gemm1m_ref.c, and bli_gemmtrsm1m_ref.c) that resulted in the wrong
ukernel output preference field being read. Previously, the preference
for the native complex ukernel was being read instead of the pref for
the native real domain ukernel. This bug would not manifest if the
preference for the native complex ukernel happened to be equal to that
of the native real ukernel.
- Added support for testing mixed-precision 1m execution via the gemm
module of the testsuite.
- Tweaked/simplified bli_gemm_front() and bli_gemm_md.c so that pack
schemas are always read from the context, rather than trying to
sometimes embed them directly to the A and B objects. (They are still
embedded, but now uniformly only after reading the schemas from the
context.)
- Redefined cpp macro bli_l3_ind_recast_1m_params() as a static function
and renamed to bli_gemm_ind_recast_1m_params() (since gemm is the only
consumer).
- Added 1m optimization logic (via bli_gemm_ind_recast_1m_params()) to
bli_gemm_ker_var2_md().
- Added explicit handling for beta == 1 and beta == 0 in the reference
gemm1m virtual microkernel in ref_kernels/ind/bli_gemm1m_ref.c.
- Rewrote various level-0 macro defs, including axpyris, axpbyris,
scal2ris, and xpbyris (and their conjugating counterparts) to
explicitly support three operand types and updated invocations to
xpbyris in bli_gemmtrsm1m_ref.c.
- Query and use the storage datatype of the packed object instead of the
storage datatype of the source object in bli_packm_blk_var1().
- Relocated and renamed frame/ind/misc/bli_l3_ind_opt.h to
frame/3/gemm/ind/bli_gemm_ind_opt.h.
- Various whitespace/comment updates.

commit e275def30ac41cadce296560fa67282704f20a02
Merge: 8091998b dc184095
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 30 15:39:50 2018 -0600

Merge branch 'master' into amd

commit dc18409551f341125169fe8d4d43ac45e81bdf28
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 28 11:58:40 2018 -0600

CREDITS file update.

commit ee4d2712963816f84d7e3fdd39d93424e1aaf63d
Merge: e81c4b56 3d7e8bc3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 28 11:52:57 2018 -0600

Merge pull request 287 from SuperFluffy/fix_configuration_links

Fix configuration links

commit 3d7e8bc3b8e77693152138e75676f71573e5e6cd
Author: Richard Janis Goldschmidt <janis.beckertgmail.com>
Date: Wed Nov 28 15:56:37 2018 +0100

Fix configuration links

commit 6a4885f8be9ecd81423ebf2eb6da75d7981c979b
Merge: 1d8aae22 e81c4b56
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 27 13:22:59 2018 -0600

Merge branch 'master' into dev

commit e81c4b56660b25a39f8fdc09fbe07459c5bd8e8e
Merge: 757043ea cfbdb58d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 21 17:00:49 2018 -0600

Merge pull request 285 from isuruf/pthread

Move LDFLAGS to the end

commit cfbdb58de2e44f2e3a3d8b14fceece7aef4b3006
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 14:23:39 2018 -0600

Move LDFLAGS to the end

Otherwise the linker will drop flags like -lpthread

commit 757043eae8630c0a76e9bb04f2cb0bd72439a86a
Merge: e769bf46 7af8fa01
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 21 13:07:26 2018 -0600

Merge pull request 283 from isuruf/patch-3

Fix MinGW and Cygwin build failures

commit 7af8fa01373b7bb30fa3b1fd110fd201c87ea225
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 02:10:05 2018 -0600

Fix blis dll path

commit 2acd8dcd23805203a6821358c5e3e09d521fecdf
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 02:02:18 2018 -0600

Fix install path of dll.a

commit b7b0ad22b151e89e2a6c7782cf4d8d47b4e60734
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 01:54:44 2018 -0600

Test mingw

commit bafe521ed0012b7b8814404b78a6c576d8386370
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 01:54:36 2018 -0600

Fixes for mingw

commit be831879bd03edcddff8a345161f749ad92215af
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 01:39:32 2018 -0600

test gcc shared

commit f6b924648c79c4b1c3d3c7fbf85372680aff8362
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 01:39:19 2018 -0600

Don't use .def for gcc

commit ce6e4eae6d5e977e6f699acc9cf239be8ac53771
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 01:34:56 2018 -0600

test no threading

commit c9169b4685bfe81bc562cf9128b35a6a9884799b
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 01:17:36 2018 -0600

Add mingw64 path

commit 0f753090eaf4264b743a49ce15de97514bcbe112
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 01:14:52 2018 -0600

Fix PATH

commit d424470b1f2fa8717fa54c0245b21341504665f6
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 01:04:26 2018 -0600

Check openmp and pthreads threading

commit c73e7601e58239e2dedec6c9f1b752e949254a42
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 00:50:33 2018 -0600

Revert "enable rdp"

This reverts commit 368274bcbd0c9232521d14fa28304f35ced0e6d7.

commit 6209b2e6060b89e65f3405c31333af8952dd63c0
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 00:50:22 2018 -0600

Remove conda

commit 0b1b344447b8a2fcd635a48f0ce7ce89b2107dc4
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 00:42:39 2018 -0600

Fix make name

commit 7a9838983ba8dd32ac9f87712255721542ff561f
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 00:35:27 2018 -0600

Use m2w64-make

commit 4c1dedd6a90087807f16353a5d0bcaaade35a7a5
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 00:28:20 2018 -0600

No activate on gcc

commit 368274bcbd0c9232521d14fa28304f35ced0e6d7
Author: Isuru Fernando <isurufgmail.com>
Date: Tue Nov 20 23:40:26 2018 -0600

enable rdp

commit 707a5e7f9b07f554e1e9289dd0ce3b7dc4fded6e
Author: Isuru Fernando <isurufgmail.com>
Date: Tue Nov 20 23:39:31 2018 -0600

No conda for mingw build

commit 65b0565c0ad9162d4474bd84eabde491fa971538
Author: Isuru Fernando <isurufgmail.com>
Date: Tue Nov 20 23:19:38 2018 -0600

Check MinGW-w64

commit 9ddffba5847080e0d77d9e6059d05dc4b1d89ba5
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Nov 21 00:23:34 2018 -0600

Fix MinGW build failure

Fixes https://github.com/flame/blis/issues/278

commit 1d8aae220bc52ce8e3a8afaa64b57e5d83480bdc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 20 18:42:07 2018 -0600

Track internal scalar datatypes.

Details:
- Added a num_t datatype bitfield to the obj_t in the form of a new
info2 field in the obj_t. This change was made primarily so that in
the case of mixed-datatype gemm, the alpha scalar would not need to
be cast to the storage datatype of B (or A) before then being cast to
the computation datatype just before the macrokernel is called. This
double-casting regime could result in loss of precision if the storage
precision of B (or A) is lower than the computation precision. In
practice, it was likely not going to be a big deal since most usage of
alpha is for -1.0, 0.0, and 1.0 (or integer multiples thereof), which
can all be represented exactly in single or double precision.
- The type of objbits_t was changed to uint32_t, so the new format
potentially takes up the same space as the previous obj_t definition,
assuming no padding inserted by the compiler. Shrinking info to 32
bits and spilling over into a second field was chosen over using the
high 32 bits of a single 64-bit objbits_t info field because many of
the bitwise operations are performed with enums such as num_t, dom_t,
and prec_t, which may take on the type of 32-bit ints. It's easier to
just keep all of those bitwise operations in 32 bits than perform a
million typecasts throughout bli_type_defs.h and bli_obj_macro_defs.h
to ensure that the integers are treated as 64-bit for the purposes of
the ANDs, ORs, and bitshifts.
- Many comment updates.
- Thanks to Devin Matthews and Devangi Parikh for their feedback and
involvement during this commit cycle.
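
A minimal sketch of the double-casting concern described in the first item above, assuming single precision as the storage type and double precision as the computation type. The helper names are hypothetical and are not part of BLIS.

```c
#include <assert.h>
#include <stdbool.h>

/* Cast alpha through a single-precision "storage" type before widening
   it back to the double-precision "computation" type. */
static double cast_via_float(double alpha)
{
    float stored = (float)alpha;   /* rounding may occur here */
    return (double)stored;         /* widening is exact, but cannot undo it */
}

/* True if the round trip through float preserves alpha exactly. */
static bool survives_double_cast(double alpha)
{
    return cast_via_float(alpha) == alpha;
}

/* The scalar values named in the commit message are exactly
   representable in single precision, so they survive the round trip. */
static void check_common_scalars(void)
{
    assert(survives_double_cast(-1.0));
    assert(survives_double_cast(0.0));
    assert(survives_double_cast(1.0));
}
```

A value such as 0.1 has no exact single-precision representation, so `survives_double_cast(0.1)` is false. That is the kind of loss that tracking the scalar's own datatype avoids.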

commit e769bf46b0931d68031af212110484ec98e16908
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 20 16:16:53 2018 -0600

Tweak testsuite to issue FAIL for Nan, Inf (279).

Details:
- Adjusted the definition for libblis_test_get_string_for_result() in
testsuite/src/test_libblis.c so that the "FAIL" string is returned if
the computed residual contains either NaN or Inf. Previously, a
residual containing NaN would result in the selection of the "PASS"
string. Thanks to Devin Matthews for reporting this issue (279).
- Expounded on comment for the macro definitions of bli_isnan() and
bli_isinf() in bli_misc_macro_defs.h to make it more obvious why they
must remain macros.
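
A small sketch of how a NaN residual can slip past a threshold test, assuming the check is a plain ordered comparison (the actual testsuite logic may differ). The function names are hypothetical, and the standard `isnan()`/`isinf()` macros stand in for `bli_isnan()`/`bli_isinf()`.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Naive classification: every ordered comparison involving NaN is
   false, so a NaN residual falls through to "PASS". */
static const char *result_naive(double resid, double thresh_fail)
{
    assert(thresh_fail >= 0.0);
    if (resid > thresh_fail) return "FAIL";
    return "PASS";
}

/* Checked classification: NaN and Inf are rejected before the
   threshold comparison is made. */
static const char *result_checked(double resid, double thresh_fail)
{
    assert(thresh_fail >= 0.0);
    if (isnan(resid) || isinf(resid)) return "FAIL";
    if (resid > thresh_fail) return "FAIL";
    return "PASS";
}

/* Convenience predicate for comparing result strings. */
static int is_pass(const char *s)
{
    return strcmp(s, "PASS") == 0;
}
```

With these definitions, `result_naive(NAN, 1e-10)` yields "PASS" while `result_checked(NAN, 1e-10)` yields "FAIL".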

commit 279deae18fb8b8106161863b46fcb38232314de4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Nov 16 11:34:19 2018 -0600

Added 4x5 matlab plotting scripts to test/3m4m.

Details:
- Added a new directory, test/3m4m/matlab, containing matlab scripts for
plotting 4x5 panels of performance graphs (using the subplot()
function) for gemm, hemm, herk, trmm, and trsm across all four
floating-point datatypes. I expect to further refine these scripts as
time goes on, but their current state constitutes a good start.

commit 7b02c726650336c12286c8ba166d1d0fdf7601a8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 14 13:49:55 2018 -0600

CREDITS file update.

commit 84dd298a27033945fa2d3b6e5dce1fe625cd2a0a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 14 13:47:45 2018 -0600

Patch to fix msys2/Windows build failure (277).

Details:
- Expanded cpp guard in frame/include/bli_x86_asm_macros.h to also check
__MINGW32__ in addition to _WIN32, __clang__, and __MIC__. Thanks to
Isuru Fernando for suggesting this fix, and also to Costas Yamin for
originally reporting the issue (277).

commit 8091998b6500e343c2024561c2b1aa73c3bafb0b
Merge: 333d8562 7b5ba731
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 14 12:36:35 2018 -0600

Merge branch 'master' into amd

commit 7b5ba7319b3901ad0e6c6b4fa3c1d96b579efbe9
Merge: ce719f81 52392932
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Nov 14 12:32:01 2018 -0600

Merge branch 'dev' of github.com:flame/blis into dev

commit 52392932dc1ea3c16220cc4e6978efcb2f5f0616
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 13 22:23:38 2018 +0000

Minor fixes to test/3m4m drivers.

Details:
- Cleanups to Makefile to allow all test drivers to be built for
OpenBLAS and MKL in addition to BLIS.
- Fixed copy-paste typos in test_hemm in calls to ssymm_() and dsymm_().
- Fixed incorrect types for betap in BLAS cpp macro branch of
test_herk.c.

commit 4f12e36a0d0e6df146314b4e50e36c5e7a1af3d3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 13 14:23:12 2018 -0600

Fixed number of columns in first output line.

Details:
- In previous commit, forgot to remove output column corresponding to
the k dimension.

commit a2e0cdd7debf8109198536d55af05d5631072fb2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 13 14:15:11 2018 -0600

Added hemm test driver to test/3m4m.

Details:
- Added a new test_hemm.c test driver to test/3m4m, which was modeled
after the similarly named driver in test. Also updated Makefile
so that blis-nat-[sm]t would trigger builds for the new driver.

commit 0f9b53e84b48d8d73a56cc9889eae3595ca58a78
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Nov 13 13:03:15 2018 -0600

Fixed a bug in high-level mixeddt conditional.

Details:
- Fixed a bug in frame/3/bli_l3_oapi.c in the conditional that divides
use of induced method (1m) execution from native execution. The former
was intended to only be used in cases where all storage datatypes are
complex and the datatype of C is equal to the computation datatype.
(If mixed datatypes are detected, native execution would be used.)
However, the code in bli_gemm() was erroneously checking the execution
datatype instead of the computation datatype, which at that point is
guaranteed to be equal to the storage datatype even if the computation
datatype contains a different value. Thanks to Devangi Parikh for
helping in isolating this bug.

commit 333d8562f04eea0676139a10cb80a97f107b45b0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Nov 11 14:28:53 2018 -0600

Added debug output to bli_malloc.c.

Details:
- Added debug output to bli_malloc.c in order to debug certain kinds of
memory behavior in BLIS. The printf() statements are disabled and must
be enabled manually.
- Whitespace/comment updates in bli_membrk.c.

commit ce719f816d1237f5277527d7f61123e77180be54
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 10 14:48:43 2018 -0600

More edits to mixeddt matlab scripts.

Details:
- Renamed scripts in test/mixeddt/matlab:
plot_case_all.m -> plot_dom_all.m
plot_case_md.m -> plot_dom_case.m
plot_all_md.m -> plot_dt_all.m
- Added plot_dt_select.m in order to plot select graphs for the main
body of the mixeddt paper, and added additional related legend
handling in plot_gemm_perf.m.
- Added test/mixeddt/matlab/output and a .gitkeep file within in order
to force git to recognize the directory.

commit bf99e7c14baf45725b698d06ad043b531e3a2763
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Nov 8 18:47:17 2018 -0600

Minor updates to test/mixeddt driver.

Details:
- Cleaned up test/mixeddt Makefile in preparation for gathering new
data for mixeddt paper, including renaming implementations to
"internal" and "ad-hoc" to match the terminology to be used in the
paper.
- Added new matlab scripts for generating 8 figures, each covering all
mixed-precision cases for each mixed-domain case.
- Updated the runme.sh script according to changes to Makefile.
- Fixed a minor bug in test_gemm.c that may have given incorrect
performance in complex, homogeneous storage datatype cases where
the computation precision was equal to the storage precisions.
(Examples: zzzd, cccs.)

commit 4bbb454bf3c361af9e97bfa394a73d610cd9002a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 3 19:11:01 2018 -0500

Testsuite docs update for mixed-datatype gemm.

Details:
- Updated docs/Testsuite.md to include mention of the new mixed-domain
and mixed-precision settings, including descriptions.
- Updated docs/MixedDatatypes.md to include a brief section on running
the testsuite to exercise mixed-datatype functionality, which mostly
amounts to a link to the Testsuite.md document.
- Minor verbiage change to testsuite output to correct a misleading
label associated with the value returned by the query function
bli_info_get_simd_num_registers(). (The function does not return the
number of SIMD registers present in the hardware, but rather a maximum
assumed value for the purposes of allocating temporary microtile
workspace on the function stack.)

commit 16401ae922b1285437cf5f6867b2764650a95fb0
Merge: f19c33af 2d403a15
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Nov 3 19:09:43 2018 -0500

Merge branch 'dev'

commit 2d403a1535380a2ebe2ae2c0f5ac54ba7564fbeb
Merge: e90e7f30 4a12979f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Nov 1 20:18:53 2018 -0500

Merge pull request 275 from RhysU/patch-1

Spelling in FAQ

commit 4a12979f65697ed79ba290efd59f4b994ac9429b
Author: Rhys Ulerich <rhys.ulerichgmail.com>
Date: Thu Nov 1 20:20:59 2018 -0400

Spelling in FAQ

commit f19c33af4cbe6f5705b96fbf2b8799c3c2bd75c3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 26 17:07:15 2018 -0500

Disallow 64b BLAS integers + 32b BLIS integers.

Details:
- Print an error message from configure if the user attempts to
explicitly configure BLIS for simultaneous use of 64-bit integers in
the BLAS API with 32-bit integers in the BLIS API.
- Added cpp macro conditional to bli_type_defs.h to mandate that BLIS
integers be 64 bits if the BLAS integers are 64 bits. This and the
above item take care of issue 274. Thanks to Devin Matthews and
Jeff Hammond for suggesting these safeguards.
- Slight reorganization and relabeling (for clarity) of BLAS/CBLAS
sections and BLIS integer size line of the testsuite configuration
output.
- Very minor edits to docs/MixedDatatypes.md.
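
A sketch of the kind of safeguard described in the first two items above. The `DEMO_*` macro names and the helper function are hypothetical; the names used in bli_type_defs.h and configure may differ.

```c
#include <assert.h>

/* Hypothetical configuration macros standing in for the integer sizes
   chosen at configure-time. */
#ifndef DEMO_BLAS_INT_SIZE
#define DEMO_BLAS_INT_SIZE 64
#endif
#ifndef DEMO_BLIS_INT_SIZE
#define DEMO_BLIS_INT_SIZE 64
#endif

/* Compile-time guard: refuse 64-bit BLAS integers unless the BLIS
   integers are also 64 bits. */
#if DEMO_BLAS_INT_SIZE == 64 && DEMO_BLIS_INT_SIZE != 64
#error "64-bit BLAS integers require 64-bit BLIS integers."
#endif

/* Runtime mirror of the same rule, e.g. for a configure-style check.
   Sizes are given in bits and must be 32 or 64. */
static int int_sizes_allowed(int blas_bits, int blis_bits)
{
    assert(blas_bits == 32 || blas_bits == 64);
    assert(blis_bits == 32 || blis_bits == 64);
    return !(blas_bits == 64 && blis_bits == 32);
}
```

Of the four size combinations, only 64-bit BLAS integers with 32-bit BLIS integers is rejected.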

commit e90e7f309b3f2760a01e8e09a29bf702754fa2b5 (origin/win-pthreads)
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 25 14:09:43 2018 -0500

CHANGELOG update (0.5.0)

0.5.0

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 25 14:09:40 2018 -0500

Version file update (0.5.0)

commit 75da7f2a208ad7d26ed9c6d3e10d08b2a1caf9d6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 25 14:02:41 2018 -0500

ReleaseNotes.md update in advance of next version.

Details:
- Updated ReleaseNotes.md in preparation for next version.
- Updated docs/FAQ.md to reflect recent developments, and other edits.
- Minor updates to RELEASING.

commit 6fbc456fb3f4401ec951a618990f15a84fdfa236
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 25 13:20:25 2018 -0500

Added SALT testing to Travis CI.

Details:
- Modified .travis.yml to automatically employ the simulation of
application-level threading within the testsuite, with supporting
changes to common.mk, the top-level Makefile, and
travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
'.salt' suffix (similar to those with the '.fast' suffix) for
testing application-level threading.
- Updated docs/BuildSystem.md to document the new make targets
'testblis-salt' and 'checkblis-salt'.

commit 0e27963a6770e6b64f3299ad0613d5df45d8b6ae
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 24 12:16:19 2018 -0500

Add bli_pthread_mutex_trylock().

Details:
- Added the missing bli_pthread_mutex_trylock() function and prototype
to the non-Windows sections of bli_pthread.c and .h. This function
isn't needed by BLIS, but I figured why not make the Windows and
non-Windows sections consistent with one another.

commit 4b683740c12f83804a51ec610b16ce28607d5c85
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 24 11:56:16 2018 -0500

Defined bli_pthread_cond_*() and related defs.

Details:
- Added function definitions for bli_pthread_cond_*() as well as related
types and constants to bli_pthread.c, and corresponding prototypes to
bli_pthread.h.

commit 4b4f8072b9bb495b3e01d45698b0bad3dac31ba8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 24 11:31:46 2018 -0500

Define bli_pthreads barrier types on OS X.

Details:
- Fully define bli_pthreads barrier-related types on OS X. Only typedef
those types in terms of pthreads types on non-Windows, non-Apple OSes
(i.e. Linux).

commit ad98790dcef6bd9aab7f13d615b987b5daa58757
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 23 20:35:05 2018 -0500

Fix names of Windows pthread initializer macros.

Details:
- Renamed the PTHREAD_ initializer macros in the Windows cpp case to use
BLIS_ prefixes to match their non-Windows counterparts.

commit 06c23954e6b17219a50c3d37821544a46defaf89
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 23 19:16:54 2018 -0500

Defined unified bli_pthreads_*() API for all OSes.

Details:
- Expanded the bli_pthread_*() -> pthread_*() wrappers in
frame/thread/bli_pthread.c to include cases for Windows taken from
frame/base/bli_pthread_wrap.c. Now, bli_pthread_*() is always defined
and always used by BLIS and the BLIS testsuite (in lieu of calling
pthreads directly, as before). The implementation used in this new
API depends on whether we are building for Windows, and to a lesser
extent, whether we are building on OS X. For the core API, Windows
uses Windows threads, non-Windows (Linux, OS X) uses pthreads.
OS X and Windows get barriers implemented in terms of other
bli_pthread_*() functions, and Linux gets barriers implemented in
terms of pthread_barrier*(). This commit addresses issue 273.
- Fixed a bug in the Linux definition of bli_pthread_mutex_unlock(),
which was erroneously calling pthread_mutex_lock().
- Minor changes to configure so that the auto-detection executable
can be built given the above changes (most notably, turning on
POSIX extensions via -D_GNU_SOURCE).
- Removed temporary play-test code for shiftd that accidentally got
committed into test/3m4m/test_gemm.c.

commit 0ae9585da1e3db1cf8034d4b16305a5883beb0d3
Author: pradeeptrgit <pradeep.raoamd.com>
Date: Tue Oct 23 09:36:23 2018 +0530

Update version number to 1.2

Change-Id: Ibb31f6683cdecca6b218bc2f0c14701d7e92ebf3

commit eac7d267a017d646a2c5b4fa565f4637ebfd9da7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 22 18:10:59 2018 -0500

Unconditionally define bli_l3_thread_entry().

Details:
- Define a dummy bli_l3_thread_entry() function when multithreading is
disabled altogether, or enabled via OpenMP. This function was
originally needed only when multithreading was enabled via pthreads.
By defining the function no matter the threading options given, it is
less likely that an AppVeyor Windows build will complain due to a
missing symbol in the DLL. (To be clear: AppVeyor was working fine
before, but a problem may have arisen if it were switched to an
OpenMP build.)
- Removed the prototype for bli_l3_thread_entry() from
bli_thrcomm_pthreads.c and placed it in bli_thrcomm.h.
- Regenerated the symbols list file build/libblis-symbols.def.

commit 4ee986f0a74207f4ca29df077929134725d62b80
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 22 14:09:44 2018 -0500

Added mixed-datatype testing to Travis CI (271).

Details:
- Modified .travis.yml to automatically test the mixed-datatype support
of the gemm operation, with supporting changes to common.mk, the
top-level Makefile, and travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
'.mixed' suffix (similar to those with the '.fast' suffix) for testing
mixed-datatype gemm.
- Updated docs/BuildSystem.md to document the new make targets
'testblis-md' and 'checkblis-md'.

commit c3c6ebc9c6244053d654a9b0c955acb2fef42ee8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Oct 21 18:48:54 2018 -0500

Fixed thrinfo_t printing for small problems.

Details:
- Fixed a bug in the code that prints out the communicator and work ids
from the various threads' thrinfo_t nodes. This bug manifested when
the dimension being parallelized was not large enough for every
thread to be assigned actual work (since the minimum amount of work is
determined by the register blocksize in the dimension being
parallelized). In those cases, the threads that receive no work in
that dimension do not finish building their thrinfo_t tree, leaving
lower-level nodes non-existent. (The bug itself was usually observed as
a segfault when the printing code attempted to dereference all the way
down the thrinfo_t tree.) The solution involves explicitly checking
each node as it is dereferenced, and if at any time NULL is found, all
subsequent communicator and work ids are set to -1.
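
The NULL-checked traversal described above can be sketched as follows. The `demo_node` type is a hypothetical stand-in for a thrinfo_t node, and the fixed depth of four levels is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_LEVELS 4

/* Hypothetical stand-in for one node of a per-thread info tree. */
typedef struct demo_node
{
    int               comm_id;
    int               work_id;
    struct demo_node *sub;   /* next lower level, or NULL if never built */
} demo_node;

/* Collect the ids at each level. Each node is checked before it is
   dereferenced; once a NULL is found, all remaining levels get -1. */
static void collect_ids(const demo_node *root,
                        int comm_ids[DEMO_MAX_LEVELS],
                        int work_ids[DEMO_MAX_LEVELS])
{
    assert(comm_ids != NULL && work_ids != NULL);

    const demo_node *cur = root;
    for (int i = 0; i < DEMO_MAX_LEVELS; ++i)
    {
        if (cur != NULL)
        {
            comm_ids[i] = cur->comm_id;
            work_ids[i] = cur->work_id;
            cur = cur->sub;
        }
        else
        {
            comm_ids[i] = -1;
            work_ids[i] = -1;
        }
    }
}
```

A thread whose tree stops after two levels reports real ids for those two levels and -1 for the rest, rather than dereferencing a missing node.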

commit 73a222c0d99dcc221be7dea10eaebf844f31f72e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Oct 20 14:13:04 2018 -0500

Minor edits to 'configure --help' text.

commit 14f3d5e6df183819a0c393b2661ad15df0786544
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 20:39:35 2018 -0500

Refresh libblis-symbols.def post-merge 090e4f0.

commit 090e4f08fc2f429a1b2db77b0a6f8276f892a7ac
Merge: c9be5889 0854e880
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 18:41:10 2018 -0500

Merge branch 'master' into dev

commit 0854e880b0848e0c2e3d0644c93c80b0fd13c0dc
Merge: 4e38a8d4 343a2715
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 18:05:00 2018 -0500

Merge pull request 261 from flame/win-pthreads

Implement missing pthreads function on Windows

commit c9be5889fbe947c64ef75740662e4d63032f4c35
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 17:42:40 2018 -0500

Added "Known issues" section to Multithreading.md.

Details:
- Added known issues section to Multithreading.md.
- Trivial changes to MixedDatatypes.md, Sandboxes.md.

commit 343a2715ebee28d250ee41b914abdcd1dc77c344
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 16:59:19 2018 -0500

Whitespace changes to configure, bli_pthread_wrap.

Details:
- Mostly whitespace changes (spaces to tabs) to configure and
bli_pthread_wrap.c and .h.

commit 3678a1cd518df9447b4b1ea86885eb2ba8abcf6e
Merge: 85397cd4 4e38a8d4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 16:11:31 2018 -0500

Merge branch 'master' into win-pthreads

commit 4e38a8d4eebb18ead74e644fac76a4fde8e7f6c6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 15:54:15 2018 -0500

Implemented python version checking in configure.

Details:
- Added python version checking to configure script. (Recall that python
is needed to execute the flatten-headers.py script.) Minimum versions
of python needed are currently as follows:
python2: 2.7 or later
python3: 3.5 or later
The standard search order for python interpreters is:
python python3 python2
The PYTHON environment variable is also supported and will be checked
before the standard search order list.
- Updated BuildSystem.md to include: a minimum make version; mention
that the C compiler must actually be a C99 compiler; and the caveat
that Windows builds do not require pthreads since BLIS can provide
an implementation of pthreads internally.

commit 85397cd4fa52f6c4c33f4fb715478c55533c680e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 13:12:43 2018 -0500

Added explanatory comment to bli_pthread.c.

Details:
- Added a verbose comment to bli_pthread.c that explains why a bli_
wrapper to pthreads APIs is useful.

commit 53c07035ef61cc9b8469636d4d8fa5085f37652d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Oct 19 12:53:03 2018 -0500

Refresh libblis-symbols.def from bb6df28.

Details:
- Forgot to regenerate the symbols file after the previous commit
(bb6df281) in which shiftd operation was introduced.

commit 473ce54f5fbea4860ac0514e7e8b022c1ea03e63
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 18 19:03:56 2018 -0500

Added bli_pthread_*() API.

Details:
- Defined a bli_pthread_*() API so that the testsuite, when being linked
against a Windows DLL, will be able to access pthreads functionality
without those pthreads functions being explicitly exported by the DLL.
Instead, we export the bli_pthread_*() layer, which uses types and
functions that are identical to pthreads, but adds a 'bli_' prefix.
Only a few basic functions are present in the bli_pthreads_*() API
for now. Thanks to Devin Matthews and Isuru Fernando for their help
on a related PR (261) that this commit will hopefully facilitate.
- Updated testsuite so that it calls bli_pthread_*() layer instead of
pthread_*() functions directly.
- Regenerated build/libblis-symbols.def.
- Updated comment in build/regen-symbols.sh.

commit bb6df2814fcaa2fa62a549379f61be2f8667a598
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 18 17:11:39 2018 -0500

Defined a new level-1d operation: shiftd.

Details:
- Defined a new level-1d operation called 'shiftd', including object and
typed APIs. This operation adds a scalar value to every element along
an arbitrary diagonal of a matrix. Currently, shiftd is implemented in
terms of the addv kernel. (The scalar is passed in as the x vector
with an increment of zero.)
- Replaced ad-hoc usage of setd and addd (after creating a temporary
matrix object) with use of shiftd, which is much more concise, in
various test driver files in the testsuite. Similar changes were made
to the standalone test drivers and the example code.
- Added documentation entries in BLISObjectAPI.md and BLISTypedAPI.md
for bli_shiftd() and bli_?shiftd(), respectively.
- Added observed object properties to level-1d documentation in
BLISObjectAPI.md.
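
A rough sketch of the idea in the first item above: a diagonal shift expressed as a vector add whose x operand is a single scalar read with an increment of zero. The `demo_*` functions are hypothetical and assume a row-major matrix; they are not the BLIS implementation or its API.

```c
#include <assert.h>
#include <stddef.h>

/* Generic y := y + x over n elements with strides incx and incy. */
static void demo_addv(size_t n, const double *x, ptrdiff_t incx,
                      double *y, ptrdiff_t incy)
{
    for (size_t i = 0; i < n; ++i)
        y[(ptrdiff_t)i * incy] += x[(ptrdiff_t)i * incx];
}

/* Add alpha to every element along diagonal `diagoff` of an m-by-n
   row-major matrix with leading dimension lda. In this sketch,
   diagoff > 0 selects a superdiagonal and diagoff < 0 a subdiagonal. */
static void demo_shiftd(ptrdiff_t diagoff, size_t m, size_t n,
                        double alpha, double *a, size_t lda)
{
    assert(a != NULL);

    size_t row0 = diagoff < 0 ? (size_t)(-diagoff) : 0;
    size_t col0 = diagoff > 0 ? (size_t)diagoff : 0;
    if (row0 >= m || col0 >= n) return;   /* diagonal lies outside the matrix */

    size_t len = (m - row0 < n - col0) ? (m - row0) : (n - col0);

    /* The scalar acts as the x vector with an increment of zero; moving
       one step along a diagonal is a stride of lda + 1. */
    demo_addv(len, &alpha, 0, a + row0 * lda + col0, (ptrdiff_t)(lda + 1));
}
```

Reusing the vector-add loop this way means no temporary matrix is needed to hold the shift.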

commit 53e0a0c9b38e8525c7224e280342ef56328af567
Merge: 1c7247b6 ec676799
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 18 14:54:59 2018 -0500

Merge branch 'master' into win-pthreads

commit ec67679990660a60362a49406595383672812287
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 18 14:27:02 2018 -0500

Refreshed Windows symbol list; added regen script.

Details:
- Moved windows/build/libblis-symbols.def to build/libblis-symbols.def.
Updated link commands in common.mk accordingly.
- Added a new script build/regen-symbols.sh that will regenerate the
libblis-symbols.def file in its new location after building a
haswell-targeted shared library. Thanks to Isuru Fernando for
providing the symbol generation command.
- Ran the new script to refresh the symbols file.

commit fdad54ab8eee4a7efd04ec4afb3e6902eb22e60a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 18 12:43:22 2018 -0500

Removed old symbol from libblis-symbols.def.

Details:
- Removed bli_gemm_ker_var1() from windows/build/libblis-symbols.def
since this function is no longer compiled.

commit 49d3f9fcbb4a75553439f97c099ea48d85763eea
Merge: 779d64dc 3c527256
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 17 18:00:40 2018 -0500

Merge branch 'master' into dev

commit 3c52725693d0d7726e1c8fb224f9b1ef786db8b9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 17 14:56:22 2018 -0500

Renamed/moved l3 zen ukernels to haswell kernel set.

Details:
- Renamed the microkernels in kernels/zen/3 to kernels/haswell/3 and
then updated the file contents to use the 'haswell' infix.
- Updated bli_cntx_init_zen.c and bli_cntx_init_haswell.c according to
above function renames.
- Moved/updated the corresponding prototypes in bli_kernels_zen.h to
bli_kernels_haswell.h.
- Updated config_registry according to above changes.
- NOTE: This rename reflects the fact that haswell microkernels are
specifically written to overcome the floating-point latency for FMA
instructions on Intel Haswell-like architectures, which can issue two
FMA instructions per cycle. These ukernels happen to work fine on AMD
Zen-based architectures. However, Zen only issues one FMA per cycle,
which, while halving its floating-point throughput, gives it extra
flexibility in the design of its microkernels--namely, mr and nr can
be smaller and still overcome the floating-point latency for those
single-issue cores. A smaller value of mr and nr allows for a larger
value of kc, which may be useful in some situations. In the future,
we may write such Zen-specific microkernels to take advantage of this
additional flexibility.

commit 71c5832d5f5596f25204980803423d08143a4010
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 17 14:11:01 2018 -0500

Consolidated slab/rr-explicit level-3 macrokernels.

Details:
- Consolidated the *sl.c and *rr.c level-3 macrokernels into a single
file per sl/rr pair, with those files named as they were before
c92762e. The consolidation does not take away the *option* of using
slab or round-robin assignment of micropanels to threads; it merely
*hides* the choice within the definitions of functions such as
bli_thread_range_jrir(), bli_packm_my_iter(), and bli_is_last_iter()
rather than expose that choice explicitly in the code. The choice of
slab or rr is not always hidden, however; there are some cases
involving herk and trmm, for example, that require some part of the
computation to use rr unconditionally. (The --thread-part-jrir option
controls the partitioning in all other cases.)
- Note: Originally, the sl and rr macrokernels were separated out for
clarity. However, aside from the additional binary code bloat, I later
deemed that clarity not worth the price of maintaining the additional
(mostly similar) codes.
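
To illustrate the two assignment schemes named above, here is a minimal sketch of slab (contiguous) and round-robin partitioning of micropanel iterations among threads. The function names and the way the remainder is distributed are assumptions; bli_thread_range_jrir() and related functions may behave differently.

```c
#include <assert.h>
#include <stddef.h>

/* Slab: thread tid (of nt) receives one contiguous range [start, end)
   of the n_iter iterations. Any remainder is spread, one iteration
   each, over the lowest-numbered threads. */
static void slab_range(size_t n_iter, size_t nt, size_t tid,
                       size_t *start, size_t *end)
{
    assert(nt > 0 && tid < nt);
    size_t base = n_iter / nt;
    size_t rem  = n_iter % nt;
    *start = tid * base + (tid < rem ? tid : rem);
    *end   = *start + base + (tid < rem ? 1 : 0);
}

/* Round-robin: thread tid owns iterations tid, tid + nt, tid + 2*nt, ... */
static int rr_owns(size_t iter, size_t nt, size_t tid)
{
    assert(nt > 0 && tid < nt);
    return iter % nt == tid;
}
```

Hiding the choice behind helpers like these lets a single macrokernel loop serve both schemes, which is the consolidation the commit describes.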

commit 57eab3a4f0e43099fc2ff189df9fcc0d7801c2cd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 17 11:29:20 2018 -0500

CREDITS file update.

commit 6722ec21817cbab9d86ee63f00984eb407b5e627
Author: Ye Luo <xw111luoyegmail.com>
Date: Wed Oct 17 11:26:00 2018 -0500

Fix bgclang compilation on BGQ (270)

* Fix bgq kernels

* Support bgq with bgclang

commit 1c7247b6d146fc728d7c4240e4e069e33f8f8868
Merge: c1bc5530 6c5a1aaf
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 16 14:44:32 2018 -0500

Merge branch 'win-pthreads' of github.com:flame/blis into win-pthreads

commit c1bc5530d51bf55b4aa3c35165f6d4452a0fd779
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 16 14:44:10 2018 -0500

Don't call pthread_once in auto-detect.

commit b9c61d03f542a2e92551ff0595415bec3076ab25
Merge: 5a1e461f 3612ecac
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 16 14:39:57 2018 -0500

Merge branch 'nested-omp-patch'

commit 5a1e461ffe09ed200ee2fc7aafccf6dd7e8c0080
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 16 14:21:45 2018 -0500

Execute flatten-headers.py via $(PYTHON).

Details:
- Execute build/flatten-headers.py python script via $(PYTHON) in
common.mk. This allows distributions that define the current/preferred
python interpreter in the PYTHON environment variable to use that
interpreter when executing flatten-headers.py. Thanks to Isuru
Fernando for this suggestion, and to Dave Love for submitting the
initial issue/request.

commit 6c5a1aaff540b19672e91501e894ed695aee322b
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 16 10:15:59 2018 -0500

Fix type in bli_pthread_wrap.c

commit 29e6245816760b1bd4ac738d7d3e11a9d9d13473
Merge: 0b73209f ed657714
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 16 10:12:25 2018 -0500

Merge branch 'master' into win-pthreads

commit 0b73209f6b22cc024169146d343627f6999b63d8
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 16 10:02:06 2018 -0500

Add missing argument to WaitForSingleObject and use $is_win in configure
to turn off pthreads.

commit ed65771482a705f7ed028d822489766327b44e76
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 15 17:54:45 2018 -0500

Fixed merge fail on testsuite threading macros.

Details:
- Applied the following C preprocessor macro renames

BLIS_DEFAULT_MR_THREAD_MAX -> BLIS_THREAD_MAX_IR
BLIS_DEFAULT_NR_THREAD_MAX -> BLIS_THREAD_MAX_JR
BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N

in src/test_libblis.c. This is apparently the result of a failure by
git to properly merge the 'master' and 'amd' branches in the previous
commit. (The 'master' branch contained a commit, 53a9ab1, in which
these same cpp macros were renamed throughout the source distribution.)

commit dc5fd898af8c74c2e2a75fc647157da0d04dd922
Merge: 667d3929 637c2ce7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 15 17:41:35 2018 -0500

Merge branch 'amd'

commit 779d64dc3091dea6b7530283304e52878151d218
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 15 17:13:18 2018 -0500

Added entry for xpbym to input.operations.fast.

Details:
- Forgot to add an entry for the new xpbym operation to
input.operations.fast in previous commit.

commit 5fec95b99f61761963834f62a9867f797687813c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 15 16:37:39 2018 -0500

Implemented mixed-datatype support for gemm.

Details:
- Implemented support for gemm where A, B, and C may have different
storage datatypes, as well as a computational precision (and implied
computation domain) that may be different from the storage precision
of either A or B. This results in 128 different combinations, all of
which are implemented within this commit. (For now, the mixed-datatype
functionality is only supported via the object API.) If desired, the
mixed-datatype support may be disabled at configure-time.
- Added a memory-intensive optimization to certain mixed-datatype cases
that requires a single m-by-n matrix be allocated (temporarily) per
call to gemm. This optimization aims to avoid the overhead involved in
repeatedly updating C with general stride, or updating C after a
typecast from the computation precision. This memory optimization may
be disabled at configure-time (provided that the mixed-datatype
support is enabled in the first place).
- Added support for testing mixed-datatype combinations to testsuite.
The user may test gemm with mixed domains, precisions, both, or
neither.
- Added a standalone test driver directory for building and running
mixed-datatype performance experiments.
- Defined a new variation of castm, castnzm, which operates like castm
except that imaginary values are not touched when casting a real
operand to a complex operand. (By contrast, in these situations castm
sets the imaginary components of the destination matrix to zero.)
- Defined bli_obj_imag_is_zero() and substituted calls in lieu of all
usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and
also simplified the implementation of bli_obj_imag_equals().
- Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex()
when given BLIS_CONSTANT objects.
- Disabled dt_on_output field in auxinfo_t structure as well as all
accessor functions. Also commented out all usage of accessor
functions within macrokernels. (Typecasting in the microkernel is
still feasible, though probably unrealistic for now given the
additional complexity required.)
- Use void function pointer type (instead of void*) for storing function
pointers in bli_l0_fpa.c.
- Added documentation for using gemm with mixed datatypes in
docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c.
- Defined level-1d operation xpbyd and level-1m operation xpbym.
- Added xpbym test module to testsuite.
- Updated frame/include/bli_x86_asm_macros.h with additional macros
(courtesy of Devin Matthews).
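
Illustration (not part of the commit): one way to arrive at the figure of 128 combinations, assuming it counts four storage datatypes (s, d, c, z) for each of A, B, and C, multiplied by two computation precisions. The commit message states only the total, so this breakdown and the names below are assumptions.

```python
from itertools import product

# The four BLIS storage datatypes: single/double precision, real/complex.
STORAGE_DATATYPES = ("s", "d", "c", "z")
# Assumption: the computation precision is either single or double.
COMPUTATION_PRECISIONS = ("single", "double")


def mixed_datatype_combinations():
    """Enumerate hypothetical (dt_a, dt_b, dt_c, comp_prec) tuples for gemm."""
    return list(product(STORAGE_DATATYPES, STORAGE_DATATYPES,
                        STORAGE_DATATYPES, COMPUTATION_PRECISIONS))


combos = mixed_datatype_combinations()
print(len(combos))  # 4 * 4 * 4 * 2 = 128
```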

commit 3612ecac98a9d36c3fcd64154121d420bb69febd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 11 15:16:41 2018 -0500

Added comments to nested OpenMP handling code.

Details:
- Added comments to bli_thrcomm_openmp.c relating to changes made in
6ac0c80 and 1064d79.

commit 667d3929ee20e94849b4e25b693b4037b7e3f350
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 11 11:47:57 2018 -0500

Added Fortran APIs for some thread functions.

Details:
- Defined Fortran-77 compatible APIs for bli_thread_set_num_threads()
and bli_thread_set_ways(). These wrappers are defined in
frame/compat/blis/thread/b77_thread.c. Thanks to Kay Dewhurst for
suggesting these new interfaces.
- Added missing prototype for bli_thread_set_ways() in bli_thread.h and
removed prototypes for non-existent functions bli_thread_set_*_nt().
- CREDITS file update.

commit 1064d79711f03a0541b92d8b8b9b7e25e04097a5
Author: Devin Matthews <damatthewssmu.edu>
Date: Thu Oct 11 11:14:25 2018 -0500

Adjust rntm_t struct as well.

commit 6ac0c805609b85616ddb32e50101c4f9feb25a35
Author: Devin Matthews <damatthewssmu.edu>
Date: Thu Oct 11 10:45:07 2018 -0500

Fix OMP nesting problem.

Detect when OpenMP uses fewer threads than requested and correct accordingly, so that we don't wait forever for nonexistent threads. Fixes 267.

commit 78a6935483409ae277c766406e175772e820b1de
Author: sraut <Biplab.Rautamd.com>
Date: Thu Oct 11 10:49:40 2018 +0530

Added comments for the syrk small matrix change.

Change-Id: I958939e9953323730da49ef07d1b10e578837d82

commit 53a9ab1c85be14dcfd2560f5b16e898e3e258797
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 10 15:11:09 2018 -0500

Renamed thread auto-factorization macro constants.

Details:
- Renamed the following C preprocessor macros whose fallback/default
values are specified within frame/include/bli_kernel_macro_defs.h:

BLIS_DEFAULT_MR_THREAD_MAX -> BLIS_THREAD_MAX_IR
BLIS_DEFAULT_NR_THREAD_MAX -> BLIS_THREAD_MAX_JR
BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N

- Renamed the above cpp macro overrides within the knl, skx, and zen
sub-configurations, as well as invocations of those macros in
bli_rntm.c.
- Moved config/zen/bli_kernel.h to an 'old' directory as it is no longer
used by any code within BLIS.
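
Illustration (not part of the commit): a minimal sketch of how downstream code that overrides the old macro names might be updated. The mapping is taken from the list above; the helper name rename_macros() is hypothetical.

```python
import re

# Old -> new names, as listed in the commit message above.
RENAMED_MACROS = {
    "BLIS_DEFAULT_MR_THREAD_MAX": "BLIS_THREAD_MAX_IR",
    "BLIS_DEFAULT_NR_THREAD_MAX": "BLIS_THREAD_MAX_JR",
    "BLIS_DEFAULT_M_THREAD_RATIO": "BLIS_THREAD_RATIO_M",
    "BLIS_DEFAULT_N_THREAD_RATIO": "BLIS_THREAD_RATIO_N",
}

# Match whole identifiers only, so longer names that merely contain an
# old macro name are left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, RENAMED_MACROS)) + r")\b"
)


def rename_macros(source_text):
    """Return source_text with each old macro name replaced by its new name."""
    return _PATTERN.sub(lambda m: RENAMED_MACROS[m.group(1)], source_text)
```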

commit 637c2ce794b0414ba8b25e9a452f7d64f825d63a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 9 17:18:04 2018 -0500

Updated column index range for irun.py -q.

Details:
- Forgot to apply the column index range fix in 10f179f to situations
when "quiet" mode (-q) is requested. This commit applies the new
column index range modifications to the quiet case.

commit e2a59400bdda7ed7ee0ff00edea70c00ed593b6c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Oct 9 15:29:48 2018 -0500

Allow trsm_l parallelism in the jc loop.

Details:
- Previously, trsm was consolidating all ways of parallelism into the jr
loop. This was unnecessary and to some degree detrimental on some
types of hardware. Now, any parallelism bound for the jc loop will be
applied to the jc loop, while all other loops' parallelism is funneled
to the jr loop. Thanks to Devangi Parikh for helping investigate this
issue and suggesting the fix.
- NOTE: This change affects only left-side trsm. However,
right-side trsm is currently implemented in terms of the left-side
case, and thus the change effectively applies to both left and right
cases.
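
Illustration (not part of the commit): a sketch of the redistribution described above, assuming that "funneled" means the ways of parallelism of the remaining loops are multiplied together and assigned to the jr loop. The function names and the five-tuple (jc, pc, ic, jr, ir) layout are hypothetical.

```python
def trsm_ways_old(jc, pc, ic, jr, ir):
    """Previous behavior: all ways of parallelism are consolidated into jr."""
    return (1, 1, 1, jc * pc * ic * jr * ir, 1)


def trsm_ways_new(jc, pc, ic, jr, ir):
    """New behavior: jc keeps its own ways; everything else goes to jr."""
    return (jc, 1, 1, pc * ic * jr * ir, 1)
```

In both cases the total number of threads (the product of all five values) is unchanged; only its distribution across loops differs.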

commit f1dba506c970f14e612580d3c171e7c5ffd0a5fb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 8 17:59:41 2018 -0500

Output threading status/params from testsuite.

Details:
- Updated testsuite to output various parameters related to parallelism
in BLIS. These parameters include:
- threading status: disabled, openmp, or pthreads;
- thread partitioning for jr/ir loops: slab or rr (round-robin);
- ways of parallelism from environment variables, and also actual
values used by gemm, herk, trmm_l, trmm_r, trsm_l, and trsm_r for
square problems (assuming all dimensions are set to 1000);
- automatic thread factorization parameters.
- Also output the status of two relatively new configure-time options:
libmemkind and the sandbox.

commit 10f179fb13fc1179921a4ef8efdd2174f01e07da
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 8 14:36:38 2018 -0500

Updated irun.py to use updated column index range.

Details:
- Updated the irun.py script so that it updates the matlab column index
range (if found) to reflect the additional columns of data that are
substituted in. Thanks to Devangi Parikh for recognizing and reporting
this issue.

commit c244a716c97849dee41f52b5f424116aae1b710b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Oct 7 20:59:40 2018 -0500

Added missing -r option to configure --help output.

Details:
- Added the inadvertently omitted mention of -r, the short-option
equivalent of --thread-part-jrir, to the output for 'configure --help'.
Also made minor edits to the same text.

commit c92762ecdca1eb0b08c8acd583b4739a1e3fbd39
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Oct 7 20:30:32 2018 -0500

Added option of slab or rr partitioning in jr/ir.

Details:
- Updated existing macrokernel function names and definitions to
explicitly use slab assignment of micropanels to threads, then created
duplicate versions of macrokernels that explicitly use round-robin
assignment instead of slab. NOTE: As in ac18949, trsm_r macrokernels
were not substantially updated in this commit because they are
currently disabled in bli_trsm_front.c.
- Updated existing packing function (in blk_packm_blk_var1.c) to
explicitly use slab partitioning, and then duplicated for round-robin.
- Updated control tree initialization to use the appropriate macrokernel
and packm function pointers depending on which method (slab or rr) was
enabled at configure-time.
- Updated configure script to accept new --thread-part-jrir=[slab|rr]
option (-m [slab|rr] for short), which allows the user to explicitly
request either slab or round-robin assignment (partitioning) of
micropanels to threads.
- Updated sandbox/ref99 according to above changes.
- Minor updates to build/add-copyright.py.
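
Illustration (not part of the commit): a minimal sketch contrasting the slab and round-robin assignment of micropanels to threads named in this commit. The function names are hypothetical, and the way leftover micropanels are distributed in the slab case is an assumption; the BLIS macrokernels may divide the range differently.

```python
def slab_assignment(n_iter, n_threads):
    """Give each thread one contiguous range of micropanel indices."""
    base, extra = divmod(n_iter, n_threads)
    assignment = []
    start = 0
    for tid in range(n_threads):
        # Assumption: the first `extra` threads take one additional micropanel.
        count = base + (1 if tid < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment


def round_robin_assignment(n_iter, n_threads):
    """Deal micropanel indices out to threads in an interleaved fashion."""
    return [list(range(tid, n_iter, n_threads)) for tid in range(n_threads)]
```

With 10 micropanels and 4 threads, the slab sketch gives thread 0 the indices 0 through 2, whereas the round-robin sketch gives it 0, 4, and 8.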

commit 98e01ea04bfe1032e5bd4781043afd84f864a19e
Merge: ac18949a 541b8a3b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 4 20:44:12 2018 -0500

Merge branch 'master' into amd

commit 541b8a3b3e9af4078f5e6fb2f9608d681839952a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Oct 4 20:39:06 2018 -0500

Removed 1h short-circuit from bli_clock_min_diff().

Details:
- Removed a guard from bli_clock_min_diff() that would return 0 if the
time delta was greater than 60 minutes. This was originally intended
to disregard extremely large values under the assumption that the
user probably didn't intend to run a test that long. However, since
it is in bli_clock_min_diff(), it doesn't actually help short-circuit
an implementation that is hanging or looping infinitely, since such
an implementation would first have to finish before the
bli_clock_min_diff() is called. Thanks to Kiran Varaganti for
reporting this issue.

commit f0c3ef359f7c6c1687fb2671cb35deb346e00597
Author: Kiran V <Kiran.Varagantiamd.com>
Date: Thu Oct 4 16:32:21 2018 +0530

This is a fix to floating-point exception error for BLIS SGEMM with larger matrix sizes.
BUG No: CPUPL-197 fixed by Thangaraj Santanu
The bli_clock_min_diff() function in BLIS assumed that if the time taken is greater than 1 hour then the reading must be wrong. However, this is not the case in general, while the other checks, such as for a time taken close to zero or on the order of nanoseconds, are of course valid.
gerrit review: http://git.amd.com:8080/#/c/118694/1/frame/base/bli_clock.c

Change-Id: I9dc313d7c5fdc20684f67a516bf3237de3e0694a

commit 8bf30eb4735872388b5317883d99b775a344ce25
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Wed Oct 3 22:22:29 2018 -0400

Fixed runme.sh in test/studies/thunderx2

Details:
- Fixed the setting of threads for a single core run.

commit f6f2456ba2afa8f85f43c7c2c90acc439d61d94f
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Wed Oct 3 21:43:46 2018 -0400

Fixed the Makefile in test/studies/thunderx2

Details:
- Fixed target for make-all-st and make-all-mt so that the armpl
targets are built

commit 743a1a6dec1bd3908f0f15513b501c9bd59715b3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 3 14:40:10 2018 -0500

Fixed misleading version query from gcc 7+.

Details:
- gcc 7 introduced new behavior to the -dumpversion option whereby only
the major version component is output. However, as part of this
change, gcc 7 also introduced a new option, -dumpfullversion, which is
guaranteed to always output the major, minor, and revision numbers. If
we are using gcc 7 or later, we re-query the version string with this
new option and then re-parse the result so as to avoid misleading
output from configure (e.g. using gcc 7.3.0 is reported as 7.7.7).
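
Illustration (not part of the commit): a sketch of the re-query logic described above. The real logic lives in the configure shell script; here `run` is a hypothetical callable that takes a gcc option and returns the text gcc would print, which keeps the sketch testable without invoking a compiler. Unlike the old configure behavior, this sketch pads missing components with zeros.

```python
def parse_version(text):
    """Split 'major.minor.revision' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))[:3]


def query_gcc_version(run):
    """Return (major, minor, revision) using the two-step query.

    `run` is a hypothetical callable: run("-dumpversion") -> str.
    """
    version = parse_version(run("-dumpversion"))
    if version[0] >= 7:
        # gcc 7+ may print only the major version for -dumpversion,
        # so re-query with -dumpfullversion and re-parse.
        version = parse_version(run("-dumpfullversion"))
    return version
```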

commit de07840ba5672b9d7b2ed2b918974e98c3f249fb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Oct 3 13:57:25 2018 -0500

Whitespace, https updates to README.md.

Details:
- Reformatted to fit all lines within 80 columns, unless a link is too
long to fit on a single line.
- Changed some links from http to https.

commit 80a8b3dd8034ec8bc03d31be3f9c837c3f6fc94b
Author: sraut <Biplab.Rautamd.com>
Date: Wed Oct 3 15:30:33 2018 +0530

Review comments incorporated for small TRSM.

Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9

commit b8dfd82e0d1afda4ee5436662d63515a59b2dee3
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 2 15:37:12 2018 -0500

Get pthreads via blis.h in the test driver.

commit d0c0c20b7bd3ecf914b5910a50f618fb7d7aa355
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 2 15:16:00 2018 -0500

There seems to be a problem with _POSIX_BARRIERS on Travis.

commit 0904d9e4df0c8a256ac35c491f14a587ebe9fca2
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 2 15:04:36 2018 -0500

*Always* use Windows primitives instead of pthreads.

commit 998317d309934cd7129f8c818ea6e5f07534ebc8
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 2 14:43:24 2018 -0500

Remove pthreads from appveyor build.

commit 627d0c5bfd4b7b149803587391c93b164c11ced5
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 2 14:40:55 2018 -0500

Combine the alternative barrier implementation for macOS with the pthread wrapper for Windows. Also implement pthread_{create,join} for Windows.

commit 81d2c064a209df7eca7d6103696ca3a137a7f82e
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 2 11:46:36 2018 -0500

Add wrapper for basic pthreads functionality (mutex, once) with MSVC.

commit d33f130ea621fca1dccb30631f454d237918eb04
Author: Devin Matthews <damatthewssmu.edu>
Date: Tue Oct 2 11:45:43 2018 -0500

Some configure changes:

1) Allow environment variables to be set anywhere in the argument list.
2) Allow any environment variable to be set.
3) Allow LIBPTHREAD to be set to null without getting defaulted to -lpthread.

commit 9d5f1c4f3bf70c2c0ea84bfa326a0113ae2d176c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 1 17:39:26 2018 -0500

Patch to avoid gcc warning in blastest/f2c/open.c.

Details:
- Use the modulo operator to limit the size of an integer that is given
to sprintf(). This avoids a warning in some versions of gcc about the
integer potentially overflowing the available space in the string into
which the integer is being printed.

commit 0c3cd00ba76de607e807f8deb04b1a2ce18ea7a8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 1 16:18:25 2018 -0500

More README.md updates.

Details:
- Replaced much of "Getting Started" section with a shortened version of
the bullet list of documentation currently shown in the github wiki
page. Thanks to Devangi Parikh for her feedback in this change.

commit 8eaf34bd23b30a1857a50d7142ee9811895f24bf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 1 14:29:07 2018 -0500

Very minor README.md update.

commit 599090e0eb41b2706fa1231fa7b90096f3281678
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Oct 1 14:04:30 2018 -0500

README.md update.

Details:
- Added language mentioning SHPC group to Introduction.

commit ee46fa3efb6e920fa6c3d0b0601007f5de31deb5
Author: sraut <Biplab.Rautamd.com>
Date: Mon Oct 1 16:30:30 2018 +0530

Small TRSM optimization changes: 1) single precision small trsm kernels for XAt=B case are further optimized for performance. 2) double precision small trsm kernels for AX=B and XAt=B cases are implemented. 3) single precision small trsm kernels for AutX=B are implemented in intrinsics to improve the current performance.

Change-Id: Ic9d67ae6d8522615257dde018903f049dcffa2cf

commit 08045a6c52b6e025652c5b18eb120c0f4e61cf6f
Author: sraut <Biplab.Rautamd.com>
Date: Mon Oct 1 15:38:23 2018 +0530

Corrected the fix made for blastest level-3 failure to check m,n,k non-zero condition in bli_gemm_small.c

Change-Id: Idaf9f2327c3127b04a2738ae8a058b83d6c57934

commit ac18949a4b9613741b9ea8e5026d8083acef6fe4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Sep 30 18:54:56 2018 -0500

Multithreading optimizations for l3 macrokernels.

Details:
- Adjusted the method by which micropanels are assigned to threads in
the 2nd (jr) and 1st (ir) loops around the microkernel to (mostly)
employ contiguous "slab" partitioning rather than interleaved (round
robin) partitioning. The new partitioning schemes and related details
for specific families of operations are listed below:
- gemm: slab partitioning.
- herk: slab partitioning for region corresponding to non-triangular
region of C; round robin partitioning for triangular region.
- trmm: slab partitioning for region corresponding to non-triangular
region of B; round robin partitioning for triangular region.
(NOTE: This affects both left- and right-side macrokernels:
trmm_ll, trmm_lu, trmm_rl, trmm_ru.)
- trsm: slab partitioning.
(NOTE: This affects only left-side macrokernels trsm_ll,
trsm_lu; right-side macrokernels were not touched.)
Also note that the previous macrokernels were preserved inside of
the 'other' directory of each operation family directory (e.g.
frame/3/gemm/other, frame/3/herk/other, etc).
- Updated gemm macrokernel in sandbox/ref99 in light of above changes
and fixed a stale function pointer type in blx_gemm_int.c
(gemm_voft -> gemm_var_oft).
- Added standalone test drivers in test/3m4m for herk, trmm, and trsm
and minor changes to test/3m4m/Makefile.
- Updated the arguments and definitions of bli_*_get_next_[ab]_upanel()
and bli_trmm_?_?r_my_iter() macros defined in bli_l3_thrinfo.h.
- Renamed bli_thread_get_range*() APIs to bli_thread_range*().

commit b952ca8feb6f17f71a4512649c2aa72bdee9c8f4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 28 16:12:32 2018 -0500

CREDITS file update.

commit 7d96fc437ebaa9dd2d7071865b5df16402fadd64
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 28 15:40:45 2018 -0500

Allow slashes ('/') in version tags.

Details:
- Updated the configure script to allow slashes in version string. This
is needed so that downstream maintainers (such as those for Debian)
can create local tags such as "upstream/0.4.1". Thanks to M. Zhou for
reporting this issue via PR 256 and providing me the information
needed to debug the problem.

commit 5fdddf6f37c64da093c7f59e3a85214e819ae652
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 28 11:25:54 2018 -0500

Removed 'debian' directory.

Details:
- Removed the top-level 'debian' directory. This directory is apparently
no longer needed (issue 257). Thanks to M. Zhou and Nico Schlömer for
their contributions.

commit 9814cfdf3157ef4726ee604fc895d56e8063d765
Author: Meghana <meghana.vankadariamd.com>
Date: Fri Sep 28 11:02:39 2018 +0530

fixed blastest level-3 failure by adding ((M&N&K) != 0) to check condition in bli_gemm_small.c

Change-Id: I85e4a32996ebb880f3c00bd293edc38f74700fe6

commit 86330953b14c180862deef3ccdcc6431259be27b
Merge: 7af5283d 807a6548
Author: praveeng <praveen.gamd.com>
Date: Fri Sep 28 10:08:06 2018 +0530

Resolved conflicts and modified bli_trsm_small.c

Change-Id: I578d419cff658003e0fdd4c4cdc93145d951ce31

commit 60b2650d7406d266feffe232c2d5692a9e3886d0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 24 15:04:45 2018 -0500

Added statistics-collecting irun.py script.

Details:
- Added irun.py script to 'build' directory. This irun.py script is a
python script for repeatedly invoking a test driver executable, such
as those found in test/3m4m, and replacing the performance output column
with four columns that aggregate statistics. Specifically, the script
reports the minimum, average, maximum, and standard deviation for each
problem size. This script is useful especially (though not
exclusively) when trying to determine the impact of relatively minor
changes to the code, or other small optimizations that may be
difficult to distinguish from "noise." One way this "noise" manifests
is that a test executable may run slightly slower or faster for all
problem sizes (and all implementations) tested by the executable over
the life of a single execution. The cause of these minor
across-the-board perturbations in the overall performance signatures is
unknown, though we hypothesize that it may relate to any number of
issues such as operating system scheduling, where in memory the
program is loaded, or how the CPU clock frequency is throttled at the
time of execution. Regardless of the source of these subtle
performance anomalies, the statistical properties reported by the
irun.py script help the user to more precisely characterize the
underlying performance exhibited by any given test driver, which
allows him or her to make better judgments about the true difference
in performance between two implementations, or minor changes within a
single implementation.
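
Illustration (not part of the commit): a minimal sketch of the aggregation described above, not the irun.py source. It assumes each run yields (problem size, performance) pairs and uses the population standard deviation; whether irun.py uses the population or sample form is not stated in the commit message. The name aggregate_runs() is hypothetical.

```python
import statistics


def aggregate_runs(runs):
    """Aggregate repeated runs of a test driver.

    runs: list of runs, each a list of (problem_size, performance) pairs.
    Returns {problem_size: (minimum, average, maximum, stddev)}.
    """
    by_size = {}
    for run in runs:
        for size, perf in run:
            by_size.setdefault(size, []).append(perf)

    summary = {}
    for size, perfs in by_size.items():
        summary[size] = (
            min(perfs),
            statistics.mean(perfs),
            max(perfs),
            statistics.pstdev(perfs),  # assumption: population std deviation
        )
    return summary
```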

commit 807a654888117fb3a27ea36384f1c1c11b882cd5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 20 15:41:05 2018 -0500

Fixed confusing configure message for libmemkind.

Details:
- Corrected feedback echoed to user by configure when libmemkind is
found but not explicitly requested. In these cases, configure would
echo a message that it had received an explicit request to enable
libmemkind, which was not accurate, even if the end result was the
same--that libmemkind is enabled by default when it is found. Thanks
to Devangi Parikh for reporting this issue.

commit 02adab427c779b0aaf38a5877a5f0246b1909e8f
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Thu Sep 20 14:38:50 2018 -0400

Created a 'thunderx2' subdirectory within test/studies

Details:
- Created a 'thunderx2' subdirectory within test/studies to house
various level-3 test drivers used to measure performance on
ThunderX2.

commit d7537fb51dac0636591fc7c68261a2322642ab3c
Merge: dad07245 c03728f1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Sep 12 15:24:20 2018 -0500

Merge branch 'dev'

commit dad07245dbcfaf35232ec379ba756eb133c361c1
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Wed Sep 12 04:16:58 2018 -0500

Fixed yet another bug in runme script in test/studies

Details:
- Fixed another copy-paste bug

commit e669057fe35f2037d8111af687d84a0ecf6d7a2a
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Tue Sep 11 22:29:42 2018 -0500

Fixed bug in runme script in test/studies

Details:
- Fixed bug in runme script for skx studies that set the number of
threads incorrectly

commit 232fdc3df3e01ae3f86d53767bd14eb93b511e6e
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Mon Sep 10 18:45:50 2018 -0500

Updated runme script in test/studies.

Details:
- Updated runme script for skx studies to run multithreading tests
on 1 and 2 sockets.

commit c03728f1f45edb5e434db90ab8a77ba0184a682b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 10 17:54:27 2018 -0500

Various minor cleanups.

Details:
- Rewrote bli_winsys.c to define bli_setenv() and bli_sleep()
unconditionally, but differently for Windows and non-Windows, but
then disabled the definition of bli_setenv() entirely since BLIS
no longer needs to set environment variables. Updated bli_winsys.h
accordingly, and call bli_sleep() from within testsuite instead of
sleep() directly.
- Use
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
instead of
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS < 0)
when guarding against local definition of pthread barrier in
testsuite. (The description for unistd.h implies that _POSIX_BARRIERS
should always be set to 200809L when barriers are supported, though I
won't be surprised if we encounter a case in the future where it is
set to something else such as 1 while still supported.)
- Removed old _VERS_CONF_INST definitions and installation rules in
top-level Makefile. These are no longer needed because we no longer
output libraries with the version and configuration name as
substrings.
- Comment/whitespace updates in Makefile, config.mk.in, common.mk,
configure, bli_extern_defs.h, and test_libblis.h.
- Added mention of 1m to README.md and other trivial tweaks.

commit e249a00a82908054ecd307cf602c8801275903e8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Sep 10 16:48:35 2018 -0500

Imported skx dgemm ukernel from skx-redux branch.

Details:
- Added the new bli_dgemm_skx_asm_16x14.c microkernel from the skx-redux
branch, along with appropriate blocksizes in bli_cntx_init_skx.c and
a prototype in bli_kernels_skx.h. (Devin has not yet written the
sgemm analogue, so for now we will continue using the older sgemm
ukernel.)
- Updated frame/include/bli_x86_asm_macros.h with a minor change that
was present within the skx-redux branch.

commit e93b01ff60bf9742baa5eefd93e208d1219e7a43
Author: Isuru Fernando <isurufgmail.com>
Date: Sun Sep 9 15:57:43 2018 -0500

Windows DLL support (246)

* Enable shared

* Enable rdp

* Add support for dll

* Use libblis-symbols.def

* Fix building dlls

* Fix libblis-symbols.def

* Fix soname

* Fix Makefile error

* Fix install target

* Fix missing symbols

* Add BLIS_MINUS_TWO

* Add path to dll

* Fix OSX soname

* Add declspec for dll

* Add -DBLIS_BUILD_DLL

* Replace enable_shared in config

* switch to auto for now

* blis_ -> bli_

* Remove BLIS_BUILD_DLL in make check

* change auto->haswell

* enable_shared_01

* Add wno-macro-redefined

* print out.cblat3

* BLIS_BUILD_DLL -> BLIS_IS_BUILDING_LIBRARY

* Use V=1

* Remove fpic for windows

* Remember LIBPTHREAD

* Remove libm for windows

* Remember AR

* Fix remembering libpthread

* Add Wno-maybe-uninitialized in only gcc

* Don't do blastest for shared for now

* Fix install target

And remove unnecessary change

* test auto and x86_64

* Fix install target again

* Use IS_WIN variable

* Remove leading dot from LIBBLIS_SO_MAJ_EXT

* Make is_win yes/no

* Add comments for windows builds

* Change if else blocks location

commit 1330d5c4bc3b644ec0af54c3939a5b9f00eacd9c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 7 19:37:59 2018 -0500

Employ "user" cflags for tl Makefile test targets.

Details:
- Use get-user-cflags-for() to generate cflags when compiling BLAS test
drivers and BLIS testsuite from top-level Makefile. Meant to include
these changes in previous commit (4b5437e). Thanks to Isuru Fernando
for pointing out this oversight.

commit 4b5437ec7afb2befffffbb83f7872bcb4fc61e51
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Sep 7 17:24:32 2018 -0500

Define a cpp macro specific to BLIS compilation.

Details:
- Tweaked the cflags functions in common.mk so that a new preprocessor
macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
itself is being built. This macro will not be defined when, for
example, the testsuite or example code compiles code local to those
applications. This was done in part by defining a new cflags function
get-user-cflags-for(), which is now the designated function for
application Makefiles if they wish to inherit a basic set of CFLAGS
from BLIS. (The compiler flags returned are identical to that of
get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
omitted.)
- Updated all test driver-like makefiles to call get-user-cflags-for()
instead of get-frame-cflags-for().

commit cc2cca4f56eb30212a0dce3e5c121e64d9e59560
Merge: e19e7212 fb81c7fc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 6 17:12:13 2018 -0500

Merge branch 'dev'

commit e19e7212872da3d464734199193436faa51f0da0
Merge: 97965b09 b3d0702c
Author: Jeff Hammond <jeff.sciencegmail.com>
Date: Thu Sep 6 14:58:49 2018 -0700

Merge pull request 244 from kali/pthread-barrier-osx

add an adhoc impl for pthread_barrier

commit b3d0702cf2ef6dda19a23dd8a677be1b6f73c322
Merge: 4e7d0670 97965b09
Author: Jeff Hammond <jeff.sciencegmail.com>
Date: Thu Sep 6 14:58:23 2018 -0700

Merge branch 'master' into pthread-barrier-osx

commit 4e7d06700f176a62952d7d51e41fdcbc6b7a9d5f
Author: Mathieu Poumeyrol <kalizoy.org>
Date: Thu Sep 6 23:48:31 2018 +0200

second __APPLE__

commit fb81c7fc665d68e6a2add163feb29acc0bce8936
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Sep 6 16:29:39 2018 -0500

Defined cortexa53 sub-configuration.

Details:
- Added a new sub-configuration 'cortexa53', which is a mirror image
of cortexa57 except that it will use slightly different compiler
flags. Thanks to Mathieu Poumeyrol for making this suggestion after
discovering that the compiler flags being used by cortexa57 were
not working properly in certain OS X environments (the fix to which
is currently pending in pull request 245).

commit 24ecc0d94aaa9ab4df1ae6d199c4ec6d7783169f
Author: Mathieu Poumeyrol <kalizoy.org>
Date: Thu Sep 6 22:10:16 2018 +0200

use _POSIX_BARRIERS instead of __APPLE__

commit 97965b09059a610db06fb7a22bdfa79c0d37d673
Author: Mathieu Poumeyrol <kaliusers.noreply.github.com>
Date: Thu Sep 6 21:10:29 2018 +0200

cortexa9 and cortexa53 travis build + qemu test (245)

commit a6802eab7d94b5a9de633c53beca8245b74f5dc6
Author: Mathieu Poumeyrol <kalizoy.org>
Date: Thu Sep 6 17:16:35 2018 +0200

reinstantiate test on macos

commit d688a2b7e5a19cba44ea398a99e325e19b8fce50
Author: Mathieu Poumeyrol <kalizoy.org>
Date: Thu Sep 6 15:25:16 2018 +0200

add an adhoc impl for pthread_barrier

commit ab9f9e684dc3ffbb70cc45b21c67af5d916919e5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 30 15:14:02 2018 -0500

CHANGELOG update (0.4.1)

0.4.1

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 30 15:13:59 2018 -0500

Version file update (0.4.1)

commit 08dd67c4b21244851f8416bd59159bea7a9c5b3d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 30 15:12:13 2018 -0500

ReleaseNotes.md update in advance of next version.

commit 4fa4cb0734e7de6505b5d6f1aeef3a5d5c89dcbb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 29 18:06:41 2018 -0500

Trivial comment header updates.

Details:
- Removed four trailing spaces after "BLIS" that occur in most files'
commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.

commit b051ffb815baf6c3ece2b5118b679fd9219d5780
Merge: 6f33d9de aaa549f4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 29 17:06:48 2018 -0500

Merge branch 'dev'

commit 6f33d9de21fbc2f579846b9104fb9d513753f79c
Author: Mathieu Poumeyrol <kaliusers.noreply.github.com>
Date: Wed Aug 29 23:48:22 2018 +0200

fix compilation of armv7a kernels (242)

commit 8199e339aefdd27019c7f3d8c99818d375d5400b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Aug 27 07:00:12 2018 -0500

Added testsuite threading to input.general.fast.

Details:
- Added lines associated with the testsuite's new threading option to
input.general.fast. This change was intended for the previous commit
(10d0735).

commit 10d07357afbb2d468837aa97369ef9a6d0610817
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 26 20:34:30 2018 -0500

Better thread safety; added threading to testsuite.

Details:
- Replaced critical sections that were conditional upon multithreading
being enabled (via pthreads or OpenMP) with unconditional use of
pthreads mutexes. (Why pthreads? Because BLIS already requires it
for its initialization mechanism: pthread_once().) This was done in
bli_error.c, bli_gks.c, bli_l3_ind.c. Also, replaced usage of BLIS's
mtx_t object and bli_mutex_*() API with pthread mutexes in
bli_thread.c. The previous status quo could result in a race condition
if the application called BLIS from more than one thread. The new
pthread-based code should be completely agnostic to the application's
threading configuration. Thanks to AMD for bringing to our attention
the need for a thread-safety review.
- Added an option to the testsuite to simulate application-level
multithreading. Specifically, each thread maintains a counter that is
incremented after each experiment. The thread only executes the
experiment if: counter % n_threads == thread_id. In other words, the
threads simply take turns executing each problem experiment. Also,
POSIX guarantees that fprintf() will not intermingle output, so
output was switched to fprintf() instead of libblis_test_fprintf().
- Changed membrk_t objects to use pthread_mutex_t instead of mtx_t and
replaced use of bli_mutex_init()/_finalize() in bli_membrk.c with
wrappers to pthread_mutex_init()/_destroy().
- Changed the implementation of bli_l3_ind_oper_enable_only() to fix
a race condition; specifically, two threads calling the function with
the same parameters could lead to a non-deterministic outcome.
- Added #include <pthread.h> to bli_cpuid.c and moved the same in
bli_arch.c.
- Added 'const' to declaration of OPT_MARKER in bli_getopt.c.
- Added #include <pthread.h> to bli_system.h.
- Added add-copyright.py script to automate adding new copyright lines
to (and updating existing lines of) source files.
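
Illustration (not part of the commit): a sketch of the turn-taking rule described above for the testsuite's simulated application-level threading (counter % n_threads == thread_id). The testsuite itself is written in C with pthreads; the Python names below are hypothetical.

```python
import threading


def run_experiments(experiments, n_threads):
    """Run each experiment exactly once, with threads taking turns.

    Returns a dict mapping experiment index -> id of the executing thread.
    """
    executed_by = {}
    lock = threading.Lock()  # protects executed_by

    def worker(thread_id):
        counter = 0  # per-thread counter, incremented after each experiment
        for index, experiment in enumerate(experiments):
            # Only the thread whose turn it is executes this experiment.
            if counter % n_threads == thread_id:
                experiment()
                with lock:
                    executed_by[index] = thread_id
            counter += 1

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed_by
```

Every thread walks the full list of experiments, so no coordination beyond the shared rule is needed to decide who runs what.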

commit aaa549f4d1e63929fe2bea023ce849253cfbbb42
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 26 20:13:51 2018 -0500

Minor update to configure --help (--sharedir option).

Details:
- Fixed/tweaked description for --sharedir=SHAREDIR option.

commit 573b8ac373f821a65cc8afd51cdbe03b8ec01081
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 26 13:51:32 2018 -0500

Fixed copy-paste typo in previous commit.

Details:
- Fixed a typo in travis/do_testsuite.sh introduced in 62ea1d3.

commit 62ea1d33d3bc1e890420a1e828b9d0e87e87533b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Aug 26 13:35:53 2018 -0500

Fixed broken out-of-tree builds.

Details:
- Fixed stale filepaths to check-blastest.sh and check-blistest.sh in
travis/do_testsuite.sh and travis/do_sde.sh.
- Create a symbolic link to the 'config' directory so that the top-level
Makefile can find the configs' make_defs.mk files during out-of-tree
builds.
- Added additional case handling to out-of-tree scenario to handle
situations where files 'Makefile', 'common.mk', or 'config' exist but
are not symbolic links. In such cases, configure warns the user and
exits.
- Homogenized various error messages throughout configure.
- Belated thanks to Victor Eijkhout for requesting the feature added
in 0f491e9 whereby lesser Makefiles can compile and link against
an existing installation of BLIS.

commit 0f491e994a7e14d4dfce26e6a51dba2bccad29a3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 25 20:12:36 2018 -0500

Allow lesser Makefiles to reference installed BLIS.

Details:
- Updated the build system so that "lesser" Makefiles, such as those
belonging to example code or the testsuite, may be run even if the
directory is orphaned from the original build tree. This allows a
user to configure, compile, and install BLIS, delete the build tree
(that is, the source distribution, or the build directory for out-
of-tree builds) and then compile example or testsuite code and link
against the installed copy of BLIS (provided the example or testsuite
directory was preserved or obtained from another source). The only
requirement is that make be invoked while setting the
BLIS_INSTALL_PATH variable to the same installation prefix used when
BLIS was configured. The easiest syntax is:

make BLIS_INSTALL_PATH=/install/prefix

though it's also permissible to set BLIS_INSTALL_PATH as an
environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
blastest and testsuite, respectively, so that if those directories are
copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
of building/linking against an installed copy of BLIS.

commit 36ff92ce0d3b428b15b6cddc6f5944afe22e43ec
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 24 18:26:09 2018 -0500

Missing C++ compiler no longer fatal to configure.

Details:
- Changed configure so that the absence of any C++ compiler from the
pre-defined search list does not result in an exit. Instead, in this
situation, the found_cxx variable is assigned 'c++notfound' and the
error message is changed to remind the user that C++ will not be
available in the sandbox. Thanks to Devangi Parikh for reporting this
issue.
- Also tweaked the message when a C++ compiler *is* found to remind any
would-be confused user that BLIS will only use C++ if it is needed by
code in the sandbox.

commit 658f0a129bdc565b072696b6ebddce501132091c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 24 17:49:37 2018 -0500

Fixed obscure integer size bug in va_arg() usage.

Details:
- Fixed a bug in the way that the variadic bli_cntx_set_l3_nat_ukrs()
function was defined. This function is meant to take a microkernel id,
microkernel datatype, microkernel address, and microkernel preference
as arguments, and is typically called within the bli_cntx_init_*()
function defined within a sub-configuration for initializing an
appropriate context. The problem is with the final argument: the
microkernel preference. These preferences are actually boolean values,
0 or 1 (encoded as FALSE or TRUE). Since the variadic function does
not give the compiler any type information for any variadic arguments,
they are "promoted" in the course of internal (macroized) processing
according to default argument promotion rules. Thus, integer literals
such as 0 and 1 become int and floating-point literals (such as 0.0 or
1.0) become double. Previous to this commit, we indicated to va_arg()
that the ukernel preference was a 'bool_t', which is a typedef of
int64_t on 64-bit systems. On systems where int is defined as 64 bits,
no problems manifest since int is the same size as the type we passed
in to va_arg(), but on systems where int is 32 bits, the ukernel
preference could be misinterpreted as a garbage value. (This was
observed on a modern armv8 system.) The fix was to interpret the
bool_t value as int and then immediately typecast it to and store it
as a bool_t. Special thanks to Devangi Parikh for helping track down
this issue, including deciphering the use of va_arg() and its
byzantine treatment of types.
- Added explicit typecasts for all invocations of va_arg() in
bli_cntx.c.
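
The fix described above can be illustrated with a small, self-contained sketch. The function count_true_prefs() is hypothetical and is not part of BLIS; only the bool_t typedef mirrors what the commit message describes. The point is that 0/1 literals passed through "..." arrive as int, so they are read as int and then converted.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

typedef int64_t bool_t;  /* mirrors the 64-bit typedef described above */

/* Hypothetical variadic function: takes n_prefs preferences passed as
   0 or 1 literals and returns how many of them were true. */
static int count_true_prefs(int n_prefs, ...)
{
    va_list args;
    int n_true = 0;

    va_start(args, n_prefs);
    for (int i = 0; i < n_prefs; ++i) {
        /* The literals were passed as int, so read them as int and
           convert afterwards. Requesting va_arg(args, bool_t) here
           would be undefined behavior where int is narrower than
           int64_t. */
        bool_t pref = (bool_t)va_arg(args, int);
        assert(pref == 0 || pref == 1);
        if (pref) ++n_true;
    }
    va_end(args);
    return n_true;
}
```

A call such as count_true_prefs(3, 1, 0, 1) reads each argument with the type it was actually passed as, which is what makes the result independent of the width of int.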

commit e71dc389120b032e42091e4d1a928515ed6f7275
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 24 15:56:04 2018 -0500

Fixed a very minor memory leak in gks.

Details:
- Fixed a memory leak in the global kernel structure that resulted in 56
bytes per configured architecture (of which only 18 are presently
supported by BLIS). The leak would only manifest if BLIS was
initialized and then finalized before the application terminated.
Thanks to Devangi Parikh for helping track down this leak.

commit a7e3a5f9753468c8e665e6c5c3b38d22b7c92500
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 24 14:51:11 2018 -0500

Fixed uncallable bli_finalize().

Details:
- Previously, bli_finalize_once()--which, like bli_init_once(), was
implemented in terms of pthread_once()--was using the same
pthread_once_t control object being used by bli_init(), thus
guaranteeing that it would never be called as long as BLIS had already
been initialized. This could manifest as a rather large memory leak to
any application that attempted to finalize BLIS midway through its
execution (since BLIS reserves several megabytes of storage for
packing buffers per thread used). The fix entailed giving each
function its own pthread_once_t object. Thanks to Devangi Parikh for
helping track down this very quiet bug.
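
A minimal sketch of the corrected pattern, using hypothetical names (lib_init_once(), lib_finalize_once(), and the two counters) rather than the BLIS sources: each routine gets its own pthread_once_t, so completing one does not suppress the other.

```c
#include <assert.h>
#include <pthread.h>

static int n_init = 0;      /* how many times the init routine ran */
static int n_finalize = 0;  /* how many times the finalize routine ran */

static void do_init(void)     { ++n_init; }
static void do_finalize(void) { ++n_finalize; }

/* One control object per routine. With a single shared pthread_once_t,
   the object would be marked done after do_init(), and do_finalize()
   would then never be invoked. */
static pthread_once_t init_control     = PTHREAD_ONCE_INIT;
static pthread_once_t finalize_control = PTHREAD_ONCE_INIT;

static void lib_init_once(void)
{
    int rc = pthread_once(&init_control, do_init);
    assert(rc == 0);
    (void)rc;
}

static void lib_finalize_once(void)
{
    int rc = pthread_once(&finalize_control, do_finalize);
    assert(rc == 0);
    (void)rc;
}
```

Calling each wrapper any number of times runs its routine exactly once. Note that this sketch, like any pthread_once()-based scheme, does not by itself allow a second init/finalize cycle.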

commit a79c21c7c17fb4854fd24c73b81ec5543f74082d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 23 14:40:46 2018 -0500

Fixed cleanmk target post-1b0f8d6.

Details:
- Changed the cleanmk target to delete makefile fragments from their new
home in obj/$(CONFIG_NAME). The old definition worked only because of
a typo (REFERKN_PATH instead of REFKERN_PATH), and only in the
non-verbose (V != 1) case.

commit ffb57242f3eb1175c991fe1b492595fdaa175c27
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 18:22:41 2018 -0500

Cosmetic output changes to configure.

Details:
- Disable sandbox-related obj directory creation, directory mirroring,
and makefile fragment generation when a sandbox is not enabled.
- Prevent various duplicate actions by configure (such as those
mentioned above for sandboxes).

commit ac17454aae9ad430f05aa7c156919c6c695c300c
Merge: a77bec76 7afd095a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 15:34:53 2018 -0500

Merge branch 'master' into dev

commit a77bec766a01e42f13f8cacbec8c4cbde8ecefef
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 15:31:29 2018 -0500

Whitespace changes, minor renames in build system.

Details:
- Minor whitespace cleanup, mostly in the form of spaces -> tabs.
- Shortened certain variables' _FRAGMENT_ infixes to _FRAG_ in
common.mk.

commit 1b0f8d60d1132b56485cc202ebf1246898d3a2a4
Author: Devin Matthews <damatthewssmu.edu>
Date: Wed Aug 22 13:19:29 2018 -0700

Generate makefile fragments in build tree (240)

* Make src dir read-only in out-of-tree build test.

* Generate makefile fragments in the build tree.

commit 7afd095af33690e0175903852b354c9fe46993f6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 14:58:24 2018 -0500

Removed skx from code snippet in previous commit.

Details:
- The docs/ConfigurationHowTo.md document was written with examples that
did not yet contain the skx sub-configuration, but the previous commit
included bli_arch.c code copied and pasted from a recent commit that
does support skx. To keep things consistent, I've removed skx from the
recently-added ConfigurationHowTo.md code snippet.

commit 48211a980d78673133076e8eced1007b1980f5e6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 22 14:55:02 2018 -0500

Update to docs/ConfigurationHowTo.md.

Details:
- Added missing language directing the reader to modify the config_name
string array in bli_arch.c when adding a new sub-configuration. Thanks
to Devangi Parikh for reporting this missing section.

commit 65c9096c6e21f3dc2947fa12be9ea3034f8662dc
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Aug 17 11:44:12 2018 -0500

Fixed broken -p option to configure.

Details:
- Fixed some stale code that was preventing the -p option to configure
from working as expected (though the --prefix option was unaffected).
This bug was most likely introduced in 7e5648c (May 7 2018).
Thanks to Dave Love for reporting this issue.

commit e358d5e497c77b305af462f44266370a596445e2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Aug 16 12:18:45 2018 -0500

README.md update (Funding section).

commit a61dd5e7bcf23f7237d407a5e06dd44e1bec9ad0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 14 17:08:03 2018 -0500

Changed 'test' target to be more like 'check'.

Details:
- Redefined the 'test' make target in the top-level Makefile so that the
final result ("everything passed" or "at least one failure") is echoed
to stdout. Note that 'check' is unchanged, and thus is now effectively
a fast version of 'test'.
- Updated docs/BuildSystem.md to reflect the above change.

commit ce5c3a198a7ae1ca676c27da4541d51ed19d16e1
Merge: 4f6745d6 0bbe69d5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 14 16:52:19 2018 -0500

Merge branch 'master' of github.com:flame/blis

commit 4f6745d68a2c66511695eff0beb00a82ffc6bbbe
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 14 16:50:47 2018 -0500

Fixed link error when building only shared library.

Details:
- Fixed a linker error that occurred when attempting to compile and link
the testsuite and/or BLAS test drivers after having configured BLIS to
only generate a shared library (no static library). The chosen
solution involved
(1) adding the local library path, $(BASE_LIB_PATH), to the search
paths for the shared library via the link option
-Wl,-rpath,$(BASE_LIB_PATH).
(2) adding a local symlink to $(BASE_LIB_PATH) that uses the .so major
version number so that ld would find the shared library at
execution time.
Thanks to Sajid Ali for reporting this issue, to Devin Matthews for
pointing out the need for the -rpath option, and to Devangi Parikh for
helping Sajid isolate the problem.
- Added include <ctype.h> to bli_system.h to avoid a compiler warning
resulting from using toupper() from bli_string.c without a prototype.
Thanks again to Sajid Ali, whose build log revealed this compiler
warning.
- Added '*.so.*' to .gitignore.
- CREDITS file update.

commit 0bbe69d5ed260849297d8f2d35b7668d167482ed
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Tue Aug 14 14:49:58 2018 -0500

Updated plotting scripts in test/studies.

Details:
- Fixed indexing on plots to correspond to the removal of dtime in
the test drivers.

commit e93e0e149e087e08eca2885f1a748a4e88ffe55d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 7 15:54:30 2018 -0500

Removed redefinition of axpyv, scal2v func types.

Details:
- Removed a stray/accidental redefinition of axpyv and scal2v function
types in frame/1d/bli_l1d_ft.h (probably a copy/paste leftover during
development).

commit 1deb33bd16349aaa643694d1bd685ff8a9a5f476
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 7 15:02:50 2018 -0500

Updated penryn kernels to use new _ker_ft type names.

Details:
- Updated older _ft kernel type suffixes used within penryn level-1v
and -1f kernels to use the newer _ker_ft suffix that was introduced
in 0175483. (Thank you Travis CI.)

commit 9cb0b023ca91abdc056d726cdc070062e4954611
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 7 14:21:07 2018 -0500

INSTALL file update.

commit 017548314f3f78f66fbe3264509ac5302bd8d62b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Aug 7 14:13:25 2018 -0500

Replaced function chooser macros w/ func ptr arrays.

Details:
- Previously, most object API functions (_oapi.c) used a function
chooser macro that would expand out to an if-elseif-elseif-else
conditional that used a num_t datatype to call the appropriate
type-specific API (_tapi.c). This always felt a little hackish, and
would get in the way somewhat of adding support for new num_t datatypes
in the future. So, I've replaced that functionality with code that
queries a function pointer that is then typecast appropriately. This
model of function calling was already pervasive for kernels queried
from the cntx_t structure. It was also already in use in various other
functions, such as macrokernels, and this commit simply extends that
pattern.
- The above change required many new files, mostly header files, that
define the function types (mostly _ft.h) for the queriable functions
as well as some source files to define the function pointer arrays and
their corresponding query functions (_fpa.c). Various other function
types, mostly for kernel function types, were renamed to reduce the
potential for confusion with the function types for expert and basic
(non-expert) typed API functions.
- Removed definitions for all of the "bli_call_ft_*()" function chooser
macros from bli_misc_macro_defs.h.
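
The general idea of replacing an if/else-if chain with a table lookup can be sketched as below. All names here (dt_id, scalv_ft, scalv_fpa, scalv_query(), scalv()) are hypothetical and are not the BLIS definitions. The sketch also differs from the commit in one respect: instead of typecasting the queried pointer, every table entry shares one signature with void* operands, which keeps the indirect call well-defined in standard C.

```c
#include <assert.h>

/* Hypothetical datatype ids standing in for num_t. */
typedef enum { DT_FLOAT = 0, DT_DOUBLE = 1, DT_COUNT } dt_id;

/* Common signature for all entries: scale n elements of x by *alpha. */
typedef void (*scalv_ft)(int n, const void *alpha, void *x);

static void scalv_float(int n, const void *alpha, void *x)
{
    const float a = *(const float *)alpha;
    float *xf = (float *)x;
    for (int i = 0; i < n; ++i) xf[i] *= a;
}

static void scalv_double(int n, const void *alpha, void *x)
{
    const double a = *(const double *)alpha;
    double *xd = (double *)x;
    for (int i = 0; i < n; ++i) xd[i] *= a;
}

/* Function pointer array indexed by datatype id. */
static const scalv_ft scalv_fpa[DT_COUNT] = {
    [DT_FLOAT]  = scalv_float,
    [DT_DOUBLE] = scalv_double,
};

/* Query function: returns the type-specific implementation. */
static scalv_ft scalv_query(dt_id dt)
{
    assert((unsigned)dt < (unsigned)DT_COUNT);
    return scalv_fpa[dt];
}

/* Type-agnostic front end: one lookup replaces the conditional chain. */
static void scalv(dt_id dt, int n, const void *alpha, void *x)
{
    scalv_query(dt)(n, alpha, x);
}
```

Supporting another datatype in this arrangement means adding an enum value and a table entry; the front end itself does not change.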

commit addce089664561f9f63efa6f107e58fc48d29871
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Aug 6 13:18:20 2018 -0500

Format spec and other updates in test, test/3m4m.

Details:
- Removed the dtime (delta time, or wallclock time) column from the
matlab output of all test drivers in test, test/3m4m, test/studies.
This value was rarely (if ever) really needed and usually only served
to take up screen space.
- Updated format specifier in test/studies/skx to use %7.2f instead of
%6.3f.
- For the test drivers in 'test' directory, added an initial line of
output that sets last entry of matlab matrix to zero in order to
induce a pre-allocation of the entire array of performance results.

commit 94d5ef42c833a4d43e50a80d46dddbd7a56d2db6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Aug 4 15:57:17 2018 -0500

Adjusted gflops format spec in testsuite, test/3m4m.

Details:
- Changed the format specifier for the gflops column in the testsuite
output from %7.3f to %7.2f. This was done mainly to keep the output
aligned properly when the expected performance exceeded 1000 gflops.
Also, two decimal places still conveys plenty of precision for all
practical applications, including just eyeballing performance deltas
between two executions (let alone two implementations).
- Changed the format specifier for gflops in the test/3m4m drivers
from %6.3f to %7.2f (for the same reasons listed above).
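
The alignment effect can be checked with a short sketch. The helper names width_old() and width_new() are hypothetical; each returns the number of characters printf-style formatting produces for a given gflops value.

```c
#include <assert.h>
#include <stdio.h>

/* Width of a value printed with the old testsuite specifier, %7.3f. */
static int width_old(double gflops)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%7.3f", gflops);
    assert(n > 0 && n < (int)sizeof buf);
    return n;
}

/* Width of a value printed with the new specifier, %7.2f. */
static int width_new(double gflops)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%7.2f", gflops);
    assert(n > 0 && n < (int)sizeof buf);
    return n;
}
```

With %7.3f, 999.123 fits the seven-character field but 1234.567 needs eight characters and pushes the column out of alignment. With %7.2f, both values occupy exactly seven characters.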

commit c7ff06bae92b9b6c6656f2030d13486b95417821
Merge: 6074082c ebe998d0
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Wed Aug 1 14:20:41 2018 -0500

Merge branch 'master' of https://github.com/flame/blis

commit 6074082cd359dd775ef72478f8f3a281c5a6a6f9
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Wed Aug 1 13:30:51 2018 -0500

Fixed bug in bli_cntx_set_packm_ker_dt() implementation.

Details:
- Fixed bug in static function bli_cntx_set_[packm/unpackm]_ker_dt(), which
were incorrectly calling bli_cntx_get_[packm/unpackm]_ker_dt to get the
corresponding func_t.

commit ebe998d06cc56a9a9d66990b6ebf683d6fd0efdf
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 1 13:24:00 2018 -0500

Fixed typos in BuildSystem.md from previous commit.

commit e72a344e94c5ae253f69b60f41d92ca89a5d1d1c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Aug 1 13:00:38 2018 -0500

Added table of 'make' targets to BuildSystem.md.

Details:
- Added a new section to BuildSystem.md that describes the most useful
make targets defined in the top-level Makefile.

commit 4f60d0288e00586dc921ff57db851f1266ff8e70
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 30 19:22:57 2018 -0500

README.md, comment updates.

Details:
- Added links, and sandbox language to README.md.
- Adjusted some comments in high-level level-3 object functions to make
clear what bli_thread_init_rntm() does.

commit 455d3f49e5c8362395be14c79e6adb5123e29623
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jul 29 18:31:29 2018 -0500

Edits to object/typed API, multithreading docs.

commit 922a1c05e06f52c97fb369870dce07233e61c4c9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 20:15:55 2018 -0500

More tweaks to README.md.

commit a7a0cf2b5d9f1dea5061c0f20eeaf371dfd4ea12
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:59:31 2018 -0500

More edits to docs/Multithreading.md.

commit be21d0cf68c330fd0d2048465a43ddc59d0b9d6c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:46:51 2018 -0500

Fixed typos in docs/Multithreading.md.

commit eac07c7b4f7a41c68d63f1e67141b2b58009609e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:45:28 2018 -0500

Edits to docs/Multithreading.md.

commit 5438375a032273b46ae626fee909ffc05f48ab72
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:34:21 2018 -0500

Fixed link in README.md.

commit 1f1a237d3f0b24d71ce2d7ee52d8a84f8e6a29ad
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:33:28 2018 -0500

Fixed links in BLISTypedAPI.md.

commit 89c8806e3aa49310f36c0314c5f6956c83a627a1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:30:56 2018 -0500

Minor doc fixes to previous commit.

commit b8c7574f84873b9c408f70c29c41ce464df57c2d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 28 16:27:09 2018 -0500

README.md, typed/object API updates.

Details:
- Updated the typed and object APIs to include language on the rntm_t
parameters in the expert interfaces.
- Updated README to include link to object API.

commit 29c34c4adb02d91fb34d1ccc0e821d6cfb7ce5c5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:26:19 2018 -0500

CREDITS file update.

commit 55a04edf52ac4f16c51b738bc884684adc1f1777
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:10:46 2018 -0500

CHANGELOG update (0.4.0)

0.4.0

Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:10:43 2018 -0500

Version file update (0.4.0)

commit b86cf13793b07f35c027a56c9faec8f4b6279d3e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:08:21 2018 -0500

Release Notes update in advance of next version.

commit a8b4084a0e04e47ac02ceae93a2018f5363e1205
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 16:07:26 2018 -0500

CREDITS file update.

commit 8e10cac5f388ac961c3d77b0a465214e7c9dc91a
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 27 14:45:35 2018 -0500

Updates to CREDITS, RELEASING, config/README.md.

Details:
- Added individuals' github handles to CREDITS file.
- Updated RELEASING, config/README.md files.

commit 401b69c8f26a86726ac5e1fb4f9fc2d2098ef204
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 17:55:13 2018 -0500

More indentation in docs/ConfigurationHowTo.md.

commit 1c6a1b921ef96999bb449d657cca6d9a556f7245
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 17:14:58 2018 -0500

Trying new indentation in ConfigurationHowTo.md.

Details:
- Modified a few sections to take advantage of a feature of markdown
that allows a bullet or enumeration to have multiple paragraphs. This
is a trial run to make sure the indentation looks good when rendered
in a web browser.

commit 71f978719527fcf17617cb234e48bf349a76c12d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 15:55:36 2018 -0500

Whitespace changes to macrokernels' func ptr defs.

commit 87d57c31c2bfcf4609dfe31ce915e9345150e613
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 14:20:18 2018 -0500

Various minor updates to typed, object API docs.

commit fb6e16268aaafbab2fd78d47cbf821e2152261fd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 25 14:17:28 2018 -0500

Consolidated prototypes in bli_l1v_tapi.h.

Details:
- Consolidated typed API function prototypes in bli_l1v_tapi.h by
leveraging identical function signatures between operations.
- Removed 'restrict' keyword since it is not actually present in the
function definitions.

commit af60d738f21340ccb0903e6c87dbf6af4fc44fc0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 24 15:35:52 2018 -0500

Finished object creation part of BLISObjectAPI.md.

Details:
- Filled in remaining section on object creation function reference
of BLISObjectAPI.md. All object management functions demonstrated as
part of the example code in examples/oapi are now documented, as well
as some other functions that are not shown in the example code.
- Updated various links (mostly in function index) to correctly point to
the object API reference instead of the typed API reference.
- Added documentation to getijm, setijm.

commit 8217a6a3b68382c62f016c658d337e6086112fef
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 24 13:13:10 2018 -0500

Moved sandbox README.md to docs/Sandboxes.md.

Details:
- Relocated sandbox/ref99/README.md to docs/Sandboxes.md and made minor
edits to the document.

commit b7db29332394324ffd1a73c3847a75e9a5b38c8d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 19 11:14:30 2018 -0500

Explicitly typecast return vals in static funcs.

Details:
- Added explicit typecasting to various functions (mostly static
functions), primarily those in bli_param_macro_defs.h,
bli_obj_macro_defs.h, bli_cntx.h, bli_cntl.h, and a few other header
files.
- This change was prompted by feedback from Jacob Gorm Hansen, who
reported that including "blis.h" from his application caused
gcc to output error messages (relating to types being returned
mismatching the declared return types) when used via the C++ compiler
front-end. This is the first pass of fixes, and we may need to
iterate with additional follow-up commits (233).

commit fa08e5ead95f9d757af6ab5b095a8bf131e3874d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 17 19:02:15 2018 -0500

Fixed minor issues in ecbebe7 with mt disabled.

Details:
- Fixed an unused variable warning in frame/base/bli_rntm.c when
multithreading is disabled.
- Fixed a missing variable declaration in bli_thread_init_rntm_from_env()
when multithreading is disabled.

commit ecbebe7c2e43950dfa369f71c2b83cabe348a046
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 17 18:37:32 2018 -0500

Defined rntm_t to relocate cntx_t.thrloop (235).

Details:
- Defined a new struct datatype, rntm_t (runtime), to house the thrloop
field of the cntx_t (context). The thrloop array holds the number of
ways of parallelism (thread "splits") to extract per level-3
algorithmic loop until those values can be used to create a
corresponding node in the thread control tree (thrinfo_t structure),
which (for any given level-3 invocation) usually happens by the time
the macrokernel is called for the first time.
- Relocating the thrloop from the cntx_t remedies a thread-safety issue
when invoking level-3 operations from two or more application threads.
The race condition existed because the cntx_t, a pointer to which is
usually queried from the global kernel structure (gks), is supposed to
be read-only. However, the previous code would write to the cntx_t's
thrloop field *after* it had been queried, thus violating its read-only
status. In practice, this would not cause a problem when a sequential
application made a multithreaded call to BLIS, nor when two or more
application threads used the same parallelization scheme when calling
BLIS, because in either case all application threads would be using
the same ways of parallelism for each loop. The true effects of the
race condition were limited to situations where two or more application
threads used *different* parallelization schemes for any given level-3
call.
- In remedying the above race condition, the application or calling
library can now specify the parallelization scheme on a per-call basis.
All that is required is that the thread encode its request for
parallelism into the rntm_t struct prior to passing the address of the
rntm_t to one of the expert interfaces of either the typed or object
APIs. This allows, for example, one application thread to extract 4-way
parallelism from a call to gemm while another application thread
requests 2-way parallelism. Or, two threads could each request 4-way
parallelism, but from different loops.
- A rntm_t* parameter has been added to the function signatures of most
of the level-3 implementation stack (with the most notable exception
being packm) as well as all level-1v, -1d, -1f, -1m, and -2 expert
APIs. (A few internal functions gained the rntm_t* parameter even
though they currently have no use for it, such as bli_l3_packm().)
This required some internal calls to some of those functions to
be updated since BLIS was already using those operations internally
via the expert interfaces. For situations where a rntm_t object is
not available, such as within packm/unpackm implementations, NULL is
passed in to the relevant expert interfaces. This is acceptable for
now since parallelism is not obtained for non-level-3 operations.
- Revamped how global parallelism is encoded. First, the conventional
environment variables such as BLIS_NUM_THREADS and BLIS_*_NT are only
read once, at library initialization. (Thanks to Nathaniel Smith for
suggesting this to avoid repeated calls to getenv(), which can be slow.)
Those values are recorded to a global rntm_t object. Public APIs, in
bli_thread.c, are still available to get/set these values from the
global rntm_t, though now the "set" functions have additional logic
to ensure that the values are set in a synchronous manner via a mutex.
If/when NULL is passed into an expert API (meaning the user opted to
not provide a custom rntm_t), the values from the global rntm_t are
copied to a local rntm_t, which is then passed down the function stack.
Calling a basic API is equivalent to calling the expert APIs with NULL
for the cntx and rntm parameters, which means the semantic behavior of
these basic APIs (vis-a-vis multithreading) is unchanged from before.
- Renamed bli_cntx_set_thrloop_from_env() to bli_rntm_set_ways_for_op()
and reimplemented, with the function now being able to treat the
incoming rntm_t in a manner agnostic to its origin--whether it came
from the application or is an internal copy of the global rntm_t.
- Removed various global runtime APIs for setting the number of ways of
parallelism for individual loops (e.g. bli_thread_set_*_nt()) as well
as the corresponding "get" functions. The new model simplifies these
interfaces so that one must either set the total number of threads, OR
set all of the ways of parallelism for each loop simultaneously (in a
single function call).
- Updated sandbox/ref99 according to above changes.
- Rewrote/augmented docs/Multithreading.md to document the three methods
(and two specific ways within each method) of requesting parallelism
in BLIS.
- Removed old, disabled code from bli_l3_thrinfo.c.
- Whitespace changes to code (e.g. bli_obj.c) and docs/BuildSystem.md.
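
The "global defaults, copied to a local object when NULL is passed" scheme described above can be sketched in a few lines. This is an illustration under stated assumptions, not the BLIS implementation: runtime_t, global_set_num_threads(), operation_ex(), and operation() are hypothetical names, the struct carries only a thread count, and taking the mutex while copying the globals is a choice made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for rntm_t: just a total thread count. */
typedef struct { int num_threads; } runtime_t;

static runtime_t       global_rt       = { 1 };
static pthread_mutex_t global_rt_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Setter for the global defaults; the mutex serializes writers. */
static void global_set_num_threads(int n)
{
    assert(n >= 1);
    pthread_mutex_lock(&global_rt_mutex);
    global_rt.num_threads = n;
    pthread_mutex_unlock(&global_rt_mutex);
}

/* Expert-style entry point. A NULL rt means "use the global defaults".
   Returns the thread count this particular call would use. */
static int operation_ex(const runtime_t *rt)
{
    runtime_t local;

    if (rt == NULL) {
        /* Snapshot the globals into a call-local copy. */
        pthread_mutex_lock(&global_rt_mutex);
        local = global_rt;
        pthread_mutex_unlock(&global_rt_mutex);
    } else {
        local = *rt;  /* per-call request supplied by the caller */
    }

    /* Everything further down the call stack would read only 'local',
       so concurrent callers cannot disturb one another. */
    return local.num_threads;
}

/* Basic-style entry point: equivalent to the expert one with NULL. */
static int operation(void)
{
    return operation_ex(NULL);
}
```

Because each call works from its own copy, one application thread can pass a runtime_t requesting 2 threads while another relies on a global default of 4, and neither request overwrites shared state.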

commit 323eaaab99752858b12e81e2eb8e416f009a3028
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Fri Jul 13 11:40:06 2018 -0500

Removed left over code from plotting scripts.

commit 60c197736495b47ce974ffb9b43874d1ebcfe78c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 12 19:22:14 2018 -0500

Documented accessor functions in BLISObjectAPI.md.

Details:
- Added documentation to docs/BLISObjectAPI.md for a handful of
commonly-used obj_t accessor functions.
- Minor updates to docs/BLISTypedAPI.md.

commit 77327ad796e11ef67df0cc91d45ed663598ba4df
Merge: 73b0b2a3 9fef8575
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Thu Jul 12 17:09:33 2018 -0500

Merge branch 'master' of https://github.com/flame/blis

commit 73b0b2a3ac1be6dfbe85c116886b4e29d98ac945
Author: Devangi N. Parikh <dnpcs.utexas.edu>
Date: Thu Jul 12 16:53:10 2018 -0500

Created hardware-specific test driver directory.

Details:
- Created a 'studies' subdirectory within 'test' to be used to house
test drivers, makefiles, run scripts, matlab plot code, and related
files that have been customized for collecting performance data on
specific host machines or product lines. This new setup will help us
catalog, track, and share test driver materials over time, and in a
way that facilitates reproducibility.
- Created an 'skx' subdirectory within 'test/studies' to house various
level-3 test driver files used to measure performance on SkylakeX
nodes (specifically, those nodes used by TACC's stampede2 system).

commit 9fef85756d15ee0f977fff6e57acd01c20cba184
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 11 18:40:30 2018 -0500

Cleaned up loose ends in BLISObjectAPI.md.

Details:
- Deleted some lines from the API function signatures that did not
belong (and were only left over from the copy-paste of the typed API).
- Fixed some paragraph-in-bullet indentation.

commit 80ddeae4629022b69fdf1f1b053a1fcba643c40c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 11 18:31:57 2018 -0500

Added BLISObjectAPI.md to docs.

Details:
- Added first draft of BLISObjectAPI.md. (Object management section is
still missing.)
- Small fixes to BLISTypedAPI.md found while writing BLISObjectAPI.md.
- In various .md files, changed verbatim blocks to language
attributes (e.g. c for C code).

commit 038442add39ce629fee0d960b212ce0c95138d46
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 11 12:24:18 2018 -0500

Added -lpthread to makefile example in BuildSystem.md.

Details:
- Added missing pthreads library linking to example makefile in
docs/BuildSystem.md, as well as similar language to build requirements
at the beginning of the document. Thanks to Stefanos Mavros for
bringing this to our attention.
- Updated CREDITS file.

commit bf10d8624e7b5902c9d9189c7c93f318b8e1b9a5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 9 18:40:13 2018 -0500

Small updates to KernelsHowTo.md, BLISTypedAPI.md.

Details:
- Minor updates to BLISTypedAPI.md, mostly to bring terminology
up-to-date with the new "typed API" classification.
- Added contents section to KernelsHowTo.md.

commit 1fd3bce59e43b422e62f9684bca9d1296a29edc3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 9 18:20:11 2018 -0500

Further updates to KernelsHowTo.md, BLISTypedAPI.md.

Details:
- Added missing level-1v operations to BLISTypedAPI (e.g. axpbyv,
xpbyv).
- Updated broken links in KernelsHowTo.md based on misnamed anchors.
- Other minor changes.

commit c40d30a6c920bd2e5a8353a3cd07a7e2b2265758
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jul 9 17:55:54 2018 -0500

Updated KernelsHowTo.md, BLISTypedAPI.md.

Details:
- Added missing (basic) information in KernelsHowTo.md for level-1f and
level-1v kernels.
- Updated section regarding contexts.

commit f8913c2bf91c0e0fb4e68aedf64a242a19db92a0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 20:35:13 2018 -0500

Fixed outdated scalv() calls in penryn l1f kernels.

Details:
- Fixed stale calls to dscalv() from the dotxf and dotxaxpyf penryn
kernels that were not updated during the basic/expert API separation
in e88aeda.

commit e78e71d549ac17ecd52c7b33008df1cd78f1b59e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 20:18:09 2018 -0500

Added README.md mention/link to examples/tapi.

Details:
- Added language to README.md to bring the reader's attention to the
example code for the typed API (in addition to those for the object
API).

commit 419ffb158573a26bfec47bac73e4394e7926a7b8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 20:14:23 2018 -0500

Updates to README.md.

Details:
- Updated wiki links according to renamed/relocated files in 'docs'.
- Converted links to relative paths.
- Added link to docs/Multithreading.md.

commit 7d3e8a7e5f1ec299d009fb6c9071f0c1b089b460
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 20:01:29 2018 -0500

Reverted docs/*.md links to relative paths.

Details:
- Within the documents in docs/*.md, reverted links to other local
documents to relative paths.
- Fixed some links/documents that did not yet have the '.md' suffix.
- Testing whether we can use relative links ('docs/BLISTypedAPI.md')
from within README.md.

commit d97c862c2b9170d774f414e63ae365488fffb4f5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 19:40:41 2018 -0500

Updated links (URLs) in docs/*.md.

Details:
- Updated most markdown links in the documents/wikis to use absolute
paths instead of the relative paths that were in use previously.
A few links were not updated, except for adding a ".md" to reflect
the documents' new names, in order to test whether relative
linking still works.

commit 3a0c12135875e0fb04de9798664e4fae632d994e
Merge: 2c7960c8 bcacddfa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 16:51:38 2018 -0500

Merge branch 'dev'

commit bcacddfad75b20969660606751eea6ead6c42ca9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 16:45:29 2018 -0500

Added 'docs' directory with wiki markdown files.

Details:
- Exported all github wikis to a new 'docs' directory.
- Renamed 'BLISAPIQuickReference' wiki to 'BLISTypedAPI' and removed
all cntx_t* arguments from the (now non-expert) APIs (with the
exception of the kernel APIs).
- Added section to BuildSystem documenting new ARG_MAX hack.

commit 3ee2bc0f7aa3b08da92331d64271bee99eaf8c1d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jul 7 16:02:16 2018 -0500

Renamed files that distinguish basic/expert APIs.

Details:
- Renamed various files that were previously named according to a
"with context" or "without context" convention. For example, the
following files in frame/3 were renamed:

frame/3/bli_l3_oapi_woc.c -> frame/3/bli_l3_oapi_ba.c
frame/3/bli_l3_oapi_wc.c -> frame/3/bli_l3_oapi_ex.c
frame/3/bli_l3_tapi_woc.c -> frame/3/bli_l3_tapi_ba.c
frame/3/bli_l3_tapi_wc.c -> frame/3/bli_l3_tapi_ex.c

Here, the "ba" is for "basic" and "ex" is for "expert". This new
naming scheme will make more sense especially if/when additional
expert parameters are added to the expert APIs (typed and object).

commit e88aedae735dfeb6fa5ac28d4527eb3ca58c6510
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jul 6 19:14:02 2018 -0500

Separated expert, non-expert typed APIs.

Details:
- Split existing typed APIs into two subsets of interfaces: one for use
with expert parameters, such as the cntx_t*, and one without. This
separation was already in place for the object APIs, and after this
commit the typed and object APIs will have similar expert and non-
expert APIs. The expert functions will be suffixed with "_ex" just as
is the case for expert interfaces in the object APIs.
- Updated internal invocations of typed APIs (functions such as
bli_?setm() and bli_?scalv()) throughout BLIS to reflect use of the
new explicitly expert APIs.
- Updated example code in examples/tapi to reflect the existence (and
usage) of non-expert APIs.
- Bumped the major soname version number in 'so_version'. While code
compiled against a previous version/commit will likely still work
(since the old typed function symbol names still exist in the new API,
just with one less function argument) the semantics of the function
have changed if the cntx_t* parameter the application passes in is
non-NULL. For example, calling bli_daxpyv() with a non-NULL context
does not behave the same way now as it did before; before, the
context would be used in the computation, and now the context would
be ignored since the interface for that function no longer expects a
context argument.
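
The basic/expert split described above can be sketched in a few lines of C. This is only an illustration of the pattern, not BLIS source: every `demo_` name, the contents of the context struct, and the rule that a NULL context selects a default kernel are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature and context; not the real cntx_t. */
typedef void (*demo_axpy_fn)(size_t n, double alpha,
                             const double *x, double *y);

typedef struct {
    demo_axpy_fn kernel;   /* kernel the caller wants used, or NULL */
} demo_cntx_t;

/* Default kernel: y := y + alpha * x */
static void demo_axpy_ref(size_t n, double alpha,
                          const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Expert interface: "_ex" suffix, takes the extra context parameter. */
void demo_daxpyv_ex(size_t n, double alpha, const double *x, double *y,
                    const demo_cntx_t *cntx)
{
    assert(n == 0 || (x != NULL && y != NULL));
    demo_axpy_fn k = (cntx != NULL && cntx->kernel != NULL)
                         ? cntx->kernel
                         : demo_axpy_ref;
    k(n, alpha, x, y);
}

/* Basic interface: same operation, no context argument at all. */
void demo_daxpyv(size_t n, double alpha, const double *x, double *y)
{
    demo_daxpyv_ex(n, alpha, x, y, NULL);
}
```

In this sketch the basic function is a thin wrapper over the expert one, so a caller who needs to pass a context has to switch to the "_ex" name, which is the kind of change in calling convention that the soname bump above signals.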

commit 331694e52414c0cd50048daf880a9ace9e29b94a
Author: Isuru Fernando <isurufgmail.com>
Date: Fri Jul 6 09:07:38 2018 -0600

Fix windows build and enable x86_64 on appveyor (230)

* Upload artifacts built on appveyor (228)

* Upload artifacts

* Fix install in appveyor

* Remove windows.h in bli_winsys.c (229)

Looks like it is unneeded.

* Implemented ARG_MAX hack in configure, Makefile.

Details:
- Added support for --enable-arg-max-hack to configure, which will
change the behavior of make when building BLIS so that rather than
invoke the archiver/linker with all of the object files as command
line arguments, those object files are echoed to a temporary file
and then the archiver/linker is fed that temporary file via the
@file notation. An example of this can be found in the GNU make docs at
https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.

* Enable x86_64 and arg-max-hack on appveyor

* Use gas style assembly for clang on windows

commit a64a780d28c99d35f237f59212772e9beff35b3e
Merge: 89e178ce 3cb396d1
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri Jul 6 09:38:42 2018 -0500

Merge pull request 231 from flame/travis-pr

Disable SDE for PRs

commit 3cb396d1ae4ee569f862db201c6a976712fd128e
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Fri Jul 6 09:19:44 2018 -0500

Disable SDE for PRs

Pull requests cannot use Travis secret variables, so SDE needs to be disabled. This PR should suffice as a test.

commit 2c7960c8416ee9b67364be5f2b210fd7a0aec4b5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 5 14:38:33 2018 -0500

Implemented ARG_MAX hack in configure, Makefile.

Details:
- Added support for --enable-arg-max-hack to configure, which will
change the behavior of make when building BLIS so that rather than
invoke the archiver/linker with all of the object files as command
line arguments, those object files are echoed to a temporary file
and then the archiver/linker is fed that temporary file via the
@file notation. An example of this can be found in the GNU make docs at
https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.

commit c422a5cd191d47e6aeb9cea6de0e348f46e3e318
Merge: b6470262 89e178ce
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jul 5 12:33:35 2018 -0500

Merge branch 'dev'

commit b6470262ea66c0f48a5b4d85ca4bf85c1fb2b3af
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Jul 4 19:14:29 2018 -0600

Remove windows.h in bli_winsys.c (229)

Looks like it is unneeded.

commit eac4bdf98691c5ec784af0dc11d1ad2269840661
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Jul 4 18:31:01 2018 -0600

Upload artifacts built on appveyor (228)

* Upload artifacts

* Fix install in appveyor

commit 89e178ce380439dea951925e33703dc4b979e914
Merge: d868eb3e e32b2ef9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 4 17:51:16 2018 -0500

Merge branch 'master' into dev

commit e32b2ef983ea1c3521dd3821116c0078690f125e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jul 4 17:49:39 2018 -0500

Update to CREDITS file.

commit 14648e137696484e0ff04f89b16c6b4183ea42b8
Author: Isuru Fernando <isurufgmail.com>
Date: Wed Jul 4 16:48:42 2018 -0600

Native windows support using clang (227)

* Add appveyor file

* Build script

* Remove fPIC for now

* copy as

* set CC and CXX

* Change the order of immintrin.h

* Fix testsuite header

* Move testsuite defs to .c

* Fix appveyor file

* Remove fPIC again and fix strerror_r missing bug

* Remove appveyor script

* cd to blis directory

* Fix sleep implementation

* Add f2c_types_win.h

* Fix f2c compilation

* Remove rdp and rename appveyor.yml

* Remove setenv declaration in test header

* set CPICFLAGS to empty

* Fix another immintrin.h issue

* Escape CFLAGS and LDFLAGS

* Fix more ?mmintrin.h issues

* Build x86_64 in appveyor

* override LIBM LIBPTHREAD AR AS

* override pthreads in configure

* Move windows definitions to bli_winsys.h

* Fix LIBPTHREAD default value

* Build intel64 in appveyor for now

commit b45ea92fc6f77f2313b50dbe95922f838cbead07
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jul 3 18:27:29 2018 -0500

Added typed (BLAS-like) API code examples.

Details:
- Added new example code to examples/tapi demonstrating how to use the
BLIS typed API. These code examples directly mirror the corresponding
example code files in examples/oapi. This setup provides a convenient
opportunity for newcomers to BLIS to compare and contrast the typed
and object APIs when they are used to perform the same tasks.
- Minor cleanups to examples/oapi.

commit d868eb3e200f657a1284c4cc933e7a4d25260dce
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 29 12:36:04 2018 -0500

Implemented bli_obj_scalar_cast_to().

Details:
- Implemented bli_obj_scalar_cast_to(), which will typecast the value in
the internal scalar of an obj_t to a specified datatype.
- Changed bli_obj_scalar_attach() so that the scalar value being attached
is first typecast to the storage datatype of the destination object
rather than the target datatype.
- Reformatted function type signatures in bli_obj_scalar.c as well as
prototypes in its corresponding header file.

commit 52d80b5f09517d80ac8a7c96983a576c1ec2080b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 29 12:30:44 2018 -0500

Fixed static funcs related to target and exec dts.

Details:
- Fixed incorrect bit shifts in the following static functions:
bli_obj_set_target_domain()
bli_obj_set_target_prec()
bli_obj_set_exec_domain()
bli_obj_set_exec_prec()
- Fixed incorrect bitmask in bli_dt_proj_to_single_prec().
- Updated bli_obj_real_part() and bli_obj_imag_part() so that they update
the target and exec datatypes (in addition to the storage datatypes).
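
To show why the shift amount matters in setters like these, here is a minimal sketch of packing several small fields into one info word. The bit layout, the `demo_` names, and the field widths are assumptions chosen for illustration; they are not the actual obj_t layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (NOT the real obj_t bitfield):
 *   bit 0 = storage domain, bit 1 = storage precision,
 *   bit 4 = target domain,  bit 5 = target precision.
 * Domain and precision values arrive already positioned for the
 * storage field (0x1 and 0x2), so the target setters must shift them. */
#define DEMO_DOMAIN_BIT   0x1u
#define DEMO_PREC_BIT     0x2u
#define DEMO_TARGET_SHIFT 4

typedef uint32_t demo_info_t;

static inline demo_info_t demo_set_target_domain(demo_info_t info, uint32_t dom)
{
    const uint32_t mask = DEMO_DOMAIN_BIT << DEMO_TARGET_SHIFT;
    /* Clear the old field, then OR in the value shifted to the field. */
    return (info & ~mask) | ((dom << DEMO_TARGET_SHIFT) & mask);
}

static inline demo_info_t demo_set_target_prec(demo_info_t info, uint32_t prec)
{
    const uint32_t mask = DEMO_PREC_BIT << DEMO_TARGET_SHIFT;
    return (info & ~mask) | ((prec << DEMO_TARGET_SHIFT) & mask);
}

static inline uint32_t demo_target_domain(demo_info_t info)
{
    return (info >> DEMO_TARGET_SHIFT) & DEMO_DOMAIN_BIT;
}

static inline uint32_t demo_target_prec(demo_info_t info)
{
    return (info >> DEMO_TARGET_SHIFT) & DEMO_PREC_BIT;
}
```

With a wrong shift, the setter would either write into a neighboring field or be masked to zero, and both failures are silent. Under this layout, a round trip through the matching getter is a cheap way to catch them.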

commit e006f2d0eeb229c1cd05a424496a774c29bdc5d7
Merge: bd8c55fe dafca7a0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 27 15:54:38 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit bd8c55fe268e8e352508341ebd739ef4fc68eb92
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 27 15:52:37 2018 -0500

Added dt_on_output field to auxinfo_t.

Details:
- Added a new field to the auxinfo_t struct that can be used, in theory,
to request type conversion before the microkernel stores/accumulates
its microtile back to memory.
- Added the appropriate get/set static functions to bli_type_defs.h.

commit dafca7a0c2c72aaf15cb588b2bef6f246abb1905
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Jun 25 16:20:10 2018 -0500

Fix botched memory addressing in Penryn kernel (no effect for GAS output).

commit de493b0f349efebab98ab17f063d4d3d932c24c3
Merge: 195480be a7166feb
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Jun 25 14:26:06 2018 -0500

Merge pull request 226 from devinamatthews/dev

Finish macroization of assembly ukernels.

commit 195480beb589db7d582646f556e855c611d4c3a9
Merge: 07c3d0a9 3f387ca3
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 25 13:24:21 2018 -0500

Merge branch 'master' into dev

commit 3f387ca35e42519f0d6a154814e4c8800fa2acb8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 25 12:32:03 2018 -0500

Fixed bugs in configure's select_cc() function.

Details:
- This commit fixes several bugs in configure relating to selecting a C
compiler. By dumb luck, two of the bugs sort of cancelled each
other out in most use cases, which manifested as the expected behavior.
Thanks to Mathieu Poumeyrol for bringing this issue to our attention,
and to Devin Matthews for suggesting the more portable way of
capturing both stdout and stderr and suggesting a return code check
instead of testing stdout/stderr.
- The first bug: As the values of the compiler search list are iterated
over, only stderr is captured when querying a compiler with --version
rather than both stdout and stderr.
- The second bug: After each query, a conditional attempted to test
whether the query resulted in anything being output. That conditional
erroneously was using "-z" instead of "-n" for non-emptiness. Thus,
most of the time, stderr was empty (because the --version info was
being output on stdout), and since it was empty, the -z conditional
(intended to execute only when a compiler was found to be responsive)
executed.
- A third bug was also fixed in the way that the merged stdout/stderr
output was tested for non-emptiness (moving the 'cat' invocation to
another line and testing the contents of a variable instead).
- The three bugs above have been fixed as part of a partial rewrite of
the select_cc() function in terms of a return code check, which
obviated the need to save the output of stdout and stderr.
- The fourth bug involved a misnamed variable in the right-hand side
of a statement intended to prepend CC to search_list when CC was
non-empty. This typically did not manifest as a bug since usually CC
(if it was set) was set to a value that was known to work.

commit a7166feb1053814b7dd27f3879ae38acfc9637fc
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Mon Jun 25 12:09:18 2018 -0500

Finish macroization of assembly ukernels.

commit f986396c2af5de06283b9834112782afd0a8907e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 22 18:12:40 2018 -0500

Added 'configure --help' text for CFLAGS, LDFLAGS.

Details:
- Added mention of the new support for preset CFLAGS, LDFLAGS to the
bottom of the text output by './configure --help'.
- Updated usage example to use 'haswell' instead of 'sandybridge'.

commit 884175d9ffb62e49535e6c1f7d58fb3b83e7e78f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 22 18:08:43 2018 -0500

Added configure support for preset CFLAGS, LDFLAGS.

Details:
- Any preexisting values set to the CFLAGS environment variable (or the
CFLAGS variable if given on the command line) are saved by configure
for later inclusion (prepending, to be precise) along with the
compiler flags automatically determined by the BLIS build system.
(LDFLAGS is treated in a similar manner.) Thanks to Dave Love for
requesting this feature in issue 223 and Mathieu Poumeyrol for his
support on this and a previous related issue.
- Comment updates to build/config.mk.in.
- Strip whitespace from return value of various cflags functions in
common.mk.

commit 07c3d0a95190bd23f0cd2ef220deb3384d8378d1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 21 12:35:07 2018 -0500

Update to CREDITS file.

commit a1ebbbf158c7b34c9032ef45431bc610b6f14858
Merge: 17928b1c c81c6f23
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Jun 20 15:37:53 2018 -0500

Merge pull request 224 from devinamatthews/asm-macros

Asm macros

commit c81c6f23b9547b5d55ae68fd5a3bbd8a78290b6b
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Jun 20 15:20:44 2018 -0500

Fix problem with inc and dec macros.

commit 5a63971c822fd452f97ba869625c8e87f6cbeebc
Merge: b4d94e54 17928b1c
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Jun 20 14:07:49 2018 -0500

Merge remote-tracking branch 'upstream/dev' into asm-macros

commit b4d94e54d44cf30e4bb452ca5263be3473c0582d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Wed Jun 20 14:07:24 2018 -0500

Convert x86 microkernels to assembly macros.

commit 17928b1c9941aa58aef1f122c793e2b14e705267
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 19 17:59:03 2018 -0500

Added static funcs bli_dt_domain(), bli_dt_prec().

Details:
- Added definitions of static functions bli_dt_domain()/bli_dt_prec(),
which extract a dom_t domain or prec_t precision value, respectively,
from a num_t datatype.
- Changed the return types of bli_obj_domain() and bli_obj_prec() from
objbits_t to dom_t and prec_t. (Not sure why they were ever set to
return objbits_t.)

commit 5f7fbb7115b1bf532c169dfd9adef84c41a95031
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 19 15:38:55 2018 -0500

Static funcs for projecting dt to single/double.

Details:
- Added static functions for projecting a datatype to single precision
or double precision, both for obj_t's storage datatypes and standalone
datatypes.

commit d4a22702c7a90273dc14f271db465c2e11e5b87e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 19 14:54:57 2018 -0500

Set up haswell config for optional col-pref ukrs.

Details:
- Added two presently-disabled cpp blocks in bli_cntx_init_haswell.c to
easily allow one to switch to a set of column-preferential gemm
microkernels (in the haswell subconfiguration). The second column-
preferring block sets the register blocksizes to their appropriate
values. However, cache blocksizes are left unchanged, and therefore are
likely suboptimal. This should be addressed later.

commit f317c2e31bfc329cb6bb4e06005e45b9c8a9d6a7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 19 12:21:23 2018 -0500

Added get/set static funcs for exec dt/dom/prec.

Details:
- Added functions to bli_obj_macro_defs.h to get and set the target
domain and target precision bits in the obj_t, and also added the
appropriate support in bli_type_defs.h.

commit e88a5b8da8c26caebd2b0fb73b30836fb5417c9c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 18 15:56:26 2018 -0500

Implemented castm, castv operations.

Details:
- Implemented castm and castv operations, which behave like copym and
copyv except where the obj_t operands can be of different datatypes.
These new operations, however, unlike copym/copyv, do not build upon
existing level-1v kernels.
- Reorganized projm, projv into a 'proj' subdirectory of frame/base (to
match the newly added frame/base/cast directory).
- Added new macros to bli_gentfunc_macro_defs.h, _gentprot_macro_defs.h
that insert GENTFUNC2/GENTPROT2 macros for all non-homogeneous datatype
combinations. Previously, one had to invoke two additional macros--one
which mixed domains only and another that included all remaining
cases--in order to get full type combination coverage.
- Defined a new static function, bli_set_dims_incs_2m(), to aid in the
setting of various variables in the implementations of bli_??castm().
This static function joins others like it in bli_param_macro_defs.h.
- Comment update to bli_copysc.h.

commit 2000cdff59272974438e88e0e82d8e1a32710325
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 18 14:17:28 2018 -0500

Update to CREDITS file.

commit ed2c8aed848ba2dede18df090cf2e0b6e4cc059f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 18 11:49:34 2018 -0500

Temporarily disabled small matrix handling on zen.

Details:
- Disabled small matrix handling in config/zen/bli_family_zen.h due to
what appears to be a bug that manifests as failures in the single and
double precision real level-3 BLAS test drivers (visible via
out.sblat3 and out.dblat3). Thanks to Robin Christ for reporting this
issue.

commit ed20392c500940bfc0947795c1ff7c8c24f8e26f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 15 16:31:22 2018 -0500

Added get/set static funcs for exec dt/dom/prec.

Details:
- Added functions to bli_obj_macro_defs.h to get and set the execution
domain and execution precision bits in the obj_t.
- Added/rearranged a few functions in bli_obj_macro_defs.h.
- Renamed some macros in bli_type_defs.h: EXECUTION -> EXEC.

commit 22594e8e9ab55f5bc0e69d96a23e128502849999
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 14 17:35:23 2018 -0500

Updated sandbox/ref99 according to f97a86f.

Details:
- Applied changes to ref99 sandbox analogous to those applied to
framework code in f97a86f. This involves setting the pack schemas of
A and B objects temporarily to communicate those desired schemas to
the control tree creation function in blx_gemm_cntl.c. This allows us
to (henceforth) query the schemas from the control tree rather than
the context.

commit 1b5d0424d2c7e5eac33e02359c12917ef280949f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 13 18:41:32 2018 -0500

Prototype column-preferential zen gemm ukernels.

Details:
- Added prototypes to bli_kernels_zen.h for each of the four gemm
microkernels that prefer outputting to column storage.

commit f88c2e7a539e383297e846e6d4647058dd3db128
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 13 18:27:46 2018 -0500

Defined static function bli_blksz_scale_def_max().

Details:
- Added a new static function to bli_blksz.h that scales both the default
(regular) blocksize as well as the maximum blocksize in the blksz_t
object. Reminder: maximum blocksizes have different meanings in
different contexts. For register blocksizes, they refer to the packing
register blocksizes (PACKMR or PACKNR) while for cache blocksizes, they
refer to the maximum blocksize to use during the final iteration of a
loop.
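
As a rough illustration of scaling both values together, consider the sketch below. It is deliberately simplified: the struct holds a single default/maximum pair rather than per-datatype values, and the `demo_` names and the num/den argument order are assumptions rather than the actual signature in bli_blksz.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified blocksize object (one datatype only). */
typedef struct {
    long def;   /* default (regular) blocksize */
    long max;   /* maximum blocksize */
} demo_blksz_t;

/* Scale both the default and the maximum blocksize by num/den,
 * using integer arithmetic (the division truncates). */
void demo_blksz_scale_def_max(long num, long den, demo_blksz_t *b)
{
    assert(b != NULL && den != 0);
    b->def = (b->def * num) / den;
    b->max = (b->max * num) / den;
}
```

Scaling the pair in one call keeps the default and maximum values in the same ratio, which separate calls could accidentally break.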

commit 87db5c048e0c7f37351fda486abaf7d19fc5821c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 12 19:38:37 2018 -0500

Changed usage of virtual microkernel slots in cntx.

Details:
- Changed the way virtual microkernels are handled in the context.
Previously, there were query routines such as bli_cntx_get_l3_ukr_dt()
which returned the native ukernel for a datatype if the method was
equal to BLIS_NAT, or the virtual ukernel for that datatype if the
method was some other value. Going forward, the context native and
virtual ukernel slots will both be initialized to native ukernel
function pointers for native execution, and for non-native execution
the virtual ukernel pointer will be something else. This allows us
to always query the virtual ukernel slot (from within, say, the
macrokernel) without needing any logic in the query routine to decide
which function pointer (native or virtual) to return. (Essentially,
the logic has been shifted to init-time instead of compute-time.)
This scheme will also allow generalized virtual ukernels as a way
to insert extra logic in between the macrokernel and the native
microkernel.
- Initialize native contexts (in bli_cntx_ref.c) with native ukernel
function addresses stored to the virtual ukernel slots pursuant to
the above policy change.
- Renamed all static functions that were native/virtual-ambiguous, such
as bli_cntx_get_l3_ukr_dt() or bli_cntx_l3_ukr_prefers_cols_dt()
pursuant to the above policy change. Those routines now use the
substring "get_l3_vir_ukr" in their name instead of "get_l3_ukr". All
of these functions were static functions defined in bli_cntx.h, and
most uses were in level-3 front-ends and macrokernels.
- Deprecated anti_pref bool_t in context, along with related functions
such as bli_cntx_l3_ukr_eff_dislikes_storage_of(), now that 1m's
panel-block execution is disabled.
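
The shift of logic from compute-time to init-time described in this commit can be sketched as follows. The struct, enum, and function names are hypothetical stand-ins (all prefixed `demo_`), and the integer "microkernels" only exist to make the dispatch observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical microkernel type; real ukernels take many more arguments. */
typedef int (*demo_ukr_fn)(int);

typedef enum { DEMO_NAT, DEMO_IND } demo_method_t;

typedef struct {
    demo_ukr_fn nat_ukr;   /* native microkernel slot */
    demo_ukr_fn vir_ukr;   /* virtual microkernel slot */
} demo_cntx_t;

int demo_native_ukr(int x)  { return x + 1; }

/* A "virtual" kernel that adds logic around the native one. */
int demo_induced_ukr(int x) { return 2 * demo_native_ukr(x); }

/* Init-time: decide once what the virtual slot should point to.
 * For native execution both slots hold the native kernel. */
void demo_cntx_init(demo_cntx_t *cntx, demo_method_t method)
{
    assert(cntx != NULL);
    cntx->nat_ukr = demo_native_ukr;
    cntx->vir_ukr = (method == DEMO_NAT) ? demo_native_ukr
                                         : demo_induced_ukr;
}

/* Compute-time: always query the virtual slot, with no branching
 * on the execution method. */
int demo_macrokernel(const demo_cntx_t *cntx, int x)
{
    return cntx->vir_ukr(x);
}
```

Because the macrokernel never inspects the method, any wrapper placed in the virtual slot at initialization is picked up without changing the compute path.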

commit dbaf440540837b03643190cd685ed889fa7fd212
Merge: 22aa44eb 2610fff0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 11 12:37:04 2018 -0500

Merge branch 'master' into dev

commit 2610fff0b07bdb345cb2e334ef6bea0c63c8cead
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 11 12:32:54 2018 -0500

Renamed 1m packm kernels from _1e to _1er.

Details:
- Renamed the reference packm kernels used by 1m. Previously, they used
a _1e suffix, which was confusing since they packed to both 1e and 1r
schemas. This was likely an artifact of the time when there were
separate kernels for each schema before I decided to combine them into
a single function (per datatype and panel dimension), and the 1e
functions were the ones to inherit the 1r functionality. The kernels
have now been renamed to use a _1er suffix.

commit 7af5283dcc3dded114852d6013d33134021b81aa
Author: sraut <Biplab.Rautamd.com>
Date: Mon Jun 11 15:00:22 2018 +0530

added check condition on n-dimension for XA'=B intrinsic code to process till 128 size

Change-Id: I95d020a5ca3ea21d446b8c2e379d56e1eea18530

commit 712de9b371a8727682352a2f52cd4880de905f0b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 9 14:36:30 2018 -0500

Added missing semicolon in 03obj_view.c

Details:
- Thanks to Tony Skjellum for pointing out this typo due to a
last-minute change to the source prior to committing.

commit 043d0cd37ef4a27b1901eeb89d40083cfb2a57ba
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 9 13:46:49 2018 -0500

Implemented bli_acquire_mpart(), added example code.

Details:
- Implemented bli_acquire_mpart(), a general-purpose submatrix view
function that will alias an obj_t to be a submatrix "view" of an
existing obj_t.
- Renumbered examples in examples/oapi and inserted a new example file,
03obj_view.c, which shows how to use bli_acquire_mpart() to obtain
submatrix views of existing objects, which can then be used to
indirectly modify the parent object.
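
The idea of a submatrix "view" that aliases its parent can be sketched with a small strided-matrix struct. This is not the obj_t implementation or the bli_acquire_mpart() signature; the `demo_` names, the struct fields, and the argument order are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matrix descriptor: a buffer plus dimensions and strides. */
typedef struct {
    double *buf;      /* address of element (0,0) */
    size_t  m, n;     /* rows, columns */
    size_t  rs, cs;   /* row stride, column stride */
} demo_mat_t;

/* Return a view of the m-by-n submatrix whose top-left element is (i,j)
 * of the parent. No data is copied; the view shares the parent's buffer. */
demo_mat_t demo_view(const demo_mat_t *parent,
                     size_t i, size_t j, size_t m, size_t n)
{
    assert(i + m <= parent->m && j + n <= parent->n);
    demo_mat_t v = *parent;                       /* inherit strides */
    v.buf = parent->buf + i * parent->rs + j * parent->cs;
    v.m = m;
    v.n = n;
    return v;
}

/* Address of element (i,j) of a matrix or view. */
double *demo_elem(const demo_mat_t *a, size_t i, size_t j)
{
    assert(i < a->m && j < a->n);
    return a->buf + i * a->rs + j * a->cs;
}
```

Writing through the view changes the parent's storage, since only the starting address and the dimensions differ while the strides are inherited.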

commit f1908d39767baef56077def69126d96f805ee27e
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 8 14:22:22 2018 -0500

Fixed broken input.operations.fast.

Details:
- Removed three input lines from input.operations.fast (labeled
"test sequential micro-kernel") that I intended to remove in bd02c4e.
These lines prevented 'make check' (and 'make checkblis-fast') from
completing correctly. Note: This bug was fixed in 3df39b3, but that
commit has not yet been merged into master, hence this redundant
commit. Thanks to Robert van de Geijn for reporting this issue.

commit 262a62e3482c5caa947a89cabb562b5887555bd6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 8 12:10:54 2018 -0500

Fixed undefined ref in steamroller/excavator configs.

Details:
- Fixed erroneous calls to bli_cntx_init_piledriver_ref() in
bli_cntx_init_steamroller() and bli_cntx_init_excavator(), which
should have been to their respectively-named bli_cntx_init_*()
functions instead. Thanks to qnerd for bringing these bugs to our
attention.

commit 22aa44ebec2c7884bdc944775a1aa7534ab53f0d
Merge: 65fae950 b65d0b84
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 17:42:59 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit 65fae95074d239354737355bbe6f202d4f8b2871
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 17:41:09 2018 -0500

Implemented bli_setrm, _setim, _setrv, _setiv.

Details:
- Defined new wrappers to setm/setv operations in frame/base/bli_setri.c
that will target only the real or only the imaginary parts of a
matrix/vector object.
- Updated bli_obj_real_part() so that the complex-specific portions of
the function are not executed if the object is real.
- Defined bli_obj_imag_part().
- Caveat: If bli_obj_imag_part() is called on a real object, it does
nothing, leaving the destination object untouched. The caller must
take care to only call the function on complex objects.
- Reordered some of the static functions in bli_obj_macro_defs.h related
to aliasing.

commit b65d0b841b7e4357bc2cf743bbb03384a3ab0bfa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 14:38:41 2018 -0500

Fixed bug in bli_dt_proj_to_complex().

Details:
- Fixed a bug identical to the one fixed in 0a4a27e, except this time in
the bli_obj_param_defs.h header file. It looks like the only consumers
of this static function were in bli_l0_oapi.c, and so this may not have
been manifesting (yet).

commit 55b6abdf7458e31df3ad01796d67c2332c776948
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 14:08:12 2018 -0500

Enforce consistent datatypes in most object APIs.

Details:
- Added logic to level-1v, -1d, -1f, -1m, -2, and -3 operations' _check()
functions to ensure that all operands are of the same datatype. There
are some exceptions that were left out, such as the _check() function
for the various norm operations since they have a different idea of
datatype consistency (ie: the norm object must be the real projection
of the primary input vector/matrix object).

commit 513138b1a1ecebd015580423c779810cae5c67f2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu Jun 7 12:24:47 2018 -0500

Defined/implemented bli_projv().

Details:
- Added an implementation for bli_projv() to go along with the
implementation of bli_projm() added in 0a4a27e. The only difference
between the two is that bli_projv() may only be used on vectors,
whereas bli_projm() is general-purpose.
- Added a _check() function corresponding to bli_projv().

commit 5f71c1e719eb482b2a4e40daa280c4f7d05b6963
Merge: b5a641e9 3df39b37
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 6 19:06:14 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit b5a641e968469805906eb2c971384d12ad1beac5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 6 19:05:37 2018 -0500

Added char-to-dt and dt-to-char mapping functions.

Details:
- Defined additional functions in bli_param_map.c:
bli_param_map_char_to_blis_dt()
bli_param_map_blis_to_char_dt()
which will map a char to its corresponding num_t, or vice versa.

commit 0a4a27e1a4487480410bc0b1bb034bcf97583214
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 6 19:02:29 2018 -0500

Defined/implemented bli_projm().

Details:
- Defined a new operation in frame/base/bli_proj.c, bli_projm(), which
behaves like bli_copym(), except that operands a and b are allowed to
contain data of differing domains (e.g. a is real while b is complex,
or vice versa). The file is named bli_proj.c, rather than bli_projm.c,
with the intention that a 'v' vector version of the function may be
added to the same file (at some point in the future).
- Added supporting bli_check_*() functions in bli_check.c to confirm
consistent precisions between two datatypes/objects, as well as the
appropriate error message in bli_error.c and a new error code in
bli_type_defs.h.
- Wrote a bli_projm_check() function to go along with bli_projm().
- Defined static function bli_obj_real_part() in bli_obj_macro_defs.h,
which will initialize an obj_t alias to the real part of the source
object.
- Fixed a bug in the static function bli_dt_proj_to_complex(), found
in bli_param_macro_defs.h. Thankfully, there were no calls to the
function to produce buggy behavior.
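
A copy between operands of differing domains can be sketched on plain arrays as below. The sketch assumes that a real-to-complex copy sets the imaginary parts to zero and that a complex-to-real copy keeps only the real parts. The `demo_` function names are hypothetical, and the vectors are contiguous and double precision for brevity.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Real -> complex: copy into the real parts; imaginary parts become zero. */
void demo_proj_r2c(size_t n, const double *a, double complex *b)
{
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; ++i)
        b[i] = a[i];               /* real-to-complex conversion adds +0i */
}

/* Complex -> real: keep the real parts, discard the imaginary parts. */
void demo_proj_c2r(size_t n, const double complex *a, double *b)
{
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; ++i)
        b[i] = (double)a[i];       /* complex-to-real conversion drops imag */
}
```

Both directions keep the same precision (double here), matching the requirement above that the operands' precisions agree while their domains may differ.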

commit 3df39b37a0134befa34b6b6259db98467c7bc965
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed Jun 6 15:35:05 2018 -0500

Fixed recently broken input.operations.fast.

Details:
- Removed "test sequential front-end" lines from microkernel test
entries of input.operations.fast. This change was meant for inclusion
in bd02c4e but was missed due to slightly different wording of the
comment (I used "sed //d" to remove the lines). This fixes the broken
'make checkblis-fast' (and 'make check') targets.

commit 695cd520e2f5eab938f66afe9fe36201ab2700c5
Author: sraut <Biplab.Rautamd.com>
Date: Wed Jun 6 11:48:56 2018 +0530

AMD Copyright information changed to 2018

Change-Id: Idfd11afd5d252f8063d0158680d24bf7e2854469

commit df1dd24fd896821de60917b429f303bab7fd0d4b
Author: sraut <Biplab.Rautamd.com>
Date: Wed Jun 6 11:24:33 2018 +0530

small matrix trsm intrinsics optimization code for AX=B and XA'=B

Change-Id: I90123c4d9adbd314c867995cd19dc975150b448c

commit 3f48c38164b4135515b5c752c506fdccc4480be2
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 5 16:52:35 2018 -0500

Cosmetic fix to configure output in config.mk.

Details:
- Fixed configure so that MK_ENABLE_MEMKIND is assigned "no" when the
option is disabled due to libmemkind not being present. This wasn't
affecting anything since the one use of the variable (in common.mk)
was formulated as "ifeq ($(MK_ENABLE_MEMKIND),yes)". That is, the
variable being empty was effectively equivalent to it being set to
"no".
- Comment updates to build/config.mk.in, common.mk.

commit 5df201260f64aa98a365931f6d2da70144d69932
Merge: 1b9af85e 96d2774b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 5 16:14:19 2018 -0500

Merge branch 'master' into dev

commit 1b9af85ec98d91bb2b27aadaa3df344d18faff35
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue Jun 5 16:07:13 2018 -0500

Updated ref99 call to _cntx_set_thrloop_from_env().

Details:
- Reordered the arguments in the ref99 sandbox's call to
bli_cntx_set_thrloop_from_env() to be consistent with the updated
function signature from f97a86f. Thanks to Devangi Parikh for
reporting this issue.

commit 96d2774b4cb44ff1e8b5798d7cfc83154a607624
Author: Tyler Michael Smith <tmscs.utexas.edu>
Date: Tue Jun 5 14:17:39 2018 +0200

Make bli_auxinfo_next_b() return b_next, not a_next (216)
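
The fix above concerns a slip in a pair of near-identical accessors. A minimal sketch of the pattern follows. It uses a hypothetical `sk_auxinfo_t` struct, not the real BLIS `auxinfo_t` definition; only the field names `a_next` and `b_next` come from this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an auxinfo-like struct. The layout is an
   assumption made for illustration. */
typedef struct
{
    const void* a_next; /* address of the next micropanel of A */
    const void* b_next; /* address of the next micropanel of B */
} sk_auxinfo_t;

static inline const void* sk_auxinfo_next_a( const sk_auxinfo_t* ai )
{
    assert( ai != NULL );
    return ai->a_next;
}

/* The accessor for B must read b_next. Returning a_next here is the kind
   of bug the commit title describes. */
static inline const void* sk_auxinfo_next_b( const sk_auxinfo_t* ai )
{
    assert( ai != NULL );
    return ai->b_next;
}
```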

commit d4c24ea5f644eb635046e7fe249d3e8e58b4c98a
Author: sraut <biplab.rautamd.com>
Date: Tue Jun 5 15:42:59 2018 +0530

copyright message changed to 2018

Change-Id: I33c1ebda41bc7f1973ff19e3b1947bdad62b4d44

commit 3f1ba4e646776699ebfaa042fe24691d9e2f55d0
Author: sraut <biplab.rautamd.com>
Date: Tue Jun 5 14:21:13 2018 +0530

copyright changed to 2018

Change-Id: Ie916c7cd6f95aedc3cab6eec3a703c9ddb333bc3

commit bd02c4e9f7fe07487276e61507335d48c8e05f35
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Jun 4 13:42:17 2018 -0500

Cleanups to testsuite, input.operations format.

Details:
- Removed the line in each operation entry in input.operations titled
"test sequential front-end" and the corresponding support for the lines
in the testsuite input parsing code. This line was included in some
of the earliest versions of the testsuite, back when I intended to
eventually have separate multithreaded APIs. Specifically, I envisioned
that multithreaded and sequential testing could be enabled or disabled
on an operation level. However, BLIS evolved in a different direction
and still does not have multithreaded-specific APIs (even if it
eventually will). But even if it did have such APIs, I doubt I would
allow the user to enable/disable them on an operation level. Thus, this
was a zombie future parameter that was never used and never made sense
to begin with. The one remaining instance of the front_seq variable,
used in the various libblis_test_<operation>() functions to guard the
call to the operation test driver, was commented out instead of deleted
so that it could someday be easily changed via sed, if desired.
- Various minor cleanups to the testsuite code, including consolidating
use of DISABLE and DISABLE_ALL and reexpressing certain conditional
expressions in the libblis_test_<operation>() functions in terms of
boolean functions.

commit 2c6d99b99e50d70f904da298a0c59be16cc5c180
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jun 3 18:13:36 2018 -0500

Fixed names out of alphabetical order in CREDITS.

commit 7a207e8f2c5046f8b295a78e029ff2de765c7409
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jun 3 18:04:27 2018 -0500

Disabled indirect blacklisting (issue 214).

Details:
- Return early from function, pass_config_kernel_registries(), that
implements indirect blacklisting of subconfigurations (during pass 0).
In short, I realized that indirect blacklisting is not needed in the
situations I envisioned, and can actually cause problems under certain
circumstances. Thanks to Tony Skjellum for reporting the issue (214)
that led to this commit, and to Devin Matthews for prompting me to
realize that indirect blacklisting was unnecessary, at least as
originally envisioned.

commit d7fb32682057c7458c8891c0eedafc374fd9beef
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun Jun 3 13:20:37 2018 -0500

Fixed syntax artifacts from 4b36e85 in examples.

Details:
- Fixed artifacts of malformed recursive sed expressions used when
preparing 4b36e85, in which most function-like macros were converted
to static functions. The syntactically defective code was contained
entirely in examples/oapi. Thanks to Tony Skjellum for reporting this
issue.
- Update to CREDITS file.

commit ed7dedfd4a07eefeb5a038f9899afb8053b45383
Merge: f97a86f3 469727d4
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 2 20:29:53 2018 -0500

Merge branch 'master' into dev

commit f97a86f322a6e3e31f33c89befc66189b0b8c64f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Jun 2 20:28:20 2018 -0500

Updated setting/querying pack schema (cntx->cntl).

- Query pack schemas in level-3 bli_*_front() functions and store those
values in the schema bitfields of the corresponding obj_t's when the
cntx's method is not BLIS_NAT. (When method is BLIS_NAT, the default
native schemas are stored to the obj_t's.)
- In bli_l3_cntl_create_if(), query the schemas stored to the obj_t's in
bli_*_front(), clear the schema bitfields, and pass the queried values
into bli_gemm_cntl_create() and bli_trsm_cntl_create().
- Updated APIs for bli_gemm_cntl_create() and bli_trsm_cntl_create() to
take schemas for A and B, and use these values to initialize the
appropriate control tree nodes. (Also cpp-disabled the panel-block cntl
tree creation variant, bli_gemmpb_cntl_create(), as it has not been
employed by BLIS in quite some time.)
- Simplified querying of schema in bli_packm_init() thanks to above
changes.
- Updated openmp and pthreads definitions of bli_l3_thread_decorator()
so that thread-local aliases of matrix operands are guaranteed, even
if aliasing is disabled within the internal back-end functions (e.g.
bli_gemm_int.c). Also added a comment to bli_thrcomm_single.c
explaining why the extra aliasing is not needed there.
- Change bli_gemm() and level-3 friends so that the operation's ind()
function is called only if all matrix operands have the same datatype,
and only if that datatype is complex. The former condition is needed
in preparation for work related to mixed domain operands, while the
latter helps with readability, especially for those who don't want to
venture into frame/ind.
- Reshuffled arguments in bli_cntx_set_thrloop_from_env() to be
consistent with BLIS calling conventions (modified argument(s) are
last), and updated all invocations in the level-3 _front() functions.
- Comment updates to bli_cntx_set_thrloop_from_env().
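
The condition described above for calling an operation's ind() function (all operands share one datatype, and that datatype is complex) can be sketched as a small predicate. The enum and function names below are hypothetical; BLIS uses its own datatype enumeration and query functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical datatype tags used only for this sketch. */
typedef enum { SK_FLOAT, SK_DOUBLE, SK_SCOMPLEX, SK_DCOMPLEX } sk_num_t;

static inline bool sk_is_complex( sk_num_t dt )
{
    assert( dt >= SK_FLOAT && dt <= SK_DCOMPLEX );
    return dt == SK_SCOMPLEX || dt == SK_DCOMPLEX;
}

/* True only when A, B, and C share one datatype and that datatype is
   complex. In every other case the caller would skip the ind() path. */
static inline bool sk_should_call_ind( sk_num_t a, sk_num_t b, sk_num_t c )
{
    return a == b && b == c && sk_is_complex( c );
}
```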

commit 965db85d29977d228ea744581edf2b682eb8e8a8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri Jun 1 12:32:15 2018 -0500

Updated macro invocations in bli_gemm_ker_var2.c.

Details:
- Updated "get next a/b micropanel" macro invocations in
bli_gemm_ker_var2.c according to changes in 9588625.
- Comment update in bli_cntx.c.

commit 8749fa0b48a7710f4115023e2c46bc80167bc8f9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 31 12:34:01 2018 -0500

Cleanups to ref99/README.md, test/3m4m/Makefile.

Details:
- Minor edits to sandbox/ref99/README.md.
- Removed cpp guards in sandbox/ref99/thread/blx_gemm_thread.h to be
consistent with other headers in sandbox/ref99.
- Additional targets and related cleanups in test/3m4m/Makefile.

commit 9588625c43c86ef1bde8140f620a30f52420e6a6
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 30 15:19:53 2018 -0500

Renamed "next micropanel" macros in _l3_thrinfo.h.

Details:
- Renamed several macros defined in bli_l3_thrinfo.h designed to compute
the values of a_next and b_next to insert into an auxinfo_t struct in
level-3 macrokernels. (Previously, the macros did not use a bli_
prefix.)
- Updated instances of above macro usage within various macrokernels.

commit e4420591225fca2f63ca74ef6a23b962fcd4bec0
Merge: 34f974d1 850a8a46
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 29 17:12:22 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit 34f974d1a83a7d29ba09f67e392d361231fdf99c
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 29 17:11:52 2018 -0500

More tweaks/updates to sandbox/ref99/README.md.

commit 850a8a46c0a569a2652d8c200e5c53b61bcf988d
Author: Devin Matthews <dmatthewsutexas.edu>
Date: Tue May 29 13:51:21 2018 -0500

Test all x86_64 configurations*... (212)

* Add custom SDE cpuid files.

* Set up testing of all x86_64 architectures (except bulldozer) using SDE.

* Update .travis.yml

[ci skip]

* Update do_testsuite.sh

[ci skip]

* Updated .travis.yml with my secret token.

Details:
- Replaced Devin's temporary secret token with my own, which is used by
Travis when accessing the Intel SDE via Dropbox.

* Work around CPUID dispatch in glibc/libm by patching ld.so.

* Detect path of loader at runtime.

* Attempt to make SDE run on Travis

* Allow unpatched ld.so if we don't know how to patch it.

I *think* this only happens for older glibc without the multi-arch stuff (e.g. Ubuntu 14.04 on Travis), but who knows?

* Upgrade Travis to gcc-6 and binutils-2.26.

* Try to get Travis to use the right assembler.

* Apparently you need ld-2.26 too.

* Try to also patch ld.so from Ubuntu 14.04.

* Take the nuclear option.

* Account for non-absolute dependencies in ldd output.

* String manipulation fail.

* Update patch-ld-so.py

* Add Zen to SDE testing.

* Removed dead variable from travis/do_testsuite.sh.

Details:
- Removed 'BLIS_ENABLE_TEST_OUTPUT=yes' from make invocations in
travis/do_testsuite.sh. This variable is no longer present in the
BLIS build system (if it ever was?), and therefore has no effect.

commit 42ea02a34e5c144893fe239ae55daef895d92677
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 29 12:48:14 2018 -0500

Renamed c99 sandbox to ref99.

Details:
- Renamed sandbox/c99 to sandbox/ref99. I wanted to name the sandbox so
that it would be thought of as a "reference" sandbox. I kept the "99"
to differentiate it from future reference sandboxes that may be
written in another language (such as C++).
- Updates to sandbox/ref99/README.md.

commit 0e7205ccef50dccd4306cf427a63633396472813
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 29 12:36:13 2018 -0500

Remove sandbox/.gitkeep now that dir is non-empty.

commit 3a4603858e3819cbd6ed7dd67d0fc0b3f89ed254
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat May 26 15:51:08 2018 -0500

More README.md updates to sandbox/c99.

Details:
- Added a section that walks the reader through how to configure BLIS to
use a gemm sandbox.

commit 2bad97f6bdf4642884d60fc03970549902a54d74
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat May 26 15:31:16 2018 -0500

Updates to CREDITS, sandbox/c99/README.md.

commit 2b4a447526effa3e847a7e5c15c3758573f12318
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 25 18:51:23 2018 -0500

Initial implementation of c99 "reference" sandbox.

Details:
- Added a c99 sandbox (in sandbox/c99) to serve as a starting point for
others looking to experiment with alternative implementations of gemm
in BLIS. Note that this sandbox implementation is a first draft and
will be refined over time.
- Minor updates to Makefile and common.mk to restrict what source files
get recompiled when sandbox files are touched.
- Added an initial draft of a README.md in sandbox/c99.

commit 469727d4f8a976d8713afb4d0b6235c322498db0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 25 16:17:13 2018 -0500

Very minor comment updates.

commit 66dbe69a0f9359bf1e39b5672ee365213de2e3ee
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 25 15:45:53 2018 -0500

Converted macros to static funcs in _packm_cntl.h.

Details:
- Converted various macros in frame/1m/packm/bli_packm_cntl.h (designed
to access fields of a packm_params_t struct) to static functions.

commit 22deef2f5463a47e3b3c37fc313d17550f10ee06
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 24 14:28:55 2018 -0500

Support alternative gemm implementation sandboxes.

Details:
- configure:
  - add support for --enable-sandbox=NAME to configure script, where NAME
    is a subdirectory of a new 'sandbox' directory that contains an
    alternative implementation of gemm. (For now, only implementations of
    gemm may be provided via a sandbox.);
  - add support for C++ compiler. C++ compilers are handled in a manner
    similar to that of C compilers, in that a default search order is
    used, and that CXX is searched for first, if the variable is set. In
    practice, the C++ compiler that is selected should correspond to the
    selected C compiler. (Example: If gcc is selected for C, g++ should
    be selected for C++.) The result of the search is output to config.mk
    via build/config.mk.in. NOTE: The use of C++ in BLIS is still
    hypothetical, but may eventually move to being experimental. This
    support was intended only for use of C++ within a gemm sandbox.
- build/config.mk.in:
  - define SANDBOX variable containing sandbox subdirectory name.
- build/bli_config.h.in:
  - define either of the BLIS_ENABLE_SANDBOX or BLIS_DISABLE_SANDBOX
    macros in bli_config.h.
- common.mk:
  - include makefile fragments that were propagated into the specified
    sandbox subdirectory;
  - generate different CFLAGS for sandboxes, as well as a separate
    CXXFLAGS variable for sandboxes when C++ source files are compiled;
  - isolate into a single location lists of file suffixes for various
    purposes;
  - reorganize/clean up code related to identifying header files and
    paths.
- Makefile:
  - generate object filepaths for and compile source code files found in
    sandbox sub-directory;
  - remove makefile fragments placed in sandbox sub-directory (cleanmk);
  - various other cleanups.
- Added .cc, .cpp, and .cxx to list of suffixes of files to recognize in
makefile fragments (via build/gen-make-frags/suffix_list).
- Updated blis.h to conditionally include bli_sandbox.h (via a new file,
bli_sbox.h), which each sandbox is assumed to use for any type
definitions and function prototypes it wishes to export out to blis.h.
- Conditionally disable bli_gemmnat() implementation in frame/3 when
BLIS_ENABLE_SANDBOX is defined.

commit 25e3501ed57a0db7f860c88b7199b36049aec12a
Merge: 216a4cb9 5140ee34
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 24 13:57:16 2018 -0500

Merge branch 'master' into dev

commit 5140ee3424c744981a3fed3b5a748ebbfc111388
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 23 16:56:14 2018 -0500

Updated types of bli_is_[un]aligned_to() functions.

Details:
- Changed the void* arguments of the following static functions:
bli_is_aligned_to()
bli_is_unaligned_to()
bli_offset_past_alignment()
to siz_t, and the return type of bli_offset_past_alignment() from
guint_t to siz_t. This allows for more versatile usage of these
functions (e.g. when aligning both pointers and leading dimension).
- Updated all invocations of these functions, mostly in kernels/penryn
but also in kernels/bgq, to include explicit typecasts to siz_t when
pointer arguments are passed in.
- Thanks to Devin Matthews for pointing out this potential bug (via issue
211).
- Deleted a few trailing spaces in various penryn kernels.
- Removed duplicate instances of the words "derived" and "THEORY" from
various kernel license headers, likely from a malformed recursive sed
performed long ago.
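
The type change described in the first bullet above can be illustrated with a short sketch. It assumes `size_t` as a stand-in for BLIS's `siz_t`, and the `sk_`-prefixed names are hypothetical, not the actual BLIS definitions. Taking an integer rather than a `void*` lets the same helpers check both addresses and byte counts such as a leading dimension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: size_t plays the role of siz_t in this sketch. */
typedef size_t sk_siz_t;

/* Number of bytes by which p lies past the previous multiple of align. */
static inline sk_siz_t sk_offset_past_alignment( sk_siz_t p, sk_siz_t align )
{
    assert( align != 0 );
    return p % align;
}

static inline bool sk_is_aligned_to( sk_siz_t p, sk_siz_t align )
{
    return sk_offset_past_alignment( p, align ) == 0;
}

static inline bool sk_is_unaligned_to( sk_siz_t p, sk_siz_t align )
{
    return !sk_is_aligned_to( p, align );
}
```

With this signature a pointer argument needs an explicit typecast at the call site, for example `sk_is_aligned_to( ( sk_siz_t )ptr, 16 )`, while a leading dimension in bytes can be passed directly.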

commit 216a4cb9cb87fa4c93f6ceb6ae90602e5018b305
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 18 18:47:03 2018 -0500

Minor update to flatten-headers.[py|sh] help text.

Details:
- Fixed a typo and removed some outdated language from the help text of
flatten-headers.py and flatten-headers.sh.

commit 962a706a6f56ea070ac4683f0af69c7e59af8ecb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 18 18:19:40 2018 -0500

Updated LICENSE file to mention HP Enterprise.

Details:
- Added HP Enterprise to the LICENSE file. Previously, only the source
files touched by HPE contained the corresponding copyright notices.
(This oversight was unintentional.)
- Updated file-level copyright notices to include a comma, to match
the formatting used for UT and AMD copyrights.

commit efa43e13effe901ad31e734ac90f027e89473bd9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 18 12:20:40 2018 -0500

More updates to CREDITS and RELEASING files.

commit f94ab97af8e86baf9ee9a9cbaef8bb3712df2e11
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 17:45:31 2018 -0500

Update to CREDITS file.

commit 4919b10c005e006a6d818eb8f865f9dbd8aa16df
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 16:38:49 2018 -0500

Minor changes to README.md and CONTRIBUTING.md.

commit b89451187e8321b673a1cf7603c8d48028d9d4c8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 16:23:06 2018 -0500

README.md update.

Details:
- Added "Contributing" section with relevant links.

commit af244194e7d76276a1b90fe59f9307dde0429e1d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 15:38:02 2018 -0500

Removed explicit critical sec. from bli_memsys.c.

Details:
- Removed critical sections protecting the initialization/finalization of
bli_memsys.c. These synchronization mechanisms are no longer needed now
that BLIS initializes all APIs via pthread_once().

commit 10c9e8f95254d8c6436c4d3cb093fa5544b45c90
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 17 15:22:51 2018 -0500

Cache hardware's arch_t id after querying once.

Details:
- Added logic to bli_arch.c that will call what was previously the body
of bli_arch_query_id() only once and then cache the value in a static
variable local to the file. (Previously, the arch_t associated with
the hardware/configuration was queried every time bli_arch_query_id()
was called, which was at least once per level-3 function call.) Thanks
to Devin Matthews for suggesting this feature via issue 175.
- Added -lpthread to the compile/link command line of the compiler
invocation that compiles build/detect/config/config_detect.c, which
prints the string identifying the detected configuration, since it
is now needed due to new pthread_once() logic in bli_arch.c.
- Implementation note: I chose to implement this arch_t caching feature
via pthread_once(), using a separate pthread_once_t variable local to
the file, rather than calling bli_init_once(). The reason is that I
did not want to require bli_init() as a prerequisite to this function.
bli_init() already calls several sub-components, some of which make use
of bli_arch_query_id(), and therefore it would be easy to fall into a
circular self-init situation (which usually causes pthreads to hang
indefinitely).
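
The caching approach described above can be sketched as follows. This is an illustration under stated assumptions, not the code in bli_arch.c: `sk_arch_t`, the id value 7, and the query counter are all hypothetical. The calls to `pthread_once()` and the `PTHREAD_ONCE_INIT` initializer are standard POSIX.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical id type; the real arch_t is defined by BLIS. */
typedef int sk_arch_t;

/* File-local cache, guarded by its own pthread_once_t so that no
   library-wide initialization is required first. */
static sk_arch_t      sk_cached_id   = -1;
static pthread_once_t sk_arch_once   = PTHREAD_ONCE_INIT;
static int            sk_query_count = 0;

/* Stand-in for the expensive hardware query. It runs at most once. */
static void sk_arch_set_id_once( void )
{
    sk_query_count += 1;
    sk_cached_id = 7;
}

/* Every call after the first returns the cached value. */
static inline sk_arch_t sk_arch_query_id( void )
{
    int err = pthread_once( &sk_arch_once, sk_arch_set_id_once );
    assert( err == 0 );
    (void)err;
    return sk_cached_id;
}
```

Depending on the platform, linking may require the pthread library, which matches the -lpthread addition noted in the commit.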

commit f28a15293890ac6fbceac229fd204dbc9fec6e27
Author: Francisco Igual <figualucm.es>
Date: Thu May 17 09:26:14 2018 +0000

Fixed clobber list bug in ARMv8 ukernel

commit 2e31dd7852b4d6a9355899cf9659d4b8130461cb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 16 17:28:33 2018 -0500

Inserted missing integer typecasting into ukernels.

Details:
- Inserted missing safeguards into most microkernels to ensure that the
integers read by the microkernel's assembly instructions are of the
appropriate size. In many cases, this bug was going undetected likely
because the compiler was inserting zero padding before the integers
in the calling function, allowing the assembly code to read 64-bits
in a way that did not corrupt the "lower" 32 integer bits with garbage
in the higher bits. Thanks to Francisco Igual and Devangi Parikh for
finding this issue.
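
A minimal sketch of the safeguard described above, assuming a build in which the dimension type is 32 bits wide. The names `sk_dim_t` and `sk_ukr_prologue`, and the unroll factor of 4, are hypothetical. The point is only that values later read by 64-bit assembly loads are first copied into explicitly 64-bit variables.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 32-bit dimension type, as in a 32-bit-integer build. */
typedef int32_t sk_dim_t;

/* Hypothetical ukernel prologue. It splits k into an unrolled part and a
   remainder and stores both in 64-bit variables, so that assembly code
   loading 64 bits from them reads fully defined values instead of
   whatever happens to sit next to a 32-bit variable in memory. */
static inline void sk_ukr_prologue( sk_dim_t k, uint64_t* k_iter, uint64_t* k_left )
{
    assert( k >= 0 );
    *k_iter = ( uint64_t )k / 4;
    *k_left = ( uint64_t )k % 4;
}
```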

commit 12dfa9516428b4092554f0ce70b07571d35de222
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 16 12:46:57 2018 -0500

Fixed a bug in determining default integer size.

Details:
- Fixed a bug that would cause configurations to inadvertently define
their integers to be 32 bits when those environments actually call for
64-bit integers. While either BLIS_ARCH_64 or BLIS_ARCH_32 is defined
in bli_system.h (based on whether preprocessor macros such as __x86_64
or __aarch64__ are defined by the environment), bli_system.h was being
included *after* bli_config_macro_defs.h, in which the BLIS_ARCH_64
macro was used to choose an integer type size in the event that
BLIS_INT_TYPE_SIZE was not already defined by configure via
bli_config.h. And due to the structure of the cpp code in that file,
the 32-bit integer case was being chosen. Thanks to Francisco Igual
and Devangi Parikh for their help in isolating this bug.
- Moved the include of hbwmalloc.h and related preprocessor code to
bli_kernel_macro_defs.h to facilitate the reshuffling of the include
for bli_system.h in blis.h.
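
The ordering dependency described above can be reproduced in a few lines of preprocessor code. In this sketch the `SK_`-prefixed macros are hypothetical stand-ins for BLIS_ARCH_64, BLIS_ARCH_32, and BLIS_INT_TYPE_SIZE, and the two steps stand in for the two headers named in the commit.

```c
#include <assert.h>

/* Step 1 (stands in for bli_system.h): detect a 64-bit architecture from
   macros predefined by the environment. */
#if defined(__x86_64__) || defined(__x86_64) || defined(__aarch64__)
  #define SK_ARCH_64
#else
  #define SK_ARCH_32
#endif

/* Step 2 (stands in for bli_config_macro_defs.h): choose a default only
   if none was set earlier. If this step ran before step 1, SK_ARCH_64
   would not be defined yet and the 32-bit branch would be taken even on
   a 64-bit system. */
#ifndef SK_INT_TYPE_SIZE
  #ifdef SK_ARCH_64
    #define SK_INT_TYPE_SIZE 64
  #else
    #define SK_INT_TYPE_SIZE 32
  #endif
#endif

static inline int sk_int_type_size( void )
{
    assert( SK_INT_TYPE_SIZE == 32 || SK_INT_TYPE_SIZE == 64 );
    return SK_INT_TYPE_SIZE;
}
```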

commit f930cec0f35824c0f9ebbd218614209217d491cb
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 15 17:47:08 2018 -0500

More tweaks to CONTRIBUTING.md.

commit 173e30ff7d293ba31f3fab8ab0c0a695eda3d4fd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 15 14:48:34 2018 -0500

Added initial draft of CONTRIBUTING.md file.

Details:
- Thanks to the Ruby on Rails project for providing a good template off
of which to build.

commit 6e25e758b444bf725046674e1e64c6a52421749d
Author: Nico Schlömer <nico.schloemergmail.com>
Date: Tue May 15 14:03:20 2018 +0200

Debian config (206)

* add debian config

* correct wording in the README

commit fcf6c6a3c87da08a7cdb92b102489b991ef7a644
Author: Alex Arslan <ararslancomcast.net>
Date: Mon May 14 18:41:03 2018 -0700

Fix shared library builds on platforms other than Linux and macOS (209)

* Fix detection of systems other than Linux and macOS

The way the logic is currently laid out, any platform that isn't Linux
gets assigned the .dylib shared library extension and the macOS-specific
compiler flags. This reverses the logic to check for macOS first, and
have the fallback use the Linux definitions, which apply to most other
systems as well.

* Use SHLIB_EXT instead of SO_SUF

The former is more standard, as jakirkham pointed out in a comment.
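
The reversed check described above can be sketched as a small lookup. The build system expresses this logic in make; the C function below, with its hypothetical name `sk_shlib_ext`, only illustrates the order of the tests: macOS first, with the Linux convention as the fallback.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: map an OS name, as printed by 'uname -s', to a
   shared library extension. Only Darwin gets the macOS extension; every
   other system falls back to the Linux one. */
static inline const char* sk_shlib_ext( const char* os_name )
{
    assert( os_name != NULL );
    if ( strcmp( os_name, "Darwin" ) == 0 ) return "dylib";
    return "so";
}
```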

commit 6f7f51048c48f31d691c06451d0fd2cbc453ad03
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 18:41:56 2018 -0500

Echo cc_vendor when printing compiler version.

Details:
- Echo the ${cc_vendor} when informing the user of the compiler's version.
Previously, the actual ${cc} (which could be a path to the executable)
was being printed, which has already been printed by that point in the
configure script.

commit ad67dc4e348b0a381efc057573a6b03cc7e26db0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 18:35:28 2018 -0500

Communicate cc, cc_vendor to make via config.mk.

Details:
- Historically, the compiler selection has happened statically in the
various make_defs.mk and would only be overridden by setting CC (either
prior to running configure or as a configure argument). However, in
the last couple months, configure has evolved to contain rather
sophisticated compiler detection logic for the purposes of blacklisting
sub-configurations. It only makes sense that configure now fully take
over the responsibility of selecting a compiler from the GNU make side
of the build system. Thanks to Alex Arslan for his help exposing this
issue.
- Substitute found_cc into CC in config.mk via configure.
- Set a new variable, CC_VENDOR, in config.mk via substitution from
configure, and disable the corresponding CC_VENDOR code in common.mk.
- Disabled default compiler selection (usually gcc) in the sub-configs'
various make_defs.mk files.

commit 20af119fc97ec6120017a7a5ba5f9aaa920c7640
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 17:44:58 2018 -0500

Added README.md to 'config' directory.

Details:
- Added a brief README.md file to the config directory to redirect those
who may be exploring the source tree to the ConfigurationHowTo wiki.
(Included is a very brief explanation of configurations for those who
don't have time to read the wiki.) Thanks to Nico Schlömer for this
suggestion.

commit 9dbce16269c3e1f27c7a0d64372cc76aed30dfc1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 17:04:54 2018 -0500

Search for 'cc clang gcc' on OpenBSD, FreeBSD.

Details:
- Swapped gcc and clang in the compiler search list for OpenBSD.
- Use the same search list for FreeBSD as above.

commit 55ebf24d63128b5fd15b10160485667415a02a55
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 14 16:19:08 2018 -0500

Change compiler search order on OpenBSD.

Details:
- Set a compiler search list (and order) as a function of the OS detected
via 'uname -s'. By default, this list and order is 'gcc clang cc' for
Linux, Darwin (OS X), and any other OS except OpenBSD. On OpenBSD,
we use 'cc gcc clang' because OpenBSD's default installation of gcc
(4.2.1) is too old for BLIS. Thanks to Alex Arslan for reporting this
issue and suggesting a fix.
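
As a rough illustration of the selection described in this commit, the search list can be written as a function of the OS name. The real logic lives in the configure shell script; the function name `sk_cc_search_list` is hypothetical. The lists shown are the ones stated in this commit, which the later commit 9dbce16 changes to 'cc clang gcc' for both OpenBSD and FreeBSD.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the compiler search list, in order, for an
   OS name as printed by 'uname -s'. */
static inline const char* sk_cc_search_list( const char* os_name )
{
    assert( os_name != NULL );
    /* OpenBSD puts cc first, since its default gcc is too old for BLIS. */
    if ( strcmp( os_name, "OpenBSD" ) == 0 ) return "cc gcc clang";
    /* Linux, Darwin, and every other OS. */
    return "gcc clang cc";
}
```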

commit 4fb353bd90e6642c8aeffd1b1e6329f54eee4bb4
Merge: 4b36e85b 8a2857b5
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sun May 13 17:50:51 2018 -0500

Merge branch 'master' into dev

commit 8a2857b5e3c633b18c24f2275110437a702a71d0
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 11 18:42:05 2018 -0500

Fixed README.md typo; mention 'make check'.

commit 543935c02f9335142d2e485a15f37dbaebe012ed
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 11 18:35:32 2018 -0500

Updated README.md with Ubuntu packages link.

Details:
- Created a separate section of README.md for external packages, with
one bullet each for Dave Love's rpms and Nico Schlömer's Ubuntu apt
packages. Thanks to Dave and Nico for their contributions.

commit af1d8470b56d3b2a1c8513d366d788dddcb84baa
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Fri May 11 17:49:58 2018 -0500

Better handling of shared libraries on OS X.

Details:
- Use the .dylib shared library suffix on OS X (instead of .so in Linux).
- Link with the -dynamiclib and -install_name options on OS X (instead of
-shared and -soname in Linux).
- Determine operating system (e.g. Linux, Darwin) during configure and
substitute into config.mk.in rather than run 'uname -s' during make.
- Echo operating system during configure.

commit 4b72a462d7467cf815422aafac7b05037d2e3b13
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 10 18:35:38 2018 -0500

Enable building shared library by default.

Details:
- Tweaked configure so that the shared library is generated by default.
- Updated --help text and configure's feedback messages reporting the
status of the static/shared builds.
- Changed the order of build product installation so that headers are
installed last, after libraries and symlinks.

commit b699bb1ff03c6e9baaa054805b4939983ae7145b
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Thu May 10 15:54:17 2018 -0500

Adopt Linux-like .so versioning at install-time.

Details:
- Changed the naming conventions used for installed libraries and
symlinks to more closely mirror patterns used by typical GNU/Linux
libraries. Whereas previously static and shared libraries were
installed and symlinked as follows:

(library) libblis-0.3.2-15-haswell.a
(library) libblis-0.3.2-15-haswell.so
(symlink) libblis.a -> libblis-0.3.2-15-haswell.a
(symlink) libblis.so -> libblis-0.3.2-15-haswell.so

we now use the following naming conventions:

(library) libblis.a
(symlink) libblis.so -> libblis.so.0.1.2
(symlink) libblis.so.0 -> libblis.so.0.1.2
(library) libblis.so.0.1.2

where 0.1.2 indicates shared library major, minor, and build versions
of 0, 1, and 2, respectively. The conventional version string can
still be queried by linking to the library in question and then calling
bli_info_get_version_str(). (The testsuite binary does this
automatically at startup.)
- Added logic to common.mk to set the soname field in the shared library
via the -soname linker flag.
- Added a 'so_version' file to the top-level directory containing two
lines. The first line specifies the .so major version number, and the
second line specifies the minor and build version numbers joined with
a '.'. This file is read by configure and those values substituted
into build/config.mk.in to define SO_MAJOR, SO_MINORB, and SO_MMB
variables.
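
To make the naming scheme above concrete, the sketch below builds the two versioned file names from the two lines of the so_version file. The function name and buffer handling are assumptions for illustration; in BLIS the values are substituted into config.mk and used by make.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* so_major is the first line of so_version (e.g. "0"); so_minorb is the
   second line, the minor and build numbers joined by '.' (e.g. "1.2"). */
static inline void sk_so_names( const char* so_major, const char* so_minorb,
                                char* major_link, char* real_name, size_t n )
{
    assert( so_major != NULL && so_minorb != NULL );
    assert( strlen( so_major ) > 0 && strlen( so_minorb ) > 0 );

    /* libblis.so.<major>: a symlink to the library file. */
    snprintf( major_link, n, "libblis.so.%s", so_major );

    /* libblis.so.<major>.<minor>.<build>: the library file itself. */
    snprintf( real_name, n, "libblis.so.%s.%s", so_major, so_minorb );
}
```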

commit fc2d9ec6bf46f6e5b19d196208415ce433e95b10
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 9 15:19:28 2018 -0500

Tweaks to top-level clean and distclean targets.

Details:
- Moved the removal of bli_config.h from cleanh to distclean.
- Removed cleantest as a dependency of clean.

commit bf0350305971e3991861b5117a13fda31ff97b6d
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 8 16:49:22 2018 -0500

Renamed (shortened) a few build system variables.

Details:
- Renamed the following variables in config.mk (via build/config.mk.in):
  BLIS_ENABLE_VERBOSE_MAKE_OUTPUT -> ENABLE_VERBOSE
  BLIS_ENABLE_STATIC_BUILD        -> MK_ENABLE_STATIC
  BLIS_ENABLE_SHARED_BUILD        -> MK_ENABLE_SHARED
  BLIS_ENABLE_BLAS2BLIS           -> MK_ENABLE_BLAS
  BLIS_ENABLE_CBLAS               -> MK_ENABLE_CBLAS
  BLIS_ENABLE_MEMKIND             -> MK_ENABLE_MEMKIND
and also renamed all uses of these variables in makefiles and makefile
fragments. Notice that we use the "MK_" prefix so that those variables
can be easily differentiated (such as via grep) from their "BLIS_" C
preprocessor macro counterparts.
- Other whitespace changes to build/config.mk.in.
- Renamed the following C preprocessor macros in bli_config.h (via
build/bli_config.h.in):
  BLIS_ENABLE_BLAS2BLIS        -> BLIS_ENABLE_BLAS
  BLIS_DISABLE_BLAS2BLIS       -> BLIS_DISABLE_BLAS
  BLIS_BLAS2BLIS_INT_TYPE_SIZE -> BLIS_BLAS_INT_TYPE_SIZE
and also renamed all relevant uses of these macros in BLIS source
files.
- Renamed "blas2blis" variable occurrences in configure to "blas", as
was done in build/config.mk.in and build/bli_config.h.in.
- Renamed the following functions in frame/base/bli_info.c:
bli_info_get_enable_blas2blis() -> bli_info_get_enable_blas()
bli_info_get_blas2blis_int_type_size()
-> bli_info_get_blas_int_type_size()
- Remove bli_config.h during 'make cleanh' target of top-level Makefile.

commit 4b36e85be9b516b4089b24768f881dd976668997
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 8 14:26:30 2018 -0500

Converted function-like macros to static functions.

Details:
- Converted most C preprocessor macros in bli_param_macro_defs.h and
bli_obj_macro_defs.h to static functions.
- Reshuffled some functions/macros to bli_misc_macro_defs.h and also
between bli_param_macro_defs.h and bli_obj_macro_defs.h.
- Changed obj_t-initializing macros in bli_type_defs.h to static
functions.
- Removed some old references to BLIS_TWO and BLIS_MINUS_TWO from
bli_constants.h.
- Whitespace changes in select files (four spaces to single tab).
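
The kind of conversion described above can be shown with a before/after pair. Both names below are hypothetical and are not taken from the BLIS headers. The sketch illustrates one common motivation for such a change: a static function has typed parameters and evaluates each argument exactly once.

```c
#include <assert.h>

/* Before: a function-like macro. Arguments are untyped, and an argument
   with side effects may be evaluated more than once. */
#define SK_MAX_MACRO( a, b ) ( (a) > (b) ? (a) : (b) )

/* After: a static function with typed parameters. */
static inline long sk_max( long a, long b )
{
    return ( a > b ? a : b );
}

/* Small demonstration: the function increments i exactly once. */
static inline void sk_max_demo( void )
{
    long i = 1;
    long m = sk_max( i++, 0 );
    assert( m == 1 && i == 2 );
    (void)m;
}
```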

commit 7e5648ca150757b874f6823da832f3798c40b9f9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 7 18:59:19 2018 -0500

Add configure support for --libdir, --includedir.

Details:
- Added support for two new configure options: --libdir and --includedir.
They specify the precise install directories for libraries and header
files, respectively, and override any location implied by the --prefix
option (including the default install prefix, if --prefix was not
given). Thanks to Nico Schlömer for suggesting this via issue 195.
- Removed the INSTALL_PREFIX definition/anchor from build/config.mk.in
and replaced it with corresponding definitions/anchors for libdir and
includedir.
- Updated top-level Makefile to use the new variables, INSTALL_LIBDIR
and INSTALL_INCDIR, instead of INSTALL_PREFIX (which is now no longer
needed by make).
- Set default sane values for INSTALL_LIBDIR and INSTALL_INCDIR in
common.mk when configure has not been run, as is already done for
DIST_PATH. This is to safeguard against statements in the top-level
Makefile that use 'find' to locate old libraries and headers for the
uninstall targets, which run regardless of make target. Without setting
INSTALL_LIBDIR and INSTALL_INCDIR, those variables are empty and the
'find' ends up looking at '/', which is obviously not what we want.
(Also enclosed those definitions in an IS_CONFIGURED guard so that they
won't get evaluated unless configure has been run.)
- Rearranged "ifeq ($(IS_CONFIGURED),yes)" conditionals in Makefile to
reduce occurrences and separated "local" and top-level components of
cleanblastest and cleanblistest targets to improve readability.
- Adjusted out-of-tree builds so that they are no longer oblivious to
the .git directories, if present, and thus now properly augment version
strings with the appropriate patch number.
- Include missing version string in 'configure --help' output.
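
The override rule for --libdir and --includedir described in the first bullet above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the "lib" subdirectory used as the prefix-derived default is an assumption, not something stated in the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: an explicitly given directory wins; otherwise the
   directory is derived from the install prefix and a subdirectory name. */
static inline void sk_resolve_dir( const char* prefix, const char* explicit_dir,
                                   const char* subdir, char* out, size_t n )
{
    assert( prefix != NULL && subdir != NULL && out != NULL );

    if ( explicit_dir != NULL && strlen( explicit_dir ) > 0 )
        snprintf( out, n, "%s", explicit_dir );      /* e.g. --libdir given */
    else
        snprintf( out, n, "%s/%s", prefix, subdir ); /* implied by prefix   */
}
```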

commit b09e4e8852a6c42895910e3bcb9041124dc8bf9f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 7 14:37:50 2018 -0500

Allow 'make clean' and friends without configuring.

Details:
- Modified top-level Makefile so that a user can run 'make distclean',
'make clean', or any of the other clean-related targets prior to
running configure (or after a previous 'make distclean'). Thanks to
Nico Schlömer for suggesting this via issue 197.
- Made the cleanblastest and cleanblistest targets more comprehensive in
that they now clean out build products that would have resulted from
local compilation (i.e., builds performed within the 'blastest' or
'testsuite' directories).
- Added "cc" to list of expected compiler "vendors" since the CC variable
seems to automatically be set to "cc" on Ubuntu 16.04 (which is just an
alias to gcc).
- Comment update to build/config.mk.in.

commit 35c5a1449c3efe0b2ec43cdefcfdf00e71828149
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon May 7 12:04:57 2018 -0500

No longer update version file during configure.

Details:
- Recycled the core functionality of build/update-version-file.sh into a
function in configure, disabling the updating of the 'version' file in
the process. Instead of writing the patched version string back to the
version file and then reading it again from within configure, the
patched version string is now saved directly to a variable in the main()
function in configure. This will prevent developers from accidentally
committing configure-induced changes to the version file in between
releases.
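
A minimal sketch of the approach described above, in which the patched version string is returned to the caller and kept in a variable rather than written back to the 'version' file. The function name patch_version_string and the exact string format are assumptions. The sketch supposes that the patch information comes from 'git describe'-style output such as 0.3.2-15-g35c5a144, and that only the trailing "-g<hash>" part is dropped.

```shell
#!/bin/sh
# Hypothetical sketch: derive a patched version string without touching
# the 'version' file.
#   $1  base version (e.g. the contents of the 'version' file)
#   $2  'git describe'-style string, or empty when no .git directory exists
patch_version_string() {
    base_version=$1
    describe=$2
    if [ -z "$describe" ]; then
        # No git information available: fall back to the base version.
        printf '%s\n' "$base_version"
    else
        # Strip the trailing "-g<hash>" suffix, if there is one.
        printf '%s\n' "${describe%-g*}"
    fi
}

# The result lives only in a shell variable; nothing is written to disk.
version=$(patch_version_string "0.3.2" "0.3.2-15-g35c5a144")
```

Because the string is only ever held in a variable, running the script leaves the working tree unchanged, which is the property the commit is after.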

commit 8adb2f919b62da4a2885ae04a10925e0e6a2e304
Author: Mathieu Poumeyrol <kaliusers.noreply.github.com>
Date: Sun May 6 19:58:16 2018 +0200

Some cross-compilation fixes (198)

* cross-compilation fixes
* add doc ranlib variable
* icc support -dumpversion, posix compatible test, plus one stupid mistake
* retab
* revert version as requested

commit 89acd9ebe516eeb97006dba344354bfc98826645
Merge: 4cff432d 0557eba7
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 2 12:53:35 2018 -0500

Merge branch 'amd'

commit 4cff432d707891ada705b039a7e043558bbf3c51
Author: Nisanth M P <31736542+nisanthmpamdusers.noreply.github.com>
Date: Wed May 2 23:20:42 2018 +0530

AMD specific optimizations for target 'zen' (194)

Re-enabled AMD-specific optimizations for zen.

Details:
- Re-enabled Zen-specific cache blocksizes for 'zen' sub-configuration.
- Re-enabled small matrix gemm optimization for 'zen'.
- These were both temporarily disabled during a previous merge simply due to lack of Zen hardware for testing.

commit 8eda5fe7f678b413cb274bd84716995a7d0b87a9
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Wed May 2 12:20:37 2018 -0500

Typo fix in README.md.

commit 0557eba78f5fcf28f0f039f28da79498ffde848c
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Mar 19 12:49:26 2018 +0530

Re-enabling the small matrix gemm optimization for target zen

Change-Id: I13872784586984634d728cd99a00f71c3f904395

commit df78ceb3d6f33a27fe69017854405edaea7c40e5
Author: Nisanth M P <nisanth.padinharepattamd.com>
Date: Mon Mar 19 11:34:32 2018 +0530

Re-enabling Zen optimized cache block sizes for config target zen

Change-Id: I8191421b876755b31590323c66156d4a814575f1

commit 5e515f9a76f4aaf43dc21315a34d797726ca8069
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 1 13:44:10 2018 -0500

Tweaked new language in README.md.

commit 1ddd9e316ad5024af8b606dfcebd1e7d587a130f
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Tue May 1 13:36:28 2018 -0500

Added link to Dave Love's Fedora Copr page.

Details:
- Added a blurb to README.md advertising Dave Love's Copr homepage,
which contains rpm packages for RHEL/Fedora-like distributions.

commit 078a852f738c66c6468bd5e64b06467edc9057fd
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 30 16:15:26 2018 -0500

Minor tweaks to top-level 'make clean' target.

Details:
- Execute the 'cleanh' target as part of 'clean'.
- Remove the cblas.h file from 'include/<configname>/' as part of the
'cleanh' target.
- Updated the echoed (non-verbose) text for uniformity.

commit 75d0d1057dda69c655bd1cd8f791cb39b54d99b8
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Mon Apr 30 14:57:33 2018 -0500

Renamed various datatype-related macros/functions.

Details:
- Renamed the following macros in bli_obj_macro_defs.h and
bli_param_macro_defs.h:
  - bli_obj_datatype() -> bli_obj_dt()
  - bli_obj_target_datatype() -> bli_obj_target_dt()
  - bli_obj_execution_datatype() -> bli_obj_exec_dt()
  - bli_obj_set_datatype() -> bli_obj_set_dt()
  - bli_obj_set_target_datatype() -> bli_obj_set_target_dt()
  - bli_obj_set_execution_datatype() -> bli_obj_set_exec_dt()
  - bli_obj_datatype_proj_to_real() -> bli_obj_dt_proj_to_real()
  - bli_obj_datatype_proj_to_complex() -> bli_obj_dt_proj_to_complex()
  - bli_datatype_proj_to_real() -> bli_dt_proj_to_real()
  - bli_datatype_proj_to_complex() -> bli_dt_proj_to_complex()
- Renamed the following functions in bli_obj.c:
  - bli_datatype_size() -> bli_dt_size()
  - bli_datatype_string() -> bli_dt_string()
  - bli_datatype_union() -> bli_dt_union()
- Removed a pair of old level-1f penryn intrinsics kernels that were no
longer in use.
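
Code written against the old names needs the same renames applied. The following is a rough sketch of a migration filter, not a tool shipped with BLIS: the function name rename_blis_dt is hypothetical, and the filter only rewrites an identifier when it is immediately followed by an opening parenthesis, so a call written with a space before the parenthesis would be missed.

```shell
#!/bin/sh
# Hypothetical migration filter: reads source text on stdin and writes it
# to stdout with the old datatype-related names replaced by the new ones.
# Each pattern ends in a literal '(' so that a short name is never
# rewritten inside a longer one.
rename_blis_dt() {
    sed \
        -e 's/bli_obj_set_execution_datatype(/bli_obj_set_exec_dt(/g' \
        -e 's/bli_obj_set_target_datatype(/bli_obj_set_target_dt(/g' \
        -e 's/bli_obj_set_datatype(/bli_obj_set_dt(/g' \
        -e 's/bli_obj_execution_datatype(/bli_obj_exec_dt(/g' \
        -e 's/bli_obj_target_datatype(/bli_obj_target_dt(/g' \
        -e 's/bli_obj_datatype_proj_to_complex(/bli_obj_dt_proj_to_complex(/g' \
        -e 's/bli_obj_datatype_proj_to_real(/bli_obj_dt_proj_to_real(/g' \
        -e 's/bli_obj_datatype(/bli_obj_dt(/g' \
        -e 's/bli_datatype_proj_to_complex(/bli_dt_proj_to_complex(/g' \
        -e 's/bli_datatype_proj_to_real(/bli_dt_proj_to_real(/g' \
        -e 's/bli_datatype_size(/bli_dt_size(/g' \
        -e 's/bli_datatype_string(/bli_dt_string(/g' \
        -e 's/bli_datatype_union(/bli_dt_union(/g'
}
```

It could be used as rename_blis_dt < old.c > new.c, where both file names are placeholders. Reviewing the resulting diff before committing is advisable, since the filter works on text and knows nothing about C syntax.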

commit 01c4173238baf08e7f6700a3f91a2ea58cca50c1
Author: Field G. Van Zee <fieldcs.utexas.edu>
Date: Sat Apr 28 14:07:34 2018 -0500

CHANGELOG update (0.3.2)
