Kernel-tuner

Latest version: v1.0

0.4.0

Added
- support for (lambda) function instead of list of strings for restrictions
- support for (lambda) function instead of list for specifying grid divisors
- support for (lambda) function instead of tuple for specifying problem_size
- function to store the top tuning results
- function to create header file with device targets from stored results
- support for using tuning results in PythonKernel
- option to control measurements using observers
- support for NVML tunable parameters
- option to simulate auto-tuning searches from existing cache files
- Cupy backend to support C++ templated CUDA kernels
- support for templated CUDA kernels using PyCUDA backend
- documentation on tunable parameter vocabulary
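
The first item above, restrictions given as a function rather than a list of strings, can be illustrated in plain Python. This sketch does not call Kernel Tuner; it only shows the idea of filtering a parameter space with a callable. The parameter names `block_size_x` and `tile_size_x`, the helper `valid_configs`, and the assumption that the callable receives a dict of parameter values are all chosen for illustration.

```python
from itertools import product

# Hypothetical tunable parameters (names chosen for illustration only).
tune_params = {
    "block_size_x": [32, 64, 128],
    "tile_size_x": [1, 2, 4],
}

# A restriction expressed as a function instead of a list of strings.
# Assumption: the callable receives a dict mapping parameter names to values.
restriction = lambda p: p["block_size_x"] * p["tile_size_x"] <= 128


def valid_configs(params, restrict):
    """Return every combination of parameter values that satisfies restrict."""
    names = list(params)
    configs = (dict(zip(names, values)) for values in product(*params.values()))
    return [c for c in configs if restrict(c)]


configs = valid_configs(tune_params, restriction)
print(len(configs), "of 9 configurations remain")
```

A function can express conditions that are awkward to write as strings, such as checks that depend on values computed elsewhere in the program.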

0.3.2

Added
- support for loop unrolling using parameters whose names start with loop_unroll_factor
- always insert "#define kernel_tuner 1" to allow preprocessor #ifdef kernel_tuner
- support for user-defined metrics
- support for choosing the optimization starting point x0 for most strategies
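
As a rough illustration of user-defined metrics, the sketch below derives a throughput figure from a measured time. It is plain Python and does not use Kernel Tuner itself. The helper `apply_metrics`, the problem size `n`, the result layout (a dict with a `time` entry in milliseconds), and the benchmark numbers are assumptions made up for the example.

```python
from collections import OrderedDict

# Hypothetical problem: a vector add of n elements performs n floating-point operations.
n = 10_000_000

# Assumption: each metric is a function of one benchmark result, here a dict
# holding the tunable parameters plus a measured "time" in milliseconds.
metrics = OrderedDict()
metrics["GFLOP/s"] = lambda r: (n / 1e9) / (r["time"] / 1e3)


def apply_metrics(result, metrics):
    """Return a copy of result extended with every user-defined metric."""
    extended = dict(result)
    for name, fn in metrics.items():
        extended[name] = fn(extended)
    return extended


# Made-up benchmark result used only to exercise the sketch.
result = apply_metrics({"block_size_x": 128, "time": 2.0}, metrics)
print(result)
```

An ordered mapping lets a later metric build on an earlier one, because each function sees the result as extended so far.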

Changed
- more compact output is printed to the terminal
- sequential runner runs first kernel in the parameter space to warm up device
- updated tutorials to demonstrate use of user-defined metrics

0.3.1

Added
- kernelbuilder functionality for including kernels in Python applications
- smem_args option for dynamically allocated shared memory in CUDA kernels

Changed
- bugfix for Nvidia devices without internal current sensor

0.3.0

Changed
- fix for output checking, custom verify functions are called just once
- benchmarking now returns multiple results, not only time
- more sophisticated implementation of genetic algorithm strategy
- the "method" option is now passed using strategy_options

Added
- Bayesian Optimization strategy, use strategy="bayes_opt"
- support for kernels that use texture memory in CUDA
- support for measuring energy consumption of CUDA kernels
- option to set strategy_options to pass strategy specific options
- option to cache tuned kernel configurations in a cachefile and restart from it
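
The caching option in the list above can be pictured with a small plain-Python sketch. It is not Kernel Tuner's implementation or file format: the function `tune_with_cache`, the JSON layout, and the stand-in `fake_benchmark` are hypothetical. It shows only the general idea of skipping configurations that were measured in an earlier run.

```python
import json
import os
import tempfile


def tune_with_cache(configs, benchmark, cachefile):
    """Benchmark each config once, reusing results stored in cachefile."""
    cache = {}
    if os.path.exists(cachefile):
        with open(cachefile) as f:
            cache = json.load(f)
    for config in configs:
        key = ",".join(str(v) for v in config)
        if key not in cache:  # skip configurations tuned in an earlier run
            cache[key] = benchmark(config)
            with open(cachefile, "w") as f:  # persist after every measurement
                json.dump(cache, f)
    return cache


calls = []


def fake_benchmark(config):
    """Stand-in for a real kernel measurement; records that it was called."""
    calls.append(config)
    return float(sum(config))


path = os.path.join(tempfile.mkdtemp(), "cache.json")
# First run measures two configurations.
tune_with_cache([(32, 1), (64, 2)], fake_benchmark, path)
# Restarted run measures only the one configuration not yet in the cache.
cache = tune_with_cache([(32, 1), (64, 2), (128, 4)], fake_benchmark, path)
print(len(calls), "measurements for", len(cache), "cached configurations")
```

Writing the file after every measurement means an interrupted run loses at most the configuration that was in progress.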

Removed
- Python 2 support, it may still work but we no longer test for Python 2
- Noodles parallel runner

0.2.0

Changed
- no longer replacing kernel names with instance strings during tuning
- bugfix in tempfile creation that led to too many open files error

Added
- A minimal Fortran example and basic Fortran support
- Particle Swarm Optimization strategy, use strategy="pso"
- Simulated Annealing strategy, use strategy="simulated_annealing"
- Firefly Algorithm strategy, use strategy="firefly_algorithm"
- Genetic Algorithm strategy, use strategy="genetic_algorithm"
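
To give a feel for what one of the strategies listed above does, here is a minimal simulated annealing sketch over a discrete parameter space. It is a generic textbook-style version written for this example, not Kernel Tuner's implementation. The function signature, the geometric cooling schedule, the neighbour rule, and the toy cost function are all assumptions.

```python
import math
import random


def simulated_annealing(space, cost, steps=200, t_start=1.0, t_end=0.01, seed=0):
    """Search a list of configurations for a low-cost one."""
    rng = random.Random(seed)
    current = rng.randrange(len(space))
    best = current
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Neighbour: move one position left or right in the list, clamped to bounds.
        candidate = min(max(current + rng.choice((-1, 1)), 0), len(space) - 1)
        delta = cost(space[candidate]) - cost(space[current])
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
        if cost(space[current]) < cost(space[best]):
            best = current
    return space[best]


# Hypothetical one-parameter space and a toy cost function standing in for kernel time.
block_sizes = [32, 64, 128, 256, 512, 1024]
toy_time = lambda b: abs(math.log2(b) - 7) + 1.0  # lowest cost at 128
best_block_size = simulated_annealing(block_sizes, toy_time)
print(best_block_size)
```

Accepting some worse moves early on lets the search leave local minima; as the temperature drops it behaves more like plain descent. In a real tuning run the cost would be a measured kernel time, so each evaluation is expensive and results are typically cached.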

0.1.9

Changed
- bugfix for C backend for byte array arguments
- argument type mismatches throw warning instead of exception

Added
- wrapper functionality to wrap C++ functions
- citation file and zenodo doi generation for releases
