keras-adamw

Latest version: v1.38

1.38

- Fixed `'L1' object has no attribute 'l2'` in TF 2.3.1 (and the analogous error for other non-`l1_l2` regularizer objects)
- Moved testing to TF 2.3.1

1.37

`control_dependencies` moved from `tensorflow.python.ops` to `tensorflow.python.framework.ops`; for backwards compatibility, the code was edited to use the public `tf.control_dependencies` instead.

Further, TF 2.3.0 isn't compatible with Keras 2.3.1 and earlier; later Keras versions are untested, but development proceeds with `tf.keras`.
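
For reference, below is a minimal sketch of the public alias the code now relies on; the variables and function are illustrative, not from the library:

```python
import tensorflow as tf

x = tf.Variable(0.0)
y = tf.Variable(0.0)

@tf.function
def step():
    inc = x.assign_add(1.0)
    # `tf.control_dependencies` is the stable public entry point; the internal
    # module housing `control_dependencies` is what moved between TF releases.
    with tf.control_dependencies([inc]):
        return y.assign(x * 2.0)

step()  # the assignment to `x` is guaranteed to run before `y` is computed
```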

1.36

Existing code normalized as `norm = sqrt(batch_size / total_iterations)`, where `total_iterations` = (number of fits per epoch) * (number of epochs per restart). However, `total_iterations = total_samples / batch_size`, so `norm = batch_size * sqrt(1 / (samples_per_epoch * epochs))`, making `norm` scale _linearly_ with `batch_size` rather than with its square root, as the authors intended.

Users who never changed `batch_size` throughout training are unaffected. (λ = λ_norm * sqrt(b / (B*T)), where b = batch size, B = samples per epoch, and T = epochs per restart; λ_norm is what we pick, our "guess". The point of the normalization is that if our guess works well for `batch_size=32`, it should also work well for `batch_size=16` - but if `batch_size` is never changed, performance depends only on the guess.)
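
A quick numeric illustration of the two scalings, using made-up training sizes; this sketches the formulas above, not the library's internal code:

```python
from math import sqrt

samples_per_epoch, epochs = 3200, 10   # illustrative training setup
lam_norm = 1e-4                        # our "guess" for the normalized decay

for batch_size in (16, 32, 64):
    total_iterations = (samples_per_epoch // batch_size) * epochs
    old = lam_norm * sqrt(batch_size / total_iterations)              # pre-1.36: grows linearly with batch_size
    new = lam_norm * sqrt(batch_size / (samples_per_epoch * epochs))  # 1.36+: grows with sqrt(batch_size)
    print(f"b={batch_size}: old={old:.2e}, new={new:.2e}")
```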

Main change [here](https://github.com/OverLordGoldDragon/keras-adamw/pull/53/files#diff-220519926b87c12115d2f727803fbe6bR19), closing #52.

**Updating existing code**: for a choice of λ_norm that previously worked well, multiply it by `sqrt(batch_size)`. Ex: for `batch_size=32`, `Dense(bias_regularizer=l2(1e-4))` --> `Dense(bias_regularizer=l2(1e-4 * sqrt(32)))` (see the sketch below).
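
A self-contained version of that migration; the layer, unit count, and values are illustrative:

```python
from math import sqrt
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

batch_size = 32  # the batch size the old value was tuned with

# Before 1.36 (old normalization):
# layer = Dense(64, bias_regularizer=l2(1e-4))

# From 1.36 on: rescale the previously well-working value by sqrt(batch_size)
layer = Dense(64, bias_regularizer=l2(1e-4 * sqrt(batch_size)))
```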

1.35

**FEATURE**: `autorestart` option, which automatically handles Warm Restarts by resetting `t_cur=0` after `total_iterations` iterations (usage sketch below).

- Defaults to `True` if `use_cosine_annealing=True`, else `False`
- Must use `use_cosine_annealing=True` if using `autorestart=True`

Updated README and `example.py`.
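
A minimal usage sketch of `autorestart`; only `use_cosine_annealing`, `autorestart`, and `total_iterations` come from this entry, while the `lr` and `model` keywords are assumed from the project README and the model/data are illustrative:

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from keras_adamw import AdamW

ipt = Input((16,))
model = Model(ipt, Dense(1)(ipt))

# With cosine annealing enabled, `autorestart` defaults to True, so `t_cur`
# is reset to 0 every `total_iterations` iterations (every 100 updates here).
opt = AdamW(lr=1e-3, model=model, use_cosine_annealing=True,
            autorestart=True, total_iterations=100)
model.compile(opt, loss='mse')
model.fit(np.random.randn(3200, 16), np.random.randn(3200, 1),
          epochs=2, batch_size=32)  # 100 iterations per epoch -> one restart per epoch
```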

1.32

**BUGFIXES**:
- The last weight in the network would be updated with `t_cur` one step ahead, desynchronizing it from all other weights
- `AdamW` in `keras` (`optimizers.py`, `optimizers_225.py`): weight updates were _not_ mediated by `eta_t`, so cosine annealing had no effect

**FEATURES**:
- Added `lr_t` to the tf.keras optimizers to track the "actual" learning rate externally; use `K.eval(model.optimizer.lr_t)` to get the "actual" learning rate for a given `t_cur` and `iterations` (see the sketch after this list)
- Added `lr_t` vs. iterations plot to README, and source code in `example.py`
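
A hedged sketch of tracking `lr_t` during training; `K.eval(model.optimizer.lr_t)` is from this entry, while the `AdamW` keywords, model, and data are assumptions for illustration:

```python
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from keras_adamw import AdamW

ipt = Input((16,))
model = Model(ipt, Dense(1)(ipt))
model.compile(AdamW(lr=1e-3, model=model, use_cosine_annealing=True,
                    total_iterations=25), loss='mse')

x, y = np.random.randn(128, 16), np.random.randn(128, 1)
lr_t_history = []
for _ in range(25):
    model.train_on_batch(x, y)
    # "actual" learning rate at the current `t_cur` and `iterations`
    lr_t_history.append(K.eval(model.optimizer.lr_t))
```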

**MISC**:
- Added `test_updates` to ensure all weights update synchronously, and that `eta_t` first applies to the weights as-is and _then_ updates according to `t_cur`
- Fixes #47

1.31

**BUGFIXES**:
- `SGDW` with `momentum=0` would fail due to variable scoping issues; the rewritten code is correct and should run slightly faster. Files affected: `optimizers_v2.py`, `optimizers_225tf.py`

**MISC**:
- Added test case for `SGDW(momentum=0)`
- Added control test for `SGDW(momentum=0)` vs `SGD(momentum=0)` (see the sketch after this list)
- Renamed `tests/import_selection.py` to `tests/backend.py`
- `test_optimizers.py` can now run as `__main__` without manually changing paths / working directories
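
A rough sketch of what such a control test might look like; this is not the repository's actual test, and the `SGDW` keywords (`lr`, `momentum`, `model`) are assumed from the project README:

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from keras_adamw import SGDW

def make_model():
    ipt = Input((8,))
    return Model(ipt, Dense(1, kernel_initializer='ones')(ipt))

np.random.seed(0)
x, y = np.random.randn(64, 8), np.random.randn(64, 1)

m_sgd, m_sgdw = make_model(), make_model()
m_sgd.compile(SGD(learning_rate=1e-2, momentum=0), loss='mse')
m_sgdw.compile(SGDW(lr=1e-2, momentum=0, model=m_sgdw), loss='mse')

m_sgd.fit(x, y, epochs=1, batch_size=16, shuffle=False)
m_sgdw.fit(x, y, epochs=1, batch_size=16, shuffle=False)

# With momentum=0 and no weight decays, SGDW should match plain SGD
for w1, w2 in zip(m_sgd.get_weights(), m_sgdw.get_weights()):
    assert np.allclose(w1, w2, atol=1e-6)
```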
