Features
Flexible hyperparameter search space
The parameters included in hyperparameter optimization can now be selected using the argument `--search_parameter_keywords {list-of-keywords}`. The supported parameters are: activation, aggregation, aggregation_norm, batch_size, depth, dropout, ffn_hidden_size, ffn_num_layers, final_lr, hidden_size, init_lr, max_lr, warmup_epochs. Some special keywords are also available for groups of parameters or different search behavior: basic, learning_rate, all, linked_hidden_size.
PR 299
Missing targets in uncertainty calibration datasets
The uncertainty calibration and evaluation methods can now handle missing target values in multitask jobs. Normal model training already supported missing targets; this release extends that support to uncertainty calibration and evaluation.
PR 295
Issue 292
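The general idea behind handling missing targets in a multitask metric can be sketched as follows. This is a minimal illustration, not Chemprop's implementation: the function name `masked_rmse`, the use of `None` to mark a missing target, and the toy data are all assumptions made for the example.

```python
import math

def masked_rmse(preds, targets):
    """RMSE over the entries whose target is present (not None)."""
    pairs = [(p, t) for p, t in zip(preds, targets) if t is not None]
    if not pairs:
        # No labeled entries for this task, so the metric is undefined.
        return float("nan")
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))

# Rows are molecules, columns are tasks; None marks a missing target.
preds = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
targets = [[1.0, None], [4.0, 4.0], [None, 8.0]]

num_tasks = 2
per_task_rmse = [
    masked_rmse([row[i] for row in preds], [row[i] for row in targets])
    for i in range(num_tasks)
]
print(per_task_rmse)
```

Each task is scored only on the molecules that have a value for that task, so a sparse target matrix does not need to be filtered or imputed beforehand.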
Multitask evaluation for tasks of different magnitudes
When an evaluation metric tends to scale with the magnitude of a task (e.g., rmse), the simple average of the metric across tasks has been replaced with a geometric mean. This makes the averaged metric in multitask regression jobs less dominated by large-magnitude targets. The previous behavior was an issue for hyperparameter optimization and for selecting the optimal epoch during model training. The loss used for gradient descent is calculated on scaled targets and was already scale independent.
PR 290
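The effect of switching to a geometric mean can be seen in a small sketch. The helper names and the per-task RMSE values below are invented for illustration and are not taken from Chemprop.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # exp of the mean log; requires strictly positive values
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-task RMSE values for two tasks of very different magnitude.
baseline = [1.0, 100.0]
improved_small_task = [0.5, 100.0]   # small-magnitude task improves by 2x
improved_large_task = [1.0, 50.0]    # large-magnitude task improves by 2x

for name, scores in [
    ("baseline", baseline),
    ("small task improved", improved_small_task),
    ("large task improved", improved_large_task),
]:
    print(name, arithmetic_mean(scores), geometric_mean(scores))
```

With the arithmetic mean, halving the error on the small task barely changes the aggregate (50.5 to 50.25), while halving the error on the large task changes it substantially (50.5 to 25.5). With the geometric mean, both improvements reduce the aggregate by the same factor, so neither task dominates model selection.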
Empty test set allowed
An empty test split can now be used during training. This was previously possible only using the `cv-no-test` split method, but now it is available more widely when specifying split sizes, for example with `--split_sizes 0.8 0.2 0`.
PR 284, 260 related
Issue 279
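As a rough sketch of what split sizes of 0.8, 0.2, and 0 mean for a random split, the following shows one way to partition indices so that the test set comes out empty. The function `split_indices` is hypothetical and written for this example; it is not Chemprop's splitting code.

```python
import random

def split_indices(n, sizes=(0.8, 0.2, 0.0), seed=0):
    """Randomly partition range(n) into train/val/test by fractional sizes."""
    if len(sizes) != 3 or abs(sum(sizes) - 1.0) > 1e-9:
        raise ValueError("sizes must be three fractions that sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    train_end = int(sizes[0] * n)
    # Use the cumulative fraction so that a test size of 0 leaves no remainder.
    val_end = int((sizes[0] + sizes[1]) * n)
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]

train, val, test = split_indices(10, sizes=(0.8, 0.2, 0.0))
print(len(train), len(val), len(test))
```

Computing the validation boundary from the cumulative fraction, rather than rounding each fraction separately, keeps rounding leftovers out of the test set when its size is 0.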
Updates to conda environment and docker file
Conda environment building will now prefer to use the pytorch channel over the conda-forge channel. The Dockerfile has been updated to use micromamba, allowing for faster environment solves than conda and removing a potential licensing issue.
PR 276
Bug Fixes
Fix MCC loss for multiclass jobs
Corrected a calculation problem in the loss function that caused it to return infinite loss inappropriately. Also adopted the convention of returning a loss of zero in cases where the loss would otherwise be infinite, as often happens in very unbalanced datasets. Added appropriate unit tests.
PR 309
Issue 306
Correct code error in ence uncertainty evaluation
Corrected an error in the ence uncertainty evaluation method that made the method unusable. The bug was introduced during PR 305.
PR 302
Issue 301
Fixed link to MoleculeNet website
Corrected the link to the MoleculeNet benchmark dataset website in the readme, following MoleculeNet migrating to a new site location.
PR 296
Multitarget uncertainty calibration mve weighting method
Previously, this method only worked for single-task jobs. It has now been extended to work for multitask models as well.
PR 291
Remove unused version.py file
Version tracking in Chemprop no longer uses the `__version__.py` file, so the file was removed.
PR 283
Multiclass argument typo in readme
Corrected a typo in the readme: the argument for the number of classes used in multiclass classification should have been given as `--multiclass_num_classes`.
PR 281
Repair individual ensemble predictions
Refactoring of the prediction file during the addition of the uncertainty functions disabled the option to return the individual predictions of each member of an ensemble of models. The option is now available again.
PR 274