Adeft

Latest version: v0.11.2



0.11.2

This release adds support for Python 3.11.

0.11.1

This release updates the functions for downloading models and other resources from s3 to use boto3 instead of the unmaintained wget package. Downloads should now be more reliable.

0.11.0

This release fixes a bug that caused the grounding GUI to not work when adeft is installed with pip. The adeft folder for the pretrained models is now placed in a platform-specific user data folder by default rather than in a hidden folder in the user's home directory. Users can still override this default by setting the environment variable `ADEFT_HOME`. Tests have been updated to use pytest instead of nose.
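The lookup order described above (environment variable first, platform-specific default second) can be sketched as follows. This is a minimal illustration, not adeft's own code: the function name `resolve_home` and the per-platform folder layout are assumptions, and only the variable name `ADEFT_HOME` comes from the release note.

```python
import os
import sys
from pathlib import Path


def resolve_home(env=None, platform=None, home=None):
    """Return the folder where resources would be stored.

    Hypothetical helper: an explicit ADEFT_HOME always wins; otherwise
    fall back to an assumed per-platform user data folder.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    # The environment variable overrides any default.
    override = env.get("ADEFT_HOME")
    if override:
        return Path(override)

    # Assumed defaults; the real package may choose different locations.
    if platform.startswith("win"):
        base = Path(env.get("LOCALAPPDATA", home / "AppData" / "Local"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        base = Path(env.get("XDG_DATA_HOME", home / ".local" / "share"))
    return base / "adeft"
```

Passing `env`, `platform`, and `home` explicitly keeps the function easy to test without touching the real environment.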

0.10.0

This release makes several changes concerning model statistics.

1. The global precision, recall, and F1 scores for a classifier now use micro-averaging to aggregate across the scores for different positive class labels rather than taking an average weighted by the frequencies for each positive label. Micro-averaging looks at global counts of true positives, false positives, and false negatives across all positive labels. A true positive is any positive-labeled datapoint classified correctly. A false negative is any positive-labeled datapoint that has been classified incorrectly. A false positive is any datapoint incorrectly classified as having a positive label. Note that false positives and false negatives can overlap: a datapoint with one positive label that is classified as a different positive label counts as both. Micro-averaging is easier to reason about and interpret, and using it allows for some simplification of implementation in other places. The original decision to use the weighted average was made with little thought at a time when we were making less use of model statistics.

2. A method has been added to `adeft.disambiguate.AdeftDisambiguator` that allows the set of positive labels to be updated while recomputing global model statistics. Previously, this required retraining the model. The update is made possible by storing the entire label vs label confusion matrix for each CV fold when a model is trained and serializing it when the model is saved.
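To illustrate why a stored label vs label confusion matrix is enough to recompute micro-averaged statistics for a new set of positive labels, here is a minimal sketch. The function `micro_scores` and the nested-dict layout `confusion[true_label][predicted_label]` are assumptions made for this example, not adeft's internal representation, and the counts are invented. The sketch counts a false positive when a datapoint is wrongly assigned a positive label and a false negative when a positive-labeled datapoint is misclassified.

```python
def micro_scores(confusion, pos_labels):
    """Micro-averaged precision, recall, and F1 over the positive labels.

    confusion[true_label][predicted_label] holds a count of datapoints
    (an assumed layout for illustration).
    """
    pos = set(pos_labels)
    tp = fp = fn = 0
    for true_label, row in confusion.items():
        for pred_label, count in row.items():
            if true_label == pred_label:
                if true_label in pos:
                    tp += count  # positive label predicted correctly
            else:
                if pred_label in pos:
                    fp += count  # wrongly assigned to a positive label
                if true_label in pos:
                    fn += count  # positive-labeled datapoint misclassified
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented counts for three labels; "other" stands for a negative label.
confusion = {
    "A": {"A": 8, "B": 1, "other": 1},
    "B": {"A": 2, "B": 6, "other": 2},
    "other": {"A": 1, "B": 0, "other": 9},
}

both_positive = micro_scores(confusion, ["A", "B"])
only_a_positive = micro_scores(confusion, ["A"])
```

The same matrix yields statistics for either choice of positive labels, so no retraining is needed. A datapoint with true label `A` predicted as `B` contributes to both the false positive and false negative counts when both labels are positive.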

Bug fixes and smaller changes were also made:
1. A bug was fixed that was causing the labels in model statistics to fail to update when `adeft.disambiguate.AdeftDisambiguator.modify_groundings` was used to update groundings in a model.
2. A bug was fixed that caused the labels attribute of an `adeft.disambiguate.AdeftDisambiguator` to not contain labels for which no defining pattern exists. (These labels are typically for texts manually curated in Entrez as mentioning a particular gene with the shortform of interest as a synonym but which are not abbreviations.)
3. A new attribute called `other_metadata` was added to classifiers. Anything JSON-serializable stored within this attribute will be preserved upon model serialization. We are using this to store any relevant information needed to retrain a model that does not fit into the existing attributes. This allows for simplification of the retraining process.
4. Some small updates have been made to the introductory Jupyter notebook.
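As a rough illustration of what "preserved upon model serialization" implies for `other_metadata`, the sketch below round-trips a metadata dictionary through JSON. The helper names `dump_model_info` and `load_model_info` and the dictionary keys other than `other_metadata` are hypothetical; adeft's actual serialization format is not shown in these notes.

```python
import json


def dump_model_info(labels, other_metadata):
    """Serialize model info to a JSON string (hypothetical helper).

    json.dumps raises TypeError if other_metadata contains values
    that cannot be represented in JSON, such as sets.
    """
    return json.dumps({"labels": labels, "other_metadata": other_metadata})


def load_model_info(payload):
    """Restore the dictionary written by dump_model_info."""
    return json.loads(payload)


# Invented metadata describing how a model might be retrained.
metadata = {"corpus_version": "2020-01", "shortforms": ["IR"]}
restored = load_model_info(dump_model_info(["A", "B"], metadata))
```

Values such as tuples come back as lists after a JSON round trip, so metadata is best stored using plain dicts, lists, strings, and numbers.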

0.9.0

This release makes a number of improvements to the grounding GUI.

1. Previously, actions such as deleting an entry or toggling a label as positive/negative would cause the scroll position and text entered into the input boxes to be lost. This made using the app tedious since the page would refresh to the top after each action, making it burdensome for example to delete many groundings or toggle many labels in sequence. This has been remedied.
2. The input boxes at the top are now fixed in a sticky position making it unnecessary to scroll back and forth in order to select rows and then enter groundings. They now follow along as the user scrolls.
3. Columns of the table are now sortable. The headers for each column are now buttons masquerading as links. Clicking each header will cause the rows to be sorted by that column. This is useful for example to aid in scanning for similar longforms or to group every row together that has the same grounding.
4. The user may now pass in a csv file of known groundings with rows of the form namespace, identifier, standard name (e.g. HGNC,6091,INSR). It is then only necessary to enter the namespace and either the identifier or the standard name into the input boxes for any grounding that has a row in the supplied table.
5. Entered groundings are now color coded: one color for groundings where the standard name and identifier match a row in the supplied groundings csv file, another color for groundings where the standard name and identifier do not match based on the table, and black if there are no rows in the table for the entered standard name and identifier. The colors have been chosen so that the contrast can hopefully be detected by most color blind users; instead of the standard green for match and red for mismatch, approximations of these colors have been chosen based on the [Wong color palette](https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40).
6. Any rows given the grounding `ignore` will have their longforms dropped from the generated grounding map. These rows are displayed in a distinct color to highlight their special semantic role.
7. Labels without a namespace will not appear in the column of labels which can be toggled as positive/negative.

These changes should make the GUI much more user friendly and less tedious to use.
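The lookup and color-coding rules for a supplied groundings csv file can be sketched as below. This is an illustrative reimplementation under assumptions, not the GUI's code: the function names `load_groundings`, `complete`, and `status` are hypothetical, and the status strings stand in for the three colors.

```python
import csv
import io


def load_groundings(csv_text):
    """Parse rows of the form namespace,identifier,standard name."""
    rows = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == 3:
            rows.add(tuple(field.strip() for field in row))
    return rows


def complete(known, namespace, identifier=None, name=None):
    """Fill in a missing identifier or name from the known rows."""
    for ns, ident, std_name in sorted(known):
        if ns == namespace and (ident == identifier or std_name == name):
            return ns, ident, std_name
    # Nothing known about this entry; return it unchanged.
    return namespace, identifier, name


def status(known, namespace, identifier, name):
    """Classify an entered grounding as match, mismatch, or unknown."""
    if (namespace, identifier, name) in known:
        return "match"
    for ns, ident, std_name in known:
        if ns == namespace and (ident == identifier or std_name == name):
            return "mismatch"  # table knows one field but not this pairing
    return "unknown"  # no row for this identifier or name


known = load_groundings("HGNC,6091,INSR\n")
filled = complete(known, "HGNC", identifier="6091")
```

With the single row from the example above, entering only the namespace and identifier is enough for `complete` to recover the standard name.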

0.8.0

This release fixes several bugs and makes some small updates.

Fixes have been made for
1. A bug in AdeftMiner.prune that broke this method but was undiscovered due to lack of testing. The bug has been fixed and a test has been added.
2. Training adeft models throwing an error for the edge case where there are more than two labels with only one positive label.
3. The longform scorer throwing an error when there are punctuation characters in the shortform.
4. The GUI not working when the multiprocessing start method is set to spawn. This caused the GUI to fail on Windows, where fork is unavailable. This should resolve issue 49.
5. A deprecation warning caused by internal use of the deprecated parameter iid in Scikit-learn's GridSearchCV. The parameter is no longer used.

The following other changes have been made

1. AdeftLabeler now requires unique identifiers along with the texts passed into process_texts. Instead of passing in a list of texts, the process_texts method now takes a list of tuples of the form (text, identifier). The output list now contains tuples of the form (text, label, identifier). This is useful for mapping back from texts in the generated corpus to texts in the input. Texts without defining patterns are filtered out completely and those with defining patterns have the defining patterns replaced with only the shortform, making mapping backwards nontrivial without the identifiers.
2. Adeft's home folder can now be specified by setting the environment variable ADEFT_HOME in the user's profile. The default is now the hidden folder ".adeft" in the user's home directory with subfolders for different adeft versions.
3. The parameter class_weight from Scikit-learn's implementation of logistic regression is now exposed as a parameter of AdeftClassifier. This allows different weights to be provided in the loss function for different class labels.
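The new input and output shapes for process_texts, and why identifiers make it possible to map back to the input, can be illustrated with a toy stand-in. The function `toy_process_texts` below is hypothetical and does not call adeft; it only mimics the described behavior by treating "longform (shortform)" as the defining pattern. The texts, labels, and identifiers are invented.

```python
def toy_process_texts(texts, shortform, grounding_map):
    """Mimic the described shapes: (text, identifier) in,
    (text, label, identifier) out. Not adeft's implementation.
    """
    corpus = []
    for text, identifier in texts:
        for longform, label in grounding_map.items():
            defining_pattern = f"{longform} ({shortform})"
            if defining_pattern in text:
                # Replace the defining pattern with only the shortform.
                stripped = text.replace(defining_pattern, shortform)
                corpus.append((stripped, label, identifier))
                break
        # Texts without a defining pattern are dropped entirely.
    return corpus


texts = [
    ("The insulin receptor (IR) binds insulin.", "doc1"),
    ("IR spectra were recorded.", "doc2"),
    ("Exposure to ionizing radiation (IR) damages DNA.", "doc3"),
]
grounding_map = {
    "insulin receptor": "INSR",
    "ionizing radiation": "ionizing radiation",
}
corpus = toy_process_texts(texts, "IR", grounding_map)
```

Because the second text is filtered out and the others are rewritten, positions and contents in the output no longer line up with the input; the identifiers `doc1` and `doc3` are what tie each output row back to its source text.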

