Ecnet

Latest version: v4.1.2



3.3.0

- Addition of validated, PaDEL/alvaDesc-generated YSI databases
- Update to repository links, author information
- ecnet.utils.data_utils now forces UTF-8 encoding for all database creation/saving
- ML back-end updated to TensorFlow 2.0.0
  - No API changes to _ecnet.models.mlp.MultilayerPerceptron_
  - Existing .h5 model files **will not work** with the updated class

_**Note:** PyTorch was initially considered as an alternative; however, both its performance in our tests and the viability of installing it on the high-performance machines available to the ECRL were deemed inadequate, so updating to TensorFlow 2.0.0 was deemed the most appropriate action._
- Only the following hyper-parameters are tuned with the built-in functions:
  - **Learning rate** of Adam optimization function
  - **Learning rate decay** of Adam optimization function
  - **Batch size** during training
  - **Patience** (if validating, epochs to wait for better validation loss, else terminate training)
  - **Size** of each hidden layer

_**Note:** with the relatively small number of samples our models are trained with, it does not make sense to adjust hyper-parameters such as **beta_1**, **beta_2**, and **epsilon**. The hyper-parameters listed above are theorized to play a much more important role in how the models train and perform._
- Added the UML ECRL's general publication workflow as _ecnet.workflows.ecrl_workflow.create_model_
- If using _ecnet.Server_ and not creating a project, a single model's filename can now be specified as an additional argument (default: _model.h5_)
- TensorFlow's _verbose_ argument is now propagated from _ecnet.Server.train_ to the model during training; added as an additional argument
- _ecnet.models.mlp.MultilayerPerceptron.fit_ now returns a tuple: _(learn losses, validation losses)_. Both _learn losses_ and _validation losses_ are lists containing the loss value (mean squared error) at every epoch. If training a single model using _ecnet.Server.train_, this tuple is returned. If not performing validation, the _validation losses_ list is populated with _None_ elements and has the same length as the _learn losses_ list.
- If installing using _setup.py_, installing TensorFlow is optional; to skip the installation of the pre-compiled PyPI distribution of TensorFlow, run _setup.py_ with **python setup.py --omit_tf install**

_**Note:** other methods of installing TensorFlow offer clear benefits (GPU support, different CPU instruction sets, etc.), therefore we want to provide an option for the user to use an existing installation of TensorFlow instead of forcing the PyPI-sourced version._
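The loss tuple returned by _fit_ in this release lends itself to a quick post-training summary. The following is a minimal sketch, assuming only the tuple shape described above; the helper name `summarize_losses` and the sample loss values are hypothetical and not part of ECNet.

```python
from typing import Dict, List, Optional, Tuple

# Shape described in the release notes: (learn losses, validation losses).
# The validation list holds None entries when no validation was performed.
LossHistory = Tuple[List[float], List[Optional[float]]]


def summarize_losses(history: LossHistory) -> Dict[str, object]:
    """Report epoch count and the epoch with the lowest tracked loss."""
    learn_losses, valid_losses = history
    if not learn_losses:
        raise ValueError('no epochs were recorded')
    # Fall back to learning losses when validation was not performed
    validating = any(v is not None for v in valid_losses)
    tracked = valid_losses if validating else learn_losses
    best_epoch = min(range(len(tracked)), key=lambda i: tracked[i])
    return {
        'epochs': len(learn_losses),
        'validating': validating,
        'best_epoch': best_epoch,  # zero-indexed
        'best_loss': tracked[best_epoch],
    }


# Hypothetical loss histories, for illustration only
with_validation = ([0.9, 0.5, 0.3, 0.25], [1.0, 0.6, 0.45, 0.5])
without_validation = ([0.9, 0.5, 0.3], [None, None, None])

summary_v = summarize_losses(with_validation)
summary_n = summarize_losses(without_validation)
print(summary_v)
print(summary_n)
```

Checking for _None_ entries first keeps the helper usable for both validated and non-validated training runs.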

3.2.3

- If validation/test sets are empty, input parameter limiting processes will still run
- _Server.limit\_inputs_ now correctly returns input parameter names, importances

3.2.2

- _ecnet.Server.remove_outliers_ and _ecnet.tasks.remove_outliers_ have been removed
  - _while detecting outliers may be beneficial in determining abnormalities in data, removing them entirely is likely not the right approach (in terms of fuel property prediction). Once a viable usage has been determined, outlier detection will be included._
- Added the _batch_size_ hyper-parameter, included in the default model configuration and hyper-parameter tuning process
  - Relevant unit tests updated
- Any missing model configuration variables from config files generated with previous versions of ECNet will now be set to their default values
  - Additional unit tests added
- Added option to convert SMILES to MDL during PaDEL-based database creation
  - Additional unit test added
- Added PaDEL-generated databases for all properties
- _ecnet.tasks.limit\_inputs.limit\_rforest_ now relies on _sklearn.ensemble.RandomForestRegressor_ as its only dependency
  - _limit\_rforest_ now returns list of parameter names/importances instead of a modified DataFrame
  - _Server.limit\_inputs_ also returns a list of parameter names/importances
- Removed the _ditto-lib_ dependency
- Bug fixes:
  - _Server.\_sets_ now loads when a PRJ file is opened via _ecnet.Server_
  - _ecnet.utils.data\_utils.DataFrame.set\_inputs_ now immediately applies selected inputs to L/V/T sets
  - _ParityPlot_ parity lines now scale to reflect data minimum/maximum
- More robust unit tests for MultilayerPerceptron, database creation, input parameter limiting
- All unit tests may now be run individually
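Filling missing configuration variables with defaults, as described in this release, can be sketched as a dictionary merge. This is an illustration only: the key names and default values below are hypothetical and do not reflect ECNet's actual configuration file.

```python
import copy
from typing import Any, Dict

# Hypothetical defaults; ECNet's real keys and values may differ
DEFAULT_CONFIG: Dict[str, Any] = {
    'learning_rate': 0.001,
    'batch_size': 32,
    'patience': 128,
    'hidden_layers': [[32, 'relu'], [32, 'relu']],
}


def fill_missing(loaded: Dict[str, Any],
                 defaults: Dict[str, Any] = DEFAULT_CONFIG) -> Dict[str, Any]:
    """Return a config where absent variables take their default values."""
    # Deep copy so mutable defaults (e.g. the layer list) are never shared
    merged = copy.deepcopy(defaults)
    # Values present in the loaded config always win over defaults
    merged.update(loaded)
    return merged


# A config written before batch_size existed (hypothetical contents)
old_config = {'learning_rate': 0.01, 'patience': 64}
filled = fill_missing(old_config)
print(filled)
```

Returning a new dictionary rather than mutating the loaded one leaves the original file contents available for comparison.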

3.2.1

- Training an MLP using a validation set now uses Keras' early stopping callback to determine the learning cutoff and preserves the weights from the best validation loss
- Moved multiprocessing.set_start_method to multiprocessed tasks
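The early stopping behaviour introduced in 3.2.1 can be illustrated without TensorFlow. The sketch below imitates patience-based stopping in plain Python; it is not Keras' implementation, and the function name `early_stop` and the sample losses are hypothetical.

```python
from typing import List, Tuple


def early_stop(valid_losses: List[float], patience: int) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both zero-indexed.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss; weights would be taken from `best_epoch`.
    """
    best_loss = float('inf')
    best_epoch = -1
    for epoch, loss in enumerate(valid_losses):
        if loss < best_loss:
            # New best: remember it and reset the waiting period
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # Waited `patience` epochs with no improvement
            return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(valid_losses) - 1, best_epoch


# Hypothetical validation losses, for illustration only
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stop_epoch, best_epoch = early_stop(losses, patience=3)
print(stop_epoch, best_epoch)
```

Here the best loss occurs at epoch 2, and three further epochs without improvement end training at epoch 5, before the final value is ever seen.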

3.2.0

1.) The following conversions have been removed from ECNet:
- get_smiles
- smiles_to_descriptors
- smiles_to_mdl
- mdl_to_descriptors

*Note: these were adding clutter, and were not within the main scope of ECNet.

2.) PaDEL-Descriptor is no longer bundled into ECNet

*Note: with the removal of conversion functions, this is no longer needed.

3.) Database creation functions now rely on two separate packages:
- PaDELPy (https://github.com/ECRL/PaDELPy) - QSPR descriptor generation using PaDEL-Descriptor
- alvaDescPy (https://github.com/ECRL/alvaDescPy) - QSPR descriptor generation using alvaDesc

*Note: it made sense to create separate packages for interfacing with this software, since a Python interface for generating QSPR descriptors is generally quite handy.

4.) _ecnet.tools.database.create_db_'s arguments have been changed:

```python
>>> ecnet.tools.database.create_db(['CC', 'CCC'], 'my_database.csv', targets=[13, 47])
```

Construct using alvaDesc:

```python
>>> ecnet.tools.database.create_db(['CC', 'CCC'], 'my_database.csv', targets=[13, 47], backend='alvadesc')
```


*Note: supplying SMILES strings and targets using lists makes more sense than requiring the user to create a separate file - this change allows the user to choose where the data comes from.

5.) _ecnet.tools.project.predict_'s arguments have been changed:

```python
>>> results = ecnet.tools.project.predict(['CC', 'CCC'], 'my_project.prj')
>>> print(results)
[[13], [47]]
```


*Note: as with database creation, supplying inputs as lists makes more sense than requiring the user to create a separate file.

6.) _ecnet.Server_ has been rearranged a bit:
- project training has been moved to a separate function at _ecnet.tasks.training.train_project_
- various functions have been moved to _ecnet.utils.server_utils_:
  - creating a project folder structure
  - saving a project as a .prj file
  - opening a .prj file to use
- task-specific logging messages have been moved to their respective functions in _ecnet.tasks_

*Note: _ecnet.Server_ needed to be shrunk down, and functions that were obviously utilities were moved into utility files. This should also provide more direct access to the "back-end" of ECNet (bypassing Server usage), allowing greater variation in experimental procedure.

7.) Added a suite of unit tests implemented with the _unittest_ library:
- in addition to Server unit tests, individual utilities of ECNet are tested
- added a Python script, _/tests/test_all.py_, to automatically run all unit tests and report a summary of successes/failures

*Note: it's time for "proper" unit testing, and that means implementing a unit testing package. I'm looking forward to expanding ECNet's tests and introducing more automation into the testing process.
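A script that runs a set of unit tests and reports a summary of successes and failures can be sketched with the standard _unittest_ library. This is not the contents of _/tests/test_all.py_; the two check functions and the `run_and_summarize` helper are hypothetical, and one check fails on purpose so that the summary has something to report.

```python
import io
import unittest
from typing import Dict


def check_list_inputs() -> None:
    # Passing check: SMILES strings supplied as a list
    assert isinstance(['CC', 'CCC'], list)


def check_deliberate_failure() -> None:
    # Failing check, included only to exercise the failure count
    raise AssertionError('deliberate failure for illustration')


def run_and_summarize(suite: unittest.TestSuite) -> Dict[str, int]:
    """Run a suite quietly and return counts of run/failed/succeeded tests."""
    stream = io.StringIO()  # capture runner output instead of printing it
    result = unittest.TextTestRunner(stream=stream, verbosity=0).run(suite)
    failed = len(result.failures) + len(result.errors)
    succeeded = result.testsRun - failed - len(result.skipped)
    return {'run': result.testsRun, 'failed': failed, 'succeeded': succeeded}


# In a real project the suite would more likely come from test discovery,
# e.g. unittest.defaultTestLoader.discover('tests')
suite = unittest.TestSuite([
    unittest.FunctionTestCase(check_list_inputs),
    unittest.FunctionTestCase(check_deliberate_failure),
])
summary = run_and_summarize(suite)
print(summary)
```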

8.) Installation now forces TensorFlow 1.13.1 to be installed

*Note: I've encountered _pip install tensorflow_ installing the 2.0.0 beta, which ECNet does not currently support - we'll make the change when we're ready (and so is Keras)

9.) Changed/added a variety of databases to the _/databases/_ directory
- All databases constructed using alvaDesc
- All SMILES strings have been validated with respect to compound name
  - PubChemPy (https://github.com/mcs07/PubChemPy) is a lifesaver
  - Compounds not found on PubChem were validated in-house by an ECRL research assistant

*Note: in order to ensure accurate QSPR-descriptor to experimental value correlation, accurate SMILES strings are necessary (assuming descriptors are being generated using them).

3.1.2

- All methods/functions now enforce specific types for arguments, return values
- calc_r2 function now uses scikit-learn's r2_score function
- Changed unit testing scheme, now uses unittest library
  - Added a suite of unit tests
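For reference, the coefficient of determination that scikit-learn's _r2_score_ reports for a single target is 1 minus the ratio of the residual sum of squares to the total sum of squares. The plain-Python sketch below is not ECNet's _calc_r2_; the function name `r_squared` and the sample values are illustrative.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError('inputs must be non-empty and of equal length')
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # This sketch simply rejects constant targets
        raise ValueError('R^2 is undefined for constant true values')
    return 1.0 - ss_res / ss_tot


# Illustrative experimental vs. predicted values
r2 = r_squared([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(round(r2, 4))
```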

