- Make Hall, PTSD and Ace into demo datasets with red text
- Add option to remove projects
- Improved documentation on installation.
- Prevent users from skipping step 3 by requiring 1 included item.
- Faster and more accurate search.
- Add the option to configure the hostname and port number. See https://asreview.readthedocs.io/en/latest/installation.html#server-installation
- Fixed an issue where the `__init__.py` file was missing for the `asreview.io` package.
- You can now provide multiple datasets. They will be merged/appended together.
- You can provide a dataset with only inclusions. Use the flag `--included_dataset`. You are able to provide multiple included datasets.
- You can provide a dataset with only exclusions. Use the flag `--excluded_dataset`. Multiple files are possible.
- You can provide a labelled prior dataset. Use the flag `--prior_dataset`. Multiple files are possible.
- If given a partially labeled dataset, ASReview will now continue with those labels.
- A partially labeled dataset can now be simulated by ASReview.
- The flag `--extra_dataset` is removed, since its functionality is now covered by `--prior_dataset`.
- Oracle mode now takes the width of your console more into account.
- `prior_included` and `prior_excluded` are phased out.
- The option `--log_file` has been renamed to `--state_file`. The old option is (for now) still available.
- Fix several issues with the parameters in the config file not being the same as `model.param`.
- The `DenseNNLayer` model has been renamed to `NN2Layer` to have a more consistent naming scheme. In the same vein, it is now available under `asreview.models.nn_2_layer`.
- Functionality of creating the feature matrix has been moved from the factory `asreview.review.factory` to the review base class. Thus, instead of supplying the feature matrix to the review class, you should supply an `ASReviewData` instance.
- The current query is now stored in the log file.
- The feature matrix is now stored in the log file. This should improve performance, when restarting ASReview.
- When using the `reviewer.query` function, you can supply a different query strategy.
- It is now possible to write extensions for reading different file formats, using the `asreview.readers` entry-point.
- The Logger now has a property `settings` that replaces the `add_settings` method.
- Everything related to the `logger` functionality has been renamed to `state`. That means that arguments have been changed, class names have been changed, function names have been changed, etc.
- Improve documentation.
- Add extra CLI argument: `--feature_extraction`
- Set the feature extraction method from the command line.
- Fix an issue where the program would break if the number of prior inclusions and exclusions were not equal.
- Fix an issue where hyperopt would create int64 values that would break the simulation.
- Fix mixed query strategy calling itself "mixed" instead of the proper name.
- Fix hyperopt parameters in base classes being unavailable for optimization.
- Fix hyperopt definition of `tfidf`:`ngram_max` to return the appropriate value.
- Fix hyperopt implementation for `nn2-layer` model.
- Fix the embedding matrix being present in the default parameters of the LSTM models.
- Fix an issue where feature extraction parameters were not properly decoded from a configuration file.
- Add new member functions `from_file` and `from_path` to Analysis class.
- Fix the attribute `name` in several classes to match their class name.
- Add a new property `param` to `BaseModel` to get the current parameters of a model. This should eliminate a number of potential bugs.
- Change argument/attribute `workers` of `Doc2Vec` class to `n_jobs` to make it follow *SKLearn* convention.
- The settings of the review are now added within the Review class, instead of in the factory.
- Phase out some `os.path` usage in favor of `pathlib`.
- Improved unit tests.
- Replace PyInquirer with Questionary. This solves issues with other Python packages such as IPython and Jupyter Notebook.
- If asreview refuses to install, manually uninstall PyInquirer (`pip uninstall pyinquirer`) and then try again to install asreview.
- Improved handling of keyboard interrupts.
- Check logfile extensions
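
The rename of `--log_file` to `--state_file` noted above keeps the old option available for now. One common way to get that behaviour is to register both option strings on a single argument. The sketch below uses only the standard `argparse` module; the parser name and the `build_parser` helper are hypothetical and are not taken from ASReview's own code.

```python
import argparse


def build_parser():
    """Build a parser where the old flag is kept as an alias of the new one.

    Hypothetical helper for illustration; not ASReview's actual parser.
    """
    parser = argparse.ArgumentParser(prog="example")
    # Both option strings write to the same destination, so scripts that
    # still pass --log_file keep working after the rename.
    parser.add_argument(
        "--state_file", "--log_file",
        dest="state_file",
        default=None,
        help="Path of the state file (--log_file is the older name).",
    )
    return parser


# The old spelling and the new spelling end up in the same attribute.
old_style = build_parser().parse_args(["--log_file", "review.h5"])
new_style = build_parser().parse_args(["--state_file", "review.h5"])
print(old_style.state_file == new_style.state_file)
```

Keeping a single destination means the rest of the program only ever reads `state_file`, so the alias can later be dropped by removing one option string.
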
This release improves packaging, publishing of new releases and uploading to PyPI. No internal changes to ASReview.
New models, query strategies and API changes
- Due to significant API changes, the log file versions have been updated. As a result, log files created with older versions of ASReview cannot be read with the new version. Keep using the old version with these old log files (for reproducibility purposes, this is generally a good idea).
- Different options for installing the package are now available. In an effort to keep the number of dependencies in check, the dependencies of some models are optional. In order to use these models, it is necessary to install these packages manually (an error will be shown giving the name of the missing package). You can also use `pip install asreview[all]` to install all optional dependencies automatically.
- New Model: `nn-2-layer`
- Dense Neural Network consisting of two layers. Seems to work well with the new doc2vec feature extraction.
- New Model: `rf`
- Random Forest model (sklearn).
- New Model: `logistic`
- Logistic regression model.
- New Balance strategy: `double`
- This is the same strategy as the `triple` balance strategy, except there is
- New Query strategy: `cluster`
- This query strategy uses K-Means clustering to divide the papers into different clusters. It then randomly selects one of these clusters and picks the paper with the highest probability in that cluster.
- New Query strategy: `mixed`
- This is a new 'class' of query strategies, where query strategies can be mixed. Previously only `rand_max` was implemented, but any two query strategies can be combined.
- New Feature Extraction: `doc2vec`
- Uses the doc2vec model from the `gensim` package.
- New Feature Extraction: `sbert`
- Uses the Sentence BERT model with a pretrained (provided by sbert) dataset. This is probably not ideal, and as such I haven't had much success with it.
- New Feature Extraction: `embedding-idf`
- Uses the average of word embeddings weighted by inverse document frequency.
- Create abstract 'super' model above all types of models.
- Move feature extraction out of the models. This means that one can now use different feature extractors with the same model, although some restrictions apply.
- Remove ModAL from the active learning process.
- We were not using modAL all that intensively anymore. The main reason for removal is that modAL uses a system that requires functions/arguments to be passed around. Now we're using classes, which improves the readability and maintainability.
- Align all types of models (train, query, balance, feature extraction) with a similar class structure.
- Improve and align the hyper parameter optimization of the different types of models.
- Remove the `get_data` member function from the `ASReviewData` class. It was not a very useful structure, and was often used to get one piece of data and throw away the other two. As a replacement, use `as_data.texts` to get the texts (title + abstract), and `as_data.labels` to get the labels.
- Lots of renamed classes. It is generally advised to search with a string.
- The query strategy `rand` has been renamed to `random`.
- A lot of documentation was added/updated.
- New and improved unit tests. Query models are added to unit tests and tested for 'cheating'. Feature extraction methods received their own tests.
- Improve documentation
- Fix excel import
- Add support for excel (.xlsx) files.
- Add support for PubMed XML files.
- Add wider support for ris/csv files.
- Introduce hooks for hyperopt optimization
-- Hyperopt is a package for hyperparameter optimization, and these hooks make it much easier to do the optimization of said hyperparameters.
-- The hyperopt package is optional.
- Model-specific actions are removed from the review package and moved into the model package.
- Models are now classes, instead of a generator function.
- Add HDF5 as a storage container for logging.
-- It works parallel to the still available JSON logger. It has an advantage in both speed and disk requirements (~x2.4).
- The JSON logger is updated, and its version is now 2.0, making it incompatible with log files created with an earlier version of ASReview. Such files are detected automatically.
- Add hooks for final_labels
-- This is a slightly obscure feature, where you can have for example two levels of inclusion:
--- After reading the abstract -> model trained on this
--- After reading full text
-- This is a simulation-only feature that can be enabled by putting `abstract_only=True` in the `[global_settings]` section of a config file.
- Update the command line interface
-- Oracle only change.
-- This makes the command line interface much more useful: it asks more questions and gives more options for what to do next.
-- It uses the PyInquirer package, which unfortunately means that we cannot test oracle mode anymore with pytest (needs a TTY).
- The analysis part of the simulation project was ported over, since it can be generally useful.
- Documentation was updated.
- Add support for Tensorflow 2.0.
- Various (API) changes.
- Entry points and module renamed into `asreview`
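
The `cluster` query strategy listed above is described as picking a random cluster and then the most probable paper inside it. A minimal sketch of that selection step follows. It assumes the cluster assignments have already been computed (for example by K-Means, which is not shown), and the function name `cluster_query` and its arguments are hypothetical, not ASReview's API.

```python
import random
from collections import defaultdict


def cluster_query(cluster_ids, probabilities, rng=None):
    """Pick one record: a random cluster first, then its most probable member.

    cluster_ids[i] is the cluster of record i (assumed to come from a
    separate clustering step), and probabilities[i] is the model's
    inclusion probability for record i. Hypothetical helper for illustration.
    """
    rng = rng or random.Random()

    # Group record indices by the cluster they belong to.
    members = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        members[cluster].append(idx)

    # Choose a cluster uniformly at random, then take its top-scoring record.
    chosen = rng.choice(sorted(members))
    return max(members[chosen], key=lambda idx: probabilities[idx])


# Records 0-1 are in cluster 0, records 2-4 in cluster 1.
picked = cluster_query([0, 0, 1, 1, 1], [0.2, 0.9, 0.1, 0.8, 0.3])
print(picked)
```

Sampling the cluster before ranking means a record from a low-scoring region can still be queried, which is the exploration effect the description points at.
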