What's new?
This is a major release. The entire code base has been refactored to use a much more object-oriented design, which should make it much easier to make improvements and add extensions. As a result, there have been significant changes to the RSMTool API (see the API documentation linked in the Documentation section below for details).
New features
New learners
* New regressors from the latest SKLL release (v1.5.1) have been added to ``rsmtool``.
* `rsmtool` can now be used with both regressors and classifiers from SKLL, including probabilistic classifiers whose predicted class probabilities can be used to compute expected values as predictions.
See the [SKLL documentation](http://skll.readthedocs.io/en/latest/run_experiment.html?highlight=learners#learners) for the full list of learners.
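To illustrate the expected-value idea (this is not RSMTool's own code), the expected score for a response is the sum of each score point weighted by the classifier's predicted probability for that point. The function name `expected_scores` and the example probabilities are hypothetical. The label order must match the probability columns; for scikit-learn-style classifiers, that order is given by the `classes_` attribute.

```python
import numpy as np


def expected_scores(probabilities, labels):
    """Hypothetical helper: probability-weighted mean score for each response.

    probabilities: shape (n_responses, n_classes), each row summing to 1
    labels: numeric score for each probability column, in the same order
    """
    probs = np.asarray(probabilities, dtype=float)
    score_points = np.asarray(labels, dtype=float)
    # Each row's dot product with the score points is its expected value
    return probs @ score_points


# Hypothetical probabilities for two responses on a 1-3 score scale
example_probs = [[0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]]
print(expected_scores(example_probs, [1, 2, 3]))  # approximately [2.2, 2.8]
```

Unlike the most likely class, the expected value is continuous. Continuous predictions are what the regression-style evaluations in the reports work with.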
Enhanced outputs
* Users can now use the ``file_format`` configuration option to save intermediate files in ``tsv``, ``csv``, or ``xlsx`` format.
* Users can now set the ``use_thumbnails`` configuration option to embed clickable thumbnails in the HTML report instead of full-sized images; clicking a thumbnail displays the full-sized image in a new window. This is particularly useful for large reports with many images, since it improves both their readability and their loading speed.
* Reports for `rsmtool`, `rsmeval`, and `rsmsummarize` now include a new section with links to the intermediate files (``intermediate_file_paths.ipynb``), so that users can easily inspect these files from the report itself.
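If you post-process intermediate files outside the report, a small helper can load them whichever format was chosen. The sketch below is a hypothetical helper, not part of RSMTool. It assumes that each file's extension matches its format and uses only standard pandas readers.

```python
import os
import tempfile

import pandas as pd


def read_intermediate_file(path):
    """Hypothetical helper: load a .tsv, .csv, or .xlsx file based on its extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension == ".csv":
        return pd.read_csv(path)
    if extension == ".tsv":
        return pd.read_csv(path, sep="\t")
    if extension == ".xlsx":
        # Reading .xlsx files requires an Excel engine such as openpyxl
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file extension: {extension}")


# Example: round-trip a small frame through a temporary .tsv file
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "example.tsv")
    pd.DataFrame({"spkitemid": ["r1", "r2"], "sc1": [3, 4]}).to_csv(
        example_path, sep="\t", index=False
    )
    frame = read_intermediate_file(example_path)

print(frame)
```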
New configuration options
* Users can now specify ``features`` in the configuration file as a ``list`` of feature names. Signs and transformations cannot be specified when features are given as a list, but this form makes writing configuration files for simple experiments much easier and faster.
* Users can now specify a ``skll_objective`` for tuning the SKLL learners used in their experiments.
* Users can now specify a ``flag_column_test`` configuration option to use different flags for the test file and the training file.
* Users can now set the boolean ``standardize_features`` option to turn off feature standardization, which is applied by default.
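A configuration that combines the new options might look like the sketch below, which builds the JSON configuration file from Python. The experiment ID, file names, feature names, flag column, and objective value are hypothetical placeholders. The sketch also assumes that flag filters are written as a dictionary mapping a column name to its allowed values. Check the RSMTool and SKLL documentation for the exact accepted values.

```python
import json

# Hypothetical rsmtool configuration; every name and value is a placeholder.
config = {
    "experiment_id": "new_options_demo",
    "model": "SVR",                                     # an SKLL learner
    "train_file": "train.csv",
    "test_file": "test.csv",
    "features": ["grammar", "vocabulary", "fluency"],   # plain list: no signs or transformations
    "skll_objective": "pearson",                        # objective for tuning the SKLL learner
    "flag_column": {"ADVISORY": [0]},                   # filter applied to the training file
    "flag_column_test": {"ADVISORY": [0, 1]},           # a different filter for the test file
    "standardize_features": False,                      # turn off the default standardization
    "file_format": "tsv",                               # format for intermediate files
    "use_thumbnails": True,                             # clickable thumbnails in the HTML report
}

# json.dumps writes Python's False/True as JSON false/true
print(json.dumps(config, indent=4))
```

Writing the dictionary with `json.dump` to a `.json` file produces a file in the same form that RSMTool reads as its configuration.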
New evaluations
* `rsmtool` and `rsmeval` now compute disattenuated correlations if the data includes two human scores.
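As a rough sketch of the standard correction (not RSMTool's implementation), a disattenuated correlation divides the system-human correlation by the square root of the human-human correlation. This treats the human-human correlation as the reliability of the human score and the system score as perfectly reliable. The function name, the choice of the first human score, and the data below are hypothetical.

```python
import numpy as np


def disattenuated_correlation(system, human1, human2):
    """Hypothetical helper: correct r(system, human) for human score unreliability.

    Uses r(human1, human2) as the reliability of the human score and treats
    the system score as perfectly reliable.
    """
    # Pearson correlation between system scores and the first human score
    r_system_human = np.corrcoef(system, human1)[0, 1]
    # Inter-rater correlation, used here as the reliability estimate
    r_human_human = np.corrcoef(human1, human2)[0, 1]
    # The correction is undefined when the human-human correlation is not positive
    if r_human_human <= 0:
        return float("nan")
    return r_system_human / np.sqrt(r_human_human)


# Hypothetical scores for eight responses
system_scores = [1.2, 2.1, 2.9, 3.8, 4.1, 2.5, 3.3, 3.9]
first_human = [1, 2, 3, 4, 5, 2, 3, 4]
second_human = [1, 3, 3, 4, 4, 2, 2, 5]
print(disattenuated_correlation(system_scores, first_human, second_human))
```

Because the divisor is at most 1, the corrected value is at least as large as the raw correlation when both correlations are positive.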
Code changes
* New helper classes have been added to ``rsmtool``, which allow easy reading, writing, and manipulation of multiple ``pandas`` data frames.
- ``container.DataContainer()``: A class to encapsulate multiple data frames.
- ``reader.DataReader()``: A class to read multiple tabular files into a ``DataContainer()`` object.
- ``writer.DataWriter()``: A class to write all data frames contained in a ``DataContainer()`` object to separate files, with a specified file extension.
* The ``rsmtool`` package can now be installed with ``pip`` as well as ``conda``.
* `preprocessor.trim()` now accepts both NumPy arrays and lists as input.
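For illustration only, a trimming function that accepts either input type can normalize its input with `np.asarray` first. The name `trim_values`, its parameters, and the tolerance value are hypothetical and are not the signature of `preprocessor.trim()`. The sketch assumes that trimming clips predictions to the score range widened by a small tolerance.

```python
import numpy as np


def trim_values(values, trim_min, trim_max, tolerance=0.49998):
    """Hypothetical trim: clip values to [trim_min - tolerance, trim_max + tolerance].

    np.asarray lets the same code accept lists, tuples, or NumPy arrays.
    The default tolerance is an illustrative choice.
    """
    array = np.asarray(values, dtype=float)
    return np.clip(array, trim_min - tolerance, trim_max + tolerance)


# The same call works for a list and for an array
print(trim_values([0.2, 3.1, 6.7], trim_min=1, trim_max=6))
print(trim_values(np.array([0.2, 3.1, 6.7]), trim_min=1, trim_max=6))
```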
Bugfixes
* Fixed a warning in ``rsmcompare`` when computing summary evaluations.
* Previously, confusion matrices converted human scores to integers, while score distributions used the values as is. Both analyses now use rounded human scores.
* Length columns are now coerced to numeric values if they are non-numeric.
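The last two fixes can be illustrated with pandas. This sketch is not RSMTool's code, and the column names (`length`, `sc1`, `prediction`) and values are hypothetical. It shows why rounding differs from converting to integers and how one set of rounded scores can feed both analyses.

```python
import pandas as pd

# Hypothetical data: lengths stored as text, some human scores that are not integers
df = pd.DataFrame({
    "length": ["120", "95", "n/a", "210"],
    "sc1": [1.0, 2.0, 2.6, 3.4],
    "prediction": [1, 2, 3, 3],
})

# Coerce the length column to numbers; unparseable entries become NaN
df["length"] = pd.to_numeric(df["length"], errors="coerce")

# astype(int) alone would truncate 2.6 to 2; rounding first gives 3.
# Series.round() rounds exact halves (e.g. 2.5) to the nearest even value.
human_rounded = df["sc1"].round().astype(int)

# Use the same rounded scores for the confusion matrix and the distribution
confusion = pd.crosstab(human_rounded, df["prediction"])
distribution = human_rounded.value_counts().sort_index()
print(confusion)
print(distribution)
```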
Documentation
* Added documentation for the refactored [API](http://rsmtool.readthedocs.io/en/latest/api.html).
* Added detailed documentation about [how to write RSMTool tests](http://rsmtool.readthedocs.io/en/latest/contributing.html#rsmtool-tests).