Major Features and Improvements
* Add support for detecting drift and distribution skew in numeric features.
* `tfdv.validate_statistics` now also reports the raw measurements of
distribution skew/drift (if any are computed), regardless of whether skew/drift
is detected. The report is in the `drift_skew_info` field of the `Anomalies`
proto (the return value of `validate_statistics`).
* Starting with this release, TFDV also hosts nightly packages on
https://pypi-nightly.tensorflow.org. To install the nightly package, use the
following command:
`pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple tensorflow-data-validation`
Note: These nightly packages are unstable and breakages are likely. Depending
on the complexity involved, it can often take a week or more for fixed wheels
to become available on the PyPI cloud service. You can always use the stable
version of TFDV available on PyPI by running
`pip install tensorflow-data-validation`.
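
As a rough illustration of the `drift_skew_info` report mentioned above, the sketch below flattens the raw measurements into plain dictionaries. It assumes that each `drift_skew_info` entry exposes `path.step`, `drift_measurements`, and `skew_measurements`, and that each measurement carries `type`, `value`, and `threshold`; these field names are assumptions about the proto layout, and `summarize_drift_skew` is a hypothetical helper, not part of TFDV. A stand-in object is used so the sketch runs without TFDV installed.

```python
from types import SimpleNamespace


def summarize_drift_skew(anomalies):
    """Flatten `anomalies.drift_skew_info` into a list of plain dicts.

    Hypothetical helper. Assumes each entry has `path.step`,
    `drift_measurements` and `skew_measurements`, and that each measurement
    has `type`, `value` and `threshold`.
    """
    rows = []
    for info in anomalies.drift_skew_info:
        feature = "/".join(info.path.step)
        for kind, measurements in (
            ("drift", info.drift_measurements),
            ("skew", info.skew_measurements),
        ):
            for m in measurements:
                rows.append({
                    "feature": feature,
                    "kind": kind,
                    "type": m.type,
                    "value": m.value,
                    "threshold": m.threshold,
                })
    return rows


# Stand-in that mimics the assumed proto layout (made-up feature and values).
# Note that the measurement is reported even though it is below the threshold.
fake_anomalies = SimpleNamespace(
    drift_skew_info=[
        SimpleNamespace(
            path=SimpleNamespace(step=["trip_miles"]),
            drift_measurements=[
                SimpleNamespace(
                    type="JENSEN_SHANNON_DIVERGENCE", value=0.02, threshold=0.1
                )
            ],
            skew_measurements=[],
        )
    ]
)

rows = summarize_drift_skew(fake_anomalies)
for row in rows:
    print(row)
```

With TFDV installed, the same helper would presumably be called on the result of `tfdv.validate_statistics`, passing `previous_statistics` to compute drift or `serving_statistics` to compute skew.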
Bug Fixes and Other Changes
* Added `tfdv.load_stats_binary` to load stats that were written using
`tfdv.WriteStatisticsToText` (now `tfdv.WriteStatisticsToBinaryFile`).
* Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more specific
anomaly types: DOMAIN_INVALID_FOR_TYPE, UNEXPECTED_DATA_TYPE,
FEATURE_MISSING_NAME, FEATURE_MISSING_TYPE, INVALID_SCHEMA_SPECIFICATION.
* Fixed a bug where `import tensorflow_data_validation` would fail if IPython
was not installed. IPython is an optional dependency of TFDV.
* Depends on `apache-beam[gcp]>=2.25,<3`.
* Depends on `tensorflow-metadata>=0.25,<0.26`.
* Depends on `tensorflow-transform>=0.25,<0.26`.
* Depends on `tfx-bsl>=0.25,<0.26`.
* Depends on `scikit-learn>=1.0,<2` (mutual-information installation).
Known Issues
* N/A
Breaking Changes
* `tfdv.WriteStatisticsToText` has been renamed to
`tfdv.WriteStatisticsToBinaryFile`. The old name is still available but will
be removed in a future release.
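
For code that must run against releases both before and after the rename, one possible approach is to look up the new name first and fall back to the old one. The sketch below is illustrative only: `resolve_stats_writer` is a hypothetical helper, not part of TFDV, and plain stand-in objects replace the real module so the example runs without TFDV installed.

```python
from types import SimpleNamespace


def resolve_stats_writer(tfdv_module):
    """Return the statistics writer, preferring the new name.

    Hypothetical helper: falls back to `WriteStatisticsToText` when the
    module does not yet expose `WriteStatisticsToBinaryFile`.
    """
    writer = getattr(tfdv_module, "WriteStatisticsToBinaryFile", None)
    if writer is None:
        # Releases before the rename only expose the old name.
        writer = getattr(tfdv_module, "WriteStatisticsToText")
    return writer


# Stand-ins for a module before the rename and one after it. The string
# values are placeholders for the real transform classes.
old_tfdv = SimpleNamespace(WriteStatisticsToText="old-writer")
new_tfdv = SimpleNamespace(
    WriteStatisticsToText="old-writer",
    WriteStatisticsToBinaryFile="new-writer",
)

print(resolve_stats_writer(old_tfdv))
print(resolve_stats_writer(new_tfdv))
```

In real use, the argument would be the imported `tensorflow_data_validation` module. Once the old name is removed, the fallback branch becomes dead code and can be dropped.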
Deprecations
* N/A