Tensorflow-data-validation

Latest version: v1.15.1

0.22.1

Major Features and Improvements

* Statistics generation can now handle arbitrarily nested Arrow
List/LargeList types. Stats about the list elements' presence and valency
are computed at each nesting level and stored in a newly added field,
`valency_and_presence_stats`, in `CommonStatistics`.
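
As a rough illustration of what per-level presence and valency statistics mean, the sketch below walks plain Python nested lists instead of Arrow arrays. The function name, the explicit `depth` argument, and the dictionary keys are assumptions made for this example; it is not TFDV's implementation and does not produce a `valency_and_presence_stats` proto.

```python
def valency_and_presence(column, depth):
    """Collect presence/valency counts for each nesting level of a column.

    `column` holds one entry per example: either None (missing) or a list
    nested `depth` levels deep. All names here are illustrative only.
    """
    stats = [
        {
            "num_non_missing": 0,
            "num_missing": 0,
            "min_num_values": None,
            "max_num_values": 0,
            "tot_num_values": 0,
        }
        for _ in range(depth)
    ]

    def visit(value, level):
        level_stats = stats[level]
        if value is None:
            # A missing list at this level contributes to presence only.
            level_stats["num_missing"] += 1
            return
        length = len(value)
        level_stats["num_non_missing"] += 1
        level_stats["tot_num_values"] += length
        level_stats["max_num_values"] = max(level_stats["max_num_values"], length)
        if level_stats["min_num_values"] is None:
            level_stats["min_num_values"] = length
        else:
            level_stats["min_num_values"] = min(level_stats["min_num_values"], length)
        # Descend only while the children are themselves lists.
        if level + 1 < depth:
            for child in value:
                visit(child, level + 1)

    for row in column:
        visit(row, 0)
    return stats


# Three examples of a doubly nested feature; the second example is missing.
per_level = valency_and_presence([[[1, 2], [3]], None, [[], None]], depth=2)
```

Here `per_level[0]` describes the outer lists and `per_level[1]` the inner lists, which mirrors the idea of keeping one set of statistics per nesting level.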

Bug Fixes and Other Changes

* Trigger DATASET_HIGH_NUM_EXAMPLES when a dataset has more than the specified
limit on number of examples.
* Fix bug in display_anomalies that prevented dataset-level anomalies from
being displayed.
* Trigger anomalies when a feature has a number of unique values that does not
conform to the specified minimum/maximum.
* Trigger anomalies when a float feature has unexpected Inf / -Inf values.
* Depends on `apache-beam[gcp]>=2.22,<3`.
* Depends on `pandas>=0.24,<2`.
* Depends on `tensorflow-metadata>=0.22.2,<0.23.0`.
* Depends on `tfx-bsl>=0.22.1,<0.23.0`.
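
The new anomaly checks in this release can be pictured with a small, self-contained sketch. It uses only the standard library, and the function names, parameters, and anomaly labels are hypothetical, with the exception of `DATASET_HIGH_NUM_EXAMPLES`, which is named in the notes above. TFDV itself performs these checks against a schema, which this sketch does not model.

```python
import math


def find_feature_anomalies(values, min_unique=None, max_unique=None):
    """Return illustrative anomaly labels for one float feature."""
    anomalies = []
    num_unique = len(set(values))
    # Unique-value count outside the configured minimum/maximum.
    if min_unique is not None and num_unique < min_unique:
        anomalies.append("TOO_FEW_UNIQUE_VALUES")  # hypothetical label
    if max_unique is not None and num_unique > max_unique:
        anomalies.append("TOO_MANY_UNIQUE_VALUES")  # hypothetical label
    # math.isinf is True for both Inf and -Inf.
    if any(math.isinf(v) for v in values):
        anomalies.append("UNEXPECTED_INF_VALUES")  # hypothetical label
    return anomalies


def find_dataset_anomalies(num_examples, max_num_examples=None):
    """Flag a dataset that has more examples than the configured limit."""
    if max_num_examples is not None and num_examples > max_num_examples:
        return ["DATASET_HIGH_NUM_EXAMPLES"]
    return []


feature_report = find_feature_anomalies([1.0, 1.0, float("inf")], min_unique=3)
dataset_report = find_dataset_anomalies(num_examples=101, max_num_examples=100)
```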

Known Issues

Breaking Changes

Deprecations

0.22.0

Major Features and Improvements

Bug Fixes and Other Changes

* Crop values in natural language stats generator.
* Switch to using PyBind11 instead of SWIG for wrapping C++ libraries.
* CSV decoder support for multivalent columns by using tfx_bsl's decoder.
* When inferring a schema entry for a feature, do not add a shape with dim = 0
when min_num_values = 0.
* Add utility methods `tfdv.get_slice_stats` to get statistics for a slice and
`tfdv.compare_slices` to compare statistics of two slices using Facets.
* Make `tfdv.load_stats_text` and `tfdv.write_stats_text` public.
* Add PTransforms `tfdv.WriteStatisticsToText` and
`tfdv.WriteStatisticsToTFRecord` to write statistics proto to text and
tfrecord files respectively.
* Modify `tfdv.load_statistics` to handle reading statistics from TFRecord and
text files.
* Added an extra requirement group `mutual-information`. As a result, barebone
TFDV does not require `scikit-learn` any more.
* Added an extra requirement group `visualization`. As a result, barebone TFDV
does not require `ipython` any more.
* Added an extra requirement group `all` that specifies all the extra
dependencies TFDV needs. Use `pip install tensorflow-data-validation[all]`
to pull in those dependencies.
* Depends on `pyarrow>=0.16,<0.17`.
* Depends on `apache-beam[gcp]>=2.20,<3`.
* Depends on `ipython>=7,<8;python_version>="3"`.
* Depends on `scikit-learn>=0.18,<0.24`.
* Depends on `tensorflow>=1.15,!=2.0.*,<3`.
* Depends on `tensorflow-metadata>=0.22.0,<0.23`.
* Depends on `tensorflow-transform>=0.22,<0.23`.
* Depends on `tfx-bsl>=0.22,<0.23`.
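
To show how extra requirement groups such as `mutual-information`, `visualization`, and `all` fit together, here is a hypothetical `setup.py`-style fragment. Only the group names and the two version specifiers come from the notes above; the `EXTRAS` variable and the exact contents of each group are assumptions, and the real package may list more dependencies.

```python
# Hypothetical extras table; a real setup.py would pass this as
# `extras_require=EXTRAS` to setuptools.setup().
EXTRAS = {
    "mutual-information": ["scikit-learn>=0.18,<0.24"],
    "visualization": ['ipython>=7,<8;python_version>="3"'],
}

# The "all" group is assumed to be the union of every other group.
EXTRAS["all"] = sorted({req for group in EXTRAS.values() for req in group})
```

With a layout like this, a plain install pulls in neither `scikit-learn` nor `ipython`, while `pip install tensorflow-data-validation[all]` pulls in both.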

Known Issues

* (Known issue resolution) It is no longer necessary to use Apache Beam 2.17
when running TFDV on Windows. The current release of Apache Beam will work.

Breaking Changes

* `tfdv.GenerateStatistics` now accepts a PCollection of `pa.RecordBatch`
instead of `pa.Table`.
* All the TFDV coders now output a PCollection of `pa.RecordBatch` instead of
a PCollection of `pa.Table`.
* `tfdv.validate_instances` and
`tfdv.api.validation_api.IdentifyAnomalousExamples` now take
`pa.RecordBatch` as input instead of `pa.Table`.
* The `StatsGenerator` interface (and all its sub-classes) now takes
`pa.RecordBatch` as the input data instead of `pa.Table`.
* Custom slicing functions now accept a `pa.RecordBatch` instead of
`pa.Table` as input and should output a tuple `(slice_key, record_batch)`.
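
The `(slice_key, record_batch)` contract for custom slicing functions can be sketched without any dependencies. In the example below a plain dict of column lists stands in for `pa.RecordBatch`, and the function name and slice-key format are assumptions made for illustration; a real slicing function would receive and emit Arrow record batches.

```python
def slice_by_feature(batch, feature):
    """Yield (slice_key, sub_batch) tuples, one per distinct value of `feature`.

    `batch` is a dict mapping column names to equal-length lists. It is only
    a stand-in for pa.RecordBatch in this sketch.
    """
    rows_by_value = {}
    for row, value in enumerate(batch[feature]):
        rows_by_value.setdefault(value, []).append(row)
    for value, rows in rows_by_value.items():
        # Keep every column, restricted to the rows that belong to this slice.
        sub_batch = {
            name: [column[r] for r in rows] for name, column in batch.items()
        }
        yield f"{feature}_{value}", sub_batch


batch = {"country": ["US", "CA", "US"], "age": [30, 41, 25]}
slices = list(slice_by_feature(batch, "country"))
```

Each yielded tuple pairs a key that identifies the slice with the subset of rows that fall into it, which is the shape the breaking change above describes.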

Deprecations

* Deprecating Py2 support.

0.21.5

Major Features and Improvements

* Add `label_feature` to `StatsOptions` and enable `LiftStatsGenerator` when
`label_feature` and `schema` are provided.
* Add JSON serialization support for StatsOptions.

Bug Fixes and Other Changes

* Only requires `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` on Python 3.5 + macOS.

Breaking Changes

Deprecations

0.21.4

Major Features and Improvements

* Support visualizing feature value lift in facets visualization.

Bug Fixes and Other Changes

* Fix issue writing out string feature values in LiftStatsGenerator.
* Requires `apache-beam[gcp]>=2.17,<3`.
* Requires `tensorflow-transform>=0.21.1,<0.22`.
* Requires `tfx-bsl>=0.21.3,<0.22`.

Breaking Changes

Deprecations

0.21.2

Major Features and Improvements

Bug Fixes and Other Changes

* Fix facets visualization.
* Optimize LiftStatsGenerator for string features.
* Make `_WeightedCounter` serializable.
* Add support for weighted examples in LiftStatsGenerator.

Breaking Changes

Deprecations

* `tfdv.TFExampleDecoder` has been removed. This legacy decoder converts
serialized `tf.Example` to a dict of numpy arrays, which is the legacy
input format (prior to Apache Arrow). TFDV has stopped accepting that format
since 0.14. Use `tfdv.DecodeTFExample` instead.

0.21.1

Major Features and Improvements

Bug Fixes and Other Changes

* Do validation on weighted feature stats.
* During schema inference, skip features which are missing common stats. This
makes schema inference work when the input stats are generated from some
pre-existing, unknown schema.
* Fix facets visualization in Chrome >=M80.

Known Issues

* Running TFDV with Apache Beam 2.18 or 2.19 does not work on Windows. If you
are using TFDV on Windows, use Apache Beam 2.17.

Breaking Changes

Deprecations
