Tensorflow-data-validation

Latest version: v1.15.1

1.4.0

Major Features and Improvements

* Float features can now be analyzed as categorical for the purposes of top-k
and unique counts using the experimental sketch-based generators.
* Added support for SQL-based slicing in TFDV. This enables slicing (using SQL)
in TFX OSS and Dataflow environments. SQL-based slicing is currently not
supported on Windows.
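Treating a float feature as categorical means each distinct float value counts as one category for top-k and unique-count purposes. As an illustration only, here is a Misra-Gries frequent-items sketch in plain Python. Misra-Gries is a common choice for approximate top-k, but whether TFDV's experimental generators use exactly this algorithm is an assumption here, and `MisraGriesSketch` is a hypothetical name.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class MisraGriesSketch:
    """Approximate frequent-item counter that keeps at most `num_buckets` keys.

    Illustrative sketch only; not TFDV's or tfx-bsl's implementation.
    """

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self.counters: Dict[Hashable, int] = {}

    def add(self, values: Iterable[Hashable]) -> None:
        for value in values:
            if value in self.counters:
                self.counters[value] += 1
            elif len(self.counters) < self.num_buckets:
                self.counters[value] = 1
            else:
                # No free bucket: decrement every counter and drop those at zero.
                for key in list(self.counters):
                    self.counters[key] -= 1
                    if self.counters[key] == 0:
                        del self.counters[key]

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        # Estimated counts are lower bounds on the true counts.
        return sorted(self.counters.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Float values treated as categories: each distinct float is its own category.
prices = [1.5] * 5 + [2.0] * 3 + [3.25, 4.0]
sketch = MisraGriesSketch(num_buckets=2)
sketch.add(prices)
print(sketch.top_k(1))  # [(1.5, 3)]
```

Each estimated count undercounts the true count by at most n / (num_buckets + 1), where n is the number of values added. Because floats are compared by exact equality, values such as `0.1 + 0.2` and `0.3` end up in different categories.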

Bug Fixes and Other Changes

* Variance calculations have been updated to be more numerically stable for
large datasets or large magnitude numeric data.
* When running per-example validation against a schema,
`validate_examples_in_tfrecord` and `validate_examples_in_csv` can now
optionally return samples of anomalous examples.
* Source code changes ensure compatibility with `pyarrow>=3`.
* Added a `load_anomalies_binary` utility function.
* Accumulators are now merged two at a time instead of in batches.
* BasicStatsGenerator is now responsible for setting
FeatureNameStatistics.Type. Previously it was possible for a top-k generator
and BasicStatsGenerator to set different types for categorical numeric
features with physical type STRING.
* Depends on `pyarrow>=1,<6`.
* Depends on `tensorflow-metadata>=1.4,<1.5`.
* Depends on `tfx-bsl>=1.4,<1.5`.
* `PartitionedStatsFn` implementations can now optionally provide their own
PTransform to control how inputs are partitioned.
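A standard way to make variance both numerically stable and mergeable is to keep a count, a running mean, and a sum of squared deviations (M2), update it per value with Welford's method, and combine two accumulators with the parallel formula of Chan et al. The sketch below shows that technique together with a two-at-a-time merge. Whether TFDV's BasicStatsGenerator uses exactly this formulation is an assumption, and `MeanVarAccumulator` and `from_values` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass
class MeanVarAccumulator:
    """Hypothetical accumulator: count, running mean, and sum of squared deviations."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> "MeanVarAccumulator":
        # Welford's update avoids subtracting two large, nearly equal sums.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other: "MeanVarAccumulator") -> "MeanVarAccumulator":
        # Chan et al. parallel combination of exactly two accumulators.
        if other.count == 0:
            return MeanVarAccumulator(self.count, self.mean, self.m2)
        if self.count == 0:
            return MeanVarAccumulator(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return MeanVarAccumulator(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance: M2 / n.
        return self.m2 / self.count if self.count else float("nan")


def from_values(values: Iterable[float]) -> MeanVarAccumulator:
    acc = MeanVarAccumulator()
    for v in values:
        acc.add(v)
    return acc


# Large-magnitude values with a small spread, split across partitions.
# The naive E[x^2] - E[x]^2 formula loses most of its precision on data like this.
base = 1e9
partitions = [[base + 4, base + 7], [base + 13], [base + 16]]
merged = reduce(lambda a, b: a.merge(b), (from_values(p) for p in partitions))
print(merged.count, merged.mean - base, merged.variance)  # 4 10.0 22.5
```

Merging pairwise keeps each combine step a simple two-input function, which fits a combiner-style `merge` over partial results computed on different workers.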

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* Deprecated Python 3.6 support.

1.3.0

Major Features and Improvements

* N/A

Bug Fixes and Other Changes

* Fixed bug in JensenShannonDivergence calculation affecting comparisons of
histograms that each contain a single value.
* Fixed bug in dataset constraints validation that caused failures with very
large numbers of examples.
* Fixed a bug wherein slicing on a feature missing from some batches could
produce slice keys derived from a different feature.
* Depends on `apache-beam[gcp]>=2.32,<3`.
* Depends on
`tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<3`.
* Depends on `tfx-bsl>=1.3,<1.4`.
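For two discrete distributions P and Q over the same bins, the Jensen-Shannon divergence is JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2. The sketch below computes it in base 2 from two count histograms and exercises the edge case named in the fix: histograms that each contain a single value. It is a generic implementation and not TFDV's `JensenShannonDivergence` code. In particular, it models each distinct value as its own bin, whereas TFDV compares numeric histogram buckets. That simplification is an assumption of this sketch.

```python
import math
from typing import Dict, Hashable


def js_divergence(p_counts: Dict[Hashable, float], q_counts: Dict[Hashable, float]) -> float:
    """Base-2 Jensen-Shannon divergence between two count histograms over shared bins."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both histograms must be non-empty")
    jsd = 0.0
    for b in set(p_counts) | set(q_counts):
        p = p_counts.get(b, 0.0) / p_total
        q = q_counts.get(b, 0.0) / q_total
        m = (p + q) / 2
        # Zero-probability terms contribute nothing (0 * log 0 is taken as 0).
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


# Each histogram holds a single value.
print(js_divergence({5.0: 10}, {5.0: 3}))  # 0.0: same single value
print(js_divergence({5.0: 10}, {7.0: 3}))  # 1.0: disjoint single values
```

In base 2 the result always lies in [0, 1], so the two single-value cases sit at the extremes of the range.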

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* N/A

1.2.0

Major Features and Improvements

* Added statistics/generators/mutual_information.py. It estimates adjusted
mutual information (AMI) using k-nearest-neighbor (kNN) estimation. It differs
from sklearn_mutual_information.py in that it supports multivalent
features/labels (by encoding) and multivariate features/labels. The plan is to
deprecate sklearn_mutual_information.py in the future.
* Fixed NonStreamingCustomStatsGenerator to respect max_batches_per_partition.
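The idea behind kNN-based mutual-information estimation can be illustrated with the classic Kraskov-Stögbauer-Grassberger (KSG) estimator for two scalar variables: I(X; Y) is approximately psi(k) + psi(N) - mean(psi(n_x + 1) + psi(n_y + 1)), where psi is the digamma function. This is a simplified sketch, not TFDV's `mutual_information.py`. It returns plain MI in nats rather than AMI, handles one numeric feature and one numeric label only, and skips the encoding of multivalent values. `ksg_mutual_information` and `digamma_int` are hypothetical names.

```python
import random
from typing import List

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(x: List[float], y: List[float], k: int = 3) -> float:
    """KSG (algorithm 1) estimate of I(X; Y) in nats, using the max-norm. O(N^2)."""
    n = len(x)
    if n != len(y) or n <= k or k < 1:
        raise ValueError("x and y need equal length greater than k, and k >= 1")
    total = 0.0
    for i in range(n):
        # Distance from point i to its k-th nearest neighbor in the joint (x, y) space.
        joint = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = joint[k - 1]
        # Count neighbors strictly within eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y_dependent = [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]
y_independent = [rng.gauss(0.0, 1.0) for _ in range(200)]
mi_dependent = ksg_mutual_information(x, y_dependent)
mi_independent = ksg_mutual_information(x, y_independent)
print(mi_dependent, mi_independent)
```

The dependent pair should score well above the independent pair, whose estimate should land near zero. Adjusted MI would additionally subtract the value expected by chance. This sketch leaves that step out.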

Bug Fixes and Other Changes

* Switched from namedtuple to tfx_namedtuple in order to avoid pickling issues
with PySpark.
* Depends on `scikit-learn>=0.23,<0.24` ("mutual-information" extra only).
* Depends on `scipy>=1.5,<2` ("mutual-information" extra only).
* Depends on `apache-beam[gcp]>=2.31,<3`.
* Depends on `tensorflow-metadata>=1.2,<1.3`.
* Depends on `tfx-bsl>=1.2,<1.3`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* N/A

1.1.1

Major Features and Improvements

* N/A

Bug Fixes and Other Changes

* Depends on `google-cloud-bigquery>=1.28.0,<2.21`.
* Depends on `tfx-bsl>=1.1.1,<1.2`.
* Fixed an error when using `tfdv.experimental_get_feature_value_slicer` with
`pandas==1.3.0`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* N/A

1.1.0

Major Features and Improvements

* N/A

Bug Fixes and Other Changes

* Optimized certain stats generators that need to materialize the input
RecordBatches.
* Depends on `protobuf>=3.13,<4`.
* Depends on `tensorflow-metadata>=1.1,<1.2`.
* Depends on `tfx-bsl>=1.1,<1.2`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* N/A

1.0.0

Major Features and Improvements

* N/A

Bug Fixes and Other Changes

* Increased the threshold beyond which a string feature value is considered
"large" by the experimental sketch-based top-k/unique generator to 1024.
* Added normalized AMI to the sklearn mutual information generator.
* Depends on `apache-beam[gcp]>=2.29,<3`.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3`.
* Depends on `tensorflow-metadata>=1.0,<1.1`.
* Depends on `tfx-bsl>=1.0,<1.1`.
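Presumably, the raised threshold means that the experimental sketch-based top-k/unique generator treats only string values longer than 1024 bytes as "large", rather than tracking them as ordinary categories. The sketch below shows how such a pre-filter might look. It assumes the following, none of which is TFDV's documented behavior: length is measured in bytes, a value of exactly 1024 bytes is not "large", and large values collapse into one placeholder token. The placeholder and `bucket_large_values` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

LARGE_VALUE_THRESHOLD_BYTES = 1024  # threshold quoted in the 1.0.0 notes
LARGE_VALUE_PLACEHOLDER = b"__LARGE_VALUE__"  # hypothetical placeholder token


def bucket_large_values(
    values: Iterable[bytes], threshold: int = LARGE_VALUE_THRESHOLD_BYTES
) -> List[bytes]:
    """Replace every value longer than `threshold` bytes with a single placeholder."""
    return [v if len(v) <= threshold else LARGE_VALUE_PLACEHOLDER for v in values]


values = [b"cat", b"dog", b"x" * 1024, b"y" * 1025, b"z" * 4096, b"cat"]
counts = Counter(bucket_large_values(values))
print(counts[b"cat"], counts[LARGE_VALUE_PLACEHOLDER], len(counts))  # 2 2 4
```

Collapsing oversized values keeps memory per sketch bucket bounded. A higher threshold lets more long-but-legitimate values, such as URLs, be counted individually.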

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* Removed the following deprecated symbols. Their deprecation was announced
in 0.30.0.
- `tfdv.validate_instance`
- `tfdv.lift_stats_generator`
- `tfdv.partitioned_stats_generator`
- `tfdv.get_feature_value_slicer`
* Removed the `compression_type` parameter from
`tfdv.generate_statistics_from_tfrecord`.

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.