Tensorflow-data-validation

Latest version: v1.15.1

0.30.0

Major Features and Improvements

* This version is the last version before TFDV 1.0. Starting with 1.0, all
TFDV public APIs (i.e. symbols in the root `__init__.py`) will be subject to
semantic versioning. Some public APIs are deprecated in this version
and will be removed in 1.0.

* The sketch-based top-k/unique stats generator can now detect invalid
UTF-8 sequences and large texts and replace them with a placeholder, so it
does not suffer from the memory issues usually caused by image or large-text
features in the data. Note that this generator is not yet used by default.
* Added `StatsOptions.experimental_use_sketch_based_topk_uniques` which
enables the sketch-based top-k/unique stats generator.
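
The placeholder behaviour described above can be illustrated with a minimal pure-Python sketch. This is not TFDV's implementation: it uses an exact `Counter` rather than a sketch data structure, and the placeholder strings, the size threshold, and the function names are all hypothetical. It only shows why substituting a placeholder keeps oversized or undecodable values from accumulating in memory.

```python
from collections import Counter

# Hypothetical values chosen for illustration; not TFDV's actual constants.
INVALID_UTF8_PLACEHOLDER = "__INVALID_UTF8__"
LARGE_VALUE_PLACEHOLDER = "__LARGE_VALUE__"
MAX_VALUE_BYTES = 1024


def normalize_value(raw: bytes, max_bytes: int = MAX_VALUE_BYTES) -> str:
    """Map a raw bytes value to a string that is cheap to count."""
    if len(raw) > max_bytes:
        # Oversized values (e.g. encoded images) collapse into one key.
        return LARGE_VALUE_PLACEHOLDER
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Undecodable values also collapse into one key.
        return INVALID_UTF8_PLACEHOLDER


def top_k(values, k=2):
    """Return the k most frequent normalized values with their counts."""
    counts = Counter(normalize_value(v) for v in values)
    return counts.most_common(k)


values = [b"cat", b"dog", b"cat", b"\xff\xfe", b"x" * 5000, b"\xff\xd8\xff"]
print(top_k(values, k=4))
```

In TFDV itself the generator is switched on through `StatsOptions.experimental_use_sketch_based_topk_uniques`, as noted above.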

Bug Fixes and Other Changes

* Fixed bug in `display_schema` that caused domains not to be displayed.
* Modified how `get_schema_dataframe` outputs numeric domains.
* Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more specific
anomaly types: INVALID_DOMAIN_SPECIFICATION and MULTIPLE_REASONS.
* Depends on `tensorflow-metadata>=0.30,<0.31`.
* Depends on `tfx-bsl>=0.30,<0.31`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* `tfdv.LiftStatsGenerator` will be removed from the public API in the next
version. To enable that generator, supply `StatsOptions.label_feature`.
* `tfdv.NonStreamingCustomStatsGenerator` will be removed from the public API
in the next version. You may continue to import it from TFDV,
but it will not be subject to compatibility guarantees.
* `tfdv.validate_instance` will be removed from the public API in the next
version. You may continue to import it from TFDV,
but it will not be subject to compatibility guarantees.
* Removed `tfdv.DecodeCSV`, `tfdv.DecodeTFExample` (deprecated in 0.27).
* Removed `feature_whitelist` in `tfdv.StatsOptions` (deprecated in 0.28).
Use `feature_allowlist` instead.
* `tfdv.get_feature_value_slicer` is deprecated.
`tfdv.experimental_get_feature_value_slicer` is introduced as a replacement.
TFDV is likely to have a different slicing functionality post 1.0, which
may not be compatible with the current slicers.
* `StatsOptions.slicing_functions` is deprecated.
`StatsOptions.experimental_slicing_functions` is introduced as a
replacement.
* `tfdv.WriteStatisticsToText` is removed (deprecated in 0.25.0).
* Parameter `compression_type` in `tfdv.generate_statistics_from_tfrecord`
is deprecated. The compression type is currently automatically determined.
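
For code that builds `StatsOptions` arguments as a dictionary, the two option renames listed above can be applied mechanically. The helper below is a hypothetical sketch, not part of TFDV: the function name and the duplicate check are assumptions, and only the two renames stated in these notes are covered. It does not import TFDV, so it only rewrites keyword names.

```python
import warnings

# Renames taken from the 0.30.0 deprecation notes above.
RENAMED_STATS_OPTIONS = {
    "feature_whitelist": "feature_allowlist",
    "slicing_functions": "experimental_slicing_functions",
}


def migrate_stats_options_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with deprecated option names replaced."""
    migrated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_STATS_OPTIONS.get(name, name)
        if new_name != name:
            warnings.warn(
                f"{name} is deprecated; use {new_name}",
                DeprecationWarning,
                stacklevel=2,
            )
        if new_name in migrated:
            # Both the old and the new spelling were supplied.
            raise ValueError(f"{new_name} was supplied more than once")
        migrated[new_name] = value
    return migrated


old_kwargs = {
    "feature_whitelist": ["age", "tips"],
    "slicing_functions": [],
    "label_feature": "tips",
}
new_kwargs = migrate_stats_options_kwargs(old_kwargs)
print(sorted(new_kwargs))
```

The resulting dictionary could then be passed on as `tfdv.StatsOptions(**new_kwargs)`, assuming the remaining keys are valid for the installed version.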

0.29.0

Major Features and Improvements

* N/A

Bug Fixes and Other Changes

* Added check for invalid min and max values for `values_counts` for nested
features.
* Bumped the minimum Bazel version required to build TFDV to 3.7.2.
* Depends on `absl-py>=0.9,<0.13`.
* Depends on `tensorflow-metadata>=0.29,<0.30`.
* Depends on `tfx-bsl>=0.29,<0.30`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* N/A

0.28.0

Major Features and Improvements

* Add anomaly detection for max bytes size for images.

Bug Fixes and Other Changes

* Depends on `numpy>=1.16,<1.20`.
* Fixed a bug that affected all CombinerFeatureStatsGenerators.
* Allow for `bytes` type in `get_feature_value_slicer` in addition to `Text`
and `int`.
* Fixed a bug that caused TFDV to improperly infer a fixed shape when
`tfdv.infer_schema` and `tfdv.update_schema` were called with
`infer_feature_shape=True`.
* Deprecated parameter `infer_feature_shape` of function `tfdv.update_schema`.
If a schema feature has a pre-defined shape, `tfdv.update_schema` will
always validate it. Otherwise, it will not try to add a shape.
* Deprecated `tfdv.StatsOptions.feature_whitelist` and added
`feature_allowlist` as a replacement. The former will be removed in the next
release.
* Added `get_schema_dataframe` and `get_anomalies_dataframe` utility
functions.
* Depends on `apache-beam[gcp]>=2.28,<3`.
* Depends on `tensorflow-metadata>=0.28,<0.29`.
* Depends on `tfx-bsl>=0.28.1,<0.29`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* N/A

0.27.0

Major Features and Improvements

* Performance improvement to `BasicStatsGenerator`.

Bug Fixes and Other Changes

* Added a `compact()` and `setup()` interface to `CombinerStatsGenerator`,
`CombinerFeatureStatsWrapperGenerator`, `BasicStatsGenerator`,
`CompositeStatsGenerator`, and `ConstituentStatsGenerator`.
* Stopped depending on `tensorflow-transform`.
* Depends on `apache-beam[gcp]>=2.27,<3`.
* Depends on `pyarrow>=1,<3`.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<3`.
* Depends on `tensorflow-metadata>=0.27,<0.28`.
* Depends on `tfx-bsl>=0.27,<0.28`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* `tfdv.DecodeCSV` and `tfdv.DecodeTFExample` are deprecated. Use
`tfx_bsl.public.tfxio.CsvTFXIO` and `tfx_bsl.public.tfxio.TFExampleRecord`
instead.

0.26.1

Major Features and Improvements

* N/A

Bug Fixes and Other Changes

* Depends on `apache-beam[gcp]>=2.25,!=2.26.*,<2.29`.

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* N/A

0.26.0

Major Features and Improvements

* Added support for per-feature example weights, which allow associating each
column with its own weight column. See the `per_feature_weight_override`
parameter in `StatsOptions.__init__`.
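
A minimal sketch of the idea behind per-feature weights, in plain Python rather than TFDV: each feature is weighted by a default weight column unless an override names a different one. The function, the column names, and the use of plain strings as mapping keys are assumptions made for illustration; the key type TFDV expects for `per_feature_weight_override` may differ.

```python
def weighted_means(rows, features, weight_feature, per_feature_weight_override=None):
    """Compute a weighted mean per feature, honoring per-feature weight columns."""
    overrides = per_feature_weight_override or {}
    result = {}
    for feature in features:
        # Use the override if one is given, else the default weight column.
        weight_column = overrides.get(feature, weight_feature)
        total_weight = sum(row[weight_column] for row in rows)
        weighted_sum = sum(row[feature] * row[weight_column] for row in rows)
        result[feature] = weighted_sum / total_weight
    return result


rows = [
    {"price": 10.0, "clicks": 1.0, "w": 1.0, "w_clicks": 3.0},
    {"price": 20.0, "clicks": 5.0, "w": 3.0, "w_clicks": 1.0},
]
means = weighted_means(rows, ["price", "clicks"], "w", {"clicks": "w_clicks"})
print(means)  # {'price': 17.5, 'clicks': 2.0}
```

Here `price` is weighted by the default column `w`, while `clicks` uses its own column `w_clicks`.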

Bug Fixes and Other Changes

* The newly added `LifecycleStage.DISABLED` is now exempt from validation
(similar to `LifecycleStage.DEPRECATED`, etc.).
* Fixed a bug where TFDV blindly trusted the type claimed in the provided
schema. TFDV now computes the stats according to the actual type of the data,
and computes type-specific stats (e.g. categorical ints) only when the actual
type matches the type claimed in the schema.
* Added an option to control whether to add default stats generators when
calling `tfdv.GenerateStatistics()`.
* Started using a new quantiles computation routine that does not depend on
TF. This could potentially increase the performance of TFDV under certain
workloads.
* Extended `schema_util` to support semantic domains.
* Moved `natural_language_stats_generator` to
`natural_language_domain_inferring_stats_generator` and created a new
`natural_language_stats_generator` based on the fields of
`natural_language_domain`.
* Provided `vocab_utils` to assist in opening / loading vocabulary files.
* A SchemaDiff will be reported upon J-S skew/drift.
* Fixed a bug in FLOAT_TYPE_SMALL_FLOAT anomaly message.
* Depends on `apache-beam[gcp]>=2.25,!=2.26.*,<3`.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.4.*,<3`.
* Depends on `tensorflow-metadata>=0.26,<0.27`.
* Depends on `tensorflow-transform>=0.26,<0.27`.
* Depends on `tfx-bsl>=0.26,<0.27`.
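
For readers unfamiliar with the J-S skew/drift measure mentioned above, the following is a minimal sketch, assuming "J-S" refers to Jensen-Shannon divergence between two histograms over the same buckets. It is not TFDV's code: how TFDV buckets values and which logarithm base it uses are not stated in these notes. Base 2 is used here so the result falls between 0 and 1.

```python
import math


def jensen_shannon_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2) of two histograms with equal buckets."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of buckets")

    def normalize(counts):
        total = sum(counts)
        if total <= 0:
            raise ValueError("histogram must have positive total mass")
        return [c / total for c in counts]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 because
        # b is the mixture of both distributions.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    p = normalize(p_counts)
    q = normalize(q_counts)
    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


training = [50, 30, 20]
serving = [20, 30, 50]
print(jensen_shannon_divergence(training, serving))
```

A drift or skew check would compare this value against a threshold configured in the schema; the threshold itself is a modelling choice and is not prescribed here.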

Known Issues

* N/A

Breaking Changes

* N/A

Deprecations

* N/A
