Tensorflow-transform

Latest version: v1.15.0

Page 5 of 8

0.25.0

Major Features and Improvements

* Updated the "Getting Started" guide and examples to demonstrate the support
for both the "instance dict" and the "TFXIO" formats. Users are encouraged to
start using the "TFXIO" format, especially in cases where
[pre-canned TFXIO implementations](https://www.tensorflow.org/tfx/tfx_bsl/api_docs/python/tfx_bsl/public/tfxio)
are available, as these offer better performance.
* Starting with this release, TFT also hosts nightly packages on
https://pypi-nightly.tensorflow.org. To install the nightly package, use the
following command:


pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple tensorflow-transform


Note: These nightly packages are unstable and breakages are likely to
happen. Depending on the complexity involved, a fix can take a week or more
before updated wheels are available on PyPI. You can always use the stable
version of TFT available on PyPI by running the command
`pip install tensorflow-transform`.

Bug Fixes and Other Changes

* `TFTransformOutput.transform_raw_features` and `TransformFeaturesLayer` can
be used when a transform fn is exported as a TF2 SavedModel and imported in
graph mode.
* Utility methods in `tft.inspect_preprocessing_fn` now take an optional
parameter `force_tf_compat_v1`. If this is False, the `preprocessing_fn` is
traced using tf.function in TF 2.x when TF 2 behaviors are enabled.
* Switched to a wrapper for `collections.namedtuple` to ensure compatibility
with PySpark, which modifies classes produced by the factory.
* Caching has been disabled for `tft.tukey_h_params`, `tft.tukey_location` and
`tft.tukey_scale` due to the cached accumulator being non-deterministic.
* Track variables created within the `preprocessing_fn` in the native TF 2
implementation.
* `TFTransformOutput.transform_raw_features` returns a wrapped python dict
that overrides `pop` to return `None` instead of raising a `KeyError` when
called with a key not found in the dictionary. This is done in preparation
for switching the default value of `drop_unused_features` to `True`.
* Vocabularies written in `tfrecord_gzip` format no longer filter out entries
that are empty or that include a newline character.
* Depends on `apache-beam[gcp]>=2.25,<3`.
* Depends on `tensorflow-metadata>=0.25,<0.26`.
* Depends on `tfx-bsl>=0.25,<0.26`.

Breaking changes

* N/A

Deprecations

* The `decode` method of the available coders (`tft.coders.CsvCoder` and
`tft.coders.ExampleProtoCoder`) has been deprecated and removed.
[Canned TFXIO implementations](https://www.tensorflow.org/tfx/tfx_bsl/api_docs/python/tfx_bsl/public/tfxio)
should be used to read and decode data instead.

0.24.1

Major Features and Improvements

* N/A

Bug Fixes and Other Changes

* Depends on `apache-beam[gcp]>=2.24,<3`.
* Depends on `tfx-bsl>=0.24.1,<0.25`.

Breaking changes

* N/A

Deprecations

* N/A

0.24.0

Major Features and Improvements

* Added native TF 2 implementation of Transform's Beam APIs -
`tft.AnalyzeDataset`, `tft.AnalyzeDatasetWithCache`,
`tft.AnalyzeAndTransformDataset` and `tft.TransformDataset`. The default
behavior will continue to use TensorFlow's compat.v1 APIs. This can be
overridden by setting `tft.Context.force_tf_compat_v1=False`. The default
behavior for TF 2 users will be switched to the new native implementation in
a future release.

Bug Fixes and Other Changes

* Added a small fanout to analyzers' `CombineGlobally` for improved
performance.
* `TransformFeaturesLayer` can be called after being saved as an attribute to
a Keras Model, even if the layer isn't used in the Model.
* Depends on `absl-py>=0.9,<0.11`.
* Depends on `protobuf>=3.9.2,<4`.
* Depends on `tensorflow-metadata>=0.24,<0.25`.
* Depends on `tfx-bsl>=0.24,<0.25`.

Breaking changes

* N/A

Deprecations

* Deprecating Py3.5 support.
* Parameter `use_tfxio` in the initializer of `Context` is deprecated. TFT
Beam APIs now accept both the "instance dict" and "TFXIO" input formats.
Setting it will have no effect and it will be removed in the next version.

0.23.0

Major Features and Improvements

* Added `tft.scale_to_gaussian` to transform input to standard gaussian.
* Vocabulary related analyzers and mappers now accept a `file_format` argument
allowing the vocabulary to be saved in TFRecord format. The default format
remains text (TFRecord format requires tensorflow>=2.4).

Bug Fixes and Other Changes

* Enable `SavedModelLoader` to import and apply TF2 SavedModels.
* `tft.min`, `tft.max`, `tft.sum`, `tft.covariance` and `tft.pca` now have
default output values to properly process empty analysis datasets.
* `tft.scale_by_min_max`, `tft.scale_to_0_1` and the corresponding per-key
versions now apply a sigmoid function to scale tensors if the analysis
dataset is either empty or contains a single distinct value.
* Added best-effort tf.text op registration when loading transformation
graphs.
* Vocabularies computed over numerical features now order entries with equal
frequency in reverse lexicographic order, as is already done for string
features.
* Fixed an issue that caused the `TABLE_INITIALIZERS` graph collection to
contain a tensor instead of an op when a TF2 SavedModel or a TF2 Hub Module
containing a table is loaded inside the `preprocessing_fn`.
* Fixed an issue where the output tensors of `tft.TransformFeaturesLayer`
would all have unknown shapes.
* Stopped depending on `avro-python3`.
* Depends on `apache-beam[gcp]>=2.23,<3`.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<2.4`.
* Depends on `tensorflow-metadata>=0.23,<0.24`.
* Depends on `tfx-bsl>=0.23,<0.24`.

Breaking changes

* Existing caches (for all analyzers) are automatically invalidated.

Deprecations

* Deprecating Py2 support.
* Note: We plan to remove Python 3.5 support after this release.

0.22.0

Major Features and Improvements

* N/A

Bug Fixes and Other Changes

* `tft.bucketize_per_key` no longer assumes that the keys during
transformation existed in the analysis dataset. If a key is missing then the
assigned bucket will be -1.
* `tft.estimated_probability_density`, when `categorical=True`, no longer
assumes that the values during transformation existed in the analysis dataset,
and will assume 0 density in that case.
* Switched analyzer cache representation of dataset keys from using a primitive
str to a DatasetKey class.
* `tft_beam.analyzer_cache.ReadAnalysisCacheFromFS` can now filter cache entry
keys when given a `cache_entry_keys` parameter. `cache_entry_keys` can be
produced by utilizing `get_analysis_cache_entry_keys`.
* Reduced the number of shuffles by packing multiple combine merges into a
single Beam combiner.
* Switched `tft.TransformFeaturesLayer` to use the TF 2 `tf.saved_model.load`
API to load a previously exported SavedModel.
* Added `tft.sparse_tensor_left_align` as a utility that aligns
`tf.SparseTensor`s to the left.
* Depends on `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` for Python3.5 + MacOS.
* Depends on `apache-beam[gcp]>=2.20.0,<3`.
* Depends on `tensorflow>=1.15,!=2.0.*,<2.3`.
* Depends on `tensorflow-metadata>=0.22.0,<0.23.0`.
* Depends on `tfx-bsl>=0.22.0,<0.23.0`.

Breaking changes

* `tft.AnalyzeDatasetWithCache` no longer accepts a flat pcollection as an
input. Instead it will flatten the datasets in the `input_values_pcoll_dict`
input if needed.
* `tft.TransformFeaturesLayer` no longer takes a parameter
`drop_unused_features`. Its default behavior is now equivalent to having set
`drop_unused_features` to `True`.

Deprecations

* N/A

0.21.2

Major Features and Improvements

* Expanded capability for per-key analyzers to analyze larger sets of keys that
would not fit in memory, by storing the key-value pairs in vocabulary files.
This is enabled by passing a `per_key_filename` to `tft.count_per_key` and
`tft.scale_to_z_score_per_key`.
* Added `tft.TransformFeaturesLayer` and
`tft.TFTransformOutput.transform_features_layers` to allow transforming
features for a TensorFlow Keras model.

Bug Fixes and Other Changes

* `tft.apply_buckets_with_interpolation` now handles NaN values by imputing with
the middle of the normalized range.
* Depends on `tfx-bsl>=0.21.3,<0.22`.

Breaking changes

* N/A

Deprecations

* N/A
