Tensorflow-datasets

Latest version: v4.9.4

Page 2 of 6

4.8.2

Deprecated

- Python 3.7 support: this is the last version of TFDS supporting Python 3.7.
Future versions will require Python 3.8 or later.

Fixed

- `tfds new` and `tfds build` better support the new recommended datasets
organization, where individual datasets have their own package under
`datasets/`, builder class is called `Builder` and is defined within module
`${dsname}_dataset_builder.py`.

4.8.1

Changed

- Added file `valid_tags.txt` to not break builds.
- TFDS no longer relies on TensorFlow DTypes. We chose NumPy DTypes to keep the
typing expressiveness, while dropping the heavy dependency on TensorFlow. We
migrated all our internal datasets. Please migrate accordingly:
  - `tf.bool`: `np.bool_`
  - `tf.string`: `np.str_`
  - `tf.int64`, `tf.int32`, etc.: `np.int64`, `np.int32`, etc.
  - `tf.float64`, `tf.float32`, etc.: `np.float64`, `np.float32`, etc.
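
The mapping above can be applied mechanically to existing dataset code. The following is a minimal sketch, not part of TFDS: `migrate_dtypes` is a hypothetical helper that rewrites the listed `tf.*` dtype references in a source string to their `np.*` equivalents. It assumes the code refers to TensorFlow as `tf` and that the migrated file imports NumPy as `np`.

```python
import re

# Dtypes whose NumPy name differs from the TensorFlow name.
_RENAMED = {"bool": "bool_", "string": "str_"}

# Matches tf.bool, tf.string, tf.int8..tf.int64, tf.uint8..tf.uint64 and
# tf.float16..tf.float64 as whole identifiers only.
_TF_DTYPE = re.compile(
    r"\btf\.(bool|string|u?int(?:8|16|32|64)|float(?:16|32|64))\b"
)


def migrate_dtypes(source: str) -> str:
    """Rewrite tf.* dtype references to np.* (hypothetical helper)."""

    def _swap(match: re.Match) -> str:
        name = match.group(1)
        return "np." + _RENAMED.get(name, name)

    # Note: the migrated file still needs `import numpy as np` added by hand.
    return _TF_DTYPE.sub(_swap, source)


if __name__ == "__main__":
    print(migrate_dtypes("'label': tf.int64, 'text': tf.string"))
```

Because the pattern only matches whole identifiers, unrelated names such as `tf.strings` or `tfds.features` are left untouched. Review the result by hand before committing it.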

4.8.0

Added

- [API] `DatasetBuilder`'s description and citations can be specified in
dedicated `README.md` and `CITATIONS.bib` files, within the dataset package
(see https://www.tensorflow.org/datasets/add_dataset).
- Tags can be associated with datasets in the `TAGS.txt` file. For
now, they are only used in the generated documentation.
- [API][Experimental] New `ViewBuilder` to define datasets as transformations
of existing datasets. Also adds `tfds.transform` with functionality to apply
transformations.
- Loggers are also called on `tfds.as_numpy(...)`, base `Logger` class has a
new corresponding method.
- `tfds.core.DatasetBuilder` can have a default limit for the number of
simultaneous downloads. `tfds.download.DownloadConfig` can override it.
- `tfds.features.Audio` supports storing raw audio data for lazy decoding.
- The number of shards can be overridden when preparing a dataset:
`builder.download_and_prepare(download_config=tfds.download.DownloadConfig(num_shards=42))`.
Alternatively, you can configure the min and max shard size if you want TFDS
to compute the number of shards for you, but want to have control over the
shard sizes.
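
To make the relationship between shard-size bounds and the shard count concrete, here is a minimal sketch. It is an illustration only and does not reproduce how TFDS actually computes shards: `estimate_num_shards` and its parameter names are hypothetical, and the rule it uses (fewest shards that respect the maximum size, never so many that shards fall below the minimum) is an assumption.

```python
def estimate_num_shards(
    total_bytes: int, min_shard_bytes: int, max_shard_bytes: int
) -> int:
    """Pick a shard count from size bounds (illustrative, not TFDS logic)."""
    if min_shard_bytes <= 0 or max_shard_bytes < min_shard_bytes:
        raise ValueError("expected 0 < min_shard_bytes <= max_shard_bytes")
    if total_bytes <= 0:
        return 1
    # Fewest shards such that no shard exceeds the maximum (ceiling division).
    fewest = -(-total_bytes // max_shard_bytes)
    # Most shards such that no shard drops below the minimum.
    most = max(1, total_bytes // min_shard_bytes)
    # When both bounds cannot hold at once, prefer respecting the minimum.
    return max(1, min(fewest, most))


if __name__ == "__main__":
    MiB = 1024 ** 2
    GiB = 1024 ** 3
    print(estimate_num_shards(10 * GiB, 64 * MiB, 1 * GiB))
```

Passing an explicit `num_shards`, as in the call shown above, bypasses this kind of calculation entirely.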

4.7.0

Added

- [API] Added
[TfDataBuilder](https://www.tensorflow.org/datasets/format_specific_dataset_builders#datasets_based_on_tfdatadataset)
that is handy for storing experimental ad hoc TFDS datasets in notebook-like
environments such that they can be versioned, described, and easily shared
with teammates.
- [API] Added options to create format-specific dataset builders. The new API
now includes a number of NLP-specific builders, such as:
  - [CoNLL](https://www.tensorflow.org/datasets/format_specific_dataset_builders#conll)
  - [CoNLL-U](https://www.tensorflow.org/datasets/format_specific_dataset_builders#conllu)
- [API] Added `tfds.beam.inc_counter` to reduce `beam.metrics.Metrics.counter`
boilerplate
- [API] Added options to group together existing TFDS datasets into
[dataset collections](https://www.tensorflow.org/datasets/dataset_collections)
and to perform simple operations over them.
- [Documentation] update, specifically:
  - [New guide](https://www.tensorflow.org/datasets/format_specific_dataset_builders)
    on format-specific dataset builders;
  - [New guide](https://www.tensorflow.org/datasets/add_dataset_collection)
    on adding new dataset collections to TFDS;
  - Updated [TFDS CLI](https://www.tensorflow.org/datasets/cli)
    documentation.
- [TFDS CLI] Supports custom config through JSON (e.g. `tfds build my_dataset
--config='{"name": "my_custom_config", "description": "Abc"}'`)
- New datasets:
  - [conll2003](https://www.tensorflow.org/datasets/catalog/conll2003)
  - [universal_dependency 2.10](https://www.tensorflow.org/datasets/catalog/universal_dependency)
  - [bucc](https://www.tensorflow.org/datasets/catalog/bucc)
  - [i_naturalist2021](https://www.tensorflow.org/datasets/catalog/i_naturalist2021)
  - [mtnt](https://www.tensorflow.org/datasets/catalog/mtnt) Machine
    Translation of Noisy Text.
  - [placesfull](https://www.tensorflow.org/datasets/catalog/placesfull)
  - [tatoeba](https://www.tensorflow.org/datasets/catalog/tatoeba)
  - [user_libri_audio](https://www.tensorflow.org/datasets/catalog/user_libri_audio)
  - [user_libri_text](https://www.tensorflow.org/datasets/catalog/user_libri_text)
  - [xtreme_pos](https://www.tensorflow.org/datasets/catalog/xtreme_pos)
  - [yahoo_ltrc](https://www.tensorflow.org/datasets/catalog/yahoo_ltrc)
- Updated datasets:
  - [C4](https://www.tensorflow.org/datasets/catalog/c4) was updated to
    version 3.1.
  - [common_voice](https://www.tensorflow.org/datasets/catalog/common_voice)
    was updated to a more recent snapshot.
  - [wikipedia](https://www.tensorflow.org/datasets/catalog/wikipedia) was
    updated with the `20220620` snapshot.
- New dataset collections, such as
[xtreme](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/dataset_collections/xtreme/xtreme.py)
and
[LongT5](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/dataset_collections/longt5/longt5.py)
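
When the custom config for `tfds build` is generated by a script rather than typed by hand, quoting the JSON correctly for the shell is the error-prone part. The sketch below shows one way to do it with the standard library. `build_tfds_command` is a hypothetical helper, not part of the TFDS CLI; it only assembles the command string in the shape shown in the CLI entry above.

```python
import json
import shlex


def build_tfds_command(dataset: str, config: dict) -> str:
    """Return a shell-safe `tfds build` command line (hypothetical helper)."""
    # Serialize the config dict to the JSON text expected by --config.
    config_json = json.dumps(config)
    # shlex.join quotes each argument so the JSON survives shell parsing.
    return shlex.join(["tfds", "build", dataset, f"--config={config_json}"])


if __name__ == "__main__":
    print(
        build_tfds_command(
            "my_dataset",
            {"name": "my_custom_config", "description": "Abc"},
        )
    )
```

The resulting string can be pasted into a POSIX shell, or the argument list can be passed directly to `subprocess.run` without any quoting step.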

Changed

- The base `Logger` class expects more information to be passed to the
`as_dataset` method. This should only be relevant to people who have
implemented and registered custom `Logger` class(es).
- You can set `DEFAULT_BUILDER_CONFIG_NAME` in a `DatasetBuilder` to change
the default config if it shouldn't be the first builder config defined in
`BUILDER_CONFIGS`.
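
The selection rule described for `DEFAULT_BUILDER_CONFIG_NAME` can be pictured with a toy model. This is a sketch under stated assumptions, not TFDS code: `ToyConfig`, `ToyBuilder`, and `default_config` are hypothetical stand-ins that mimic only the rule "use the named config if one is set, otherwise the first entry in `BUILDER_CONFIGS`".

```python
from typing import List, Optional


class ToyConfig:
    """Hypothetical stand-in for a builder config; only the name matters."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyBuilder:
    """Hypothetical stand-in for a dataset builder class."""

    BUILDER_CONFIGS: List[ToyConfig] = []
    DEFAULT_BUILDER_CONFIG_NAME: Optional[str] = None

    @classmethod
    def default_config(cls) -> Optional[ToyConfig]:
        if not cls.BUILDER_CONFIGS:
            return None
        if cls.DEFAULT_BUILDER_CONFIG_NAME is None:
            # No explicit default: fall back to the first config defined.
            return cls.BUILDER_CONFIGS[0]
        for config in cls.BUILDER_CONFIGS:
            if config.name == cls.DEFAULT_BUILDER_CONFIG_NAME:
                return config
        raise ValueError(
            f"Unknown default config: {cls.DEFAULT_BUILDER_CONFIG_NAME!r}"
        )


class SmallFirst(ToyBuilder):
    BUILDER_CONFIGS = [ToyConfig("small"), ToyConfig("full")]


class FullByDefault(ToyBuilder):
    BUILDER_CONFIGS = [ToyConfig("small"), ToyConfig("full")]
    DEFAULT_BUILDER_CONFIG_NAME = "full"


if __name__ == "__main__":
    print(SmallFirst.default_config().name)
    print(FullByDefault.default_config().name)
```

Setting the attribute lets the config list stay in its natural order (for example, smallest first) while the default points at a different entry.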

Fixed

- Various datasets
- On Linux, when loading a dataset from a directory that is not your home
(`~`) directory, a new `~` directory is no longer created in the current
directory (fixes [4117](https://github.com/tensorflow/datasets/issues/4117)).

4.6.0

Added

- Support for community datasets on GCS.
- [API] `tfds.builder_from_directory` and `tfds.builder_from_directories`, see
https://www.tensorflow.org/datasets/external_tfrecord#directly_from_folder.
- [API] Dash ("-") support in split names.
- [API] `file_format` argument to `download_and_prepare` method, allowing user
to specify an alternative file format to store prepared data (e.g.
"riegeli").
- [API] `file_format` to `DatasetInfo` string representation.
- [API] Expose the return value of Beam pipelines. This allows users to
read the Beam metrics.
- [API] Expose Feature `tf_example_spec` to public.
- [API] `doc` kwarg on `Feature`s, to describe a feature.
- [Documentation] Features description is shown on
[TFDS Catalog](https://www.tensorflow.org/datasets/catalog/overview).
- [Documentation] More metadata about HuggingFace datasets in TFDS catalog.
- [Performance] Parallel load of metadata files.
- [Testing] TFDS tests are now run using GitHub Actions, with miscellaneous
improvements such as caching and sharding.
- [Testing] Improvements to MockFs.
- New datasets.

Changed

- [API] `num_shards` is now optional in the shard name.

Removed

- TFDS pathlib API, migrated to a self-contained `etils.epath` (see
https://github.com/google/etils).

Fixed

- Various datasets.
- Dataset builders that are defined ad hoc (e.g. in Colab).
- Better `DatasetNotFoundError` messages.
- Don't set `deterministic` on a global level but locally in interleave, so it
only applies to interleave and not to all transformations.
- Google Drive downloader.

4.5.2

Added

- [API] `split=tfds.split_for_jax_process('train')` (alias of
`tfds.even_splits('train', n=jax.process_count())[jax.process_index()]`).
- [Documentation] update.
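
The idea behind giving each JAX process its own even slice of a split can be sketched without TFDS or JAX installed. The code below is an illustration only: `even_slices` is a hypothetical function that divides a number of examples into contiguous ranges, and the way it spreads the remainder over the first ranges is an assumption, not a description of how `tfds.even_splits` behaves internally.

```python
from typing import List, Tuple


def even_slices(num_examples: int, n: int) -> List[Tuple[int, int]]:
    """Split range(num_examples) into n contiguous, near-equal slices."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, remainder = divmod(num_examples, n)
    slices = []
    start = 0
    for index in range(n):
        # The first `remainder` slices each take one extra example.
        size = base + (1 if index < remainder else 0)
        slices.append((start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    # Stand-ins for jax.process_count() and jax.process_index().
    process_count, process_index = 3, 1
    print(even_slices(10, process_count)[process_index])
```

Each process indexes the list with its own process index, so no two processes read the same examples and every example is read exactly once.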

Fixed

- Import bug on Windows (3709).
