Tensorflow-datasets

Latest version: v4.9.4

4.5.0

Added

- [API] Better split API:
  - Splits can be selected using shards: `split='train[3shard]'`.
  - Underscores are supported in numbers for better readability:
    `split='train[:500_000]'`.
  - Select the union of all splits with `split='all'`.
  - [`tfds.even_splits`](https://www.tensorflow.org/datasets/splits#tfdseven_splits_multi-host_training)
    is more precise and flexible:
    - Returns splits of exactly the same size when passed
      `tfds.even_splits('train', n=3, drop_remainder=True)`.
    - Works on subsplits (`tfds.even_splits('train[:75%]', n=3)`), even
      nested ones.
    - Can be composed with other splits: `tfds.even_splits('train', n=3)[0] +
      'test'`.
- [API] `serialize_example` / `deserialize_example` methods on features to
encode/decode example to proto: `example_bytes =
features.serialize_example(example_data)`.
- [API] `Audio` feature now supports `encoding='zlib'` for better compression.
- [API] Features specs are exposed in proto for better compatibility with
other languages.
- [API] Create beam pipeline using TFDS as input with
[tfds.beam.ReadFromTFDS](https://www.tensorflow.org/datasets/api_docs/python/tfds/beam/ReadFromTFDS).
- [API] Support setting the file formats in `tfds build
--file_format=tfrecord`.
- [API] Typing annotations exposed in `tfds.typing`.
- [API] `tfds.ReadConfig` has a new `assert_cardinality` argument; pass
`assert_cardinality=False` to disable the cardinality assertion.
- [API] `tfds.display_progress_bar(True)` for functional control.
- [API] DatasetInfo exposes `.release_notes`.
- Support for huge number of shards (>99999).
- [Performance] Faster dataset generation (using tfrecords).
- [Testing] Mock dataset now supports nested datasets
- [Testing] Customize the number of sub examples
- [Documentation] Community datasets:
https://www.tensorflow.org/datasets/community_catalog/overview.
- [Documentation]
[Guide on TFDS and determinism](https://www.tensorflow.org/datasets/determinism).
- [[RLDS](https://github.com/google-research/rlds)] Support for nested
datasets features.
- [[RLDS](https://github.com/google-research/rlds)] New datasets: Robomimic,
D4RL Ant Maze, RLU Real World RL, and RLU Atari with ordered episodes.
- New datasets.
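
The `drop_remainder` behaviour of `tfds.even_splits` described above can be illustrated with plain arithmetic. The sketch below is not TFDS code: `even_split_specs` is a hypothetical helper, and the way the remainder is distributed when `drop_remainder=False` is an assumption made for illustration. It only shows how `n` contiguous slice specs of the form `'train[a:b]'` could be derived from a known number of examples.

```python
def even_split_specs(split, num_examples, n, drop_remainder=False):
    """Return `n` contiguous slice specs such as 'train[0:3]'.

    Hypothetical helper, not part of TFDS.
    """
    base, extra = divmod(num_examples, n)
    specs = []
    start = 0
    for i in range(n):
        # With drop_remainder, every split gets exactly `base` examples and
        # the last `extra` examples are left out. Otherwise (assumption) the
        # first `extra` splits each receive one additional example.
        size = base if drop_remainder else base + (1 if i < extra else 0)
        specs.append(f"{split}[{start}:{start + size}]")
        start += size
    return specs


print(even_split_specs("train", 10, 3))
print(even_split_specs("train", 10, 3, drop_remainder=True))
```

With `drop_remainder=True` all three specs cover the same number of examples, which is the property the release note calls out for multi-host training.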

Deprecated

- Python 3.6 support: this is the last version of TFDS supporting Python 3.6.
Future versions will use Python 3.7.

Fixed

- Misc bugs.

4.4.0

Added

- [API]
[`PartialDecoding` support](https://www.tensorflow.org/datasets/decode#only_decode_a_sub-set_of_the_features),
to decode only a subset of the features (for performance).
- [API] `tfds.features.LabeledImage` for semantic segmentation (like image but
with additional `info.features['image_label'].name` label metadata).
- [API] float32 support for `tfds.features.Image` (e.g. for depth map).
- [API] Loading datasets from files now supports custom
`tfds.features.FeatureConnector`.
- [API] All `FeatureConnector`s can now have a `None` dimension anywhere
(previously restricted to the first position).
- [API] `tfds.features.Tensor()` can have an arbitrary number of dynamic
dimensions (`Tensor(..., shape=(None, None, 3, None))`).
- [API] `tfds.features.Tensor` can now be serialised as bytes, instead of
float/int values (to allow better compression): `Tensor(...,
encoding='zlib')`.
- [API] Support for datasets with `None` in `tfds.as_numpy`.
- Script to add TFDS metadata files to existing TF-record (see
[doc](https://www.tensorflow.org/datasets/external_tfrecord)).
- [TESTING] `tfds.testing.mock_data` now supports:
  - non-scalar tensors with dtype `tf.string`;
  - `builder_from_files` and path-based community datasets.
- [Documentation] Catalog now exposes links to
[KnowYourData visualisations](https://knowyourdata-tfds.withgoogle.com/).
- [Documentation] Guide on
[common implementation gotchas](https://www.tensorflow.org/datasets/common_gotchas).
- Many new reinforcement learning datasets.

Changed

- [API] Datasets generated with `disable_shuffling=True` are now read in
generation order.
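
The release note on `Tensor(..., encoding='zlib')` says that storing a tensor as bytes rather than as individual float/int values allows better compression. The following standard-library sketch illustrates that idea only; it is not how TFDS serialises tensors, and `encode_tensor` / `decode_tensor` are hypothetical names.

```python
import struct
import zlib


def encode_tensor(values):
    """Pack float32 values into raw little-endian bytes, then zlib-compress."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.compress(raw)


def decode_tensor(blob):
    """Reverse of encode_tensor: decompress, then unpack float32 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# A mostly-constant tensor (e.g. a sparse mask) compresses very well.
values = [0.0] * 1000 + [1.5, 2.5]
blob = encode_tensor(values)
print(f"raw bytes: {4 * len(values)}, compressed bytes: {len(blob)}")
```

The gain depends entirely on how repetitive the data is; noisy float data may barely shrink.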

Fixed

- File format automatically restored (for datasets generated with
`tfds.builder(..., file_format=)`).
- Dynamically set number of worker threads during extraction.
- Update progress bar during download even if downloads are cached.
- Misc bug fixes.

4.3.0

Added

- [API] `dataset.info.splits['train'].num_shards` to expose the number of
shards to the user.
- [API] `tfds.features.Dataset` to have a field containing sub-datasets (e.g.
used in RL datasets).
- [API] dtype and `tf.uint16` support in `tfds.features.Video`.
- [API] `DatasetInfo.license` field to add redistribution information.
- [API] `.copy`, `.format` methods to GPath objects.
- [Performances] `tfds.benchmark(ds)` (compatible with any iterator, not just
`tf.data`, better colab representation).
- [Performances] Faster `tfds.as_numpy()` (avoid extra `tf.Tensor` <>
`np.array` copy).
- [Testing] Support for custom `BuilderConfig` in `DatasetBuilderTest`.
- [Testing] `DatasetBuilderTest` now has a `dummy_data` class property which
can be used in `setUpClass`.
- [Testing] `add_tfds_id` and cardinality support to `tfds.testing.mock_data`.
- [Documentation] Better `tfds.as_dataframe` visualisation (Sequence, ragged
tensor, semantic masks with `use_colormap`).
- [Experimental] Community datasets support, which allows dynamically
importing datasets defined outside the TFDS repository.
- [Experimental] Hugging-face compatibility wrapper to use Hugging-face
datasets directly in TFDS.
- [Experimental] Riegeli format support.
- [Experimental] `DatasetInfo.disable_shuffling` to force examples to be read
in generation order.
- New datasets.
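
The note on `tfds.benchmark(ds)` stresses that it works with any iterator, not just `tf.data`. As a rough illustration of what benchmarking an arbitrary iterable involves, here is a minimal standard-library sketch. It is not the `tfds.benchmark` implementation; the function name, the `num_iter` parameter, and the returned keys are all assumptions.

```python
import time


def benchmark_iterable(iterable, num_iter=None):
    """Time a full (or truncated) pass over any iterable.

    Hypothetical helper: returns the example count, the duration in seconds
    and the resulting throughput.
    """
    start = time.perf_counter()
    count = 0
    for _ in iterable:
        count += 1
        if num_iter is not None and count >= num_iter:
            break
    duration = time.perf_counter() - start
    throughput = count / duration if duration > 0 else float("inf")
    return {
        "num_examples": count,
        "duration_s": duration,
        "examples_per_s": throughput,
    }


# Works on a range, a generator, a list, or any other iterable.
stats = benchmark_iterable(x * x for x in range(10_000))
print(stats["num_examples"], "examples")
```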

Fixed

- Many bugs.

4.2.0

Added

- [CLI] `tfds build` to the CLI. See
[documentation](https://www.tensorflow.org/datasets/cli#tfds_build_download_and_prepare_a_dataset).
- [API] `tfds.features.Dataset` to represent nested datasets.
- [API] `tfds.ReadConfig(add_tfds_id=True)` to add a unique id to the example
`ex['tfds_id']` (e.g. `b'train.tfrecord-00012-of-01024__123'`).
- [API] `num_parallel_calls` option to `tfds.ReadConfig` to override the
default `AUTOTUNE` option.
- [API] `tfds.ImageFolder` support for `tfds.decode.SkipDecoder`.
- [API] Multichannel audio support to `tfds.features.Audio`.
- [API] `try_gcs` to `tfds.builder(..., try_gcs=True)`
- Better `tfds.as_dataframe` visualization (ffmpeg video if installed,
bounding boxes,...).
- [TESTING] Allow `max_examples_per_splits=0` in `tfds build
--max_examples_per_splits=0` to test `_split_generators` only (without
`_generate_examples`).
- New datasets.
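
The `tfds_id` example above (`b'train.tfrecord-00012-of-01024__123'`) appears to encode the split, the shard, the total number of shards, and an index within the shard. The sketch below parses an id under that assumption. The layout is inferred solely from the single example in these notes, so treat `parse_tfds_id` as a hypothetical helper rather than a documented format.

```python
import re

# Assumed layout: <split>.tfrecord-<shard>-of-<num_shards>__<index>
_TFDS_ID_RE = re.compile(
    rb"^(?P<split>.+)\.tfrecord-(?P<shard>\d+)-of-(?P<num_shards>\d+)"
    rb"__(?P<index>\d+)$"
)


def parse_tfds_id(tfds_id):
    """Split a bytes `tfds_id` into its (assumed) components."""
    match = _TFDS_ID_RE.match(tfds_id)
    if match is None:
        raise ValueError(f"Unrecognised tfds_id: {tfds_id!r}")
    return {
        "split": match["split"].decode("utf-8"),
        "shard": int(match["shard"].decode("ascii")),
        "num_shards": int(match["num_shards"].decode("ascii")),
        "index": int(match["index"].decode("ascii")),
    }


parsed = parse_tfds_id(b"train.tfrecord-00012-of-01024__123")
print(parsed)
```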

Changed

- [API] DownloadManager now returns
[Pathlib-like](https://docs.python.org/3/library/pathlib.html#basic-use)
objects.
- [API] Simpler `BuilderConfig` definition: class `VERSION` and
`RELEASE_NOTES` are applied to all `BuilderConfig`. Config description is
now optional.
- [API] To guarantee better determinism, new validations are performed on
the keys when creating a dataset (to avoid filenames as keys
(non-deterministic) and restrict keys to `str`, `bytes` and `int`). New
errors likely indicate an issue in the dataset implementation.
- [API] `tfds.core.benchmark` now returns a `pd.DataFrame` (instead of a
`dict`).
- [API] `tfds.units` is not visible anymore from the public API.
- Datasets updates.
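
The key validation described above restricts example keys to `str`, `bytes` and `int` and discourages filenames as keys. A minimal sketch of such a check follows. It is not the validation TFDS performs: `validate_key` is a hypothetical function, and rejecting `os.PathLike` objects is only one assumed way of catching path-valued keys.

```python
import os
import pathlib


def validate_key(key):
    """Return `key` unchanged if it is an acceptable example key.

    Hypothetical check: path objects are rejected first, since a path can
    differ from one machine to another, then the type is restricted.
    """
    if isinstance(key, os.PathLike):
        raise TypeError(f"Path-like keys are not allowed: {key!r}")
    if not isinstance(key, (str, bytes, int)):
        raise TypeError(
            f"Key must be str, bytes or int, got {type(key).__name__}"
        )
    return key


for candidate in ["img_001", 42, b"raw", 1.5, pathlib.PurePosixPath("/tmp/a.png")]:
    try:
        validate_key(candidate)
        print(f"{candidate!r}: ok")
    except TypeError as err:
        print(f"{candidate!r}: rejected ({err})")
```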

Deprecated

Removed

- Configs for all text datasets. Only plain text version is kept. For example:
`multi_nli/plain_text` -> `multi_nli`.

Fixed

- [API] Datasets returned by `tfds.as_numpy` are compatible with `len(ds)`.
- Support 0-len sequence with images of dynamic shape (fixes #2616).
- Progress bar correctly updated when copying files.
- Better debugging and error messages (e.g. human readable size,...).
- Many bug fixes (GPath consistency with pathlib, s3 compatibility, TQDM
visual artifacts, GCS crash on windows, re-download when checksums updated,
...).

4.1.0

Added

- It is now easier to create datasets outside the TFDS repository (see our updated
[dataset creation guide](https://www.tensorflow.org/datasets/add_dataset)).
- When generating a dataset, if download fails for any reason, it is now
possible to manually download the data. See
[doc](https://www.tensorflow.org/datasets/overview#manual_download_if_download_fails).
- `tfds.core.as_path` to create pathlib.Path-like objects compatible with GCS
(e.g. `tfds.core.as_path('gs://my-bucket/labels.csv').read_text()`).
- `verify_ssl=` option to `tfds.download.DownloadConfig` to disable SSL
certificate verification during download.
- New datasets.

Changed

- All datasets inherit from `tfds.core.GeneratorBasedBuilder`. Converting a
dataset to beam now only requires changing `_generate_examples` (see
[example and doc](https://www.tensorflow.org/datasets/beam_datasets#instructions)).
- `_split_generators` should now return `{'split_name':
self._generate_examples(), ...}` (but current datasets are backward
compatible).
- Better `pathlib.Path`, `os.PathLike` compatibility: `dl_manager.manual_dir`
now returns a pathlib-like object. Example: `text =
(dl_manager.manual_dir / 'downloaded-text.txt').read_text()`. Note: Other
`dl_manager.download`, `.extract`,... will return pathlib-like objects in
future versions. `FeatureConnector`,... and most functions should accept
`PathLike` objects. Let us know if some functions you need are missing.
- `--record_checksums` now assumes the new dataset-as-folder model.
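
The two changes above (a dict returned from `_split_generators`, and a pathlib-like `dl_manager.manual_dir`) can be pictured together with a plain-Python stand-in. This sketch deliberately does not import TFDS: `ToyBuilder` is a hypothetical class, not a `tfds.core.GeneratorBasedBuilder`, and the `dl_manager` object is faked with `types.SimpleNamespace` holding a `pathlib.Path`. The file names are also made up.

```python
import pathlib
import tempfile
import types


class ToyBuilder:
    """Stand-in mimicking the shape of the new builder methods."""

    def _split_generators(self, dl_manager):
        # New style: map each split name directly to its example generator.
        return {
            "train": self._generate_examples(dl_manager.manual_dir / "train.txt"),
            "test": self._generate_examples(dl_manager.manual_dir / "test.txt"),
        }

    def _generate_examples(self, path):
        # `path` is pathlib-like, so `.read_text()` is available directly.
        for i, line in enumerate(path.read_text().splitlines()):
            yield i, {"text": line}


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "train.txt").write_text("a\nb\n")
    (root / "test.txt").write_text("c\n")
    fake_dl_manager = types.SimpleNamespace(manual_dir=root)
    generators = ToyBuilder()._split_generators(fake_dl_manager)
    # Generators are lazy, so consume them while the files still exist.
    splits = {name: list(gen) for name, gen in generators.items()}

print(splits)
```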

Deprecated

- `tfds.core.SplitGenerator`, `tfds.core.BeamBasedBuilder` are deprecated and
will be removed in a future version.

Fixed

- `BuilderConfig`s are now compatible with Beam datasets (#2348).
- `tfds.features.Image` can accept encoded `bytes` images directly (useful
when used with `img_name, img_bytes =
dl_manager.iter_archive('images.zip')`).
- The API docs now show deprecated methods, and abstract methods to override
are now documented.
- You can generate `imagenet2012` with only a single split (e.g. only the
validation data). Other splits will be skipped if not present.

4.0.1

Fixed

- `tfds.load` when generation code isn't present.
- GCS compatibility.
