Tensorflow-datasets

Latest version: v4.9.4


4.0.0

Added

- Dataset-as-folder: A dataset can now be a self-contained module in a folder
with checksums, dummy data, etc. This simplifies implementing datasets outside
the TFDS repository.
- `tfds.load` can now load a dataset without using the generation class. So
`tfds.load('my_dataset:1.0.0')` can work even if `MyDataset.VERSION ==
'2.0.0'` (see 2493).
- TFDS CLI (see https://www.tensorflow.org/datasets/cli for details).
- `tfds.testing.mock_data` does not require metadata files anymore!
- `tfds.as_dataframe(ds, ds_info)` with custom visualisation
([example](https://www.tensorflow.org/datasets/overview#tfdsas_dataframe)).
- `tfds.even_splits` to generate subsplits (e.g. `tfds.even_splits('train',
n=3) == ['train[0%:33%]', 'train[33%:67%]', ...]`).
- `DatasetBuilder.RELEASE_NOTES` property.
- `tfds.features.Image` now supports PNG with 4-channels.
- `tfds.ImageFolder` now supports custom shape, dtype.
- Downloaded URLs are available through `MyDataset.url_infos`.
- `skip_prefetch` option to `tfds.ReadConfig`.
- `as_supervised=True` support for `tfds.show_examples`, `tfds.as_dataframe`.
- `tfds.features` can now be saved and loaded. You may have to override
[FeatureConnector.from_json_content](https://www.tensorflow.org/datasets/api_docs/python/tfds/features/FeatureConnector?version=nightly#from_json_content)
and `FeatureConnector.to_json_content` to support this feature.
- Script to detect dead-urls.
- New datasets.
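
To make the `tfds.even_splits` entry above concrete, here is a minimal pure-Python sketch of how such percent-based subsplit strings could be produced. The helper `even_split_specs` is hypothetical and is not the TFDS implementation. In particular, the rounding of the boundaries is an assumption chosen to reproduce the `n=3` example from the notes.

```python
from typing import List


def even_split_specs(split: str, n: int) -> List[str]:
    """Return n percent-slice specs that together cover `split`.

    Illustrative only: this is NOT the TFDS implementation.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Boundaries in percent. The rounding rule is an assumption made so that
    # n=3 reproduces the strings quoted in the release notes.
    bounds = [round(i * 100 / n) for i in range(n + 1)]
    return [f"{split}[{lo}%:{hi}%]" for lo, hi in zip(bounds, bounds[1:])]


specs = even_split_specs("train", n=3)
print(specs)
```

Strings of this form are the kind of value that can be passed as the `split` argument of `tfds.load`.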

Changed

- `tfds.as_numpy()` now returns an iterable which can be iterated multiple
times. To migrate: `next(ds)` -> `next(iter(ds))`.
- Rename `tfds.features.text.Xyz` -> `tfds.deprecated.text.Xyz`.
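
The `tfds.as_numpy()` migration note above comes down to the difference between an iterable and an iterator in Python. The sketch below uses a hypothetical stand-in class, `ReiterableExamples`, rather than TFDS itself, to show why `next(ds)` stops working and why `next(iter(ds))` is the replacement.

```python
class ReiterableExamples:
    """Hypothetical stand-in for the object returned by `tfds.as_numpy()`."""

    def __init__(self, examples):
        self._examples = list(examples)

    def __iter__(self):
        # Each call starts a fresh pass, so the data can be iterated repeatedly.
        return iter(self._examples)


ds = ReiterableExamples([{"label": 0}, {"label": 1}])

try:
    next(ds)  # Old pattern: fails because an iterable is not an iterator.
    old_pattern_works = True
except TypeError:
    old_pattern_works = False

first = next(iter(ds))   # Migrated pattern from the release notes.
first_pass = list(ds)
second_pass = list(ds)   # A second full pass yields the same examples.
```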

Removed

- `DatasetBuilder.IN_DEVELOPMENT` property.
- `tfds.core.disallow_positional_args` (should use Py3 `*,` instead).
- Testing against TF 1.15. Requires Python 3.6.8+.
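
For code that relied on `tfds.core.disallow_positional_args`, the Python 3 `*,` syntax mentioned above gives the same guarantee at the language level. A minimal sketch, where `load_split` is a hypothetical function and not part of TFDS:

```python
def load_split(name, *, split="train", shuffle=False):
    """Hypothetical function: every parameter after `*` is keyword-only."""
    return {"name": name, "split": split, "shuffle": shuffle}


config = load_split("mnist", split="test")

try:
    load_split("mnist", "test")  # Passing `split` positionally is rejected.
    positional_rejected = False
except TypeError:
    positional_rejected = True
```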

Fixed

- Better archive extension detection for `dl_manager.download_and_extract`.
- Fix `tfds.__version__` in TFDS nightly to be PEP440 compliant.
- Fix crash when GCS not available.
- Improved open-source workflow, contributor guide, documentation.
- Many other internal cleanups, bugs, dead code removal, py2->py3 cleanup,
pytype annotations,...
- Datasets updates.

3.2.1

Fixed

- Issue with GCS on Windows.

3.2.0

Added

- [API] `tfds.ImageFolder` and `tfds.TranslateFolder` to easily create custom
datasets with your custom data.
- [API] `tfds.ReadConfig(input_context=)` to shard dataset, for better
multi-worker compatibility (1426).
- [API] The default `data_dir` can be controlled by the `TFDS_DATA_DIR`
environment variable.
- [API] Better usability when developing datasets outside TFDS: downloads are
always cached, checksums are optional.
- Scripts to help deployment/documentation (Generate catalog documentation,
export all metadata files, ...).
- [Documentation] Catalog display images
([example](https://www.tensorflow.org/datasets/catalog/sun397#sun397standard-part2-120k)).
- [Documentation] Catalog shows which datasets have been recently added and are
only available in `tfds-nightly`.
- [API] `tfds.show_statistics(ds_info)` to display
[FACETS OVERVIEW](https://pair-code.github.io/facets/). Note: This requires
the dataset to have been generated with the statistics.
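
The `TFDS_DATA_DIR` entry above describes an environment-variable override for the default `data_dir`. The sketch below illustrates one plausible resolution order. `resolve_data_dir` is a hypothetical helper, and both the precedence (explicit argument, then environment variable, then default) and the fallback location are assumptions for illustration, not the library's code.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed fallback location, used only when nothing else is configured.
DEFAULT_DATA_DIR = Path.home() / "tensorflow_datasets"


def resolve_data_dir(explicit: Optional[str] = None) -> Path:
    """Hypothetical helper: explicit argument, then env var, then default."""
    if explicit:
        return Path(explicit).expanduser()
    env_value = os.environ.get("TFDS_DATA_DIR")
    if env_value:
        return Path(env_value).expanduser()
    return DEFAULT_DATA_DIR


# Example paths below are placeholders.
os.environ["TFDS_DATA_DIR"] = "/tmp/my_tfds_data"
from_env = resolve_data_dir()
from_arg = resolve_data_dir("/data/override")
```

In practice the variable would typically be set in the shell or job configuration before the Python process starts, rather than from inside the script.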

Deprecated

- `tfds.features.text` encoding API. Please use
[tensorflow_text](https://www.tensorflow.org/tutorials/tensorflow_text/intro)
instead.

Removed

- `tfds.load('image_label_folder')` in favor of the more user-friendly
`tfds.ImageFolder`.

Fixed

- Fix deterministic example order on Windows when path was used as key (this
only impacts a few datasets). Now example order should be the same on all
platforms.
- Misc performance improvements for both generation and reading (e.g. use
`__slots__`, fix parallelisation bug in `tf.data.TFRecordReader`, ...).
- Misc fixes (typos, type annotations, better error messages, fixing dead
links, better Windows compatibility, ...).

3.1.0

Added

- [API] `tfds.builder_cls(name)` to access a DatasetBuilder class by name
- [API] `info.split['train'].filenames` for access to the tf-record files.
- [API] `tfds.core.add_data_dir` to register an additional data dir.
- [Testing] Support for custom decoders in `tfds.testing.mock_data`.
- [Documentation] Shows which datasets are only present in `tfds-nightly`.
- [Documentation] Display images for supported datasets.

Changed

- Rename `tfds.core.NamedSplit`, `tfds.core.SplitBase` -> `tfds.Split`. Now
`tfds.Split.TRAIN`,... are instances of `tfds.Split`.
- Rename `interleave_parallel_reads` -> `interleave_cycle_length` for
`tfds.ReadConfig`.
- Invert ds, ds_info argument orders for `tfds.show_examples`.

Deprecated

- `tfds.features.text` encoding API. Please use `tensorflow_text` instead.

Removed

- `num_shards` argument from `tfds.core.SplitGenerator`. This argument was
ignored as shards are automatically computed.
- Most `ds.with_options` which were applied by TFDS. TFDS now uses the
`tf.data` defaults.

Fixed

- Better error messages.
- Windows compatibility.

3.0.0

Added

- `DownloadManager` is now picklable (can be used inside Beam pipelines).
- `tfds.features.Audio`:
  - Support float as returned value.
  - Expose sample_rate through `info.features['audio'].sample_rate`.
  - Support for encoding audio features from file objects.
- More datasets.

Changed

- New `image_classification` section. Some datasets have been moved there from
`images`.
- `DownloadConfig` does not append the dataset name anymore (manual data
should be in `<manual_dir>/` instead of `<manual_dir>/<dataset_name>/`).
- Tests now check that all `dl_manager.download` URLs have registered
checksums. To opt out, add `SKIP_CHECKSUMS = True` to your
`DatasetBuilderTestCase`.
- `tfds.load` now always returns `tf.compat.v2.Dataset`. If you're still using
`tf.compat.v1`:
  - Use `tf.compat.v1.data.make_one_shot_iterator(ds)` rather than
    `ds.make_one_shot_iterator()`.
  - Use `isinstance(ds, tf.compat.v2.Dataset)` instead of `isinstance(ds,
    tf.data.Dataset)`.

Deprecated

- The `tfds.features.text` encoding API is deprecated. Please use
[tensorflow_text](https://www.tensorflow.org/tutorials/tensorflow_text/intro)
instead.
- `num_shards` argument of `tfds.core.SplitGenerator` is currently ignored and
will be removed in the next version.

Removed

- Legacy mode `tfds.experiment.S3` has been removed.
- `in_memory` argument has been removed from `as_dataset`/`tfds.load` (small
datasets are now auto-cached).
- `tfds.Split.ALL`.

Fixed

- Various bugs, better error messages, documentation improvements.

2.1.0

Added

- Datasets expose `info.dataset_size` and `info.download_size`.
- [Auto-caching small datasets](https://www.tensorflow.org/datasets/performances#auto-caching).
- Datasets expose their cardinality `num_examples =
tf.data.experimental.cardinality(ds)` (requires tf-nightly or TF >= 2.2.0).
- Get the number of examples in a sub-split with:
`info.splits['train[70%:]'].num_examples`.

Changed

- Datasets generated with 2.1.0 cannot be loaded with previous versions
(datasets generated with previous versions can still be read with `2.1.0`).

Deprecated

- `in_memory` argument is deprecated and will be removed in a future version.
