Kartothek


4.0.1
============================

* Fixed dataset corruption after updates when table names other than "table" are used (445).

4.0.0
============================

This is a major release of kartothek with breaking API changes.

* Removal of complex user input (see gh427)
* Removal of the multi-table feature
* Removal of `kartothek.io.merge` module
* :class:`~kartothek.core.dataset.DatasetMetadata` now has an attribute `schema` which replaces the previous attribute `table_meta` and returns only a single schema
* All outputs which previously returned a sequence of dictionaries, where each key-value pair corresponded to a table-data pair, now return a single :class:`pandas.DataFrame`
* All read pipelines now automatically infer the table to read, so it is no longer necessary to provide `table` or `table_name` as an input argument (see the sketch after this list)
* All writing pipelines which previously supported a complex user input type now expose an argument `table_name` which can be used to continue using legacy datasets (i.e. datasets with an intrinsic, non-trivial table name). This usage is discouraged and we recommend that users migrate to a default table name (i.e. leave it None / `table`)
* All pipelines which previously accepted an argument `tables` to select the subset of tables to load no longer accept this keyword. Instead, the table to be loaded is inferred
* Trying to read a multi-table dataset now raises an exception telling users that this is no longer supported with kartothek 4.0
* The dict schema for :meth:`~kartothek.core.dataset.DatasetMetadataBase.to_dict` and :meth:`~kartothek.core.dataset.DatasetMetadata.from_dict` changed: the dictionary in `table_meta` is replaced by the single `schema`
* All pipeline arguments which previously accepted a dictionary of sequences to describe a table-specific subset of columns now accept plain sequences (e.g. `columns`, `categoricals`)
* Remove the following deprecated arguments for io pipelines:
  * `label_filter`
  * `central_partition_metadata`
  * `load_dynamic_metadata`
  * `load_dataset_metadata`
  * `concat_partitions_on_primary_index`
* Remove `output_dataset_uuid` and `df_serializer` from :func:`kartothek.io.eager.commit_dataset` since these arguments didn't have any effect
* Remove `metadata`, `df_serializer`, `overwrite`, `metadata_merger` from :func:`kartothek.io.eager.write_single_partition`
* :func:`~kartothek.io.eager.store_dataframes_as_dataset` now requires a list as an input
* Default value for argument `date_as_object` is now universally set to ``True``. The behaviour for `False` will be deprecated and removed in the next major release
* No longer allow passing `delete_scope` as a delayed object to :func:`~kartothek.io.dask.dataframe.update_dataset_from_ddf`
* :func:`~kartothek.io.dask.dataframe.update_dataset_from_ddf` and :func:`~kartothek.io.dask.dataframe.store_dataset_from_ddf` now return a `dd.core.Scalar` object. This enables all `dask.DataFrame` graph optimizations by default.
* Remove argument `table_name` from :func:`~kartothek.io.dask.dataframe.collect_dataset_metadata`
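
A minimal sketch of the resulting single-table workflow under 4.0. The store factory, dataset uuid, and column names below are illustrative assumptions, not part of the release notes:

.. code-block:: python

    from functools import partial
    from tempfile import mkdtemp

    import pandas as pd
    from storefact import get_store_from_url

    from kartothek.io.eager import read_table, store_dataframes_as_dataset

    # Illustrative store factory; any simplekv-compatible store works.
    store_factory = partial(get_store_from_url, "hfs://" + mkdtemp())

    df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

    # 4.0: the input must be a list of DataFrames, no per-table dictionaries.
    store_dataframes_as_dataset(
        store=store_factory,
        dataset_uuid="example_uuid",
        dfs=[df],
    )

    # 4.0: no `table`/`tables` argument; the single table is inferred.
    result = read_table(dataset_uuid="example_uuid", store=store_factory)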

3.20.0
===========================

This will be the final release in the 3.X series. Please ensure your existing
codebase does not raise any DeprecationWarning from kartothek and migrate your
import paths ahead of time to the new :mod:`kartothek.api` modules to ensure a
smooth migration to 4.X.

* Introduce :mod:`kartothek.api` as the public definition of the API. See also :doc:`versioning`.
* Introduce `DatasetMetadataBase.schema` to prepare the deprecation of `table_meta` (see the sketch below)
* :func:`~kartothek.io.eager.read_dataset_as_dataframes` and
  :func:`~kartothek.io.iter.read_dataset_as_dataframes__iterator` now correctly return
  categoricals as requested, even when categories are misaligned across partitions.
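
A short sketch of the new accessor, assuming a dataset already exists under the given uuid; the store setup and uuid are illustrative assumptions:

.. code-block:: python

    from functools import partial
    from tempfile import mkdtemp

    from storefact import get_store_from_url
    from kartothek.core.dataset import DatasetMetadata

    store_factory = partial(get_store_from_url, "hfs://" + mkdtemp())  # illustrative

    dm = DatasetMetadata.load_from_store("example_uuid", store_factory())

    schema = dm.schema  # 3.20+: single schema, forward-compatible with 4.X
    # schema = dm.table_meta["table"]  # pre-3.20 access path, to be deprecated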

3.19.1
===========================

* Allow ``pyarrow==3`` as a dependency.
* Fix a bug in :func:`~kartothek.io_components.utils.align_categories` for dataframes
  with missing values and a non-categorical dtype.
* Fix an issue with the cube index validation introduced in v3.19.0 (413).

3.19.0
===========================

* Fix an issue where updates on cubes or on datasets using
  dask.dataframe might not update all secondary indices, resulting in a corrupt
  state after the update
* Expose compression type and row group chunk size in the Cube interface via an optional
  parameter of type :class:`~kartothek.serialization.ParquetSerializer` (illustrated in the sketch below).
* Add retries to :func:`~kartothek.serialization._parquet.ParquetSerializer.restore_dataframe`.
  IOErrors have been observed on long-running ktk + dask tasks; until the root cause is fixed,
  the serialization is retried to gain more stability.
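
A hedged sketch of passing the serializer options through a cube build, assuming the optional parameter is named `df_serializer` as in the dataset pipelines; the cube layout, data, and store below are illustrative:

.. code-block:: python

    from functools import partial
    from tempfile import mkdtemp

    import pandas as pd
    from storefact import get_store_from_url

    from kartothek.core.cube.cube import Cube
    from kartothek.io.eager_cube import build_cube
    from kartothek.serialization import ParquetSerializer

    store_factory = partial(get_store_from_url, "hfs://" + mkdtemp())  # illustrative

    cube = Cube(dimension_columns=["x"], partition_columns=["p"], uuid_prefix="demo")

    # Compression codec and row group chunk size are now configurable.
    serializer = ParquetSerializer(compression="ZSTD", chunk_size=50_000)

    build_cube(
        data=pd.DataFrame({"x": [1, 2], "p": [0, 0], "v": [10.0, 20.0]}),
        cube=cube,
        store=store_factory,
        df_serializer=serializer,  # assumed parameter name
    )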

3.18.0
===========================

* Add ``cube.suppress_index_on`` to switch off the default index creation for dimension columns (see the sketch after this list)
* Fixed an import issue with the zstd module in `kartothek.core._zmsgpack`.
* Fix a bug in `kartothek.io_components.read.dispatch_metapartitions_from_factory` where
  `dispatch_by=[]` would be treated like `dispatch_by=None`, not merging all dataset partitions into
  a single partition.
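
A brief sketch of the new flag on the cube definition; the column names are illustrative:

.. code-block:: python

    from kartothek.core.cube.cube import Cube

    cube = Cube(
        dimension_columns=["x", "y"],
        partition_columns=["p"],
        uuid_prefix="demo",
        suppress_index_on=["y"],  # no default index is built for dimension "y"
    )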
