Kartothek

Latest version: v5.3.0

3.10.0
===========================

Improvements
^^^^^^^^^^^^
* Dispatch performance improved for large datasets that include metadata.
* Introduce the ``dispatch_metadata`` kwarg in the metapartition read pipelines
  to ease the transition to a future breaking release.

Bug fixes
^^^^^^^^^

* Ensure that the empty (sentinel) DataFrame used in :func:`~kartothek.io.eager.read_table`
  also behaves correctly when the ``categoricals`` argument is used.


Breaking changes in ``io_components.read``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* The functions ``dispatch_metapartitions`` and ``dispatch_metapartitions_from_factory``
  will no longer attach index and metadata information to the created ``MetaPartition``
  instances unless explicitly requested.

3.9.0
==========================

Improvements
^^^^^^^^^^^^
* Arrow 0.17.X support
* Significant performance improvements for shuffle operations in
  :func:`~kartothek.io.dask.dataframe.update_dataset_from_ddf`
  for large ``dask.DataFrame`` objects with many payload columns, achieved by using
  in-memory compression during the shuffle operation.
* Allow calling :func:`~kartothek.io.dask.dataframe.update_dataset_from_ddf`
  without ``partition_on`` when ``shuffle=True``.
* :func:`~kartothek.io.dask.dataframe.read_dataset_as_ddf` supports the kwarg ``dispatch_by``
  to control the internal partitioning structure when creating a dataframe.
* :func:`~kartothek.io.dask.dataframe.read_dataset_as_ddf` and :func:`~kartothek.io.dask.dataframe.update_dataset_from_ddf`
  now allow the keyword ``table`` to be optional, falling back to the default ``SINGLE_TABLE`` identifier
  (recommended, since multi-table dataset support is being sunset).
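
The in-memory compression mentioned for the shuffle can be illustrated with a
minimal, standard-library-only sketch. This is not kartothek's implementation:
the helper names ``compress_block`` and ``decompress_block`` are hypothetical,
and ``pickle`` plus ``zlib`` merely stand in for whatever serialization and codec
the library uses. The sketch only shows the general idea that repetitive payload
columns shrink considerably when a block is compressed before it is moved.

.. code-block:: python

    import pickle
    import zlib


    def compress_block(rows):
        """Serialize a block of rows and compress the bytes in memory."""
        return zlib.compress(pickle.dumps(rows))


    def decompress_block(blob):
        """Invert compress_block and return the original rows."""
        return pickle.loads(zlib.decompress(blob))


    # Hypothetical data: one partition key and several repetitive payload columns.
    rows = [
        {"part": i % 4, "payload_a": "x" * 50, "payload_b": 0.0, "payload_c": "constant"}
        for i in range(1000)
    ]

    raw_size = len(pickle.dumps(rows))
    blob = compress_block(rows)  # this is what would travel through the shuffle

    # The round trip is lossless and the compressed block is smaller.
    assert decompress_block(blob) == rows
    assert len(blob) < raw_size
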

3.8.2
==========================

Improvements
^^^^^^^^^^^^

* Read performance improved, especially for partitioned datasets and queries with empty payload columns.

Bug fixes
^^^^^^^^^
* GH262: Raise an exception when trying to partition on a column with null values to prevent silent data loss.
* Fix multiple index creation issues (cutting data, crashing) for ``uint`` data.
* Fix index update issues for some types resulting in ``TypeError: Trying to update an index with different types...``
  messages.
* Fix issues where index creation with empty partitions can lead to ``ValueError: Trying to create non-typesafe index``.
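
The GH262 entry describes a guard against null partition keys. A minimal sketch
of that kind of check, using plain Python rather than kartothek's internals, is
shown here. The function ``partition_rows`` and the sample column ``country`` are
hypothetical names chosen for illustration, and the exact exception type and
message raised by kartothek may differ.

.. code-block:: python

    import math
    from collections import defaultdict


    def is_null(value):
        """Treat None and float NaN as null partition keys."""
        return value is None or (isinstance(value, float) and math.isnan(value))


    def partition_rows(rows, column):
        """Group rows by ``column`` and refuse null keys instead of dropping rows."""
        partitions = defaultdict(list)
        for row in rows:
            key = row.get(column)
            if is_null(key):
                raise ValueError(
                    f"Cannot partition on column {column!r}: it contains null values"
                )
            partitions[key].append(row)
        return dict(partitions)


    rows = [
        {"country": "DE", "value": 1},
        {"country": "US", "value": 2},
        {"country": "DE", "value": 3},
    ]
    partitions = partition_rows(rows, "country")

Failing loudly here is the design choice the changelog entry points at: a row
whose partition key is null has no partition to land in, so skipping it quietly
would lose data.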

3.8.1
==========================

Improvements
^^^^^^^^^^^^

* Only fix column ordering when restoring a ``DataFrame`` if the ordering is incorrect.

Bug fixes
^^^^^^^^^
* GH248 Fix an issue causing a ``ValueError`` to be raised when using ``dask_index_on`` on non-integer columns.
* GH255 Fix an issue causing the Python interpreter to shut down when reading an
  empty file (see also https://issues.apache.org/jira/browse/ARROW-8142).

3.8.0
==========================

Improvements
^^^^^^^^^^^^

* Add keyword argument ``dask_index_on``, which reconstructs a dask index from a kartothek index when loading the dataset
* Add method :func:`~kartothek.core.index.IndexBase.observed_values` which returns an array of all observed values of the index column
* Updated and improved documentation w.r.t. guides and API documentation

Bug fixes
^^^^^^^^^
* GH227 Fix a type error when loading categorical data in dask without
  specifying it explicitly.
* No longer trigger the ``SettingWithCopyWarning`` when using bucketing.
* GH228 Fix an issue where empty header creation from a pyarrow schema would not
  normalize the schema, which caused schema violations during update.
* Fix an issue where :func:`~kartothek.io.eager.create_empty_dataset_header`
  would not accept a store factory.

3.7.0
==========================

Improvements
^^^^^^^^^^^^

* Support for pyarrow 0.16.0
* Decrease scheduling overhead for dask based pipelines
* Performance improvements for categorical data when using pyarrow>=0.15.0
* Dask is now able to calculate better size estimates for the following classes:

  * :class:`~kartothek.core.dataset.DatasetMetadata`
  * :class:`~kartothek.core.factory.DatasetFactory`
  * :class:`~kartothek.io_components.metapartition.MetaPartition`
  * :class:`~kartothek.core.index.ExplicitSecondaryIndex`
  * :class:`~kartothek.core.index.PartitionIndex`
  * :class:`~kartothek.core.partition.Partition`
  * :class:`~kartothek.core.common_metadata.SchemaWrapper`
