Kartothek

Latest version: v5.3.0



3.6.2
==========================

Improvements
^^^^^^^^^^^^

* Add more explicit typing to :mod:`kartothek.io.eager`.

Bug fixes
^^^^^^^^^
* Fix an issue where :func:`~kartothek.io.dask.dataframe.update_dataset_from_ddf` would create a column named ``_KTK_HASH_BUCKET`` in the dataset.

3.6.1
==========================

Bug fixes
^^^^^^^^^
* Fix a regression introduced in 3.5.0 where predicates which allow multiple
  values for a field would generate duplicates.

3.6.0
==========================

New functionality
^^^^^^^^^^^^^^^^^
- The partition on shuffle algorithm in :func:`~kartothek.io.dask.dataframe.update_dataset_from_ddf` now supports
  producing deterministic buckets based on hashed input data.
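
The idea behind deterministic, hash-based buckets can be sketched in plain Python. This is only an illustration and not kartothek's implementation: the helper name ``bucket_for`` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key to a bucket index that is stable across processes.

    Hypothetical helper for illustration; not part of kartothek.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Python's built-in hash() is salted per process for str values, so a
    # digest of the encoded key is used to get the same number on every run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and fold it into range.
    return int.from_bytes(digest[:8], "big") % num_buckets


keys = ["alice", "bob", "carol", "alice"]
buckets = [bucket_for(k, num_buckets=4) for k in keys]
# Equal keys always land in the same bucket, regardless of input order.
assert buckets[0] == buckets[3]
```

Because the bucket depends only on the key's content, repeated updates with the same data end up in the same buckets.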

Bug fixes
^^^^^^^^^
- Fix addition of bogus index columns to Parquet files when using ``sort_partitions_by``.
- Fix bug where ``partition_on`` in write path drops empty DataFrames and can lead to datasets without tables.

3.5.1
==========================

- Fix potential ``pyarrow.lib.ArrowNotImplementedError`` when trying to store or pickle empty
  :class:`~kartothek.core.index.ExplicitSecondaryIndex` objects
- Fix pickling of :class:`~kartothek.core.index.ExplicitSecondaryIndex` unloaded in
  ``dispatch_metapartitions_from_factory``

3.5.0
==========================

New functionality
^^^^^^^^^^^^^^^^^
- Add support for pyarrow 0.15.0
- Additional functions in the :mod:`kartothek.serialization` module for dealing with predicates:

  * :func:`~kartothek.serialization.check_predicates`
  * :func:`~kartothek.serialization.filter_predicates_by_column`
  * :func:`~kartothek.serialization.columns_in_predicates`

- Added available types for type annotation when dealing with predicates:

  * `~kartothek.serialization.PredicatesType`
  * `~kartothek.serialization.ConjunctionType`
  * `~kartothek.serialization.LiteralType`

- Make ``kartothek.io.*read_table*`` methods use default table name if unspecified
- ``MetaPartition.parse_input_to_metapartition`` accepts dicts and list of tuples equivalents as ``obj`` input
- Added `secondary_indices` as a default argument to the `write` pipelines
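
To make the predicate-related entries above more concrete, here is a minimal sketch. It assumes predicates are a list of conjunctions, each conjunction a list of ``(column, operator, value)`` tuples, with conjunctions combined by OR. The aliases and the helpers ``columns_in`` and ``matches`` are defined locally for illustration; they are not imported from kartothek and may differ from the library's own definitions.

```python
import operator
from typing import Any, Dict, List, Set, Tuple

# Illustrative aliases only; defined here, not imported from kartothek.
LiteralType = Tuple[str, str, Any]      # (column, operator, value)
ConjunctionType = List[LiteralType]     # literals combined with AND
PredicatesType = List[ConjunctionType]  # conjunctions combined with OR

# Small operator table for the sketch; "in" tests membership in a collection.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
}


def columns_in(predicates: PredicatesType) -> Set[str]:
    """Hypothetical helper: collect every column the predicates refer to."""
    return {column for conj in predicates for column, _op, _value in conj}


def matches(row: Dict[str, Any], predicates: PredicatesType) -> bool:
    """Hypothetical helper: True if the row satisfies any conjunction."""
    return any(
        all(_OPS[op](row[column], value) for column, op, value in conj)
        for conj in predicates
    )


predicates: PredicatesType = [
    [("country", "==", "DE"), ("year", ">=", 2019)],
    [("country", "in", ["FR", "IT"])],
]

assert columns_in(predicates) == {"country", "year"}
assert matches({"country": "DE", "year": 2020}, predicates)
assert not matches({"country": "DE", "year": 2018}, predicates)
```

A row is tested once against the whole predicate, so a value that appears in an ``in`` list yields a single match.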

Bug fixes
^^^^^^^^^
- Input to ``normalize_args`` is properly normalized to ``list``
- ``MetaPartition.load_dataframes`` now raises if table in ``columns`` argument doesn't exist
- Require ``urlquote>=1.1.0`` (where ``urlquote.quoting`` was introduced)
- Improve performance for some cases where predicates are used with the `in` operator.
- Correctly preserve :class:`~kartothek.core.index.ExplicitSecondaryIndex` dtype when index is empty
- Fixed DeprecationWarning in pandas ``CategoricalDtype``
- Fixed broken docstring for `store_dataframes_as_dataset`
- Internal operations no longer perform schema validations. This will improve
  performance for batched partition operations (e.g. `partition_on`) but will
  defer the validation in case of inconsistencies to the final commit. Exception
  messages will be less verbose in these cases than before.
- Fix an issue where an empty dataframe of a partition in a multi-table dataset
  would raise a schema validation exception
- Fix an issue where the `dispatch_by` keyword would disable partition pruning
- Creating a dataset with non-existent columns as an explicit index now raises a ``ValueError``

Breaking changes
^^^^^^^^^^^^^^^^
- Remove support for pyarrow < 0.13.0
- Move the docs module from `io_components` to `core`

3.4.0
==========================
- Add support for pyarrow 0.14.1
- Use urlquote for faster quoting/unquoting
