SDV

Latest version: v1.13.1

Page 2 of 10

1.9.0

This release makes a number of improvements. It introduces a new metadata concept known as column relationships! Column relationships define when a group of columns in a table should be treated as a single higher-level concept (e.g. an address). You can add a column relationship by using the new `add_column_relationship` method. Metadata detection was also improved: semantic sdtypes (e.g. 'email', 'phone_number') can now be detected as primary keys.
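To illustrate what a column relationship records, here is a minimal pure-Python sketch. It does not call SDV: the dictionary layout, the `add_relationship_sketch` helper, and its two checks (columns must exist, and a column may belong to only one relationship) are assumptions made for illustration, not SDV's implementation.

```python
# Illustrative only: a plain-dict stand-in for table metadata (not SDV's classes).
metadata = {
    "columns": {
        "customer_id": {"sdtype": "id"},
        "street": {"sdtype": "street_address"},
        "city": {"sdtype": "city"},
        "zip": {"sdtype": "postcode"},
    },
    "column_relationships": [],
}


def add_relationship_sketch(metadata, relationship_type, column_names):
    """Record that a group of columns forms one higher-level concept.

    Hypothetical checks: every column must exist in the metadata, and a
    column may belong to at most one relationship.
    """
    unknown = [name for name in column_names if name not in metadata["columns"]]
    if unknown:
        raise ValueError(f"Unknown columns: {unknown}")

    already_used = {
        name
        for relationship in metadata["column_relationships"]
        for name in relationship["column_names"]
    }
    overlap = sorted(already_used.intersection(column_names))
    if overlap:
        raise ValueError(f"Columns already in a relationship: {overlap}")

    metadata["column_relationships"].append(
        {"type": relationship_type, "column_names": list(column_names)}
    )


add_relationship_sketch(metadata, "address", ["street", "city", "zip"])
print(metadata["column_relationships"])
```

Grouping the columns in one entry is what lets downstream code treat them together rather than column by column.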

This release also patches some bugs. An issue that interfered with likelihood matching in the `HMASynthesizer` was resolved. The `CTGANSynthesizer` no longer fails when using the `FixedCombinations` constraint. The `Inequality` constraint was also patched to handle datetimes better.

Deprecations

* The `set_address_columns` method is deprecated in favor of `add_column_relationship`.

New Features

* Improve error messages for composite keys - Issue [1684](https://github.com/sdv-dev/SDV/issues/1684) by frances-h
* Add column relationship validation to single table metadata - Issue [1698](https://github.com/sdv-dev/SDV/issues/1698) by frances-h
* Add add_column_relationship method to single table metadata - Issue [1699](https://github.com/sdv-dev/SDV/issues/1699) by frances-h
* Make synthesizers work with column_relationships - Issue [1700](https://github.com/sdv-dev/SDV/issues/1700) by R-Palazzo
* Metadata auto-detection should find primary keys of semantic sdtypes - Issue [1724](https://github.com/sdv-dev/SDV/issues/1724) by fealho

Bugs Fixed

* InvalidDataError for Inequality constraint (even though data is valid) - Issue [1692](https://github.com/sdv-dev/SDV/issues/1692) by fealho
* `BaseIndependentSampler` crashes because it tries to cast id columns - Issue [1712](https://github.com/sdv-dev/SDV/issues/1712) by pvk-developer
* KeyError in `CTGANSynthesizer` when applying `FixedCombinations` constraint - Issue [1717](https://github.com/sdv-dev/SDV/issues/1717) by pvk-developer
* Fix _get_likelihoods not generating likelihood values - Issue [1720](https://github.com/sdv-dev/SDV/pull/1720) by frances-h

1.8.0

This release adds support for the new Diagnostic Report from SDMetrics. This report calculates scores for three basic but important properties of your data: data validity, data structure and, in the multi table case, relationship validity. Data validity checks that the columns of your data are valid (e.g. correct range or values). Data structure makes sure the synthetic data has the correct columns. Relationship validity checks that key references are correct and that the cardinality is within the ranges seen in the real data.
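As a rough intuition for two of these properties, the sketch below scores data structure as the overlap between column names and data validity as the share of synthetic values that fall inside the real data's range. Both formulas and both function names are simplified assumptions for illustration; they are not the SDMetrics implementation.

```python
def structure_score(real_columns, synthetic_columns):
    """Share of column names the two tables have in common (1.0 = identical sets)."""
    real, synthetic = set(real_columns), set(synthetic_columns)
    return len(real & synthetic) / len(real | synthetic)


def validity_score(real_values, synthetic_values):
    """Fraction of synthetic values inside the [min, max] range of the real values."""
    low, high = min(real_values), max(real_values)
    inside = sum(low <= value <= high for value in synthetic_values)
    return inside / len(synthetic_values)


real_ages = [18, 25, 40, 63]
synthetic_ages = [20, 30, 70, 45]  # 70 lies outside the real range of 18..63

print(structure_score(["id", "age"], ["id", "age"]))  # 1.0
print(validity_score(real_ages, synthetic_ages))      # 0.75
```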

Additionally, a few bugs were fixed and functionality was improved around synthesizers. It is now possible to access the loss values for the `TVAESynthesizer` and `CTGANSynthesizer` by using the `get_loss_values` method. The `get_parameters` method is now more detailed and returns all the parameters used to make a synthesizer. The metadata can now detect some common PII sdtypes. Finally, a bug that made every parent row generated by the `HMASynthesizer` have at least one child row was patched. This should improve the cardinality of the synthetic data.
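The effect of the cardinality bug can be seen with a small, made-up example. The child counts below are hypothetical, and the `max(1, count)` clamp is a simplification of the described behavior rather than the actual `HMASynthesizer` code.

```python
from statistics import mean

# Hypothetical number of child rows sampled for each of five parent rows.
sampled_child_counts = [0, 0, 1, 3, 0]

# Behavior described as the bug: every parent ends up with at least one child.
forced_counts = [max(1, count) for count in sampled_child_counts]

print(mean(sampled_child_counts))  # 0.8
print(mean(forced_counts))         # 1.4
print(sum(forced_counts) - sum(sampled_child_counts))  # 3 extra child rows
```

Parents that should have had no children each gain one, so both the total number of child rows and the average cardinality drift upward.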

Maintenance

* Address `SettingWithCopyWarning` (HMASynthesizer) - Issue [1557](https://github.com/sdv-dev/SDV/issues/1557) by pvk-developer
* Bump SDMetrics version - Issue [1702](https://github.com/sdv-dev/SDV/issues/1702) by amontanez24

New Features

* Allow me to access loss values for GAN-based synthesizers - Issue [1671](https://github.com/sdv-dev/SDV/issues/1671) by frances-h
* Create a unified `get_parameters` method for all multi-table synthesizers - Issue [1674](https://github.com/sdv-dev/SDV/issues/1674) by frances-h
* Set credentials key as variables - Issue [1680](https://github.com/sdv-dev/SDV/issues/1680) by R-Palazzo
* Identifying PII Sdtypes in Metadata - Issue [1683](https://github.com/sdv-dev/SDV/issues/1683) by R-Palazzo
* Make SDV compatible with the latest SDMetrics - Issue [1687](https://github.com/sdv-dev/SDV/issues/1687) by fealho
* SingleTablePreset uses FrequencyEncoder - Issue [1695](https://github.com/sdv-dev/SDV/issues/1695) by fealho

Bugs Fixed

* HMASynthesizer creates too much synthetic data (always creates a child for every parent row) - Issue [1673](https://github.com/sdv-dev/SDV/issues/1673) by frances-h

1.7.0

This release adds an alert to the `CTGANSynthesizer` during preprocessing. The alert informs the user if the fitting of the synthesizer is likely to be slow on their schema. Additionally, it is now possible to enforce that sampled datetime values stay within the range of the fitted data!
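To show what keeping sampled datetimes within the fitted range means, here is a minimal sketch using only the standard library. Clipping is assumed here as the enforcement strategy, and `clip_to_fitted_range` is a hypothetical helper; in SDV this behavior is controlled through the `enforce_min_max_values` setting on the datetime transformers.

```python
from datetime import datetime

# Datetimes seen while fitting define the allowed range.
fitted = [datetime(2023, 1, 5), datetime(2023, 3, 20), datetime(2023, 6, 30)]
low, high = min(fitted), max(fitted)


def clip_to_fitted_range(value, low, high):
    """Pull a sampled datetime back inside [low, high] if it falls outside."""
    return min(max(value, low), high)


# Hypothetical raw samples: one too early, one in range, one too late.
sampled = [datetime(2022, 12, 28), datetime(2023, 4, 1), datetime(2023, 7, 15)]
clipped = [clip_to_fitted_range(value, low, high) for value in sampled]

for before, after in zip(sampled, clipped):
    print(before.date(), "->", after.date())
```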

This release also makes internal changes to support address data in SDV Enterprise.

New Features

* Add set_address_columns method - Issue [1593](https://github.com/sdv-dev/SDV/issues/1593) by R-Palazzo
* Update_transformers should raise error on address columns - Issue [1594](https://github.com/sdv-dev/SDV/issues/1594) by R-Palazzo
* add_constraints should raise error on address columns - Issue [1595](https://github.com/sdv-dev/SDV/issues/1595) by R-Palazzo
* Print alert if CTGANSynthesizer is likely to be slow - Issue [1658](https://github.com/sdv-dev/SDV/issues/1658) by fealho
* Set enforce_min_max_values to True for datetime transformers - Issue [1676](https://github.com/sdv-dev/SDV/issues/1676) by R-Palazzo

Bugs Fixed

* Unable to visualize metadata (`Error: bad label format` and `CalledProcessError`) - Issue [1625](https://github.com/sdv-dev/SDV/issues/1625) by fealho
* Can't set address columns after fitting - Issue [1661](https://github.com/sdv-dev/SDV/issues/1661) by R-Palazzo

1.6.0

This release improves user messaging in multiple ways. The most notable is that users will now see an alert if the `HMASynthesizer` is likely to be slow for their data's schema. Additionally, the logger messaging for constraints and the error messaging when setting distributions on non-parametric models were made more detailed.

The visualization plots in the `sdv.evaluation` sub-package all got a new parameter called `plot_type`, which lets users specify the plot type to use if the inferred one is not useful. The `sdv.datasets.local.load_csvs` method now has a parameter called `read_csv_parameters` that allows users to specify how the CSV files should be read during loading. The same change was also made to the `sdv.metadata.multi_table.detect_table_from_csv`, `sdv.metadata.multi_table.detect_from_csvs` and `sdv.metadata.single_table.detect_from_csv` methods.

Multiple bugs were resolved including one that caused new categories to be created during the sample step of `CTGANSynthesizer`.

New Features

* Improve debug messages when a constraint falls back to reject sampling approach - Issue [1478](https://github.com/sdv-dev/SDV/issues/1478) by amontanez24
* Constraints should work with timezone-aware datetime columns - Issue [1576](https://github.com/sdv-dev/SDV/issues/1576) by fealho
* Better error message when trying to get distributions from non-parametric models - PR [1633](https://github.com/sdv-dev/SDV/pull/1633) by frances-h
* Add options to read CSV files - Issue [1644](https://github.com/sdv-dev/SDV/issues/1644) by lajohn4747
* Print alert if HMASynthesizer is likely to be slow - Issue [1646](https://github.com/sdv-dev/SDV/issues/1646) by lajohn4747
* Make SDV compatible with SDMetrics 0.12.1 - Issue [1650](https://github.com/sdv-dev/SDV/issues/1650) by pvk-developer
* Make function to estimate number of columns CTGAN produces - Issue [1657](https://github.com/sdv-dev/SDV/issues/1657) by fealho

Bugs Fixed

* In get_available_demos, the num_tables column should be an int - Issue [1420](https://github.com/sdv-dev/SDV/issues/1420) by lajohn4747
* AttributeError when using specific locale strings (es_AR, fr_BE) - Issue [1439](https://github.com/sdv-dev/SDV/issues/1439) by lajohn4747
* Confusing error when passing in an empty dataframe (with constraints) - Issue [1455](https://github.com/sdv-dev/SDV/issues/1455) by lajohn4747
* HMASynthesizer: Better error message for learned distributions (misleading fit error) - Issue [1579](https://github.com/sdv-dev/SDV/issues/1579) by fealho
* Fix tests in SDV after update in RDT 1.7.1 - Issue [1638](https://github.com/sdv-dev/SDV/issues/1638) by lajohn4747
* CTGAN sometimes creates new categories (int data) - Issue [1647](https://github.com/sdv-dev/SDV/issues/1647) by pvk-developer
* CTGAN sometimes creates new categories (object data) - Issue [1648](https://github.com/sdv-dev/SDV/issues/1648) by pvk-developer
* Better error message if I provide an incompatible sdtype/locale combo - Issue [1653](https://github.com/sdv-dev/SDV/issues/1653) by pvk-developer

1.5.0

Several improvements and bug fixes were made in this release. Most notably, the metadata detection was substantially improved. Support for the 'unknown' sdtype was added, providing more flexibility in data representation. The software now attempts to intelligently detect primary keys and identify parent-child relationships in the metadata, streamlining the metadata creation process.
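One simple way to think about primary key detection is that a candidate column must have a value in every row and no repeated values. The sketch below applies that heuristic to a list of dictionaries; the rule and the `find_primary_key_candidates` helper are assumptions for illustration and may differ from how SDV actually chooses a primary key.

```python
def find_primary_key_candidates(rows):
    """Return column names whose values are all present and all distinct."""
    if not rows:
        return []
    candidates = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        if None not in values and len(set(values)) == len(values):
            candidates.append(column)
    return candidates


rows = [
    {"user_id": 1, "country": "FR", "email": "a@example.com"},
    {"user_id": 2, "country": "FR", "email": "b@example.com"},
    {"user_id": 3, "country": "US", "email": None},
]

print(find_primary_key_candidates(rows))  # ['user_id']
```

Here `country` is rejected because it repeats and `email` because it has a missing value.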

Additionally, issues related to conditional sampling with negative float values, the inability to update transformers for columns created by constraints, and compatibility with numpy version 1.25 and higher were addressed. The default branch was also switched from 'master' to 'main' for better development practices. Various bugs and errors, including those involving HMA and datetime format detection, were also resolved.

New Features

* Improve metadata detection - Issue [1515](https://github.com/sdv-dev/SDV/issues/1515) by R-Palazzo
* Support 'unknown' sdtype - Issue [1516](https://github.com/sdv-dev/SDV/issues/1516) by R-Palazzo
* Detect primary keys in metadata - Issue [1521](https://github.com/sdv-dev/SDV/issues/1521) by frances-h
* Detect relationships in MultiTableMetadata - Issue [1522](https://github.com/sdv-dev/SDV/issues/1522) by frances-h
* Make function to estimate number of columns HMA produces. - Issue [1572](https://github.com/sdv-dev/SDV/issues/1572) by fealho
* Add wrapper for get_cardinality_plot - Issue [1573](https://github.com/sdv-dev/SDV/issues/1573) by frances-h
* [Metadata detection] Add a cardinality cap when choosing between categorical vs. numerical - Issue [1584](https://github.com/sdv-dev/SDV/issues/1584) by pvk-developer
* [Metadata Detection] Only make primary/foreign keys sdtype `id` (leave others as `unknown`) - Issue [1598](https://github.com/sdv-dev/SDV/issues/1598) by amontanez24
* Check and supply a more descriptive error when trying to use `'gaussian_kde'` with HMA - Issue [1604](https://github.com/sdv-dev/SDV/issues/1604) by frances-h

Bugs Fixed

* Conditional sampling with negative float values doesn't work - Issue [1161](https://github.com/sdv-dev/SDV/issues/1161) by fealho
* Cannot update transformers for columns that get created by constraints (`KeyError`) - Issue [1454](https://github.com/sdv-dev/SDV/issues/1454) by frances-h
* HMA produces KeyError for a schema with 3+ levels of depth - Issue [1558](https://github.com/sdv-dev/SDV/issues/1558) by fealho
* Columns consisting of only Nones are being detected as datetime - Issue [1589](https://github.com/sdv-dev/SDV/issues/1589) by pvk-developer
* HMASynthesizer throws an error when sampling multi table models with three levels of depths - Issue [1600](https://github.com/sdv-dev/SDV/issues/1600) by amontanez24
* `ValueError: Invalid distribution specification` when setting numerical_distributions on child table (HMA) - Issue [1605](https://github.com/sdv-dev/SDV/issues/1605) by fealho
* Bug: updating transformers in DataProcessor resets warning filters - Issue [1618](https://github.com/sdv-dev/SDV/issues/1618) by rwedge

Maintenance

* Investigate how to get numpy >1.25 to pass - Issue [1501](https://github.com/sdv-dev/SDV/issues/1501) by rwedge
* Switch default branch from master to main - Issue [1550](https://github.com/sdv-dev/SDV/issues/1550) by amontanez24

1.4.0

This release makes multiple improvements to the metadata. Both the single and multi table metadata classes now have a `validate_data` method. This method runs checks to validate the data against the current specifications in the metadata. The `SingleTableMetadata.visualize` method is also improved. The sequence index is now shown in the same section as the sequence key, and all key and index information (e.g. sequence key, primary key, sequence index) now appears in one section.
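The kind of check that validating data against metadata involves can be sketched in a few lines. The `validate_rows` helper, the dictionary layout, and the two checks shown (expected columns present, primary key complete and unique) are assumptions for illustration; this sketch returns a list of messages and does not reproduce the checks or error handling of SDV's `validate_data`.

```python
def validate_rows(rows, metadata):
    """Collect human-readable problems found when comparing rows to metadata."""
    errors = []

    # Every row should provide all columns named in the metadata.
    expected = set(metadata["columns"])
    for index, row in enumerate(rows):
        missing = expected - set(row)
        if missing:
            errors.append(f"row {index}: missing columns {sorted(missing)}")

    # A primary key must be present in every row and must not repeat.
    key = metadata.get("primary_key")
    if key is not None:
        keys = [row.get(key) for row in rows]
        if None in keys:
            errors.append(f"primary key '{key}' contains missing values")
        if len(set(keys)) != len(keys):
            errors.append(f"primary key '{key}' contains duplicates")

    return errors


metadata = {
    "columns": {"id": {"sdtype": "id"}, "amount": {"sdtype": "numerical"}},
    "primary_key": "id",
}
rows = [{"id": 1, "amount": 9.5}, {"id": 1, "amount": 3.0}, {"id": 2}]

for problem in validate_rows(rows, metadata):
    print(problem)
```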

The `CTGANSynthesizer` has been made more efficient in the following ways:
1. Boolean columns are now skipped during `preprocess`, just as categorical columns are.
2. It is possible to apply other transformations to categorical columns and have `CTGAN` skip the one-hot encoding step.

Additionally, columns labeled with the sdtype `id` now go through the `IDGenerator` transformer by default, and constraint transformations that were previously overwritten during sampling are now respected.

New Features

* Add validate_data method to Metadata - Issue [1518](https://github.com/sdv-dev/SDV/issues/1518) by fealho
* Use IDGenerator for ID columns - Issue [1519](https://github.com/sdv-dev/SDV/issues/1519) by frances-h
* Metadata visualization for sequential data: Only create 2 sections - Issue [1543](https://github.com/sdv-dev/SDV/issues/1543) by frances-h

Bugs Fixed

* Inefficient CTGAN modeling when adding categorical transformers - Issue [1450](https://github.com/sdv-dev/SDV/issues/1450) by fealho
* CTGANSynthesizer is assigning LabelEncoder to boolean columns (instead of None) - Issue [1530](https://github.com/sdv-dev/SDV/issues/1530) by fealho
* Metadata visualization for sequential data: Missing sequence index - Issue [1542](https://github.com/sdv-dev/SDV/issues/1542) by frances-h
* Constraint outputs are being overwritten in DataProcessor.reverse_transform - Issue [1551](https://github.com/sdv-dev/SDV/issues/1551) by amontanez24
