Google-pandas-load

Latest version: v6.0.0


6.0.0
------------------
API Changes
^^^^^^^^^^^
* pandas==2.* is now required.

Improvement
^^^^^^^^^^^
* The price of a query is no longer shown in the logs. Instead, the number of
  gigabytes, a simpler metric to compute, is written to the logs.
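Logging gigabytes only requires the byte count of a query, while a price would also require a pricing rate. A minimal sketch of that computation follows. It assumes the count comes from a BigQuery QueryJob's total_bytes_processed attribute and that one gigabyte is 10**9 bytes; the helper names, logger name and log wording are hypothetical, not the library's own.

```python
import logging

# Hypothetical logger name, not the one used by google-pandas-load.
logger = logging.getLogger("gpl_sketch")


def bytes_to_gigabytes(num_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes (assumption: 1 GB = 10**9 bytes)."""
    return num_bytes / 10**9


def query_size_message(num_bytes: int) -> str:
    """Build and log a hypothetical size line; unlike a price, no rate is needed."""
    message = f"Query size: {bytes_to_gigabytes(num_bytes):.3f} GB"
    logger.info(message)
    return message


# e.g. with num_bytes taken from QueryJob.total_bytes_processed
print(query_size_message(2_500_000_000))  # Query size: 2.500 GB
```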

5.0.1
------------------
Improvement
^^^^^^^^^^^
* The initialization of :class:`google_pandas_load.loader.LoaderQuickSetup`
  now raises an error in the following cases:

  - the project_id is provided but neither the dataset_name nor the
    bucket_name is.
  - the dataset_name is provided but the project_id is not.
  - the bucket_name is provided but the project_id is not.

Bugfixes
^^^^^^^^
* :class:`google_pandas_load.loader.LoaderQuickSetup` could not be initialized
  with only the project_id and the bucket_name, because the dataset_name was
  required whenever the project_id was provided. This is no longer the case.
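The error cases and the bugfix of this release can be summarized as a small validation function. This is a sketch under assumptions: the function name is hypothetical, and raising ValueError is a guess at the error type, not necessarily what LoaderQuickSetup raises.

```python
from typing import Optional


def check_quick_setup_args(
    project_id: Optional[str] = None,
    dataset_name: Optional[str] = None,
    bucket_name: Optional[str] = None,
) -> None:
    """Reject the argument combinations listed for 5.0.1 (hypothetical helper)."""
    if project_id is not None and dataset_name is None and bucket_name is None:
        raise ValueError("project_id is provided but neither dataset_name nor bucket_name is.")
    if dataset_name is not None and project_id is None:
        raise ValueError("dataset_name is provided but project_id is not.")
    if bucket_name is not None and project_id is None:
        raise ValueError("bucket_name is provided but project_id is not.")


# Accepted since the bugfix: project_id together with bucket_name only.
check_quick_setup_args(project_id="project1", bucket_name="bucket1")
```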

5.0.0
------------------
API Changes
^^^^^^^^^^^
* google-cloud-bigquery==3.* is now required.

* google-cloud-storage==2.* is now required.

* The available sources are now: 'query', 'dataset', 'bucket', 'local', 'dataframe'.

* The available destinations are now: 'dataset', 'bucket', 'local', 'dataframe'.

* :class:`google_pandas_load.loader.Loader` parameters are now: bq_client,
  dataset_id, gs_client, bucket_name, bucket_dir_path, local_dir_path,
  separator, chunk_size, timeout.

* :class:`google_pandas_load.loader.LoaderQuickSetup` parameters are now: project_id,
  dataset_name, bucket_name, bucket_dir_path, credentials, local_dir_path,
  separator, chunk_size, timeout.

* :class:`google_pandas_load.loader.Loader` getter functions are now:
  bq_client, dataset_id, dataset_name, gs_client, bucket_name, bucket,
  bucket_dir_path and local_dir_path.

* :class:`google_pandas_load.loader.LoaderQuickSetup` getter functions are now:
  project_id, dataset_name, gs_client, bucket_name, bucket,
  bucket_dir_path and local_dir_path.

* The xmload and xload methods are removed from the loader.

* The mload method is renamed to multi_load.

* pandas.read_csv, which is used to load data from 'local' to 'dataframe',
  is now called with skip_blank_lines=False.
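The sources and destinations listed in this release can be read as points on one chain of locations. The sketch below assumes, as an illustration rather than a statement of how the library is implemented, that data moves one hop at a time along query, dataset, bucket, local, dataframe, and back along the reverse of that chain when loading towards BigQuery. The function name and the step names it produces are hypothetical.

```python
from typing import List

SOURCES = ("query", "dataset", "bucket", "local", "dataframe")
DESTINATIONS = ("dataset", "bucket", "local", "dataframe")

# Assumed order in which data passes through the locations, from BigQuery
# towards pandas.
CHAIN = ("query", "dataset", "bucket", "local", "dataframe")


def transfer_steps(source: str, destination: str) -> List[str]:
    """Return hypothetical step names linking source to destination along CHAIN."""
    if source not in SOURCES:
        raise ValueError(f"Unknown source: {source!r}")
    if destination not in DESTINATIONS:
        raise ValueError(f"Unknown destination: {destination!r}")
    if source == destination:
        raise ValueError("source and destination must differ")
    start, end = CHAIN.index(source), CHAIN.index(destination)
    step = 1 if start < end else -1
    # Walk the chain forwards (towards 'dataframe') or backwards (towards 'dataset').
    path = CHAIN[start:end + step:step]
    return [f"{a}_to_{b}" for a, b in zip(path, path[1:])]


print(transfer_steps("query", "dataframe"))
print(transfer_steps("dataframe", "dataset"))
```

Deriving every step from one ordered tuple keeps the forward and backward directions consistent without listing each source/destination pair by hand.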

Bugfixes
^^^^^^^^
* Subfolders of the bucket directory were previously treated as data. This is
  no longer the case: only the blobs at the root of the bucket directory are
  taken into account.

* Subfolders of the local directory were previously treated as data. This is
  no longer the case: only the files at the root of the local directory are
  taken into account.
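Both fixes come down to ignoring anything nested below the directory root. A minimal sketch, assuming blobs are identified by their full names with bucket_dir_path as a prefix; the function names are hypothetical, and treating an empty remainder as a folder placeholder is an assumption about how "folders" appear in a bucket.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional


def root_blob_names(blob_names: Iterable[str], bucket_dir_path: Optional[str]) -> List[str]:
    """Keep only blob names located directly under bucket_dir_path."""
    prefix = f"{bucket_dir_path.strip('/')}/" if bucket_dir_path else ""
    kept = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # An empty rest is a folder placeholder; a '/' in rest means a subfolder.
        if rest and "/" not in rest:
            kept.append(name)
    return kept


def root_file_names(local_dir_path: str) -> List[str]:
    """Keep only regular files at the root of local_dir_path, skipping subdirectories."""
    return sorted(p.name for p in Path(local_dir_path).iterdir() if p.is_file())


names = ["dir/a.csv.gz", "dir/sub/b.csv.gz", "dir/", "other/c.csv.gz"]
print(root_blob_names(names, "dir"))  # ['dir/a.csv.gz']

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("x\n")
    Path(tmp, "sub").mkdir()
    Path(tmp, "sub", "b.csv").write_text("y\n")
    print(root_file_names(tmp))  # ['a.csv']
```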

4.0.0
------------------
API Changes
^^^^^^^^^^^
* google-cloud-bigquery==2.* is now required.

* infer_datetime_format is removed from the arguments of the load methods.
  It is now always set to True when pandas.read_csv is used to load data
  from 'local' to 'dataframe'.

* generated_data_name_prefix is removed from the loader's arguments.
  It is no longer possible to add a custom prefix to generated data_names.

* max_concurrent_google_jobs is removed from the loader's arguments.
  Concurrency of bq_client jobs is now handled solely by Google.

* use_wildcard is removed from the loader's arguments. A wildcard is now
  always used when data is loaded from 'bq' to 'gs'.

* compress is removed from the loader's arguments. Data is now always
  compressed when loaded from 'bq' to 'gs' or from 'dataframe' to 'local'.
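With the wildcard always on, a BigQuery extract job writes to a destination URI containing a single '*', which BigQuery replaces with a zero-padded file number so that a large export can be split across several files. The sketch below only builds such a URI. The naming pattern data_name-*.csv.gz and the helper name are hypothetical, and gzip compression itself would be requested separately, e.g. through the compression option of google.cloud.bigquery.ExtractJobConfig.

```python
from typing import Optional


def wildcard_export_uri(bucket_name: str, bucket_dir_path: Optional[str], data_name: str) -> str:
    """Build a gs:// URI with a single '*' so that BigQuery may shard the export.

    BigQuery replaces the '*' with a zero-padded number (000000000000, 000000000001, ...).
    The '.csv.gz' suffix only names the files; gzip must be set in the job config.
    """
    dir_part = f"{bucket_dir_path.strip('/')}/" if bucket_dir_path else ""
    return f"gs://{bucket_name}/{dir_part}{data_name}-*.csv.gz"


print(wildcard_export_uri("bucket1", "dir/sub", "data1"))  # gs://bucket1/dir/sub/data1-*.csv.gz
```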

Improvement
^^^^^^^^^^^
* :class:`google_pandas_load.loader.Loader` now has 8 getter functions:
  bq_client, dataset_ref, dataset_id, dataset_name, bucket, bucket_name,
  gs_dir_path and local_dir_path.

* Its child class :class:`google_pandas_load.loader_quick_setup.LoaderQuickSetup`
  additionally has 2 getter functions: project_id and gs_client.

* The argument removals described in the API Changes section above simplify
  the use of this library.

3.0.0
------------------
API Changes
^^^^^^^^^^^
* pandas==1.* is now required.

* For :class:`google_pandas_load.loader_quick_setup.LoaderQuickSetup`, the
  parameter dataset_id is replaced by the parameter dataset_name. The reason
  for this choice is explained in the Notes section below.

Improvement
^^^^^^^^^^^
* For :meth:`google_pandas_load.loader.Loader.load`, when the parameter
  destination is set to 'bq' and the parameter source is set to 'gs' or
  'local', the bq_schema parameter is no longer required. If it is not
  passed, the schema is inferred from the CSV with
  `google.cloud.bigquery.job.LoadJobConfig.autodetect`_.
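The fallback can be pictured as choosing between an explicit schema and schema autodetection when the CSV load job is configured. A minimal sketch, assuming the options are later passed as keyword arguments to google.cloud.bigquery.LoadJobConfig, whose schema, autodetect and source_format properties the keys mirror. The helper name is hypothetical, and plain values stand in for SchemaField objects so the sketch runs without the BigQuery client.

```python
from typing import Any, Dict, List, Optional


def load_job_options(bq_schema: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Hypothetical kwargs for a CSV load job: explicit schema, else autodetect."""
    options: Dict[str, Any] = {"source_format": "CSV"}
    if bq_schema is None:
        # No schema given: ask BigQuery to infer it from the CSV data.
        options["autodetect"] = True
    else:
        options["schema"] = bq_schema
    return options


print(load_job_options())  # {'source_format': 'CSV', 'autodetect': True}
# With google-cloud-bigquery installed (not required here):
#     job_config = bigquery.LoadJobConfig(**load_job_options(schema_fields))
```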

Notes
^^^^^
* We use new conventions for naming some BigQuery objects. This causes only
  one API change (the second one in the API Changes section above). Let us
  describe the new conventions with an example. Suppose we have a BigQuery
  table whose address is project1.dataset1.table1. We say that:

  - project1 is a project_id.
  - project1.dataset1 is a dataset_id.
  - project1.dataset1.table1 is a table_id.
  - dataset1 is a dataset_name.
  - table1 is a table_name.
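These naming conventions can be made concrete with a small parser. The sketch is illustrative only (TableAddress and parse_table_id are hypothetical names). It splits on the last two dots so that a legacy domain-scoped project ID such as example.com:project1, which itself contains a dot, stays intact.

```python
from typing import NamedTuple


class TableAddress(NamedTuple):
    project_id: str
    dataset_id: str
    table_id: str
    dataset_name: str
    table_name: str


def parse_table_id(table_id: str) -> TableAddress:
    """Split a table_id such as 'project1.dataset1.table1' into the named parts."""
    # rsplit keeps any dot inside a domain-scoped project ID within project_id.
    parts = table_id.rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected project.dataset.table, got {table_id!r}")
    project_id, dataset_name, table_name = parts
    return TableAddress(
        project_id=project_id,
        dataset_id=f"{project_id}.{dataset_name}",
        table_id=table_id,
        dataset_name=dataset_name,
        table_name=table_name,
    )


print(parse_table_id("project1.dataset1.table1"))
```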

2.0.1
------------------
Improvement
^^^^^^^^^^^
* Data in transitional locations is now deleted even if its transfer fails.
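Deleting transitional data even when a transfer fails is the classic try/finally pattern. A minimal sketch with hypothetical callables standing in for a transfer step and for the deletion of its intermediate data; it is not the library's actual code.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def transfer_then_clean(
    transfer: Callable[[], T],
    delete_transitional_data: Callable[[], None],
) -> T:
    """Run one transfer step, then delete its transitional data in every case."""
    try:
        return transfer()
    finally:
        # Executes whether transfer() returned normally or raised an exception.
        delete_transitional_data()


deleted: List[str] = []


def failing_transfer() -> None:
    raise RuntimeError("extract job failed")


try:
    transfer_then_clean(failing_transfer, lambda: deleted.append("bucket files"))
except RuntimeError as exc:
    print(f"transfer failed ({exc}); deleted: {deleted}")
```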

Bugfixes
^^^^^^^^
* The method `google.cloud.bigquery.job.QueryJob.result()`_ is used again
  to wait for a Google job to complete. The timeout bug described in the
  previous Bugfixes section appears to have been caused by a Docker
  configuration problem.

* At the end of a "query_to_bq" step, the log read "Ended source to bq".
  It has been corrected to "Ended query to bq".
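One way to keep such log lines consistent is to derive them from the step name instead of hard-coding a placeholder. A hypothetical sketch: the function name and the "Starting" wording are assumptions, and only the "Ended query to bq" format comes from the entry above.

```python
from typing import Tuple


def step_log_messages(step_name: str) -> Tuple[str, str]:
    """Derive hypothetical start and end log lines from a step name like 'query_to_bq'."""
    source, destination = step_name.split("_to_")
    return f"Starting {source} to {destination}", f"Ended {source} to {destination}"


print(step_log_messages("query_to_bq")[1])  # Ended query to bq
```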
