Apache-beam

Latest version: v2.56.0

Safety actively analyzes 623126 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 1 of 7

2.57.0

Highlights

* New highly anticipated feature X added to Python SDK ([X](https://github.com/apache/beam/issues/X)).
* New highly anticipated feature Y added to Java SDK ([Y](https://github.com/apache/beam/issues/Y)).

I/Os

* Support for X source added (Java/Python) ([X](https://github.com/apache/beam/issues/X)).

New Features / Improvements

* Added Feast feature store handler for enrichment transform (Python) ([30957](https://github.com/apache/beam/issues/30964)).

Breaking Changes

* X behavior was changed ([X](https://github.com/apache/beam/issues/X)).
* Java's View.asList() side inputs are now optimized for iterating rather than
indexing when in the global window.
This new implementation still supports all (immutable) List methods as before,
but some of the random access methods like get() and size() will be slower.
To use the old implementation one can use View.asList().withRandomAccess().

Deprecations

* X behavior is deprecated and will be removed in X versions ([X](https://github.com/apache/beam/issues/X)).

Bugfixes

* Fixed X (Java/Python) ([X](https://github.com/apache/beam/issues/X)).

Security Fixes
* Fixed (CVE-YYYY-NNNN)[https://www.cve.org/CVERecord?id=CVE-YYYY-NNNN] (Java/Python/Go) ([#X](https://github.com/apache/beam/issues/X)).

Known Issues

* ([X](https://github.com/apache/beam/issues/X)).

2.56.0

Highlights

* Added FlinkRunner for Flink 1.17, removed support for Flink 1.12 and 1.13. Previous version of Pipeline running on Flink 1.16 and below can be upgraded to 1.17, if the Pipeline is first updated to Beam 2.56.0 with the same Flink version. After Pipeline runs with Beam 2.56.0, it should be possible to upgrade to FlinkRunner with Flink 1.17. ([29939](https://github.com/apache/beam/issues/29939))

I/Os

* Support for X source added (Java/Python) ([X](https://github.com/apache/beam/issues/X)).
* Upgraded Avro version to 1.11.3, kafka-avro-serializer and kafka-schema-registry-client versions to 7.6.0 (Java) ([30638](https://github.com/apache/beam/pull/30638)).
The newer Avro package is known to have breaking changes. If you are affected, you can keep pinned to older Avro versions which are also tested with Beam.

New Features / Improvements

* Added ability to control the exact number of models loaded across processes by RunInference. This may be useful for pipelines with tight memory constraints ([31052](https://github.com/apache/beam/pull/31052))
* Profiling of Cythonized code has been disabled by default. This might improve performance for some Python pipelines ([30938](https://github.com/apache/beam/pull/30938)).
* Bigtable enrichment handler now accepts a custom function to build a composite row key. (Python) ([30974](https://github.com/apache/beam/issues/30975)).

Breaking Changes

* X behavior was changed ([X](https://github.com/apache/beam/issues/X)).
* Default consumer polling timeout for KafkaIO.Read was increased from 1 second to 2 seconds. Use KafkaIO.read().withConsumerPollingTimeout(Duration duration) to configure this timeout value when necessary ([30870](https://github.com/apache/beam/issues/30870)).
* Python Dataflow users no longer need to manually specify --streaming for pipelines using unbounded sources such as ReadFromPubSub.

Deprecations

* X behavior is deprecated and will be removed in X versions ([X](https://github.com/apache/beam/issues/X)).

Bugfixes

* Fixed locking issue when shutting down inactive bundle processors. Symptoms of this issue include slowness or stuckness in long-running jobs (Python) ([30679](https://github.com/apache/beam/pull/30679)).
* Fixed logging issue that caused silecing the pip output when installing of dependencies provided in `--requirements_file` (Python).
* Fixed pipeline stuckness issue by disallowing versions of grpcio that can cause the stuckness (Python) ([30867](https://github.com/apache/beam/issues/30867)).

Security Fixes
* Fixed (CVE-YYYY-NNNN)[https://www.cve.org/CVERecord?id=CVE-YYYY-NNNN] (Java/Python/Go) ([#X](https://github.com/apache/beam/issues/X)).

Known Issues

* ([X](https://github.com/apache/beam/issues/X)).

2.55.1

Bugfixes

* Fixed issue that broke WriteToJson in languages other than Java (X-lang) ([30776](https://github.com/apache/beam/issues/30776)).

2.55.0

Highlights

* The Python SDK will now include automatically generated wrappers for external Java transforms! ([29834](https://github.com/apache/beam/pull/29834))

I/Os

* Added support for handling bad records to BigQueryIO ([30081](https://github.com/apache/beam/pull/30081)).
* Full Support for Storage Read and Write APIs
* Partial Support for File Loads (Failures writing to files supported, failures loading files to BQ unsupported)
* No Support for Extract or Streaming Inserts
* Added support for handling bad records to PubSubIO ([30372](https://github.com/apache/beam/pull/30372)).
* Support is not available for handling schema mismatches, and enabling error handling for writing to pubsub topics with schemas is not recommended
* `--enableBundling` pipeline option for BigQueryIO DIRECT_READ is replaced by `--enableStorageReadApiV2`. Both were considered experimental and may subject to change (Java) ([26354](https://github.com/apache/beam/issues/26354)).

New Features / Improvements

* Allow writing clustered and not time partitioned BigQuery tables (Java) ([30094](https://github.com/apache/beam/pull/30094)).
* Redis cache support added to RequestResponseIO and Enrichment transform (Python) ([30307](https://github.com/apache/beam/pull/30307))
* Merged sdks/java/fn-execution and runners/core-construction-java into the main SDK. These artifacts were never meant for users, but noting
that they no longer exist. These are steps to bring portability into the core SDK alongside all other core functionality.
* Added Vertex AI Feature Store handler for Enrichment transform (Python) ([30388](https://github.com/apache/beam/pull/30388))

Breaking Changes

* Arrow version was bumped to 15.0.0 from 5.0.0 ([30181](https://github.com/apache/beam/pull/30181)).
* Go SDK users who build custom worker containers may run into issues with the move to distroless containers as a base (see Security Fixes).
* The issue stems from distroless containers lacking additional tools, which current custom container processes may rely on.
* See https://beam.apache.org/documentation/runtime/environments/#from-scratch-go for instructions on building and using a custom container.
* Python SDK has changed the default value for the `--max_cache_memory_usage_mb` pipeline option from 100 to 0. This option was first introduced in 2.52.0 SDK. This change restores the behavior of 2.51.0 SDK, which does not use the state cache. If your pipeline uses iterable side inputs views, consider increasing the cache size by setting the option manually. ([30360](https://github.com/apache/beam/issues/30360)).

Bugfixes

* Fixed SpannerIO.readChangeStream to support propagating credentials from pipeline options
to the getDialect calls for authenticating with Spanner (Java) ([30361](https://github.com/apache/beam/pull/30361)).
* Reduced the number of HTTP requests in GCSIO function calls (Python) ([30205](https://github.com/apache/beam/pull/30205))

Security Fixes

* Go SDK base container image moved to distroless/base-nossl-debian12, reducing vulnerable container surface to kernel and glibc ([30011](https://github.com/apache/beam/pull/30011)).

Known Issues

* In Python pipelines, when shutting down inactive bundle processors, shutdown logic can overaggressively hold the lock, blocking acceptance of new work. Symptoms of this issue include slowness or stuckness in long-running jobs. Fixed in 2.56.0 ([30679](https://github.com/apache/beam/pull/30679)).
* WriteToJson broken in languages other than Java (X-lang) ([30776](https://github.com/apache/beam/issues/30776)).
* Python pipelines might occasionally become stuck due to a regression in grpcio ([30867](https://github.com/apache/beam/issues/30867)). The issue manifests frequently with Bigtable IO connector, but might also affect other GCP connectors. Fixed in 2.56.0.

2.54.0

Highlights

* [Enrichment Transform](https://s.apache.org/enrichment-transform) along with GCP BigTable handler added to Python SDK ([#30001](https://github.com/apache/beam/pull/30001)).
* Beam Java Batch pipelines run on Google Cloud Dataflow will default to the Portable (Runner V2)[https://cloud.google.com/dataflow/docs/runner-v2] starting with this version. (All other languages are already on Runner V2.)
* This change is still rolling out to the Dataflow service, see (Runner V2 documentation)[https://cloud.google.com/dataflow/docs/runner-v2] for how to enable or disable it intentionally.

I/Os

* Added support for writing to BigQuery dynamic destinations with Python's Storage Write API ([30045](https://github.com/apache/beam/pull/30045))
* Adding support for Tuples DataType in ClickHouse (Java) ([29715](https://github.com/apache/beam/pull/29715)).
* Added support for handling bad records to FileIO, TextIO, AvroIO ([29670](https://github.com/apache/beam/pull/29670)).
* Added support for handling bad records to BigtableIO ([29885](https://github.com/apache/beam/pull/29885)).

New Features / Improvements

* [Enrichment Transform](https://s.apache.org/enrichment-transform) along with GCP BigTable handler added to Python SDK ([#30001](https://github.com/apache/beam/pull/30001)).

Breaking Changes

* N/A

Deprecations

* N/A

Bugfixes

* Fixed a memory leak affecting some Go SDK since 2.46.0. ([28142](https://github.com/apache/beam/pull/28142))

Security Fixes

* N/A

Known Issues

* Some Python pipelines that run with 2.52.0-2.54.0 SDKs and use large materialized side inputs might be affected by a performance regression. To restore the prior behavior on these SDK versions, supply the `--max_cache_memory_usage_mb=0` pipeline option. ([30360](https://github.com/apache/beam/issues/30360)).
* Python pipelines that run with 2.53.0-2.54.0 SDKs and perform file operations on GCS might be affected by excess HTTP requests. This could lead to a performance regression or a permission issue. ([28398](https://github.com/apache/beam/issues/28398))
* In Python pipelines, when shutting down inactive bundle processors, shutdown logic can overaggressively hold the lock, blocking acceptance of new work. Symptoms of this issue include slowness or stuckness in long-running jobs. Fixed in 2.56.0 ([30679](https://github.com/apache/beam/pull/30679)).

2.53.0

Not secure
Highlights

* Python streaming users that use 2.47.0 and newer versions of Beam should update to version 2.53.0, which fixes a known issue: ([27330](https://github.com/apache/beam/issues/27330)).

I/Os

* TextIO now supports skipping multiple header lines (Java) ([17990](https://github.com/apache/beam/issues/17990)).
* Python GCSIO is now implemented with GCP GCS Client instead of apitools ([25676](https://github.com/apache/beam/issues/25676))
* Added support for handling bad records to KafkaIO (Java) ([29546](https://github.com/apache/beam/pull/29546))
* Add support for generating text embeddings in MLTransform for Vertex AI and Hugging Face Hub models.([29564](https://github.com/apache/beam/pull/29564))
* NATS IO connector added (Go) ([29000](https://github.com/apache/beam/issues/29000)).
* Adding support for LowCardinality (Java) ([29533](https://github.com/apache/beam/pull/29533)).

New Features / Improvements

* The Python SDK now type checks `collections.abc.Collections` types properly. Some type hints that were erroneously allowed by the SDK may now fail. ([29272](https://github.com/apache/beam/pull/29272))
* Running multi-language pipelines locally no longer requires Docker.
Instead, the same (generally auto-started) subprocess used to perform the
expansion can also be used as the cross-language worker.
* Framework for adding Error Handlers to composite transforms added in Java ([29164](https://github.com/apache/beam/pull/29164)).
* Python 3.11 images now include google-cloud-profiler ([29561](https://github.com/apache/beam/pull/29651)).

Deprecations

* Euphoria DSL is deprecated and will be removed in a future release (not before 2.56.0) ([29451](https://github.com/apache/beam/issues/29451))

Bugfixes

* (Python) Fixed sporadic crashes in streaming pipelines that affected some users of 2.47.0 and newer SDKs ([27330](https://github.com/apache/beam/issues/27330)).
* (Python) Fixed a bug that caused MLTransform to drop identical elements in the output PCollection ([29600](https://github.com/apache/beam/issues/29600)).

Security Fixes

* Upgraded to go 1.21.5 to build, fixing [CVE-2023-45285](https://security-tracker.debian.org/tracker/CVE-2023-45285) and [CVE-2023-39326](https://security-tracker.debian.org/tracker/CVE-2023-39326)

Known Issues

* Potential race condition causing NPE in DataflowExecutionStateSampler in Dataflow Java Streaming pipelines ([29987](https://github.com/apache/beam/issues/29987)).
* Some Python pipelines that run with 2.52.0-2.54.0 SDKs and use large materialized side inputs might be affected by a performance regression. To restore the prior behavior on these SDK versions, supply the `--max_cache_memory_usage_mb=0` pipeline option. ([30360](https://github.com/apache/beam/issues/30360)).
* Python pipelines that run with 2.53.0-2.54.0 SDKs and perform file operations on GCS might be affected by excess HTTP requests. This could lead to a performance regression or a permission issue. ([28398](https://github.com/apache/beam/issues/28398))
* In Python pipelines, when shutting down inactive bundle processors, shutdown logic can overaggressively hold the lock, blocking acceptance of new work. Symptoms of this issue include slowness or stuckness in long-running jobs. Fixed in 2.56.0 ([30679](https://github.com/apache/beam/pull/30679)).

Page 1 of 7

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.