Ray


2.7.1

Release Highlights



* Ray Serve:
* Added an `application` tag to the `ray_serve_num_http_error_requests` metric
* Fixed a bug where no data showed up on the `Error QPS per Application` panel in the Ray Dashboard
* RLlib:
* DreamerV3: Bug fix enabling support for continuous actions.
* Ray Train:
* Fix a bug where setting a local storage path on Windows errors ([39951](https://github.com/ray-project/ray/pull/39951))
* Ray Tune:
* Fix a broken `Trial.node_ip` property ([40028](https://github.com/ray-project/ray/pull/40028))
* Ray Core:
* Fixed a segfault when a streaming generator and actor cancellation were used together
* Fixed the autoscaler SDK accidentally initializing a Ray worker, which led to a leaked driver showing up in the dashboard.
* Added a new user guide and fixes for the vSphere cluster launcher.
* Fixed a bug where `ray start` would occasionally fail with `ValueError: acceleratorType should match v(generation)-(cores/chips)`.
* Dashboard:
* Improvements to the cluster page UI
* Fixed a bug where the overview page UI would crash
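For context on the `acceleratorType` error above: TPU accelerator types follow a "v(generation)-(cores/chips)" naming scheme such as `v2-8` or `v4-16`. The check below is a hypothetical sketch based only on that error message, not Ray's actual validation code.

```python
import re

# Hypothetical pattern mirroring the "v(generation)-(cores/chips)" format
# from the ValueError above, e.g. TPU types like "v2-8" or "v4-16".
# Illustrative only -- not Ray's actual validation logic.
ACCELERATOR_TYPE_RE = re.compile(r"^v(\d+)[a-z]*-(\d+)$")

def is_valid_accelerator_type(s: str) -> bool:
    """Return True if s looks like v<generation>-<cores/chips>."""
    return ACCELERATOR_TYPE_RE.match(s) is not None
```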


Ray Libraries


Ray Serve

🔨 Fixes:



* Fixed a bug where no data showed up on the `Error QPS per Application` panel in the Ray Dashboard


RLlib

🔨 Fixes:



* DreamerV3: Bug fix enabling support for continuous actions ([39751](https://github.com/ray-project/ray/issues/39751)).


Ray Core and Ray Clusters

🔨 Fixes:



* Fixed Ray cluster stability on a high latency environment

Thanks

Many thanks to all those who contributed to this release!

chaowanggg, allenwang28, shrekris-anyscale, GeneDer, justinvyu, can-anyscale, edoakes, architkulkarni, rkooo567, rynewang, rickyyx, sven1977

2.7.0

Release Highlights

The Ray 2.7 release brings important stability improvements and enhancements to Ray libraries, with Ray Train and Ray Serve becoming generally available. Ray 2.7 is accompanied by a GA release of KubeRay.



* Following user feedback, we are rebranding “Ray AI Runtime (AIR)” to “Ray AI Libraries”. Without reducing any of the functionality of the original Ray AI Runtime vision put forth in Ray 2.0, the underlying namespace (ray.air) is consolidated into ray.data, ray.train, and ray.tune. This change reduces the friction for new machine learning (ML) practitioners to quickly understand and leverage Ray for their production machine learning use cases.
* With this release, Ray Serve and Ray Train’s PyTorch support are becoming Generally Available -- indicating that the core APIs have been marked stable and that both libraries have undergone significant production hardening.
* In Ray Serve, we are introducing a new backwards-compatible `DeploymentHandle` API to unify various existing Handle APIs, a high-performance gRPC proxy to serve gRPC requests through Ray Serve, and various stability and usability improvements.
* In Ray Train, we are consolidating various PyTorch-based trainers into the TorchTrainer, reducing the amount of refactoring work new users need to scale existing training scripts. We are also introducing a new train.Checkpoint API, which provides a consolidated way of interacting with remote and local storage, along with various stability and usability improvements.
* In Ray Core, we’ve added initial integrations with TPUs and AWS accelerators, enabling Ray to natively detect these devices and schedule tasks/actors onto them. Ray Core also now officially supports actor task cancellation and has an experimental streaming generator that supports streaming responses to the caller.

Take a look at our [refreshed documentation](https://docs.ray.io/en/releases-2.7.0) and the [Ray 2.7 migration guide](https://docs.google.com/document/d/1J-09US8cXc-tpl2A1BpOrlHLTEDMdIJp6Ah1ifBUw7Y/view#heading=h.3eeweptnwn6p) and let us know your feedback!


Ray Libraries


Ray AIR

🏗 Architecture refactoring:



* **Ray AIR namespace**: We are sunsetting the "Ray AIR" concept and namespace (39516, 38632, 38338, 38379, 37123, 36706, 37457, 36912, 37742, 37792, 37023). The changes follow the proposal outlined in [this REP](https://github.com/ray-project/enhancements/pull/36).
* **Ray Train Preprocessors, Predictors**: We now recommend using Ray Data instead of Preprocessors (38348, 38518, 38640, 38866) and Predictors (38209).


Ray Data

🎉 New Features:



* In this release, we’ve integrated the Ray Core streaming generator API by default, which allows us to reduce memory footprint throughout the data pipeline (37736).
* Avoid unnecessary data buffering between the `Read` and `Map` operators (zero-copy fusion) (38789)
* Add `Dataset.write_images` to write images (38228)
* Add `Dataset.write_sql()` to write SQL databases (38544)
* Support sort on multiple keys (37124)
* Support reading and writing JSONL file format (37637)
* Support class constructor args for `Dataset.map()` and `flat_map()` (38606)
* Implement streamed read from Hugging Face Dataset (38432)
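As background for the new JSONL support above: the JSON Lines format is simply one JSON object per line. The snippet below round-trips it with the standard library only; it does not use Ray Data itself.

```python
import json
import os
import tempfile

# One JSON object per line -- the "JSONL" format the new reader/writer
# supports. Plain-Python sketch; Ray Data's own APIs are not used here.
rows = [{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}]
path = os.path.join(tempfile.mkdtemp(), "rows.jsonl")

# Write: serialize each record onto its own line.
with open(path, "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read: parse line by line, recovering the original records.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
```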

💫Enhancements:



* Read data with multi-threading for `FileBasedDatasource` (39493)
* Optimization to reduce `ArrowBlock` building time for blocks of size 1 (38988)
* Add `partition_filter` parameter to `read_parquet` (38479)
* Apply limit to `Dataset.take()` and related methods (38677)
* Postpone `reader.get_read_tasks` until execution (38373)
* Lazily construct metadata providers (38198)
* Support writing each block to a separate file (37986)
* Make `iter_batches` an Iterable (37881)
* Remove default limit on `Dataset.to_pandas()` (37420)
* Add `Dataset.to_dask()` parameter to toggle consistent metadata check (37163)
* Add `Datasource.on_write_start` (38298)
* Remove support for `DatasetDict` as input into `from_huggingface()` (37555)

🔨 Fixes:



* Backwards compatibility for `Preprocessor`s that have been fit in older versions (39488)
* Do not eagerly free root `RefBundles` (39085)
* Retry open files with exponential backoff (38773)
* Avoid passing `local_uri` to all non-Parquet data sources (38719)
* Add `ctx` parameter to `Datasource.write` (38688)
* Preserve block format on `map_batches` over empty blocks (38161)
* Fix args and kwargs passed to `ActorPool` `map_batches` (38110)
* Add `tif` file extension to `ImageDatasource` (38129)
* Raise error if PIL can't load image (38030)
* Allow automatic handling of string features as byte features during TFRecord serialization (37995)
* Remove unnecessary file system wrapping (38299)
* Remove `_block_udf` from `FileBasedDatasource` reads (38111)
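The "retry open files with exponential backoff" fix above follows a standard pattern: retry the operation with delays that double each attempt. Here is a minimal generic sketch; the function name and parameters are illustrative, not Ray Data's internals.

```python
import time

def call_with_backoff(fn, retries=4, base_delay=0.1, sleep=time.sleep):
    """Call fn(), retrying on OSError with exponentially growing delays:
    base_delay, 2*base_delay, 4*base_delay, ...

    Illustrative sketch of the backoff pattern only, not Ray Data's code.
    The `sleep` parameter is injectable so the delays can be observed in
    tests without actually waiting.
    """
    for attempt in range(retries):
        try:
            return fn()
        except OSError:
            if attempt == retries - 1:
                raise  # out of retries: propagate the last error
            sleep(base_delay * (2 ** attempt))
```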

📖Documentation:



* Standardize API references (37015, 36980, 37007, 36982, etc)


Ray Train

🤝 API Changes



* **Ray Train and Ray Tune Checkpoints:** Introduced a new `train.Checkpoint` class that unifies interaction with remote storage such as S3, GS, and HDFS. The changes follow the proposal in [[REP35] Consolidated persistence API for Ray Train/Tune](https://github.com/ray-project/enhancements/pull/35) (#38452, 38481, 38581, 38626, 38864, 38844)
* **Ray Train with PyTorch Lightning:** Moving away from the LightningTrainer in favor of the TorchTrainer as the recommended way of running distributed PyTorch Lightning. The changes follow the proposal outlined in [[REP37] [Train] Unify Torch based Trainers on the TorchTrainer API](https://github.com/ray-project/enhancements/pull/37) (#37989)
* **Ray Train with Hugging Face Transformers/Accelerate:** Moving away from the TransformersTrainer/AccelerateTrainer in favor of the TorchTrainer as the recommended way of running distributed Hugging Face Transformers and Accelerate. The changes follow the proposal outlined in [[REP37] [Train] Unify Torch based Trainers on the TorchTrainer API](https://github.com/ray-project/enhancements/pull/37) (#38083, 38295)
* Deprecated `preprocessor` arg to `Trainer` (38640)
* Removed deprecated `Result.log_dir` (38794)

💫Enhancements:



* Various improvements and fixes for the console output of Ray Train and Tune (37572, 37571, 37570, 37569, 37531, 36993)
* Raise actionable error message for missing dependencies (38497)
* Use posix paths throughout library code (38319)
* Group consecutive workers by IP (38490)
* Split all Ray Datasets by default (38694)
* Add static Trainer methods for getting tree-based models (38344)
* Don't set rank-specific local directories for Train workers (38007)

🔨 Fixes:



* Fix trainer restoration from S3 (38251)

🏗 Architecture refactoring:



* Updated internal usage of the new Checkpoint API (38853, 38804, 38697, 38695, 38757, 38648, 38598, 38617, 38554, 38586, 38523, 38456, 38507, 38491, 38382, 38355, 38284, 38128, 38143, 38227, 38141, 38057, 38104, 37888, 37991, 37962, 37925, 37906, 37690, 37543, 37475, 37142, 38855, 38807, 38818, 39515, 39468, 39368, 39195, 39105, 38563, 38770, 38759, 38767, 38715, 38709, 38478, 38550, 37909, 37613, 38876, 38868, 38736, 38871, 38820, 38457)

📖Documentation:



* Restructured the Ray Train documentation to make it easier to find relevant content (37892, 38287, 38417, 38359)
* Improved examples, references, and navigation items (38049, 38084, 38108, 37921, 38391, 38519, 38542, 38541, 38513, 39510, 37588, 37295, 38600, 38582, 38276, 38686, 38537, 38237, 37016)
* Removed outdated examples (38682, 38696, 38656, 38374, 38377, 38441, 37673, 37657, 37067)


Ray Tune

🤝 API Changes



* **Ray Train and Ray Tune Checkpoints:** Introduced a new `train.Checkpoint` class that unifies interaction with remote storage such as S3, GS, and HDFS. The changes follow the proposal in [[REP35] Consolidated persistence API for Ray Train/Tune](https://github.com/ray-project/enhancements/pull/35) (#38452, 38481, 38581, 38626, 38864, 38844)
* Removed deprecated `Result.log_dir` (38794)

💫Enhancements:



* Various improvements and fixes for the console output of Ray Train and Tune (37572, 37571, 37570, 37569, 37531, 36993)
* Raise actionable error message for missing dependencies (38497)
* Use posix paths throughout library code (38319)
* Improved the PyTorchLightning integration (38883, 37989, 37387, 37400)
* Improved the XGBoost/LightGBM integrations (38558, 38828)

🔨 Fixes:



* Fix hyperband r calculation and stopping (39157)
* Replace deprecated np.bool8 (38495)
* Miscellaneous refactors and fixes (38165, 37506, 37181, 37173)
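As background for the Hyperband `r` fix above: each Hyperband bracket pairs a number of trials n with an initial per-trial resource allocation r. A standard calculation, following the published Hyperband formulas rather than Ray Tune's internal code, looks like this:

```python
import math

def hyperband_brackets(max_t: int, eta: int = 3):
    """Return (n_trials, initial_resource) for each Hyperband bracket.

    Standard Hyperband bracket math (illustrative, not Ray Tune's
    implementation): s_max is the largest s with eta**s <= max_t; each
    bracket s gets n = ceil((s_max + 1) * eta**s / (s + 1)) trials with
    r = max_t / eta**s initial resource per trial.
    """
    # Compute s_max with integer arithmetic to avoid float-log rounding.
    s_max = 0
    while eta ** (s_max + 1) <= max_t:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta**s / (s + 1))
        r = max_t / eta**s
        brackets.append((n, r))
    return brackets
```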

🏗 Architecture refactoring:



* Updated internal usages of the new Checkpoint API (38853, 38804, 38697, 38695, 38757, 38648, 38598, 38617, 38554, 38586, 38523, 38456, 38507, 38491, 38382, 38355, 38284, 38128, 38143, 38227, 38141, 38057, 38104, 37888, 37991, 37962, 37925, 37906, 37690, 37543, 37475, 37142, 38855, 38807, 38818, 39515, 39468, 39368, 39195, 39105, 38563, 38770, 38759, 38767, 38715, 38709, 38478, 38550, 37909, 37613, 38876, 38868, 38736, 38871, 38820, 38457)
* Removed legacy TrialRunner/Executor (37927)


Ray Serve

🎉 New Features:



* Added `keep_alive_timeout_s` to the Serve config file to allow users to configure how long the HTTP proxy keeps idle connections alive when no requests are ongoing.
* Added a gRPC proxy to serve gRPC requests through Ray Serve. It comes with feature parity with HTTP while offering better performance, and it replaces the previous experimental gRPC direct ingress.
* Ray 2.7 introduces a new `DeploymentHandle` API that will replace the existing `RayServeHandle` and `RayServeSyncHandle` APIs in a future release. You are encouraged to migrate to the new API to avoid breakages in the future. To opt in, either use `handle.options(use_new_handle_api=True)` or set the global environment variable `export RAY_SERVE_ENABLE_NEW_HANDLE_API=1`. See https://docs.ray.io/en/latest/serve/model_composition.html for more details.
* Added a new API `get_app_handle` that gets a handle used to send requests to an application. The API uses the new `DeploymentHandle` API.
* Added a new developer API `get_deployment_handle` that gets a handle that can be used to send requests to any deployment in any application.
* Added replica placement group support.
* Added a new API `serve.status` which can be used to get the status of proxies and Serve applications (and their deployments and replicas). This is the pythonic equivalent of the CLI `serve status`.
* A `--reload` option has been added to the `serve run` CLI.
* Support the `X-Request-ID` HTTP header

💫Enhancements:



* Downstream handlers will now be canceled when the HTTP client disconnects or an end-to-end timeout occurs.
* Ray Serve is now “generally available,” so the core APIs have been marked stable.
* `serve.start` and `serve.run` have a few small changes and deprecations in preparation for this, see [https://docs.ray.io/en/latest/serve/api/index.html](https://docs.ray.io/en/latest/serve/api/index.html) for details.
* Added a new metric (`ray_serve_num_ongoing_http_requests`) to track the number of ongoing requests in each proxy
* Added the `RAY_SERVE_MULTIPLEXED_MODEL_ID_MATCHING_TIMEOUT_S` flag to control how long to wait for model matching.
* Reduced the multiplexed model ID information publish interval.
* Added multiplexed model metrics to the dashboard
* Added metrics to track controller restarts and control loop progress
* [https://github.com/ray-project/ray/pull/38177](https://github.com/ray-project/ray/pull/38177)
* [https://github.com/ray-project/ray/pull/38000](https://github.com/ray-project/ray/pull/38000)
* Various stability, flexibility, and performance enhancements to Ray Serve’s autoscaling.
* [https://github.com/ray-project/ray/pull/38107](https://github.com/ray-project/ray/pull/38107)
* [https://github.com/ray-project/ray/pull/38034](https://github.com/ray-project/ray/pull/38034)
* [https://github.com/ray-project/ray/pull/38267](https://github.com/ray-project/ray/pull/38267)
* [https://github.com/ray-project/ray/pull/38349](https://github.com/ray-project/ray/pull/38349)
* [https://github.com/ray-project/ray/pull/38351](https://github.com/ray-project/ray/pull/38351)

🔨 Fixes:



* Fixed a memory leak in Serve components by upgrading gRPC: [https://github.com/ray-project/ray/issues/38591](https://github.com/ray-project/ray/issues/38591).
* Fixed a memory leak due to `asyncio.Event`s not being removed in the long poll host: [https://github.com/ray-project/ray/pull/38516](https://github.com/ray-project/ray/pull/38516).
* Fixed a bug where bound deployments could not be passed within custom objects: [https://github.com/ray-project/ray/issues/38809](https://github.com/ray-project/ray/issues/38809).
* Fixed a bug where all replica handles were unnecessarily broadcasted to all proxies every minute: [https://github.com/ray-project/ray/pull/38539](https://github.com/ray-project/ray/pull/38539).
* Fixed a bug where `ray_serve_deployment_queued_queries` wouldn’t decrement when clients disconnected: [https://github.com/ray-project/ray/pull/37965](https://github.com/ray-project/ray/pull/37965).

📖Documentation:



* Added docs for how to use `keep_alive_timeout_s` in the Serve config file.
* Added usage and examples for serving gRPC requests through Serve’s gRPC proxy.
* Added example for passing deployment handle responses by reference.
* Added a Ray Serve Autoscaling guide to the Ray Serve docs that goes over basic configurations and autoscaling examples. Also added an Advanced Ray Serve Autoscaling guide that goes over more advanced configurations and autoscaling examples.
* Added docs explaining how to debug memory leaks in Serve.
* Added docs that explain how Serve cancels disconnected requests and how to handle those disconnections.


RLlib

🎉 New Features:



* In Ray RLlib, we have implemented Google’s new [DreamerV3](https://github.com/ray-project/ray/tree/master/rllib/algorithms/dreamerv3), a sample-efficient, model-based, and hyperparameter hassle-free algorithm. It solves a wide variety of challenging reinforcement learning environments out-of-the-box (e.g. the MineRL diamond challenge), for arbitrary observation- and action-spaces as well as dense and sparse reward functions.

💫Enhancements:



* Added support for Gymnasium 0.28.1 ([35698](https://github.com/ray-project/ray/pull/35698))
* Dreamer V3 tuned examples and support for “XL” Dreamer models ([38461](https://github.com/ray-project/ray/pull/38461))
* Added an action masking example for RL Modules ([38095](https://github.com/ray-project/ray/pull/38095))

🔨 Fixes:



* Multiple fixes to DreamerV3 ([37979](https://github.com/ray-project/ray/pull/37979)) ([#38259](https://github.com/ray-project/ray/pull/38259)) ([#38461](https://github.com/ray-project/ray/pull/38461)) ([#38981](https://github.com/ray-project/ray/pull/38981))
* Fixed TorchBinaryAutoregressiveDistribution.sampled_action_logp() returning probs not log probs. ([37240](https://github.com/ray-project/ray/pull/37240))
* Fix a bug in Multi-Categorical distribution. It should use logp and not log_p. ([36814](https://github.com/ray-project/ray/pull/36814))
* Index tensors in slate epsilon greedy properly so SlateQ does not fail on multiple GPUs ([37481](https://github.com/ray-project/ray/pull/37481))
* Removed excessive deprecation warnings in exploration related files ([37404](https://github.com/ray-project/ray/pull/37404))
* Fixed missing agent index in policy input dict on environment reset ([37544](https://github.com/ray-project/ray/pull/37544))

📖Documentation:



* Added docs for DreamerV3 ([37978](https://github.com/ray-project/ray/pull/37978))
* Added docs on torch.compile usage ([37252](https://github.com/ray-project/ray/pull/37252))
* Added docs for the Learner API ([37729](https://github.com/ray-project/ray/pull/37729))
* Improvements to Catalogs and RL Modules docs + Catalogs improvements ([37245](https://github.com/ray-project/ray/pull/37245))
* Extended our metrics and callbacks example to showcase how to do custom summarisation on custom metrics ([37292](https://github.com/ray-project/ray/pull/37292))


Ray Core and Ray Clusters


Ray Core

🎉 New Features:
* [Actor task cancellation](https://docs.ray.io/en/master/ray-core/actors.html#cancelling-actor-tasks) is officially supported.
* The experimental streaming generator is now available. Yielded outputs are sent to the caller before the task finishes, overcoming the [limitation of the `num_returns="dynamic"` generator](https://docs.ray.io/en/latest/ray-core/tasks/generators.html#limitations). The API can be used by specifying `num_returns="streaming"`. It is already used by Ray Data and Ray Serve to support streaming use cases. [See the test script](https://github.com/ray-project/ray/blob/0d6bc79bbba400e91346a021279501e05940b51e/python/ray/tests/test_streaming_generator.py#L123) to learn how to use the API. The documentation will be available in a few days.

💫Enhancements:
* The minimal Ray installation (`pip install ray`) no longer requires the Python grpcio dependency.
* [Breaking change] `ray job submit` now exits with `1` if the job fails instead of `0`. To get the old behavior back, you may use `ray job submit ... || true`. ([38390](https://github.com/ray-project/ray/pull/38390))
* [Breaking change] `get_assigned_resources` in placement groups now returns the original resource names instead of formatted names (37421)
* [Breaking change] Every env var specified via `${ENV_VAR}` can now be replaced. Previous versions only supported a limited number of env vars. (36187)
* [Java] Update Guava package (38424)
* [Java] Update Jackson Databind XML Parsing (38525)
* [Spark] Allow specifying CPU / GPU / Memory resources for head node of Ray cluster on spark (38056)
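The `${ENV_VAR}` substitution mentioned above can be illustrated with a small stand-alone expander. This is a sketch of the described behavior, not Ray's implementation; the function name is hypothetical.

```python
import os
import re

def expand_env_vars(s: str, env=None) -> str:
    """Replace every ${VAR} placeholder in s with its value from env,
    leaving unknown placeholders untouched.

    Illustrative sketch of the substitution behavior described above,
    not Ray's implementation.
    """
    env = os.environ if env is None else env
    return re.sub(
        r"\$\{(\w+)\}",
        # m.group(0) is the full "${VAR}" text, kept when VAR is unknown.
        lambda m: str(env.get(m.group(1), m.group(0))),
        s,
    )
```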

🔨 Fixes:
* [Core] The internal gRPC version is upgraded from 1.46.6 to 1.50.2, which fixes a memory leak issue
* [Core] Bind jemalloc to the raylet and GCS (38644) to fix a memory fragmentation issue
* [Core] Previously, when Ray was started with `ray start --node-ip-address=...`, the driver also had to specify `ray.init(_node_ip_address=...)`. Now Ray finds the node IP address automatically. (37644)
* [Core] Child processes of workers are cleaned up automatically when a raylet dies (38439)
* [Core] Fix the issue where many threads are created when using async actors (37949)
* [Core] Fixed a bug where tracing did not work when an actor/task was defined prior to calling `ray.init`: [https://github.com/ray-project/ray/issues/26019](https://github.com/ray-project/ray/issues/26019)
* Various other bug fixes
* [Core] Loosen the check on object release (39570)
* [Core][agent] Fix the race condition where the worker process terminated during the `get_all_workers` call (37953)
* [Core] Fix PG leakage caused by GCS restart when a PG has not been successfully removed after the job died (35773)
* [Core] Fix an `internal_kv` del API bug in client proxy mode (37031)
* [Core] Pass logs through if sphinx-doctest is running (36306)
* [Core][dashboard] Make intentional Ray system exits from worker exits not count as task failures (38624)
* [Core][dashboard] Add worker pid to task info (36941)
* [Core] Use 1 thread for all fibers for an actor scheduling queue. (37949)
* [runtime env] Fix Ray hanging when a nonexistent conda environment is specified (28105) (34956)

Ray Clusters

💫Enhancements:



* New Cluster Launcher for vSphere [37815](https://github.com/ray-project/ray/pull/37815)
* TPU pod support for cluster launcher [37934](https://github.com/ray-project/ray/pull/37934)

📖Documentation:



* The KubeRay documentation has been moved to [https://docs.ray.io/en/latest/cluster/kubernetes/index.html](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) from its old location at [https://ray-project.github.io/kuberay/](https://ray-project.github.io/kuberay/).
* New guide: GKE Ingress on KubeRay ([39073](https://github.com/ray-project/ray/pull/39073))
* New tutorial: Cloud storage from GKE on KubeRay [38858](https://github.com/ray-project/ray/pull/38858)
* New tutorial: Batch inference tutorial using KubeRay RayJob CR [38857](https://github.com/ray-project/ray/pull/38857)
* New benchmarks for RayService custom resource on KubeRay [38647](https://github.com/ray-project/ray/pull/38647)
* New tutorial: Text summarizer using NLP with RayService [38647](https://github.com/ray-project/ray/pull/38647)

Thanks

Many thanks to all those who contributed to this release!

simran-2797, can-anyscale, akshay-anyscale, c21, EdwardCuiPeacock, rynewang, volks73, sven1977, alexeykudinkin, mattip, Rohan138, larrylian, DavidYoonsik, scv119, alpozcan, JalinWang, peterghaddad, rkooo567, avnishn, JoshKarpel, tekumara, zcin, jiwq, nikosavola, seokjin1013, shrekris-anyscale, ericl, yuxiaoba, vymao, architkulkarni, rickyyx, bveeramani, SongGuyang, jjyao, sihanwang41, kevin85421, ArturNiederfahrenhorst, justinvyu, pleaseupgradegrpcio, aslonnie, kukushking, 94929, jrosti, MattiasDC, edoakes, PRESIDENT810, cadedaniel, ddelange, alanwguo, noahjax, matthewdeng, pcmoritz, richardliaw, vitsai, Michaelvll, tanmaychimurkar, smiraldr, wfangchi, amogkam, crypdick, WeichenXu123, darthhexx, angelinalg, chaowanggg, GeneDer, xwjiang2010, peytondmurray, z4y1b2, scottsun94, chappidim, jovany-wang, jaidisido, krfricke, woshiyyya, Shubhamurkade, ijrsvt, scottjlee, kouroshHakha, allenwang28, raulchen, stephanie-wang, iycheng

2.6.3

The Ray 2.6.3 patch release contains fixes for Ray Serve and Ray Core streaming generators.

Ray Core
🔨 Fixes:
* [Core][Streaming Generator] Fix memory leak from the end of the object stream (38152) (38206)

Ray Serve
🔨 Fixes:
* [Serve] Fix `serve run` help message (37859) (38018)
* [Serve] Decrement `ray_serve_deployment_queued_queries` when client disconnects (37965) (38020)


RLlib
📖 Documentation:
* [RLlib][docs] Learner API Docs (37729) (38137)

2.6.2

The Ray 2.6.2 patch release contains a critical fix for Ray's logging setup, as well as fixes for Ray Serve, Ray Data, and Ray Job.

Ray Core
🔨 Fixes:
* [Core] Pass logs through if sphinx-doctest is running (36306) (37879)
* [cluster-launcher] Pick GCP cluster launcher tests and fix (37797)


Ray Serve
🔨 Fixes:
* [Serve] Apply `request_timeout_s` from Serve config to the cluster (37884) (37903)

Ray Air
🔨 Fixes:
* [air] fix pyarrow lazy import (37670) (37883)

2.6.1

The Ray 2.6.1 patch release contains a critical fix for the cluster launcher, a compatibility update for the Ray Serve protobuf definition with Python 3.11, and doc improvements.

⚠️ The cluster launcher in Ray 2.6.0 fails to start multi-node clusters. Please update to 2.6.1 if you plan to use the 2.6.0 cluster launcher.


Ray Core
🔨 Fixes:
* [core][autoscaler] Fix env variable overwrite not able to be used if the command itself uses the env (37675)


Ray Serve
🔨 Fixes:
* [serve] Cherry-pick Serve enum to_proto fixes for Python 3.11 (37660)

Ray Air
📖Documentation:
* [air][doc] Update docs to reflect head node syncing deprecation (37475)

2.6.0

Release Highlights

* **Serve**: Better streaming support -- in this release, support for HTTP streaming responses and WebSockets is on by default. Also, `serve.batch`-decorated methods can stream responses.
* **Train and Tune**: Users are now expected to provide cloud storage or NFS path for distributed training or tuning jobs instead of a local path. This means that results written to different worker machines will not be directly synced to the head node. Instead, this will raise an error telling you to switch to one of the recommended alternatives: cloud storage or NFS. Please see https://github.com/ray-project/ray/issues/37177 if you have questions.
* **Data**: We are introducing a new streaming integration of Ray Data and Ray Train. This allows streaming data ingestion for model training, and enables per-epoch data preprocessing. The DatasetPipeline API is also being deprecated in favor of Dataset with streaming execution.
* **RLlib**: Public alpha release for the new multi-GPU Learner API, which is less complex and more powerful compared to our previous solution ([blogpost](https://www.anyscale.com/blog/introducing-rllib-multi-gpu-stack-for-cost-efficient-scalable-multi-gpu-rl)). This is used by the PPO algorithm by default.

Ray Libraries

Ray AIR

🎉 **New Features**:
* Added support for restoring Results from local trial directories. (35406)

💫 **Enhancements**:
* [Train/Tune] Disable Train/Tune syncing to head node (37142)
* [Train/Tune] Introduce new console output progress reporter for Train and Tune (35389, 36154, 36072, 35770, 36764, 36765, 36156, 35977)
* [Train/Data] New Train<>Data streaming integration (35236, 37215, 37383)

🔨 **Fixes**:
* Pass on KMS-related kwargs for s3fs (35938)
* Fix infinite recursion in log redirection (36644)
* Remove temporary checkpoint directories after restore (37173)
* Don't track removed actors that haven't been started (36020)
* Fix bug in execution for actor re-use (36951)
* Cancel `pg.ready()` task for pending trials that end up reusing an actor (35748)
* Add case for `Dict[str, np.array]` batches in `DummyTrainer` read bytes calculation (36484)

📖 **Documentation**:
* Remove experimental features page, add github issue instead (36950)
* Fix batch format in `dreambooth` example (37102)
* Fix Checkpoint.from_checkpoint docstring (35793)

🏗 **Architecture refactoring**:
* Remove deprecated mlflow and wandb integrations (36860, 36899)
* Move constants from tune/results.py to air/constants.py (35404)
* Clean up a few checkpoint related things. (35321)

Ray Data

🎉 **New Features**:
* New streaming integration of Ray Data and Ray Train. This allows streaming data ingestion for model training, and enables per-epoch data preprocessing. (35236)
* Enable execution optimizer by default (36294, 35648, 35621, 35952)
* Deprecate DatasetPipeline (35753)
* Add `Dataset.unique()` (36655, 36802)
* Add option for parallelizing post-collation data batch operations in `DataIterator.iter_batches()` (36842) (37260)
* Enforce strict mode batch format for `DataIterator.iter_batches()` (36686)
* Remove `ray.data.range_arrow()` (35756)

💫 **Enhancements**:
* Optimize block prefetching (35568)
* Enable isort for data directory (35836)
* Skip writing a file for an empty block in `Dataset.write_datasource()` (36134)
* Remove shutdown logging from StreamingExecutor (36408)
* Spread map task stages by default for arg size <50MB (36290)
* Read->SplitBlocks to ensure requested read parallelism is always met (36352)
* Support partial execution in `Dataset.schema()` with new execution plan optimizer (36740)
* Propagate iter stats for `Dataset.streaming_split()` (36908)
* Cache the computed schema to avoid re-executing (37103)

🔨 **Fixes**:
* Support sub-progress bars on AllToAllOperators with optimizer enabled (34997)
* Fix DataContext not propagated properly for `Dataset.streaming_split()` operator
* Fix edge case in empty bundles with `Dataset.streaming_split()` (36039)
* Apply Arrow table indices mapping on HuggingFace Dataset prior to reading into Ray Data (36141)
* Fix issues with combining use of `Dataset.materialize()` and `Dataset.streaming_split()` (36092)
* Fix quadratic slowdown when locally shuffling tensor extension types (36102)
* Make sure progress bars always finish at 100% (36679)
* Fix wrong output order of `Dataset.streaming_split()` (36919)
* Fix the issue that StreamingExecutor is not shutdown when the iterator is not fully consumed (36933)
* Calculate stage execution time in StageStatsSummary from `BlockMetadata` (37119)

📖 **Documentation**:
* Standardize Data API ref (36432, 36937)
* Docs for working with PyTorch (36880)
* Split "Consuming data" guide (36121)
* Revise "Loading data" (36144)
* Consolidate Data user guides (36439)

🏗 **Architecture refactoring**:
* Remove simple blocks representation (36477)

Ray Train

🎉 **New Features**:
* LightningTrainer now supports DeepSpeedStrategy (36165)

💫 **Enhancements**:
* Unify Lightning and AIR CheckpointConfig (36368)
* Add support for custom pipeline class in TransformersPredictor (36494)

🔨 **Fixes**:
* Fix Deepspeed device ranks check in Lightning 2.0.5 (37387)
* Clear stale lazy checkpointing markers on all workers. (36291)

📖 **Documentation**:
* Migrate Ray Train `code-block` to `testcode`. (36483)

🏗 Architecture refactoring:
* Deprecate `BatchPredictor` (36947, 37178)

Ray Tune

🔨 **Fixes**:

* Optuna: Update distributions to use new APIs (36704)
* BOHB: Fix nested bracket processing (36568)
* Hyperband: Fix scheduler raising an error for good `PENDING` trials (35338)
* Fix param space placeholder injection for numpy/pandas objects (35763)
* Fix result restoration with Ray Client (35742)
* Fix trial runner/controller whitelist attributes (35769)

📖 **Documentation**:
* Remove missing example from Tune "Other examples" (36691)

🏗 **Architecture refactoring**:
* Remove `tune/automl` (35557)
* Remove hard-deprecated modules from structure refactor (36984)
* Remove deprecated mlflow and wandb integrations (36860, 36899)
* Move constants from tune/results.py to air/constants.py (35404)
* Deprecate redundant syncing related parameters (36900)
* Deprecate legacy modules in `ray.tune.integration` (35160)

Ray Serve

💫 **Enhancements**:
* Support for HTTP streaming response and WebSockets is now on by default.
* `serve.batch`-decorated methods can stream responses.
* `serve.batch` settings can be reconfigured dynamically.
* Ray Serve now uses “power of two random choices” routing. This improves enforcement of `max_concurrent_queries` and reduces tail latencies under load.
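The “power of two random choices” scheme picks two replicas uniformly at random and routes the request to the less-loaded one, which keeps load balanced with very little coordination. The sketch below shows the idea; the replica names and queue bookkeeping are illustrative, not Serve's router internals.

```python
import random

def choose_replica(queue_lens: dict, rng=random) -> str:
    """Power-of-two-choices routing: sample two distinct replicas
    uniformly at random, then send the request to the one with the
    shorter queue.

    Sketch of the idea only, not Ray Serve's router implementation.
    queue_lens maps replica name -> current queue length.
    """
    # sorted() gives a stable sequence of replica names to sample from.
    a, b = rng.sample(sorted(queue_lens), 2)
    return a if queue_lens[a] <= queue_lens[b] else b
```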

🔨 **Fixes**:
* Fixed a bug that previously prevented using a custom module named “utils”.
* Fixed a Serve downscaling issue by adding a new draining state to the HTTP proxy. HTTP proxies now stop taking new requests when there are no replicas on the node, which prevents interruption of ongoing requests when the node is downscaled. This also enables downscaling when requests use Ray’s object store, which previously blocked downscaling of the node.
* Fixed non-atomic shutdown logic. Serve shutdown now runs in the background, does not require the client to wait for completion, and is not interrupted when the client is force-killed.

RLlib

🎉 **New Features**:
* Public alpha release for the new multi-GPU Learner API, which is less complex and more powerful than the old training stack ([blogpost](https://www.anyscale.com/blog/introducing-rllib-multi-gpu-stack-for-cost-efficient-scalable-multi-gpu-rl)). This is used by the PPO algorithm by default.
* Added RNN support in the new RLModule API
* Added a TF version of DreamerV3 ([link](https://github.com/ray-project/ray/tree/master/rllib/algorithms/dreamerv3)). The comprehensive results will be published soon.
* Added support for the torch 2.x compile method when sampling from the environment

💫 **Enhancements**:
* Added an example of pretraining with BC and then fine-tuning with PPO ([example](https://github.com/ray-project/ray/blob/master/rllib/examples/learner/train_w_bc_finetune_w_ppo.py))
* Added RLlib deprecation notices (algorithm/, evaluation/, execution/, models/jax/) ([36826](https://github.com/ray-project/ray/pull/36826))
* Enabled `eager_tracing=True` by default ([36556](https://github.com/ray-project/ray/pull/36556))
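With eager tracing now on by default for the TF2 framework, an explicit opt-in is no longer needed. The fragment below is a hedged sketch assuming RLlib's `PPOConfig.framework()` API in Ray 2.7; the flag is passed explicitly only for illustration:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# eager_tracing=True is now the default for the tf2 framework;
# passing it explicitly here is only for clarity.
config = PPOConfig().framework("tf2", eager_tracing=True)
```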

🔨 **Fixes**:
* Fixed a bug in the multi-categorical distribution: it should use `logp`, not `log_p` ([36814](https://github.com/ray-project/ray/pull/36814))
* Fixed an LSTM + Connector bug where `StateBuffer` restarted states on every `in_eval()` call ([36774](https://github.com/ray-project/ray/pull/36774))

🏗 **Architecture refactoring**:
* Multi-GPU Learner API

Ray Core

🎉 **New Features**:
* [Core][Streaming Generator] Cpp interfaces and implementation (35291)
* [Core][Streaming Generator] Streaming Generator. Support Core worker APIs + cython generator interface. (35324)
* [Core][Streaming Generator] Streaming Generator. E2e integration (35325)
* [Core][Streaming Generator] Support async actor and async generator interface. (35584)
* [Core][Streaming Generator] Streaming Generator. Support the basic retry/lineage reconstruction (35768)
* [Core][Streaming Generator] Allow to raise an exception to avoid check failures. (35766)
* [Core][Streaming Generator] Fix a reference leak when a stream is deleted with out of order writes. (35591)
* [Core][Streaming Generator] Fix a reference leak when pinning requests are received after refs are consumed. (35712)
* [Core][Streaming Generator] Handle out of order report when retry (36069)
* [Core][Streaming Generator] Make it compatible with wait (36071)
* [Core][Streaming Generator] Remove busy waiting (36070)
* [Core][Autoscaler v2] add test for node provider (35593)
* [Core][Autoscaler v2] add unit tests for NodeProviderConfig (35590)
* [Core][Autoscaler v2] test ray-installer (35875)
* [Core][Autoscaler v2] fix too many values to unpack (expected 2) bug (36231)
* [Core][Autoscaler v2] Add idle time information to Autoscaler endpoint. (36918)
* [Core][Autoscaler v2] Cherry picks change to Autoscaler interface (37407)
* [Core][Autoscaler v2] Fix idle time duration when node resource is not updated periodically (37121) (37175)
* [Core][Autoscaler v2] Fix pg id serialization with hex rather than binary for cluster state reporting 37132 (37176)
* [Core][Autoscaler v2] GCS Autoscaler V2: Add instance id to ray [3/x] (35649)
* [Core][Autoscaler v2] GCS Autoscaler V2: Add node type name to ray (36714)
* [Core][Autoscaler v2] GCS Autoscaler V2: Add placement group's gang resource requests handling [4/x] (35970)
* [Core][Autoscaler v2] GCS Autoscaler V2: Handle ReportAutoscalingState (36768)
* [Core][Autoscaler v2] GCS Autoscaler V2: Interface [1/x] (35549)
* [Core][Autoscaler v2] GCS Autoscaler V2: Node states and resource requests [2/x] (35596)
* [Core][Autoscaler v2] GCS Autoscaler V2: Support Autoscaler.sdk.request_resources [5/x] (35846)
* [Core][Autoscaler v2] Ray status interface [1/x] (36894)
* [Core][Autoscaler v2] Remove usage of grpcio from Autoscaler SDK (36967)
* [Core][Autoscaler v2] Update Autoscaler proto for default enum value (36962)
* [Core][Autoscaler v2] Update Autoscaler.proto / instance_manager.proto dependency (36116)
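Several of the streaming-generator items above deal with out-of-order writes and reports. The invariant is that consumers see items in index order no matter how reports arrive; a minimal pure-Python sketch of that reordering (illustrative only, not Ray's C++ implementation):

```python
def deliver_in_order(reports):
    """Yield values in index order given (index, value) reports that
    may arrive out of order -- the invariant the streaming generator
    keeps even under retries. Illustrative sketch, not Ray's code."""
    buffered, next_index = {}, 0
    for index, value in reports:
        buffered[index] = value
        # Flush every consecutive item that is now available.
        while next_index in buffered:
            yield buffered.pop(next_index)
            next_index += 1

# Reports arrive as indices 1, 0, 2 but are delivered as 0, 1, 2.
print(list(deliver_in_order([(1, "b"), (0, "a"), (2, "c")])))  # ['a', 'b', 'c']
```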

💫 **Enhancements**:
* [Core] Make some grpcio imports lazy (35705)
* [Core] Only instantiate gcs channels on driver (36389)
* [Core] Port GcsSubscriber to Cython (35094)
* [Core] Print out warning every 1s when sched_cls_id is greater than 100 (35629)
* [Core] Remove attrs dependency (36270)
* [Core] Remove dataclasses requirement (36218)
* [Core] Remove grpcio from Ray minimal dashboard (36636)
* [Core] Remove grpcio import from usage_lib (36542)
* [Core] remove import thread (36293)
* [Core] Remove Python grpcio from check_health (36304)
* [Core] Retrieve the token from GCS server [4/n] (37003) (37294)
* [Core] Retry failed redis request (35249)
* [Core] Sending ReportWorkerFailure after the process died. (35320)
* [Core] Serialize auto-inits (36127)
* [Core] Support auto-init ray for get_runtime_context() (35903)
* [Core] Suppress harmless ObjectRefStreamEndOfStreamError when using asyncio (37062) (37200)
* [Core] Unpin grpcio and make Ray run on mac M1 out of the box (35932)
* [Core] Add a better error message for health checking network failures (36957) (37366)
* [Core] Add ClusterID to ClientCallManager [2/n] (36526)
* [Core] Add ClusterID token to GCS server [3/n] (36535)
* [Core] Add ClusterID token to GRPC server [1/n] (36517)
* [Core] Add extra metrics for workers (36973)
* [Core] Add get_worker_id() to runtime context (35967)
* [Core] Add logs for Redis leader discovery for observability. (36108)
* [Core] Add metrics for object size distribution in object store (37005) (37110)
* [Core] Add resource idle time to resource report from node. (36670)
* [Core] Check that temp_dir must be absolute path. (36431)
* [Core] Clear CPU affinity for worker processes (36816)
* [Core] Delete object spilling dead code path. (36286)
* [Core] Don't drop rpc status in favor of reply status (35530)
* [Core] Feature flag actor task logs with off by default (35921)
* [Core] Graceful handling of returning bundles when node is removed (34726)
* [Core] Graceful shutdown in TaskEventBuffer destructor (35857)
* [Core] Guarantee the ordering of put ActorTaskSpecTable and ActorTable (35683)
* [Core] Introduce fail_on_unavailable option for hard NodeAffinitySchedulingStrategy (36718)
* [Core] Make `import ray` work without grpcio (35737)
* [Core][dashboard] Add task name in task log magic token (35377)
* [Core][deprecate run_function_on_all_workers 3/n] delete run_function_on_all_workers (30895)
* [Core][devex] Move ray/util build targets to separate build files (36598)
* [Core][logging][ipython] Fix log buffering when consecutive runs within ray log dedup window (37134) (37174)
* [Core][Logging] Switch worker_setup_hook to worker_process_setup_hook (37247) (37463)
* [Core][Metrics] Use Autoscaler-emitted metrics for pending/active/failed nodes. (35884)
* [Core][state] Record file offsets instead of logging magic token to track task log (35572)
* [CI] [Minimal install] Check python version in minimal install (36887)
* [CI] second try of fixing vllm example in CI 36712
* [CI] skip vllm_example 36665
* [CI][Core] Add more visibility into state api stress test (36465)
* [CI][Doc] Add windows 3.11 wheel support in doc and CI 37297 (37302)
* [CI][py3.11] Build python wheels on mac os for 3.11 (36185)
* [CI][python3.11] windows 3.11 wheel build
* [CI][release] Add mac 3.11 wheels to release scripts (36396)
* [CI] Update state api scale test (35543)
* [Release Test] Fix dask on ray 1tb sort failure. (36905)
* [Release Test] Make the cluster name unique for cluster launcher release tests (35801)
* [Test] Deflake GCS fault tolerance test on macOS (36471)
* [Test] Deflake pubsub integration_test (36284)
* [Test] Change instance type to r5.8xlarge for dask_on_ray_1tb_sort (37321) (37409)
* [Test] Move generators test to large (35747)
* [Test][Core] Handled the case where memories is empty for dashboard test (35979)
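Items like “Retry failed redis request” above follow the standard retry-with-exponential-backoff pattern. A generic sketch (the function name and parameters are illustrative, not Ray's GCS client API):

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying transient ConnectionErrors with exponential
    backoff. Generic sketch of the pattern, not Ray's actual client."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Usage: a call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"
print(call_with_retries(flaky))  # ok
```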

🔨 **Fixes**:
* [Core] Fix GCS FD usage increase regression. (35624)
* [Core] Fix issues with worker churn in WorkerPool (36766)
* [Core] Fix proctitle for generator tasks (36928)
* [Core] Fix ray.timeline() (36676)
* [Core] Fix raylet memory leak in the wrong setup. (35647)
* [Core] Fix test_no_worker_child_process_leaks (35840)
* [Core] Fix the GCS crash when connecting to a redis cluster with TLS (36916)
* [Core] Fix the race condition where grpc requests are handled while c… (37301)
* [Core] Fix the recursion error when async actor has lots of deserialization. (35494)
* [Core] Fix the segfault from Opencensus upon shutdown (36906) (37311)
* [Core] Fix the unnecessary logs (36931) (37313)
* [Core] Add a special head node resource and use it to pin the serve controller to the head node (35929)
* [Core] Add debug log for serialized object size (35992)
* [Core] Cache schema and test (37103) (37201)
* [Core] Fix 'ray stack' on macOS (36100)
* [Core] Fix a wrong metrics setup link from the doc. (37312) (37367)
* [Core] Fix lint (35844)(36739)
* [Core] Fix literalinclude path (35660)
* [Core] Fix microbenchmark (35823)
* [Core] Fix single_client_wait_1k perf regression (35614)
* [Core] Get rid of shared_ptr for GcsNodeManager (36738)
* [Core] Remove extra step in M1 installation instructions (36029)
* [Core] Remove unnecessary AsyncGetResources in NodeManager::NodeAdded (36412)
* [Core] Unskip test_Autoscaler_shutdown_node_http_everynode (36420)
* [Core] Unskip test_get_release_wheel_url for mac (36430)

📖 **Documentation**:
* [Doc] Clarify that session can also mean a ray cluster (36422)
* [Doc] Fix doc build on M1 (35689)
* [Doc] Fix documentation failure due to typing_extensions (36732)
* [Doc] Make doc code snippet testable [3/n] (35407)
* [Doc] Make doc code snippet testable [4/n] (35506)
* [Doc] Make doc code snippet testable [5/n] (35562)
* [Doc] Make doc code snippet testable [7/n] (36960)
* [Doc] Make doc code snippet testable [8/n] (36963)
* [Doc] Some instructions on how to size the head node (36429)
* [Doc] Fix doc for runtime-env-auth (36421)
* [Doc][dashboard][state] Promote state api and dashboard usage in Core user guide. (35760)
* [Doc][python3.11] Update mac os wheels built link (36379)
* [Doc] [typo] Rename acecelerators.md to accelerators.md (36500)

Many thanks to all those who contributed to this release!

ericl, ArturNiederfahrenhorst, sihanwang41, scv119, aslonnie, bluecoconut, alanwguo, krfricke, frazierprime, vitsai, amogkam, GeneDer, jovany-wang, gjoliver, simran-2797, rkooo567, shrekris-anyscale, kevin85421, angelinalg, maxpumperla, kouroshHakha, Yard1, chaowanggg, justinvyu, fantow, Catch-Bull, cadedaniel, ckw017, hora-anyscale, rickyyx, scottsun94, XiaodongLv, SongGuyang, RocketRider, stephanie-wang, inpefess, peytondmurray, sven1977, matthewdeng, ijrsvt, MattiasDC, richardliaw, bveeramani, rynewang, woshiyyya, can-anyscale, omus, eax-anyscale, raulchen, larrylian, Deegue, Rohan138, jjyao, iycheng, akshay-anyscale, edoakes, zcin, dmatrix, bryant1410, WanNJ, architkulkarni, scottjlee, JungeAlexander, avnishn, harisankar95, pcmoritz, wuisawesome, mattip
