Code Samples¶
Example 1: Ray jobs on an existing cluster¶
If you already have a Ray cluster set up, you can use the SubmitRayJob operator or the @ray.task decorator to submit jobs directly.

In the example below (ray_taskflow_example_existing_cluster.py), the @ray.task decorator is used to define a task that will be executed on the Ray cluster:
Important
Set the Ray Dashboard URL connection parameter or the RAY_ADDRESS environment variable on your Airflow worker so it can connect to your cluster.
from datetime import datetime
from pathlib import Path

from airflow.decorators import dag, task

from ray_provider.decorators.ray import ray

CONN_ID = "ray_job"
FOLDER_PATH = Path(__file__).parent / "ray_scripts"
RAY_TASK_CONFIG = {
    "conn_id": CONN_ID,
    "runtime_env": {"working_dir": str(FOLDER_PATH), "pip": ["numpy"]},
    "num_cpus": 1,
    "num_gpus": 0,
    "memory": 0,
    "poll_interval": 5,
}


@dag(
    dag_id="Ray_Taskflow_Example_Existing_Cluster",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
    tags=["ray", "example"],
)
def ray_taskflow_dag():

    @task
    def generate_data():
        return [1, 2, 3]

    @ray.task(config=RAY_TASK_CONFIG)
    def process_data_with_ray(data):
        import numpy as np
        import ray

        @ray.remote
        def square(x):
            return x**2

        ray.init()
        data = np.array(data)
        futures = [square.remote(x) for x in data]
        results = ray.get(futures)
        mean = np.mean(results)
        print(f"Mean of this population is {mean}")
        return mean

    data = generate_data()
    process_data_with_ray(data)


ray_example_dag = ray_taskflow_dag()
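As noted above, the decorator reaches your cluster through the Ray Dashboard URL stored on the Airflow connection. If submissions fail, it can help to first confirm that the dashboard endpoint is reachable. A minimal sketch using Ray's job submission client (the address below is a placeholder for your own dashboard URL):

from ray.job_submission import JobSubmissionClient

# Placeholder address: substitute the dashboard URL configured on your
# Airflow connection (8265 is Ray's default dashboard port).
client = JobSubmissionClient("http://localhost:8265")

# Listing jobs only succeeds if the dashboard endpoint is reachable.
print(client.list_jobs())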
Ray Cluster Sample Spec (YAML)¶
Important
spec.headGroupSpec.serviceType must be set to 'LoadBalancer' to create a Service that exposes your dashboard externally.
Save this file in a location accessible to your Airflow installation, and reference it in your DAG code.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-complete
spec:
  rayVersion: "2.10.0"
  enableInTreeAutoscaling: true
  headGroupSpec:
    serviceType: LoadBalancer
    rayStartParams:
      dashboard-host: "0.0.0.0"
      block: "true"
    template:
      metadata:
        labels:
          ray-node-type: head
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray-ml:latest
            resources:
              limits:
                cpu: 1
                memory: 3Gi
              requests:
                cpu: 1
                memory: 3Gi
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh", "-c", "ray stop"]
            ports:
              - containerPort: 6379
                name: gcs
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: serve
              - containerPort: 8080
                name: metrics
  workerGroupSpecs:
    - groupName: small-group
      replicas: 1
      minReplicas: 1
      maxReplicas: 2
      rayStartParams:
        block: "true"
      template:
        metadata: {}
        spec:
          containers:
            - name: machine-learning
              image: rayproject/ray-ml:latest
              resources:
                limits:
                  cpu: 1
                  memory: 1Gi
                requests:
                  cpu: 1
                  memory: 1Gi
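The DAG examples below reference the saved spec by building a path relative to the DAG file:

from pathlib import Path

# Adjust "scripts/ray.yaml" to wherever you saved the cluster spec.
RAY_SPEC = Path(__file__).parent / "scripts/ray.yaml"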
Example 2: Using @ray.task for job lifecycle¶
The example below showcases how to use the @ray.task decorator to manage the full lifecycle of a Ray cluster: setup, job execution, and teardown.

This approach is ideal for jobs that require a dedicated, short-lived cluster, optimizing resource usage by cleaning up after task completion.
from datetime import datetime
from pathlib import Path

from airflow.decorators import dag, task

from ray_provider.decorators.ray import ray

CONN_ID = "ray_conn"
RAY_SPEC = Path(__file__).parent / "scripts/ray.yaml"
FOLDER_PATH = Path(__file__).parent / "ray_scripts"
RAY_TASK_CONFIG = {
    "conn_id": CONN_ID,
    "runtime_env": {"working_dir": str(FOLDER_PATH), "pip": ["numpy"]},
    "num_cpus": 1,
    "num_gpus": 0,
    "memory": 0,
    "poll_interval": 5,
    "ray_cluster_yaml": str(RAY_SPEC),
    "xcom_task_key": "dashboard",
}


@dag(
    dag_id="Ray_Taskflow_Example",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
    tags=["ray", "example"],
)
def ray_taskflow_dag():

    @task
    def generate_data():
        return [1, 2, 3]

    @ray.task(config=RAY_TASK_CONFIG)
    def process_data_with_ray(data):
        import numpy as np
        import ray

        @ray.remote
        def square(x):
            return x**2

        ray.init()
        data = np.array(data)
        futures = [square.remote(x) for x in data]
        results = ray.get(futures)
        mean = np.mean(results)
        print(f"Mean of this population is {mean}")
        return mean

    data = generate_data()
    process_data_with_ray(data)


ray_example_dag = ray_taskflow_dag()
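With "xcom_task_key": "dashboard" set in the config, the provider pushes the cluster's dashboard URL to XCom. A downstream task could read it back roughly as follows; this is a sketch (the task id and key mirror the example above, and the exact XCom layout may vary across provider versions):

@task
def print_dashboard_url(ti=None):
    # Pull the URL pushed by the Ray task under the configured key.
    url = ti.xcom_pull(task_ids="process_data_with_ray", key="dashboard")
    print(f"Ray dashboard: {url}")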
Example 3: Using SubmitRayJob operator for job lifecycle¶
This example demonstrates how to use the SubmitRayJob operator to manage the full lifecycle of a Ray cluster and job execution.

This operator provides a more declarative way to define your Ray job within an Airflow DAG.
from datetime import datetime
from pathlib import Path

from airflow import DAG

from ray_provider.operators.ray import SubmitRayJob

CONN_ID = "ray_conn"
RAY_SPEC = Path(__file__).parent / "scripts/ray.yaml"
FOLDER_PATH = Path(__file__).parent / "ray_scripts"
RAY_RUNTIME_ENV = {"working_dir": str(FOLDER_PATH)}

dag = DAG(
    "Ray_Single_Operator",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
    tags=["ray", "example"],
)

submit_ray_job = SubmitRayJob(
    task_id="SubmitRayJob",
    conn_id=CONN_ID,
    entrypoint="python script.py",
    runtime_env=RAY_RUNTIME_ENV,
    num_cpus=1,
    num_gpus=0,
    memory=0,
    resources={},
    xcom_task_key="SubmitRayJob.dashboard",
    ray_cluster_yaml=str(RAY_SPEC),
    fetch_logs=True,
    wait_for_completion=True,
    job_timeout_seconds=600,
    poll_interval=5,
    dag=dag,
)


# Create ray cluster and submit ray job
submit_ray_job
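The entrypoint python script.py points at a script inside the working_dir (the ray_scripts folder). Its contents are not shown on this page; a minimal placeholder along these lines would work (a sketch, not the actual script):

# ray_scripts/script.py: illustrative placeholder for the job entrypoint.
import ray

ray.init()  # inside a submitted job, this connects to the running cluster


@ray.remote
def square(x):
    return x**2


results = ray.get([square.remote(x) for x in range(4)])
print(f"Squares: {results}")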
Example 4: SetupRayCluster, SubmitRayJob & DeleteRayCluster¶
This example shows how to use separate operators for cluster setup, job submission, and teardown, providing more granular control over the process.
This approach allows for more complex workflows involving Ray clusters.
Key Points:

- Uses SetupRayCluster, SubmitRayJob, and DeleteRayCluster operators separately.
- Allows multiple jobs to be submitted to the same cluster before deletion.
- Demonstrates how to pass cluster information between tasks using XCom.
This method is ideal for scenarios where you need fine-grained control over the cluster lifecycle, such as running multiple jobs on the same cluster or keeping the cluster alive for a certain period.
Important
The SubmitRayJob operator uses the xcom_task_key parameter "SetupRayCluster.dashboard" to retrieve the Ray dashboard URL. This URL, stored as an XCom value by the SetupRayCluster task, is necessary for job submission.
from datetime import datetime
from pathlib import Path

from airflow import DAG

from ray_provider.operators.ray import DeleteRayCluster, SetupRayCluster, SubmitRayJob

CONN_ID = "ray_conn"
RAY_SPEC = Path(__file__).parent / "scripts/ray.yaml"
FOLDER_PATH = Path(__file__).parent / "ray_scripts"

with DAG(
    "Setup_Teardown",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
    tags=["ray", "example"],
):

    setup_cluster = SetupRayCluster(
        task_id="SetupRayCluster", conn_id=CONN_ID, ray_cluster_yaml=str(RAY_SPEC), update_if_exists=False
    )

    submit_ray_job = SubmitRayJob(
        task_id="SubmitRayJob",
        conn_id=CONN_ID,
        entrypoint="python script.py",
        runtime_env={"working_dir": str(FOLDER_PATH)},
        num_cpus=1,
        num_gpus=0,
        memory=0,
        resources={},
        fetch_logs=True,
        wait_for_completion=True,
        job_timeout_seconds=600,
        xcom_task_key="SetupRayCluster.dashboard",
        poll_interval=5,
    )

    delete_cluster = DeleteRayCluster(task_id="DeleteRayCluster", conn_id=CONN_ID, ray_cluster_yaml=str(RAY_SPEC))

    # Create ray cluster and submit ray job
    setup_cluster.as_setup() >> submit_ray_job >> delete_cluster.as_teardown()
    setup_cluster >> delete_cluster
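Because teardown is its own operator, additional jobs can be submitted to the same cluster before it is deleted. A sketch of that variant, placed inside the same with DAG block and reusing the names from the example above (the second task is hypothetical; unlisted parameters are assumed to match the first job):

    # Hypothetical second job; reuses the dashboard URL pushed by SetupRayCluster.
    second_ray_job = SubmitRayJob(
        task_id="SubmitSecondRayJob",
        conn_id=CONN_ID,
        entrypoint="python script.py",
        runtime_env={"working_dir": str(FOLDER_PATH)},
        xcom_task_key="SetupRayCluster.dashboard",
    )

    setup_cluster.as_setup() >> submit_ray_job >> second_ray_job >> delete_cluster.as_teardown()
    setup_cluster >> delete_cluster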