Hook

class RayHook(*args: Any, **kwargs: Any)

Bases: KubernetesHook

Airflow Hook for interacting with Ray Job Submission Client and Kubernetes Cluster.

This hook provides methods to interact with Ray clusters, submit and manage Ray jobs, and handle Kubernetes resources related to Ray deployments.

Parameters:

conn_id – The connection ID to use when fetching connection info.

DEFAULT_NAMESPACE = 'default'
conn_name_attr = 'ray_conn_id'
conn_type = 'ray'
create_daemon_set(name: str, body: dict[str, Any]) V1DaemonSet | None

Create a DaemonSet resource in Kubernetes.

Parameters:
  • name – The name of the DaemonSet.

  • body – The body of the DaemonSet for the create action.

Returns:

The created DaemonSet resource if successful, None otherwise.

default_conn_name = 'ray_default'
delete_daemon_set(name: str) V1Status | None

Delete a DaemonSet resource in Kubernetes.

Parameters:

name – The name of the DaemonSet.

Returns:

The status of the delete operation if successful, None otherwise.

delete_ray_cluster(ray_cluster_yaml: str, gpu_device_plugin_yaml: str) None

Execute the operator to delete the Ray cluster.

Parameters:
  • ray_cluster_yaml – Path to the YAML file defining the Ray cluster.

  • gpu_device_plugin_yaml – Path or URL to the GPU device plugin YAML. Defaults to NVIDIA’s plugin

Raises:

AirflowException – If there’s an error deleting the Ray cluster.

delete_ray_job(dashboard_url: str | None, job_id: str) Any

Deletes a job from the Ray cluster.

Parameters:
  • dashboard_url – The URL of the Ray dashboard.

  • job_id – The ID of the job to delete.

Returns:

The result of the delete operation.

classmethod get_connection_form_widgets() dict[str, Any]

Return connection widgets to add to connection form.

Returns:

A dictionary of connection form widgets.

get_daemon_set(name: str) V1DaemonSet | None

Retrieve a DaemonSet resource in Kubernetes.

Parameters:

name – The name of the DaemonSet.

Returns:

The DaemonSet resource if found, None otherwise.

get_ray_job_logs(dashboard_url: str | None, job_id: str) str

Retrieves the logs of a submitted job.

Parameters:
  • dashboard_url – The URL of the Ray dashboard.

  • job_id – The ID of the job.

Returns:

Logs of the job.

get_ray_job_status(dashboard_url: str | None, job_id: str) JobStatus

Gets the status of a submitted job.

Parameters:
  • dashboard_url – The URL of the Ray dashboard.

  • job_id – The ID of the job.

Returns:

Status of the job.

async get_ray_tail_logs(dashboard_url: str | None, job_id: str) AsyncIterator[str]

Tails the logs of a submitted job asynchronously.

Parameters:
  • dashboard_url – The URL of the Ray dashboard.

  • job_id – The ID of the job.

Returns:

An async iterator of log lines.

classmethod get_ui_field_behaviour() dict[str, Any]

Return custom field behaviour for the connection form.

Returns:

A dictionary specifying custom field behaviour.

hook_name = 'Ray'
install_kuberay_operator(version: str = '1.0.0', env: dict[str, str] | None = None) tuple[str | None, str | None]

Install KubeRay operator using Helm.

Parameters:
  • version – The version of KubeRay operator to install.

  • env – Optional dictionary of environment variables.

Returns:

A tuple containing the installation output and error (if any).

load_yaml_content(path_or_link: str) Any

Load YAML content from a file path or URL.

Parameters:

path_or_link – The file path or URL of the YAML content.

Returns:

The loaded YAML content.

ray_client(dashboard_url: str | None = None) JobSubmissionClient

Establishes a connection to the Ray Job Submission Client.

Parameters:

dashboard_url – The URL of the Ray dashboard.

Returns:

An instance of JobSubmissionClient.

Raises:

AirflowException – If the connection fails.

setup_ray_cluster(context: airflow.utils.context.Context, ray_cluster_yaml: str, kuberay_version: str, gpu_device_plugin_yaml: str, update_if_exists: bool) None

Execute the operator to set up the Ray cluster.

Parameters:
  • context – The Airflow task context.

  • ray_cluster_yaml – Path to the YAML file defining the Ray cluster.

  • kuberay_version – Version of KubeRay to install.

  • gpu_device_plugin_yaml – Path or URL to the GPU device plugin YAML. Defaults to NVIDIA’s plugin

  • update_if_exists – Whether to update the cluster if it already exists.

Raises:

AirflowException – If there’s an error setting up the Ray cluster.

submit_ray_job(dashboard_url: str, entrypoint: str, runtime_env: dict[str, Any] | None = None, **job_config: Any) str

Submits a job to the Ray cluster.

Parameters:
  • dashboard_url – The URL of the Ray dashboard.

  • entrypoint – The command or script to run.

  • runtime_env – The runtime environment for the job.

  • job_config – Additional job configuration parameters.

Returns:

Job ID of the submitted job.

uninstall_kuberay_operator() tuple[str | None, str | None]

Uninstall KubeRay operator using Helm.

Returns:

A tuple containing the uninstallation output and error (if any).