Operators¶
- class DeleteRayCluster(*args: Any, **kwargs: Any)¶
Bases:
BaseOperator
Operator to delete a Ray cluster from Kubernetes.
- Parameters:
conn_id – The connection ID for the Ray cluster.
ray_cluster_yaml – Path to the YAML file defining the Ray cluster.
gpu_device_plugin_yaml – URL or path to the GPU device plugin YAML. Example value: https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml
- execute(context: airflow.utils.context.Context) None ¶
Execute the deletion of the Ray cluster.
- Parameters:
context – The context in which the operator is being executed.
- property hook: airflow.providers.cncf.kubernetes.utils.pod_manager.PodOperatorHookProtocol¶
Lazily initialize and return the RayHook.
- class SetupRayCluster(*args: Any, **kwargs: Any)¶
Bases:
BaseOperator
Operator to set up a Ray cluster on Kubernetes.
- Parameters:
conn_id – The connection ID for the Ray cluster.
ray_cluster_yaml – Path to the YAML file defining the Ray cluster.
kuberay_version – Version of KubeRay to install. Defaults to “1.0.0”.
gpu_device_plugin_yaml – URL or path to the GPU device plugin YAML. Example value: https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml.
update_if_exists – Whether to update the cluster if it already exists. Defaults to False.
- execute(context: airflow.utils.context.Context) None ¶
Execute the setup of the Ray cluster.
- Parameters:
context – The context in which the operator is being executed.
- class SubmitRayJob(*args: Any, **kwargs: Any)¶
Bases:
BaseOperator
Operator to submit and monitor a Ray job.
This operator handles the submission of a Ray job to a Ray cluster and monitors its status until completion. It supports deferring execution and resuming based on job status changes, making it suitable for long-running jobs.
- Parameters:
conn_id – The connection ID for the Ray cluster.
entrypoint – The command or script to execute as the Ray job.
runtime_env – The runtime environment configuration for the Ray job.
num_cpus – Number of CPUs required for the job. Defaults to 0.
num_gpus – Number of GPUs required for the job. Defaults to 0.
memory – Amount of memory required for the job in bytes. Defaults to 0.
resources – Additional custom resources required for the job. Defaults to None.
ray_cluster_yaml – Path to the Ray cluster YAML configuration file. If provided, the operator will set up and tear down the cluster.
kuberay_version – Version of KubeRay to use when setting up the Ray cluster. Defaults to “1.0.0”.
update_if_exists – Whether to update the Ray cluster if it already exists. Defaults to True.
gpu_device_plugin_yaml – URL or path to the GPU device plugin YAML file. Example value: https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml
fetch_logs – Whether to fetch logs from the Ray job. Defaults to True.
wait_for_completion – Whether to wait for the job to complete before marking the task as finished. Defaults to True.
job_timeout_seconds – Maximum time to wait for job completion in seconds. Defaults to 600 seconds. Set to 0 if you want the job to run indefinitely without timeouts.
poll_interval – Interval between job status checks in seconds. Defaults to 60 seconds.
xcom_task_key – XCom key to retrieve the dashboard URL. Defaults to None.
- execute(context: airflow.utils.context.Context) str ¶
Execute the Ray job submission and monitoring.
This method submits the Ray job to the cluster and, if configured to wait for completion, monitors the job status until it reaches a terminal state or times out.
- Parameters:
context – The context in which the task is being executed.
- Returns:
The job ID of the submitted Ray job.
- execute_complete(context: airflow.utils.context.Context, event: dict[str, Any]) None ¶
Handle the completion of a deferred Ray job execution.
This method is called when the deferred job execution completes. It processes the final job status and raises exceptions for failed or cancelled jobs. It finally deletes the cluster when the ray spec is provided
- Parameters:
context – The context in which the task is being executed.
event – The event containing the job execution result.
- Raises:
RayAirflowException – If the job execution fails, is cancelled, or reaches an unexpected state.
- property hook: airflow.providers.cncf.kubernetes.utils.pod_manager.PodOperatorHookProtocol¶
Lazily initialize and return the RayHook.
- on_kill() None ¶
Delete the Ray job if the task is killed.
This method is called when the task is externally killed. It ensures that the associated Ray job is also terminated to avoid orphaned jobs.
- template_fields = ('conn_id', 'entrypoint', 'runtime_env', 'num_cpus', 'num_gpus', 'memory', 'xcom_task_key', 'ray_cluster_yaml', 'job_timeout_seconds')¶