Kubernetes Execution Mode#
The following tutorial illustrates how to run the Cosmos dbt Kubernetes Operator using a local Kubernetes (K8s) cluster. It assumes the following:
Postgres is run in the Kubernetes (K8s) cluster as a container
Airflow is run locally, and it triggers a K8s Pod which runs dbt
Requirements#
To test the DbtKubernetesOperators locally, we encourage you to install the following:
Local Airflow (either standalone or using Astro CLI)
Kind to run K8s locally
Helm to install Postgres in K8s
Docker to create the dbt container image, which will allow Airflow to create a K8s pod which will run dbt
At the moment, you are expected to add both of the following to the Docker image:
The dbt project files
The dbt profile, which contains the credentials dbt needs to access the database while the project is parsed on Apache Airflow nodes
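For reference, a minimal profiles.yml baked into the image could look like the sketch below. This is an assumption for illustration: the profile name matches the postgres_profile used in the example in this page, and the environment variables match the ones injected via K8s secrets later in the tutorial.

```yaml
# Sketch of a profiles.yml shipped inside the Docker image.
# Profile name and env var names are assumptions matching this tutorial.
postgres_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('POSTGRES_HOST') }}"
      user: postgres
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      port: 5432
      dbname: postgres
      schema: public
```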
Handle secrets#
Additional KubernetesPodOperator parameters can be added to the operator_args parameter of the DbtKubernetesOperator.
For instance:
run_models = DbtTaskGroup(
    project_config=ProjectConfig(),
    profile_config=ProfileConfig(
        profile_name="postgres_profile",
        target_name="dev",
        profile_mapping=PostgresUserPasswordProfileMapping(
            conn_id="postgres_default",
            profile_args={
                "schema": "public",
            },
        ),
    ),
    render_config=RenderConfig(dbt_project_path=AIRFLOW_PROJECT_DIR),
    execution_config=ExecutionConfig(
        execution_mode=ExecutionMode.KUBERNETES,
        dbt_project_path=K8S_PROJECT_DIR,
    ),
    operator_args={
        "image": DBT_IMAGE,
        "get_logs": True,
        "is_delete_operator_pod": False,
        "secrets": [postgres_password_secret, postgres_host_secret],
    },
)
Step-by-step instructions#
Using Kind, you can set up a local Kubernetes cluster:
kind create cluster
Deploy a Postgres pod to Kind using Helm
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install postgres bitnami/postgresql
Retrieve the Postgres password and set it as an environment variable.
export POSTGRES_PASSWORD=$(kubectl get secret --namespace default postgres-postgresql -o jsonpath="{.data.postgres-password}" | base64 -d)
Check that the environment variable was set and that it is not empty
echo $POSTGRES_PASSWORD
Expose Postgres to the host running Docker/Kind.
kubectl port-forward --namespace default postgres-postgresql-0 5432:5432
Check that you’re able to connect to the exposed pod.
PGPASSWORD="$POSTGRES_PASSWORD" psql --host 127.0.0.1 -U postgres -d postgres -p 5432
postgres=# \dt
\q
Create a K8s secret which contains the credentials to access Postgres.
kubectl create secret generic postgres-secrets --from-literal=host=postgres-postgresql.default.svc.cluster.local --from-literal=password=$POSTGRES_PASSWORD
Clone the example repo that contains the Airflow DAG and dbt project files.
git clone https://github.com/astronomer/cosmos-example.git
cd cosmos-example/
Create a Docker image containing the dbt project files and the dbt profile, using the provided Dockerfile. This image will be run in K8s.
docker build -t dbt-jaffle-shop:1.0.0 -f Dockerfile.postgres_profile_docker_k8s .
Note
If running on an Apple silicon (M1) machine, you may need to set the following environment variables in case the Docker build command fails.
export DOCKER_BUILDKIT=0
export COMPOSE_DOCKER_CLI_BUILD=0
export DOCKER_DEFAULT_PLATFORM=linux/amd64
Take a look at the Dockerfile to understand its purpose so that you can use it as a reference in your project.
The dbt profile file is added to the image
The dags directory containing the dbt project jaffle_shop is added to the image
The dbt_project.yml is replaced with postgres_profile_dbt_project.yml, which contains a profile key pointing to postgres_profile, since profile creation is not currently handled for the K8s operators the way it is in local mode.
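A simplified sketch of what such a Dockerfile might contain is shown below. The base image, package versions, and paths are assumptions for illustration; Dockerfile.postgres_profile_docker_k8s in the example repo is the authoritative version.

```dockerfile
# Sketch only; see Dockerfile.postgres_profile_docker_k8s for the real file.
FROM python:3.10-slim

RUN pip install --no-cache-dir dbt-postgres

# Add the dbt profile and the dbt project to the image
COPY profiles.yml /root/.dbt/profiles.yml
COPY dags/dbt/jaffle_shop /dbt/jaffle_shop

# Swap in the project file that points at postgres_profile
COPY postgres_profile_dbt_project.yml /dbt/jaffle_shop/dbt_project.yml

WORKDIR /dbt/jaffle_shop
```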
Make the built image available in the Kind K8s cluster.
kind load docker-image dbt-jaffle-shop:1.0.0
Create a Python virtual environment and install the latest version of Astronomer Cosmos, which contains the K8s Operator.
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install "astronomer-cosmos[dbt-postgres]" apache-airflow-providers-cncf-kubernetes
Copy the dags directory, which contains the jaffle_shop_kubernetes.py file, to your Airflow DAGs home:
cp -r dags $AIRFLOW_HOME/
Run Airflow
airflow standalone
Note
You may need to run Airflow standalone with sudo if your Airflow user is unable to access the Docker socket URL or pull images in the Kind cluster.
Log in to Airflow through a web browser at http://localhost:8080/, using the user admin and the password in the standalone_admin_password.txt file.
Enable and trigger a run of the jaffle_shop_k8s DAG. Once it completes, you should see a successful DAG run.