Getting Started on Astro#

While it is possible to use Cosmos on Astro with all Execution Modes, we recommend using the local execution mode. It’s the simplest to set up and use.

If you’d like to see a fully functional project to run in Astro (CLI or Cloud), check out cosmos-demo.

Below is a step-by-step guide to running your own dbt project on Astro.

Prerequisites#

To get started, you should have:

  • The Astro CLI installed. You can find installation instructions here.

  • An Astro CLI project. You can initialize a new project with astro dev init (see the sketch after this list).

  • A dbt project. The jaffle shop project is a good starting point.
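If you're starting from scratch, the first two prerequisites look roughly like this (a sketch; the project directory name is illustrative, and the jaffle shop repository URL is assumed to be dbt-labs/jaffle_shop):

# create and initialize a new Astro project
mkdir my-astro-project && cd my-astro-project
astro dev init

# fetch an example dbt project to experiment with
git clone https://github.com/dbt-labs/jaffle_shop.git /tmp/jaffle_shop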

Create a virtual environment#

Create a virtual environment in your Dockerfile using the sample below. Be sure to replace <your-dbt-adapter> with the adapter you need (e.g. dbt-redshift, dbt-snowflake). A virtual environment is recommended because dbt and Airflow can have conflicting dependencies.

FROM quay.io/astronomer/astro-runtime:11.3.0

# install dbt into a virtual environment
RUN python -m venv dbt_venv && source dbt_venv/bin/activate && \
    pip install --no-cache-dir <your-dbt-adapter> && deactivate

For example, the adapter for Postgres is dbt-postgres.
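Once the image builds, you can sanity-check the install by opening a shell inside a running container (a sketch; assumes the project is already running via astro dev start and that the Astro runtime's working directory is /usr/local/airflow):

# open a shell in the local Airflow scheduler container
astro dev bash

# inside the container, invoke dbt directly from the virtual environment
/usr/local/airflow/dbt_venv/bin/dbt --version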

Install Cosmos#

Add Cosmos to your project’s requirements.txt.

astronomer-cosmos

Move your dbt project into the DAGs directory#

Make a new folder, dbt, inside your local project’s dags folder. Then, copy your dbt project into that folder and create a file called my_cosmos_dag.py in the root of your DAGs directory; a sketch of these commands follows the tree below. Your project structure should look like this:

├── dags/
│   ├── dbt/
│   │   └── my_dbt_project/
│   │       ├── dbt_project.yml
│   │       ├── models/
│   │       │   ├── my_model.sql
│   │       │   └── my_other_model.sql
│   │       └── macros/
│   │           ├── my_macro.sql
│   │           └── my_other_macro.sql
│   └── my_cosmos_dag.py
├── Dockerfile
├── requirements.txt
└── ...
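One way to get there from the root of your Astro project (a sketch; replace /path/to/my_dbt_project with wherever your dbt project currently lives):

# create the dbt folder inside dags/ and copy the project in
mkdir -p dags/dbt
cp -r /path/to/my_dbt_project dags/dbt/my_dbt_project

# create the (empty, for now) DAG file
touch dags/my_cosmos_dag.py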

Note: your dbt project can live anywhere on the Airflow image. By default, Cosmos looks in the /usr/local/airflow/dags/dbt directory, but you can change this by passing a different dbt_project_path to ProjectConfig when you create your DAG instance.

For example, if you wanted to put your dbt project in the /usr/local/airflow/dags/my_dbt_project directory, you would do:

from cosmos import DbtDag, ProjectConfig

my_cosmos_dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/my_dbt_project",
    ),
    # ...,
)

Create a DAG file#

In your my_cosmos_dag.py file, import the DbtDag class from Cosmos and create a new DAG instance. Make sure to use the dbt_executable_path argument to point at the virtual environment you created in your Dockerfile.

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

import os
from datetime import datetime

airflow_home = os.environ["AIRFLOW_HOME"]

# Cosmos uses this mapping to render a dbt profile from the airflow_db Airflow connection at runtime
profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

my_cosmos_dag = DbtDag(
    # point Cosmos at the dbt project copied into dags/dbt above
    project_config=ProjectConfig(
        f"{airflow_home}/dags/dbt/my_dbt_project",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        # run dbt from the virtual environment created in the Dockerfile
        dbt_executable_path=f"{airflow_home}/dbt_venv/bin/dbt",
    ),
    # normal dag parameters
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="my_cosmos_dag",
    default_args={"retries": 2},
)
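The profile mapping above reads an Airflow connection with the ID airflow_db. If you haven’t defined it yet, one way to do so for local development is via your Astro project’s .env file, which the Astro CLI loads into the containers (a sketch; the user, password, host, and database are placeholders for your own Postgres instance):

# .env
AIRFLOW_CONN_AIRFLOW_DB=postgres://<user>:<password>@<host>:5432/<database>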

Note

In some cases, especially in larger dbt projects, you might run into a DagBag import timeout error. This error can be resolved by increasing the value of the Airflow configuration core.dagbag_import_timeout.
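One way to raise that value on Astro is with an environment variable in your Dockerfile (a sketch; 120 seconds is an arbitrary example, and Airflow’s default is 30):

# allow slower DAG parsing for large dbt projects
ENV AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=120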

Start your project#

Start your project with astro dev start. You should see your DAG in the Airflow UI (localhost:8080 by default), where you can trigger it.

(Screenshot: the Cosmos-rendered dbt DAG in the Airflow UI)