Getting Started on Astro#
While it is possible to use Cosmos on Astro with all Execution Modes, we recommend using the local
execution mode. It’s the simplest to set up and use.
Pre-requisites#
To get started, you should have:

- The Astro CLI installed. You can find installation instructions here.
- An Astro CLI project. You can initialize a new project with astro dev init.
Create a virtual environment#
Create a virtual environment in your Dockerfile using the sample below. Be sure to replace <your-dbt-adapter> with the actual adapter you need (e.g. dbt-redshift, dbt-snowflake). A virtual environment is recommended because dbt and Airflow can have conflicting dependencies.
FROM quay.io/astronomer/astro-runtime:10.0.0
# install dbt into a virtual environment
RUN python -m venv dbt_venv && source dbt_venv/bin/activate && \
pip install --no-cache-dir <your-dbt-adapter> && deactivate
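To confirm the adapter installed correctly, you can open a shell in the running container and call the dbt binary inside the virtual environment directly. This is an optional sanity check, not part of the official steps, and it assumes your project is already running via astro dev start:

```shell
# Open a bash shell in the scheduler container of the running project
astro dev bash

# Inside the container, invoke dbt from the virtual environment directly
/usr/local/airflow/dbt_venv/bin/dbt --version
```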
Install Cosmos#
Add Cosmos to your project's requirements.txt:

astronomer-cosmos
Move your dbt project into the DAGs directory#
Make a new folder, dbt, inside your local project's dags folder. Then, copy your dbt project into that directory and create a file called my_cosmos_dag.py in the root of your DAGs directory. Your project structure should look like this:
├── dags/
│   ├── dbt/
│   │   └── my_dbt_project/
│   │       ├── dbt_project.yml
│   │       ├── models/
│   │       │   ├── my_model.sql
│   │       │   └── my_other_model.sql
│   │       └── macros/
│   │           ├── my_macro.sql
│   │           └── my_other_macro.sql
│   └── my_cosmos_dag.py
├── Dockerfile
├── requirements.txt
└── ...
Note: your dbt projects can go anywhere on the Airflow image. By default, Cosmos looks in the /usr/local/airflow/dags/dbt directory, but you can change this by setting the dbt_project_path argument on ProjectConfig when you create your DAG instance. For example, if you wanted to put your dbt project in the /usr/local/airflow/dags/my_dbt_project directory, you would do:
from cosmos import DbtDag, ProjectConfig

my_cosmos_dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/usr/local/airflow/dags/my_dbt_project",
    ),
    # ...,
)
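If you would rather not hardcode the absolute path, one alternative (a sketch, not part of the official steps) is to resolve the dbt project path relative to the dagfile itself with pathlib, so the same file works locally and on Astro:

```python
from pathlib import Path

# Resolve the dbt project path relative to this dagfile, so the DAG
# works regardless of where the dags folder is mounted.
# "dbt/my_dbt_project" matches the project layout shown above;
# adjust the names to your own project.
DBT_PROJECT_PATH = Path(__file__).parent / "dbt" / "my_dbt_project"
```

You would then pass str(DBT_PROJECT_PATH) as the dbt_project_path argument to ProjectConfig.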
Create a dagfile#
In your my_cosmos_dag.py file, import the DbtDag class from Cosmos and create a new DAG instance. Make sure to use the dbt_executable_path argument to point to the virtual environment you created in the first step.
import os
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

my_cosmos_dag = DbtDag(
    project_config=ProjectConfig(
        "/usr/local/airflow/dags/my_dbt_project",
    ),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    # normal dag parameters
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="my_cosmos_dag",
    default_args={"retries": 2},
)
Note
In some cases, especially in larger dbt projects, you might run into a DagBag import timeout error. This error can be resolved by increasing the value of the Airflow configuration core.dagbag_import_timeout.
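On Astro, one way to raise that setting is through an environment variable in your Dockerfile. A minimal sketch; the 120-second value is an arbitrary example, so tune it to the size of your project:

```dockerfile
# Raise the DagBag import timeout (in seconds); 120 is an example value
ENV AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=120
```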
Start your project#
Start your project with astro dev start. You should see your Airflow DAG in the Airflow UI (localhost:8080 by default), where you can trigger it.
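If you prefer the command line, a sketch using the Astro CLI's passthrough to the Airflow CLI (this assumes the dag_id from the example above and a recent Astro CLI version):

```shell
# astro dev run forwards its arguments to the Airflow CLI
# inside the local scheduler container
astro dev run dags trigger my_cosmos_dag
```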