
DAG Factory: Quick Start Guide With Airflow

DAG Factory is a Python library for Apache Airflow® that simplifies DAG creation by using declarative YAML configuration files instead of Python code.

Prerequisites

The minimum requirements for dag-factory are:

Step 1: Create a Python Virtual Environment

Create and activate a virtual environment:

python3 -m venv dagfactory_env
source dagfactory_env/bin/activate

Step 2: Install Apache Airflow

Install Apache Airflow®:

  1. Create a directory for your project and navigate to it:

    mkdir dag-factory-quick-start && cd dag-factory-quick-start
    
  2. Set the AIRFLOW_HOME environment variable:

    export AIRFLOW_HOME=$(pwd)
    export AIRFLOW__CORE__LOAD_EXAMPLES=False
    
  3. Install Apache Airflow:

    pip install apache-airflow
    

Step 3: Install DAG Factory

Install the DAG Factory library in your virtual environment:

pip install dag-factory

Step 4: Set Up the DAGs Folder

Create a dags folder inside the $AIRFLOW_HOME directory; this is where your DAG files will be stored:

mkdir dags

Step 5: Define a DAG in YAML

DAG Factory uses YAML files to define DAG configurations. Create a file named example_dag_factory.yml in the $AIRFLOW_HOME/dags folder with the following content:

example_dag_factory.yml
default:
  default_args:
    catchup: false
    start_date: 2024-11-11

basic_example_dag:
  default_args:
    owner: "custom_owner"
  description: "this is an example dag"
  schedule_interval: "0 3 * * *"
  render_template_as_native_obj: True
  tasks:
    task_1:
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo 1"
    task_2:
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo 2"
      dependencies: [task_1]
    task_3:
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo 3"
      dependencies: [task_1]
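
For reference, this YAML describes roughly the same DAG you would otherwise define by hand in Python. The sketch below is only an illustration of what DAG Factory builds for you, assuming Airflow 2.x import paths; it is not a file you need to create:

# Illustrative only: the hand-written equivalent of basic_example_dag
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="basic_example_dag",
    description="this is an example dag",
    schedule_interval="0 3 * * *",
    start_date=datetime(2024, 11, 11),
    catchup=False,
    render_template_as_native_obj=True,
    default_args={"owner": "custom_owner"},
) as dag:
    task_1 = BashOperator(task_id="task_1", bash_command="echo 1")
    task_2 = BashOperator(task_id="task_2", bash_command="echo 2")
    task_3 = BashOperator(task_id="task_3", bash_command="echo 3")

    # task_2 and task_3 both depend on task_1, as declared via "dependencies" in the YAML
    task_1 >> [task_2, task_3]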

Step 6: Generate the DAG from YAML

Create a Python script named example_dag_factory.py in the $AIRFLOW_HOME/dags folder. This script generates the DAG from the YAML configuration:

example_dag_factory.py
import os
from pathlib import Path

# The following import is here so Airflow parses this file
# from airflow import DAG
import dagfactory

# Default to the folder that contains this file, so the example works with the
# dags folder created above; set the CONFIG_ROOT_DIR environment variable to
# point somewhere else if your YAML configs live in another location.
DEFAULT_CONFIG_ROOT_DIR = Path(__file__).parent

CONFIG_ROOT_DIR = Path(os.getenv("CONFIG_ROOT_DIR", DEFAULT_CONFIG_ROOT_DIR))

config_file = str(CONFIG_ROOT_DIR / "example_dag_factory.yml")

example_dag_factory = dagfactory.DagFactory(config_file)

# Remove any previously generated DAGs that are no longer in the config,
# then build the DAGs defined in the YAML into the global namespace.
example_dag_factory.clean_dags(globals())
example_dag_factory.generate_dags(globals())
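
Optionally, before starting Airflow, you can check that the YAML parses and that DAG Factory produces the expected DAG. The snippet below is a throwaway sanity check, not part of the tutorial files; the file name check_config.py is just a suggestion, and it assumes you run it from the $AIRFLOW_HOME directory with the virtual environment activated:

# check_config.py - optional sanity check, can be deleted afterwards
from pathlib import Path

import dagfactory

# Use an absolute path to the YAML config, matching the main script above
config_file = str(Path("dags/example_dag_factory.yml").resolve())

factory = dagfactory.DagFactory(config_file)

# generate_dags() injects each generated DAG into the given namespace
namespace = {}
factory.generate_dags(namespace)

print(list(namespace))  # should print a list that includes 'basic_example_dag'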

Step 7: Start Airflow

To start the Airflow environment with your DAG Factory setup, run the following command:

airflow standalone

This will take a few minutes to set up. Once it is ready, you can access the Airflow UI and the generated DAG at http://localhost:8080 🚀.

View Your Generated DAG

Once Airflow is up and running, you can log in with the username admin and the password in $AIRFLOW_HOME/standalone_admin_password.txt. You should be able to see your generated DAG in the Airflow UI.

Generated DAG

[Screenshot: the Airflow home page listing basic_example_dag]

Graph View

[Screenshot: the graph view of the generated DAG]

Check out the examples for generating more advanced DAGs.