# Defaults

DAG Factory allows you to define default values for DAG-level arguments and Airflow `default_args`. There are several ways to accomplish this:
- Define the `default_args` within each DAG definition in the DAGs YAML file;
- Declare a `default` block at the top level of the DAGs YAML file;
- Define the `default_args_config_dict` argument when instantiating the `DagFactory` class;
- Create one or multiple `defaults.yml` or `defaults.yaml` files and declare the `default_args_config_path` argument in the `DagFactory` class. This approach includes support for combining multiple `defaults.yml` or `defaults.yaml` files.
Although you cannot use the last two configurations together, you can use a combination of the first two configurations with either the third or the last.
Below, we detail how to use each of these approaches and also how to combine them.
## Specifying `default_args` directly in the DAG YAML specification

This configuration affects only the DAG where the `default_args` are defined.

You can override or define specific `default_args` at the individual DAG level. This strategy allows you to customize arguments for each DAG without affecting others. Not only can existing `default_args` be overridden directly in a DAG configuration, but other DAG-level arguments can also be added, as in the sketch below.
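A minimal sketch of this, with a hypothetical DAG name and argument values:

```yaml
etl:
  default_args:
    owner: "analytics"  # overrides any owner inherited from shared defaults
    retries: 3
  schedule: 0 6 * * *  # other DAG-level arguments can be added alongside
  tasks:
    - task_id: "extract"
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo extract"
```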
## YAML top-level `default`

This configuration affects all the DAGs defined in the YAML file where the `default` block is declared.

The `default` top-level block enables you to share standard settings and configurations across all DAGs in your YAML file, with the arguments automatically applied to each DAG defined in it. It is one of DAG Factory's most powerful features; using defaults allows for the dynamic generation of multiple DAGs.
### Benefits of using the `default` block

- Consistency: Ensures uniform configurations across all tasks and DAGs.
- Maintainability: Reduces duplication by centralizing common properties.
- Simplicity: Makes configurations easier to read and manage.
- Dynamic generation: Lets a single `default` block generate more than a single DAG easily.
### Example usage of the `default` block

#### Specifying `default_args` in the `default` block

Using a `default` block in a YAML file applies those key-value pairs to each DAG defined in that same file. One of the most common examples is using a `default` block to specify `default_args` for each DAG defined in the file; every DAG in the file inherits these arguments. Below is an example:
```yaml
default:
  default_args:
    start_date: '2024-01-01'
  schedule: 0 0 * * *
  catchup: false
  tags:
    - "data engineering"

etl:
  tasks:
    - task_id: "extract"
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo extract"
    - task_id: "transform"
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo transform"
      dependencies:
        - extract
    - task_id: "load"
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo load"
      dependencies:
        - transform
```
#### Example of using the `default` block for dynamic DAG generation
Not only can the `default` block in a YAML file be used to define `default_args` for one or more DAGs, it can also be used to create the skeleton of "templated" DAGs. In the example below, the `default` block defines both the `default_args` of a DAG and default tasks. These tasks provide a "template" for the DAGs defined in this file. Each DAG (`machine_learning`, `data_science`, `artificial_intelligence`) is defined using the values from the `default` block and, as with `default_args`, can override them. This is a powerful way to use DAG Factory to dynamically create DAGs from a single configuration.
```yaml
default:
  default_args:
    start_date: '2024-01-01'
  schedule: 0 0 * * *
  catchup: false
  tags:
    - dynamic
  tasks:
    - task_id: "extract"
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo extract"
    - task_id: "transform"
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo transform"
      dependencies:
        - extract
    - task_id: "load"
      operator: airflow.operators.bash.BashOperator
      dependencies:
        - transform

machine_learning:
  tasks:
    - task_id: "load"
      bash_command: "echo machine_learning"

data_science:
  tasks:
    - task_id: "load"
      bash_command: "echo data_science"

artificial_intelligence:
  tasks:
    - task_id: "load"
      bash_command: "echo artificial_intelligence"
```
## Specifying `default` arguments via a Python dictionary

This configuration affects DAGs created using the `DagFactory` class with the `default_args_config_dict` argument. It allows you to define DAG-level arguments, including the `default_args`, using a Python dictionary.
### Example of using a Python-defined default configuration
In this example, the user creates a YAML DAG with the `DagFactory` class and declares default arguments as a Python dictionary through the `default_args_config_dict` parameter. This feature mirrors the functionality of manually specifying a `default_args_config_path` in the `DagFactory` class, described in the next section. Below is a minimal sketch of this usage; the config file path and argument values are illustrative:
"bash_command": "echo transform",
"dependencies": ["extract"],
},
{
"task_id": "load",
"operator": "airflow.operators.bash.BashOperator",
"bash_command": "echo load",
## Declaring default values using the `defaults.yml` or `defaults.yaml` file

This configuration affects DAGs created using the `DagFactory` class without the `default_args_config_dict` argument.

If a `defaults.yml` file is present in the same directory as the YAML file representing the DAG, DagFactory will use it to build the DAG. If the `defaults.yml` lives in a separate directory, users can share it through `DagFactory`'s `default_args_config_path` argument.
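For instance, in a co-located layout like the following (file names are illustrative), the `defaults.yml` is picked up automatically for the DAGs defined in `my_dags.yml`:

```
dags/
├── defaults.yml
└── my_dags.yml
```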
### Example of declaring the `default_args` using `defaults.yml`

Starting with DAG Factory 0.22.0, you can also keep the `default_args` in the `defaults.yml` file. Be careful: the configuration from `defaults.yml` is applied to all DAG Factory-generated DAGs.
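A minimal sketch of such a `defaults.yml`, with illustrative values:

```yaml
default_args:
  owner: "data_platform"
  retries: 2
  start_date: '2024-01-01'
```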
### Example usage of DAG-level configurations in `defaults.yml`

In Airflow, not all DAG-level arguments are supported under `default_args`, because some are DAG-specific and not used by any operator classes, such as `schedule` and `catchup`. To set default values for those arguments, they need to be added at the root level of `defaults.yml`, as follows:

```yaml
schedule: 0 1 * * *  # set DAG-specific arguments at the root level
catchup: False
default_args:
  start_date: '2024-12-31'
  ...
```
## Combining multiple `defaults.yml` files

It is possible to combine and merge the content of multiple `defaults.yml` files. To accomplish this, declare as the `default_args_config_path` a folder that is a parent of the DAG-defining YAML file. DAG Factory will then merge all the `defaults.yml` configurations, following the directory hierarchy, and give precedence to the arguments declared in the `defaults.yml` file closest to the DAG YAML file.

As an example, let's say there are DagFactory DAGs defined inside the `a/b/c/some_dags.yml` file, following this directory tree:
```
sample_project
└── a
    ├── b
    │   ├── c
    │   │   ├── defaults.yml
    │   │   └── some_dags.yml
    │   └── defaults.yml
    └── defaults.yml
```
Assuming you instantiate the DAGs by using:

```python
from dagfactory import load_yaml_dags

load_yaml_dags(
    "a/b/c/some_dags.yml",
    default_args_config_path="a",
)
```
The DAGs will use the default configuration defined in all of the following files:

1. `a/b/c/some_dags.yml`
2. `a/b/c/defaults.yml`
3. `a/b/defaults.yml`
4. `a/defaults.yml`

in this order of precedence. For example, if the DAG `owner` is declared both in `a/b/c/defaults.yml` and in `a/defaults.yml`, the value from `a/b/c/defaults.yml` takes precedence, since it is closer to the DAG YAML file.
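As a sketch, with illustrative owner values:

```yaml
# a/defaults.yml
default_args:
  owner: "platform_team"
```

```yaml
# a/b/c/defaults.yml (closest to some_dags.yml, so this owner wins)
default_args:
  owner: "ml_team"
```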
## Combining multiple methods of defining default values

Given the various ways to specify top-level DAG arguments, including `default_args`, the following precedence order is applied when multiple places define the same argument (a sketch follows the list):

1. In the DAG configuration
2. In the `default` block within the workflow's YAML file
3. The arguments defined in `default_args_config_dict`
4. If (3) is not declared, the `defaults.yml` hierarchy
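For illustration, a minimal sketch with hypothetical file contents and owner values:

```yaml
# defaults.yml (lowest precedence of the places shown here)
default_args:
  owner: "defaults_yml_owner"
```

```yaml
# dags.yml
default:
  default_args:
    owner: "default_block_owner"  # overrides the defaults.yml value

etl:
  default_args:
    owner: "dag_owner"  # set in the DAG configuration, so this value wins
  tasks:
    - task_id: "extract"
      operator: airflow.operators.bash.BashOperator
      bash_command: "echo extract"
```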
> **Note**: The `defaults.yml` is preferred over `defaults.yaml`.