Managing seeds
Note
SeedRenderingBehavior is available for cosmos >= 1.15.0.
By default, Cosmos renders every dbt seed and runs dbt seed on each DAG run. In many production pipelines,
seeds change rarely, so re-running them on every execution is unnecessary. You can control this behavior using the
seed_rendering_behavior field in the RenderConfig object. The supported values are:
- always (default): Cosmos renders the seed and runs dbt seed on every execution. This preserves the original Cosmos behavior.
- when_seed_changes: Cosmos renders the seed, but only runs dbt seed when the seed’s CSV content has changed since the last successful run. When the content is unchanged, the task succeeds without running dbt seed.
- render_only: Cosmos renders the seed as a no-op EmptyOperator so it stays visible in the DAG topology and lineage, but dbt seed is never run. This is useful when seeds are managed outside of Cosmos.
- none: Cosmos does not render the seed in the DAG/TaskGroup at all.
Example:
from cosmos import DbtTaskGroup, RenderConfig
from cosmos.constants import SeedRenderingBehavior

# Other arguments (such as project_config and profile_config) are omitted here for brevity.
jaffle_shop = DbtTaskGroup(
    render_config=RenderConfig(
        seed_rendering_behavior=SeedRenderingBehavior.WHEN_SEED_CHANGES,
    )
)
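The four values can also be read as a small decision table. The sketch below is illustrative only: Behavior and seed_plan are hypothetical names that mirror the documented values, not part of the Cosmos API, and the function only restates the rules listed above.

```python
from enum import Enum


class Behavior(Enum):
    # Illustrative stand-in mirroring the four documented values; not the Cosmos enum.
    ALWAYS = "always"
    WHEN_SEED_CHANGES = "when_seed_changes"
    RENDER_ONLY = "render_only"
    NONE = "none"


def seed_plan(behavior: Behavior, content_changed: bool) -> tuple[bool, bool]:
    """Return (rendered_in_dag, runs_dbt_seed) for one seed on one DAG run."""
    if behavior is Behavior.NONE:
        # The seed does not appear in the DAG/TaskGroup at all.
        return (False, False)
    if behavior is Behavior.RENDER_ONLY:
        # Rendered as a no-op task; dbt seed never runs.
        return (True, False)
    if behavior is Behavior.WHEN_SEED_CHANGES:
        # Rendered; dbt seed runs only when the CSV content changed.
        return (True, content_changed)
    # ALWAYS: rendered and run on every execution.
    return (True, True)
```

For example, seed_plan(Behavior.WHEN_SEED_CHANGES, content_changed=False) describes a task that is present in the DAG and succeeds without running dbt seed.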
Detecting seed changes
When seed_rendering_behavior is set to when_seed_changes, Cosmos computes the seed’s content checksum and
compares it against the last checksum persisted after a successful run (best-effort: if Cosmos cannot compute or read
either checksum, it falls back to running the seed):
- The checksum is the SHA256 of the seed’s CSV content, computed from the seed file when available. Computing it from the file (rather than reading dbt’s per-node manifest checksum) keeps change detection consistent whether the project is loaded via LoadMode.DBT_MANIFEST or LoadMode.DBT_LS.
- The last-seen checksum is persisted as an Airflow Variable, scoped per DbtDag/DbtTaskGroup and seed, so the same seed rendered in different DAGs tracks its state independently and one DAG never causes another to skip a seed.
- Passing full_refresh=True always runs the seed, bypassing change detection. This path does not update the persisted checksum, so the next non-full-refresh run may run again to record/update it.
- On a run where the seed is skipped because it is unchanged, the task does not emit its Airflow dataset, since no data was loaded.
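The rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Cosmos implementation: all function names and the key layout in state_key are hypothetical, a dict stands in for Airflow Variables, the hash is taken over the raw file bytes, and every run that happens is assumed to succeed.

```python
import hashlib
from pathlib import Path
from typing import Optional


def seed_checksum(csv_path: Path) -> Optional[str]:
    """SHA256 hex digest of the seed file's bytes, or None if it cannot be read."""
    try:
        return hashlib.sha256(csv_path.read_bytes()).hexdigest()
    except OSError:
        return None


def state_key(owner_id: str, seed_name: str) -> str:
    # Hypothetical key layout: one entry per (DAG or TaskGroup, seed) pair,
    # so two DAGs rendering the same seed never share state.
    return f"seed_checksum::{owner_id}::{seed_name}"


def should_run_seed(current: Optional[str], stored: Optional[str], full_refresh: bool = False) -> bool:
    if full_refresh:
        return True  # full refresh bypasses change detection
    if current is None or stored is None:
        return True  # best-effort fallback: run when either checksum is unavailable
    return current != stored


def run_seed_if_changed(csv_path: Path, owner_id: str, seed_name: str,
                        store: dict, full_refresh: bool = False) -> bool:
    """Return True if dbt seed would run. `store` stands in for Airflow Variables."""
    key = state_key(owner_id, seed_name)
    current = seed_checksum(csv_path)
    ran = should_run_seed(current, store.get(key), full_refresh)
    # The checksum is recorded only on a normal (non-full-refresh) run,
    # which this sketch assumes was successful.
    if ran and not full_refresh and current is not None:
        store[key] = current
    return ran
```

With an empty store, the first call for a given DAG and seed returns True and records the checksum; a second call on an unchanged file returns False, while the same seed under a different owner_id still runs.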
Limitations
- when_seed_changes is only supported for ExecutionMode.LOCAL, ExecutionMode.VIRTUALENV and ExecutionMode.AIRFLOW_ASYNC, which run dbt directly on the Airflow worker with access to the seed files. Configuring it with any other execution mode raises a CosmosValueError at DAG-build time.
- when_seed_changes is incompatible with TestBehavior.BUILD (under BUILD seeds run via dbt build and cannot be selectively skipped); this combination also raises a CosmosValueError.
- Under ExecutionMode.WATCHER, a single dbt build runs the whole selection, so seeds run regardless of this setting. Only always is meaningful for the watcher execution mode; none and render_only change only what is rendered, not whether the underlying dbt build loads the seed.
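The first two limitations amount to a compatibility check at DAG-build time. The sketch below shows the shape of such a check under explicit assumptions: validate_seed_config is a hypothetical helper, plain lowercase strings stand in for the Cosmos enums, and ValueError stands in for CosmosValueError. It is not how Cosmos performs the validation internally.

```python
# Stand-ins for ExecutionMode.LOCAL, VIRTUALENV and AIRFLOW_ASYNC (assumed string values).
SUPPORTED_EXECUTION_MODES = {"local", "virtualenv", "airflow_async"}


def validate_seed_config(seed_behavior: str, execution_mode: str, test_behavior: str) -> None:
    """Raise ValueError for combinations documented as unsupported."""
    if seed_behavior != "when_seed_changes":
        return  # the documented errors apply only to when_seed_changes
    if execution_mode not in SUPPORTED_EXECUTION_MODES:
        raise ValueError(
            f"when_seed_changes is not supported with execution mode {execution_mode!r}"
        )
    if test_behavior == "build":
        # Under BUILD, seeds run via dbt build and cannot be selectively skipped.
        raise ValueError("when_seed_changes is incompatible with the build test behavior")
```

Calling validate_seed_config("when_seed_changes", "local", "after_each") returns without error, whereas passing "watcher" as the execution mode or "build" as the test behavior raises.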
Example:
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, RenderConfig
from cosmos.constants import SeedRenderingBehavior

# DBT_ROOT_PATH and profile_config are assumed to be defined elsewhere in the DAG file.
example_seed_rendering = DbtDag(
    project_config=ProjectConfig(
        DBT_ROOT_PATH / "jaffle_shop",
    ),
    render_config=RenderConfig(
        # Render every seed, but only run `dbt seed` when a seed's CSV content has changed
        # since the last successful run.
        seed_rendering_behavior=SeedRenderingBehavior.WHEN_SEED_CHANGES,
    ),
    profile_config=profile_config,
    operator_args={"install_deps": True},
    schedule="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="example_seed_rendering",
    default_args={"retries": 0},
)