Managing seeds

Note

SeedRenderingBehavior is available for cosmos >= 1.15.0.

By default, Cosmos renders every dbt seed and runs dbt seed on each DAG run. In many production pipelines, seeds change rarely, so re-running them on every execution is unnecessary. You can control this behavior with the seed_rendering_behavior field of the RenderConfig object, which accepts the following values:

  • always (default): Cosmos renders the seed and runs dbt seed on every execution. This preserves the original Cosmos behavior.

  • when_seed_changes: Cosmos renders the seed, but only runs dbt seed when the seed’s CSV content has changed since the last successful run. When the content is unchanged, the task succeeds without running dbt seed.

  • render_only: Cosmos renders the seed as a no-op EmptyOperator so it stays visible in the DAG topology and lineage, but dbt seed is never run. This is useful when seeds are managed outside of Cosmos.

  • none: Cosmos does not render the seed in the DAG/TaskGroup at all.

Example:

from cosmos import DbtTaskGroup, RenderConfig
from cosmos.constants import SeedRenderingBehavior

jaffle_shop = DbtTaskGroup(
    # Other arguments (such as project_config and profile_config) are omitted for brevity.
    render_config=RenderConfig(
        seed_rendering_behavior=SeedRenderingBehavior.WHEN_SEED_CHANGES,
    ),
)

Detecting seed changes

When seed_rendering_behavior is set to when_seed_changes, Cosmos computes the seed’s content checksum and compares it against the checksum persisted after the last successful run. Detection is best-effort: if Cosmos cannot compute the current checksum or read the persisted one, it falls back to running the seed. In more detail:

  • The checksum is the SHA256 of the seed’s CSV content, computed from the seed file when available. Computing it from the file (rather than reading dbt’s per-node manifest checksum) keeps change detection consistent whether the project is loaded via LoadMode.DBT_MANIFEST or LoadMode.DBT_LS.

  • The last-seen checksum is persisted as an Airflow Variable, scoped per DbtDag/DbtTaskGroup and seed, so the same seed rendered in different DAGs tracks its state independently and one DAG never causes another to skip a seed.

  • Passing full_refresh=True always runs the seed, bypassing change detection. This path does not update the persisted checksum, so the next run without full_refresh may run the seed again in order to record the current checksum.

  • On a run where the seed is skipped because it is unchanged, the task does not emit its Airflow dataset, since no data was loaded.
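The comparison described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Cosmos's implementation: the function names, the key format and the in-memory dict standing in for the Airflow Variable store are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def seed_checksum(seed_path: Path) -> Optional[str]:
    """Return the SHA256 hex digest of the seed's CSV bytes, or None if unreadable."""
    try:
        return hashlib.sha256(seed_path.read_bytes()).hexdigest()
    except OSError:
        return None


def should_run_seed(
    seed_path: Path,
    store: Dict[str, str],  # hypothetical stand-in for the Airflow Variable store
    key: str,  # hypothetical key, scoped per DAG/TaskGroup and seed
    full_refresh: bool = False,
) -> bool:
    """Decide whether the seed should be run on this execution."""
    if full_refresh:
        # A full refresh bypasses change detection entirely.
        return True
    current = seed_checksum(seed_path)
    previous = store.get(key)
    if current is None or previous is None:
        # Best-effort fallback: run the seed when either checksum is unavailable.
        return True
    return current != previous


def record_success(seed_path: Path, store: Dict[str, str], key: str) -> None:
    """Persist the current checksum after a successful, non-full-refresh run."""
    current = seed_checksum(seed_path)
    if current is not None:
        store[key] = current
```

In this sketch the caller would invoke record_success only after a successful run without full_refresh, which mirrors the documented behavior that a full-refresh run leaves the persisted checksum untouched.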

Limitations

  • when_seed_changes is only supported for ExecutionMode.LOCAL, ExecutionMode.VIRTUALENV and ExecutionMode.AIRFLOW_ASYNC, which run dbt directly on the Airflow worker with access to the seed files. Configuring it with any other execution mode raises a CosmosValueError at DAG-build time.

  • when_seed_changes is incompatible with TestBehavior.BUILD, because under BUILD seeds run as part of dbt build and cannot be selectively skipped. This combination also raises a CosmosValueError.

  • Under ExecutionMode.WATCHER, a single dbt build runs the whole selection, so seeds run regardless of this setting. Only always is meaningful for the watcher execution mode; none and render_only change only what is rendered, not whether the underlying dbt build loads the seed.
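The first two limitations amount to a compatibility check performed when the DAG is built. The sketch below illustrates that rule with plain strings. It is not Cosmos's code: the function name is hypothetical, the mode names are written as bare strings rather than enum members, and ValueError stands in for CosmosValueError so that the snippet runs without Cosmos installed.

```python
# Execution modes listed above as supporting when_seed_changes.
SUPPORTED_EXECUTION_MODES = {"LOCAL", "VIRTUALENV", "AIRFLOW_ASYNC"}


def check_when_seed_changes(execution_mode: str, test_behavior: str) -> None:
    """Raise if when_seed_changes is combined with an unsupported setting.

    Hypothetical helper: ValueError is used here in place of CosmosValueError.
    """
    if execution_mode not in SUPPORTED_EXECUTION_MODES:
        raise ValueError(
            f"when_seed_changes is not supported for execution mode {execution_mode}"
        )
    if test_behavior == "BUILD":
        raise ValueError("when_seed_changes is incompatible with TestBehavior.BUILD")
```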

Complete DAG example using when_seed_changes:

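from datetime import datetime

from cosmos import DbtDag, ProjectConfig, RenderConfig
from cosmos.constants import SeedRenderingBehavior

# DBT_ROOT_PATH (a pathlib.Path to the directory holding the dbt projects) and
# profile_config (a cosmos.ProfileConfig) are assumed to be defined elsewhere in the DAG file.
example_seed_rendering = DbtDag(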
    project_config=ProjectConfig(
        DBT_ROOT_PATH / "jaffle_shop",
    ),
    render_config=RenderConfig(
        # Render every seed, but only run `dbt seed` when a seed's CSV content has changed
        # since the last successful run.
        seed_rendering_behavior=SeedRenderingBehavior.WHEN_SEED_CHANGES,
    ),
    profile_config=profile_config,
    operator_args={"install_deps": True},
    schedule="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="example_seed_rendering",
    default_args={"retries": 0},
)