Using Selectors in Data Build Tool (DBT) to Handle Disabled Models

Understanding the Issue with Disabled Models in Data Build Tool (DBT)

As a data engineer or analyst working with Data Build Tool (DBT), you may have encountered scenarios where models are disabled, and yet, they are still referenced in other parts of your project. In such cases, DBT throws an error indicating that there is a dependency on a disabled model.

In this article, we will delve into the issue, explore possible solutions, and provide guidance on how to use selectors in DBT to decide which models to run on a job execution.

Background: How DBT Works

Before we dive into the solution, it’s essential to understand how DBT works. DBT is a tool that runs SQL-based data transformations inside a data platform, such as a relational database or a cloud data warehouse. A project's structure and configuration files (e.g., dbt_project.yml) define which models are enabled, how they are materialized, and other settings.

When you run a job in DBT, it executes the selected models against the configured database. A disabled model is removed from the project graph and is not executed. However, if an enabled model still references it through ref(), DBT raises a compilation error reporting that the model depends on a node that is disabled. This is not a circular dependency; the reference simply cannot be resolved.
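
For illustration, one common way a model ends up disabled is an enabled: false setting in dbt_project.yml. The fragment below is a minimal sketch; the project name my_project, the folder staging, and the model legacy_orders are hypothetical.

```yaml
# dbt_project.yml (fragment); all names here are hypothetical
models:
  my_project:
    staging:
      legacy_orders:
        +enabled: false   # removes the model from the project graph
```

If another enabled model contains {{ ref('legacy_orders') }}, the project stops compiling, which is the situation this article addresses.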

Understanding Selectors

Selectors are a DBT feature that lets you define named, reusable selection criteria in YAML. With a selector, you decide which models a command executes based on criteria such as tags, paths, or model names.

In the context of this article, we will focus on using selectors to handle models that you do not want to run but that are still referenced in other parts of the project. Instead of disabling such a model, you keep it enabled, tag it, and exclude the tag at run time.

Creating a `selectors.yml` File

To use selectors in DBT, create a selectors.yml file at the top level of your project directory, alongside dbt_project.yml. This file contains named selector definitions, each of which filters models based on one or more criteria.

Here’s an example of a selectors.yml file:

selectors:
  - name: my_project_with_tags_ignored
    definition:
      union:
        - method: fqn
          value: "*"
        - exclude:
            - method: tag
              value: dont_run

In this example, we define a selector named my_project_with_tags_ignored. This selector will run all models (using the wildcard *) except those tagged with the dont_run label.
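
As a variation, a selector can also carry a description and, in recent DBT versions, a default flag, so that commands without their own selection criteria fall back to it. The sketch below reuses the selector name and tag from the example above; check that your DBT version supports the default key before relying on it.

```yaml
# selectors.yml; same selector as above, with optional metadata added
selectors:
  - name: my_project_with_tags_ignored
    description: "All models except those tagged dont_run"
    default: true   # applied when a command passes no selection flags of its own
    definition:
      union:
        - method: fqn
          value: "*"
        - exclude:
            - method: tag
              value: dont_run
```

Whether a default selector suits a project depends on how its jobs are invoked; passing the selector explicitly keeps the behavior visible in the job definition.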

Modifying Your Models

To take advantage of selectors in DBT, you need to modify your models to include the necessary tags or attributes. For example:

{{ config(
    materialized='incremental',
    unique_key='some_unique_key',
    tags=["dont_run"],
) }}
...

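In this modified model, we added a tags attribute with the value "dont_run". The tag alone does not change how the model runs; it only marks the model so that the my_project_with_tags_ignored selector excludes it. Because the model stays enabled, references to it from other models still resolve at compile time.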
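
If several models need the same tag, it can also be applied to a whole folder from dbt_project.yml rather than in each model file. The fragment below is a sketch under assumed names: the project my_project and the folder legacy are hypothetical.

```yaml
# dbt_project.yml (fragment); project and folder names are hypothetical
models:
  my_project:
    legacy:
      +tags: ["dont_run"]   # every model under models/legacy receives this tag
```

Tags set this way are combined with any tags declared in a model's own config block, so both approaches can be mixed.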

Running Your Job

To run your job with the selector, use the following command:

dbt run --selector my_project_with_tags_ignored

This will execute all models except those tagged with the dont_run label.

Additional Selectors and Tagging

DBT provides various ways to create additional selectors and apply tags to your models. You can find more information in the node selection section of the DBT documentation.

By using these advanced features, you can further refine your selectors and ensure that only the models you want to run are executed.
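
As one example of such a refinement, a selector can start from a directory instead of the whole project and still exclude the tag. The sketch below assumes a hypothetical models/staging folder and a hypothetical selector name; it uses the form in which exclude is passed as an argument to a single method definition.

```yaml
# selectors.yml (additional entry); folder and selector name are hypothetical
selectors:
  - name: staging_without_dont_run
    definition:
      method: path
      value: models/staging      # only models under this directory
      exclude:
        - method: tag
          value: dont_run        # minus anything tagged dont_run
```

It would be invoked the same way as the earlier selector, by passing its name to the --selector flag.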

Conclusion

In this article, we explored the issue of disabled models in Data Build Tool (DBT) and provided guidance on how to use selectors to dynamically decide which models should be run. By creating a selectors.yml file and modifying your models to include necessary tags or attributes, you can ensure that only the desired models are executed.

Whether you’re working with large-scale data projects or need to optimize your workflow for smaller teams, using selectors in DBT can help streamline your development process and improve overall efficiency.

Last modified on 2024-08-30