Testing in dbt

Piotr Mierzejewski

19 May 2025, 5 min read

What's inside

Introduction

Singular tests

Hands-On: Implementing Singular Tests

Conclusion

Introduction

We continue our exploration of testing in dbt by building an example project, introducing various testing concepts, and explaining their relevance as we progress. Previously, we developed more complex logic and secured it with unit tests.

Now, we’ll take a step back to evaluate what we've built and determine whether there's a need for singular tests. In this stage, we'll introduce singular tests, explore their role in a broader testing strategy, and identify where they fit within our project.

Singular tests

What are singular tests?

So far, we've used unit tests to validate our logic and built-in tests to ensure data integrity. Now, let's explore another important category we haven't discussed yet: singular tests. But what exactly are they?

Unlike unit tests, which verify the correctness of logic, singular tests are assertions applied directly to the dataset. They help maintain data quality by detecting anomalies, inconsistencies, or rule violations within actual records. Singular tests are written as SQL queries designed to identify potential issues—if the query returns no records, the validation passes, ensuring the data meets the expected criteria.

When to use singular tests and when not to use them?

Singular tests are best reserved for custom data-quality checks that built-in tests can’t handle. Before writing one, ask yourself two questions:

Can a built-in test cover this case?
If the answer is yes, use the built-in version, it’s simpler to maintain.
Does a similar singular test already exist in the project?
If so, consider turning that test into a reusable, parameterized macro rather than creating another near-duplicate. In short, singular tests are a precision tool for validating unique assumptions that matter to your analytics pipeline. Use them only when no built-in or reusable alternative will do.

Hands-On: Implementing Singular Tests

Now it’s time to get hands-on. Our project is still small, and singular tests are a precision tool in a data engineer’s toolbox, but that doesn’t mean we can’t find the right spot to use one. Let’s scan the pipeline!

Staging layer

In the staging layer we have a single model, stg_fda__ndc.sql, whose job is a straightforward 1-to-1 mapping of the raw FDA file. Because no business logic is applied here, most quality checks can be handled by dbt’s built-in tests. Yet even a “pass-through” model can hide impossible values that would break downstream transformations. One such risk is a malformed NDC code. A simple singular test lets us detect the issue early, before intermediate models, whose logic relies heavily on the NDC structure, start failing.

select product_ndc
from {{ ref("stg_fda__ndc") }}
where
    position('-' in product_ndc) = 0 -- no hyphen
    -- left part 4–5 digits
    or length(split_part(product_ndc, '-', 1)) not in (4, 5)
    -- right part 3–5 digits
    or length(split_part(product_ndc, '-', 2)) not in (3, 4, 5)
    -- total length check
    or length(product_ndc) not in (9, 10, 11);

Intermediate

The intermediate layer is fertile ground for testing because it concentrates the bulk of business logic—joins, deduplication, rule-based cleaning, and evolving requirements. Singular tests can serve as an extra safety net against regressions, but only when the rule cannot be captured by a built-in assertion or a focused unit test.

At this point, our most sophisticated intermediate model is int_ndc_labelers_deduplicated. Its job is to standardise labeler names and remove duplicates. The cleansing regexes behind that process are already protected by unit tests, so there’s no need to re-check the same logic with a singular test.
Instead, we should add a singular test that validates the business invariant underpinning later layers: every labeler_code must map to exactly one canonical labeler_name. If a single code still appears under multiple cleaned names, downstream metrics will double-count products and erode trust.

select
    labeler_code,
    count(distinct labeler_name) as name_count
from {{ ref('int_ndc_labelers_deduplicated') }}
    group by labeler_code
having count(distinct labeler_name) > 1

Marts

Just like the intermediate layer, the marts layer needs solid test coverage—perhaps more than anywhere else—because it feeds dashboards and executive reports. If bad data slips through here, two things usually follow:

Silent impact. Problems may go unnoticed for days or weeks, driving flawed analyses and poor business decisions.
Fire-fighting scramble. Once the issue surfaces, teams scramble to trace the root cause. Without clear, automated tests this is slow and draining.

Put simply: catch the error in mart-level tests, not on a VP’s slide. Robust singular tests at this stage act as a final safety net, preventing costly downstream damage.

Going back to the project, the mart layer is still in infant stage. The only mart_labeler_summary aggregates three intermediate models to produce labeler-level metrics about the total_drugs, generic_drugs_count and distinct_active_ingredients used, produced by one labeler. The example of the test that could be added here are:

No negative counts - that guards against bad CASTs, subtraction logic, or join-duplication that accidentally flips counts negative or have not drugs assosiated.

select *
from {{ ref('mart_labeler_summary') }}
where
    total_drugs=< 0
    OR
    generic_drugs_count < 0
    OR
    distinct_active_ingredients < 0

Every labeler present - surfaces labelers that vanished because of mis-joins or filters.

select dm.product_id
from {{ ref('int_ndc_labelers_deduplicated') }} as dm
left join {{ ref('mart_labeler_summary') }} as ms
    on dm.labeler_name = ms.labeler_name
where ms.labeler_name is null

Conclusion

In this article, we introduced singular tests, a precision tool in the analytics engineering toolbox, that lets you validate custom business rules when built-in tests fall short. We walked through every layer of our FDA project, pinpointed the spots where singular tests add value, and showed how to implement them in practice.

If this is the first article you’ve read, be sure to check out the earlier parts of the Testing in dbt series, where we covered built-in, contract and unit tests and set the foundation for today’s examples.

We’ll keep extending the open-source FDA repo in upcoming articles, diving into other parts of data platform that you can use in your project.

Testing in dbt - part 3

Piotr Mierzejewski

What's inside

Introduction

Singular tests

What are singular tests?

When to use singular tests and when not to use them?

Hands-On: Implementing Singular Tests

Staging layer

Intermediate

Marts

Conclusion

Recent posts

Why data engineers don’t test - according to Reddit

Modern Data Stack with Airflow and dbt - going into the cloud (part 2)

Modern Data Stack with Airflow and dbt - starting simple (part 1)

Why data engineers don’t test - according to Reddit

Modern Data Stack with Airflow and dbt - going into the cloud (part 2)

Modern Data Stack with Airflow and dbt - starting simple (part 1)

Let's talk