— Written by Mikkel Dengsøe in Articles — 2/7/2023

The complete guide to building reliable data with dbt tests

10 practical steps from top data teams

We’re fortunate enough to speak to a large number of companies about their approach to building reliable data. In doing so, we’ve noticed common steps that the best teams take to get the most out of their dbt tests.

State of dbt testing in 2023

With the rise of data observability and ML-based anomaly detection, it’s tempting to think that manually written tests are a thing of the past and that the future is fully automated.

In our experience, this is far from true. We’ve spoken to 300 data teams, and less than 20% are making use of 5 or more of the best practice steps we’ve identified for building reliable data.

This highlights that most teams still have a big opportunity to build higher-quality data foundations with a more robust testing setup.

Building reliable data consistently ranks top of the list when data teams are asked about their priorities, so why have most data teams still not moved past a basic testing setup? Common answers we hear are:

  • Lack of time: The demands of creating new data models and meeting business needs take priority, and data quality initiatives get postponed
  • Not sure where to start: With hundreds or thousands of data models, it’s hard to know where to begin
  • Not enough experience: Many data people are early in their careers and haven’t yet worked at companies with high-quality data foundations they can use as a benchmark for what good looks like

We’ve identified ten steps that we’ve consistently noticed the best data teams take.

1. Use source freshness to capture stale data

Testing data freshness for sources is one of the most important steps to take to build reliable data and has several benefits.

#1. Know about stale data before stakeholders: Catch stale data from sources you’d otherwise have missed before stakeholders notice that data is missing in dashboards.

#2. Reduce debugging time: The further downstream you catch an issue, the greater the potential impact and the longer it takes to debug. By catching stale data at the source, you can spot and escalate issues more quickly and avoid a lengthy chain of working your way upstream through dependencies in the transformation layer.

#3. Visibility into the most problematic sources: Use the outcome of source freshness checks to look back and see which sources are most often stale. This can help you make the case to data producers to fix the issues that frequently cause problems for your team.

Practical steps

  • Add source freshness checks: Implement source freshness checks for your most important sources. If you don’t know where to start, see #4 in this guide
  • Define upstream owner: Define an owner in the meta owner tag for your sources, so you know who to go to when source data is stale. See #5 in this guide for more details
  • Check freshness from your ingestion tool: Add source freshness checks to all sources that come from your ingestion tool as a consistent way to catch issues proactively. For example, if you ingest data with Fivetran, use the _FIVETRAN_SYNCED column as the timestamp to detect freshness (see the sketch below)
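
As a minimal sketch, a source freshness check in sources.yml could look like this, assuming data synced by Fivetran (the source, schema, and table names are illustrative):

version: 2

sources:
  - name: payments                     # illustrative source name
    schema: fivetran_payments          # illustrative schema
    loaded_at_field: _FIVETRAN_SYNCED  # timestamp column written by Fivetran
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: transactions

Running dbt source freshness on a schedule will then warn or error whenever the latest _FIVETRAN_SYNCED timestamp is older than these thresholds.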

2. Extend your test suite with advanced tests

dbt has useful built-in tests, but for more advanced out-of-the-box tests, consider using a package such as dbt-expectations. This lets you catch issues that you’d be blind to with the standard dbt tests.

Here are a few of our favourite tests and how they can be used.

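For example (a minimal sketch, assuming the dbt-expectations package is installed via packages.yml; the model and column names are illustrative):

version: 2

models:
  - name: payments
    columns:
      - name: amount
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000
      - name: currency_code
        tests:
          - dbt_expectations.expect_column_value_lengths_to_equal:
              value: 3
      - name: customer_email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: ".+@.+"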

Practical steps

  • Add advanced tests: Review your most recent uncaught issues and see if they could have been caught by a test that exists in the dbt-expectations suite

3. Use regression tests to catch changes to historical data

Do you know the feeling when you’re updating an important presentation you made six months ago but can’t get the numbers to match up? Or if you work in a fintech data team and you have to resubmit data to regulators? Thousands of things could have gone wrong in the meantime, and there’s often no way of reproducing these. With regression tests, you can be notified of historical changes to your most critical metrics as they happen.

There are many ways to go about adding regression tests, ranging from advanced implementations, such as automatically creating snapshots, to more basic approaches, such as comparing data to a fixed point in time.

If you’re like most data teams, you’re likely not doing much here, and we recommend starting with the most basic version of regression tests on a handful of key metrics.

Here’s an example implementation from a data team for flagging when the sum of a column up to a given date changes, meaning that historical data has changed.

/*
Arguments:
- column_name: Picked up from the context it's used in
- model: Picked up from the context it's used in
- date: The date used for the historical data check
- date_column: Which column the date is in
- expected_sum: The value this column should total to on / up to the given date
- up_to_date: Boolean which indicates whether to sum up to and including (<=) the date, or on (=) just the given date
- rounded_to_decimal_places: Optional INT, checks the expected_sum is correct up to the given precision
*/

{%- test expected_sum_at_date(column_name, model, expected_sum, date, date_column, up_to_date, rounded_to_decimal_places) -%}

{# Flag if we want to sum to up and including the date (<=), or on the date (=) #}
{%- if up_to_date -%}
{%- set operator = '<=' -%}
{%- else -%}
{%- set operator = '=' -%}
{%- endif -%}

{# Check the expected_sum value up to the specified decimal places #}
{%- if rounded_to_decimal_places %}
{%- set aggregation = 'ROUND(CAST(SUM(' ~ column_name ~ ') AS NUMERIC), ' ~ rounded_to_decimal_places ~ ')' -%}
{%- else -%}
{%- set aggregation = 'CAST(SUM(' ~ column_name ~ ') AS NUMERIC)' -%}
{%- endif -%}

SELECT
{{ aggregation }} AS actual_sum

FROM {{ model }}
WHERE {{ date_column }} {{ operator }} '{{ date }}'
HAVING {{ aggregation }} != {{ expected_sum }}

{%- endtest -%}

Example of implementing the test on a column in schema.yml (the model and column names below are illustrative)

models:
  - name: revenue             # illustrative model name
    columns:
      - name: amount          # passed to the test as column_name
        tests:
          - expected_sum_at_date:
              expected_sum: 250000
              date: "2022-08-31"
              date_column: "day"
              up_to_date: True
              severity: error

Practical steps

  • Add regression tests: If your company has data where historical changes are critical (e.g. if you submit data to regulators), consider adding regression tests

4. Be explicit about testing your most critical data

Whether you have a few test failures per week or dozens per day, it’s important to be explicit about when an issue is important. This helps you prioritise and know when you should drop everything to fix it.

See the guide Designing Severity Levels for Data Issues

As a rule of thumb, data models that are critical either have a business-critical use case (e.g. used for board reporting) or have a lot of downstream dependencies (e.g. a core users table).

In the guide above we suggest three concrete steps for assessing the criticality of test errors:

#1. Think of data importance as a chain: It’s not enough to know whether an issue happens on a data asset with a business-critical use case. You should also know if the issue is on the critical path. Any issue on a data asset that sits upstream of business-critical data is on the critical path and should be treated as such.

#2. Severity should be managed across the stack: To confidently assess the severity of an issue, you need to look across the stack. For example, you may miss that a data issue impacts a dashboard that half the company uses if you don’t consider the impact on the BI layer.

#3. Automate your severity levels: Whenever you have a data issue, the last thing you want to do is manually go through all the steps above. Instead, it should be automated and built into your existing workflows. For example, if you receive alerts in Slack, they should clearly state the severity of the test error.

Practical steps

  • Tag your most critical data models: Start by using the dbt meta config to add a criticality: high tag to your most important data models, as shown in the sketch below
  • Be deliberate about how to act: Set clear guidelines for how test failures of different severity should be acted on. See Designing Severity Levels for Data Issues for concrete steps to get started
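
As a minimal sketch, tagging a critical model in its schema.yml could look like this (the model name is illustrative):

version: 2

models:
  - name: fct_payments       # illustrative model name
    meta:
      criticality: high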

5. Assign owners to make it clear who should deal with test errors

Perhaps you’re running a rota with a different person responsible for handling data failures each week. Or maybe your team is so large that you expect each domain to act on issues from data models they own. Either way, it’s a good idea to assign owners to your data models, so it’s clear who’s responsible for dealing with test errors.

See the guide Defining Ownership and Making it Actionable

We recommend that your most important data assets have an owner, that you use teams rather than individuals when defining owners, and that you integrate ownership with your alerting so the right people know about the issues they should care about and can take action.

Unfortunately, we’ve seen too many teams start out with ownership with good intentions but fail to put it into action or have any material impact. We suggest taking these steps to avoid this.

#1. Ownership should be a first-class citizen: Ownership should be integrated with your workflows. It should be built into your data alerts so it’s clear who should look at a test error. You should also be able to use ownership to understand data reliability for the data owned by different teams, so you can make data-driven decisions about where data quality is falling short.

#2. Ownership should be managed across the stack: Just like criticality, ownership should be managed across the stack. For example, you can add owners to your sources in dbt so it’s clear who to go to when there’s an upstream issue. Some of the most successful ownership implementations we’ve seen automatically notify data producers when there are issues on sources caused by systems they own.

Consider adding owners to your most important dashboards as well so you know who to notify when there’s an issue.

#3. Embedded or centralised data team? You may need a different approach: If you’re scaling beyond 10 data people, chances are you’ve moved to a world where at least some data people work embedded in the teams they support. While you may be able to get away without defining ownership in a centralised world, good luck chasing down who owns an issue in a 100-person, mostly embedded data team.

If you get to the point where multiple data people are embedded in each business area, we recommend you start routing alerts about test failures to individual Slack channels for each team.

Practical steps

  • Assign owners to your most important models: Use the owner tag in dbt meta to assign owners to critical models (see the sketch below). It can make sense to assign ownership at the team level (even in teams of one) for a setup that’s more resilient to people leaving or moving around
  • Integrate ownership with your alerts: Most companies use Slack to get alerts about failing data tests. Whichever tool you use, try to integrate ownership with your alerts. For example, tag the teams or individuals that own a model when one of its tests fails, or, if you have a larger data team, route alerts to individual Slack channels
  • All new models should have an owner: Establish a rule that each time someone creates a new data model, they assign an owner to it. If you want to be strict, you can use pre-commit-dbt to enforce it
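
As a minimal sketch, owners defined in dbt meta for a model and a source could look like this (the names and team handles are illustrative):

version: 2

models:
  - name: dim_users                   # illustrative model name
    meta:
      owner: "analytics-engineering"  # a team, not an individual

sources:
  - name: payments                    # illustrative source
    meta:
      owner: "payments-backend"       # upstream team that owns the producing system
    tables:
      - name: transactions

Your alerting tooling can then read the owner meta key to decide which team or Slack channel to notify.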

6. Continuously improve your test suite

Even if you have all the tests you could possibly imagine, from time to time, you learn about data issues from stakeholders. This happens, and sometimes there’s no way you could have known upfront. But if stakeholders spot the same issue several times, you risk losing their trust.

Common signs that it’s time to evolve your test suite:

#1. You just experienced a high-severity data outage: If you had a critical data issue, you should evaluate if it could have been caught earlier by a test.

#2. Your data models have gotten more complex: You may start out with a simple architecture, but as you grow, chances are you’ll have core models that most downstream models depend on. Expectations for this data and the quality of testing should increase as the downstream dependencies increase.

#3. You’re starting to see business-critical use cases: Perhaps you’re being asked to provide data to regulators, or data is being used for real-time decision-making in operational systems. If you’re doing good work, chances are that more people will want to build on top of your data. Your test suite should evolve as your data gets more business-critical use cases.

Practical steps

  • The two-time rule: Make a rule that every time your stakeholders spot an issue that wasn’t caught by a test, you implement a test that will catch it next time

7. Beware of long-failing tests

Long-failing data tests are akin to the broken windows theory. If you start ignoring a few failing tests, before you know it your team will get accustomed to the idea that tests are not that important, and you’ll start seeing data reliability deteriorate.

We’ve noticed three pitfalls that often lead to tests being ignored:

#1. Low signal to noise: Too often, test errors don’t represent actual errors in the data. For example, they may be due to a delayed pipeline, and by the time you get to the root cause of the test failure, it may have just worked itself out. Or tests may have been implemented on non-critical data assets without thinking the business case through.

#2. Tests not implemented in the right places: You have high test coverage, but data issues are almost always caught by end users directly. Issues the data team did not anticipate creep in and go undetected until a stakeholder starts questioning a dashboard. This can lead to the perception that data tests are unimportant because they don’t catch the real issues.

#3. Overload of test failures: Large data teams (50+ people) with thousands of data models often struggle with this. They have dozens or even hundreds of test failures on a given day, and knowing which ones to pay attention to is no easy task. This causes alert blindness and a culture where leaving a test failing is considered acceptable.

Practical steps

  • Fix broken tests sprint: Set aside a Friday each month, sort your tests by how long they’ve been failing, and make it a goal to fix the issues so that no test has been failing for more than five consecutive days
  • Delete unnecessary tests: If a test has been failing for weeks and you haven’t addressed it, it’s probably not important, and you can delete it

8. Beware of known issues that deteriorate

You may have a test failure on an important data model, but you know it’s safe to ignore as it’s just two unimportant mismatched employee IDs. But what happens if the test starts to deteriorate and you start seeing hundreds of failing rows? You’d risk missing that until a stakeholder notifies you that something looks off.

A better approach is to explicitly monitor the data tests that you know are failing. You can tag these tests with a label such as “under-investigation” in dbt, exclude them from the daily test run, and instead run them less frequently to keep track of how many rows are impacted over time.
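
As a minimal sketch, the tag could be added to the failing test in schema.yml and used in the dbt CLI (the model, column, and tag names are illustrative):

version: 2

models:
  - name: dim_employees
    columns:
      - name: employee_id
        tests:
          - unique:
              config:
                tags: ["under-investigation"]

# Exclude known-failing tests from the daily run
dbt test --exclude tag:under-investigation

# Run them on their own, e.g. weekly, to track how many rows fail over time
dbt test --select tag:under-investigation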

We’ve seen too many data teams deal with this in an ad-hoc way, most often by introducing a where clause to remove known issues.

This works in the short term and fixes an immediate issue, but as you start seeing dozens of these in your code base, you’re on a slippery path. Chances are you’ll forget about them, and your codebase will be full of logic that only one person understands.

Practical steps

  • Monitor the impact of failing tests: If you have a failing test that you’re well aware of, occasionally check that the number of rows failing is not deteriorating
  • Make monitoring known issues explicit: Consider adding a tag in dbt so you can monitor tests with known issues

9. Set clear guidelines and expectations

Too often, expectations for what good testing looks like are left up to an individual’s interpretation. As the data team grows, this leads to situations where each person or team develops their own definition of what good looks like, irrespective of the criticality of the underlying data.

When we speak with data teams, we frequently see them starting with good intentions, but as the team grows, the quality of data tests starts to deteriorate.

There’s often a good reason for this. As the data team grows, some areas may be required to cut corners to meet crunch-time deadlines while other teams have more time to focus on the foundations. That’s unavoidable, but in these situations you should be explicit that standards are temporarily being relaxed and that the intention is to catch up in these areas later.

A good way of supporting this is by having a centralised and embedded analytics engineering team that can jump in and out of areas and ensure that the data quality is consistent and up to expectations.

Practical steps

  • Document testing expectations: Write down your dbt testing expectations. For example, make it clear that important data models must have at least unique and not null tests. Or make a rule that data models upstream of critical dashboards should all have tests. Make this document mandatory reading for new joiners
  • Enforce it: While not a catch-all solution, you can use an open-source package such as pre-commit-dbt to enforce specific patterns for data models (see the sketch below). For example, add a requirement for new models to have unique and not_null tests, or check that certain meta tags have been defined
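
As a sketch, a .pre-commit-config.yaml using pre-commit-dbt could enforce a minimum number of tests and an owner meta key per model; the hook ids come from the pre-commit-dbt package, while the rev shown is an assumption and should be pinned to the latest release:

repos:
  - repo: https://github.com/offbi/pre-commit-dbt
    rev: v1.0.0    # assumption: pin to the latest released version
    hooks:
      - id: check-model-has-tests
        args: ["--test-cnt", "2", "--"]       # require at least two tests per model
      - id: check-model-has-meta-keys
        args: ["--meta-keys", "owner", "--"]  # require an owner meta key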

10. Take time to reflect on the big picture

If you’re like most data teams, when asked “what’s our data quality?”, you find it hard to give a concise answer. Once every three months, set a few hours aside to look back. Get a back-of-the-envelope sense of the number of test failures, how many issues were caught by stakeholders, and how data testing has improved in key areas. Share this with your key stakeholders.

Practical steps

  • Know your data: Have a few KPIs about how many errors you had, what the impact was and how long they took to resolve
  • Share accountability: In some cases, you’ll see a disproportionate number of data issues coming from a few upstream teams. Share this with those teams to showcase the strain they put on the data team and downstream consumers