
Building AI Agents for Data Engineering

Published on June 24, 2025

The field of AI agents has, in just a few months, evolved from a futuristic concept to a tangible reality. Stable tool calling has enabled LLMs to interact effectively with external systems and APIs, and substantial improvements in reasoning capabilities have opened the way for their practical application.

We are excited to bring AI agents to Data Observability, capitalizing on the momentum behind this technology at a moment when data observability is poised for a fundamental shift.

It will evolve from a passive platform to an active one, taking a larger role in the day-to-day operations of data platforms. 

It's uniquely positioned to do so, with the most complete view of what is happening, based on rich metadata collected from multiple sources — including orchestrators, transformations, code repositories, incident management, data platforms, downstream tools, and more.

Starting with Data SRE

We started by analyzing the entire data observability landscape and identified several use cases where AI agents can truly transform existing workflows. While I see potential to reinvent almost any part of data observability, there were two particular problems we kept hearing about from virtually every data team:

  1. Teams are overwhelmed by alerts and struggle to triage them effectively.
  2. It’s often tough to design proper testing and monitoring for increasingly complex data systems.

While these are two separate problems, they share many commonalities: both are part of a broader development life cycle focused on preventing and resolving issues and reducing production downtime.

This is where we started. 

Going forward, AI systems can advise on the optimal setup of tests and monitors to prevent problems, assist in diagnosing issues that arise, and close the loop by recommending relevant follow-up tests.

The closest term that comes to mind, combining these two disciplines, is the practice of site reliability engineering (SRE), which is already well established in software engineering. 

Building autonomous SRE

One of the crucial decisions we made when designing the SYNQ AI agent was how it should interact with the user. A chatbot was the most obvious choice, but we ruled it out, at least as the primary way to interact with Scout AI, for a couple of reasons.

Chatbots require the user to initiate the workflow. This works well for information-retrieval use cases, such as exploring ideas or learning about a topic, where ChatGPT or Anthropic's Claude serve exceptionally well. However, chatbots also leave a significant amount of work to the user. To interact with a bot effectively, the end user has to acquire a set of skills, such as sound prompt engineering. This is already hard with general-purpose LLMs and becomes even more complicated for domain-specific solutions, where the user must understand not only the underlying LLM but also the system prompts of the agent itself. Finally, chatbots are limited to single-turn operations in the prompt-response model, which means the user must drive the entire workflow.

To address some of the limitations of chatbot interactions, we explored assistant agents. Assistants like Cursor Composer or Claude Code are widely adopted in software engineering, where an engineer offloads entire problems, such as refactoring code, implementing a feature, or fixing a bug, to the assistant. The agent performs multiple tasks, primarily through tool calling, thereby automating a significant portion of the workflow. This is a considerable improvement, and the concept has been proven in software engineering; however, we felt it isn't a great fit for data observability. Software engineering, at least today, is a workflow that requires the user to be entirely focused on the task at hand. While the coding agent can take some of the work away, the engineer stays in the flow with the agent to correct it, add more context, and iterate.

This brings us to the approach we've settled on: an autonomous agent. For a firm reference, consider another example of such an agent entering the market: OpenAI Codex. If you work in software engineering and haven't had a chance to experiment with it, I highly recommend it. It completely changes the software engineering workflow: it starts with a prompt (task) and context (codebase), takes the entire process “offline”, iterates on its own, and notifies the developer once the solution is done.

This concept is a perfect fit for data observability because it enables another fundamental shift: the SYNQ agent workflow is entirely independent of the user. It's triggered by events detected in the underlying data platform, which makes the agent reactive and autonomous and significantly reduces the work required from the user. If a new issue appears in the system, we send an alert to the team and immediately assign an agent to investigate the problem and identify the root cause. Such an agent takes the entire workflow away from the user and returns with actionable information, without requiring the user to take any specific action.
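To make the shape of this concrete, here is a minimal sketch of an event-triggered dispatch. All names are hypothetical; this is an illustration of the pattern rather than SYNQ's actual implementation: an issue event alerts the team and immediately hands the investigation to an agent.

```python
from dataclasses import dataclass

# Hypothetical event payload; field names are illustrative only.
@dataclass
class IssueEvent:
    issue_id: str
    model: str        # e.g. the dbt model where the anomaly was detected
    severity: str

def notify_team(event: IssueEvent) -> None:
    # Placeholder for a Slack / incident-management integration.
    print(f"[alert] {event.severity}: issue {event.issue_id} on {event.model}")

def investigate_root_cause(event: IssueEvent) -> str:
    # Placeholder for the autonomous root cause analysis described below.
    return f"Root cause report for issue {event.issue_id}"

def handle_issue_event(event: IssueEvent) -> str:
    """Triggered by the data platform, not by a user prompt."""
    notify_team(event)                    # humans are informed immediately...
    return investigate_root_cause(event)  # ...while the agent starts digging on its own

print(handle_issue_event(IssueEvent("iss-42", "orders_daily", "high")))
```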

We believe data and analytics engineers shouldn’t have to be logged into observability platforms. We also think sending raw alerts will soon be a thing of the past. 

Instead, we see a fleet of agents working in the background, powered by observability metadata, automating entire workflows and coming back to the user with recommendations and actions that save vast amounts of time and increase the robustness of the platforms. 

Humans rarely have the opportunity to review 100% of alerts or proactively consider improving the robustness of their data platform on a daily basis. Autonomous data agents do.

Scout phase #1 — Autonomous Issue Root Cause Analysis

Diagnosing data issues is a complex topic that requires a lot of context. This is why an agent rooted in data observability has so much potential: an observability platform works on the premise of collecting metadata from a wide range of platforms and modalities. At SYNQ, this ranges from the definition of data products, ownership, prioritization, and use cases, through all aspects of code (whether SQL, raw dbt, or SQLMesh, including commits and pull requests), down to fine-grained details such as query logs. The SRE agent leverages this data to gradually explore the system's state and potential root causes.

> Data SRE acts like an engineer in the early days of setting up a data platform, who has everything in their head. As platforms and teams grow, no single person can hold that picture anymore. This is not the case for computers, which now have effectively unlimited storage and an increasingly powerful ability to evaluate and reason about this information.

Our first agent, focused on root cause analysis, was therefore a firm testing ground for several key concepts. 

Connecting the agent with observability

From the beginning, it was clear that Scout AI would not be uniquely differentiated by a set of prompts, which are easily replicable. Instead, we focused heavily on the context that agents can gather from the wide range of information already organised in the SYNQ platform. This led to the development of two dozen specialised tools that we can use in agent workflows. An agent can explore lineage, recent commits on relevant models and their upstream dependencies, review the history of issues or incidents and how they were resolved, and assess the impact on schemas or data products. All this data is provided by the SYNQ observability platform via APIs. This is where we made a crucial decision that has served us well to this day:

We are building agent tools abstracted from the underlying APIs. Contrary to patterns popularized by MCP servers, where relatively low-level APIs are also exposed directly to LLMs, we have created a layer of tools that abstract this away. Each tool exposes a carefully designed and structured interface to the LLM model. Under the hood, it calls one or multiple SYNQ APIs and applies logic that might normalize, join, or otherwise process raw API payloads.

We see this as significant for several reasons:

  • APIs are not always designed for LLM agents, which may need specific actions in the context of particular workflows
  • APIs are often too heavy; passing unnecessary information to the LLM can confuse it and generates unnecessary cost
  • Tools allow us to efficiently trace declarative and probabilistic execution (more on that below)
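As a rough illustration of this tool layer (the endpoint paths, field names, and tool schema below are assumptions for the sketch, not SYNQ's actual API), a single tool can call several endpoints, join and trim the results, and hand the model only a compact, structured payload:

```python
import requests

API_BASE = "https://api.example.com"  # hypothetical base URL, not a real SYNQ endpoint

def _get(path: str, **params) -> dict:
    resp = requests.get(f"{API_BASE}{path}", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

def get_issue_context(issue_id: str) -> dict:
    """Tool exposed to the LLM: one structured call instead of raw API access.

    Under the hood it calls several endpoints, joins the results, and drops
    fields the model does not need, keeping the payload small and focused.
    """
    issue = _get(f"/issues/{issue_id}")
    lineage = _get("/lineage", entity=issue["entity_id"], direction="upstream")
    commits = _get("/commits", entity=issue["entity_id"], limit=5)
    return {
        "issue": {k: issue[k] for k in ("id", "title", "status", "entity_id")},
        "upstream_entities": [node["name"] for node in lineage["nodes"]],
        "recent_commits": [
            {"sha": c["sha"][:8], "message": c["message"]} for c in commits["items"]
        ],
    }

# JSON-schema style definition handed to the LLM for tool calling.
GET_ISSUE_CONTEXT_TOOL = {
    "name": "get_issue_context",
    "description": "Fetch a compact summary of an issue, its upstream lineage, "
                   "and recent commits touching the affected entity.",
    "parameters": {
        "type": "object",
        "properties": {"issue_id": {"type": "string"}},
        "required": ["issue_id"],
    },
}
```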

Besides tools designed to fetch metadata context, we have also empowered the agent with the ability to query the actual data. This is a fundamental departure from traditional observability: we no longer work primarily with metadata; instead, we encourage the agent to run queries to gain a deeper understanding. This ranges from profiling statistics about columns to understand their content, to sampling row-level data and running segmentation queries that might reveal underlying patterns. We believe this functionality is essential, as it dramatically increases the agent's ability to diagnose issues like a real data engineer.
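A minimal sketch of what such a data-facing tool surface might look like, expressed as function-calling definitions handed to the model (tool names and parameters are illustrative assumptions, not the actual Scout AI interface):

```python
# Tool definitions the agent can call to look at actual data, not just metadata.
DATA_TOOLS = [
    {
        "name": "profile_column",
        "description": "Summary statistics for a column: null rate, distinct "
                       "count, min/max, most frequent values.",
        "parameters": {
            "type": "object",
            "properties": {
                "table": {"type": "string"},
                "column": {"type": "string"},
            },
            "required": ["table", "column"],
        },
    },
    {
        "name": "sample_rows",
        "description": "Return a small sample of rows, optionally filtered.",
        "parameters": {
            "type": "object",
            "properties": {
                "table": {"type": "string"},
                "where": {"type": "string"},
                "limit": {"type": "integer", "default": 20},
            },
            "required": ["table"],
        },
    },
    {
        "name": "segment_table",
        "description": "Break a metric down by a dimension to reveal which "
                       "segment drives an anomaly.",
        "parameters": {
            "type": "object",
            "properties": {
                "table": {"type": "string"},
                "metric": {"type": "string"},
                "dimension": {"type": "string"},
            },
            "required": ["table", "metric", "dimension"],
        },
    },
]
```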

Balancing declarative and probabilistic execution

When designing the agent workflow, it's essential to strike a balance between a declarative, prescriptive workflow and probabilistic actions taken by the LLM. I can demonstrate this with two extremes.

At one extreme, we could describe the agent workflow completely declaratively. For an agent focused on diagnosing data issues, the workflow would be something like this:

  1. Review history and find any relevant issue or incident
  2. Review recent code changes that might be related to the issue
  3. Replicate the issue from a data perspective to understand the data
  4. Conclude what happened

Such a workflow can be coded as a sequence of API calls with several if-else statements, even without the use of an LLM. But such a workflow is very static and would likely work only in a narrow set of contexts.
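For illustration, the fully declarative extreme might look like the following sketch; all function names are placeholders standing in for deterministic API calls:

```python
# Placeholder helpers standing in for deterministic API calls.
def find_related_incidents(issue_id: str) -> list[str]:
    return []          # e.g. look up past incidents on the same model

def recent_code_changes(issue_id: str) -> list[str]:
    return []          # e.g. commits or PRs touching upstream models

def replicate_from_data(issue_id: str) -> str:
    return "no obvious data-level anomaly found"

def diagnose_issue_declaratively(issue_id: str) -> str:
    """A fixed sequence of steps with a few if-else branches and no LLM anywhere."""
    history = find_related_incidents(issue_id)
    if history:
        return f"Likely a recurrence of incident {history[0]}"

    changes = recent_code_changes(issue_id)
    if changes:
        return f"Probably caused by recent change {changes[0]}"

    return f"Data-level finding: {replicate_from_data(issue_id)}"
```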

At the other extreme, we could go completely agent-based: provide only a rough system prompt to the LLM agent, which then independently decides the sequence of steps to execute, effectively looping through tool calls. This offloads the discovery of the correct workflow entirely to the LLM; you could facilitate such a workflow by exposing an MCP server. I am skeptical that this approach would work well, because the LLM agent must make every decision. When an issue arrives, it needs to choose to call the get_issue tool to fetch information about the problem. And in the middle of the workflow, it theoretically has the opportunity to take actions that make no sense at all. This is good for the LLM provider, who will gladly watch your agent explore plenty of irrelevant actions, but it's not good for you: it reduces the stability of the agent and generates more cost.
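The opposite extreme is the familiar open-ended tool loop, sketched below with a stubbed model call rather than a real LLM client; the point is only that every next step is the model's choice:

```python
import json

def call_llm(messages: list, tools: list) -> dict:
    """Stub for a chat completion with tool calling enabled.
    Returns either {"tool": ..., "args": {...}} or {"final": ...}."""
    return {"final": "stubbed conclusion"}

def run_tool(name: str, args: dict) -> str:
    """Stub tool dispatcher."""
    return f"result of {name}({args})"

def fully_agentic_diagnosis(issue_id: str, tools: list, max_steps: int = 20) -> str:
    """The model alone decides every step; nothing prevents it from calling
    irrelevant tools, so stability and cost depend entirely on its judgment."""
    messages = [{"role": "user", "content": f"Diagnose issue {issue_id}."}]
    for _ in range(max_steps):
        decision = call_llm(messages, tools)
        if "final" in decision:
            return decision["final"]
        result = run_tool(decision["tool"], decision["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "No conclusion within the step budget."
```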

As you might expect, none of the above extremes is ideal. Instead, the best solution is in the middle, where we balance probabilistic flow (LLM decisions) with deterministic code execution (either predefined workflow steps or code that aggregates data within tools).

Not allowing the LLM agent to be fully agentic is also an excellent opportunity to layer strong domain expertise into the agent. It also removes room for error and eliminates workflow flexibility that isn't needed.

We've settled on a combination of four concepts:

  1. At the core of the agent is a workflow. It begins by retrieving all open issues, incidents, and their relationships to each other in terms of space (lineage) and time. In the next step, the agent determines the sequence of problems to diagnose, and the workflow collects a large amount of relevant information for each issue, including details about the issue and pertinent historical issues.
  2. We plug in the LLM as a decision maker with constraints. For example, after the initial assessment of open issues and incidents, the workflow prompts the agent to indicate where it would like to start. Execution thus transitions between deterministically sequenced steps and probabilistic decisions. Such decisions are not coded as agents but as more straightforward LLM completions, which also makes them more testable and easier to reason about.
  3. In specific parts of the workflow, we leave room for the agent to explore fully. In the context of issue diagnosis, this occurs, for example, when the agent decides to diagnose a specific issue and generates an initial hypothesis, such as that the problem originates from the source. At this point, it would be very inconvenient to outline deterministic steps on how to proceed; instead, the agent has a set of specialised tools to query data and follow its hypothesis iteratively. It can segment data in a table, examine lineage and code, and proceed upstream, repeating this process until it concludes. The LLM entirely drives this part of the process until a conclusion can be made, at which point control returns to the workflow.
  4. Agent tools are the essential units of work; under the hood, each tool encapsulates specific actions, as outlined above.

As a result, we have built an ecosystem that lets us experiment effectively with where the logic lives, shifting it between the workflow, agents, LLM calls, and tools.
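A compressed sketch of how these four concepts compose (the structure and names are illustrative, not the actual Scout AI implementation): a deterministic backbone sequences the work, a constrained LLM completion picks the next issue, and a bounded agentic loop explores the chosen hypothesis.

```python
def choose_next_issue(open_issues: list[dict]) -> dict:
    """Concept 2: a constrained LLM decision. The model is asked only to pick
    one of the open issues, not to plan the whole workflow. Stubbed here."""
    return open_issues[0]

def explore_hypothesis(issue: dict) -> str:
    """Concept 3: a bounded agentic loop using specialised tools (concept 4)
    to segment data, inspect lineage and code, and walk upstream. Stubbed here."""
    return f"Likely upstream source change behind issue {issue['id']}"

def run_root_cause_workflow(open_issues: list[dict]) -> list[str]:
    """Concept 1: the deterministic backbone sequencing everything."""
    conclusions = []
    remaining = list(open_issues)
    while remaining:
        issue = choose_next_issue(remaining)           # probabilistic decision point
        remaining.remove(issue)
        conclusions.append(explore_hypothesis(issue))  # open-ended exploration
    return conclusions

print(run_root_cause_workflow([{"id": "iss-1"}, {"id": "iss-2"}]))
```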

Secure way to execute SQL queries

One of the crucial topics we had to address from the ground up was security, mainly because agents can now execute queries against the customer's data platform. This is a significant departure from the minimal and straightforward queries typically executed by an observability tool. To tackle this challenge, we designed SYNQ AI from the beginning to be completely independent of our Cloud, which allows us to run Scout AI both inside our cloud offering and within client infrastructure, effectively cutting ourselves out of the flow of queries and raw customer data.

This concept is further supported by how we designed the agent tools that explore data. SQL is a relatively simple declarative language, yet numerous examples and research studies show that LLMs still struggle to write SQL in complex situations, which is part of what drives the popularity of semantic layers that abstract away some of the detail.

However, perhaps even more importantly, we didn’t want to generate raw SQL queries directly from the LLM, as we believe such an approach is fragile (with too much room for error) and insecure (the agent can generate any code). 

We've therefore decided to build a set of specialised tools that encapsulate SQL generation in deterministic code. The LLM chooses which tool to use (profiler, segmentation, or sampling) and provides the necessary parameters. The tool converts those parameters into SQL aligned with the customer's platform dialect, eliminating room for error and ensuring consistency. Such queries are much easier to audit and secure, and they are more reliable.
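A simplified sketch of the idea, with dialect handling reduced to a toy rule (a real implementation would need proper identifier quoting, validation, and per-warehouse dialects):

```python
def build_segmentation_query(table: str, metric: str, dimension: str,
                             dialect: str = "snowflake", limit: int = 50) -> str:
    """Deterministically turn tool parameters into SQL. The LLM only supplies
    the parameters; it never writes SQL itself."""
    # Minimal allow-listing: identifiers must be plain names, not expressions.
    for ident in (table, metric, dimension):
        if not ident.replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"Invalid identifier: {ident!r}")

    quote = '"' if dialect in ("snowflake", "postgres") else "`"  # toy dialect rule
    dim = f"{quote}{dimension}{quote}"
    return (
        f"SELECT {dim}, COUNT(*) AS row_count, SUM({quote}{metric}{quote}) AS total\n"
        f"FROM {table}\n"
        f"GROUP BY {dim}\n"
        f"ORDER BY total DESC\n"
        f"LIMIT {limit}"
    )

print(build_segmentation_query("analytics.orders", "revenue", "country"))
```

Because the SQL text is assembled by ordinary code from a small, validated parameter set, the queries the agent can run are bounded, consistent across runs, and straightforward to audit.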

As we explore agent workflows further, we are experimenting with several deeper concepts to enhance our agent's ability to diagnose issues. Our current focus is advanced hypothesis-driven exploration, which shows promising signs of enabling the agent to debug more complex problems through multi-step analyses. We are also expanding context beyond traditional observability by ingesting additional data sources, such as PR reviews and comments, which is pushing the agent's ability to debug issues toward that of senior engineers with a good understanding of the entire platform.

More on that next time.
