
How LLMs are changing system architecture: from simple data pipelines to intelligent autonomous agents

At every technical conference lately, the word "agents" is sure to come up. They are presented in different ways: as the next step after RAG, as a silver bullet for every problem, as a wholesale replacement for classic pipelines. And anyone not yet using agents is, supposedly, hopelessly behind the times.

But is this really the case? This article is inspired by Anthropic's vision of applying LLMs to processes and building agents, so let's try to sort it all out.

Let's talk about Data Pipelines, LLM Workflows, and LLM Agents, and also compare them with each other.

Anthropic is always worth reading: the company is at the forefront of LLM development, and its tech reports often set the direction for the entire industry.

What are LLM agents and when to use them

There is currently some confusion around the definition of "agents". Some reserve the word for fully autonomous systems that operate independently for long periods and use external tools to perform tasks. Others apply it to a simple sequence of LLM calls with different prompts. And if you really need a fancy line on your resume and new stripes on your epaulettes, then any process where an LLM was crudely bolted on, needed or not, becomes an agent.

Anthropic highlights two main architectural approaches:

Workflows — systems where LLMs and tools are orchestrated through predefined paths

Agents — systems where LLMs dynamically direct their own processes and tool usage

Since this article is not just a translation but a broader comparison, I will add one more anglicism, Data Pipelines. In its strict meaning it does not quite fit the company of the terms above, but here it will stand for the classic algorithms and processes we are used to, with no LLM in them at all.

Comparison of approaches and how to choose them

When developing applications, you should always look for the simplest and most reliable solution, and add complexity only when necessary. "Simplicity" here means predictable behavior, low resource consumption, good manageability, and maintainability. By these criteria, agents, whatever is meant by the term, are automatically a poor default choice.

Agent systems do work better in certain situations, but they are slower and more expensive. It is therefore important to understand whether such complexity is really needed: agents pay off only where behavior must adapt and flexibly adjust to the situation.

A small example:
Take a typical retail process: every hour we collect data from the cash registers, calculate revenue by category, look for anomalies, and send a report. Can we add an LLM agent here for "smart" root-cause analysis and written recommendations? Of course. But is it necessary? That is a good question, and, as they say, it comes down to a big trade-off between cost, responsibility, work for work's sake, and common sense.
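For contrast, here is a minimal sketch of that hourly process as a classic, LLM-free pipeline (the function names and the anomaly rule are hypothetical, chosen only to illustrate the shape of the solution):

```python
from statistics import mean, stdev

def detect_anomalies(revenue_by_category: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Flag categories whose revenue deviates strongly from the mean."""
    values = list(revenue_by_category.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [cat for cat, v in revenue_by_category.items() if abs(v - mu) / sigma > threshold]

def run_hourly_pipeline(sales: list[dict]) -> None:
    # 1. Aggregate revenue by category.
    revenue: dict[str, float] = {}
    for sale in sales:
        revenue[sale["category"]] = revenue.get(sale["category"], 0.0) + sale["amount"]
    # 2. Look for anomalies with a plain statistical rule -- no LLM required.
    anomalies = detect_anomalies(revenue)
    # 3. "Send" the report (stubbed out as a print here).
    print({"revenue": revenue, "anomalies": anomalies})

run_hourly_pipeline([
    {"category": "dairy", "amount": 120.0},
    {"category": "bakery", "amount": 80.0},
])
```

Every step here is predictable, cheap, and testable; an LLM would add value only in an optional "explain and recommend" step on top.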

Do not expect agents, or LLMs in general, to instantly solve every task: they are tools. "An LLM agent will come and put things in order" is a myth; it does not eliminate the need for quality work with your data.

And there is one more important point.

As soon as an LLM appears in our chain (it doesn't matter which one: distilled or quantized, specialized or not), it immediately requires protection mechanisms, so-called guardrails, on both input and output, because the model may produce something we would not want. And it's good if it's just a hallucination, and not a leak of someone's personal data or strange notions of beauty picked up from someone else's pretrain.
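As a toy illustration of the idea (a deliberately naive sketch, not a production guardrail: real systems use dedicated classifiers and policy checks on both sides), a wrapper around an arbitrary LLM call might look like this:

```python
import re

def input_guardrail(user_text: str) -> str:
    """Reject or sanitize obviously problematic input before it reaches the model."""
    if len(user_text) > 10_000:
        raise ValueError("input too long")
    return user_text

def output_guardrail(model_text: str) -> str:
    """Naive check on the way out: a 16-digit run that could be a card number."""
    if re.search(r"\b\d{16}\b", model_text):
        return "[response withheld: possible sensitive data]"
    return model_text

def guarded_call(prompt: str, llm) -> str:
    # `llm` is any callable str -> str wrapping your model of choice.
    return output_guardrail(llm(input_guardrail(prompt)))
```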

Below is a comparison table of the three approaches (without LLM, with LLM, and agents) and the places where each fits best. After that comes a loose (not literal) author's translation of Anthropic's vision of architectures built around LLMs.

Comparison of Data Pipelines, Workflows, and Agents

| Characteristic | Data Pipelines | LLM Workflows | LLM Agents |
| --- | --- | --- | --- |
| Execution Flow | Predefined and linear | Predefined steps with possible conditions | Dynamic, determined by the model itself |
| Decision Making | Based on strict rules and conditions | Based on predefined rules and LLM | Autonomous, based on context and goal |
| Error Handling | Queues, retries, try/catch | Predefined error handlers | Self-recovery and finding alternative paths |
| Scaling | Horizontal, through parallelism | Vertical (more powerful models) and horizontal | Primarily vertical (improving agent capabilities) |
| Monitoring | Performance metrics | Response quality metrics, step execution time | Autonomy metrics, decision quality, resource usage |
| Predictability | Highest | Medium | Low |
| Flexibility | Low (requires rewriting when changes occur) | Medium (within predefined steps) | High (can adapt to new situations) |
| Resource Usage | Predictable, fixed | Moderate, depends on step complexity | High, unpredictable |
| Development Complexity | Any, depends on the task | Medium to high | High |
| Debugging | Straightforward, regular log analysis | Possible at each step | Complex, requires special tools |
| Development Time | Average | High | Very high |
| Data Requirements | Structured | Structured + partially unstructured | Any data types |
| Autonomy | None | Partial | High |
| Operational Cost | Low to medium | Medium | Very high |
| Development Speed | Fast for standard tasks | Medium | Slow |
| Reusability in Other Tasks | Low overall, high for similar tasks | Medium | Low (requires adaptation) |
| Documentation | Standard technical | Requires step and prompt descriptions | Requires detailed description of capabilities and limitations |
| Testing | Any tests | Unit tests + prompt tests | Complex behavior testing |
| Versioning | Standard (code + configs) | Code + prompts + models | Complex (including knowledge versions) |

Optimal usage scenarios:

Data Pipelines:

  • ETL processes

  • Processing large volumes of structured data

  • Regular, repetitive tasks

  • Performance-critical processes

  • Processes where predictability and security are critical

LLM Workflows:

  • Structured tasks with NLP elements

  • Processes with predictable steps, some of which require deeper context understanding

  • Tasks where the balance between automation and control is important

  • Processes with clear rules and constraints

LLM Agents:

  • Complex tasks requiring independent decision-making

  • Research tasks

  • Tasks with an uncertain number of steps that call for autonomy within them

  • Processes requiring adaptation to new situations

Well, from here on I hand it over to Anthropic, with my slight adaptation.

Frameworks for working with agents

There are many frameworks that simplify the implementation of agent systems, including:

  • LangGraph from LangChain;

  • AI Agent framework from Amazon Bedrock;

  • Rivet — a constructor of LLM processes with a graphical interface;

  • Vellum — a tool with a graphical interface for creating and testing complex workflows.

These frameworks make it easier to get started by simplifying standard low-level tasks such as calling LLMs, invoking additional tools, and chaining calls together. However, frameworks often create extra layers of abstraction that can hide the underlying prompts and responses, complicating debugging. They can also tempt you to add complexity where a simpler solution would suffice.

An important recommendation is to start with direct use of the LLM API: many things can be implemented in just a few lines of code. And if you do adopt a framework, make sure you understand how it works under the hood, since incorrect assumptions about what the framework is doing are a common cause of errors and missed expectations.
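For example, a direct call through the official anthropic Python SDK really does fit in a few lines (the model name is illustrative; the same pattern applies to any provider's API):

```python
# pip install anthropic; expects ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(message.content[0].text)
```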

Architecture analysis

Let's move on to the analysis of architectures offered by researchers, starting from simple ones and ending with everyone's favorite agents.

The base — the "augmented LLM" block

The main building block of agent systems is an LLM augmented with capabilities such as retrieval, tools, and memory.

Ideally, these additions are built into the model itself (as the original states), but in principle nothing prevents us from implementing them on our side if necessary.

In the examples further, we will assume that each LLM call has access to these extended capabilities.

For this basic block, it is recommended to focus on two key aspects: adapting the block to your specific task and ensuring a well-documented protocol for working with it. Anthropic themselves recently released a ready-made protocol for such cases, the Model Context Protocol (MCP), which allows integration with an ecosystem of third-party tools.
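A rough sketch of such a block, with placeholder retrieval, memory, and model functions standing in for whatever your stack actually provides:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM client here")

def retrieve(query: str) -> list[str]:
    return []  # e.g. a vector-store or search lookup

memory: list[str] = []  # running conversation / scratchpad

def augmented_call(user_query: str) -> str:
    context = "\n".join(retrieve(user_query))
    history = "\n".join(memory[-10:])  # keep the context window bounded
    prompt = (
        f"Context:\n{context}\n\nConversation so far:\n{history}\n\n"
        f"User: {user_query}\nAnswer using the context where relevant."
    )
    answer = call_llm(prompt)
    memory.append(f"User: {user_query}\nAssistant: {answer}")
    return answer
```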

Prompt chaining (Workflow: Prompt chaining)

An approach in which the task is broken into a sequence of steps, with each LLM call processing the output of the previous one. To keep the process on track, programmatic checks can be added between steps (the "gate" in the original diagram). The complexity and length of the chain depend only on the task.

When to use prompt chaining: ideally, when the original task can be easily and cleanly decomposed into fixed subtasks. The main goal is to trade speed for higher accuracy by giving the LLM one simple task per call.
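A minimal sketch of a two-step chain with a programmatic gate between the calls (`call_llm` here and in the sketches below is a placeholder for any LLM client):

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM client here")

def gate(text: str, min_len: int = 50) -> bool:
    """A programmatic check between steps; a gate can also be an LLM call itself."""
    return len(text) >= min_len

def chained_summary(document: str) -> str:
    # Step 1: extract the key facts.
    facts = call_llm(f"List the key facts from this document:\n{document}")
    if not gate(facts):
        raise RuntimeError("gate failed: the extraction step returned too little")
    # Step 2: the next call sees only the previous step's output.
    return call_llm(f"Write a one-paragraph summary based only on these facts:\n{facts}")
```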

Routing (Workflow: Routing)

An approach in which the input is first classified and then sent to a more tailored, specialized prompt. That is, instead of one general-purpose prompt handling everything, we split incoming tasks by meaning, prepare a prompt for each category, and dispatch accordingly.

When to use routing: routing works well for tasks with clear categories that are better handled separately. Correctly determining the category is critical; this can be done either by the LLM itself or, where possible, by conventional NLP methods. A minimal sketch follows the examples below.

Examples where routing is useful:

  • Different types of support requests (general questions, refund requests, mortgage questions, etc.)

  • Directing simple and unambiguous questions to a lighter LLM, and complex ones to a more powerful one
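A minimal sketch of the pattern, with hypothetical route prompts and an LLM-based classifier (which, as noted above, a cheap classic classifier can replace):

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM client here")

ROUTE_PROMPTS = {
    "refund": "You handle refund requests. Be precise about policy.\n\nRequest: {q}",
    "mortgage": "You answer mortgage questions conservatively.\n\nRequest: {q}",
    "general": "You answer general support questions.\n\nRequest: {q}",
}

def classify(question: str) -> str:
    label = call_llm(
        "Classify the request as exactly one of: refund, mortgage, general.\n"
        f"Request: {question}"
    ).strip().lower()
    return label if label in ROUTE_PROMPTS else "general"  # safe fallback

def route(question: str) -> str:
    return call_llm(ROUTE_PROMPTS[classify(question)].format(q=question))
```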

Parallelization (Workflow: Parallelization)

Several LLMs can sometimes work on a task simultaneously, and the final decision is made by combining or aggregating their responses.

In fact, there are two approaches here:

  • Sectioning: the task is divided into independent subtasks that run in parallel, and the results are then assembled together

  • Voting: the same task is run as several parallel requests, and the answers are then combined into a single one

When to use parallelization: the approach is effective when the task can be divided into independent parallel parts, or when a multi-angle answer is required. For complex tasks with many aspects, language models usually perform better when each aspect is handled by a separate model call that can stay focused on it. A sketch of the voting variant follows the examples below.

Examples where parallelization is useful:

  • Sectioning:

    • Guardrails, where one instance of the model processes user requests while another checks them for safety

    • Evaluation of something by several different parameters simultaneously

  • Voting:

    • Code vulnerability checking where several different requests with different prompts check the code

    • Evaluation of unacceptable content, where multiple requests assess different aspects or use different vote thresholds to balance false positives and false negatives
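A sketch of the voting variant for the code-review example, assuming a thread pool for the parallel calls and hypothetical prompt phrasings:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM client here")

VOTING_PROMPTS = [
    "Does this code contain an injection vulnerability? Answer yes or no:\n{code}",
    "Review this code strictly for unsafe input handling. Answer yes or no:\n{code}",
    "You are a security auditor. Is this code exploitable? Answer yes or no:\n{code}",
]

def is_vulnerable(code: str, min_votes: int = 2) -> bool:
    # Run the differently phrased checks in parallel, then tally the votes.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda p: call_llm(p.format(code=code)), VOTING_PROMPTS))
    votes = Counter("yes" in a.lower() for a in answers)
    return votes[True] >= min_votes  # the threshold trades off false positives/negatives
```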

Orchestrator+workers (Workflow: Orchestrator-workers)

In this approach, the LLM dynamically breaks down tasks, delegates them to language model workers, and then synthesizes their results.

When to use orchestration: although topographically it resembles parallelization, the key difference lies in its flexibility: subtasks are not predefined but are determined by the orchestrator from the specific input request. It is therefore well suited to complex tasks where the necessary subtasks cannot be predicted in advance. A minimal sketch follows the examples below.

Examples where orchestrator-workers are useful:

  • Writing code that may require changes in multiple files

  • Search tasks involving gathering and analyzing information from multiple sources
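A sketch of the pattern, where the subtask list is produced by the orchestrator call at runtime; the JSON parsing is deliberately naive, and production code would validate and repair the model's output:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM client here")

def orchestrate(task: str) -> str:
    # 1. The orchestrator decides the subtasks at runtime; they are not hard-coded.
    plan = call_llm(
        "Break this task into independent subtasks. "
        f"Respond with a JSON list of strings only.\nTask: {task}"
    )
    subtasks = json.loads(plan)  # naive: assumes the model complied
    # 2. Workers handle the subtasks (here, in parallel).
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: call_llm(f"Complete this subtask:\n{s}"), subtasks))
    # 3. A final call synthesizes the worker outputs.
    joined = "\n\n".join(results)
    return call_llm(f"Combine these partial results into one answer for: {task}\n\n{joined}")
```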

Evaluator and optimizer (Workflow: Evaluator-optimizer)

An approach in which one LLM generates a response and another evaluates it, and there is no final answer until they agree. That is, the first LLM's response is iteratively improved based on feedback (evaluation) from the second until it passes an acceptable threshold. A sketch of the loop follows the examples below.

This is remotely similar to a GAN, but here there is no competition and no one is trying to deceive anyone; both roles work toward a common result with positive feedback. A much closer analogy is an author and a kind reviewer.

Examples where the evaluator-optimizer is useful:

  • Improving translation or writing of literary text

  • Complex search tasks that require multiple rounds of searching and analysis to gather comprehensive information, where the evaluator decides if further searches are needed
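A sketch of the loop for the translation example, with a hard cap on rounds so the pair cannot argue forever:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM client here")

def translate_with_review(text: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Translate into English:\n{text}")
    for _ in range(max_rounds):
        # The evaluator either accepts or returns concrete feedback.
        verdict = call_llm(
            "You review translations. Reply ACCEPT if this one is faithful and natural; "
            f"otherwise list the problems.\nSource:\n{text}\nTranslation:\n{draft}"
        )
        if verdict.strip().upper().startswith("ACCEPT"):
            break
        draft = call_llm(
            f"Improve the translation using this feedback.\nSource:\n{text}\n"
            f"Translation:\n{draft}\nFeedback:\n{verdict}"
        )
    return draft  # returned even if the evaluator was never satisfied
```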

Agents

Here we come to what hardcore guys call agents.

Agents are becoming more widespread as language models improve in key capabilities — understanding complex inputs, reasoning and planning, reliable use of tools, and recovery from errors. Agents start their work either with a ready-made command from a human user or with an interactive discussion of the task with them. Once the task becomes clear, agents plan and act independently, referring to the human for additional information if necessary.

During execution, it is critical for agents to get the "true state of affairs" from the environment at each step (e.g., results of tool calls or code execution) to assess their progress. Agents can then pause to get feedback from the human at checkpoints or when encountering obstacles. The work stops either with the successful completion of the task or a certain number of unsuccessful attempts to solve it.

Technically, the LLM-based agent scheme looks like this: language models working in a loop with tools, receiving feedback from the environment. However, successful and safe implementation requires careful attention to detail and a deep understanding of many aspects: from designing protections and preventing data leaks to building monitoring, logging, and emergency-stop mechanisms.

So the apparent simplicity of the basic implementation hides, as always, a lot of problems.
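Stripped to the bone, that loop might look like the sketch below. The text-based tool protocol and parsing are deliberate simplifications: real implementations rely on the structured tool-calling facilities of the model API, plus the monitoring and emergency-stop mechanisms mentioned above.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any LLM client here")

TOOLS = {  # stubbed tools for illustration
    "search": lambda q: f"(search results for {q!r})",
    "run_code": lambda src: "(execution output)",
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = f"Goal: {goal}"
    for _ in range(max_steps):  # the hard cap doubles as an emergency stop
        decision = call_llm(
            f"{history}\n\nEither call a tool as 'TOOL <name> <input>' "
            "or finish as 'DONE <answer>'. Available tools: search, run_code."
        )
        if decision.startswith("DONE"):
            return decision[len("DONE"):].strip()
        if decision.startswith("TOOL "):
            parts = decision.split(" ", 2)  # naive parsing, for illustration only
            name = parts[1]
            arg = parts[2] if len(parts) > 2 else ""
            observation = TOOLS.get(name, lambda _: "unknown tool")(arg)
            # Environment feedback goes back into the context: the "true state of affairs".
            history += f"\n{decision}\nObservation: {observation}"
    return "stopped: step limit reached"
```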

When to use agents: agents should be used for open tasks where it is difficult or impossible to predict the required number of steps, and where it is impossible to rigidly fix the sequence of steps. The language model will potentially work for many iterations, and you must have a certain level of trust in its decision making. The autonomy of agents makes them ideal for scaling tasks in trusted environments.

The autonomous nature of agents means higher costs and the possibility of accumulating errors, so it is recommended to use and test them first in a closed loop with all possible protection.

Examples where agents are useful:

  • A programming agent that can make the necessary changes to a large number of files

  • A "computer use" feature, where the agent is given access to a computer and a task, and then tries to find a solution on its own

Combining approaches

All these approaches are just an attempt to systematize architectures with LLMs; they do not imply that the task will solve itself. In real life, combinations of approaches are common, and something that works well emerges only through iterative, regular improvement against the metrics we set.

And yes, the authors get visibly upset when simple problems are attacked with needlessly complex solutions.

Conclusions and results

Success lies not in creating the most complex system. Success lies in creating the right system to solve our problem. Agents are not needed where you can do without them. LLMs are not needed where you can do without them.

We start with the simple and gradually improve, adding new levels of complexity only when it is appropriate and when simple solutions definitely do not cope.

And here are the three main principles when building agents:

  1. Maintain simplicity in the architectural design of your agent

  2. Prioritize transparency, clearly showing the agent's planning steps

  3. Thoroughly work out the agent's interaction interface, thoroughly documenting and testing the result

Frameworks can help you get started quickly, but for a production solution, it is worth doing everything at a lower level, and there is nothing wrong with that, as it will allow for greater control and transparency.

The original article also includes two practical examples of agents from the authors themselves: a customer support agent that handles inquiries well, and a code-writing agent, as well as an additional recommendation to carefully consider agent interfaces and test them more frequently.

And taking this opportunity, I will end on a joke, with an extra thank-you to Anthropic, which allowed me to come up with my family logo (see the surname in my nickname).


[Figure: graphical representation of using LLMs to create intelligent autonomous agents]

Thank you!
