AI Agents
AI Agents represent one of the most promising frontiers for applying large language models to complex tasks. As organisations increasingly seek to leverage AI for automating workflows and solving multi-step problems, understanding how to build effective agents becomes crucial.
In this post, we bring together practical frameworks and implementation patterns drawn from Chip Huyen's work and Anthropic's guidance on implementing AI agents, complemented by our own experience at Kiseki Labs building our AI agent libraries. For background on why these systems matter, see our exploration of why AI Agents are necessary for unlocking LLM capabilities.
At their core, AI Agents are surprisingly simple. An Agent is essentially something that observes and acts in an environment, with an LLM as its brain. Despite this simplicity, we believe these systems have the potential to fundamentally transform software development. They're built on familiar concepts like self-critique and chain-of-thought reasoning that have been part of the LLM ecosystem for some time, yet can demonstrate remarkably sophisticated capabilities.
While this definition sounds straightforward, the actual implementation of effective agents is far more complex. An effective agent combines observation, reasoning, action, and reflection in a continuous loop that can solve complex problems beyond what a simple LLM call could achieve. The environment the agent operates in - whether that's a code repository, a company's knowledge base, or the broader internet - largely determines what the agent can perceive and what actions it can take.
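To make that loop concrete, here's a minimal sketch in Python. Everything in it is illustrative: `call_llm` stands in for a real provider call, `search_docs` for a real tool, and the plain-text protocol for a proper function-calling API.

```python
# Minimal sketch of an agent loop: observe -> reason -> act -> reflect.
# `call_llm` and `search_docs` are illustrative stand-ins, not a real SDK.

def call_llm(prompt: str) -> str:
    """Placeholder for a real provider call (OpenAI, Anthropic, etc.)."""
    return f"<model output for: {prompt[:50]}...>"

def search_docs(query: str) -> str:
    """Example tool: search a knowledge base (stubbed)."""
    return f"<top documents for '{query}'>"

TOOLS = {"search_docs": search_docs}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = f"Task: {task}"
    for _ in range(max_steps):
        # Reason: ask the model what to do next, given everything so far.
        decision = call_llm(
            f"{history}\nDecide the next step. "
            f"Reply 'TOOL <name> <input>' or 'FINISH <answer>'."
        )
        if decision.startswith("FINISH"):
            return decision.removeprefix("FINISH ").strip()
        if decision.startswith("TOOL"):
            _, name, arg = decision.split(" ", 2)
            # Act: invoke the chosen tool, then observe its output.
            observation = TOOLS[name](arg)
            history += f"\nAction: {name}({arg})\nObservation: {observation}"
    return "Stopped: step budget exhausted."
```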
Anthropic offers a useful distinction between two types of agentic systems:
- Workflows: systems where LLMs and tools are orchestrated through predefined code paths
- Agents: systems where the LLM dynamically directs its own processes and tool usage, maintaining control over how it accomplishes tasks
Interestingly, most reliable agentic systems in production today are workflows rather than fully dynamic agents. This reflects a crucial principle we've embraced at Kiseki Labs: always start with the simplest possible solution. Sometimes this means not using agentic systems at all, since they often require trade-offs around latency, cost, and reliability.
When deciding whether to implement an agentic system, consider whether the added complexity will deliver meaningful benefits. For many applications, optimising single LLM calls with retrieval and in-context examples might be sufficient. The cost and latency trade-offs of agentic systems should deliver meaningful improvements in task performance to justify their implementation.
Tools make agents powerful by extending their capabilities beyond mere text generation. These tools come in two primary varieties:
- Read-only tools, which let the agent perceive its environment and gather context, such as web search, document retrievers, or an image captioner
- Write tools, which let the agent act on its environment, such as executing code, sending emails, or updating a database
While adding more tools can expand an agent's capabilities, we've found that sometimes removing or consolidating tools actually improves performance. When an agent has too many tools at its disposal, it can struggle to choose the appropriate one for a given task.
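As an illustration of consolidation, the hypothetical schema below collapses three overlapping search tools into a single `search` tool with a `source` parameter, giving the model one clear choice instead of three near-duplicates.

```python
# Sketch: expose a deliberately small, consolidated tool set to the model.
# Three overlapping search tools become one tool with a `source` parameter,
# so the model has fewer near-duplicate options to choose between.

TOOL_SCHEMAS = [
    {
        "name": "search",
        "description": "Search for information. Use source='web' for public "
                       "facts, 'docs' for internal documentation, and 'code' "
                       "for the repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "source": {"type": "string", "enum": ["web", "docs", "code"]},
            },
            "required": ["query", "source"],
        },
    },
]
```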
The planning architecture of an agent determines how it approaches problems. It's important to note that decoupling planning from execution is crucial for building reliable agents. This separation allows for:
- Plans to be inspected and validated before any actions are taken
- Flawed plans to be caught early, before they waste tokens or cause unwanted side effects
- Different models to be used for planning and execution, matching cost and capability to each phase
In our experience, adding human-in-the-loop plan validation significantly improves results. This allows humans to catch potential issues before the agent starts executing potentially flawed plans.
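A minimal sketch of that pattern, assuming a hypothetical `call_llm` helper and a stubbed `execute_step`: the plan is shown to a human, and any corrections are folded back in before anything runs.

```python
# Sketch: generate a plan, have a human approve or amend it before execution.
# `call_llm` and `execute_step` are illustrative placeholders.

def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:50]}...>"

def execute_step(step: str) -> str:
    return f"<result of: {step}>"

def plan_then_execute(task: str) -> list[str]:
    plan = call_llm(f"Write a numbered step-by-step plan for: {task}")
    print(f"Proposed plan:\n{plan}")
    verdict = input("Approve plan? (y / or type corrections): ").strip()
    if verdict.lower() != "y":
        # Feed the human's corrections back in before anything runs.
        plan = call_llm(f"Revise this plan for task '{task}':\n{plan}\n"
                        f"Corrections: {verdict}")
    return [execute_step(line) for line in plan.splitlines() if line.strip()]
```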
Anthropic outlines five workflow patterns that have proven effective in production environments: prompt chaining, routing, parallelisation, orchestrator-workers, and evaluator-optimiser.
Think of prompt chaining like a relay team. Each LLM processes the output of the previous one, creating a sequence of specialised steps. This pattern works brilliantly for sequential tasks like generating marketing copy and then translating it into different languages.
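A sketch of that example, with a hypothetical `call_llm` placeholder; the intermediate "gate" check is optional, but shows how a chain can verify each link before proceeding.

```python
# Sketch of prompt chaining: each call consumes the previous call's output.
# `call_llm` is a placeholder for a real provider call.

def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:50]}...>"

def marketing_copy_pipeline(product: str, language: str) -> str:
    copy = call_llm(f"Write a short marketing blurb for {product}.")
    # Optional gate: check the intermediate output before continuing the chain.
    check = call_llm(f"Does this blurb mention the product by name? "
                     f"Answer yes/no.\n{copy}")
    if "no" in check.lower():
        copy = call_llm(f"Rewrite so it names {product} explicitly:\n{copy}")
    return call_llm(f"Translate into {language}:\n{copy}")
```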
Routing works like a smart traffic controller, directing different types of customer queries to specialised handlers. For instance, you can route simple queries to faster models like Claude 3.5 Haiku, while sending complex ones to Claude 3.7 Sonnet.
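Here's what that routing might look like as a sketch; `call_llm` and the classification prompt are illustrative assumptions, with the model names taken from the example above.

```python
# Sketch of routing: classify the query first, then dispatch to a model
# sized for it.

def call_llm(prompt: str, model: str = "claude-3-5-haiku") -> str:
    return f"<{model} output for: {prompt[:40]}...>"

def handle_query(query: str) -> str:
    # A cheap model acts as the traffic controller.
    label = call_llm(f"Classify as 'simple' or 'complex':\n{query}",
                     model="claude-3-5-haiku")
    model = "claude-3-7-sonnet" if "complex" in label.lower() else "claude-3-5-haiku"
    return call_llm(query, model=model)
```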
Parallelisation comes in two flavours:
- Sectioning: breaking a task into independent subtasks that run in parallel
- Voting: running the same task multiple times to get diverse outputs
The latter approach can deliver excellent results but watch your costs, as you'll consume more tokens when running multiple versions of the same task.
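A sketch of the voting flavour, again using a hypothetical `call_llm` placeholder: the same prompt runs several times concurrently and the majority answer wins.

```python
# Sketch of the voting flavour: run the same prompt N times concurrently and
# take the majority answer. Each extra run multiplies token cost.

from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:50]}...>"

def vote(prompt: str, n: int = 5) -> str:
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(call_llm, [prompt] * n))
    # Majority vote; ties fall to whichever answer appeared first.
    return Counter(answers).most_common(1)[0][0]
```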
The Orchestrator-Workers pattern mirrors having a project manager who breaks down complex tasks and delegates them to specialists. At Kiseki Labs, we often use this pattern in our more complex agentic systems: the orchestrator is a planner agent that uses a reasoning model (like OpenAI's o1/o3-mini or DeepSeek R1), while the worker is a non-reasoning model (like GPT-4o).
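A sketch of that division of labour, with `call_llm` as a hypothetical placeholder and the model names following the example above.

```python
# Sketch of orchestrator-workers: a reasoning model decomposes the task,
# a cheaper worker model handles each subtask, and the orchestrator
# synthesises the results.

def call_llm(prompt: str, model: str) -> str:
    return f"<{model} output for: {prompt[:40]}...>"

def orchestrate(task: str) -> str:
    subtasks = call_llm(
        f"Break this task into independent subtasks, one per line:\n{task}",
        model="o3-mini",  # planner / reasoning model
    ).splitlines()
    results = [
        call_llm(f"Complete this subtask: {s}", model="gpt-4o")  # worker
        for s in subtasks if s.strip()
    ]
    return call_llm(
        "Synthesise these results into one answer:\n" + "\n".join(results),
        model="o3-mini",
    )
```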
The Evaluator-Optimiser pattern creates a feedback loop where one LLM generates content while another provides feedback. It's particularly powerful for tasks requiring nuance, like literary translation. At Kiseki Labs, we employ "Critic Agents" in some of our agentic systems to review the work performed by worker agents to ensure they meet acceptance criteria.
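A sketch of the generate-critique-revise loop, assuming a hypothetical `call_llm` helper; the 'PASS' convention is an illustrative stand-in for a structured verdict.

```python
# Sketch of evaluator-optimiser: a generator drafts, a critic reviews against
# acceptance criteria, and the draft is revised until the critic approves.

def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:50]}...>"

def generate_with_critic(task: str, criteria: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Do this task: {task}")
    for _ in range(max_rounds):
        feedback = call_llm(
            f"Review against these acceptance criteria: {criteria}\n"
            f"Draft:\n{draft}\nReply 'PASS' or list the problems."
        )
        if feedback.strip().startswith("PASS"):
            break
        draft = call_llm(f"Revise the draft to fix:\n{feedback}\n\n{draft}")
    return draft
```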
Reflection is a crucial component of effective agents. It involves reviewing the work just completed - either by the agent itself (self-reflection), by another agent, or by a human. This process creates a feedback loop that enables continuous improvement.
When an agent reflects on its actions and outcomes, it can:
- Catch errors and correct course before they compound
- Judge whether a result actually satisfies the original goal
- Extract lessons that improve subsequent attempts
Several frameworks have emerged to implement reflection in agentic systems. The ReAct framework (Reasoning + Acting) alternates between reasoning and actions, prompting agents to explain their thinking before acting and then analyse observations after each step. This creates a natural cycle of reflection throughout the agent's operation.
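The following sketch shows the shape of a ReAct loop; the `lookup` tool, the prompt template, and `call_llm` are all illustrative assumptions rather than the framework's canonical implementation.

```python
# Sketch of a ReAct-style loop: the model is prompted to emit an explicit
# Thought before each Action, and the Observation is appended afterwards.

def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:50]}...>"

def lookup(term: str) -> str:
    return f"<encyclopedia entry for '{term}'>"

REACT_PROMPT = """Answer the question by interleaving:
Thought: your reasoning about what to do next
Action: lookup[<term>] or finish[<answer>]
Observation: (filled in by the system)

Question: {question}
{trace}"""

def react(question: str, max_steps: int = 4) -> str:
    trace = ""
    for _ in range(max_steps):
        step = call_llm(REACT_PROMPT.format(question=question, trace=trace))
        if "finish[" in step:
            return step.split("finish[", 1)[1].rstrip("]")
        if "lookup[" in step:
            term = step.split("lookup[", 1)[1].split("]", 1)[0]
            step += f"\nObservation: {lookup(term)}"
        trace += step + "\n"
    return "No answer within step budget."
```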
Another approach, Reflexion, separates reflection into two modules: an evaluator that assesses outcomes and a self-reflection module that analyses what went wrong. This separation of concerns makes it easier to improve each component independently.
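A sketch of that separation, again assuming a hypothetical `call_llm`: the evaluator issues a verdict, and the self-reflection module converts failures into lessons carried into the next trial.

```python
# Sketch of Reflexion's two modules: an evaluator judges the attempt, and a
# separate self-reflection call turns failures into lessons fed into the
# next attempt.

def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:50]}...>"

def attempt_with_reflexion(task: str, max_trials: int = 3) -> str:
    lessons: list[str] = []
    for _ in range(max_trials):
        notes = "\n".join(lessons)
        answer = call_llm(f"Task: {task}\nLessons from past attempts:\n{notes}")
        # Evaluator module: assess the outcome.
        verdict = call_llm(f"Did this answer solve the task? yes/no\n{answer}")
        if verdict.strip().lower().startswith("yes"):
            return answer
        # Self-reflection module: analyse the failure, store the lesson.
        lessons.append(call_llm(
            f"The answer failed. Explain what went wrong and how to fix it:\n{answer}"
        ))
    return answer
```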
Reflection can occur at multiple points in the agent workflow:
- After receiving the user query, to assess whether the request is feasible
- After generating the initial plan, to sanity-check it before execution
- After each execution step, to evaluate whether the agent is still on track
- After the full plan has executed, to judge whether the task was actually accomplished
Evaluating outcomes and iterating with new plans or execution paths creates a powerful mechanism for agents to improve their performance over time. At Kiseki Labs, we've found that implementing robust reflection mechanisms is often the difference between agents that merely function and those that truly excel.
Building effective agents requires understanding their potential failure modes. Chip Huyen identifies three primary types: planning failures, tool failures, and efficiency failures.
Planning failures occur when the agent generates invalid or ineffective plans. Common planning failures include:
- Calling a tool that doesn't exist in the agent's inventory
- Calling a valid tool with invalid or incorrectly-valued parameters
- Failing to follow the stated constraints of the task
- Incorrectly evaluating the goal, such as declaring a task complete when it isn't
Tool failures happen when the correct tool is used, but the tool output is wrong. For example, an image captioner might return an incorrect description, or an SQL query generator might return a flawed query.
Efficiency failures occur when the agent produces a valid solution but takes far too many steps or uses excessive resources to reach it. This is currently one of the hardest failure modes to address.
Tool selection itself poses significant challenges. More tools expand capabilities yet can reduce effectiveness when the agent becomes overwhelmed with options. We recommend using LLMOps products (tools for monitoring and managing LLM applications) like LangSmith or Langfuse to understand what your agents are doing. And remember: less is often more when it comes to tools.
We've learned that just as significant effort goes into designing human-computer interfaces (HCI), we should invest similar energy in creating good agent-computer interfaces (ACI). This means thinking carefully about how tools are named, documented, and structured to make them intuitive for models to understand and use correctly.
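As an illustration, here is a hypothetical `get_order_status` spec written with the model in mind: it spells out when to use the tool, what it returns, and how to handle failure, rather than leaving the model to guess.

```python
# Sketch of an agent-computer interface detail: the tool's name, description,
# and parameter docs are written for the model, with usage guidance and
# failure modes spelled out explicitly.

GOOD_TOOL_SPEC = {
    "name": "get_order_status",  # verb_noun, unambiguous
    "description": (
        "Look up the current status of a customer order. "
        "Use this whenever a customer asks where their order is. "
        "Returns one of: 'processing', 'shipped', 'delivered'. "
        "If the order id is not found, returns 'unknown' - ask the "
        "customer to confirm the id rather than guessing."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order id, e.g. 'ORD-12345'. Never invent one.",
            }
        },
        "required": ["order_id"],
    },
}
```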
When implementing agents, we recommend starting with direct LLM API calls rather than immediately adopting complex frameworks. While frameworks like LangGraph, Amazon Bedrock's AI Agent framework, Rivet, and Vellum can simplify implementation, they sometimes add layers of abstraction that obscure the underlying prompts and responses, making systems harder to debug.
For companies just beginning their journey with AI Agents, focus on use cases with clear success criteria, meaningful human oversight, and natural feedback loops. Customer support and coding assistance have emerged as particularly promising areas where agents can deliver significant value while maintaining appropriate guardrails.
Building effective AI Agents isn't about creating the most sophisticated system possible. Instead, it's about building the right system for your specific needs. As Anthropic emphasises, the core principles for success are:
- Maintain simplicity in your agent's design
- Prioritise transparency by explicitly showing the agent's planning steps
- Carefully craft the agent-computer interface through thorough tool documentation and testing
At Kiseki Labs, we're excited to continue exploring and implementing these patterns as we help our clients build effective AI solutions. We believe that focusing on these fundamentals rather than chasing complexity will yield the most powerful and reliable AI Agents.
If you're looking to implement AI Agents in your organisation, reach out to us for a free consultation.