    Key Principles for Building Effective AI Agents

    AI Agents represent one of the most promising frontiers for applying large language models to complex tasks. As organisations increasingly seek to leverage AI for automating workflows and solving multi-step problems, understanding how to build effective agents becomes crucial.

In this post, we're bringing you practical frameworks and implementation patterns drawn from Chip Huyen's work and Anthropic's guidance on implementing AI agents, complemented by our own experience at Kiseki Labs building our AI agent libraries. For background on why these systems matter, see our exploration of why AI Agents are necessary for unlocking LLM capabilities.

    What Are AI Agents?

    At their core, AI Agents are surprisingly simple. An Agent is essentially something that observes and acts in an environment, with an LLM as its brain. Despite this simplicity, we believe these systems have the potential to fundamentally transform software development. They're built on familiar concepts like self-critique and chain-of-thought reasoning that have been part of the LLM ecosystem for some time, yet can demonstrate remarkably sophisticated capabilities.

    While this definition sounds straightforward, the actual implementation of effective agents is far more complex. An effective agent combines observation, reasoning, action, and reflection in a continuous loop that can solve complex problems beyond what a simple LLM call could achieve. The environment the agent operates in - whether that's a code repository, a company's knowledge base, or the broader internet - largely determines what the agent can perceive and what actions it can take.
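To make that loop concrete, here's a minimal sketch in Python. Everything in it is illustrative: the `call_llm` stub stands in for your model provider's API, and the toy tool registry stands in for the agent's environment.

```python
# Minimal observe-reason-act-reflect loop. All names are illustrative.

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your model provider's API call.
    return "finish: stub answer"

TOOLS = {
    "search_docs": lambda query: f"(results for {query!r})",
    "finish": lambda answer: answer,  # ends the loop with a final answer
}

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # Reason: ask the model for the next step given everything seen so far.
        decision = call_llm("Reply as 'tool: argument'.\n" + "\n".join(history))
        tool_name, _, argument = decision.partition(":")
        tool_name, argument = tool_name.strip(), argument.strip()
        if tool_name not in TOOLS:
            # Reflect: record the mistake so the next reasoning step can correct it.
            history.append(f"Error: no tool named {tool_name!r}.")
            continue
        result = TOOLS[tool_name](argument)  # Act
        if tool_name == "finish":
            return result
        history.append(f"Observation: {result}")  # Observe
    return "Stopped: exceeded max_steps"
```

The power of the loop comes entirely from what the model decides at each step; the scaffolding itself is only a few dozen lines.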

    Anthropic offers a useful distinction between two types of agentic systems:

    • Workflows orchestrate LLMs and tools through predefined code paths. These are more deterministic and follow established patterns that developers define in advance.
    • Agents dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. They have more autonomy in deciding what steps to take and when.

Interestingly, most reliable agentic systems in production today are workflows rather than fully dynamic agents. This reflects a crucial principle we've embraced at Kiseki Labs: always start with the simplest possible solution. Sometimes this means not using agentic systems at all, since they often require trade-offs around latency, cost, and reliability.

When deciding whether to implement an agentic system, consider whether the added complexity will deliver meaningful benefits. For many applications, optimising single LLM calls with retrieval and in-context examples might be sufficient. An agentic system should improve task performance enough to justify its cost and latency trade-offs.

    The Building Blocks of Effective Agents

    Tools give agents power

Tools extend an agent's capabilities beyond mere text generation. They come in two primary varieties:

    • Read-only tools (for observing): These enable agents to perceive their environment, like retrieving information from knowledge bases, browsing the web, or accessing user data.
    • Write actions (for changing things): These allow agents to take actions that modify their environment, such as updating databases, sending emails, or posting content.
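One pattern we find useful is tagging each tool with whether it writes, so that write actions can be gated behind logging, rate limits, or human confirmation. A minimal sketch, with illustrative tool names:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    func: Callable[[str], str]
    writes: bool  # True if the tool modifies the environment

TOOLS = [
    Tool("search_kb", lambda q: f"(KB results for {q!r})", writes=False),
    Tool("send_email", lambda body: f"(sent: {body[:30]}...)", writes=True),
]

def invoke(tool: Tool, arg: str, confirmed: bool = False) -> str:
    # Read-only tools run freely; write actions need explicit confirmation.
    if tool.writes and not confirmed:
        return f"Blocked: {tool.name} is a write action and needs confirmation."
    return tool.func(arg)
```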

    While adding more tools can expand an agent's capabilities, we've found that sometimes removing or consolidating tools actually improves performance. When an agent has too many tools at its disposal, it can struggle to choose the appropriate one for a given task.

    Planning Architecture

The planning architecture of an agent determines how it approaches problems. Decoupling planning from execution is crucial for building reliable agents. This separation allows for:

    • Intent classification to select appropriate tools for each task
    • Validation of plans before execution
    • Efficient handling of errors and unexpected outcomes

In our experience, adding human-in-the-loop plan validation significantly improves results: a human can catch issues before the agent starts executing a flawed plan.
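Here's a rough sketch of that decoupling: generate a plan, validate it statically, and gate execution behind human approval. The `call_llm` stub, tool names, and JSON plan format are all assumptions for the example.

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real model call that returns a JSON plan.
    return json.dumps([{"tool": "search_kb", "arg": "refund policy"}])

KNOWN_TOOLS = {"search_kb", "send_email"}

def make_plan(task: str) -> list[dict]:
    return json.loads(call_llm(f"Return a JSON list of steps for: {task}"))

def validate(plan: list[dict]) -> list[str]:
    # Cheap static checks before anything runs.
    return [f"unknown tool: {s['tool']}" for s in plan if s["tool"] not in KNOWN_TOOLS]

def execute(plan: list[dict]) -> None:
    for step in plan:
        print(f"executing {step['tool']}({step['arg']!r})")

plan = make_plan("Answer a customer's refund question")
if errors := validate(plan):
    print("Plan rejected:", errors)
elif input(f"Approve plan {plan}? [y/N] ").lower() == "y":  # human-in-the-loop gate
    execute(plan)
```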

    Practical Workflow Patterns

    Anthropic outlines five workflow patterns that have proven effective in production environments:

    1. Prompt Chaining

Think of prompt chaining like a relay team: each LLM call processes the output of the previous one, creating a sequence of specialised steps. This pattern works brilliantly for sequential tasks like generating marketing copy and then translating it into different languages.
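A minimal sketch of the marketing-copy example, with `call_llm` as a placeholder for a real model call:

```python
def call_llm(prompt: str) -> str:
    return f"(output for: {prompt[:40]}...)"  # placeholder for a real API call

def marketing_chain(product: str) -> str:
    # Each step consumes the previous step's output, relay-style.
    copy = call_llm(f"Write marketing copy for: {product}")
    checked = call_llm(f"Fix any factual or tonal issues in: {copy}")
    return call_llm(f"Translate into French: {checked}")
```

Because each step is a separate call, you can also insert programmatic checks between steps (for example, rejecting copy that exceeds a length limit) before the chain continues.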

    2. Routing

    Routing works like a smart traffic controller, directing different types of customer queries to specialised handlers. For instance, you can route simple queries to faster models like Claude 3.5 Haiku, while sending complex ones to Claude 3.7 Sonnet.
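A sketch of the idea, using a cheap classification call to pick the handler (model identifiers are illustrative; substitute your provider's exact ids):

```python
FAST_MODEL = "claude-3-5-haiku"    # illustrative model ids
SMART_MODEL = "claude-3-7-sonnet"

def call_llm(prompt: str, model: str) -> str:
    return f"({model} answer)"  # placeholder for a real API call

def route(query: str) -> str:
    # A cheap first call classifies the query; the label picks the model.
    label = call_llm(f"Answer 'simple' or 'complex' only: {query}", model=FAST_MODEL)
    model = SMART_MODEL if "complex" in label.lower() else FAST_MODEL
    return call_llm(query, model=model)
```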

    3. Parallelisation

    Parallelisation comes in two flavours:

    • Breaking tasks into parallel subtasks that can run simultaneously
    • Running multiple attempts to get higher confidence in the results

The latter approach can deliver excellent results, but watch your costs: you'll consume more tokens when running multiple versions of the same task.
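Both flavours fit in a few lines with a thread pool, assuming a thread-safe `call_llm` placeholder:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    return "stub answer"  # placeholder for a real API call

def parallel_subtasks(subtasks: list[str]) -> list[str]:
    # Flavour 1: independent subtasks run concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(call_llm, subtasks))

def majority_vote(prompt: str, n: int = 5) -> str:
    # Flavour 2: n attempts at the same task, keep the most common answer.
    # Note: this multiplies token costs by n.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(call_llm, [prompt] * n))
    return Counter(answers).most_common(1)[0][0]
```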

    4. Orchestrator-Workers

    This pattern mirrors having a project manager who breaks down complex tasks and delegates them to specialists. At Kiseki Labs, we often use this pattern in our more complex agentic systems: the orchestrator is a planner agent that uses a reasoning model (like OpenAI's o1/o3-mini or DeepSeek R1), while the worker is a non-reasoning model (like GPT-4o).
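A sketch of that split, with illustrative model identifiers and a `call_llm` placeholder:

```python
PLANNER = "o3-mini"  # reasoning model (illustrative id)
WORKER = "gpt-4o"    # non-reasoning worker model (illustrative id)

def call_llm(prompt: str, model: str) -> str:
    return f"({model}: done)"  # placeholder for a real API call

def orchestrate(task: str) -> str:
    # The planner decomposes the task; workers execute; the planner synthesises.
    plan = call_llm(f"Break this into numbered subtasks: {task}", model=PLANNER)
    subtasks = [line for line in plan.splitlines() if line.strip()]
    results = [call_llm(f"Do this subtask: {s}", model=WORKER) for s in subtasks]
    return call_llm(f"Combine these results for '{task}': {results}", model=PLANNER)
```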

    5. Evaluator-Optimiser

    The Evaluator-Optimiser pattern creates a feedback loop where one LLM generates content while another provides feedback. It's particularly powerful for tasks requiring nuance, like literary translation. At Kiseki Labs, we employ "Critic Agents" in some of our agentic systems to review the work performed by worker agents to ensure they meet acceptance criteria.

    Source: Anthropic - Agentic Workflow
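To make the evaluator-optimiser loop concrete, here's a sketch in which a critic reviews each draft against acceptance criteria and the generator revises until the draft passes or a retry budget runs out (prompts and the `call_llm` stub are illustrative):

```python
def call_llm(prompt: str) -> str:
    return "PASS"  # placeholder for a real model call

def generate_with_critic(task: str, criteria: str, max_rounds: int = 3) -> str:
    draft = call_llm(f"Draft: {task}")
    for _ in range(max_rounds):
        # The critic either approves or lists concrete issues to fix.
        verdict = call_llm(f"Does this meet '{criteria}'? Reply PASS or list issues:\n{draft}")
        if verdict.strip().startswith("PASS"):
            return draft
        draft = call_llm(f"Revise to fix these issues: {verdict}\n\nDraft: {draft}")
    return draft  # best effort after max_rounds
```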

    Don’t skimp on Reflection

    Reflection is a crucial component of effective agents. It involves reviewing the work just completed - either by the agent itself (self-reflection), by another agent, or by a human. This process creates a feedback loop that enables continuous improvement.

    When an agent reflects on its actions and outcomes, it can:

    • Identify errors in reasoning or execution
    • Learn from mistakes
    • Generate better plans for similar tasks in the future
    • Determine when a task is truly complete

    Several frameworks have emerged to implement reflection in agentic systems. The ReAct framework (Reasoning + Acting) alternates between reasoning and actions, prompting agents to explain their thinking before acting and then analyse observations after each step. This creates a natural cycle of reflection throughout the agent's operation.

    Another approach, Reflexion, separates reflection into two modules: an evaluator that assesses outcomes and a self-reflection module that analyses what went wrong. This separation of concerns makes it easier to improve each component independently.
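A rough sketch of that separation: the evaluator scores each attempt, and the self-reflection module turns failures into lessons that feed the next attempt (prompts and the `call_llm` stub are illustrative):

```python
def call_llm(prompt: str) -> str:
    return "yes"  # placeholder for a real model call

def evaluate(task: str, result: str) -> bool:
    # Module 1: assess the outcome.
    return call_llm(f"Did this solve '{task}'? Answer yes or no:\n{result}").startswith("yes")

def self_reflect(task: str, result: str) -> str:
    # Module 2: analyse what went wrong and what to change.
    return call_llm(f"This attempt at '{task}' failed. What went wrong?\n{result}")

def solve_with_reflexion(task: str, attempts: int = 3) -> str:
    lessons: list[str] = []
    result = ""
    for _ in range(attempts):
        result = call_llm(f"Task: {task}\nLessons from earlier attempts: {lessons}")
        if evaluate(task, result):
            return result
        lessons.append(self_reflect(task, result))
    return result
```

Because the two modules are separate prompts, you can tune or swap the evaluator without touching the reflection logic.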

    Reflection can occur at multiple points in the agent workflow:

    • After receiving a user query to evaluate if the request is feasible
    • After generating an initial plan to evaluate if it makes sense
    • After each execution step to evaluate progress
    • After completing the full plan to determine if the task has been successfully accomplished

    Evaluating outcomes and iterating with new plans or execution paths creates a powerful mechanism for agents to improve their performance over time. At Kiseki Labs, we've found that implementing robust reflection mechanisms is often the difference between agents that merely function and those that truly excel.

    Common Challenges and Failure Modes

    Building effective agents requires understanding their potential failure modes. Chip Huyen identifies three primary types:

    1. Planning Failures

    These occur when the agent generates invalid or ineffective plans. Common planning failures include:

    • Calling tools that don't exist
    • Using valid tools with invalid parameters
    • Creating plans that don't actually solve the intended task
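The first two failure types can often be caught before execution with cheap static checks against the tool registry. A sketch, with illustrative tools and plan format:

```python
import inspect

# Illustrative tool registry: the function signatures define valid parameters.
def search_kb(query: str) -> str: ...
def send_email(to: str, body: str) -> str: ...

TOOLS = {"search_kb": search_kb, "send_email": send_email}

def validate_plan(plan: list[dict]) -> list[str]:
    errors = []
    for i, step in enumerate(plan):
        tool = TOOLS.get(step["tool"])
        if tool is None:
            errors.append(f"step {i}: tool {step['tool']!r} does not exist")
            continue
        expected = set(inspect.signature(tool).parameters)
        if set(step.get("args", {})) != expected:
            errors.append(f"step {i}: {step['tool']} expects {sorted(expected)}")
    return errors

# The misspelled tool name is flagged before anything runs.
print(validate_plan([{"tool": "serch_kb", "args": {"query": "refunds"}}]))
```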

    2. Tool Failures

    Tool failures happen when the correct tool is used, but the tool output is wrong. For example, an image captioner might return an incorrect description, or an SQL query generator might return a flawed query.

    3. Efficiency Failures

    Efficiency failures occur when the agent produces a valid solution but takes far too many steps or uses excessive resources to reach it. This is currently one of the hardest failure modes to address.

    Tool selection itself poses significant challenges. More tools expand capabilities yet can reduce effectiveness when the agent becomes overwhelmed with options. We recommend using LLMOps products (tools for monitoring and managing LLM applications) like LangSmith or Langfuse to understand what your agents are doing. And remember: less is often more when it comes to tools.

    Lessons Learned

    We've learned that just as significant effort goes into designing human-computer interfaces (HCI), we should invest similar energy in creating good agent-computer interfaces (ACI). This means thinking carefully about how tools are named, documented, and structured to make them intuitive for models to understand and use correctly.
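As a sketch of what that care looks like, here is an illustrative tool definition in the style of common function-calling schemas (the exact format varies by provider). The name is an unambiguous verb-noun pair, the description says when to use the tool, and each parameter documents its expected values:

```python
# Illustrative tool schema: the name, description, and parameter docs are
# the interface the model "sees", so they deserve the same care as a UI.
search_tool = {
    "name": "search_knowledge_base",  # clear verb_noun beats vague names like "kb1"
    "description": (
        "Search the company knowledge base and return the top matching "
        "passages. Use this before answering any product question."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Plain-language query, e.g. 'refund window for EU orders'.",
            },
            "top_k": {
                "type": "integer",
                "description": "Number of passages to return (1-10, default 3).",
            },
        },
        "required": ["query"],
    },
}
```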

    When implementing agents, we recommend starting with direct LLM API calls rather than immediately adopting complex frameworks. While frameworks like LangGraph, Amazon Bedrock's AI Agent framework, Rivet, and Vellum can simplify implementation, they sometimes add layers of abstraction that obscure the underlying prompts and responses, making systems harder to debug.

    For companies just beginning their journey with AI Agents, focus on use cases with clear success criteria, meaningful human oversight, and natural feedback loops. Customer support and coding assistance have emerged as particularly promising areas where agents can deliver significant value while maintaining appropriate guardrails.

    Conclusion

    Building effective AI Agents isn't about creating the most sophisticated system possible. Instead, it's about building the right system for your specific needs. As Anthropic emphasises, the core principles for success are:

    • Keep your design simple
    • Make your planning steps transparent
    • Test your agent-computer interface thoroughly

    At Kiseki Labs, we're excited to continue exploring and implementing these patterns as we help our clients build effective AI solutions. We believe that focusing on these fundamentals rather than chasing complexity will yield the most powerful and reliable AI Agents.

    If you're looking to implement AI Agents in your organisation, reach out to us for a free consultation.
