April 19, 2026 · ~8 min read

Build an AI agent in any stack

An agent is one feedback loop with four moving parts. Here is what each part does in plain language, and the five mistakes that quietly break most of them.

TL;DR: An AI agent is one feedback loop with four parts: a model, some tools, a bit of memory, and a set of guardrails. The stack you build it in (Flutter, Next.js, Python, Firebase, a Rust CLI) does not change the design. It changes where files live and how you call the network. Understand the loop and you can build one.

What an AI agent actually is

A plain chatbot takes your message and sends back words. Useful for answers, not much else.

An agent is different. It can do things. It looks at your message, decides whether it needs to call a tool (search the web, query a database, send an email, read a file), runs that tool, reads the result, and keeps going until the job is done or it hits a stopping rule. Then it shows you the final answer.

That is the whole idea. Everything else is plumbing.

The trap is building an "agent" that is really just a chatbot in a fancy jacket. If your app asks a model for text and prints it, you do not have an agent. You have a wrapper.

The one loop that makes it an agent

Every real agent, in every stack, runs the same loop:

 ┌─────────────┐
 │ User asks   │
 └──────┬──────┘
        ▼
 ┌─────────────┐
 │    Model    │ ◀──────────────┐
 └──────┬──────┘                │
        ▼                       │
   Tool call?                   │
   │        │                   │
   no       yes                 │
   │        ▼                   │
   │   ┌──────────┐             │
   │   │ Permit?  │             │
   │   └────┬─────┘             │
   │        ▼                   │
   │   ┌──────────┐             │
   │   │ Run tool │             │
   │   └────┬─────┘             │
   │        ▼                   │
   │   ┌──────────────┐         │
   │   │ Append result│ ────────┘
   │   └──────────────┘
   ▼
 Answer

Five moves: the user asks, the model thinks, the model either answers or calls a tool, the tool runs (with a permission check for anything risky), the result flows back into the next model call. Repeat until the model stops.

You are not done when "it replies in English." You are done when it can say "I need to run tool X with these arguments," your app can actually run that tool, and the result can be fed back into the next model call.

The four pieces you wire up

Four parts. Skip one and your agent misbehaves in the same predictable way every time.

1. The brain

A model plus a system prompt. The system prompt is not a throwaway line like "You are a helpful assistant." It is where you set identity, tool rules, tone, and hard no-go zones. Treat it like a contract.

One useful habit: split your system context into two layers. A stable prefix (identity, safety rules, tool usage norms) stays the same across sessions and can be cached by the provider. A dynamic suffix (today's memory, which tools are available right now, the user's current environment) changes per session. Same idea works in Flutter assets, a Next.js /prompts folder, Firebase Remote Config, or a Python string template.

2. The tools

These are the things the agent can do beyond talk. Each tool needs a name, a short description, a JSON schema for its arguments, and an executor that actually runs the code.

Tool descriptions are prompts in disguise. "Database query" is a bad description. "Run a read-only SQL query against the orders table; returns at most 100 rows" is a good one. The model picks tools based on what you write. Every word counts.

3. The memory

Three things people mix up. Keep them separate.

  • Session transcript. Every message and tool result in the current chat. Grows fast.
  • Session summary. A short "here is what we have done so far" you write when the transcript gets long, so you can drop the old middle and keep the meaning.
  • Long-term memory. Facts that should survive after the chat ends. User preferences, decisions, a changelog of your app's state.

You almost never re-send all memory on every message. You send the current transcript, plus a small slice of long-term memory if your selector finds something relevant. Nothing else.

4. The guardrails

Before the agent runs anything that writes, deletes, sends, or spends money, it should pause and ask.

Minimum to show the user:

  • What the tool does, in one or two sentences.
  • Why the agent wants to run it.
  • What could go wrong.
  • A risk level: low, medium, or high.

Then approve, edit, or deny.

Low risk (read-only queries, fetching public docs) can auto-run. High risk (delete, push, payments, outbound emails) should never auto-run, no matter how sharp the agent feels that day. You also want a hard ceiling on turns and tokens per session, so a looping bug cannot run up your bill overnight.

Why the stack does not matter

Flutter, Next.js, Django, Firebase, a Rust CLI. Same four pieces, same loop. What changes:

  • Where the files live. A Flutter app ships prompts as bundled assets or a remote config blob. A Next.js app keeps them in a /prompts folder. A Firebase function pulls them from Remote Config.
  • How you call the network. Direct HTTPS from a trusted server. A Cloud Function from a mobile app, because your API key should never ship in a mobile binary.
  • Where tools run. Read-only stuff can run on device. Anything that holds a secret or touches production data runs on a server you control.

If someone tells you "you need framework X to build agents," they are selling framework X. You do not.

Where this goes wrong in real projects

Five boring, real failure modes, in order of how often I see them:

  1. Tool results never get sent back. The model asks for tool X, the tool runs, and the app prints the result to the user instead of feeding it into the next model call. The agent has no idea what happened.
  2. Context balloons every turn. The app pastes the entire tool output into the transcript forever. By turn ten, the model is reading 40k tokens of stale logs. Costs explode, quality drops.
  3. No max-turn stop. A small logic bug makes the model call the same tool in circles. Without a hard cap (say, 20 turns), it will run until your wallet cries.
  4. No permission gate. Someone demos the agent, it calls send_email on the first turn, the CEO gets the test email. Always gate writes.
  5. No evaluation. You change the prompt and it feels better. Three weeks later, a flow that worked in v1 is broken in v3. You needed regression tests from day one.

None of these are glamorous. All of them will bite you.

A minimum viable agent you can ship this week

If you have an afternoon and a real problem to automate, here is the order:

  1. Pick one job. Not five. One. "Summarise my unread Slack threads." "Book a meeting when someone emails me with a date." "Answer customer questions from our docs."
  2. Hook up the model API in your stack. Make it return streaming text first, tool calls second.
  3. Write exactly one tool the agent needs for that job. Real JSON schema, real executor, real error handling.
  4. Build the loop. Call model, check for tool_use, run the tool, append the result, call the model again, stop on plain text or max turns.
  5. Add a permission gate for that one tool. Even a blocking window.confirm beats nothing on day one.
  6. Write three test conversations and run them every time you change a prompt. That is your first eval. Keep it boring. Keep it passing.

Ship that. Then, only then, add a second tool, or memory, or a sub-agent.

When not to build an agent

Not every problem needs one. Three cases where a plain function call wins:

  • The task is deterministic. If the rule is "when X happens, send Y," write a function. An agent will be a slower, more expensive, less reliable version of the same if.
  • The task is a form. If the user will fill in five fields and press submit, give them five fields. A chat interface where they type the same five fields is a worse product.
  • The cost of a wrong answer is high and recovery is hard. Financial transfers, contract signing, medical dosing. You want boring, auditable, deterministic flows here.

The quick test: could you write a function that does this in 100 lines? Write the function.

Next steps

If you want to dig into the prompts side, the reconstructed catalogue at Leonxlnx/claude-code-system-prompts is a good study reference.

If you are building this into something real and want a second pair of eyes on the architecture, the permissions, or the evaluation story, I am easiest to reach from the contact section.

© 2026 Theophilus Rex Danquah. All rights reserved.