The action execution stack: what your AI agent needs to get work done

There's a standard diagram that appears in every AI agent explainer: a loop with a model, a set of tools, and an environment. The model calls tools. The tools return results. The model reasons. Repeat until done. The diagram is accurate. What it doesn't show is the infrastructure required to make each step of that loop reliable in production.

The agent stack, layer by layer

A production AI agent is not just a model and some tool definitions. It's a stack. From top to bottom:

1. The model layer

The LLM itself. Receives input context (system prompt, conversation history, tool schemas, observations), produces output (text, tool calls, or both). Managed by model providers like OpenAI, Anthropic, or Google. This layer is increasingly a commodity — powerful, cheap, and abstracted.
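To make "tool schemas" concrete, here's roughly what one looks like. This follows the common JSON-schema function-calling convention; the exact shape varies by provider, and the browse tool here is purely illustrative.

```python
# Roughly the shape of a tool schema passed to the model layer. The exact
# format varies by provider; this follows the common function-calling
# convention rather than any one vendor's API.
browse_tool = {
    "name": "browse",
    "description": "Fetch a URL and return the page's readable text.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "The URL to fetch."},
        },
        "required": ["url"],
    },
}
```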

2. The reasoning / planning layer

The agent loop: how the model's outputs are parsed, how tool calls are dispatched, how observations are fed back into context, how the agent decides when it's done. This is the layer that frameworks like LangChain, LlamaIndex, and AutoGen primarily address. It's where your business logic lives.
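A minimal sketch of that loop, assuming a hypothetical model callable that returns text plus zero or more tool calls; real frameworks add streaming, retries, and state management on top of this skeleton.

```python
# A minimal agent loop: call the model, dispatch any tool calls, feed the
# results back as observations, repeat until the model stops calling tools.
# The model and tools interfaces here are illustrative stand-ins.
import json

def run_agent(model, tools, user_message, max_steps=10):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = model(messages)  # hypothetical: {"content": str, "tool_calls": [...]}
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply.get("tool_calls"):
            return reply["content"]  # no tool calls means the agent is done
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["arguments"])  # dispatch
            messages.append({
                "role": "tool",
                "name": call["name"],
                "content": json.dumps(result),  # observation fed back into context
            })
    raise RuntimeError("agent did not finish within max_steps")
```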

3. The action execution layer

This is the layer that actually does things. When the model says "browse this URL" or "run this code" or "send this email," something has to execute those instructions. That something is the action execution layer. It sits between the reasoning layer and the real world, and it needs to handle:

- Sandboxing untrusted code and containing resource exhaustion
- Managing credentials for external services
- Rate limits, retries, and idempotency for external calls
- Timeouts on every action
- Strict isolation of state between users and sessions

This layer is frequently underspecified in architecture diagrams and underinvested in by teams building agents. It's also the layer where the most consequential failures happen.
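A hedged sketch of the dispatch side of this layer, assuming an in-process handler per action; names like execute and ALLOWED_ACTIONS are mine, not any SDK's, and real sandboxing needs process- or VM-level isolation rather than threads.

```python
# Allow-list the action, enforce a timeout, record what happened. Note the
# caveat: a timed-out thread keeps running here; a production sandbox needs
# process- or VM-level isolation to actually kill runaway work.
import concurrent.futures
import time

ALLOWED_ACTIONS = {"browse", "code", "email", "api", "files"}

def execute(action_name, handler, args, audit_log, timeout_s=30):
    if action_name not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {action_name!r} is not permitted")
    started = time.monotonic()
    ok, result = False, None
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(handler, **args)
        try:
            result = future.result(timeout=timeout_s)
            ok = True
        except Exception as exc:  # includes TimeoutError from result()
            result = repr(exc)
    audit_log.append({
        "action": action_name,
        "args": args,
        "ok": ok,
        "duration_s": round(time.monotonic() - started, 3),
    })
    if not ok:
        raise RuntimeError(f"{action_name} failed: {result}")
    return result
```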

4. The observability layer

Logs, metrics, traces, and alerts. What did the agent do? How long did each action take? Which actions succeeded? Which failed? What inputs produced unexpected outputs? The observability layer is what separates a production agent from a prototype.
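One way to make "what did the agent do, and how long did it take?" answerable: a decorator that emits a structured log line per action. This is a sketch only; a production system would use real distributed tracing (OpenTelemetry spans, for example) rather than bare log lines.

```python
# Emit one JSON log line per action: name, outcome, duration. Structured
# fields make the observability questions above queryable after the fact.
import functools
import json
import logging
import time

log = logging.getLogger("agent.actions")

def observed(action_name):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
                outcome = "ok"
                return result
            except Exception:
                outcome = "error"
                raise
            finally:
                log.info(json.dumps({
                    "action": action_name,
                    "outcome": outcome,
                    "duration_ms": int((time.monotonic() - start) * 1000),
                }))
        return inner
    return wrap
```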

Why the action execution layer matters most

Of the four layers, action execution is the one teams most often get wrong — not because it's conceptually difficult, but because it looks simple until it isn't.

Running code in a subprocess is easy. Running code safely in a production environment that handles adversarial inputs, resource exhaustion, and escaped outputs is hard. Sending an HTTP request to an external API is easy. Managing credentials, handling rate limits, implementing idempotency, and surfacing useful errors to the model is hard. Launching a Playwright browser is easy. Managing browser sessions at scale, handling bot detection, extracting structured content reliably, and never leaking session state between users is hard.

The pattern repeats across every action type. The simple version takes an afternoon. The production version takes weeks.
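To make the gap concrete, here's a hedged sketch of the production version of "send an HTTP request." The header names are illustrative: real APIs differ in how they signal rate limits and whether they accept idempotency keys at all.

```python
# Retries with exponential backoff, rate-limit handling, and an idempotency
# key so retried requests aren't applied twice. Assumes the server dedupes
# on Idempotency-Key and sends Retry-After in seconds; many APIs vary.
import time
import uuid
import requests

def post_with_retries(url, payload, api_key, max_attempts=5):
    idempotency_key = str(uuid.uuid4())  # stable across retries of this call
    for attempt in range(max_attempts):
        resp = requests.post(
            url,
            json=payload,
            timeout=10,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Idempotency-Key": idempotency_key,
            },
        )
        if resp.status_code == 429:  # rate limited: honor Retry-After if present
            time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
            continue
        if resp.status_code >= 500:  # transient server error: back off, retry
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()      # any other 4xx is a real error
        return resp.json()
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```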

The five action primitives

Most agent use cases reduce to five fundamental action types:

- Browse: load web pages and extract their content
- Code: execute code in a sandbox
- Email: send messages on the user's behalf
- API: call external services
- Files: read, write, and store documents

Every agent use case I've encountered can be decomposed into combinations of these five. Research agents use browse and files. Data processing agents use code and files. Notification agents use email and API. Orchestration agents use API and browse. The primitives are general. The combination is specific to your use case.
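An illustrative sketch of that decomposition: the five primitives behind one dispatch table, with a research-agent step composed of browse plus files. The handler implementations are stubs; only the composition pattern is the point.

```python
# Five primitives, one dispatch surface. Agent-specific behavior composes
# out of this small, fixed set rather than adding new action types.
def browse(url): ...          # fetch a page, return readable text
def run_code(source): ...     # execute in a sandbox, return output
def send_email(to, subject, body): ...
def call_api(method, url, body=None): ...
def files(op, path, data=None): ...  # read / write / append

PRIMITIVES = {"browse": browse, "code": run_code, "email": send_email,
              "api": call_api, "files": files}

def research_step(url, notes_path):
    """Research agents use browse and files: fetch a source, persist notes."""
    page_text = PRIMITIVES["browse"](url)
    PRIMITIVES["files"]("append", notes_path, page_text)
```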

The audit layer

There's one more thing every action execution layer needs that often gets skipped: a complete audit log. Every action should produce a log entry with:

- A timestamp
- The action type and the parameters supplied
- The user or session the agent was acting on behalf of
- Whether it succeeded or failed, and a summary of the result
- How long the action took

This log serves multiple purposes. Debugging: what exactly did the agent do when it produced that result? Compliance: what actions did the agent take on behalf of this user over the past 30 days? Safety: did the agent exceed its defined permission scope? Billing: how many actions of each type were executed this month?
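A sketch of a per-action record broad enough to serve all four purposes. The field names are illustrative, not a schema from any particular system, and real entries also need the retention and access controls discussed below.

```python
# One audit entry per action, with a consistent structure across all five
# action types. Store summaries, not raw payloads, to keep entries safe to
# retain and review.
import dataclasses
import datetime

@dataclasses.dataclass(frozen=True)
class AuditEntry:
    action_type: str     # browse | code | email | api | files
    user_id: str         # who the agent acted on behalf of (compliance)
    inputs: dict         # parameters supplied (debugging, safety review)
    output_summary: str  # truncated or redacted result, not raw payloads
    succeeded: bool
    duration_ms: int     # per-action latency (observability, billing)
    timestamp: str = dataclasses.field(
        default_factory=lambda: datetime.datetime.now(datetime.timezone.utc).isoformat()
    )
```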

Building this log correctly — across all action types, with consistent structure, with appropriate retention and access controls — is non-trivial. It's also non-negotiable for any agent running in a production environment.

An agent without an audit log is a black box operating on your behalf. That's fine for a demo. It's not acceptable for production.

What to build vs. what to buy

Teams should focus their engineering investment on the layers that differentiate their product: the reasoning and planning layer, the domain-specific tools unique to their use case, and the user experience around agent capabilities.

The action execution layer — browser infrastructure, code sandboxes, email sending, API proxy, file handling — is infrastructure. It's expensive to build correctly, expensive to maintain, and it's not what makes your agent valuable. The right move for most teams is to treat it as solved infrastructure and focus on what makes their agent unique.

Agent Legs is the action execution layer: all five action primitives in one SDK, with a full audit log included. It's free for 1,000 actions a month, no credit card required. Give your agent legs, not just a brain. Get early access.
