Security
When people talk about AI agents that can write and run code, the conversation usually focuses on the writing part. What LLM is being used? What language? How does it handle edge cases? The running part — the actual execution of model-generated code — gets far less attention. This is a mistake.
Consider what it means to let an AI agent execute arbitrary code. The agent receives a task, reasons about how to accomplish it, writes code to do so, and runs it. In the best case, the code is correct and produces the intended result. In the less-than-best case, any of the following can happen:

- The code consumes unbounded CPU or memory and starves everything else on the host.
- The code traverses the filesystem and reads credentials or environment variables it was never meant to see.
- The code hangs past any reasonable deadline, leaving zombie processes and leaked resources behind.
- The code leaves state behind that contaminates the next execution to reuse the same environment.
None of these are hypothetical. Each of them has happened, and will happen again, to teams that implement code execution without a proper sandbox.
A code sandbox for agent use isn't just a Docker container. It's a set of isolation guarantees:

- Process isolation: the executing code cannot see or signal anything outside its own process tree.
- Filesystem isolation: it can read and write only what it has been explicitly granted.
- Resource limits: hard caps on CPU, memory, processes, and wall-clock time that hold even against adversarial code.
- Output control: whatever the code prints is captured and filtered before anything else sees it.
Most teams that decide to build their own code sandbox start with Docker. Docker is the obvious tool — it provides process and filesystem isolation out of the box. But production-grade sandbox infrastructure for an agent requires more than a Dockerfile:
You need container lifecycle management. A new container per execution is safe but slow. A pool of warm containers is fast but requires careful state management — a container that ran malicious code in the previous request cannot be reused. You need to solve the warm-start vs. clean-state tradeoff.
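A minimal sketch of the clean-state end of that tradeoff, using the Docker SDK for Python; the image name and every limit value here are illustrative assumptions, not settings from this article:

import docker

client = docker.from_env()

def run_in_fresh_container(code: str, timeout_s: int = 30) -> str:
    # One disposable container per execution: slower than a warm pool,
    # but nothing survives from a previous (possibly malicious) run.
    container = client.containers.run(
        image="sandbox-python:3.12",       # hypothetical prebuilt sandbox image
        command=["python", "-c", code],
        detach=True,
        network_disabled=True,             # no outbound network by default
        read_only=True,                    # immutable root filesystem
        mem_limit="256m",
        pids_limit=128,                    # blunts fork bombs
    )
    try:
        container.wait(timeout=timeout_s)
        return container.logs().decode()
    finally:
        container.remove(force=True)       # always destroy, never reuse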
You need resource limit enforcement that survives adversarial inputs. Docker's defaults impose almost no limits, and the memory and CPU flags you can pass do not cover everything: a fork bomb calls for a pids limit, runaway allocation calls for a hard memory cap with swap disabled, and all of it ultimately rests on correct kernel-level enforcement via cgroups v2.
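For a sense of what that kernel-level enforcement involves, here is a sketch that configures a cgroups v2 group directly; it assumes the unified hierarchy mounted at /sys/fs/cgroup, root privileges, and made-up limit values:

import os

def confine(pid: int, cg: str = "/sys/fs/cgroup/sandbox-run") -> None:
    # Place an already-started sandbox process under hard kernel limits.
    os.makedirs(cg, exist_ok=True)

    def write(name: str, value: str) -> None:
        with open(os.path.join(cg, name), "w") as f:
            f.write(value)

    write("memory.max", str(256 * 1024 * 1024))  # kernel kills the group past 256 MiB
    write("memory.swap.max", "0")                # no spilling the cap into swap
    write("pids.max", "128")                     # hard ceiling on processes and threads
    write("cpu.max", "50000 100000")             # 50 ms of CPU per 100 ms period
    write("cgroup.procs", str(pid))              # moving the PID in makes it all take effect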
You need graceful timeout handling. A code execution that times out must terminate cleanly, return a useful error to the model, and not leave zombie processes or leaked resources on the host.
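A sketch of that termination pattern with Python's standard library; killing the whole process group via SIGKILL and the exact return shape are choices made here for illustration:

import os
import signal
import subprocess

def run_with_timeout(cmd: list[str], timeout_s: float) -> dict:
    # start_new_session puts the child in its own process group, so a
    # timeout kill reaches its descendants, not just the direct child.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return {"stdout": out.decode(), "exit_code": proc.returncode}
    except subprocess.TimeoutExpired:
        # Kill the whole group, then reap the child: no zombies, no orphans.
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        proc.wait()
        return {"stdout": "", "exit_code": -1, "error": "timeout"}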
You need output sanitization. Code that prints sensitive data (credentials it discovered through filesystem traversal, environment variables it found) must not have that data surfaced to the model unfiltered.
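In its simplest form, that filtering might look like the sketch below; the two patterns are illustrative stand-ins for a real secret scanner:

import os
import re

# Illustrative patterns only; a real scanner combines many known formats
# with entropy-based detection.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key IDs
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key headers
]

def sanitize(output: str) -> str:
    for pattern in SECRET_PATTERNS:
        output = pattern.sub("[REDACTED]", output)
    # Also redact any value sitting in the host's own environment, in case
    # the code echoed an inherited variable verbatim.
    for value in os.environ.values():
        if len(value) >= 8 and value in output:
            output = output.replace(value, "[REDACTED]")
    return output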
The most dangerous aspect of the sandboxing problem is how silently it can fail. An agent that executes code without a proper sandbox might work fine in testing, where the code is benign and the environment is controlled. The failures arrive in production, when the code is less benign and the environment matters.
A team that ships "exec the model's code in a subprocess" in production has not shipped code execution — they've shipped a liability. The difference between that and a real sandbox is not visible in any benchmark or demo. It's visible exactly once, when something goes wrong.
For most agent developers, the right approach is the same as with browser automation: treat code execution as a primitive, not a DIY infrastructure problem. The agent calls run_code(language, code). The execution layer handles sandbox provisioning, resource limits, isolation, and returning structured output.
result = await legs.run_code(
    language="python",
    code="import pandas as pd\ndf = pd.read_csv('/data/report.csv')\nprint(df.describe().to_json())"
)
# Returns: {"stdout": "{...}", "exit_code": 0, "elapsed_ms": 342}
The agent gets a clean result. It never needs to know about cgroups, container pools, or output sanitization. That's exactly where this complexity belongs — hidden behind a well-designed abstraction, maintained by people who think about nothing else.
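Assuming only the result fields shown above (stdout, exit_code, elapsed_ms), one common pattern is to hand failures back to the model as feedback rather than raising; the script variable here is a stand-in for model-generated code:

script = 'print("hello from the sandbox")'  # stands in for model-generated code

result = await legs.run_code(language="python", code=script)

if result["exit_code"] != 0:
    # Hand the failure back to the model so it can revise its own code.
    feedback = f"Execution failed (exit {result['exit_code']}): {result['stdout']}"
else:
    feedback = result["stdout"]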
Agent Legs provides sandboxed code execution as a first-class action type. Python, JavaScript, Bash — with resource limits, process isolation, and structured output. No containers to manage. Get early access.
Free for 1,000 actions/month. No credit card required.