Wiring AI agents to third-party APIs: a practical guide to tool calling

Tool calling is the mechanism by which AI agents interact with the outside world. The model decides it needs information or needs to take an action; it calls a tool; the tool returns a result; the model continues reasoning. In theory, clean. In practice, API tool calling has a long list of failure modes that most guides don't cover.

The anatomy of a tool call

In most agent frameworks, a tool is a function with a JSON Schema describing its inputs. The model receives the schema as part of its context and can invoke the tool by generating a JSON object that matches it. Your code receives the invocation, executes the function, and returns the result to the model's context.
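
The loop described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API: the `add_numbers` tool, the `TOOLS` registry, the invocation shape (`name` plus `arguments`), and the `dispatch` function are all assumptions made for the example. It only checks argument names against the schema; a real implementation would run a full JSON Schema validator.

```python
import json
from typing import Any

# Hypothetical local tool, used only for illustration.
def add_numbers(a: float, b: float) -> float:
    return a + b

# Hypothetical registry: each tool pairs a JSON Schema with a function.
TOOLS: dict[str, dict[str, Any]] = {
    "add_numbers": {
        "description": "Add two numbers.",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
        "function": add_numbers,
    }
}

def dispatch(invocation_json: str) -> str:
    """Run a model-generated invocation and return a JSON string for the model."""
    try:
        invocation = json.loads(invocation_json)
    except json.JSONDecodeError as exc:
        return json.dumps({"ok": False, "error": f"invalid JSON: {exc.msg}"})
    if not isinstance(invocation, dict):
        return json.dumps({"ok": False, "error": "invocation must be an object"})

    tool = TOOLS.get(invocation.get("name"))
    if tool is None:
        return json.dumps({"ok": False, "error": "unknown tool"})

    args = invocation.get("arguments", {})
    schema = tool["parameters"]
    # Simplified check: required and unexpected argument names only.
    missing = [k for k in schema["required"] if k not in args]
    unexpected = [k for k in args if k not in schema["properties"]]
    if missing or unexpected:
        return json.dumps(
            {"ok": False, "error": f"missing={missing} unexpected={unexpected}"}
        )

    return json.dumps({"ok": True, "result": tool["function"](**args)})
```

Returning errors as structured JSON rather than raising gives the model something it can read and react to on its next turn.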

For simple, local tools — math operations, string manipulation, in-memory lookups — this works cleanly. The function is deterministic, fast, and has no side effects. For tools that call external APIs, the picture is more complicated.

Authentication is harder than it looks

The first problem with API tool calls is authentication. Most external APIs require credentials: API keys, OAuth tokens, or service account credentials. These need to be available at call time, stored securely, and never visible to the model itself.

This is not a solved problem in most agent frameworks. If your tool function reads credentials from environment variables, those environment variables need to be available in whatever context the agent is running — which means your deployment and secret management story needs to be solid. If you hard-code credentials in the tool definition (which you should never do), they end up in logs, in the model's context, and potentially in error messages.

The right architecture separates credential management from tool logic. Credentials are injected into the execution layer. The tool function declares what credential it needs; the execution layer resolves and provides it. The model never sees the credential, and neither does your application code.
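
One way to picture this separation is the sketch below. Everything in it is hypothetical: the `Tool` dataclass, `resolve_secret`, `execute`, and the `fake_crm_lookup` tool are names invented for the example, and secrets are read from a plain mapping where a real deployment would use a secret manager. As a simplification, the tool function here receives the secret as an argument; a stricter design would have the execution layer attach it to the outgoing request itself.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping

@dataclass(frozen=True)
class Tool:
    name: str
    credential: str  # logical name of the secret the tool needs, not its value
    run: Callable[[dict, str], dict]

def resolve_secret(name: str, env: Mapping[str, str]) -> str:
    """Look up a secret by name; fail loudly if it is not configured."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"secret {name!r} is not configured")
    return value

def redact(value: Any, secret: str) -> Any:
    """Replace any occurrence of the secret in a nested result."""
    if isinstance(value, str):
        return value.replace(secret, "[REDACTED]")
    if isinstance(value, dict):
        return {k: redact(v, secret) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, secret) for v in value]
    return value

def execute(tool: Tool, arguments: dict, env: Mapping[str, str] = os.environ) -> dict:
    """Execution layer: resolve the credential, run the tool, scrub the result."""
    secret = resolve_secret(tool.credential, env)
    result = tool.run(arguments, secret)
    # Defensive scrub so a careless tool cannot leak the secret to the model.
    return redact(result, secret)

# Hypothetical tool that (carelessly) echoes its key in a debug field.
def fake_crm_lookup(arguments: dict, api_key: str) -> dict:
    return {"customer_id": arguments["customer_id"], "debug": f"used key {api_key}"}
```

The model only ever sees the tool's name, schema, and the scrubbed result; the credential name lives in the tool declaration and the value lives in the environment.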

Rate limits and retries

External APIs have rate limits. Your agent will hit them. A naive tool implementation returns a 429 error to the model, which then has to decide what to do — usually, it either gives up or retries in a loop that makes things worse.

Production tool implementations need automatic retry with exponential backoff, rate limit detection (by HTTP status code, typically 429, and by response headers such as Retry-After or an API's X-RateLimit-* headers), and graceful degradation. When a rate limit is hit, the tool should retry with appropriate backoff and return a structured error to the model only if retries are exhausted, with clear information about what failed and when the agent should try again.
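
A minimal sketch of that retry behavior follows, under several assumptions: the `Response` dataclass stands in for whatever your HTTP client returns, `send` is any zero-argument callable that performs the request, and the structured error shape is invented for the example. It honors a numeric `Retry-After` header when present and otherwise falls back to exponential backoff with full jitter. Header lookup here is case-sensitive and the HTTP-date form of `Retry-After` is not parsed, both of which a real client would need to handle.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Mapping

@dataclass
class Response:
    """Stand-in for an HTTP client's response object (assumed shape)."""
    status: int
    headers: Mapping[str, str] = field(default_factory=dict)
    body: Any = None

def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before the next attempt."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # Exponential backoff with full jitter: random wait in [0, base * 2^attempt].
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(send: Callable[[], Response], max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry on 429; return a structured error only when retries are exhausted."""
    delay = 0.0
    for attempt in range(max_attempts):
        response = send()
        if response.status != 429:
            return {
                "ok": 200 <= response.status < 300,
                "status": response.status,
                "body": response.body,
            }
        delay = retry_delay(attempt, response.headers)
        if attempt < max_attempts - 1:
            sleep(delay)
    return {
        "ok": False,
        "error": "rate_limited",
        "message": f"Rate limit still in effect after {max_attempts} attempts.",
        "retry_after_seconds": delay,
    }
```

Passing `sleep` as a parameter keeps the function testable without real waiting, and the final error tells the model both what failed and roughly how long to wait.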

Structuring API responses for the model

Raw API responses are often poorly suited for model consumption. A Salesforce API response might return 200 fields when the agent needs 3. A Stripe response might nest the relevant data three levels deep. An error response might use a proprietary error code that the model doesn't know how to interpret.

Tool implementations should transform API responses into forms the model can reason over efficiently. This means:

Filtering to only the fields the agent is likely to need
Flattening nested structures where possible
Translating error codes into natural language descriptions
Adding context that helps the model understand what the response means
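
The four transformations above might look like the following sketch. The error codes in `ERROR_MESSAGES`, the `shape_response` function, and the sample field names are all hypothetical; they are not taken from the Salesforce or Stripe APIs. Nested keys are flattened into dotted paths so the caller can select fields by name.

```python
from typing import Any, Optional

# Hypothetical vendor error codes mapped to plain-language descriptions.
ERROR_MESSAGES = {
    "E1042": "The record is locked by another user. Try again in a few minutes.",
    "E2001": "The account does not have permission to perform this action.",
}

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level with dotted keys."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def shape_response(raw: dict, wanted_fields: list[str],
                   note: Optional[str] = None) -> dict:
    """Turn a raw API response into a compact form for the model."""
    if "error" in raw:
        code = raw["error"].get("code", "unknown")
        message = ERROR_MESSAGES.get(
            code, f"The API returned an unrecognized error code: {code}"
        )
        return {"ok": False, "error": message}

    flat = flatten(raw)
    shaped: dict[str, Any] = {
        "ok": True,
        # Keep only the fields the agent is likely to need.
        "data": {k: flat[k] for k in wanted_fields if k in flat},
    }
    if note:
        shaped["note"] = note  # extra context to help the model interpret the data
    return shaped
```

For example, `shape_response(raw, ["id", "payment.details.card.last4"], note="Amounts are in cents.")` would hand the model two fields and one line of context instead of the full payload.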

Idempotency and side effects

Write operations through API tools are particularly dangerous because they have side effects. Calling a Stripe API to create a charge is not like calling a weather API — the first call creates a charge, and calling it again creates another charge. If the agent retries a tool call because it didn't receive a clear result, you may end up with duplicate operations.

For write operations, implement idempotency keys. Pass a deterministic key derived from the task ID and the action to the external API. If the API supports idempotency keys, as Stripe does, and the call succeeds but the response is lost, a retry with the same key returns the original result rather than creating a duplicate.
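
As a rough sketch of key derivation, the function below hashes the task ID together with a canonical JSON form of the action. The function name, the shape of the `action` dict, and the `FakePaymentsAPI` class are assumptions for illustration; the fake API only mimics the deduplication a real provider would perform server-side, for instance when it receives the key in a request header.

```python
import hashlib
import json

def idempotency_key(task_id: str, action: dict) -> str:
    """Derive a stable key from the task ID and a description of the action."""
    # sort_keys and fixed separators make the JSON canonical, so the same
    # logical action always hashes to the same key regardless of dict order.
    canonical = json.dumps(
        {"task_id": task_id, "action": action},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class FakePaymentsAPI:
    """In-memory stand-in for a provider that honors idempotency keys."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.charges_created = 0

    def create_charge(self, amount: int, key: str) -> dict:
        if key in self._results:
            # Same key seen before: replay the original result, no new charge.
            return self._results[key]
        self.charges_created += 1
        result = {"charge_id": f"ch_{self.charges_created}", "amount": amount}
        self._results[key] = result
        return result
```

Because the key depends only on the task and the action, a retry after a lost response produces the same key, while a genuinely different action in the same task produces a different one.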

Logging every call

Every external API call your agent makes should be logged: the tool name, the inputs and the response (both sanitized for credentials), the duration, and the outcome (success or failure). This log is your debugging surface. When your agent does something unexpected (charges the wrong amount, sends the wrong email, creates the wrong record), the audit log is how you figure out what happened and why.
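
A logging wrapper along these lines could capture those fields. This is a sketch under assumptions: the logger name `agent.tools`, the `SENSITIVE_KEYS` set, and the `logged_call` helper are invented for the example, and sanitization here is by key name only, which will miss credentials embedded inside free-text values.

```python
import json
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("agent.tools")

# Hypothetical list of key names treated as credentials.
SENSITIVE_KEYS = {"api_key", "authorization", "password", "secret", "token"}

def sanitize(value: Any) -> Any:
    """Redact values stored under sensitive key names, recursively."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value

def logged_call(tool_name: str, fn: Callable[..., Any], inputs: dict) -> Any:
    """Call fn(**inputs) and emit one JSON log record describing the call."""
    start = time.monotonic()
    record: dict[str, Any] = {"tool": tool_name, "inputs": sanitize(inputs)}
    try:
        response = fn(**inputs)
        record.update(outcome="success", response=sanitize(response))
        return response
    except Exception as exc:
        record.update(outcome="error", error=f"{type(exc).__name__}: {exc}")
        raise
    finally:
        # Runs on both success and failure, so every call gets a record.
        record["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
        logger.info(json.dumps(record, default=str))
```

Emitting one structured record per call, including failed ones, makes it possible to reconstruct the agent's sequence of actions after the fact.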

Agent Legs provides authenticated outbound API calls as a first-class action type. Secret management, automatic retries, rate limit handling, structured response parsing, and full audit logging — built in. Get early access.

Your agent has a brain. Give it legs.
