Optimization · 5 min read

Pass by Reference:
Making AI Agents Faster & Cheaper.

A pattern for making multi-turn AI agents faster and cheaper by letting them pass data by reference instead of forcing the model to re-write the same tool outputs over and over.

The Problem: The Re-Writing Loop

When an agent has to work with data across multiple turns, it often ends up doing the following:

Step 01

The Setup

Agent calls a tool and retrieves a large dataset (e.g., 50 customer records).

> Output: [{ "id": 1 ... }, 50 records ]
Step 02

The Bottleneck

To pass this data to another tool for analysis, or to stream it to the user, the model must re-write the entire JSON payload token by token.

Re-Writing JSON...

This wastes both time and tokens, because the model must regenerate and resend the same large structures every time it needs to pass data forward.

The Idea: Variable References

Instead of forcing the LLM to act as a data pipe, we let it handle pointers. When a tool runs, its output is automatically saved as a named variable (e.g., $customers_california). The core idea shows up in two concrete scenarios below.
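The storage side of this idea can be sketched in a few lines. This is a minimal, hypothetical illustration (the names `VARIABLES`, `run_tool`, and `fetch_customers` are ours, not from any particular framework): every tool result is stashed under a generated name, and only a compact stub containing that name is returned to the model.

```python
# Minimal sketch of a variable store for tool outputs (hypothetical names).
VARIABLES: dict[str, object] = {}

def run_tool(name: str, fn, *args, **kwargs) -> str:
    """Execute a tool, store its full output, and return only a reference."""
    result = fn(*args, **kwargs)
    ref = f"${name}"
    VARIABLES[ref] = result
    # The model sees a short stub instead of the raw payload.
    count = len(result) if hasattr(result, "__len__") else 1
    return f"{ref} (stored, {count} items)"

def fetch_customers(state: str) -> list[dict]:
    # Stand-in for a real data source.
    return [{"id": i, "state": state} for i in range(50)]

stub = run_tool("customers_california", fetch_customers, "CA")
print(stub)  # → "$customers_california (stored, 50 items)"
```

The model now reasons over the stub, while the 50 records stay in the runtime's memory.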

When passing data to tools down the line

Instead of re-writing the full JSON payload, the agent passes a variable reference into the next tool call.

Without Variables
RE-WRITING DATA
tool_call: analyze_cohort(
  users: [
    { "id": "u_1", "visited": "2024-01-01", "duration": 120... },
    { "id": "u_2", "visited": "2024-01-01", "duration": 45... },
    ... 9,998 more records ...
  ]
)
❌ 12s Latency (Writing JSON)
With Variables
PASSING REFERENCE
tool_call: analyze_cohort(
  users: $weekly_visits
)
✅ 0.2s Latency (Writing Pointer)
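On the execution side, the runtime has to expand references before the tool actually runs. A rough sketch of that resolution step, assuming the simple `$name` convention and an in-memory `VARIABLES` store (both hypothetical):

```python
# Hypothetical sketch: before executing a tool call emitted by the model,
# walk its arguments and swap any "$name" reference for the stored payload.
VARIABLES = {"$weekly_visits": [{"id": f"u_{i}"} for i in range(10_000)]}

def resolve(value):
    """Recursively replace "$name" strings with their stored values."""
    if isinstance(value, str) and value.startswith("$"):
        return VARIABLES[value]
    if isinstance(value, dict):
        return {k: resolve(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v) for v in value]
    return value

# The model only wrote 'users: $weekly_visits'; the runtime expands it.
call_args = {"users": "$weekly_visits"}
expanded = resolve(call_args)
print(len(expanded["users"]))  # → 10000
```

The model writes one short token sequence; the full 10,000-record list only ever moves between the store and the tool.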

When streaming data back to the user

Instead of dumping the entire dataset into the chat, the agent streams a short summary and keeps the heavy data behind a variable.

Without Variables
POLLUTING CONTEXT
Assistant: Here are the users who visited this week:
1. John Doe (2024-01-01)
2. Jane Smith (2024-01-02)
3. Bob Wilson (2024-01-02)
... [Stream continues for 45 seconds] ...
❌ 15k Tokens Wasted
With Variables
CLEAN HANDOFF
Assistant: Here are the users who visited this week:
$weekly_visits

Would you like to analyze their session duration?
✅ 50 Tokens Used

Our setup then automatically renders these variables when calling tools or when streaming output to the user.
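The streaming-side rendering can work as a simple substitution pass over the assistant's text. A hypothetical sketch (again assuming the `$name` convention and an in-memory store; here the reference is rendered as a short summary rather than the raw records):

```python
import re

# Hypothetical sketch: when streaming the reply to the user, detect "$name"
# tokens and render the stored data in their place, so the model never has
# to spell the records out.
VARIABLES = {"$weekly_visits": [{"name": "John Doe"}, {"name": "Jane Smith"}]}

def render(text: str) -> str:
    def sub(match: re.Match) -> str:
        data = VARIABLES.get(match.group(0))
        if data is None:
            return match.group(0)  # unknown reference: leave as-is
        names = ", ".join(row["name"] for row in data)
        return f"{names} ({len(data)} users)"
    return re.sub(r"\$\w+", sub, text)

print(render("Here are the users who visited this week: $weekly_visits"))
```

In a real UI, the renderer could instead expand the reference into a table or a download link; the point is that the model only emits the pointer.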

Benchmark Results

Response Time
-92.8%

Massive reduction in latency by skipping token generation.

Total Tokens
-82.4%

Significant drop in total tokens used for the scenario.

Estimated Cost
-87.1%

Direct cost savings on per-token pricing models.

The Takeaway

Don't make the model re-write large tool outputs just to pass data along. Let it pass references instead.

Open Source

Code implementation

Explore the full implementation and code in the repository below.

View Repository

Want to build better agents? Connect with us to use our specialized tools.