The Problem: The Re-Writing Loop
When an agent has to work with data across multiple turns, it often ends up doing the following:
The Setup
Agent calls a tool and retrieves a large dataset (e.g., 50 customer records).
The Bottleneck
To pass that data to another tool for analysis or stream it to the user, the agent must re-write the entire JSON payload token by token.
This wastes time and tokens: the model regenerates and resends the same large structure every time it needs to pass data forward.
The Idea: Variable References
Instead of forcing the LLM to act as a data pipe, we let it handle pointers. When a tool runs, its output is automatically saved as a named variable (e.g., $customers_california). The core idea shows up in two concrete scenarios below.
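Before the scenarios, a minimal sketch of the capture step, assuming a simple in-memory store (the names `VariableStore` and `run_tool` are illustrative, not from the actual implementation):

```python
# Hedged sketch: tool outputs are stashed server-side and the model only
# ever sees a $-prefixed reference. Names here are hypothetical.
class VariableStore:
    def __init__(self):
        self._values = {}

    def save(self, name, value):
        """Store a tool result and return its $-prefixed reference."""
        ref = f"${name}"
        self._values[ref] = value
        return ref

    def load(self, ref):
        return self._values[ref]


def run_tool(tool, name, store, **kwargs):
    result = tool(**kwargs)          # full payload stays server-side
    return store.save(name, result)  # the model receives only the reference


store = VariableStore()
fetch_customers = lambda state: [{"id": i, "state": state} for i in range(50)]
ref = run_tool(fetch_customers, "customers_california", store, state="CA")
print(ref)  # -> $customers_california
```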
When passing data to tools down the line
Instead of re-writing the full JSON payload, the agent passes a variable reference into the next tool call.
Before (the model re-writes the full payload):

```
tool_call: analyze_cohort(
  users: [
    { "id": "u_1", "visited": "2024-01-01", "duration": 120, ... },
    { "id": "u_2", "visited": "2024-01-01", "duration": 45, ... },
    ... 9,998 more records ...
  ]
)
```

After (the model passes a reference):

```
tool_call: analyze_cohort( users: $weekly_visits )
```
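Under the hood, the runtime has to expand the reference before the tool executes. A minimal sketch, assuming references are plain `$`-prefixed strings and the store is a dict (`resolve_refs` is an illustrative name):

```python
def resolve_refs(args, store):
    """Recursively swap "$name" strings for their stored values."""
    if isinstance(args, str) and args in store:
        return store[args]
    if isinstance(args, list):
        return [resolve_refs(item, store) for item in args]
    if isinstance(args, dict):
        return {key: resolve_refs(value, store) for key, value in args.items()}
    return args


store = {"$weekly_visits": [{"id": "u_1", "visited": "2024-01-01", "duration": 120}]}

# The model emitted only the reference, never the 10,000 records:
raw_args = {"users": "$weekly_visits"}
resolved_args = resolve_refs(raw_args, store)
# analyze_cohort(**resolved_args) now receives the full dataset.
```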
When streaming data back to the user
Instead of dumping the entire dataset into the chat, the agent streams a short summary and keeps the heavy data behind a variable.
Before (the model streams every record):

```
Assistant: Here are the users who visited this week:
1. John Doe (2024-01-01)
2. Jane Smith (2024-01-02)
3. Bob Wilson (2024-01-02)
... [Stream continues for 45 seconds] ...
```

After (the model streams a reference):

```
Assistant: Here are the users who visited this week: $weekly_visits
Would you like to analyze their session duration?
```
Our setup then automatically resolves these variables when tools are called and renders them when streaming to the user.
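One plausible way the streaming side could render a reference, sketched with a hypothetical `render_for_user` helper that expands each variable into a short preview (the actual setup may instead render the full data client-side):

```python
import re

# Matches $-prefixed variable names like $weekly_visits.
REF_PATTERN = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")

def render_for_user(text, store, preview=3):
    """Expand each $reference into a short preview plus a remainder count."""
    def expand(match):
        rows = store.get(match.group(0), [])
        shown = ", ".join(str(row.get("id", row)) for row in rows[:preview])
        return f"{shown} (+{max(len(rows) - preview, 0)} more)"
    return REF_PATTERN.sub(expand, text)

store = {"$weekly_visits": [{"id": f"u_{i}"} for i in range(10_000)]}
print(render_for_user("Here are the users who visited this week: $weekly_visits", store))
# -> Here are the users who visited this week: u_0, u_1, u_2 (+9997 more)
```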
Benchmark Results
Large reduction in latency from skipping payload token generation.
Significant drop in total tokens used across the scenario.
Direct cost savings under per-token pricing models.
The Takeaway
Don't make the model re-write large tool outputs just to pass data along. Let it pass references instead.
Code implementation
Explore the full implementation and code in the repository below.