tokens-usage

Guide

Your agent sends 1,500 tokens of tools before the user says a word

Function schemas are re-sent every turn. Measure tool token overhead across your agent loop — and decide what to trim before costs compound.

countAssistantTools toggle · Tool history vs schema measurement · Per-turn overhead visibility

The problem

Hidden overhead on every agent turn

OpenAI injects function definitions into the system message in a syntax the model was trained on, so they count against the context limit and are billed as input tokens on every request. A moderate schema with five tools adds 800–1,200 tokens per request. Ten well-described tools run about 1,500 tokens per turn before the user message.

In a ten-step agent loop, that is 15,000 tokens of pure tool overhead — with zero user content growth. Most cost dashboards attribute spend to "prompts" without separating tool schema tax.

  • Tool schemas do not benefit from conversation truncation — they repeat every turn
  • Verbose parameter descriptions inflate counts linearly with tool count
  • Agent loops multiply overhead: 5 tools × 200 tokens × 10 turns = 10,000 tokens
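The multiplication in the last bullet can be sketched as a small helper. The function name and the per-tool token figures are illustrative assumptions, not measured values; substitute the numbers you observe in response.usage.

```typescript
// Hypothetical helper: total tool-schema tokens re-sent across an agent loop.
// tokensPerTool is an estimate; measure your own schemas before relying on it.
function loopSchemaOverhead(toolCount: number, tokensPerTool: number, turns: number): number {
  return toolCount * tokensPerTool * turns
}

// 5 tools × 200 tokens × 10 turns
const fiveToolLoop = loopSchemaOverhead(5, 200, 10) // 10000

// 10 tools at roughly 150 tokens each over the same 10 turns
const tenToolLoop = loopSchemaOverhead(10, 150, 10) // 15000
```

Because the schema is identical on every turn, the total grows linearly with turn count even when the conversation itself is truncated.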

The solution

Measure schema and history separately

Tool definitions and tool history are different costs. To measure schema overhead, send identical messages with and without tools and compare response.usage.prompt_tokens. To measure history, toggle countAssistantTools, which controls whether tool-call and tool-result blocks already in your messages array are counted.

Run both measurements before deploying a new tool set. If overhead exceeds 15% of your context budget, consolidate tools, shorten descriptions, or pass a subset per route.
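A minimal sketch of the 15% check follows. The helper name, its return shape, and the choice to sum schema and history overhead are assumptions made for illustration; the text above does not say whether the threshold applies to each measurement separately.

```typescript
// Hypothetical preflight check; not part of tokens-usage.
interface ToolBudgetCheck {
  share: number       // fraction of the context budget spent on tool overhead
  overBudget: boolean // true when share exceeds the threshold
}

function checkToolBudget(
  schemaTokens: number,
  historyTokens: number,
  contextBudget: number,
  threshold = 0.15, // the 15% guideline from the text
): ToolBudgetCheck {
  if (contextBudget <= 0) throw new RangeError('contextBudget must be positive')
  // Assumption: schema and history overhead are added together.
  const share = (schemaTokens + historyTokens) / contextBudget
  return { share, overBudget: share > threshold }
}

// 1,500 schema tokens plus 500 history tokens against an 8,000-token budget
const preflight = checkToolBudget(1500, 500, 8000)
```

Here preflight.share is 0.25, so the tool set would be flagged for trimming. The same overhead against a much larger budget would pass.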

How it works

Three steps to a safe preflight

  1. Measure tool schema overhead via the API

    Send identical messages with and without your tools array. The prompt_tokens delta is your per-request schema tax — typically 100–200 tokens per tool.

  2. Measure tool history with countAssistantTools

    Pass messages that include function_call or tool_result blocks. Toggle countAssistantTools to see how much prior tool turns add to the count.

  3. Optimize before scaling traffic

    Shorten descriptions, merge related tools, use tool subsetting per route, or tool search on gpt-5.4+ for large catalogs.

Implementation

Production-ready code

Measure schema overhead (API) and tool history (countAssistantTools)

typescript
import { countTokens } from 'tokens-usage'
import OpenAI from 'openai'

const client = new OpenAI()
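// Typed explicitly so the literals satisfy the SDK's request types
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [{ role: 'user', content: 'Find me a laptop' }]
const tools: OpenAI.Chat.ChatCompletionTool[] = [/* your function definitions */]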

// 1. Tool schema overhead — compare API usage with and without tools
const [withTools, withoutTools] = await Promise.all([
  client.chat.completions.create({ model: 'gpt-4o', messages, tools, max_tokens: 1 }),
  client.chat.completions.create({ model: 'gpt-4o', messages, max_tokens: 1 }),
])
const schemaOverhead =
  withTools.usage!.prompt_tokens - withoutTools.usage!.prompt_tokens

// 2. Tool history overhead — toggle countAssistantTools on message blocks
const history = [
  { role: 'user', content: 'Find me a laptop' },
  { type: 'function_call', call_id: 'c1', name: 'search', arguments: '{}' },
  { type: 'function_call_output', call_id: 'c1', output: '{"results":[]}' },
]

const withHistory = await countTokens({
  provider: 'openai',
  model: 'gpt-4o',
  content: history,
  countAssistantTools: true,
})

const withoutHistory = await countTokens({
  provider: 'openai',
  model: 'gpt-4o',
  content: history,
  countAssistantTools: false,
})

const historyOverhead = withHistory.tokens - withoutHistory.tokens

console.log(`Schema overhead: ${schemaOverhead} tokens/request`)
console.log(`History overhead: ${historyOverhead} tokens in current messages`)

Deep dive

The math on agent loop overhead

At GPT-4o input pricing, 15,000 overhead tokens across a session is real spend — before counting user content, system prompts, or tool results from web search.

Scenario                       | Tokens per turn | 10-turn total
5 tools, moderate schema       | ~1,000          | ~10,000
10 tools, verbose descriptions | ~1,500          | ~15,000
20 tools (accuracy risk zone)  | ~3,000+         | ~30,000+
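To turn the table into dollars, a rough sketch is shown below. The function name is hypothetical, and the rate of 2.5 USD per million input tokens is a placeholder assumption; check the current price for your model before using the result.

```typescript
// Hypothetical helper: USD spent on tool-schema tokens over an agent session.
// Assumption: the caller supplies the input price; nothing here looks it up.
function schemaOverheadUsd(
  tokensPerTurn: number,
  turns: number,
  usdPerMillionInputTokens: number,
): number {
  return (tokensPerTurn * turns * usdPerMillionInputTokens) / 1e6
}

// 10 tools with verbose descriptions: ~1,500 tokens per turn, 10 turns.
// 2.5 is a placeholder rate, not a quoted price.
const usdPerSession = schemaOverheadUsd(1500, 10, 2.5)

// The same overhead repeated across 1,000 sessions
const usdPerThousandSessions = usdPerSession * 1000
```

At the placeholder rate, one session costs about 0.0375 USD in schema overhead and a thousand sessions about 37.50 USD, before any user content is counted.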

Optimization strategies that actually work

Cutting a five-line tool description to one line saves those tokens on every request forever. Optimize once during development, not after the bill arrives.

  • Tool subsetting: pass 4–6 relevant tools per route instead of the full catalog
  • Consolidation: one tool with an action parameter vs five separate tools
  • Tool search (gpt-5.4+): defer definitions until the model queries the index
  • Prompt caching: can reduce cost on stable prefixes — tool schemas may not cache depending on request layout
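Tool subsetting, the first strategy above, can be sketched as a lookup from route to allowed tool names. The ToolDef shape, the route names, and the tool names are all hypothetical; in a real agent the filtered array would be passed as the tools parameter of the request.

```typescript
// Hypothetical minimal tool shape for illustration only.
interface ToolDef {
  name: string
  description: string
}

const catalog: ToolDef[] = [
  { name: 'search_products', description: 'Search the product catalog' },
  { name: 'get_price', description: 'Get the current price for a product' },
  { name: 'lookup_order', description: 'Look up an order by ID' },
  { name: 'open_ticket', description: 'Open a support ticket' },
]

// Assumed routing table: which tools each route is allowed to see.
const routeTools: Record<string, string[]> = {
  shopping: ['search_products', 'get_price'],
  support: ['lookup_order', 'open_ticket'],
}

// Return only the tools registered for the route; unknown routes get none.
function toolsForRoute(
  route: string,
  all: ToolDef[],
  routes: Record<string, string[]>,
): ToolDef[] {
  const allowed = new Set(routes[route] ?? [])
  return all.filter((tool) => allowed.has(tool.name))
}

const shoppingTools = toolsForRoute('shopping', catalog, routeTools)
```

Returning an empty list for unknown routes is a deliberate choice in this sketch: it fails closed instead of falling back to the full catalog and its full token cost.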

Anthropic tool_use blocks

Anthropic counts tool definitions, tool_use, and tool_result content in count_tokens. Server tool token counts apply to the first sampling call only. tokens-usage handles Anthropic native payloads with the same countTokens interface.

FAQ

Common questions

What does countAssistantTools do?

When true (default), function_call, tool_use, and functionResponse blocks in your messages are included in the count. Set false to exclude tool history. It does not measure tool definition schemas — use response.usage for that.

Are tool definitions cached by OpenAI?

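Partly. Tool schemas are re-sent on every request and count as input tokens each time. Prompt caching can discount a stable request prefix, and on OpenAI the tools list can be part of that prefix for prompts of 1,024 tokens or more, but cached tokens are discounted, not free. Do not assume schemas cost nothing; measure with response.usage.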

How many tools is too many?

Tool-selection accuracy tends to degrade beyond 10–20 tools per request. If overhead exceeds 15% of your context budget, split into sub-agents or use tool search on supported models.

Do tool results count as input tokens on the next turn?

Yes. Tool results become part of conversation history and are billed as input on every subsequent call. Truncate large tool outputs before appending to messages.
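One way to truncate is a character cap applied before the result is appended to messages. This is a sketch under stated assumptions: the helper name and marker text are hypothetical, and it limits characters, not tokens, so the token count should still be verified with a counter.

```typescript
// Hypothetical helper: cap a tool result by character length.
// Assumption: characters are only a rough proxy for tokens.
function truncateToolOutput(
  output: string,
  maxChars: number,
  marker = ' [truncated]',
): string {
  if (maxChars < 0) throw new RangeError('maxChars must be non-negative')
  if (output.length <= maxChars) return output
  // Too little room for the marker: return a plain prefix instead.
  if (maxChars <= marker.length) return output.slice(0, maxChars)
  // Keep the head of the output and make the cut visible to the model.
  return output.slice(0, maxChars - marker.length) + marker
}

const longResult = 'a'.repeat(100)
const capped = truncateToolOutput(longResult, 30)
```

The visible marker tells the model that data is missing, which lets it request a narrower query instead of treating the partial result as complete.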

Can I estimate USD cost for tool-heavy requests?

Yes. tokens-usage returns a price field with USD estimate when the model exists in the pricing table.

Start counting before you send

Add tokens-usage to your stack today. Source-available license — see LICENSE.md for terms.

npm install tokens-usage