Transparent by Design
How We Calculate Your AI Costs
Every cent traced. No estimates, no surprises. Here is the exact math behind every cost we report — for every provider we support.
The Formula
Four steps, applied to every API call.
Base Token Cost
Fresh input tokens and output tokens, priced per model.
inputCost = regularInput × inputPrice / 1M
outputCost = outputTokens × outputPrice / 1M
regularInput = inputTokens − cachedTokens − cacheWriteTokens
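As a sketch in Python (function name and argument order are illustrative; prices are USD per million tokens, matching the formulas above):

```python
def base_token_cost(input_tokens, output_tokens, cached_tokens,
                    cache_write_tokens, input_price, output_price):
    """Step 1: price the fresh (non-cached) input tokens and the output tokens.

    Prices are USD per 1M tokens. Cached reads and cache writes are
    subtracted here and priced separately in step 2.
    """
    regular_input = input_tokens - cached_tokens - cache_write_tokens
    input_cost = regular_input * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost, output_cost

# Anthropic worked-example rates: $3.00/MTok input, $15.00/MTok output
print(base_token_cost(12_000, 500, 8_000, 2_000, 3.00, 15.00))  # ≈ (0.006, 0.0075)
```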
Cache Adjustments
Cached reads are cheaper. Cache writes may cost a premium.
cacheReadCost = cachedTokens × inputPrice × readMultiplier / 1M
cacheWriteCost = writeTokens × inputPrice × writeMultiplier / 1M

| Provider | Cache Read | Cache Write (5m) | Cache Write (1h) |
|---|---|---|---|
| Anthropic | 0.1× | 1.25× | 2.0× |
| OpenAI (GPT-4.1 / o3 / o4-mini) | 0.25× | — | — |
| OpenAI (GPT-4o / o1) | 0.5× | — | — |
| Google | 0.1× | Storage-based (per hour) | Storage-based (per hour) |
| Groq | 0.5× | — | — |
| Mistral / Cohere | No caching available | — | — |
OpenAI and Groq cache writes are automatic at no extra cost. Google charges for cache storage per hour separately. Mistral and Cohere do not offer prompt caching.
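The cache adjustments can be sketched with the multipliers from the table above (the dictionary keys and function name are illustrative; only a few providers are shown):

```python
# Multipliers from the cache table above. OpenAI and Groq cache writes are
# automatic and free, so their write multipliers are 0.
CACHE_MULTIPLIERS = {
    "anthropic":    {"read": 0.1,  "write_5m": 1.25, "write_1h": 2.0},
    "openai_gpt41": {"read": 0.25, "write_5m": 0.0,  "write_1h": 0.0},
    "groq":         {"read": 0.5,  "write_5m": 0.0,  "write_1h": 0.0},
}

def cache_costs(cached_tokens, write_tokens, input_price, provider, ttl="5m"):
    """Step 2: price cached reads at a discount and cache writes at a premium."""
    m = CACHE_MULTIPLIERS[provider]
    read_cost = cached_tokens * input_price * m["read"] / 1_000_000
    write_cost = write_tokens * input_price * m[f"write_{ttl}"] / 1_000_000
    return read_cost, write_cost

# Anthropic, 1h TTL: 8,000 cached reads + 2,000 cache writes at $3.00/MTok
print(cache_costs(8_000, 2_000, 3.00, "anthropic", ttl="1h"))  # ≈ (0.0024, 0.012)
```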
Mode Multipliers
Applied to the token cost from steps 1 + 2.
tokenCost = inputCost + cacheReadCost + cacheWriteCost + outputCost

| Condition | Multiplier | When |
|---|---|---|
| Batch API | 0.5× | is_batch_api: true |
| Long Context (Anthropic) | 2.0× input 1.5× output | input_tokens > 200,000 |
| Long Context (Google Pro) | 2.0× input 2.0× output | input_tokens > 200,000 (Flash exempt) |
| Fast Mode (Anthropic) | 6.0× | is_fast_mode: true |
Long context applies separately to input and output costs before they are summed. Other providers do not have documented long-context surcharges.
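A sketch of step 3 in Python. The provider labels are illustrative, and the stacking order of the batch discount and fast-mode multiplier (applied after the long-context surcharge) is an assumption:

```python
def apply_mode_multipliers(input_cost, output_cost, *, provider, input_tokens,
                           is_batch_api=False, is_fast_mode=False):
    """Step 3: long-context surcharges scale input and output costs
    separately, before they are summed; batch and fast-mode multipliers
    then scale the summed token cost (stacking order assumed)."""
    if provider == "anthropic" and input_tokens > 200_000:
        input_cost *= 2.0
        output_cost *= 1.5
    elif provider == "google_pro" and input_tokens > 200_000:  # Flash models exempt
        input_cost *= 2.0
        output_cost *= 2.0
    cost = input_cost + output_cost
    if is_batch_api:
        cost *= 0.5
    if provider == "anthropic" and is_fast_mode:
        cost *= 6.0
    return cost

# Google Pro worked example: 250K input tokens triggers the 2x/2x surcharge
print(apply_mode_multipliers(0.3125, 0.02,
                             provider="google_pro", input_tokens=250_000))  # ≈ 0.665
```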
Tool Fees
Flat per-call fees added on top of token costs. Varies by provider.
toolCost = webSearchCount × searchFee(provider)
totalCost = tokenCost + toolCost

| Provider | Web Search (per call) | Web Fetch |
|---|---|---|
| OpenAI | $0.010 | Free |
| Anthropic | $0.010 | Free |
| Google | $0.014 | Free |
| Groq | $0.005 | Free |
| Mistral / Cohere | — | — |
Web fetch is free for all providers — you only pay the token cost for fetched content.
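Step 4 is a flat fee lookup on top of the token cost. A minimal sketch, with the per-call fees from the table above:

```python
# Per-call web search fees (USD) from the table above; Mistral and Cohere
# have no search tool, so they fall through to a zero fee.
SEARCH_FEES = {"openai": 0.010, "anthropic": 0.010, "google": 0.014, "groq": 0.005}

def total_cost(token_cost, web_search_count, provider):
    """Step 4: add flat per-call tool fees after all token-cost multipliers.
    Web fetch is free everywhere, so it contributes no fee here."""
    tool_cost = web_search_count * SEARCH_FEES.get(provider, 0.0)
    return token_cost + tool_cost

# Anthropic worked example: tokenCost of $0.0279 plus two web searches
print(total_cost(0.0279, 2, "anthropic"))  # ≈ 0.0479
```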
What If You Don't Send a Field?
Every optional field has a safe default. You only need to send what you know.
| Field | Default | Effect |
|---|---|---|
| input_tokens_cached | 0 | All input tokens billed at full price |
| input_tokens_cache_write | 0 | No cache write premium applied |
| cache_ttl | "5m" | Uses 1.25× write multiplier (not 2.0×) |
| web_search_count | 0 | No search fees added |
| web_fetch_count | 0 | No fees — web fetch is free (only token costs) |
| is_batch_api | false | No 50% discount applied |
| is_fast_mode | false | No 6× multiplier applied |
| cost_usd | auto-calculated | Server computes cost from token counts + model price |
| resolved_model | null | Falls back to model field for price lookup |
If the model is not found in our price database, cost_usd is left blank. You can always override by sending your own cost_usd value.
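Client-side, the defaulting behavior amounts to a simple merge (a sketch; the field names match the table above, the function name is illustrative):

```python
# Safe defaults for every optional field, as listed in the table above.
DEFAULTS = {
    "input_tokens_cached": 0,
    "input_tokens_cache_write": 0,
    "cache_ttl": "5m",
    "web_search_count": 0,
    "web_fetch_count": 0,
    "is_batch_api": False,
    "is_fast_mode": False,
}

def with_defaults(event: dict) -> dict:
    """Fill any omitted optional field with its safe default;
    fields the caller sends always win."""
    return {**DEFAULTS, **event}

print(with_defaults({"input_tokens": 12_000, "output_tokens": 500})["cache_ttl"])  # prints 5m
```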
Worked Examples
Real numbers for Anthropic, OpenAI, and Google.
Anthropic
$3.00 / MTok input • $15.00 / MTok output • cache read 0.1× • cache write 2.0× (1h TTL)
Inputs
| input_tokens | 12,000 |
| cache_read_input_tokens | 8,000 |
| cache_creation_input_tokens | 2,000 |
| output_tokens | 500 |
| cache_ttl | "1h" |
| web_search_count | 2 |
Calculation
regularInput = 12,000 − 8,000 − 2,000 = 2,000
inputCost = 2,000 × $3.00 / 1M = $0.006000
cacheReadCost = 8,000 × $3.00 × 0.1 / 1M = $0.002400
cacheWriteCost = 2,000 × $3.00 × 2.0 / 1M = $0.012000
outputCost = 500 × $15.00 / 1M = $0.007500
tokenCost = $0.006000 + $0.002400 + $0.012000 + $0.007500
tokenCost = $0.027900
toolCost = 2 × $0.01 = $0.020000
totalCost = $0.027900 + $0.020000 = $0.047900

OpenAI
$2.00 / MTok input • $8.00 / MTok output • cache read 0.25× (GPT-4.1 family)
Inputs
| input_tokens | 50,000 |
| input_tokens_cached | 40,000 |
| output_tokens | 1,000 |
| web_search_count | 1 |
Calculation
regularInput = 50,000 − 40,000 = 10,000
inputCost = 10,000 × $2.00 / 1M = $0.020000
cacheReadCost = 40,000 × $2.00 × 0.25 / 1M = $0.020000
outputCost = 1,000 × $8.00 / 1M = $0.008000
tokenCost = $0.020000 + $0.020000 + $0.008000
tokenCost = $0.048000
toolCost = 1 × $0.01 = $0.010000
totalCost = $0.048000 + $0.010000 = $0.058000

Google
$1.25 / MTok input • $10.00 / MTok output • long context: 2.0× input, 2.0× output
Inputs
| input_tokens | 250,000 |
| output_tokens | 2,000 |
Calculation
inputCost = 250,000 × $1.25 / 1M = $0.312500
outputCost = 2,000 × $10.00 / 1M = $0.020000
Long context (>200K): 2× input, 2× output
inputCost = $0.312500 × 2.0 = $0.625000
outputCost = $0.020000 × 2.0 = $0.040000
totalCost = $0.625000 + $0.040000 = $0.665000

Provider Comparison
How pricing features differ across the six providers we track.
| Feature | OpenAI | Anthropic | Google | Groq | Mistral | Cohere |
|---|---|---|---|---|---|---|
| Cache Read Discount | 50–75% | 90% | ~90% | 50% | — | — |
| Cache Write Premium | None | 1.25–2.0× | Per-hour storage | None | — | — |
| Batch API | 50% off | 50% off | 50% off | 50% off | — | — |
| Long Context | — | 2× / 1.5× >200K | 2× / 2× >200K (Pro only) | — | — | — |
| Web Search Fee | $0.010 | $0.010 | $0.014 | $0.005 | — | — |
| Fast Mode | — | 6× | — | — | — | — |
| Thinking Tokens | Output rate | Output rate | Output rate | — | — | — |
What's the Same
- All providers use per-token pricing (input + output)
- Batch API is universally 50% off where available
- Thinking/reasoning tokens are billed at the output token rate
- Web fetch is free for all providers
What's Different
- Cache discounts range from 50% (OpenAI/Groq) to 90% (Anthropic/Google)
- Only Anthropic charges a cache write premium (1.25–2.0×)
- Long-context surcharges only apply to Anthropic and Google Pro (>200K tokens; Flash models exempt)
- Web search fees range from $0.005 (Groq) to $0.014 (Google) per call
- Fast mode (6×) is an Anthropic-only feature
Minimum Required Fields
To get an accurate auto-calculated cost, you need just these fields.
{
"provider": "anthropic",
"model": "claude-sonnet-4-5",
"input_tokens": 12000,
"output_tokens": 500,
"latency_ms": 1200,
"timestamp": "2026-03-08T10:00:00Z",
"tags": {
"task_type": "summarize",
"feature": "support_summary",
"route": "POST /api/summary"
}
}

With just these fields, AISpendGuard looks up the model price and calculates the cost automatically. Add optional fields for more accuracy.
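To sanity-check an auto-calculated cost locally, the formula can be reproduced from this minimal payload plus the model's published prices (a sketch; the rates shown are the Anthropic worked-example prices, assumed for illustration):

```python
def estimate_cost(event, input_price, output_price):
    """Reproduce the server-side auto-calculation for a minimal payload:
    with no caching, tool, or mode fields sent, all defaults apply and
    the cost is just base input + output token cost."""
    input_cost = event["input_tokens"] * input_price / 1_000_000
    output_cost = event["output_tokens"] * output_price / 1_000_000
    return input_cost + output_cost

event = {"provider": "anthropic", "model": "claude-sonnet-4-5",
         "input_tokens": 12_000, "output_tokens": 500}
print(estimate_cost(event, 3.00, 15.00))  # ≈ 0.0435
```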