Transparent by Design

How We Calculate Your AI Costs

Every cent traced. No estimates, no surprises. Here is the exact math behind every cost we report — for every provider we support.

The Formula

Four steps, applied to every API call.

1. Base Token Cost

Fresh input tokens and output tokens, priced per model.

```
regularInput = inputTokens − cachedTokens − cacheWriteTokens
inputCost    = regularInput × inputPrice / 1M
outputCost   = outputTokens × outputPrice / 1M
```
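The step above can be sketched in Python; prices are dollars per million tokens (MTok), and the function name is illustrative rather than part of any real API.

```python
# Step 1 sketch: base token cost for the uncached portion of a call.
# Prices are in dollars per million tokens (MTok).
def base_token_cost(input_tokens: int, cached_tokens: int,
                    cache_write_tokens: int, output_tokens: int,
                    input_price: float, output_price: float) -> tuple[float, float]:
    regular_input = input_tokens - cached_tokens - cache_write_tokens
    input_cost = regular_input * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost, output_cost
```

With 12,000 input tokens (8,000 cached, 2,000 cache-write) and 500 output tokens at $3.00 / $15.00 per MTok, this yields $0.006 input and $0.0075 output.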

2. Cache Adjustments

Cached reads are cheaper. Cache writes may cost a premium.

```
cacheReadCost  = cachedTokens × inputPrice × readMultiplier / 1M
cacheWriteCost = writeTokens × inputPrice × writeMultiplier / 1M
```
| Provider | Cache Read | Cache Write (5m) | Cache Write (1h) |
|---|---|---|---|
| Anthropic | 0.1× | 1.25× | 2.0× |
| OpenAI (GPT-4.1 / o3 / o4-mini) | 0.25× | Free (automatic) | Free (automatic) |
| OpenAI (GPT-4o / o1) | 0.5× | Free (automatic) | Free (automatic) |
| Google | 0.1× | Storage-based (per hour) | Storage-based (per hour) |
| Groq | 0.5× | Free (automatic) | Free (automatic) |
| Mistral / Cohere | No caching available | n/a | n/a |

OpenAI and Groq cache writes are automatic at no extra cost. Google charges for cache storage per hour separately. Mistral and Cohere do not offer prompt caching.
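A sketch of the cache adjustment, with the multipliers from the table above hard-coded; the provider keys and lookup shape are assumptions for illustration, not the real schema.

```python
# Step 2 sketch: cache read/write costs. Multipliers mirror the table
# above; unknown providers fall back to 0 (no caching).
READ_MULT = {"anthropic": 0.1, "google": 0.1,
             "openai-gpt4.1": 0.25, "openai-gpt4o": 0.5, "groq": 0.5}
WRITE_MULT = {("anthropic", "5m"): 1.25, ("anthropic", "1h"): 2.0}

def cache_costs(cached_tokens, write_tokens, input_price,
                provider, cache_ttl="5m"):
    read = cached_tokens * input_price * READ_MULT.get(provider, 0.0) / 1_000_000
    write = write_tokens * input_price * WRITE_MULT.get((provider, cache_ttl), 0.0) / 1_000_000
    return read, write
```

For 8,000 cached reads and 2,000 cache writes at $3.00 / MTok on Anthropic with a 1h TTL, this returns $0.0024 and $0.0120.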

3. Mode Multipliers

Applied to the combined token cost from steps 1 and 2.

```
tokenCost = inputCost + cacheReadCost + cacheWriteCost + outputCost
```

| Condition | Multiplier | When |
|---|---|---|
| Batch API | 0.5× | is_batch_api: true |
| Long Context (Anthropic) | 2.0× input, 1.5× output | input_tokens > 200,000 |
| Long Context (Google Pro) | 2.0× input, 2.0× output | input_tokens > 200,000 (Flash exempt) |
| Fast Mode (Anthropic) | 6.0× | is_fast_mode: true |

Long context applies separately to input and output costs before they are summed. Other providers do not have documented long-context surcharges.
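The multiplier logic can be sketched as below. Long context scales the input and output sides separately before summing, as noted above; how the batch and fast-mode multipliers stack afterwards is my assumption for illustration.

```python
# Step 3 sketch: long context scales input/output costs separately
# before summing; batch and fast-mode multipliers then apply to the
# sum. The batch/fast-mode stacking order is an assumption.
def apply_modes(input_side_cost, output_cost, provider, model="",
                input_tokens=0, is_batch_api=False, is_fast_mode=False):
    if input_tokens > 200_000:
        if provider == "anthropic":
            input_side_cost *= 2.0
            output_cost *= 1.5
        elif provider == "google" and "flash" not in model:
            input_side_cost *= 2.0
            output_cost *= 2.0
    token_cost = input_side_cost + output_cost
    if is_batch_api:
        token_cost *= 0.5
    if is_fast_mode:
        token_cost *= 6.0
    return token_cost
```

With the Gemini 2.5 Pro numbers from the worked examples ($0.3125 input, $0.02 output, 250K input tokens), this returns $0.665.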

4. Tool Fees

Flat per-call fees added on top of token costs. Varies by provider.

```
toolCost  = webSearchCount × searchFee(provider)
totalCost = tokenCost + toolCost
```

| Provider | Web Search (per call) | Web Fetch |
|---|---|---|
| OpenAI | $0.010 | Free |
| Anthropic | $0.010 | Free |
| Google | $0.014 | Free |
| Groq | $0.005 | Free |
| Mistral / Cohere | n/a | n/a |

Web fetch is free for all providers — you only pay the token cost for fetched content.
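The final step can be sketched with the fee table above hard-coded; providers without web search fall back to a zero fee.

```python
# Step 4 sketch: flat per-call web search fees; web fetch adds nothing
# beyond the token cost of the fetched content.
SEARCH_FEE = {"openai": 0.010, "anthropic": 0.010,
              "google": 0.014, "groq": 0.005}

def total_cost(token_cost, provider, web_search_count=0):
    tool_cost = web_search_count * SEARCH_FEE.get(provider, 0.0)
    return token_cost + tool_cost
```

Two Anthropic searches on top of a $0.0279 token cost give $0.0479, matching the worked example below.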

What If You Don't Send a Field?

Every optional field has a safe default. You only need to send what you know.

| Field | Default | Effect |
|---|---|---|
| input_tokens_cached | 0 | All input tokens billed at full price |
| input_tokens_cache_write | 0 | No cache write premium applied |
| cache_ttl | "5m" | Uses 1.25× write multiplier (not 2.0×) |
| web_search_count | 0 | No search fees added |
| web_fetch_count | 0 | No fees; web fetch is free (only token costs) |
| is_batch_api | false | No 50% discount applied |
| is_fast_mode | false | No 6× multiplier applied |
| cost_usd | auto-calculated | Server computes cost from token counts + model price |
| resolved_model | null | Falls back to the model field for price lookup |

If the model is not found in our price database, cost_usd is left blank. You can always override by sending your own cost_usd value.
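One way to picture the defaults above is a simple dict merge over an incoming record; this is illustrative only, not the server's actual implementation.

```python
# Defaults from the table above; any field the client sends wins.
DEFAULTS = {
    "input_tokens_cached": 0,
    "input_tokens_cache_write": 0,
    "cache_ttl": "5m",
    "web_search_count": 0,
    "web_fetch_count": 0,
    "is_batch_api": False,
    "is_fast_mode": False,
    "resolved_model": None,
}

def with_defaults(record: dict) -> dict:
    # Later entries override earlier ones, so client fields take priority.
    return {**DEFAULTS, **record}
```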

Worked Examples

Real numbers for Anthropic, OpenAI, and Google.

claude-sonnet-4-5 · Anthropic · Cache TTL: 1h

$3.00 / MTok input • $15.00 / MTok output • cache read 0.1× • cache write 2.0× (1h TTL)

Inputs

input_tokens: 12,000
cache_read_input_tokens: 8,000
cache_creation_input_tokens: 2,000
output_tokens: 500
cache_ttl: "1h"
web_search_count: 2

Calculation

```
regularInput   = 12,000 − 8,000 − 2,000 = 2,000
inputCost      = 2,000 × $3.00 / 1M        = $0.006000
cacheReadCost  = 8,000 × $3.00 × 0.1 / 1M  = $0.002400
cacheWriteCost = 2,000 × $3.00 × 2.0 / 1M  = $0.012000
outputCost     = 500 × $15.00 / 1M         = $0.007500

tokenCost = $0.006000 + $0.002400 + $0.012000 + $0.007500 = $0.027900
toolCost  = 2 × $0.01 = $0.020000
totalCost = $0.027900 + $0.020000 = $0.047900
```

Total cost for this call: $0.0479
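The same arithmetic in runnable Python, using the prices listed above:

```python
# Re-deriving the Anthropic example: $3/MTok input, $15/MTok output,
# 0.1x cache read, 2.0x cache write (1h TTL), $0.01 per web search.
M = 1_000_000
regular_input = 12_000 - 8_000 - 2_000     # 2,000 fresh tokens
input_cost    = regular_input * 3.00 / M   # 0.0060
cache_read    = 8_000 * 3.00 * 0.1 / M     # 0.0024
cache_write   = 2_000 * 3.00 * 2.0 / M     # 0.0120
output_cost   = 500 * 15.00 / M            # 0.0075
token_cost = input_cost + cache_read + cache_write + output_cost
total = token_cost + 2 * 0.010             # two web searches
print(f"{total:.4f}")                      # 0.0479
```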
gpt-4.1 · OpenAI · 75% cache discount

$2.00 / MTok input • $8.00 / MTok output • cache read 0.25× (GPT-4.1 family)

Inputs

input_tokens: 50,000
input_tokens_cached: 40,000
output_tokens: 1,000
web_search_count: 1

Calculation

```
regularInput  = 50,000 − 40,000 = 10,000
inputCost     = 10,000 × $2.00 / 1M         = $0.020000
cacheReadCost = 40,000 × $2.00 × 0.25 / 1M  = $0.020000
outputCost    = 1,000 × $8.00 / 1M          = $0.008000

tokenCost = $0.020000 + $0.020000 + $0.008000 = $0.048000
toolCost  = 1 × $0.01 = $0.010000
totalCost = $0.048000 + $0.010000 = $0.058000
```

Total cost for this call: $0.0580
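The same calculation in runnable form, with the prices listed above:

```python
# Re-deriving the GPT-4.1 example: $2/MTok input, $8/MTok output,
# 0.25x cache read, $0.01 per web search.
M = 1_000_000
regular_input = 50_000 - 40_000            # 10,000 fresh tokens
input_cost    = regular_input * 2.00 / M   # 0.0200
cache_read    = 40_000 * 2.00 * 0.25 / M   # 0.0200
output_cost   = 1_000 * 8.00 / M           # 0.0080
token_cost = input_cost + cache_read + output_cost
total = token_cost + 1 * 0.010             # one web search
print(f"{total:.4f}")                      # 0.0580
```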
gemini-2.5-pro · Google · Long Context > 200K

$1.25 / MTok input • $10.00 / MTok output • long context: 2.0× input, 2.0× output

Inputs

input_tokens: 250,000
output_tokens: 2,000

Calculation

```
inputCost  = 250,000 × $1.25 / 1M = $0.312500
outputCost = 2,000 × $10.00 / 1M  = $0.020000

Long context (>200K): 2× input, 2× output
inputCost  = $0.312500 × 2.0 = $0.625000
outputCost = $0.020000 × 2.0 = $0.040000

totalCost = $0.625000 + $0.040000 = $0.665000
```

Total cost for this call: $0.6650
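And the long-context case in runnable form, with the prices listed above:

```python
# Re-deriving the Gemini 2.5 Pro example: $1.25/MTok input, $10/MTok
# output, with the >200K long-context surcharge (2x input, 2x output).
M = 1_000_000
input_cost  = 250_000 * 1.25 / M           # 0.3125
output_cost = 2_000 * 10.00 / M            # 0.0200
input_cost  *= 2.0                         # long context: 0.6250
output_cost *= 2.0                         # long context: 0.0400
total = input_cost + output_cost
print(f"{total:.4f}")                      # 0.6650
```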

Provider Comparison

How pricing features differ across the six providers we track.

| Feature | OpenAI | Anthropic | Google | Groq | Mistral | Cohere |
|---|---|---|---|---|---|---|
| Cache Read Discount | 50–75% | 90% | ~90% | 50% | n/a | n/a |
| Cache Write Premium | None | 1.25–2.0× | Per-hour storage | None | n/a | n/a |
| Batch API | 50% off | 50% off | 50% off | 50% off | n/a | n/a |
| Long Context | n/a | 2× / 1.5× >200K | 2× / 2× >200K (Pro only) | n/a | n/a | n/a |
| Web Search Fee | $0.010 | $0.010 | $0.014 | $0.005 | n/a | n/a |
| Fast Mode | n/a | 6× | n/a | n/a | n/a | n/a |
| Thinking Tokens | Output rate | Output rate | Output rate | n/a | n/a | n/a |

What's the Same

  • All providers use per-token pricing (input + output)
  • Batch API is universally 50% off where available
  • Thinking/reasoning tokens are billed at the output token rate
  • Web fetch is free for all providers

What's Different

  • Cache discounts range from 50% (OpenAI/Groq) to 90% (Anthropic/Google)
  • Only Anthropic charges a cache write premium (1.25–2.0×)
  • Long-context surcharges only apply to Anthropic and Google Pro (>200K tokens; Flash models exempt)
  • Web search fees range from $0.005 (Groq) to $0.014 (Google) per call
  • Fast mode (6×) is an Anthropic-only feature

Minimum Required Fields

To get an accurate auto-calculated cost, you need just these fields.

```json
{
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "input_tokens": 12000,
  "output_tokens": 500,
  "latency_ms": 1200,
  "timestamp": "2026-03-08T10:00:00Z",
  "tags": {
    "task_type": "summarize",
    "feature": "support_summary",
    "route": "POST /api/summary"
  }
}
```

With just these fields, AISpendGuard looks up the model price and calculates the cost automatically. Add optional fields for more accuracy.
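Submitting that record from Python might look like the sketch below; the endpoint URL is a placeholder, since the real ingestion endpoint and auth scheme are not shown in this document.

```python
import json
import urllib.request

# Minimal record from above: token counts, model, and metadata only.
record = {
    "provider": "anthropic",
    "model": "claude-sonnet-4-5",
    "input_tokens": 12000,
    "output_tokens": 500,
    "latency_ms": 1200,
    "timestamp": "2026-03-08T10:00:00Z",
    "tags": {
        "task_type": "summarize",
        "feature": "support_summary",
        "route": "POST /api/summary",
    },
}

req = urllib.request.Request(
    "https://example.invalid/v1/usage",  # placeholder endpoint
    data=json.dumps(record).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment against a real endpoint
```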