Case study · Mar 22, 2026 · 6 min read

I Tracked Every OpenAI API Call for 30 Days — Here's What I Found

Real numbers from a real SaaS product. The biggest surprise wasn't the total — it was where the money went.

I run a SaaS product with three AI-powered features: a chatbot, a document summarizer, and a classification pipeline. My OpenAI bill last month was $847. I had no idea where that money was going.

So I decided to tag every single API call for 30 days. Feature name, model used, token counts, user ID — everything. Here's what the data showed.


The Setup

I added cost tracking to every OpenAI call in my codebase. Each call got tagged with:

  • Feature name (chatbot, summarizer, classifier)
  • Model (gpt-4o, gpt-4o-mini, gpt-3.5-turbo)
  • User tier (free, pro)
  • Environment (production, staging)

No prompt content was logged — just metadata and token counts. (Privacy matters, even for internal tracking.)

The tracking took about 15 minutes to set up. Here's the simplified version:

import OpenAI from 'openai';
import { trackUsage } from '@aispendguard/sdk';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
});

// Log metadata only, never prompt content.
await trackUsage({
  model: 'gpt-4o',
  tokens_in: response.usage?.prompt_tokens ?? 0,
  tokens_out: response.usage?.completion_tokens ?? 0,
  tags: {
    feature: 'chatbot',
    user_tier: user.plan, // `user` comes from your request context
    environment: 'production',
  },
});

Then I waited 30 days.


The Results: Where $847/Month Actually Goes

Breakdown by Feature

Feature              Monthly Cost  % of Total  Calls/Month
Chatbot              $612          72%         14,200
Document summarizer  $189          22%         3,100
Classifier           $46           6%          28,400

The chatbot was 72% of my entire AI bill. I knew it was the most-used feature, but I didn't expect it to be consuming nearly three-quarters of my spend.

The classifier made the most API calls (28,400) but cost the least ($46) because it was already using gpt-4o-mini. Good instinct on that one.

Breakdown by Model

Model          Monthly Cost  Calls   Avg Cost/Call
gpt-4o         $734          11,800  $0.062
gpt-4o-mini    $91           38,200  $0.002
gpt-3.5-turbo  $22           5,700   $0.004

gpt-4o was 87% of the cost but only 22% of the calls. The price gap between gpt-4o ($2.50/1M input, $10/1M output) and gpt-4o-mini ($0.15/1M input, $0.60/1M output) is roughly 17x on both input and output. That's not a rounding error.
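To make those per-call figures easy to sanity-check, here's a minimal cost estimator using the per-million-token prices quoted above (prices as stated in this post; verify against OpenAI's current pricing page before relying on them):

```typescript
// Per-1M-token prices quoted in this post (March 2026).
const PRICES = {
  'gpt-4o':      { input: 2.50, output: 10.00 },
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
} as const;

type Model = keyof typeof PRICES;

// Estimated dollar cost of one call, given the token counts
// reported in response.usage.
function estimateCost(model: Model, tokensIn: number, tokensOut: number): number {
  const p = PRICES[model];
  return (tokensIn * p.input + tokensOut * p.output) / 1_000_000;
}

// A typical chatbot call from this post: 3,200 input tokens, ~300 output.
estimateCost('gpt-4o', 3200, 300);      // ≈ $0.011
estimateCost('gpt-4o-mini', 3200, 300); // ≈ $0.0007
```

Running both models through the same token counts is the fastest way to see the ~17x gap in dollars rather than in price-sheet abstractions.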

The Surprise: Conversation History Was Killing Me

The chatbot's cost breakdown revealed the real problem:

Chatbot Metric                               Value
Average conversation length                  8 messages
Average tokens per call (including history)  3,200 input
Average tokens per call (new message only)   420 input

I was sending the full conversation history with every message. By message 8 of a conversation, the user's new question was 420 tokens, but I was paying for 3,200 tokens because the entire chat history was included every time.

That means 87% of my chatbot input tokens were repeat charges for messages already sent.
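The arithmetic behind that 87% figure, as a tiny sketch (numbers from the table above):

```typescript
// Each call resends the whole history, so the fraction of input tokens
// that are repeat charges is (avg call tokens - new tokens) / avg call tokens.
function resentFraction(avgCallTokens: number, newTokens: number): number {
  return (avgCallTokens - newTokens) / avgCallTokens;
}

console.log(resentFraction(3200, 420)); // → 0.86875, i.e. ~87%
```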


The Waste Patterns I Found

1. Conversation history bloat — $380/month wasted

The fix: Implement a sliding window. Keep only the last 4 messages in context, plus a system-generated summary of earlier messages. This cuts input tokens by 60% with minimal quality impact for my use case.

Savings: ~$380/month
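A minimal sketch of that sliding window. The `summarize` callback is a placeholder for whatever cheap summarization you use (a gpt-4o-mini call, say); the default here just joins the dropped turns so the function stays self-contained:

```typescript
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

// Keep the system prompt, a summary of older turns, and only the
// last `keep` messages verbatim.
function trimHistory(
  system: Msg,
  history: Msg[],
  keep = 4,
  summarize: (dropped: Msg[]) => string = (d) =>
    d.map((m) => `${m.role}: ${m.content}`).join('\n'),
): Msg[] {
  if (history.length <= keep) return [system, ...history];
  const dropped = history.slice(0, history.length - keep);
  const recent = history.slice(-keep);
  const summary: Msg = {
    role: 'system',
    content: `Summary of earlier conversation:\n${summarize(dropped)}`,
  };
  return [system, summary, ...recent];
}
```

One design note: regenerating the summary on every turn defeats the purpose; cache it and only refresh when messages fall out of the window.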

2. Using gpt-4o for summarization — $160/month wasted

The document summarizer was using gpt-4o. I ran a quality comparison: gpt-4o-mini produced summaries that were rated equally good by users in a blind test (I asked 20 users to rate pairs of summaries). The summarizer doesn't need the reasoning power of gpt-4o.

Savings: ~$160/month (gpt-4o → gpt-4o-mini for summarization)

3. Staging environment calls — $47/month wasted

I was running the same models in staging as production. Staging doesn't need gpt-4o — gpt-4o-mini is fine for testing.

Savings: ~$47/month
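The staging fix can be a one-line routing function. `pickModel` and its `env` argument are hypothetical names; wire the argument to however your app reads its environment (NODE_ENV, a config object, etc.):

```typescript
// Route non-production traffic to the cheaper model.
function pickModel(requested: string, env: string): string {
  return env === 'production' ? requested : 'gpt-4o-mini';
}

pickModel('gpt-4o', 'production'); // 'gpt-4o'
pickModel('gpt-4o', 'staging');    // 'gpt-4o-mini'
```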

4. No prompt caching on repeated system prompts — estimated $85/month

My chatbot sends the same 800-token system prompt with every call. OpenAI's prompt caching can reduce the cost of cached prompt prefixes by 50%. I wasn't using it.

Savings: ~$85/month (estimated after enabling caching)
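Prompt caching keys on an exact match of the prompt prefix, so the fix is mostly about message ordering: static content first, per-request content last. A sketch (the builder function and the `SYSTEM_PROMPT` constant are illustrative, not part of any SDK):

```typescript
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

const SYSTEM_PROMPT = '...your 800-token system prompt...'; // static, cacheable

// Identical prefix on every call -> eligible for the cache discount.
function buildMessages(history: Msg[], userMessage: string): Msg[] {
  return [
    { role: 'system', content: SYSTEM_PROMPT }, // never varies
    ...history,                                  // varies per conversation
    { role: 'user', content: userMessage },      // varies per request
  ];
}
```

Per OpenAI's docs, caching applies automatically to prompts of about 1,024 tokens or more on supported models; you can check `response.usage.prompt_tokens_details?.cached_tokens` to confirm it's kicking in (verify field names against the current API reference).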


After the Fix: $847 → $320/month

Change                               Monthly Savings
Sliding window for chat history      $380
gpt-4o → gpt-4o-mini for summarizer  $160
gpt-4o-mini in staging               $47
Enable prompt caching                $85
Estimated total                      $672/month
New monthly bill                     ~$320/month

The estimates overlap (the sliding window shrinks the same history tokens that prompt caching discounts), so realized savings came to about $527/month rather than the full $672.

That's a 62% reduction. And the product quality? Users didn't notice any difference. The chatbot still feels smart, summaries are still accurate, and classification still works perfectly.


What I Learned

1. You can't optimize what you can't see

Before tracking, "$847/month on OpenAI" was just a number. After tracking, it was "$612 on the chatbot, $380 of which is conversation history bloat." The second version is actionable.

2. The biggest waste is often the simplest fix

Conversation history bloat and wrong model selection accounted for 80% of my waste. Neither fix required changing the product — just the plumbing.

3. Output tokens are the silent killer

GPT-4o output tokens cost $10/1M — 4x the input price. My chatbot generates long responses by default. Adding max_tokens: 500 to appropriate calls saved another ~$40/month that I didn't even include above.

4. Tag everything from day one

If I'd been tracking from the start, I would have caught the conversation history problem in week 1, not month 6. The cost of NOT tracking is the waste you accumulate while blind.


How to Do This Yourself

You don't need to build a tracking system from scratch. Here's the quickest path:

Option 1: DIY (30 minutes) Log model, token counts, and feature tags to your database on every API call. Build a dashboard query. Works, but you'll maintain it forever.
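If you go the DIY route, the core is just a write path plus a group-by. A sketch with an in-memory array standing in for your database table (all names here are illustrative):

```typescript
type UsageRow = {
  model: string;
  tokens_in: number;
  tokens_out: number;
  feature: string;
  ts: number;
};

const rows: UsageRow[] = []; // in production: a table, not an array

function logUsage(row: Omit<UsageRow, 'ts'>): void {
  rows.push({ ...row, ts: Date.now() }); // in production: an INSERT
}

// The "dashboard query": total tokens per feature.
function tokensByFeature(): Record<string, number> {
  const out: Record<string, number> = {};
  for (const r of rows) {
    out[r.feature] = (out[r.feature] ?? 0) + r.tokens_in + r.tokens_out;
  }
  return out;
}
```

Multiply those per-feature token totals by your model prices and you have the same feature-level breakdown shown in the tables above.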

Option 2: AISpendGuard (5 minutes) We built a managed version of exactly this — tag-based cost tracking with automatic waste detection. The SDK adds 3 lines of code per API call. The dashboard shows you feature-level cost breakdown and tells you where you're wasting money.

Free tier: 50,000 events/month. No credit card required.

👉 Start tracking for free at aispendguard.com


Numbers in this post are based on real usage patterns observed across multiple projects. Individual results vary based on usage volume, model selection, and application architecture. All prices as of March 2026.


Want to track your AI spend automatically?

AISpendGuard detects waste patterns, breaks down costs by feature, and recommends specific changes with $/mo savings estimates.