Documentation
From Zero to Hero
Set up workspace, connect SDK, send events, and read savings insights.
How It Works
Create workspace + key
In /settings/workspace, create a workspace and an API key.
Instrument your app
Add tags-only tracking in your AI request handlers.
Send usage events
Use SDK or direct HTTP to POST /api/ingest.
Run rollups
Daily rollups aggregate usage by provider/feature/route/task.
Act on savings
Dashboard shows where spend concentrates and what to optimize first.
Limits & Guardrails
- Required tags: task_type, feature, route.
- Custom tags are allowed and auto-accepted if the key is lowercase snake_case.
- Custom tag values can be string or string[].
- Limits: max 24 tags/event, max 16 values in one array tag, max 120 chars/value.
- Prompt/content/output-like fields are blocked by privacy guard.
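These rules can be pre-checked client-side before sending. A minimal sketch (the forbidden-key list here is illustrative; the server's privacy guard is authoritative):

```python
# Hypothetical pre-flight check mirroring the documented tag rules;
# the server remains the source of truth.
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")
REQUIRED = {"task_type", "feature", "route"}
FORBIDDEN = {"prompt", "message", "content", "output"}  # illustrative privacy-guard list

def check_tags(tags: dict) -> list[str]:
    problems = []
    missing = REQUIRED - tags.keys()
    if missing:
        problems.append(f"missing required tags: {sorted(missing)}")
    if len(tags) > 24:
        problems.append("more than 24 tags in one event")
    for key, value in tags.items():
        if key in FORBIDDEN:
            problems.append(f"forbidden key: {key}")
        elif not SNAKE_CASE.match(key):
            problems.append(f"key not lowercase snake_case: {key}")
        values = value if isinstance(value, list) else [value]
        if len(values) > 16:
            problems.append(f"{key}: more than 16 values in one array tag")
        for v in values:
            if len(str(v)) > 120:
                problems.append(f"{key}: value longer than 120 chars")
    return problems
```

Events that only violate tag rules are still accepted with warnings; only privacy violations are hard-rejected, so this check is an optimization, not a requirement.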
task_type Reference
Pick the value that best describes what the model is being asked to produce. The right task_type is how AISpendGuard knows when you're using a more expensive model than the task actually needs.
| Value | What it does | Typical output | Best model tier | Batch-safe |
|---|---|---|---|---|
| answer | Direct Q&A, RAG-backed responses, knowledge retrieval | 100–800 tok | standard | ✗ user-facing |
| classify | Label, categorize, detect intent, route to a bucket | 1–10 tok | micro | ✓ strong |
| extract | Pull structured fields from unstructured text | 50–300 tok | micro | ✓ yes |
| summarize | Condense long content, TLDR, bullet points | 100–500 tok | standard | ✓ yes |
| generate | Write or draft new content, creative writing, ideation | 300–2000 tok | standard | ✓ yes |
| rewrite | Paraphrase, tone-adjust, edit existing text | ≈ input | standard | ✓ yes |
| translate | Translate between languages | ≈ input | micro | ✓ yes |
| code | Generate, complete, review, or explain code | 200–1500 tok | premium | ✓ yes |
| eval | LLM-as-judge, quality scoring, test assertions | 10–50 tok | micro | ✓ best candidate |
| embed | Text embedding / vector generation | fixed vector | — | ✓ strong |
| route | Decide which tool, agent, or path to take next | 1–20 tok | micro | ✓ yes |
| plan | Decompose a goal into subtasks, strategy reasoning | 100–500 tok | premium | ✓ yes |
| agent_step | Single step inside a multi-step agent loop | 50–800 tok | varies | usually ✗ |
| vision | Understand images, PDFs, screenshots (multimodal) | 100–600 tok | standard | ✓ yes |
| chat | Multi-turn stateful conversation (not one-shot Q&A) | 100–500 tok | standard | ✗ real-time |
| other | None of the above — avoid, reduces waste detection quality | — | — | — |
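The "Best model tier" column above can be turned into a simple waste check. An illustrative sketch (the actual detection runs server-side on rollup data):

```python
# Illustrative tier-mismatch check: cheap tasks with tiny outputs
# should not be running on premium models.
MICRO_TASKS = {"classify", "route", "eval"}

def flags_overspend(task_type: str, model_tier: str, avg_output_tokens: float) -> bool:
    """True when a micro-tier task runs on a premium model with small outputs."""
    return (
        task_type in MICRO_TASKS
        and model_tier == "premium"
        and avg_output_tokens < 100
    )
```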
Model tiers
If task_type is classify, route, or eval and you are using a premium model with average output under 100 tokens, AISpendGuard will flag this as overspend and show the exact monthly saving from switching to the micro tier.
Extended Token Fields
These optional fields enable accurate cost calculation and cost-spike detection. Provider helpers extract them automatically — pass response.usage and they are captured for you.
| Field | What it tracks | Provider source |
|---|---|---|
| resolvedModel (string) | Pinned model version returned by the provider (e.g. gpt-4o-mini-2024-07-18). ⚠ If omitted: silent upgrades go undetected; price lookup uses the alias | response.model / message.model / response.modelVersion |
| inputTokensCached (integer) | Cache read hits — subset of inputTokens, billed cheaper (OpenAI 0.5×, Anthropic 0.1×). ⚠ If omitted: spend overstated on cached calls; cache ROI invisible | prompt_tokens_details.cached_tokens / cache_read_input_tokens / cachedContentTokenCount |
| inputTokensCacheWrite (integer, Anthropic only) | Cache write cost — subset of inputTokens, billed at 1.25× base input price. ⚠ If omitted: spend understated when building an Anthropic prompt cache | cache_creation_input_tokens |
| thinkingTokens (integer) | Reasoning/thinking tokens — subset of outputTokens. Can be 3–10× the visible output on o1/o3/Gemini 2.5. ⚠ If omitted: cost spikes from reasoning-heavy calls are invisible | completion_tokens_details.reasoning_tokens / output_tokens_details.reasoning_tokens / usageMetadata.thoughtsTokenCount |
On claude-3-7-sonnet with thinking: { type: "enabled" }, thinking tokens are included in output_tokens but are not reported separately in the usage object. To track them, count content blocks of type "thinking" manually and pass the token count via thinkingTokens.
OpenAI (Real SDK Integration)
import OpenAI from "openai";
import { init, trackUsage, createOpenAIUsageEvent } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
endpoint: "https://www.aispendguard.com/api/ingest",
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const startedAt = Date.now();
const response = await openai.responses.create({
model: "gpt-4o-mini",
input: "Classify this message: 'I want to cancel my subscription'"
});
const event = createOpenAIUsageEvent({
model: "gpt-4o-mini",
resolvedModel: response.model, // e.g. "gpt-4o-mini-2024-07-18"
usage: response.usage, // auto-extracts tokens, cache hits, reasoning tokens
latencyMs: Date.now() - startedAt,
costUsd: 0.0021, // optional if you have pricing calc
tags: {
task_type: "classify",
feature: "ticket_triage",
route: "POST /api/support/triage",
customer_plan: "free",
environment: "prod"
}
});
await trackUsage(event);
Anthropic (Real SDK Integration)
import Anthropic from "@anthropic-ai/sdk";
import { init, trackUsage, createAnthropicUsageEvent } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
endpoint: "https://www.aispendguard.com/api/ingest",
});
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
const startedAt = Date.now();
const message = await anthropic.messages.create({
model: "claude-3-5-sonnet-latest",
max_tokens: 200,
messages: [{ role: "user", content: "Summarize this support case in 3 bullet points." }]
});
const event = createAnthropicUsageEvent({
model: "claude-3-5-sonnet-latest",
resolvedModel: message.model, // e.g. "claude-3-5-sonnet-20241022"
usage: message.usage, // auto-extracts tokens, cache_read, cache_creation
latencyMs: Date.now() - startedAt,
costUsd: 0.0081, // optional if you have pricing calc
tags: {
task_type: "summarize",
feature: "support_summary",
route: "POST /api/support/summary",
customer_plan: "pro",
environment: "prod"
}
});
await trackUsage(event);
Gemini (Real SDK Integration)
import { GoogleGenAI } from "@google/genai";
import { init, trackUsage, createGeminiUsageEvent } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
endpoint: "https://www.aispendguard.com/api/ingest",
});
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
const startedAt = Date.now();
const response = await ai.models.generateContent({
model: "gemini-2.0-flash",
contents: [{ role: "user", parts: [{ text: "Translate 'Hello world' to French." }] }]
});
const event = createGeminiUsageEvent({
model: "gemini-2.0-flash",
resolvedModel: response.modelVersion, // e.g. "gemini-2.0-flash-001"
usage: response.usageMetadata, // auto-extracts tokens, cachedContent, thoughts
latencyMs: Date.now() - startedAt,
tags: {
task_type: "translate",
feature: "ui_i18n",
route: "POST /api/translate",
environment: "prod"
}
});
await trackUsage(event);
OpenRouter
OpenRouter is OpenAI-compatible, so the existing createOpenAIUsageEvent() helper works out of the box. Set provider: "openrouter" in tags for attribution.
Option A: SDK Integration
import OpenAI from "openai";
import { init, trackUsage, createOpenAIUsageEvent } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
endpoint: "https://www.aispendguard.com/api/ingest",
});
const openrouter = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY!,
});
const startedAt = Date.now();
const response = await openrouter.chat.completions.create({
model: "anthropic/claude-sonnet-4-20250514",
messages: [{ role: "user", content: "Hello" }],
});
const event = createOpenAIUsageEvent({
model: "anthropic/claude-sonnet-4-20250514",
resolvedModel: response.model,
usage: response.usage,
latencyMs: Date.now() - startedAt,
tags: {
provider: "openrouter", // Override provider to "openrouter"
task_type: "chat",
feature: "support",
route: "/api/chat",
}
});
await trackUsage(event);
Option B: LiteLLM (Python)
If you use LiteLLM with the openrouter/ model prefix, our aispendguard-litellm integration auto-detects OpenRouter and tracks all calls.
import litellm
from aispendguard_litellm import AISpendGuardLogger
litellm.callbacks.append(
AISpendGuardLogger(api_key="asg_...",
default_tags={"feature": "api", "route": "/chat"})
)
response = litellm.completion(
model="openrouter/anthropic/claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "Hello"}],
metadata={"aispendguard_tags": {"task_type": "chat"}},
)
Option C: Broadcast Webhook (Zero-Code)
OpenRouter can push all your usage telemetry directly to AISpendGuard via its Broadcast feature. No SDK, no code changes — just configure the webhook and you're done.
// Zero-code setup — no SDK needed!
// 1. Copy your AISpendGuard API key from /settings/workspace
// 2. In OpenRouter → Settings → Broadcast → Add Webhook:
// URL: https://www.aispendguard.com/api/ingest/otlp
// Auth: Authorization: Bearer asg_your_api_key
// 3. Enable privacy mode (recommended)
// 4. Done — all OpenRouter usage flows to AISpendGuard automatically
Privacy mode strips prompt/response content. Even without it, AISpendGuard automatically strips all prompt content — only cost, tokens, and metadata are stored.
Python (HTTP)
import requests
from datetime import datetime, timezone
url = "https://www.aispendguard.com/api/ingest"
api_key = "asg_your_api_key"
payload = {
"events": [
{
"event_id": "evt_123",
"provider": "openai",
"model": "gpt-4o-mini",
"input_tokens": 120,
"output_tokens": 12,
"latency_ms": 840,
"cost_usd": 0.0021,
"timestamp": datetime.now(timezone.utc).isoformat(),
"tags": {
"task_type": "classify",
"feature": "lead_classifier",
"route": "POST /api/ai/classify",
"customer_plan": "free",
"environment": "prod",
"customer_defined_1": ["value1", "value2"],
"customer_defined_2": ["service1", "service2"]
}
}
]
}
res = requests.post(url, json=payload, headers={"x-api-key": api_key})
print(res.status_code, res.json())
Go (HTTP)
package main
import (
"bytes"
"fmt"
"net/http"
)
func main() {
payload := []byte(`{
"events": [{
"event_id": "evt_123",
"provider": "openai",
"model": "gpt-4o-mini",
"input_tokens": 120,
"output_tokens": 12,
"latency_ms": 840,
"cost_usd": 0.0021,
"timestamp": "2026-03-04T12:00:00Z",
"tags": {
"task_type": "classify",
"feature": "lead_classifier",
"route": "POST /api/ai/classify",
"customer_plan": "free",
"environment": "prod",
"customer_defined_1": ["value1", "value2"],
"customer_defined_2": ["service1", "service2"]
}
}]
}`)
req, _ := http.NewRequest("POST", "https://www.aispendguard.com/api/ingest", bytes.NewBuffer(payload))
req.Header.Set("Content-Type", "application/json")
req.Header.Set("x-api-key", "asg_your_api_key")
resp, err := http.DefaultClient.Do(req)
if err != nil { panic(err) }
defer resp.Body.Close()
fmt.Println("status:", resp.StatusCode)
}
cURL
curl -X POST https://www.aispendguard.com/api/ingest \
-H "Content-Type: application/json" \
-H "x-api-key: asg_your_api_key" \
-d '{
"events": [{
"event_id": "evt_123",
"provider": "openai",
"model": "gpt-4o-mini",
"input_tokens": 120,
"output_tokens": 12,
"latency_ms": 840,
"cost_usd": 0.0021,
"timestamp": "2026-03-04T12:00:00Z",
"tags": {
"task_type": "classify",
"feature": "lead_classifier",
"route": "POST /api/ai/classify",
"customer_plan": "free",
"environment": "prod",
"customer_defined_1": ["value1", "value2"],
"customer_defined_2": ["service1", "service2"]
}
}]
}'
Auto-Wrap (Zero-Code Tracking)
Wrap your AI client once — every call is tracked automatically. No manual trackUsage() needed.
OpenAI
import OpenAI from "openai";
import { init, wrapOpenAI } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
defaultTags: {
feature: "chatbot",
route: "POST /api/chat",
environment: "prod",
},
});
const openai = wrapOpenAI(new OpenAI());
// Every call is now tracked automatically:
const res = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello" }],
// Override tags per call (optional):
asgTags: { task_type: "chat", customer_plan: "pro" },
});
Anthropic
import Anthropic from "@anthropic-ai/sdk";
import { init, wrapAnthropic } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
defaultTags: { feature: "support", route: "POST /api/support" },
});
const anthropic = wrapAnthropic(new Anthropic());
const msg = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 200,
messages: [{ role: "user", content: "Summarize this ticket" }],
asgTags: { task_type: "summarize" },
});
Gemini
import { GoogleGenAI } from "@google/genai";
import { init, wrapGemini } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
defaultTags: { feature: "translate", route: "POST /api/translate" },
});
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
const model = wrapGemini(
ai.models, // pass the models object
"gemini-2.0-flash" // model name is required
);
const res = await model.generateContent({
contents: [{ role: "user", parts: [{ text: "Translate to French" }] }],
asgTags: { task_type: "translate" },
});
defaultTags
Tags passed to init({ defaultTags }) are merged into every auto-wrapped call. Per-call asgTags override defaults. Use this to set feature, route, and environment once instead of repeating them.
LangChain.js Integration
The SDK includes a LangChain.js callback handler that tracks every LLM call automatically. Works with any LangChain-supported provider (OpenAI, Anthropic, Google, Mistral, Cohere, etc.).
import { ChatOpenAI } from "@langchain/openai";
import { init, AISpendGuardCallbackHandler } from "@aispendguard/sdk";
init({ apiKey: process.env.AISPENDGUARD_API_KEY! });
const handler = new AISpendGuardCallbackHandler({
defaultTags: {
feature: "rag_pipeline",
route: "POST /api/ask",
},
});
const llm = new ChatOpenAI({
model: "gpt-4o-mini",
callbacks: [handler],
});
// Or pass to any chain/agent:
const result = await chain.invoke(
{ input: "..." },
{ callbacks: [handler] }
);
The handler auto-detects the provider, model, and token usage from LangChain's callback data. It never reads prompts or outputs — only metadata is tracked. Events are deduplicated by LangChain run ID.
For LangChain Python, install pip install aispendguard-langchain — see the aispendguard-langchain repo.
Python SDK
Native Python SDK with batched transport, provider helpers, and the same validation rules as the TypeScript SDK.
pip install aispendguard
from aispendguard import AISpendGuard, create_openai_event
import openai, time
client = AISpendGuard(api_key="asg_your_key_here")
openai_client = openai.OpenAI()
start = time.time()
response = openai_client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Classify: 'I want to cancel'"}],
)
event = create_openai_event(
model="gpt-4o-mini",
usage=response.usage,
latency_ms=int((time.time() - start) * 1000),
tags={
"task_type": "classify",
"feature": "ticket_triage",
"route": "POST /api/classify",
},
)
client.track(event)
Also supports create_anthropic_event and create_gemini_event. See the full docs at github.com/AISpendGuard/aispendguard-python.
OpenTelemetry (OTLP)
If you already instrument with OpenTelemetry, send GenAI traces directly — no SDK needed.
POST https://www.aispendguard.com/api/otel/v1/traces
Content-Type: application/json
Authorization: Bearer asg_your_api_key
# Standard OTLP/HTTP JSON format with GenAI semantic conventions:
# gen_ai.system → provider (openai, anthropic, google)
# gen_ai.request.model → model name
# gen_ai.usage.input_tokens
# gen_ai.usage.output_tokens
# AISpendGuard-specific attributes (optional):
# asg.task_type, asg.feature, asg.route — override tags
# asg.* → custom tags (prefix stripped)
# Headers for default tags:
# x-asg-feature: my_feature
# x-asg-route: POST /api/endpoint
Works with any OTLP-compatible instrumentation (OpenLLMetry, Traceloop, custom spans). Prompt content and model outputs are automatically stripped — only token counts and metadata are stored.
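For illustration, a minimal OTLP/HTTP JSON payload carrying these attributes might look like the following. The attribute keys come from the list above; the span name and token values are placeholders:

```python
# Minimal OTLP/HTTP JSON trace with GenAI semantic-convention attributes.
# Everything except the attribute keys is illustrative.
import json
import urllib.request

span = {
    "name": "chat gpt-4o-mini",
    "attributes": [
        {"key": "gen_ai.system", "value": {"stringValue": "openai"}},
        {"key": "gen_ai.request.model", "value": {"stringValue": "gpt-4o-mini"}},
        {"key": "gen_ai.usage.input_tokens", "value": {"intValue": "120"}},
        {"key": "gen_ai.usage.output_tokens", "value": {"intValue": "12"}},
        {"key": "asg.task_type", "value": {"stringValue": "classify"}},
    ],
}
payload = {"resourceSpans": [{"scopeSpans": [{"spans": [span]}]}]}

req = urllib.request.Request(
    "https://www.aispendguard.com/api/otel/v1/traces",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer asg_your_api_key",
    },
)
# urllib.request.urlopen(req)  # uncomment to actually send
```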
Budget Alerts
Set a monthly USD spending cap and get email alerts at 75% and 90% thresholds. Available on the Pro plan.
Setup via Dashboard
Go to /billing → Budget panel → set your monthly limit, enable alert thresholds, and add an alert email address.
Setup via API
# Create or update budget (requires Clerk session, OWNER/ADMIN role)
POST /api/budgets
Content-Type: application/json
{
"monthlyLimitUsd": 500,
"alertAt75": true,
"alertAt90": true,
"alertEmail": "alerts@yourcompany.com"
}
# Check current budget
GET /api/budgets
# Remove budget
DELETE /api/budgets
Alerts fire in real time at ingest — as soon as your spend crosses a threshold, you get an email. Each threshold fires once per calendar month (deduplicated automatically). Spend is calculated from both rolled-up daily aggregates and unprocessed events for accuracy.
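The once-per-month threshold semantics can be sketched as a crossing check: a threshold fires only when cumulative spend first moves past it. This is illustrative; the real check runs server-side at ingest:

```python
# Illustrative 75%/90% alert logic: a threshold fires exactly when
# cumulative spend first crosses it.
def crossed_thresholds(limit_usd: float, prev_spend: float, new_spend: float) -> list[int]:
    fired = []
    for pct in (75, 90):
        bar = limit_usd * pct / 100
        if prev_spend < bar <= new_spend:
            fired.append(pct)
    return fired
```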
Ingest Response
Every POST /api/ingest call returns a JSON object with these fields:
{
"accepted": 1,
"duplicates": 0,
"rejected": 0,
"event_ids": ["cm...abc"], // IDs of accepted events
"warnings": [ // Non-critical issues (event still accepted)
"events[0].tags.task_type \"lab-benchmark\" is not recognized — coerced to \"other\"",
"events[0].tags.my-key is not a supported tag key — stripped"
],
"enforcement": { // Budget status (event always accepted)
"action": "block", // "none" | "warn" | "block"
"reason": "workspace_budget_exceeded",
"budget_limit": 50.00,
"current_spend": 52.34
},
"dashboard_url": "https://www.aispendguard.com/events",
"usage": {
"eventsThisMonth": 1234,
"monthlyLimit": 50000,
"tier": "FREE"
}
}
Use event_ids to verify your events were stored. Open dashboard_url to see them live.
Warnings: Tag validation issues (invalid task_type, unsupported tag keys, values too long) produce warnings — events are always accepted with best-effort coercion. Only privacy violations (forbidden keys like prompt, message) cause hard rejection.
Enforcement: When your workspace exceeds its budget, events are still tracked and accepted. The enforcement field signals the budget status so your code can decide whether to continue sending requests.
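A minimal sketch of acting on that field client-side (field names taken from the response shape above):

```python
# Reads the enforcement field from an ingest response so the caller can
# decide whether to pause non-critical AI requests.
def should_pause(ingest_response: dict) -> bool:
    enforcement = ingest_response.get("enforcement") or {}
    return enforcement.get("action") == "block"
```

Whether "warn" should also slow or pause traffic is a policy choice for your application; the API itself keeps accepting events either way.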
Error Reference
| Status | Meaning | Fix |
|---|---|---|
| 400 | Invalid request body or privacy violation | Check errors[] in response — invalid JSON, forbidden keys (prompt, message, etc.), or structurally unparseable events. Tag validation issues return 200 with warnings[] instead. |
| 401 | Missing or invalid API key | Pass x-api-key header or Authorization: Bearer <key>. Check key is not revoked. |
| 429 | Rate limit or monthly event limit reached | Rate limit: 120 req/min per key, wait and retry. Event limit: check usage.eventsThisMonth — upgrade to PRO for 500K/mo. |
| 500 | Server error | Retry with exponential backoff. If persistent, check status page. |
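The retry guidance in the table can be sketched as a small wrapper: exponential backoff on 429/500, fail fast on 400/401 since retrying cannot fix those:

```python
# Illustrative retry wrapper for the ingest endpoint.
# `send` performs one request and returns its HTTP status code.
import time

RETRYABLE = {429, 500}

def post_with_retry(send, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status < 400 or status not in RETRYABLE:
            return status  # success, or a non-retryable client error
        sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status
```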
Error Response Shape (400)
{
"accepted": 0,
"duplicates": 0,
"rejected": 2,
"errors": [
"events[0] contains forbidden field: prompt",
"events[1] invalid provider: must be a non-empty string"
]
}
Troubleshooting
Events accepted but not on dashboard?
Dashboard reads from daily rollups, which update on each cron run. New events appear after the next rollup cycle. Check the /events page for raw events — they appear immediately.
SDK trackUsage() doesn't throw but events missing?
By default the SDK is fire-and-forget — errors are logged to console, not thrown. Use init({ strict: true }) to throw on failures, then check the error message.
Getting 401 with a valid key?
Check if the key was revoked in Settings. Generate a new one if needed. Ensure you pass the full key string (starts with asg_).
Duplicate events being skipped?
Events are deduplicated by event_id or a content hash (provider + model + tokens + timestamp). Use unique event_id values per call, or omit it to let the server generate one.
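If you generate event IDs client-side, a tiny helper is enough. The evt_ prefix follows the examples in this doc; reuse the same ID when retrying the same event so deduplication works in your favor:

```python
# Generates a unique event_id per logical call. Retries of the SAME event
# should reuse the same id so the server deduplicates them cleanly.
import uuid

def new_event_id() -> str:
    return f"evt_{uuid.uuid4().hex}"
```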
Cost showing as $0?
Cost is calculated server-side from model pricing. If the model isn't in our pricing database, cost will be $0. Pass cost_usd in the event to override. Check /model-prices for supported models.
Streaming Responses
With streaming, usage data arrives in the final chunk, not during the stream. Accumulate the stream, then track usage after it completes.
OpenAI Streaming
import time
start = time.time()
# Must pass stream_options to get usage in stream
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello"}],
stream=True,
stream_options={"include_usage": True},
)
usage = None
for chunk in stream:
if chunk.usage:
usage = chunk.usage
# ... process chunk.choices[0].delta
if usage:
event = create_openai_event(
model="gpt-4o-mini",
usage=usage,
latency_ms=int((time.time() - start) * 1000),
tags={"task_type": "chat", "feature": "assistant", "route": "POST /api/chat"},
)
asg.track(event)
Anthropic Streaming
with anthropic_client.messages.stream(
model="claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "Hello"}],
max_tokens=200,
) as stream:
for text in stream.text_stream:
pass # process text chunks
message = stream.get_final_message()
# Usage is on the final message object
event = create_anthropic_event(
model="claude-sonnet-4-20250514",
usage=message.usage,
latency_ms=latency,
tags={"task_type": "chat", "feature": "assistant", "route": "POST /api/chat"},
)
asg.track(event)
TypeScript SDK (OpenAI)
const stream = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello" }],
stream: true,
stream_options: { include_usage: true },
});
let usage;
for await (const chunk of stream) {
if (chunk.usage) usage = chunk.usage;
// ... process chunk
}
if (usage) {
const event = createOpenAIUsageEvent({
model: "gpt-4o-mini",
usage,
latencyMs: Date.now() - start,
tags: { task_type: "chat", feature: "assistant", route: "POST /api/chat" },
});
await trackUsage(event);
}
Data Export API
Export your usage events as JSON or CSV for internal dashboards, reporting, or analysis. Requires Clerk authentication (session cookie).
# JSON (default)
GET /api/export?from=2026-03-01&to=2026-04-01&limit=5000
# CSV
GET /api/export?format=csv&from=2026-03-01&to=2026-04-01
# Parameters:
# from — start date (default: 1st of current month)
# to — end date (default: 1st of next month)
# format — "json" (default) or "csv"
# limit — max rows, 1–10000 (default: 1000)
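A sketch of fetching a CSV export with the parameters above. The cookie header is a placeholder; obtain a real Clerk session from your browser or auth flow:

```python
# Illustrative export call; the __session cookie name/value is a placeholder,
# not a documented credential format.
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "format": "csv",
    "from": "2026-03-01",
    "to": "2026-04-01",
    "limit": 5000,
})
url = f"https://www.aispendguard.com/api/export?{params}"
req = urllib.request.Request(url, headers={"Cookie": "__session=<your_clerk_session>"})
# with urllib.request.urlopen(req) as resp:
#     open("usage.csv", "wb").write(resp.read())  # uncomment to download
```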