Documentation
From Zero to Hero
Set up workspace, connect SDK, send events, and read savings insights.
How It Works
Create workspace + key
In /settings/workspace, create a workspace and an API key.
Instrument your app
Add tags-only tracking in your AI request handlers.
Send usage events
Use SDK or direct HTTP to POST /api/ingest.
Run rollups
Daily rollups aggregate usage by provider/feature/route/task.
Act on savings
Dashboard shows where spend concentrates and what to optimize first.
Limits & Guardrails
- Required tags: task_type, feature, route.
- Custom tags are allowed and auto-accepted if the key is lowercase snake_case.
- Custom tag values can be string or string[].
- Limits: max 24 tags/event, max 16 values in one array tag, max 120 chars/value.
- Prompt/content/output-like fields are blocked by privacy guard.
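These rules can be pre-checked client-side before sending. A minimal sketch (the forbidden-key list here is illustrative; the server's privacy guard is authoritative):

```python
# Hypothetical pre-flight check mirroring the documented tag rules;
# the server remains the source of truth.
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")
REQUIRED = {"task_type", "feature", "route"}
FORBIDDEN = {"prompt", "message", "content", "output"}  # illustrative privacy-guard list

def check_tags(tags: dict) -> list[str]:
    problems = []
    missing = REQUIRED - tags.keys()
    if missing:
        problems.append(f"missing required tags: {sorted(missing)}")
    if len(tags) > 24:
        problems.append("more than 24 tags in one event")
    for key, value in tags.items():
        if key in FORBIDDEN:
            problems.append(f"forbidden key: {key}")
        elif not SNAKE_CASE.match(key):
            problems.append(f"key not lowercase snake_case: {key}")
        values = value if isinstance(value, list) else [value]
        if len(values) > 16:
            problems.append(f"{key}: more than 16 values in one array tag")
        for v in values:
            if len(str(v)) > 120:
                problems.append(f"{key}: value longer than 120 chars")
    return problems
```

Events that only violate tag rules are still accepted with warnings; only privacy violations are hard-rejected, so this check is an optimization, not a requirement.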
task_type Reference
Pick the value that best describes what the model is being asked to produce. The right task_type is how AISpendGuard knows when you're using a more expensive model than the task actually needs.
| Value | What it does | Typical output | Best model tier | Batch-safe |
|---|---|---|---|---|
| answer | Direct Q&A, RAG-backed responses, knowledge retrieval | 100–800 tok | standard | ✗ user-facing |
| classify | Label, categorize, detect intent, route to a bucket | 1–10 tok | micro | ✓ strong |
| extract | Pull structured fields from unstructured text | 50–300 tok | micro | ✓ yes |
| summarize | Condense long content, TLDR, bullet points | 100–500 tok | standard | ✓ yes |
| generate | Write or draft new content, creative writing, ideation | 300–2000 tok | standard | ✓ yes |
| rewrite | Paraphrase, tone-adjust, edit existing text | ≈ input | standard | ✓ yes |
| translate | Translate between languages | ≈ input | micro | ✓ yes |
| code | Generate, complete, review, or explain code | 200–1500 tok | premium | ✓ yes |
| eval | LLM-as-judge, quality scoring, test assertions | 10–50 tok | micro | ✓ best candidate |
| embed | Text embedding / vector generation | fixed vector | — | ✓ strong |
| route | Decide which tool, agent, or path to take next | 1–20 tok | micro | ✓ yes |
| plan | Decompose a goal into subtasks, strategy reasoning | 100–500 tok | premium | ✓ yes |
| agent_step | Single step inside a multi-step agent loop | 50–800 tok | varies | usually ✗ |
| vision | Understand images, PDFs, screenshots (multimodal) | 100–600 tok | standard | ✓ yes |
| chat | Multi-turn stateful conversation (not one-shot Q&A) | 100–500 tok | standard | ✗ real-time |
| other | None of the above — avoid, reduces waste detection quality | — | — | — |
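The "Best model tier" column above can be turned into a simple waste check. An illustrative sketch (the actual detection runs server-side on rollup data):

```python
# Illustrative tier-mismatch check: cheap tasks with tiny outputs
# should not be running on premium models.
MICRO_TASKS = {"classify", "route", "eval"}

def flags_overspend(task_type: str, model_tier: str, avg_output_tokens: float) -> bool:
    """True when a micro-tier task runs on a premium model with small outputs."""
    return (
        task_type in MICRO_TASKS
        and model_tier == "premium"
        and avg_output_tokens < 100
    )
```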
Model tiers
If task_type is classify, route, or eval and you are using a premium model with average output under 100 tokens, AISpendGuard will flag this as overspend and show the exact monthly saving from switching to the micro tier.
Extended Token Fields
These optional fields enable accurate cost calculation and cost-spike detection. Provider helpers extract them automatically — pass response.usage and they are captured for you.
| Field | What it tracks | Provider source |
|---|---|---|
| resolvedModel (string) | Pinned model version returned by the provider (e.g. gpt-4o-mini-2024-07-18). ⚠ If omitted: silent upgrades go undetected; price lookup uses the alias | response.model / message.model / response.modelVersion |
| inputTokensCached (integer) | Cache read hits — subset of inputTokens, billed cheaper (OpenAI 0.5×, Anthropic 0.1×). ⚠ If omitted: spend overstated on cached calls; cache ROI invisible | prompt_tokens_details.cached_tokens / cache_read_input_tokens / cachedContentTokenCount |
| inputTokensCacheWrite (integer, Anthropic only) | Cache write cost — subset of inputTokens, billed at 1.25× base input price. ⚠ If omitted: spend understated when building an Anthropic prompt cache | cache_creation_input_tokens |
| thinkingTokens (integer) | Reasoning/thinking tokens — subset of outputTokens. Can be 3–10× the visible output on o1/o3/Gemini 2.5. ⚠ If omitted: cost spikes from reasoning-heavy calls are invisible | completion_tokens_details.reasoning_tokens / output_tokens_details.reasoning_tokens / usageMetadata.thoughtsTokenCount |
On claude-3-7-sonnet with thinking: { type: "enabled" }, thinking tokens are included in output_tokens but are not reported separately in the usage object. To track them, count content blocks of type "thinking" manually and pass the token count via thinkingTokens.
OpenAI (Real SDK Integration)
import OpenAI from "openai";
import { init, trackUsage, createOpenAIUsageEvent } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
endpoint: "https://www.aispendguard.com/api/ingest",
});
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const startedAt = Date.now();
const response = await openai.responses.create({
model: "gpt-4o-mini",
input: "Classify this message: 'I want to cancel my subscription'"
});
const event = createOpenAIUsageEvent({
model: "gpt-4o-mini",
resolvedModel: response.model, // e.g. "gpt-4o-mini-2024-07-18"
usage: response.usage, // auto-extracts tokens, cache hits, reasoning tokens
latencyMs: Date.now() - startedAt,
costUsd: 0.0021, // optional if you have pricing calc
tags: {
task_type: "classify",
feature: "ticket_triage",
route: "POST /api/support/triage",
customer_plan: "free",
environment: "prod"
}
});
await trackUsage(event);
Anthropic (Real SDK Integration)
import Anthropic from "@anthropic-ai/sdk";
import { init, trackUsage, createAnthropicUsageEvent } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
endpoint: "https://www.aispendguard.com/api/ingest",
});
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
const startedAt = Date.now();
const message = await anthropic.messages.create({
model: "claude-3-5-sonnet-latest",
max_tokens: 200,
messages: [{ role: "user", content: "Summarize this support case in 3 bullet points." }]
});
const event = createAnthropicUsageEvent({
model: "claude-3-5-sonnet-latest",
resolvedModel: message.model, // e.g. "claude-3-5-sonnet-20241022"
usage: message.usage, // auto-extracts tokens, cache_read, cache_creation
latencyMs: Date.now() - startedAt,
costUsd: 0.0081, // optional if you have pricing calc
tags: {
task_type: "summarize",
feature: "support_summary",
route: "POST /api/support/summary",
customer_plan: "pro",
environment: "prod"
}
});
await trackUsage(event);
Gemini (Real SDK Integration)
import { GoogleGenAI } from "@google/genai";
import { init, trackUsage, createGeminiUsageEvent } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
endpoint: "https://www.aispendguard.com/api/ingest",
});
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
const startedAt = Date.now();
const response = await ai.models.generateContent({
model: "gemini-2.0-flash",
contents: [{ role: "user", parts: [{ text: "Translate 'Hello world' to French." }] }]
});
const event = createGeminiUsageEvent({
model: "gemini-2.0-flash",
resolvedModel: response.modelVersion, // e.g. "gemini-2.0-flash-001"
usage: response.usageMetadata, // auto-extracts tokens, cachedContent, thoughts
latencyMs: Date.now() - startedAt,
tags: {
task_type: "translate",
feature: "ui_i18n",
route: "POST /api/translate",
environment: "prod"
}
});
await trackUsage(event);
OpenRouter
OpenRouter is OpenAI-compatible, so the existing createOpenAIUsageEvent() helper works out of the box. Set provider: "openrouter" in tags for attribution.
Option A: SDK Integration
import OpenAI from "openai";
import { init, trackUsage, createOpenAIUsageEvent } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
endpoint: "https://www.aispendguard.com/api/ingest",
});
const openrouter = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY!,
});
const startedAt = Date.now();
const response = await openrouter.chat.completions.create({
model: "anthropic/claude-sonnet-4-20250514",
messages: [{ role: "user", content: "Hello" }],
});
const event = createOpenAIUsageEvent({
model: "anthropic/claude-sonnet-4-20250514",
resolvedModel: response.model,
usage: response.usage,
latencyMs: Date.now() - startedAt,
tags: {
provider: "openrouter", // Override provider to "openrouter"
task_type: "chat",
feature: "support",
route: "/api/chat",
}
});
await trackUsage(event);
Option B: LiteLLM (Python)
If you use LiteLLM with the openrouter/ model prefix, our aispendguard-litellm integration auto-detects OpenRouter and tracks all calls.
import litellm
from aispendguard_litellm import AISpendGuardLogger
litellm.callbacks.append(
AISpendGuardLogger(api_key="asg_...",
default_tags={"feature": "api", "route": "/chat"})
)
response = litellm.completion(
model="openrouter/anthropic/claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "Hello"}],
metadata={"aispendguard_tags": {"task_type": "chat"}},
)
Option C: Broadcast Webhook (Zero-Code)
OpenRouter can push all your usage telemetry directly to AISpendGuard via its Broadcast feature. No SDK, no code changes — just configure the webhook and you're done.
// Zero-code setup — no SDK needed!
// 1. Copy your AISpendGuard API key from /settings/workspace
// 2. In OpenRouter → Settings → Broadcast → Add Webhook:
// URL: https://www.aispendguard.com/api/ingest/otlp
// Auth: Authorization: Bearer asg_your_api_key
// 3. Enable privacy mode (recommended)
// 4. Done — all OpenRouter usage flows to AISpendGuard automatically
Privacy mode strips prompt/response content. Even without it, AISpendGuard automatically strips all prompt content — only cost, tokens, and metadata are stored.
Python (HTTP)
import requests
from datetime import datetime, timezone
url = "https://www.aispendguard.com/api/ingest"
api_key = "asg_your_api_key"
payload = {
"events": [
{
"event_id": "evt_123",
"provider": "openai",
"model": "gpt-4o-mini",
"input_tokens": 120,
"output_tokens": 12,
"latency_ms": 840,
"cost_usd": 0.0021,
"timestamp": datetime.now(timezone.utc).isoformat(),
"tags": {
"task_type": "classify",
"feature": "lead_classifier",
"route": "POST /api/ai/classify",
"customer_plan": "free",
"environment": "prod",
"customer_defined_1": ["value1", "value2"],
"customer_defined_2": ["service1", "service2"]
}
}
]
}
res = requests.post(url, json=payload, headers={"x-api-key": api_key})
print(res.status_code, res.json())
Go (HTTP)
package main
import (
"bytes"
"fmt"
"net/http"
)
func main() {
payload := []byte(`{
"events": [{
"event_id": "evt_123",
"provider": "openai",
"model": "gpt-4o-mini",
"input_tokens": 120,
"output_tokens": 12,
"latency_ms": 840,
"cost_usd": 0.0021,
"timestamp": "2026-03-04T12:00:00Z",
"tags": {
"task_type": "classify",
"feature": "lead_classifier",
"route": "POST /api/ai/classify",
"customer_plan": "free",
"environment": "prod",
"customer_defined_1": ["value1", "value2"],
"customer_defined_2": ["service1", "service2"]
}
}]
}`)
req, _ := http.NewRequest("POST", "https://www.aispendguard.com/api/ingest", bytes.NewBuffer(payload))
req.Header.Set("Content-Type", "application/json")
req.Header.Set("x-api-key", "asg_your_api_key")
resp, err := http.DefaultClient.Do(req)
if err != nil { panic(err) }
defer resp.Body.Close()
fmt.Println("status:", resp.StatusCode)
}
cURL
curl -X POST https://www.aispendguard.com/api/ingest \
-H "Content-Type: application/json" \
-H "x-api-key: asg_your_api_key" \
-d '{
"events": [{
"event_id": "evt_123",
"provider": "openai",
"model": "gpt-4o-mini",
"input_tokens": 120,
"output_tokens": 12,
"latency_ms": 840,
"cost_usd": 0.0021,
"timestamp": "2026-03-04T12:00:00Z",
"tags": {
"task_type": "classify",
"feature": "lead_classifier",
"route": "POST /api/ai/classify",
"customer_plan": "free",
"environment": "prod",
"customer_defined_1": ["value1", "value2"],
"customer_defined_2": ["service1", "service2"]
}
}]
}'
Auto-Wrap (Zero-Code Tracking)
Wrap your AI client once — every call is tracked automatically. No manual trackUsage() needed.
OpenAI
import OpenAI from "openai";
import { init, wrapOpenAI } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
defaultTags: {
feature: "chatbot",
route: "POST /api/chat",
environment: "prod",
},
});
const openai = wrapOpenAI(new OpenAI());
// Every call is now tracked automatically:
const res = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello" }],
// Override tags per call (optional):
asgTags: { task_type: "chat", customer_plan: "pro" },
});
Anthropic
import Anthropic from "@anthropic-ai/sdk";
import { init, wrapAnthropic } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
defaultTags: { feature: "support", route: "POST /api/support" },
});
const anthropic = wrapAnthropic(new Anthropic());
const msg = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 200,
messages: [{ role: "user", content: "Summarize this ticket" }],
asgTags: { task_type: "summarize" },
});
Gemini
import { GoogleGenAI } from "@google/genai";
import { init, wrapGemini } from "@aispendguard/sdk";
init({
apiKey: process.env.AISPENDGUARD_API_KEY!,
defaultTags: { feature: "translate", route: "POST /api/translate" },
});
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
const model = wrapGemini(
ai.models, // pass the models object
"gemini-2.0-flash" // model name is required
);
const res = await model.generateContent({
contents: [{ role: "user", parts: [{ text: "Translate to French" }] }],
asgTags: { task_type: "translate" },
});
defaultTags
Tags passed to init({ defaultTags }) are merged into every auto-wrapped call. Per-call asgTags override defaults. Use this to set feature, route, and environment once instead of repeating them.
LangChain.js Integration
The SDK includes a LangChain.js callback handler that tracks every LLM call automatically. Works with any LangChain-supported provider (OpenAI, Anthropic, Google, Mistral, Cohere, etc.).
import { ChatOpenAI } from "@langchain/openai";
import { init, AISpendGuardCallbackHandler } from "@aispendguard/sdk";
init({ apiKey: process.env.AISPENDGUARD_API_KEY! });
const handler = new AISpendGuardCallbackHandler({
defaultTags: {
feature: "rag_pipeline",
route: "POST /api/ask",
},
});
const llm = new ChatOpenAI({
model: "gpt-4o-mini",
callbacks: [handler],
});
// Or pass to any chain/agent:
const result = await chain.invoke(
{ input: "..." },
{ callbacks: [handler] }
);
The handler auto-detects the provider, model, and token usage from LangChain's callback data. It never reads prompts or outputs — only metadata is tracked. Events are deduplicated by LangChain run ID.
For LangChain Python, install pip install aispendguard-langchain — see the aispendguard-langchain repo.
Python SDK
Native Python SDK with batched transport, provider helpers, and the same validation rules as the TypeScript SDK.
pip install aispendguard
from aispendguard import AISpendGuard, create_openai_event
import openai, time
client = AISpendGuard(api_key="asg_your_key_here")
openai_client = openai.OpenAI()
start = time.time()
response = openai_client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Classify: 'I want to cancel'"}],
)
event = create_openai_event(
model="gpt-4o-mini",
usage=response.usage,
latency_ms=int((time.time() - start) * 1000),
tags={
"task_type": "classify",
"feature": "ticket_triage",
"route": "POST /api/classify",
},
)
client.track(event)
Also supports create_anthropic_event and create_gemini_event. See the full docs at github.com/AISpendGuard/aispendguard-python.
OpenTelemetry (OTLP)
If you already instrument with OpenTelemetry, send GenAI traces directly — no SDK needed.
POST https://www.aispendguard.com/api/otel/v1/traces
Content-Type: application/json
Authorization: Bearer asg_your_api_key
# Standard OTLP/HTTP JSON format with GenAI semantic conventions:
# gen_ai.system → provider (openai, anthropic, google)
# gen_ai.request.model → model name
# gen_ai.usage.input_tokens
# gen_ai.usage.output_tokens
# AISpendGuard-specific attributes (optional):
# asg.task_type, asg.feature, asg.route — override tags
# asg.* → custom tags (prefix stripped)
# Headers for default tags:
# x-asg-feature: my_feature
# x-asg-route: POST /api/endpoint
Works with any OTLP-compatible instrumentation (OpenLLMetry, Traceloop, custom spans). Prompt content and model outputs are automatically stripped — only token counts and metadata are stored.
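For illustration, a minimal OTLP/HTTP JSON payload carrying these attributes might look like the following. The attribute keys come from the list above; the span name and token values are placeholders:

```python
# Minimal OTLP/HTTP JSON trace with GenAI semantic-convention attributes.
# Everything except the attribute keys is illustrative.
import json
import urllib.request

span = {
    "name": "chat gpt-4o-mini",
    "attributes": [
        {"key": "gen_ai.system", "value": {"stringValue": "openai"}},
        {"key": "gen_ai.request.model", "value": {"stringValue": "gpt-4o-mini"}},
        {"key": "gen_ai.usage.input_tokens", "value": {"intValue": "120"}},
        {"key": "gen_ai.usage.output_tokens", "value": {"intValue": "12"}},
        {"key": "asg.task_type", "value": {"stringValue": "classify"}},
    ],
}
payload = {"resourceSpans": [{"scopeSpans": [{"spans": [span]}]}]}

req = urllib.request.Request(
    "https://www.aispendguard.com/api/otel/v1/traces",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer asg_your_api_key",
    },
)
# urllib.request.urlopen(req)  # uncomment to actually send
```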
Budget Alerts
Set a monthly USD spending cap and get email alerts at 75% and 90% thresholds. Available on the Pro plan.
Setup via Dashboard
Go to /billing → Budget panel → set your monthly limit, enable alert thresholds, and add an alert email address.
Setup via API
# Create or update budget (requires Clerk session, OWNER/ADMIN role)
POST /api/budgets
Content-Type: application/json
{
"monthlyLimitUsd": 500,
"alertAt75": true,
"alertAt90": true,
"alertEmail": "alerts@yourcompany.com"
}
# Check current budget
GET /api/budgets
# Remove budget
DELETE /api/budgets
Alerts fire in real time at ingest — as soon as your spend crosses a threshold, you get an email. Each threshold fires once per calendar month (deduplicated automatically). Spend is calculated from both rolled-up daily aggregates and unprocessed events for accuracy.
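The once-per-month threshold semantics can be sketched as a crossing check: a threshold fires only when cumulative spend first moves past it. This is illustrative; the real check runs server-side at ingest:

```python
# Illustrative 75%/90% alert logic: a threshold fires exactly when
# cumulative spend first crosses it.
def crossed_thresholds(limit_usd: float, prev_spend: float, new_spend: float) -> list[int]:
    fired = []
    for pct in (75, 90):
        bar = limit_usd * pct / 100
        if prev_spend < bar <= new_spend:
            fired.append(pct)
    return fired
```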
Ingest Response
Every POST /api/ingest call returns a JSON object with these fields:
{
"accepted": 1,
"duplicates": 0,
"rejected": 0,
"event_ids": ["cm...abc"], // IDs of accepted events
"warnings": [ // Non-critical issues (event still accepted)
"events[0].tags.task_type \"lab-benchmark\" is not recognized — coerced to \"other\"",
"events[0].tags.my-key is not a supported tag key — stripped"
],
"enforcement": { // Budget status (event always accepted)
"action": "block", // "none" | "warn" | "block"
"reason": "workspace_budget_exceeded",
"budget_limit": 50.00,
"current_spend": 52.34
},
"dashboard_url": "https://www.aispendguard.com/events",
"usage": {
"eventsThisMonth": 1234,
"monthlyLimit": 50000,
"tier": "FREE"
}
}
Use event_ids to verify your events were stored. Open dashboard_url to see them live.
Warnings: Tag validation issues (invalid task_type, unsupported tag keys, values too long) produce warnings — events are always accepted with best-effort coercion. Only privacy violations (forbidden keys like prompt, message) cause hard rejection.
Enforcement: When your workspace exceeds its budget, events are still tracked and accepted. The enforcement field signals the budget status so your code can decide whether to continue sending requests.
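A minimal sketch of acting on that field client-side (field names taken from the response shape above):

```python
# Reads the enforcement field from an ingest response so the caller can
# decide whether to pause non-critical AI requests.
def should_pause(ingest_response: dict) -> bool:
    enforcement = ingest_response.get("enforcement") or {}
    return enforcement.get("action") == "block"
```

Whether "warn" should also slow or pause traffic is a policy choice for your application; the API itself keeps accepting events either way.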
Error Reference
| Status | Meaning | Fix |
|---|---|---|
| 400 | Invalid request body or privacy violation | Check errors[] in response — invalid JSON, forbidden keys (prompt, message, etc.), or structurally unparseable events. Tag validation issues return 200 with warnings[] instead. |
| 401 | Missing or invalid API key | Pass x-api-key header or Authorization: Bearer <key>. Check key is not revoked. |
| 429 | Rate limit or monthly event limit reached | Rate limit: 120 req/min per key, wait and retry. Event limit: check usage.eventsThisMonth — upgrade to PRO for 500K/mo. |
| 500 | Server error | Retry with exponential backoff. If persistent, check status page. |
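The retry guidance in the table can be sketched as a small wrapper: exponential backoff on 429/500, fail fast on 400/401 since retrying cannot fix those:

```python
# Illustrative retry wrapper for the ingest endpoint.
# `send` performs one request and returns its HTTP status code.
import time

RETRYABLE = {429, 500}

def post_with_retry(send, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status < 400 or status not in RETRYABLE:
            return status  # success, or a non-retryable client error
        sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status
```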
Error Response Shape (400)
{
"accepted": 0,
"duplicates": 0,
"rejected": 2,
"errors": [
"events[0] contains forbidden field: prompt",
"events[1] invalid provider: must be a non-empty string"
]
}
Troubleshooting
Events accepted but not on dashboard?
Dashboard reads from daily rollups, which update on each cron run. New events appear after the next rollup cycle. Check the /events page for raw events — they appear immediately.
SDK trackUsage() doesn't throw but events missing?
By default the SDK is fire-and-forget — errors are logged to console, not thrown. Use init({ strict: true }) to throw on failures, then check the error message.
Getting 401 with a valid key?
Check if the key was revoked in Settings. Generate a new one if needed. Ensure you pass the full key string (starts with asg_).
Duplicate events being skipped?
Events are deduplicated by event_id or a content hash (provider + model + tokens + timestamp). Use unique event_id values per call, or omit it to let the server generate one.
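If you generate event IDs client-side, a tiny helper is enough. The evt_ prefix follows the examples in this doc; reuse the same ID when retrying the same event so deduplication works in your favor:

```python
# Generates a unique event_id per logical call. Retries of the SAME event
# should reuse the same id so the server deduplicates them cleanly.
import uuid

def new_event_id() -> str:
    return f"evt_{uuid.uuid4().hex}"
```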
Cost showing as $0?
Cost is calculated server-side from model pricing. If the model isn't in our pricing database, cost will be $0. Pass cost_usd in the event to override. Check /model-prices for supported models.
Streaming Responses
With streaming, usage data arrives in the final chunk, not during the stream. Accumulate the stream, then track usage after it completes.
OpenAI Streaming
import time
start = time.time()
# Must pass stream_options to get usage in stream
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello"}],
stream=True,
stream_options={"include_usage": True},
)
usage = None
for chunk in stream:
if chunk.usage:
usage = chunk.usage
# ... process chunk.choices[0].delta
if usage:
event = create_openai_event(
model="gpt-4o-mini",
usage=usage,
latency_ms=int((time.time() - start) * 1000),
tags={"task_type": "chat", "feature": "assistant", "route": "POST /api/chat"},
)
asg.track(event)
Anthropic Streaming
with anthropic_client.messages.stream(
model="claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "Hello"}],
max_tokens=200,
) as stream:
for text in stream.text_stream:
pass # process text chunks
message = stream.get_final_message()
# Usage is on the final message object
event = create_anthropic_event(
model="claude-sonnet-4-20250514",
usage=message.usage,
latency_ms=latency,
tags={"task_type": "chat", "feature": "assistant", "route": "POST /api/chat"},
)
asg.track(event)
TypeScript SDK (OpenAI)
const stream = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello" }],
stream: true,
stream_options: { include_usage: true },
});
let usage;
for await (const chunk of stream) {
if (chunk.usage) usage = chunk.usage;
// ... process chunk
}
if (usage) {
const event = createOpenAIUsageEvent({
model: "gpt-4o-mini",
usage,
latencyMs: Date.now() - start,
tags: { task_type: "chat", feature: "assistant", route: "POST /api/chat" },
});
await trackUsage(event);
}
Data Export API
Export your usage events as JSON or CSV for internal dashboards, reporting, or analysis. Requires Clerk authentication (session cookie).
# JSON (default)
GET /api/export?from=2026-03-01&to=2026-04-01&limit=5000
# CSV
GET /api/export?format=csv&from=2026-03-01&to=2026-04-01
# Parameters:
# from — start date (default: 1st of current month)
# to — end date (default: 1st of next month)
# format — "json" (default) or "csv"
# limit — max rows, 1–10000 (default: 1000)
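A sketch of fetching a CSV export with the parameters above. The cookie header is a placeholder; obtain a real Clerk session from your browser or auth flow:

```python
# Illustrative export call; the __session cookie name/value is a placeholder,
# not a documented credential format.
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "format": "csv",
    "from": "2026-03-01",
    "to": "2026-04-01",
    "limit": 5000,
})
url = f"https://www.aispendguard.com/api/export?{params}"
req = urllib.request.Request(url, headers={"Cookie": "__session=<your_clerk_session>"})
# with urllib.request.urlopen(req) as resp:
#     open("usage.csv", "wb").write(resp.read())  # uncomment to download
```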