Slashboard Docs
Slashboard is a real-time cost observability platform for LLM applications. It tracks every token, every dollar, and every team, so you know exactly what AI is costing you, and why.
Quick Start
Get your first event into Slashboard in under 5 minutes.
1. Create an account. Sign up at app.slashllm.com. It is free, and no credit card is required.
2. Get your API key. Go to Settings → API Keys and create a key. Copy it right away; it is only shown once. Keys look like sb_live_xxxxxxxxxxxxxxxx.
3. Send your first event. Put your key into the curl command below and run it. You'll get a 202 Accepted response with an event ID.
   curl -X POST https://ep.slashllm.com/api/v1/ingest \
     -H "Authorization: Bearer sb_live_your_key_here" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "gpt-4o",
       "startTime": "2026-06-01T10:00:00Z",
       "prompt_tokens": 120,
       "completion_tokens": 80,
       "response_cost": 0.0024,
       "metadata": { "team_id": "engineering" }
     }'
4. Open the dashboard. Visit app.slashllm.com. Your event appears in the Cost Explorer within a few seconds.
How It Works
Slashboard is a managed ingest pipeline. Your app sends one HTTP POST per completion; Slashboard handles normalization, attribution, cost computation, storage, and visualization.
Ingestion is async: your app sees a 202 Accepted immediately, with no latency added to your LLM request path. Events are processed by a background worker and are typically visible in the dashboard within 1-2 seconds.
| Endpoint | Use case | Latency added |
|---|---|---|
| /api/v1/ingest | Single event (REST) | ~0ms (async) |
| /api/v1/ingest/litellm | LiteLLM native webhook (array, ndjson, single) | ~0ms (async) |
| /api/v1/ingest/batch | Up to 100 events in one POST | ~0ms (async) |
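The server acknowledges with a 202 before processing, but a blocking HTTP client still waits for that acknowledgement. If even that round trip should stay off the request path, one option is to hand events to a worker thread. The sketch below assumes a hypothetical `BackgroundSender` helper (not part of any Slashboard SDK) and takes the actual send function as a parameter, so it works with whichever HTTP client you already use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class BackgroundSender:
    """Hypothetical helper: runs a blocking send function on worker threads."""

    def __init__(self, send: Callable[[dict], None], max_workers: int = 2) -> None:
        self._send = send
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, event: dict) -> None:
        # Returns immediately; the send happens on a worker thread.
        self._pool.submit(self._safe_send, event)

    def _safe_send(self, event: dict) -> None:
        try:
            self._send(event)
        except Exception:
            # A failed telemetry call should not break the application.
            pass

    def close(self) -> None:
        # Wait until every queued event has been attempted, then stop the workers.
        self._pool.shutdown(wait=True)
```

In practice `send` would wrap a POST to /api/v1/ingest, and `close()` would be called at shutdown so queued events are not lost.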
LiteLLM Integration
Recommended. The native LiteLLM webhook is the simplest integration path: two environment variables, zero custom code, zero downloaded files. LiteLLM's built-in GenericAPILogger sends events directly to Slashboard in the StandardLoggingPayload format.
Option A: Environment variables (proxy or SDK)
Set these two variables in your shell or .env file. LiteLLM reads them automatically when success_callback includes "generic_api".
# Add to your .env or shell profile
export GENERIC_LOGGER_ENDPOINT=https://ep.slashllm.com/api/v1/ingest/litellm
export GENERIC_LOGGER_HEADERS="Authorization=Bearer sb_live_your_key_here"
Option B: Python SDK (recommended for applications)
Instantiate GenericAPILogger directly inside your async context. This is required because the logger uses asyncio.create_task for its batch-flush routine, so it must run inside a live event loop.
import asyncio

import litellm
from litellm.integrations.generic_api.generic_api_callback import GenericAPILogger


async def main():
    # Wire Slashboard: no custom code, just LiteLLM's native logger
    logger = GenericAPILogger(
        endpoint="https://ep.slashllm.com/api/v1/ingest/litellm",
        headers={"Authorization": "Bearer sb_live_your_key_here"},
    )
    litellm.callbacks = [logger]

    # Use acompletion, which the async GenericAPILogger requires
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this ticket."}],
        metadata={
            "requester_metadata": {
                "team": "support",
                "feature": "ticket-summarizer",
                "env": "production",
            },
        },
    )

    # Flush before the event loop closes
    await asyncio.sleep(1.0)
    await logger.async_send_batch()


asyncio.run(main())
Option C: Python SDK with config file
Want to keep credentials out of code? Load them from a YAML or JSON config file and pass the values to GenericAPILogger. This gives you the same "externalize your config" pattern that the proxy YAML provides, without needing the proxy.
Create a slashboard.yaml (or JSON) alongside your app:
# slashboard.yaml
slashboard:
  endpoint: https://ep.slashllm.com/api/v1/ingest/litellm
  api_key: sb_live_your_key_here
  log_format: json_array
Then load it at startup:
import asyncio

import yaml
from litellm.integrations.generic_api.generic_api_callback import GenericAPILogger
import litellm


def load_slashboard_logger(config_path: str = "slashboard.yaml") -> GenericAPILogger:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)["slashboard"]
    return GenericAPILogger(
        endpoint=cfg["endpoint"],
        headers={"Authorization": f"Bearer {cfg['api_key']}"},
        log_format=cfg.get("log_format", "json_array"),
    )


async def main():
    logger = load_slashboard_logger()  # reads slashboard.yaml
    litellm.callbacks = [logger]

    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this ticket."}],
        metadata={
            "requester_metadata": {
                "team": "support",
                "feature": "ticket-summarizer",
            },
        },
    )

    await asyncio.sleep(1.0)
    await logger.async_send_batch()


asyncio.run(main())
The callback_settings dict (the Python equivalent of proxy YAML) does not auto-resolve for SDK acompletion() calls: the resolver only runs at proxy startup, not during SDK completions. Always instantiate GenericAPILogger directly as shown above.
Option D: LiteLLM Proxy (YAML config)
If you run the LiteLLM proxy server, add Slashboard to your litellm_config.yaml:
litellm_settings:
  callbacks: ["slashboard"]
callback_settings:
  slashboard:
    callback_type: generic_api
    endpoint: https://ep.slashllm.com/api/v1/ingest/litellm
    headers:
      Authorization: Bearer sb_live_your_key_here
    log_format: json_array
    event_types: ["llm_api_success", "llm_api_failure"]
Wire formats supported
The /api/v1/ingest/litellm endpoint auto-detects the body format LiteLLM uses, so you don't configure anything.
| Format | Body shape | When LiteLLM sends it |
|---|---|---|
| json_array | Bare JSON array [ {...}, {...} ] | Default; batch of events |
| single | Single JSON object { ... } | One POST per event |
| ndjson | Newline-delimited {...}\n{...} | Streaming format |
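To make the three body shapes concrete, here is a minimal sketch of how a receiver could tell them apart. It is not Slashboard's server code: `detect_wire_format` is a hypothetical function, and the rule used to separate `single` from `ndjson` (several lines that each parse as JSON on their own) is an assumption for illustration.

```python
import json


def detect_wire_format(body: bytes) -> str:
    """Classify a request body as json_array, ndjson, or single (illustrative only)."""
    text = body.decode("utf-8").strip()
    if not text:
        raise ValueError("empty body")
    if text.startswith("["):
        return "json_array"

    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) > 1:
        try:
            for line in lines:
                json.loads(line)
            return "ndjson"
        except json.JSONDecodeError:
            # Probably one pretty-printed object that spans several lines.
            pass

    json.loads(text)  # raises ValueError if the body is not valid JSON
    return "single"
```

A one-line body holding a single object is reported as `single`, since it is indistinguishable from one-record ndjson.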
Direct REST API
If you're not using LiteLLM, send events directly via HTTP POST. Any language, any framework: if it can make an HTTP request, it works.
Python
import requests
from datetime import datetime, timezone


def track_llm_call(model, prompt_tokens, completion_tokens, cost, team, feature):
    requests.post(
        "https://ep.slashllm.com/api/v1/ingest",
        headers={"Authorization": "Bearer sb_live_your_key_here"},
        json={
            "model": model,
            "startTime": datetime.now(timezone.utc).isoformat(),
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "response_cost": cost,
            "status": "success",
            "metadata": {"team_id": team, "feature": feature},
        },
        timeout=3,
    )
TypeScript / Node.js
async function trackLLMCall(params: {
  model: string;
  promptTokens: number;
  completionTokens: number;
  cost: number;
  team: string;
  feature: string;
}) {
  await fetch("https://ep.slashllm.com/api/v1/ingest", {
    method: "POST",
    headers: {
      Authorization: "Bearer sb_live_your_key_here",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: params.model,
      startTime: new Date().toISOString(),
      prompt_tokens: params.promptTokens,
      completion_tokens: params.completionTokens,
      response_cost: params.cost,
      status: "success",
      metadata: { team_id: params.team, feature: params.feature },
    }),
  });
}
Go
import (
	"bytes"
	"encoding/json"
	"net/http"
	"time"
)

// TrackLLMCall posts one completion event to the Slashboard ingest endpoint.
func TrackLLMCall(model, team, feature string, promptTokens, completionTokens int, cost float64) error {
	payload, err := json.Marshal(map[string]any{
		"model":             model,
		"startTime":         time.Now().UTC().Format(time.RFC3339),
		"prompt_tokens":     promptTokens,
		"completion_tokens": completionTokens,
		"response_cost":     cost,
		"status":            "success",
		"metadata":          map[string]string{"team_id": team, "feature": feature},
	})
	if err != nil {
		return err
	}
	req, err := http.NewRequest("POST", "https://ep.slashllm.com/api/v1/ingest", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer sb_live_your_key_here")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}
Python Callback (Advanced)
For maximum control (custom filtering, local buffering, or enrichment before sending), use the SlashboardLogger custom callback. Drop slashboard_callback.py into your project and wire it as a LiteLLM callback.
1. Download the callback file
curl -O https://raw.githubusercontent.com/slashboard-io/slashboard/main/demos/litellm-test-app/slashboard_callback.py
2. Wire it up
import os

import litellm
from slashboard_callback import SlashboardLogger

os.environ["SLASHBOARD_API_KEY"] = "sb_live_your_key_here"
os.environ["SLASHBOARD_API_URL"] = "https://ep.slashllm.com"

litellm.callbacks = [SlashboardLogger()]

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={
        "requester_metadata": {
            "team": "engineering",
            "feature": "chat",
            "env": "production",
        },
    },
)
Attribution & Teams
Attribution tells Slashboard who is spending what. Every event can carry team, feature, environment, and user dimensions. These power the Cost Explorer breakdowns and budget alerts.
Dimensions
| Field | Type | Where to set it | Example |
|---|---|---|---|
| team_id | string | metadata.requester_metadata.team_id (SDK) or metadata.user_api_key_team_id (proxy) | engineering |
| team | string | metadata.requester_metadata.team (LiteLLM SDK) | engineering |
| feature | string | metadata.requester_metadata.feature | code-review |
| env | string | metadata.requester_metadata.env | production |
| end_user | string | top-level end_user field | user_abc123 |
Best practices
Standardize your team_id values across all services before ingesting data. The Cost Explorer groups by exact string match, so inconsistent names (e.g. "eng" vs "engineering") result in split rows.
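One way to enforce this is to normalize team names at the point where events are built. The sketch below is only an illustration: `canonical_team` and the alias table are hypothetical, and the canonical set simply reuses the example team names from this page.

```python
# Canonical IDs your org has agreed on (example values).
CANONICAL_TEAMS = {"engineering", "data-science", "marketing", "support", "finance"}

# Hypothetical aliases seen in older services, mapped to their canonical ID.
TEAM_ALIASES = {
    "eng": "engineering",
    "ds": "data-science",
    "data science": "data-science",
}


def canonical_team(raw: str) -> str:
    """Return the canonical team ID, or raise if the name is not recognized."""
    key = raw.strip().lower()
    key = TEAM_ALIASES.get(key, key)
    if key not in CANONICAL_TEAMS:
        raise ValueError(f"unknown team id: {raw!r}")
    return key
```

Raising on unknown names surfaces a typo at the call site instead of as an unexpected extra row in the Cost Explorer.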
# Good: consistent team IDs across your org
TEAMS = {"engineering", "data-science", "marketing", "support", "finance"}

# Pass in every completion call (LiteLLM SDK path)
metadata = {
    "requester_metadata": {
        "team": "engineering",     # team attribution
        "feature": "code-review",  # feature/product area
        "env": "production",       # environment
    }
}
Cost Model
Slashboard stores the cost you send. If you send response_cost: 0, it stores zero; it does not silently recompute. This keeps your dashboard consistent with what your provider actually charges.
Sending cost from LiteLLM
LiteLLM automatically computes response_cost from its built-in pricing tables and includes it in the StandardLoggingPayload. No extra work needed.
Sending cost from your own code
If you compute cost yourself, include both input_cost and output_cost for the most accurate breakdown charts:
# GPT-4o pricing (June 2026)
INPUT_RATE = 2.50 / 1_000_000   # $ per token
OUTPUT_RATE = 10.0 / 1_000_000  # $ per token

input_cost = prompt_tokens * INPUT_RATE
output_cost = completion_tokens * OUTPUT_RATE

payload = {
    "model": "gpt-4o",
    "input_cost": input_cost,
    "output_cost": output_cost,
    "response_cost": input_cost + output_cost,
    # ... remaining event fields
}
API: POST /api/v1/ingest
Ingest a single LLM completion event.
https://ep.slashllm.com/api/v1/ingest
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | â | Model identifier, e.g. gpt-4o |
| startTime | ISO 8601 string | â | When the request started |
| endTime | ISO 8601 string | â | When the response was received |
| prompt_tokens | integer | â | Number of prompt tokens |
| completion_tokens | integer | â | Number of completion tokens |
| total_tokens | integer | â | Total tokens (computed if omitted) |
| response_cost | float | â | Total cost in USD |
| input_cost | float | â | Cost of prompt tokens |
| output_cost | float | â | Cost of completion tokens |
| status | "success" or "failure" | â | Default: "success" |
| error_str | string | â | Error message for failed calls |
| metadata | object | â | Attribution: team_id, feature, env, etc. |
| request_tags | array of objects | â | Custom key-value tags |
| end_user | string | â | End-user identifier |
| cache_hit | boolean | â | Whether this was served from cache |
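Two rows in the table describe fallback behavior: status defaults to "success", and total_tokens is computed if omitted. The sketch below mirrors those two rules on the client so you can preview what a stored event might look like. It is an illustration, not the server's code: `apply_documented_defaults` is a hypothetical name, and computing the total as prompt plus completion tokens is an assumption.

```python
def apply_documented_defaults(event: dict) -> dict:
    """Return a copy of the event with the documented fallbacks filled in."""
    normalized = dict(event)  # leave the caller's dict untouched

    # Documented default: status is "success" when not provided.
    normalized.setdefault("status", "success")

    # Documented behavior: total_tokens is computed if omitted.
    # Assumption: the total is prompt_tokens + completion_tokens.
    if "total_tokens" not in normalized:
        normalized["total_tokens"] = (
            normalized.get("prompt_tokens", 0)
            + normalized.get("completion_tokens", 0)
        )
    return normalized
```

Values you send explicitly are left alone, so a provider-reported total_tokens is never overwritten by the fallback.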
Response
{ "status": "accepted", "id": "evt_a1b2c3d4..." }
API: POST /api/v1/ingest/litellm
Native LiteLLM webhook endpoint. Accepts the three wire formats emitted by LiteLLM's built-in GenericAPILogger.
https://ep.slashllm.com/api/v1/ingest/litellm
Accepted body formats
| Format | Content-Type | Body |
|---|---|---|
| json_array (default) | application/json | Bare JSON array [ {...}, {...} ] |
| single | application/json | Single JSON object { ... } |
| ndjson | application/x-ndjson | One JSON object per line |
The format is auto-detected: if the body starts with [ it's treated as an array, otherwise as single or ndjson. No Content-Type header configuration is needed.
Limits
| Limit | Value |
|---|---|
| Max events per request | 100 |
| Max body size | 1 MB |
| Rate limit | 1,000 req/min per org |
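If you build request bodies yourself rather than letting LiteLLM batch for you, the first two limits can be respected by splitting events before sending. The sketch below is one possible approach under stated assumptions: `iter_request_bodies` is a hypothetical helper, it emits the bare `json_array` shape, and it treats 1 MB as 1,000,000 bytes (the docs do not say whether the limit is decimal or binary).

```python
import json
from typing import Iterator

MAX_EVENTS = 100            # from the limits table
MAX_BODY_BYTES = 1_000_000  # assumption: "1 MB" means 10^6 bytes


def iter_request_bodies(events: list[dict]) -> Iterator[bytes]:
    """Yield JSON-array bodies that stay within the event-count and size limits."""
    batch: list[str] = []
    size = 2  # bytes for the enclosing "[" and "]"

    for event in events:
        encoded = json.dumps(event, separators=(",", ":"))
        item_size = len(encoded.encode("utf-8"))
        if item_size + 2 > MAX_BODY_BYTES:
            raise ValueError("a single event exceeds the body size limit")

        extra = item_size + (1 if batch else 0)  # +1 for the separating comma
        if batch and (len(batch) >= MAX_EVENTS or size + extra > MAX_BODY_BYTES):
            yield ("[" + ",".join(batch) + "]").encode("utf-8")
            batch, size = [], 2
            extra = item_size

        batch.append(encoded)
        size += extra

    if batch:
        yield ("[" + ",".join(batch) + "]").encode("utf-8")
```

Each yielded body is one POST, so the number of bodies also tells you how many requests count against the per-org rate limit.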
API: POST /api/v1/ingest/batch
Send up to 100 events in a single HTTP request. Each event is queued independently. Useful for replaying historical data or bulk ingestion.
https://ep.slashllm.com/api/v1/ingest/batch
Request body
{
  "events": [
    {
      "model": "gpt-4o",
      "startTime": "2026-06-01T10:00:00Z",
      "prompt_tokens": 120,
      "response_cost": 0.0024,
      "metadata": { "team_id": "engineering" }
    },
    ...
  ]
}
Response
{ "status": "accepted", "count": 2, "ids": ["evt_a1b2...", "evt_c3d4..."] }
Create a free account and get your API key in 30 seconds.