Slashboard Docs

Slashboard is a real-time cost observability platform for LLM applications. It tracks every token, every dollar, and every team — so you know exactly what AI is costing you, and why.

⚡
2-minute setup
Two environment variables. No SDK to install.
🔍
Attribution built-in
Break spend down by team, feature, model, and environment.
🔒
PII-safe by default
Prompt content never leaves your infrastructure unless you opt in.

Quick Start

Get your first event into Slashboard in under 5 minutes.

  1. Create an account
    Sign up at app.slashllm.com — free, no credit card required.
  2. Get your API key
    Go to Settings → API Keys and create a key. Copy it — it is only shown once. Keys look like sb_live_xxxxxxxxxxxxxxxx.
  3. Send your first event
    Replace sb_live_your_key_here with your key, then run this curl command:
    bash
    curl -X POST https://ep.slashllm.com/api/v1/ingest \
      -H "Authorization: Bearer sb_live_your_key_here" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o",
        "startTime": "2026-06-01T10:00:00Z",
        "prompt_tokens": 120,
        "completion_tokens": 80,
        "response_cost": 0.0024,
        "metadata": { "team_id": "engineering" }
      }'
    You'll get a 202 Accepted response with an event ID.
  4. Open the dashboard
    Visit app.slashllm.com. Your event typically appears in the Cost Explorer within 1–2 seconds.
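
The same first event can also be sent from Python using only the standard library. The sketch below mirrors the curl command from step 3. The helper names build_ingest_request and send_event are illustrative and not part of any Slashboard SDK, and the response is assumed to be the JSON body documented for POST /api/v1/ingest.

```python
import json
import urllib.request

INGEST_URL = "https://ep.slashllm.com/api/v1/ingest"


def build_ingest_request(api_key: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_event(api_key: str, event: dict, timeout: float = 3.0) -> dict:
    """Send the event and return the decoded JSON response body."""
    request = build_ingest_request(api_key, event)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same payload as the curl command; nothing is sent until send_event() is called.
event = {
    "model": "gpt-4o",
    "startTime": "2026-06-01T10:00:00Z",
    "prompt_tokens": 120,
    "completion_tokens": 80,
    "response_cost": 0.0024,
    "metadata": {"team_id": "engineering"},
}
request = build_ingest_request("sb_live_your_key_here", event)
```

Calling send_event("sb_live_your_key_here", event) performs the actual POST. Keeping request construction separate from sending makes the payload easy to inspect before any data leaves your machine.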

How It Works

Slashboard is a managed ingest pipeline. Your app sends one HTTP POST per completion; Slashboard handles normalization, attribution, cost computation, storage, and visualization.

Architecture
Your App ──(StandardLoggingPayload)──→ /api/v1/ingest/litellm ──→ Redis Stream (event queue)
Redis Stream ──→ Worker ──(translate, persist, aggregate)──→ PostgreSQL + TimescaleDB
PostgreSQL ──→ Dashboard API ──→ Your Browser

Ingestion is async — your app sees a 202 Accepted immediately, with no latency added to your LLM request path. Events are processed by a background worker and typically visible in the dashboard within 1–2 seconds.

Endpoint | Use case | Latency added
/api/v1/ingest | Single event (REST) | ~0 ms (async)
/api/v1/ingest/litellm | LiteLLM native webhook (array, ndjson, single) | ~0 ms (async)
/api/v1/ingest/batch | Up to 100 events in one POST | ~0 ms (async)

LiteLLM Integration

Recommended

The native LiteLLM webhook is the simplest way to integrate: two environment variables, zero custom code, zero downloaded files. LiteLLM's built-in GenericAPILogger sends events directly to Slashboard in the StandardLoggingPayload format.

✅ Tip
This integration works with any LiteLLM version ≥ 1.0 and supports all 100+ LiteLLM providers out of the box, including OpenAI, Anthropic, Google, Groq, NVIDIA, Azure, and Bedrock.

Option A — Environment variables (proxy or SDK)

Set these two variables in your shell or .env file. LiteLLM reads them automatically when success_callback includes "generic_api".

bash
# Add to your .env or shell profile
export GENERIC_LOGGER_ENDPOINT=https://ep.slashllm.com/api/v1/ingest/litellm
export GENERIC_LOGGER_HEADERS="Authorization=Bearer sb_live_your_key_here"
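
Before starting your app, it can help to confirm that both variables are set and well-formed. The preflight check below is a minimal sketch: the function names are hypothetical, the comma-separated Key=Value header syntax is an assumption about how multiple headers would be written, and the sb_live_ prefix check is based only on the key format shown in Quick Start.

```python
import os


def parse_logger_headers(raw: str) -> dict:
    """Parse 'Key=Value,Key2=Value2' into a dict (comma separation is assumed)."""
    headers = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"header {pair!r} is missing '='")
        headers[key.strip()] = value.strip()
    return headers


def check_slashboard_env(env=os.environ) -> list:
    """Return a list of problems; an empty list means the setup looks sane."""
    problems = []
    endpoint = env.get("GENERIC_LOGGER_ENDPOINT", "")
    if not endpoint.endswith("/api/v1/ingest/litellm"):
        problems.append("GENERIC_LOGGER_ENDPOINT should point at /api/v1/ingest/litellm")
    try:
        headers = parse_logger_headers(env.get("GENERIC_LOGGER_HEADERS", ""))
    except ValueError as exc:
        problems.append(str(exc))
        headers = {}
    if not headers.get("Authorization", "").startswith("Bearer sb_live_"):
        problems.append("Authorization header should look like 'Bearer sb_live_...'")
    return problems
```

Running check_slashboard_env() with no arguments inspects the current process environment; passing a plain dict instead makes the check easy to unit-test.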

Option B — Python SDK (recommended for applications)

Instantiate GenericAPILogger directly inside your async context. This is required because the logger uses asyncio.create_task for its batch-flush routine, so it must run inside a live event loop.

python
import asyncio
import litellm
from litellm.integrations.generic_api.generic_api_callback import GenericAPILogger

async def main():
    # Wire Slashboard — no custom code, just LiteLLM's native logger
    logger = GenericAPILogger(
        endpoint="https://ep.slashllm.com/api/v1/ingest/litellm",
        headers={"Authorization": "Bearer sb_live_your_key_here"},
    )
    litellm.callbacks = [logger]

    # Use acompletion — required for the async GenericAPILogger
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this ticket."}],
        metadata={
            "requester_metadata": {
                "team": "support",
                "feature": "ticket-summarizer",
                "env": "production",
            },
        },
    )

    # Flush before the event loop closes
    await asyncio.sleep(1.0)
    await logger.async_send_batch()

asyncio.run(main())

Option C — Python SDK with config file

Want to keep credentials out of code? Load them from a YAML or JSON config file and pass the values to GenericAPILogger. This gives you the same "externalize your config" pattern that the proxy YAML provides, without needing the proxy.

Create a slashboard.yaml (or JSON) alongside your app:

yaml
# slashboard.yaml
slashboard:
  endpoint: https://ep.slashllm.com/api/v1/ingest/litellm
  api_key: sb_live_your_key_here
  log_format: json_array

Then load it at startup:

python
import asyncio
import yaml
from litellm.integrations.generic_api.generic_api_callback import GenericAPILogger
import litellm

def load_slashboard_logger(config_path: str = "slashboard.yaml") -> GenericAPILogger:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)["slashboard"]
    return GenericAPILogger(
        endpoint=cfg["endpoint"],
        headers={"Authorization": f"Bearer {cfg['api_key']}"},
        log_format=cfg.get("log_format", "json_array"),
    )

async def main():
    logger = load_slashboard_logger()   # reads slashboard.yaml
    litellm.callbacks = [logger]

    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this ticket."}],
        metadata={
            "requester_metadata": {
                "team": "support",
                "feature": "ticket-summarizer",
            },
        },
    )

    await asyncio.sleep(1.0)
    await logger.async_send_batch()

asyncio.run(main())

⚠️ Warning
LiteLLM's callback_settings dict (the Python equivalent of proxy YAML) does not auto-resolve for SDK acompletion() calls — the resolver only runs at proxy startup, not during SDK completions. Always instantiate GenericAPILogger directly as shown above.

Option D — LiteLLM Proxy (YAML config)

If you run the LiteLLM proxy server, add Slashboard to your litellm_config.yaml:

yaml
litellm_settings:
  callbacks: ["slashboard"]

callback_settings:
  slashboard:
    callback_type: generic_api
    endpoint: https://ep.slashllm.com/api/v1/ingest/litellm
    headers:
      Authorization: Bearer sb_live_your_key_here
    log_format: json_array
    event_types: ["llm_api_success", "llm_api_failure"]

Wire formats supported

The /api/v1/ingest/litellm endpoint auto-detects the body format LiteLLM uses — you don't configure anything.

Format | Body shape | When LiteLLM sends it
json_array | Bare JSON array [ {...}, {...} ] | Default; batch of events
single | Single JSON object { ... } | One POST per event
ndjson | Newline-delimited {...}\n{...} | Streaming format
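
To make the three body shapes concrete, the sketch below serializes the same list of events in each format. The function encode_events is a hypothetical helper written for illustration; LiteLLM produces these bodies itself, so you would only need something like this when testing the endpoint by hand.

```python
import json


def encode_events(events: list, log_format: str = "json_array") -> str:
    """Serialize events in one of the three body shapes listed above."""
    if log_format == "json_array":
        return json.dumps(events)
    if log_format == "ndjson":
        return "\n".join(json.dumps(event) for event in events)
    if log_format == "single":
        if len(events) != 1:
            raise ValueError("the 'single' format carries exactly one event")
        return json.dumps(events[0])
    raise ValueError(f"unknown log_format: {log_format!r}")


events = [{"model": "gpt-4o"}, {"model": "gpt-4o-mini"}]
array_body = encode_events(events)             # one JSON array
ndjson_body = encode_events(events, "ndjson")  # one object per line
single_body = encode_events(events[:1], "single")
```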

Direct REST API

If you're not using LiteLLM, send events directly via HTTP POST. Any language, any framework — if it can make an HTTP request, it works.

Python

python
import requests
from datetime import datetime, timezone

def track_llm_call(model, prompt_tokens, completion_tokens, cost, team, feature):
    requests.post(
        "https://ep.slashllm.com/api/v1/ingest",
        headers={"Authorization": "Bearer sb_live_your_key_here"},
        json={
            "model": model,
            "startTime": datetime.now(timezone.utc).isoformat(),
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "response_cost": cost,
            "status": "success",
            "metadata": {"team_id": team, "feature": feature},
        },
        timeout=3,
    )
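
The function above blocks the caller until the POST returns. If you want tracking kept entirely off your request path, one option is to hand events to a background thread. The class below is a sketch of that pattern, not a Slashboard component: BackgroundSender, send_fn, and max_pending are illustrative names, and failed deliveries are silently dropped by design.

```python
import queue
import threading


class BackgroundSender:
    """Queue events and deliver them from a single worker thread."""

    def __init__(self, send_fn, max_pending: int = 1000):
        self._send_fn = send_fn
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event: dict) -> bool:
        """Enqueue without blocking; returns False if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            try:
                if event is None:   # sentinel pushed by close()
                    return
                self._send_fn(event)
            except Exception:
                pass                # tracking must never break the app
            finally:
                self._queue.task_done()

    def close(self) -> None:
        """Deliver everything still queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

In practice send_fn would wrap the requests.post call shown above, and close() would run at shutdown so that queued events are not lost.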

TypeScript / Node.js

typescript
async function trackLLMCall(params: {
  model: string;
  promptTokens: number;
  completionTokens: number;
  cost: number;
  team: string;
  feature: string;
}) {
  await fetch("https://ep.slashllm.com/api/v1/ingest", {
    method: "POST",
    headers: {
      Authorization: "Bearer sb_live_your_key_here",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: params.model,
      startTime: new Date().toISOString(),
      prompt_tokens: params.promptTokens,
      completion_tokens: params.completionTokens,
      response_cost: params.cost,
      status: "success",
      metadata: { team_id: params.team, feature: params.feature },
    }),
  });
}

Go

go
import (
    "bytes"
    "encoding/json"
    "net/http"
    "time"
)

// TrackLLMCall posts one completion event to the Slashboard ingest endpoint.
func TrackLLMCall(model, team, feature string, promptTokens, completionTokens int, cost float64) error {
    payload, err := json.Marshal(map[string]any{
        "model":             model,
        "startTime":         time.Now().UTC().Format(time.RFC3339),
        "prompt_tokens":     promptTokens,
        "completion_tokens": completionTokens,
        "response_cost":     cost,
        "status":            "success",
        "metadata":          map[string]string{"team_id": team, "feature": feature},
    })
    if err != nil {
        return err
    }
    req, err := http.NewRequest(http.MethodPost, "https://ep.slashllm.com/api/v1/ingest", bytes.NewReader(payload))
    if err != nil {
        return err
    }
    req.Header.Set("Authorization", "Bearer sb_live_your_key_here")
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    return nil
}

Python Callback (Advanced)

For maximum control — custom filtering, local buffering, or enrichment before sending — use the SlashboardLogger custom callback. Drop slashboard_callback.py into your project and wire it as a LiteLLM callback.

ℹ️ Note
The LiteLLM native webhook (above) is simpler for most teams. Use this approach when you need to pre-process events, filter out certain calls, or add custom enrichment before data reaches Slashboard.

1. Download the callback file

bash
curl -O https://raw.githubusercontent.com/slashboard-io/slashboard/main/demos/litellm-test-app/slashboard_callback.py

2. Wire it up

python
import litellm
from slashboard_callback import SlashboardLogger
import os

os.environ["SLASHBOARD_API_KEY"] = "sb_live_your_key_here"
os.environ["SLASHBOARD_API_URL"] = "https://ep.slashllm.com"

litellm.callbacks = [SlashboardLogger()]

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={
        "requester_metadata": {
            "team": "engineering",
            "feature": "chat",
            "env": "production",
        },
    },
)

Attribution & Teams

Attribution tells Slashboard who is spending what. Every event can carry team, feature, environment, and user dimensions — these power the Cost Explorer breakdowns and budget alerts.

Dimensions

Field | Type | Where to set it | Example
team_id | string | metadata.requester_metadata.team_id (SDK) or metadata.user_api_key_team_id (proxy) | engineering
team | string | metadata.requester_metadata.team (LiteLLM SDK) | engineering
feature | string | metadata.requester_metadata.feature | code-review
env | string | metadata.requester_metadata.env | production
end_user | string | top-level end_user field | user_abc123

Best practices

Standardize your team_id values across all services before ingesting data. The Cost Explorer groups by exact string match — inconsistent names (e.g. "eng" vs "engineering") result in split rows.

python
# Good — consistent team IDs across your org
TEAMS = {"engineering", "data-science", "marketing", "support", "finance"}

# Pass in every completion call (LiteLLM SDK path)
metadata = {
    "requester_metadata": {
        "team": "engineering",     # team attribution
        "feature": "code-review",  # feature/product area
        "env": "production",       # environment
    }
}
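
Because grouping is by exact string match, one way to enforce consistent IDs is to normalize every team name before it is attached to an event. The sketch below assumes a hypothetical alias table; the names CANONICAL_TEAMS, TEAM_ALIASES, and normalize_team are illustrative and the aliases are made-up examples.

```python
CANONICAL_TEAMS = {"engineering", "data-science", "marketing", "support", "finance"}

# Hypothetical aliases mapped to their canonical ID.
TEAM_ALIASES = {
    "eng": "engineering",
    "ds": "data-science",
    "data science": "data-science",
}


def normalize_team(raw: str) -> str:
    """Map a free-form team name to a canonical ID, or raise if it is unknown."""
    key = raw.strip().lower()
    key = TEAM_ALIASES.get(key, key)
    if key not in CANONICAL_TEAMS:
        raise ValueError(f"unknown team {raw!r}; add it to CANONICAL_TEAMS or TEAM_ALIASES")
    return key


metadata = {"requester_metadata": {"team": normalize_team(" Eng ")}}
```

Raising on unknown names, rather than passing them through, surfaces a misspelled team at call time instead of as a stray row in the Cost Explorer.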

Tags & Filtering

Tags are arbitrary key-value pairs stored alongside each event. They drive filtering and grouping in the Cost Explorer — any tag key becomes a pivot axis.

Sending tags via the REST API

json
{
  "model": "claude-opus-4-6",
  "startTime": "2026-06-01T10:00:00Z",
  "prompt_tokens": 800,
  "response_cost": 0.0420,
  "request_tags": [
    { "team": "legal" },
    { "feature": "contract-review" },
    { "env": "production" },
    { "customer_tier": "enterprise" }
  ]
}
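
The request_tags field is a list of single-key objects rather than one flat object. If your application keeps tags in an ordinary dict, a small conversion like the one sketched below produces the shape shown above. The helper name to_request_tags is an illustration, not part of the API.

```python
def to_request_tags(tags: dict) -> list:
    """Convert {"team": "legal", ...} into [{"team": "legal"}, ...]."""
    return [{key: value} for key, value in tags.items()]


event = {
    "model": "claude-opus-4-6",
    "startTime": "2026-06-01T10:00:00Z",
    "prompt_tokens": 800,
    "response_cost": 0.0420,
    "request_tags": to_request_tags(
        {"team": "legal", "feature": "contract-review", "env": "production"}
    ),
}
```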

Sending tags via LiteLLM metadata

When using the LiteLLM SDK, nest your custom keys inside metadata.requester_metadata. This is LiteLLM's designated namespace for caller-supplied data — keys placed here are forwarded through the StandardLoggingPayload and picked up by Slashboard's ingest adapter as tags.

python
await litellm.acompletion(
    model="gpt-4o",
    messages=[...],
    metadata={
        "requester_metadata": {
            "team": "legal",
            "feature": "contract-review",
            "env": "production",
            "customer_tier": "enterprise",  # custom tag
            "experiment": "prompt-v3",      # A/B test breakdown
        },
    },
)

Cost Model

Slashboard stores the cost you send. If you send response_cost: 0, it stores zero — it does not silently recompute. This keeps your dashboard consistent with what your provider actually charges.

Sending cost from LiteLLM

LiteLLM automatically computes response_cost from its built-in pricing tables and includes it in the StandardLoggingPayload. No extra work needed.

Sending cost from your own code

If you compute cost yourself, include both input_cost and output_cost for the most accurate breakdown charts:

python
# GPT-4o pricing (June 2026)
INPUT_RATE  = 2.50 / 1_000_000   # $ per token
OUTPUT_RATE = 10.0 / 1_000_000   # $ per token

input_cost  = prompt_tokens     * INPUT_RATE
output_cost = completion_tokens * OUTPUT_RATE

payload = {
    "model": "gpt-4o",
    "input_cost": input_cost,
    "output_cost": output_cost,
    "response_cost": input_cost + output_cost,
    # ...plus the remaining event fields (startTime, prompt_tokens, and so on)
}
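
As a worked example of the arithmetic, the sketch below wraps the same rates in a function and evaluates it for one call. The rates are copied from the snippet above, and cost_fields is a hypothetical helper name.

```python
import math

# Rates copied from the snippet above (USD per token).
INPUT_RATE = 2.50 / 1_000_000
OUTPUT_RATE = 10.0 / 1_000_000


def cost_fields(prompt_tokens: int, completion_tokens: int) -> dict:
    """Return the three cost fields for one completion."""
    input_cost = prompt_tokens * INPUT_RATE
    output_cost = completion_tokens * OUTPUT_RATE
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "response_cost": input_cost + output_cost,
    }


# 1,000 prompt tokens     * $2.50 / 1M  = $0.0025
#   500 completion tokens * $10.00 / 1M = $0.0050
fields = cost_fields(1_000, 500)
assert math.isclose(fields["response_cost"], 0.0075)
```

The comparison uses math.isclose because the costs are floats; exact equality on fractional dollar amounts is unreliable.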

API — POST /api/v1/ingest

Ingest a single LLM completion event.

POST https://ep.slashllm.com/api/v1/ingest

Request body

Field | Type | Required | Description
model | string | yes | Model identifier, e.g. gpt-4o
startTime | ISO 8601 string | yes | When the request started
endTime | ISO 8601 string | no | When the response was received
prompt_tokens | integer | no | Number of prompt tokens
completion_tokens | integer | no | Number of completion tokens
total_tokens | integer | no | Total tokens (computed if omitted)
response_cost | float | no | Total cost in USD
input_cost | float | no | Cost of prompt tokens
output_cost | float | no | Cost of completion tokens
status | "success" or "failure" | no | Default: "success"
error_str | string | no | Error message for failed calls
metadata | object | no | Attribution: team_id, feature, env, etc.
request_tags | array of objects | no | Custom key-value tags
end_user | string | no | End-user identifier
cache_hit | boolean | no | Whether this was served from cache
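
A client-side check against this field list can catch malformed events before they are sent. The validator below is a sketch based only on the table above; the name validate_event is hypothetical, and the rule that token counts must be non-negative is an assumption, not documented server behavior.

```python
from datetime import datetime

REQUIRED_FIELDS = ("model", "startTime")
VALID_STATUSES = {"success", "failure"}


def validate_event(event: dict) -> list:
    """Return a list of problems; an empty list means the event looks well-formed."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not event.get(name)
    ]
    start = event.get("startTime")
    if start is not None:
        try:
            # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
            datetime.fromisoformat(str(start).replace("Z", "+00:00"))
        except ValueError:
            problems.append("startTime is not an ISO 8601 timestamp")
    if event.get("status", "success") not in VALID_STATUSES:
        problems.append('status must be "success" or "failure"')
    for name in ("prompt_tokens", "completion_tokens", "total_tokens"):
        value = event.get(name)
        if value is not None and (not isinstance(value, int) or value < 0):
            problems.append(f"{name} must be a non-negative integer")
    return problems
```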

Response

json
{ "status": "accepted", "id": "evt_a1b2c3d4…" }

API — POST /api/v1/ingest/litellm

Native LiteLLM webhook endpoint. Accepts the three wire formats emitted by LiteLLM's built-in GenericAPILogger.

POST https://ep.slashllm.com/api/v1/ingest/litellm

Accepted body formats

Format | Content-Type | Body
json_array (default) | application/json | Bare JSON array [ {...}, {...} ]
single | application/json | Single JSON object { ... }
ndjson | application/x-ndjson | One JSON object per line

ℹ️ Note
The endpoint auto-detects the format — if the body starts with [ it's treated as an array, otherwise single or ndjson. No Content-Type header configuration needed.
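
The detection rule can be sketched in a few lines. This is an illustration of the described behavior, not Slashboard's actual server code: the function name parse_litellm_body is hypothetical, and the way single objects are told apart from ndjson (try to parse the whole body first, then fall back to line-by-line) is an assumption.

```python
import json


def parse_litellm_body(body: str) -> list:
    """Return the events in a request body, whichever of the three formats it uses."""
    text = body.strip()
    if text.startswith("["):
        return json.loads(text)        # json_array
    try:
        return [json.loads(text)]      # single object
    except json.JSONDecodeError:
        # Assumed fallback: one JSON object per non-empty line (ndjson).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```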

Limits

Limit | Value
Max events per request | 100
Max body size | 1 MB
Rate limit | 1,000 req/min per org
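
If you post to this endpoint from your own code rather than through LiteLLM, large backlogs need to be split to stay within these limits. The sketch below groups events into JSON-array bodies of at most 100 events and roughly 1 MB each. The helper name split_for_limits is hypothetical, and reading "1 MB" as 1,000,000 bytes is an assumption.

```python
import json

MAX_EVENTS = 100
MAX_BODY_BYTES = 1_000_000   # "1 MB" read as 10**6 bytes; an assumption


def split_for_limits(events: list) -> list:
    """Group events into JSON-array bodies that respect both limits."""
    bodies, current, current_size = [], [], 2   # 2 bytes for "[" and "]"
    for event in events:
        # Serialized size plus 2 bytes for the ", " separator (slightly conservative).
        size = len(json.dumps(event).encode("utf-8")) + 2
        too_many = len(current) >= MAX_EVENTS
        too_big = current_size + size > MAX_BODY_BYTES
        if current and (too_many or too_big):
            bodies.append(json.dumps(current))
            current, current_size = [], 2
        current.append(event)
        current_size += size
    if current:
        bodies.append(json.dumps(current))
    return bodies
```

A single event larger than the size limit still ends up alone in its own body; the sketch does not try to shrink or reject it.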

API — POST /api/v1/ingest/batch

Send up to 100 events in a single HTTP request. Each event is queued independently. Useful for replaying historical data or bulk ingestion.

POST https://ep.slashllm.com/api/v1/ingest/batch

Request body

json
{
  "events": [
    {
      "model": "gpt-4o",
      "startTime": "2026-06-01T10:00:00Z",
      "prompt_tokens": 120,
      "response_cost": 0.0024,
      "metadata": { "team_id": "engineering" }
    },
    ...
  ]
}

Response

json
{ "status": "accepted", "count": 2, "ids": ["evt_a1b2…", "evt_c3d4…"] }
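
When replaying historical data, it is worth confirming that the acknowledged count matches what was sent. The check below is a sketch that relies only on the response fields shown above. The name check_batch_response is hypothetical, and treating a count mismatch as an error is an assumption, since partial-acceptance behavior is not described here.

```python
def check_batch_response(sent_count: int, response: dict) -> list:
    """Return the event IDs if the whole batch was acknowledged, else raise."""
    if response.get("status") != "accepted":
        raise RuntimeError(f"batch not accepted: {response!r}")
    ids = response.get("ids", [])
    if response.get("count") != sent_count or len(ids) != sent_count:
        raise RuntimeError(
            f"sent {sent_count} events, server acknowledged {response.get('count')}"
        )
    return ids


ids = check_batch_response(
    2, {"status": "accepted", "count": 2, "ids": ["evt_1", "evt_2"]}
)
```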