Slashboard Docs

Slashboard is a real-time cost observability platform for LLM applications. It tracks every token, every dollar, and every team — so you know exactly what AI is costing you, and why.

⚡

2-minute setup

Two environment variables. No SDK to install.

🔍

Attribution built-in

Break spend down by team, feature, model, and environment.

🔒

PII-safe by default

Prompt content never leaves your infrastructure unless you opt in.

Quick Start

Get your first event into Slashboard in under 5 minutes.

1
Create an account
Sign up at app.slashllm.com — free, no credit card required.
2
Get your API key
Go to Settings → API Keys and create a key. Copy it — it is only shown once. Keys look like sb_live_xxxxxxxxxxxxxxxx.

3
Send your first event

Replace sb_live_your_key_here with your key, then run this curl command:

bash

curl -X POST https://ep.slashllm.com/api/v1/ingest \
  -H "Authorization: Bearer sb_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "startTime": "2026-06-01T10:00:00Z",
    "prompt_tokens": 120,
    "completion_tokens": 80,
    "response_cost": 0.0024,
    "metadata": { "team_id": "engineering" }
  }'

You'll get a 202 Accepted response with an event ID.

4
Open the dashboard
Visit app.slashllm.com. Your event should appear in the Cost Explorer within a second or two.

How It Works

Slashboard is a managed ingest pipeline. Your app sends one HTTP POST per completion; Slashboard handles normalization, attribution, cost computation, storage, and visualization.

Architecture

Your App ──→ /api/v1/ingest/litellm ──→ Redis Stream

│ StandardLoggingPayload │ event queue │

Redis Stream ──→ Worker ──→ PostgreSQL + TimescaleDB

│ translate │ persist │ aggregate │

PostgreSQL ──→ Dashboard API ──→ Your Browser

Ingestion is async — your app sees a 202 Accepted immediately, with no latency added to your LLM request path. Events are processed by a background worker and typically visible in the dashboard within 1–2 seconds.
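If you call the REST endpoint from your own code rather than through LiteLLM, the POST is still a network round-trip when made inline. One way to keep it off your request path is a small background worker. The sketch below uses only the standard library; `BackgroundSender` and the `send` callable are hypothetical names, not part of any Slashboard SDK.

```python
import queue
import threading
from typing import Callable


class BackgroundSender:
    """Hypothetical helper: hands events to a worker thread so callers never wait on HTTP."""

    def __init__(self, send: Callable[[dict], None], max_pending: int = 1000):
        self._send = send  # e.g. a function that POSTs one event to /api/v1/ingest
        self._queue: queue.Queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event: dict) -> bool:
        """Queue an event without blocking; returns False if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception:
                pass  # cost tracking should never crash the app; add logging as needed
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued event has been passed to send()."""
        self._queue.join()
```

Call `flush()` before the process exits, since the worker is a daemon thread and pending events would otherwise be dropped.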

Endpoint	Use case	Latency added
`/api/v1/ingest`	Single event (REST)	~0ms (async)
`/api/v1/ingest/litellm`	LiteLLM native webhook (array, ndjson, single)	~0ms (async)
`/api/v1/ingest/batch`	Up to 100 events in one POST	~0ms (async)

LiteLLM Integration

Recommended

The native LiteLLM webhook is the simplest way to integrate: two environment variables, no custom code, and no files to download. LiteLLM's built-in GenericAPILogger ships events directly to Slashboard in the StandardLoggingPayload format.

✅ Tip

This integration works with any LiteLLM version ≥ 1.0 and supports all 100+ LiteLLM providers out of the box — OpenAI, Anthropic, Google, Groq, NVIDIA, Azure, Bedrock, and more.

Option A — Environment variables (proxy or SDK)

Set these two variables in your shell or .env file. LiteLLM reads them automatically when success_callback includes "generic_api".

bash

# Add to your .env or shell profile
export GENERIC_LOGGER_ENDPOINT=https://ep.slashllm.com/api/v1/ingest/litellm
export GENERIC_LOGGER_HEADERS="Authorization=Bearer sb_live_your_key_here"

Option B — Python SDK (recommended for applications)

Instantiate GenericAPILogger directly inside your async context. This is required because the logger uses asyncio.create_task for its batch-flush routine, so it must run inside a live event loop.

python

import asyncio
import litellm
from litellm.integrations.generic_api.generic_api_callback import GenericAPILogger

async def main():
    # Wire Slashboard — no custom code, just LiteLLM's native logger
    logger = GenericAPILogger(
        endpoint="https://ep.slashllm.com/api/v1/ingest/litellm",
        headers={"Authorization": "Bearer sb_live_your_key_here"},
    )
    litellm.callbacks = [logger]

    # Use acompletion — required for the async GenericAPILogger
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this ticket."}],
        metadata={
            "requester_metadata": {
                "team": "support",
                "feature": "ticket-summarizer",
                "env": "production",
            },
        },
    )

    # Flush before the event loop closes
    await asyncio.sleep(1.0)
    await logger.async_send_batch()

asyncio.run(main())

Option C — Python SDK with config file

Want to keep credentials out of code? Load them from a YAML or JSON config file and pass the values to GenericAPILogger. This gives you the same "externalize your config" pattern that the proxy YAML provides, without needing the proxy.

Create a slashboard.yaml (or JSON) alongside your app:

yaml

# slashboard.yaml
slashboard:
  endpoint: https://ep.slashllm.com/api/v1/ingest/litellm
  api_key: sb_live_your_key_here
  log_format: json_array

Then load it at startup:

python

import asyncio
import yaml
from litellm.integrations.generic_api.generic_api_callback import GenericAPILogger
import litellm

def load_slashboard_logger(config_path: str = "slashboard.yaml") -> GenericAPILogger:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)["slashboard"]
    return GenericAPILogger(
        endpoint=cfg["endpoint"],
        headers={"Authorization": f"Bearer {cfg['api_key']}"},
        log_format=cfg.get("log_format", "json_array"),
    )

async def main():
    logger = load_slashboard_logger()   # reads slashboard.yaml
    litellm.callbacks = [logger]

    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize this ticket."}],
        metadata={
            "requester_metadata": {
                "team": "support",
                "feature": "ticket-summarizer",
            },
        },
    )

    await asyncio.sleep(1.0)
    await logger.async_send_batch()

asyncio.run(main())

⚠️ Warning

LiteLLM's callback_settings dict (the Python equivalent of proxy YAML) does not auto-resolve for SDK acompletion() calls — the resolver only runs at proxy startup, not during SDK completions. Always instantiate GenericAPILogger directly as shown above.

Option D — LiteLLM Proxy (YAML config)

If you run the LiteLLM proxy server, add Slashboard to your litellm_config.yaml:

yaml

litellm_settings:
  callbacks: ["slashboard"]

callback_settings:
  slashboard:
    callback_type: generic_api
    endpoint: https://ep.slashllm.com/api/v1/ingest/litellm
    headers:
      Authorization: Bearer sb_live_your_key_here
    log_format: json_array
    event_types: ["llm_api_success", "llm_api_failure"]

Wire formats supported

The /api/v1/ingest/litellm endpoint auto-detects the body format LiteLLM uses — you don't configure anything.

Format	Body shape	When LiteLLM sends it
`json_array`	Bare JSON array [ {...}, {...} ]	Default — batch of events
`single`	Single JSON object { ... }	One POST per event
`ndjson`	Newline-delimited {...}\n{...}	Streaming format
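To make the three body shapes concrete, here is a minimal sketch that serializes the same events each way. `encode_events` is a hypothetical helper written for illustration; LiteLLM produces these bodies itself, so you would only need something like this when testing the endpoint by hand.

```python
import json


def encode_events(events: list[dict], log_format: str = "json_array") -> str:
    """Serialize events into one of the three body shapes in the table above."""
    if log_format == "json_array":
        return json.dumps(events)  # [ {...}, {...} ]
    if log_format == "ndjson":
        return "\n".join(json.dumps(event) for event in events)  # one object per line
    if log_format == "single":
        if len(events) != 1:
            raise ValueError("single format carries exactly one event per POST")
        return json.dumps(events[0])  # { ... }
    raise ValueError(f"unknown log_format: {log_format}")
```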

Direct REST API

If you're not using LiteLLM, send events directly via HTTP POST. Any language, any framework — if it can make an HTTP request, it works.

Python

python

import requests
from datetime import datetime, timezone

def track_llm_call(model, prompt_tokens, completion_tokens, cost, team, feature):
    requests.post(
        "https://ep.slashllm.com/api/v1/ingest",
        headers={"Authorization": "Bearer sb_live_your_key_here"},
        json={
            "model": model,
            "startTime": datetime.now(timezone.utc).isoformat(),
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "response_cost": cost,
            "status": "success",
            "metadata": {"team_id": team, "feature": feature},
        },
        timeout=3,
    )

TypeScript / Node.js

typescript

async function trackLLMCall(params: {
  model: string;
  promptTokens: number;
  completionTokens: number;
  cost: number;
  team: string;
  feature: string;
}) {
  await fetch("https://ep.slashllm.com/api/v1/ingest", {
    method: "POST",
    headers: {
      Authorization: "Bearer sb_live_your_key_here",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: params.model,
      startTime: new Date().toISOString(),
      prompt_tokens: params.promptTokens,
      completion_tokens: params.completionTokens,
      response_cost: params.cost,
      status: "success",
      metadata: { team_id: params.team, feature: params.feature },
    }),
  });
}

Go

go

package slashboard

import (
	"bytes"
	"encoding/json"
	"net/http"
	"time"
)

// TrackLLMCall posts one completion event to the Slashboard ingest endpoint.
func TrackLLMCall(model, team, feature string, promptTokens, completionTokens int, cost float64) error {
	payload, err := json.Marshal(map[string]any{
		"model":             model,
		"startTime":         time.Now().UTC().Format(time.RFC3339),
		"prompt_tokens":     promptTokens,
		"completion_tokens": completionTokens,
		"response_cost":     cost,
		"status":            "success",
		"metadata":          map[string]string{"team_id": team, "feature": feature},
	})
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, "https://ep.slashllm.com/api/v1/ingest", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer sb_live_your_key_here")
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

Python Callback (Advanced)

For maximum control — custom filtering, local buffering, or enrichment before sending — use the SlashboardLogger custom callback. Drop slashboard_callback.py into your project and wire it as a LiteLLM callback.

ℹ️ Note

The LiteLLM native webhook (above) is simpler for most teams. Use this approach when you need to pre-process events, filter out certain calls, or add custom enrichment before data reaches Slashboard.

1. Download the callback file

bash

curl -O https://raw.githubusercontent.com/slashboard-io/slashboard/main/demos/litellm-test-app/slashboard_callback.py

2. Wire it up

python

import litellm
from slashboard_callback import SlashboardLogger
import os

os.environ["SLASHBOARD_API_KEY"] = "sb_live_your_key_here"
os.environ["SLASHBOARD_API_URL"] = "https://ep.slashllm.com"

litellm.callbacks = [SlashboardLogger()]

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={
        "requester_metadata": {
            "team": "engineering",
            "feature": "chat",
            "env": "production",
        },
    },
)

Attribution & Teams

Attribution tells Slashboard who is spending what. Every event can carry team, feature, environment, and user dimensions — these power the Cost Explorer breakdowns and budget alerts.

Dimensions

Field	Type	Where to set it	Example
`team_id`	string	metadata.requester_metadata.team_id (SDK) or metadata.user_api_key_team_id (proxy)	engineering
`team`	string	metadata.requester_metadata.team (LiteLLM SDK)	engineering
`feature`	string	metadata.requester_metadata.feature	code-review
`env`	string	metadata.requester_metadata.env	production
`end_user`	string	top-level end_user field	user_abc123

Best practices

Standardize your team_id values across all services before ingesting data. The Cost Explorer groups by exact string match — inconsistent names (e.g. "eng" vs "engineering") result in split rows.

python

# Good: consistent team IDs across your org
TEAMS = {"engineering", "data-science", "marketing", "support", "finance"}

# Pass in every completion call (LiteLLM SDK path)
metadata = {
    "requester_metadata": {
        "team": "engineering",     # team attribution
        "feature": "code-review",  # feature/product area
        "env": "production",       # environment
    }
}
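Because grouping is by exact string match, it can help to normalize team names in one place before they reach any completion call. The following is a sketch under assumptions: the alias table and the names `normalize_team` and `attribution` are hypothetical, and you would fill in the variants your own services actually emit.

```python
CANONICAL_TEAMS = {"engineering", "data-science", "marketing", "support", "finance"}

# Hypothetical alias table mapping known variants to canonical IDs.
TEAM_ALIASES = {"eng": "engineering", "ds": "data-science", "data science": "data-science"}


def normalize_team(raw: str) -> str:
    """Map a raw team name to its canonical ID, or fail loudly if it is unknown."""
    key = raw.strip().lower()
    team = TEAM_ALIASES.get(key, key)
    if team not in CANONICAL_TEAMS:
        raise ValueError(f"unknown team: {raw!r}")
    return team


def attribution(team: str, feature: str, env: str) -> dict:
    """Build the metadata dict for the LiteLLM SDK path with a normalized team."""
    return {
        "requester_metadata": {
            "team": normalize_team(team),
            "feature": feature,
            "env": env,
        }
    }
```

Raising on unknown names surfaces a typo at the call site instead of as a split row in the Cost Explorer.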

Tags & Filtering

Tags are arbitrary key-value pairs stored alongside each event. They drive filtering and grouping in the Cost Explorer — any tag key becomes a pivot axis.

Sending tags via the REST API

json

{
  "model": "claude-opus-4-6",
  "startTime": "2026-06-01T10:00:00Z",
  "prompt_tokens": 800,
  "response_cost": 0.0420,
  "request_tags": [
    { "team": "legal" },
    { "feature": "contract-review" },
    { "env": "production" },
    { "customer_tier": "enterprise" }
  ]
}
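The `request_tags` field in the example above is a list of single-key objects rather than one flat object. If your code keeps tags in an ordinary dict, a small conversion helper avoids building that list by hand. `to_request_tags` is a hypothetical name used for illustration.

```python
def to_request_tags(tags: dict[str, str]) -> list[dict[str, str]]:
    """Convert a flat dict into the list-of-single-key-objects shape shown above."""
    return [{key: value} for key, value in tags.items()]


tags = to_request_tags({"team": "legal", "feature": "contract-review"})
# tags == [{"team": "legal"}, {"feature": "contract-review"}]
```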

Sending tags via LiteLLM metadata

When using the LiteLLM SDK, nest your custom keys inside metadata.requester_metadata. This is LiteLLM's designated namespace for caller-supplied data — keys placed here are forwarded through the StandardLoggingPayload and picked up by Slashboard's ingest adapter as tags.

python

await litellm.acompletion(
    model="gpt-4o",
    messages=[...],
    metadata={
        "requester_metadata": {
            "team": "legal",
            "feature": "contract-review",
            "env": "production",
            "customer_tier": "enterprise",  # custom tag
            "experiment": "prompt-v3",      # A/B test breakdown
        },
    },
)

Cost Model

Slashboard stores the cost you send. If you send response_cost: 0, it stores zero — it does not silently recompute. This keeps your dashboard consistent with what your provider actually charges.

Sending cost from LiteLLM

LiteLLM automatically computes response_cost from its built-in pricing tables and includes it in the StandardLoggingPayload. No extra work needed.

Sending cost from your own code

If you compute cost yourself, include both input_cost and output_cost for the most accurate breakdown charts:

python

# GPT-4o pricing (June 2026)
INPUT_RATE  = 2.50 / 1_000_000   # $ per token
OUTPUT_RATE = 10.0 / 1_000_000   # $ per token

# Token counts come from your provider's usage data; example values shown
prompt_tokens, completion_tokens = 120, 80

input_cost  = prompt_tokens     * INPUT_RATE
output_cost = completion_tokens * OUTPUT_RATE

payload = {
    "model": "gpt-4o",
    "input_cost": input_cost,
    "output_cost": output_cost,
    "response_cost": input_cost + output_cost,
    # ... remaining event fields
}
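Since Slashboard stores exactly the number you send, small floating-point drift in your own arithmetic ends up in the dashboard. One option is to compute with `decimal.Decimal` and convert to float only when building the payload. This is a sketch, not a required approach; `compute_costs` is a hypothetical helper and the default rates simply repeat the per-million prices used above.

```python
from decimal import Decimal


def compute_costs(
    prompt_tokens: int,
    completion_tokens: int,
    input_per_million: str = "2.50",
    output_per_million: str = "10.00",
) -> dict:
    """Return input, output, and total cost in USD using exact decimal arithmetic."""
    million = Decimal(1_000_000)
    input_cost = Decimal(prompt_tokens) * Decimal(input_per_million) / million
    output_cost = Decimal(completion_tokens) * Decimal(output_per_million) / million
    return {
        "input_cost": float(input_cost),
        "output_cost": float(output_cost),
        "response_cost": float(input_cost + output_cost),
    }


costs = compute_costs(120, 80)
# 120 * $2.50/1M = $0.0003 and 80 * $10.00/1M = $0.0008, so the total is $0.0011
```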

API — POST /api/v1/ingest

Ingest a single LLM completion event.

POST https://ep.slashllm.com/api/v1/ingest

Request body

Field	Type	Required	Description
`model`	string	✅	Model identifier, e.g. gpt-4o
`startTime`	ISO 8601 string	✅	When the request started
`endTime`	ISO 8601 string	—	When the response was received
`prompt_tokens`	integer	—	Number of prompt tokens
`completion_tokens`	integer	—	Number of completion tokens
`total_tokens`	integer	—	Total tokens (computed if omitted)
`response_cost`	float	—	Total cost in USD
`input_cost`	float	—	Cost of prompt tokens
`output_cost`	float	—	Cost of completion tokens
`status`	"success" | "failure"	—	Default: "success"
`error_str`	string	—	Error message for failed calls
`metadata`	object	—	Attribution: team_id, feature, env, etc.
`request_tags`	array of objects	—	Custom key-value tags
`end_user`	string	—	End-user identifier
`cache_hit`	boolean	—	Whether this was served from cache
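A client-side builder can catch a misspelled field name (for example `promptTokens` instead of `prompt_tokens`) before the request is sent. The sketch below checks against the table above; `build_event` is a hypothetical helper, and the server's own validation rules may differ from these checks.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Optional fields listed in the request-body table.
OPTIONAL_FIELDS = {
    "endTime", "prompt_tokens", "completion_tokens", "total_tokens",
    "response_cost", "input_cost", "output_cost", "status", "error_str",
    "metadata", "request_tags", "end_user", "cache_hit",
}


def build_event(model: str, start_time: Optional[datetime] = None, **fields: Any) -> dict:
    """Assemble an ingest payload, rejecting unknown field names and bad status values."""
    if not model:
        raise ValueError("model is required")
    unknown = set(fields) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if fields.get("status", "success") not in ("success", "failure"):
        raise ValueError("status must be 'success' or 'failure'")
    start = start_time or datetime.now(timezone.utc)
    if start.tzinfo is None:
        raise ValueError("start_time must be timezone-aware")
    return {"model": model, "startTime": start.isoformat(), **fields}
```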

Response

json

{ "status": "accepted", "id": "evt_a1b2c3d4…" }

API — POST /api/v1/ingest/litellm

Native LiteLLM webhook endpoint. Accepts the three wire formats emitted by LiteLLM's built-in GenericAPILogger.

POST https://ep.slashllm.com/api/v1/ingest/litellm

Accepted body formats

Format	Content-Type	Body
json_array (default)	application/json	Bare JSON array [ {...}, {...} ]
single	application/json	Single JSON object { ... }
ndjson	application/x-ndjson	One JSON object per line

ℹ️ Note

The endpoint auto-detects the format — if the body starts with [ it's treated as an array, otherwise single or ndjson. No Content-Type header configuration needed.
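The detection rule described in the note can be pictured roughly as follows. This is an illustrative guess at the logic based only on that description, not Slashboard's actual server code; `parse_body` is a hypothetical name.

```python
import json


def parse_body(body: str) -> list[dict]:
    """Return the events in a request body, whichever of the three formats it uses."""
    text = body.strip()
    if text.startswith("["):
        return json.loads(text)  # json_array
    try:
        return [json.loads(text)]  # single object, possibly pretty-printed
    except json.JSONDecodeError:
        # More than one top-level object: treat each non-empty line as one event.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Trying the single-object parse first means a pretty-printed object spanning several lines is not mistaken for ndjson.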

Limits

Limit	Value
Max events per request	100
Max body size	1 MB
Rate limit	1,000 req/min per org
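If you post to this endpoint from your own code, you need to stay within both the event-count and body-size limits. Here is a minimal sketch that greedily packs events into JSON-array bodies. It assumes "1 MB" means 1,000,000 bytes, which the table does not specify, and `split_for_limits` is a hypothetical helper.

```python
import json

MAX_EVENTS = 100
MAX_BODY_BYTES = 1_000_000  # assumption: 1 MB read as 10**6 bytes


def split_for_limits(events: list[dict]) -> list[str]:
    """Pack events into JSON-array bodies that respect both limits."""
    bodies: list[str] = []
    current: list[str] = []
    size = 2  # the enclosing "[" and "]"
    for event in events:
        encoded = json.dumps(event, separators=(",", ":"))
        cost = len(encoded.encode("utf-8")) + 1  # +1 for the separating comma
        if cost + 2 > MAX_BODY_BYTES:
            raise ValueError("a single event exceeds the body size limit")
        if current and (len(current) >= MAX_EVENTS or size + cost > MAX_BODY_BYTES):
            bodies.append("[" + ",".join(current) + "]")
            current, size = [], 2
        current.append(encoded)
        size += cost
    if current:
        bodies.append("[" + ",".join(current) + "]")
    return bodies
```

The size estimate counts one comma per event, so it overshoots the true body size by a byte and errs on the safe side.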

API — POST /api/v1/ingest/batch

Send up to 100 events in a single HTTP request. Each event is queued independently. Useful for replaying historical data or bulk ingestion.

POST https://ep.slashllm.com/api/v1/ingest/batch

Request body

json

{
  "events": [
    {
      "model": "gpt-4o",
      "startTime": "2026-06-01T10:00:00Z",
      "prompt_tokens": 120,
      "response_cost": 0.0024,
      "metadata": { "team_id": "engineering" }
    },
    ...
  ]
}

Response

json

{ "status": "accepted", "count": 2, "ids": ["evt_a1b2…", "evt_c3d4…"] }
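When replaying historical data, the events have to be split into requests of at most 100, and it is worth confirming that each response acknowledges everything sent. The sketch below builds the request bodies and checks the `count` field from the response shape shown above. `batch_requests` and `check_batch_response` are hypothetical helpers, and the HTTP call itself is left to your client of choice.

```python
from itertools import islice
from typing import Iterable, Iterator

BATCH_LIMIT = 100  # "up to 100 events" per request


def batch_requests(events: Iterable[dict], size: int = BATCH_LIMIT) -> Iterator[dict]:
    """Yield request bodies of the form {"events": [...]} with at most `size` events each."""
    iterator = iter(events)
    while chunk := list(islice(iterator, size)):
        yield {"events": chunk}


def check_batch_response(request: dict, response: dict) -> list[str]:
    """Return the event IDs if the response acknowledges every event that was sent."""
    sent = len(request["events"])
    if response.get("status") != "accepted" or response.get("count") != sent:
        raise RuntimeError(f"sent {sent} events, response was {response!r}")
    return response["ids"]
```

Because `batch_requests` is a generator, it works on a streamed export without loading the whole history into memory.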

Ready to start?

Create a free account and get your API key in 30 seconds.
