LLM API Gateway

One API. Every model.

cargo install crabllm

Route requests to OpenAI, Anthropic, Gemini, Azure, Bedrock, or Ollama. Sub-millisecond overhead. Single binary. No runtime.

0.26ms P50 latency

Gateway overhead at 5,000 concurrent requests per second. Lower is better.

Gateway    P50       P99
CrabLLM    0.26ms    0.54ms
Bifrost    0.61ms    1.26ms
LiteLLM    159ms     227ms
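For reference, P50 and P99 are latency percentiles over per-request timings: half of requests finish under the P50, 99% under the P99. A minimal nearest-rank sketch over synthetic samples (the numbers below are illustrative, not the benchmark data):

```python
def percentile(samples, p):
    # Nearest-rank percentile: the sample at the p-th percent
    # position after sorting, clamped to valid indices.
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

# Synthetic per-request gateway latencies in milliseconds (illustrative only).
latencies_ms = [0.21, 0.24, 0.25, 0.26, 0.26, 0.27, 0.31, 0.54]
p50 = percentile(latencies_ms, 50)  # median latency
p99 = percentile(latencies_ms, 99)  # tail latency
```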

How it works

1. Configure

listen = "0.0.0.0:8080"

[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o"]

[providers.anthropic]
kind = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
models = ["claude-sonnet-4-20250514"]
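Local backends from the provider list (such as Ollama) are configured the same way. A hedged sketch, assuming the provider table also accepts a base_url key (that key is not shown above):

```toml
[providers.ollama]
kind = "ollama"
base_url = "http://localhost:11434"  # assumed key; Ollama's default port
models = ["llama3.1"]
```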

2. Run

crabllm --config crabllm.toml

3. Send requests

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514",
       "messages": [{"role": "user", "content": "Hello!"}]}'

Same OpenAI format, any provider. CrabLLM translates automatically.
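Because the endpoint is OpenAI-compatible, switching providers is just a matter of changing the model string. A minimal sketch of building the same payload for two providers (the chat_payload helper is ours for illustration, not part of CrabLLM):

```python
import json

def chat_payload(model: str, content: str) -> str:
    # One OpenAI-style request body; only the "model" field
    # selects which provider the gateway routes to.
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    })

openai_body = chat_payload("gpt-4o", "Hello!")
anthropic_body = chat_payload("claude-sonnet-4-20250514", "Hello!")
```

POST either body to http://localhost:8080/v1/chat/completions and the gateway picks the provider from the model name.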

Frequently asked questions

How fast is it?

0.26ms P50 at 5,000 RPS. Rust with Tokio: no GC pauses, no interpreter. The gateway is not the bottleneck.