CrabTalk

Introduction

High-performance LLM API gateway in Rust — one API format, many providers, sub-millisecond overhead.

CrabLLM is a high-performance LLM API gateway written in Rust. It sits between your application and LLM providers, exposing an OpenAI-compatible API surface.


What it does

You send requests in OpenAI format to CrabLLM. It routes them to the configured provider — OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, or Ollama — translating the request and response as needed.
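For example, a request body in the standard OpenAI chat-completions shape works unchanged whichever provider is configured behind the gateway (the model name here is illustrative, not a fixed alias):

```json
{
  "model": "claude-sonnet",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "stream": true
}
```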

Your application talks to one endpoint. CrabLLM handles the rest:

  • Provider translation — Anthropic, Google, and Bedrock have their own API formats. CrabLLM translates requests and responses automatically.
  • Routing — Weighted random selection across multiple providers for the same model. Automatic fallback when a provider fails.
  • Streaming — SSE streaming proxied without buffering.
  • Auth — Virtual API keys with per-key model access control.
  • Extensions — Rate limiting, caching, cost tracking, budget enforcement.
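The routing behavior above — weighted random selection with fallback on failure — can be sketched roughly like this. This is a simplified illustration, not CrabLLM's actual implementation; all names are hypothetical:

```python
import random

def pick_provider(providers, weights, rng=random):
    """Weighted random choice among providers serving the same model."""
    return rng.choices(providers, weights=weights, k=1)[0]

def route_with_fallback(providers, weights, send):
    """Try the weighted pick first; fall back to the remaining providers."""
    ordered = [pick_provider(providers, weights)]
    ordered += [p for p in providers if p not in ordered]
    for provider in ordered:
        try:
            return send(provider)
        except ConnectionError:
            continue  # provider failed; try the next one
    raise RuntimeError("all providers failed")

# Example: the "openai" provider is down, so requests fall back.
def send(provider):
    if provider == "openai":
        raise ConnectionError("upstream 503")
    return f"handled by {provider}"

result = route_with_fallback(["openai", "anthropic"], [9, 1], send)
```

Whichever provider the weighted pick selects, the request ends up handled by the surviving provider, which is the point of the fallback pass.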

Why Rust

  • Sub-millisecond overhead — no GC pauses, no interpreter startup.
  • Memory safety — without runtime cost.
  • Concurrency — Tokio async runtime handles thousands of concurrent streaming connections efficiently.
  • Deployment — single static binary. No interpreter, no virtualenv, no Docker required.

Feature comparison

Feature                    LiteLLM     CrabLLM
/chat/completions          yes         yes
/embeddings                yes         yes
/models                    yes         yes
OpenAI provider            yes         yes
Anthropic provider         yes         yes
Google Gemini provider     yes         yes
Azure OpenAI provider      yes         yes
AWS Bedrock provider       yes         yes
Tool/function calling      yes         yes
SSE streaming              yes         yes
Virtual keys + auth        yes         yes
Weighted routing           yes         yes
Model aliasing             yes         yes
Retry + fallback           yes         yes
Rate limiting (RPM/TPM)    yes         yes
Cost/usage tracking        yes         yes
Budget enforcement         yes         yes
Request caching            yes         yes
Image/audio endpoints      yes         yes
Storage (memory)           yes         yes
Storage (persistent)       Postgres    SQLite
Redis storage              yes         yes
