Routing
Weighted random selection, exponential backoff retry, automatic provider fallback, and timeouts.
CrabLLM decides which provider handles a request based on model name, routing weights, and fallback logic.
Model resolution
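When a request arrives, CrabLLM first checks whether the requested model name is an alias. If it is, the name is replaced by its canonical name in a single-hop lookup, so an alias that points at another alias is not followed further. CrabLLM then looks up the resulting name among the models of the configured providers.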
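As an illustration only, here is a minimal Python sketch of the two lookup steps. The alias table, the `Provider` class, and the function names are hypothetical and not CrabLLM's API. The sketch mainly shows what "single-hop" means: the resolved name is never looked up a second time.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    # Hypothetical stand-in for a [providers.*] table.
    name: str
    models: list[str] = field(default_factory=list)
    weight: int = 1


def resolve_model(requested: str, aliases: dict[str, str]) -> str:
    # Single-hop: look the name up once. The result is not looked up again,
    # so an alias that points at another alias is not followed further.
    return aliases.get(requested, requested)


def providers_for(model: str, providers: list[Provider]) -> list[Provider]:
    # Keep every provider whose models list contains the canonical name.
    return [p for p in providers if model in p.models]


aliases = {"gpt4": "gpt-4o"}  # hypothetical alias table
providers = [
    Provider("primary", ["gpt-4o"], weight=3),
    Provider("secondary", ["gpt-4o"], weight=1),
]
model = resolve_model("gpt4", aliases)
print(model, [p.name for p in providers_for(model, providers)])
# gpt-4o ['primary', 'secondary']
```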
Weighted selection
When multiple providers serve the same model, one is chosen by weighted random selection. Each provider's share of traffic is its weight divided by the sum of all weights, so higher weights mean more traffic:
```toml
[providers.primary]
kind = "openai"
api_key = "${OPENAI_KEY_1}"
models = ["gpt-4o"]
weight = 3 # 75% of traffic

[providers.secondary]
kind = "openai"
api_key = "${OPENAI_KEY_2}"
models = ["gpt-4o"]
weight = 1 # 25% of traffic
```

Selection is stateless: there are no shared counters, and each request picks a provider independently.
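A minimal sketch of stateless weighted selection, assuming providers are given as hypothetical `(name, weight)` pairs with non-negative integer weights. Each call draws one integer in `[0, total)` and walks the cumulative weights; this is a standard way to implement weighted random choice, not necessarily CrabLLM's exact code.

```python
import random


def pick_weighted(providers: list[tuple[str, int]], rng: random.Random) -> str:
    """Pick one provider name with probability weight / total weight.

    Stateless: nothing is remembered between calls, so every request
    draws independently.
    """
    total = sum(weight for _, weight in providers)
    if total <= 0:
        raise ValueError("at least one provider needs a positive weight")
    point = rng.randrange(total)  # uniform integer in [0, total)
    for name, weight in providers:
        if point < weight:
            return name
        point -= weight
    raise AssertionError("unreachable: point is always below total")


rng = random.Random()
print(pick_weighted([("primary", 3), ("secondary", 1)], rng))
```

With weights 3 and 1, draws 0 to 2 land on `primary` and draw 3 lands on `secondary`, which gives the 75%/25% split. Passing the random generator in keeps the function easy to test; the standard library's `random.choices(names, weights=weights)[0]` does the same job in one call.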
Retry
When a provider returns a transient error (HTTP 429, 500, 502, 503, 504), CrabLLM retries the same provider with exponential backoff:
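- Base delay: 100 ms, doubling with each retry (100 ms, 200 ms, 400 ms, ...).
- Jitter: each sleep is a random duration in [backoff/2, backoff] rather than exactly backoff, which keeps many clients from retrying in lockstep (the thundering herd problem).
- Max retries: configurable per provider via max_retries (default 2).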
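```toml
[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o"]
max_retries = 3 # retry up to 3 times (4 attempts in total)
```

Set max_retries = 0 to disable retry entirely. In general, a provider is called at most max_retries + 1 times per request.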
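The retry policy can be sketched in Python as follows. This is an illustration, not CrabLLM's implementation: `call_with_retry` and `backoff_delay` are hypothetical names, the wrapped call is assumed to return an HTTP status code, and the sleep function is injectable so the sketch can run without actually waiting.

```python
import random
import time
from typing import Callable, Optional

TRANSIENT_STATUSES = {429, 500, 502, 503, 504}
BASE_DELAY_S = 0.1  # 100 ms


def backoff_delay(retry: int, rng: random.Random) -> float:
    """Sleep before retry number `retry` (0-based): backoff doubles from 100 ms."""
    backoff = BASE_DELAY_S * (2 ** retry)
    # Jitter: pick a random duration in [backoff/2, backoff].
    return rng.uniform(backoff / 2, backoff)


def call_with_retry(
    call: Callable[[], int],
    max_retries: int = 2,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `call` (which returns an HTTP status) at most max_retries + 1 times."""
    if rng is None:
        rng = random.Random()
    status = call()
    for retry in range(max_retries):
        if status not in TRANSIENT_STATUSES:
            break  # success or a non-retryable error: stop retrying
        sleep(backoff_delay(retry, rng))
        status = call()
    return status  # may still be transient; fallback takes over from here


responses = iter([503, 503, 200])
print(call_with_retry(lambda: next(responses), sleep=lambda s: print(f"sleep {s:.3f}s")))
```

With the default max_retries = 2, a provider that keeps returning 503 is called three times, with sleeps of roughly 50 to 100 ms and then 100 to 200 ms in between.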
Fallback
When retries are exhausted on a provider, CrabLLM moves on to the next provider serving the same model, in descending order of weight. This continues until a provider succeeds or all providers have been tried.
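A minimal sketch of the fallback loop, under two assumptions this page does not spell out: the provider picked by weighted selection is tried first and the rest follow in descending weight, and a non-transient response (a success or, say, a 400) is returned as-is rather than triggering fallback. Per-provider retries are represented by a hypothetical `call_with_retry(name)` that returns the final HTTP status for that provider.

```python
from typing import Callable, Optional

TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


def fallback_order(weights: dict[str, int], first: str) -> list[str]:
    # Assumption: the weighted pick goes first; the remaining providers
    # follow in descending weight (ties keep their configuration order).
    rest = [name for name in weights if name != first]
    rest.sort(key=lambda name: weights[name], reverse=True)
    return [first] + rest


def route(
    order: list[str], call_with_retry: Callable[[str], int]
) -> tuple[Optional[str], int]:
    """Try providers in order until one returns a non-transient status."""
    if not order:
        raise ValueError("no provider serves this model")
    status = 0
    for name in order:
        status = call_with_retry(name)  # retries already exhausted inside
        if status not in TRANSIENT_STATUSES:
            return name, status
    return None, status  # every provider failed; report the last status


weights = {"openai": 2, "azure": 1}  # weights from the example config
outcomes = {"openai": 503, "azure": 200}  # pretend openai is down
print(route(fallback_order(weights, "openai"), lambda name: outcomes[name]))
# ('azure', 200)
```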
```toml
# Primary provider (tried first)
[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o"]
weight = 2

# Fallback provider (tried if primary fails)
[providers.azure]
kind = "azure"
api_key = "${AZURE_KEY}"
base_url = "https://my-resource.openai.azure.com"
api_version = "2024-02-01"
models = ["gpt-4o"]
weight = 1
```

Timeouts
Each provider call is wrapped in a timeout. If the timeout expires, the request is treated as a transient error (triggers retry/fallback):
```toml
[providers.openai]
kind = "openai"
api_key = "${OPENAI_API_KEY}"
models = ["gpt-4o"]
timeout = 60 # seconds (default: 30)
```

If every provider times out, the client receives HTTP 504 Gateway Timeout.
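A minimal asyncio sketch of the timeout wrapper. Reporting an expired timeout as status 504 is an assumption made for illustration: because 504 is one of the transient statuses, retry and fallback logic treat a timeout like any other transient error, and if every provider times out the final status is 504. `call_with_timeout` is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Callable

GATEWAY_TIMEOUT = 504


async def call_with_timeout(
    call: Callable[[], Awaitable[int]], timeout_s: float = 30.0
) -> int:
    """Run one provider call; report an expired timeout as HTTP 504."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # 504 is in the transient set, so retry and fallback handle it like
        # any other transient error.
        return GATEWAY_TIMEOUT


async def slow_provider() -> int:
    await asyncio.sleep(2)  # simulate a provider that does not answer in time
    return 200


print(asyncio.run(call_with_timeout(slow_provider, timeout_s=0.1)))  # 504
```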
Streaming behavior
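For streaming requests, retry and fallback apply only to errors that occur before the stream starts, such as connection errors. Once the first SSE chunk has been sent to the client, the request is committed to that provider: a failure after that point is not retried and does not fall back to another provider.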
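The commit point can be sketched with Python generators. `ConnectError`, `open_stream`, and `stream_with_fallback` are hypothetical; the sketch assumes `open_stream(name)` returns an iterator of SSE chunks that raises `ConnectError` if the provider fails before producing anything, and it omits per-provider retries for brevity. Pulling the first chunk is what commits the request: errors raised before it move on to the next provider, and errors raised after it propagate to the client.

```python
from typing import Callable, Iterator, Optional


class ConnectError(Exception):
    """Hypothetical: the provider failed before sending any chunk."""


def stream_with_fallback(
    order: list[str], open_stream: Callable[[str], Iterator[str]]
) -> Iterator[str]:
    last_error: Optional[Exception] = None
    for name in order:
        try:
            stream = open_stream(name)
            first = next(stream, None)  # pulling the first chunk commits
        except ConnectError as err:
            last_error = err
            continue  # nothing sent to the client yet: try the next provider
        if first is None:
            return  # the provider finished without sending anything
        # Committed: chunks now reach the client, and any later error
        # propagates instead of triggering retry or fallback.
        yield first
        yield from stream
        return
    raise ConnectError("every provider failed before streaming") from last_error


def open_stream(name: str) -> Iterator[str]:
    if name == "openai":
        raise ConnectError("connection refused")  # fails before streaming
    return iter(["data: Hel", "data: lo"])


print(list(stream_with_fallback(["openai", "azure"], open_stream)))
# ['data: Hel', 'data: lo']
```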