Architecture
Design principles, workspace layout, and request flow through the CrabLLM gateway.
Principles
- Simplicity over abstraction. No trait where a function suffices.
- Single responsibility. Each crate has one focused job.
- OpenAI as canonical format. Providers translate to/from it.
- Streaming first-class. Never buffer a full response when streaming.
- Configuration-driven. Provider setup and routing from config, not code.
- Minimal gateway latency. Avoid hot-path allocations.
Workspace layout
crabllm/
  crates/
    crabllm/   — binary, wires everything together
    core/      — shared types, config, errors
    provider/  — provider enum + translation modules
    proxy/     — HTTP server, routing, extensions
    bench/     — benchmark mock backend
Crates
crabllm
Binary entry point. Loads the TOML config, builds the provider registry, initializes the storage backend and extensions, and starts the Axum HTTP server. CLI args: --config and --bind.
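A config loaded this way might look like the following sketch. All section and key names here are illustrative assumptions, not taken from the crate; only the env var interpolation and the model-to-weighted-deployment mapping are described in this document.

```toml
# Hypothetical config sketch — key names are assumptions.
[server]
bind = "0.0.0.0:8080"

[providers.openai]
api_key = "${OPENAI_API_KEY}"   # env var interpolation

# A model name maps to a weighted list of deployments.
[[models."gpt-4o".deployments]]
provider = "openai"
weight = 1
```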
core
Shared types with no business logic. Contains:
- Config — GatewayConfig with env var interpolation.
- Types — OpenAI-compatible wire-format structs (request, response, chunk).
- Error — error enum with transient detection for retry logic.
- Storage — async KV trait with memory, SQLite, and Redis backends.
- Extension — hook trait for the request pipeline.
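The "transient detection" on the error enum is what lets the proxy decide whether to retry. A minimal sketch of the idea, with variant and method names invented for illustration (the real enum in core differs):

```rust
// Hypothetical sketch — GatewayError and is_transient are illustrative
// names, not the actual identifiers in the core crate.
#[derive(Debug)]
enum GatewayError {
    RateLimited { retry_after_ms: Option<u64> },
    Upstream { status: u16 },
    InvalidRequest(String),
    Timeout,
}

impl GatewayError {
    /// Transient errors are safe to retry on another deployment;
    /// permanent ones (e.g. a malformed request) are not.
    fn is_transient(&self) -> bool {
        match self {
            GatewayError::RateLimited { .. } | GatewayError::Timeout => true,
            GatewayError::Upstream { status } => *status >= 500,
            GatewayError::InvalidRequest(_) => false,
        }
    }
}

fn main() {
    assert!(GatewayError::Timeout.is_transient());
    assert!(GatewayError::Upstream { status: 503 }.is_transient());
    assert!(!GatewayError::InvalidRequest("bad model".into()).is_transient());
}
```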
provider
Provider dispatch. The Provider enum has variants for each supported provider. Each variant dispatches to a per-provider module that handles request/response translation. ProviderRegistry maps model names to weighted deployment lists.
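Enum dispatch here means a plain match, not a trait object — consistent with the "no trait where a function suffices" principle. A sketch under assumed names (the variants and method below are illustrative, not the crate's actual API):

```rust
// Hypothetical sketch of enum-based provider dispatch; variant and
// method names are assumptions for illustration.
enum Provider {
    OpenAi,
    Anthropic,
}

impl Provider {
    /// Each variant dispatches to its per-provider translation logic;
    /// here reduced to picking the upstream endpoint path.
    fn endpoint_path(&self) -> &'static str {
        match self {
            Provider::OpenAi => "/v1/chat/completions",
            Provider::Anthropic => "/v1/messages",
        }
    }
}

fn main() {
    assert_eq!(Provider::OpenAi.endpoint_path(), "/v1/chat/completions");
    assert_eq!(Provider::Anthropic.endpoint_path(), "/v1/messages");
}
```

An enum keeps the full provider set visible in one place and avoids dynamic dispatch on the hot path.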
proxy
Axum HTTP server. Route handlers implement retry + fallback across deployments. Auth middleware validates virtual keys. Five built-in extensions run as in-handler hooks.
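The retry + fallback shape can be sketched as a loop over the deployment list: retry the same deployment on transient errors, fall back to the next on permanent ones. Everything below (Deployment, call_with_fallback, the Err(bool) convention) is a simplified assumption, not the proxy crate's real signature:

```rust
// Hypothetical sketch of retry + fallback across deployments.
// Err(true) models a transient failure, Err(false) a permanent one.
#[derive(Debug, Clone)]
struct Deployment {
    name: &'static str,
    weight: u32,
}

fn call_with_fallback(
    deployments: &[Deployment],
    max_retries: usize,
    mut try_deployment: impl FnMut(&Deployment) -> Result<String, bool>,
) -> Option<String> {
    for dep in deployments {
        for _ in 0..=max_retries {
            match try_deployment(dep) {
                Ok(body) => return Some(body),
                Err(true) => continue, // transient: retry same deployment
                Err(false) => break,   // permanent: fall back to next
            }
        }
    }
    None
}

fn main() {
    let deps = [
        Deployment { name: "primary", weight: 3 },
        Deployment { name: "backup", weight: 1 },
    ];
    let mut calls = 0;
    let out = call_with_fallback(&deps, 1, |d| {
        calls += 1;
        if d.name == "primary" { Err(true) } else { Ok("hi".to_string()) }
    });
    assert_eq!(out, Some("hi".to_string()));
    assert_eq!(calls, 3); // two tries on primary, one on backup
}
```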