Local Models
Run LLMs locally on Apple Silicon via MLX — no Docker, no Ollama, no separate process.
CrabDash can route to local models alongside cloud providers. Your app doesn't need to know the difference — the same 127.0.0.1:8787 endpoint serves both.
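Because local and cloud models share one endpoint, switching between them is just a change of the `model` field. A minimal sketch of that idea (the helper name, the cloud model name, and the `requests` usage are illustrative, not part of CrabDash):

```python
import json

CRABDASH_URL = "http://127.0.0.1:8787/v1/chat/completions"

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload; only the model name differs
    between a cloud model and a locally pulled MLX model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# The same endpoint serves both of these:
cloud = chat_request("gpt-4o-mini", "Hello")  # routed to a cloud provider
local = chat_request("mlx-community/Llama-3.2-3B-Instruct-4bit", "Hello")  # served locally

# To actually send one: requests.post(CRABDASH_URL, json=local)
print(json.dumps(local, indent=2))
```

The app code never changes; only the model string decides where the request runs.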
MLX (Apple Silicon native)
On Apple Silicon Macs, CrabDash loads models in-process via MLX, so there is no separate runtime to install or manage.
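Since MLX requires Apple Silicon, a client can gate the local path on the host architecture. A quick sketch using only the Python standard library (the helper name is illustrative):

```python
import platform

def can_use_mlx() -> bool:
    """True on Apple Silicon Macs (arm64 Darwin), where the MLX path is available."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"

print("MLX path available:", can_use_mlx())
```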
Pull a model
- Click the menubar icon
- Open Models → Local
- Browse available models or search by name
- Click Pull: the model downloads to `~/.crabdash/models/`
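Pulled models are ordinary files on disk, so you can inspect them directly. A sketch assuming the `~/.crabdash/models/` location described above, with one subdirectory per model (the actual on-disk layout may differ, e.g. nesting by organization):

```python
from pathlib import Path

def list_pulled_models(base: Path = Path.home() / ".crabdash" / "models") -> list[str]:
    """Return the names of locally pulled models, or [] if none are pulled yet."""
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())

print(list_pulled_models())
```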
Route to a local model
Once pulled, the model appears in /v1/models. Use it like any other model:
```bash
curl http://127.0.0.1:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Recommended models
| Model | Size | Use case |
|---|---|---|
| Llama 3.2 3B 4-bit | ~1.8 GB | Fast local chat, coding assistance |
| Qwen 2.5 7B 4-bit | ~4.5 GB | Stronger reasoning, multilingual |
| Mistral 7B 4-bit | ~4.2 GB | General purpose, good code generation |
Memory requirement: budget roughly 1 GB per billion parameters at 4-bit quantization (the weights themselves are around 0.6 GB per billion, as the download sizes above show; the rest covers KV cache and runtime overhead). A 16 GB Mac can comfortably run 7B models alongside other apps.
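The rule of thumb above can be written as a quick estimator. The numbers are the document's heuristics, and the 8 GB headroom for macOS and other apps is an illustrative assumption:

```python
def est_memory_gb(params_billion: float, gb_per_billion: float = 1.0) -> float:
    """Estimate memory for a 4-bit model: ~1 GB per billion parameters."""
    return params_billion * gb_per_billion

def fits(params_billion: float, ram_gb: float, headroom_gb: float = 8.0) -> bool:
    """Rough check, leaving headroom for the OS and other apps (assumed value)."""
    return est_memory_gb(params_billion) <= ram_gb - headroom_gb

print(est_memory_gb(7))   # 7.0 GB estimated for a 7B model
print(fits(7, 16))        # True: a 16 GB Mac can run 7B models
```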
Routing between local and cloud
You can route specific models or apps to local models while keeping others on cloud providers. In Settings → Routing:
- By model: Pin a model name to the local provider
- By app: Route requests from a specific app (identified by API key) to local-only
This is useful for keeping development traffic local (free, private) while using cloud models for production workloads.
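The routing rules above amount to a lookup: pins by model name or by API key, with a cloud fallback. A sketch of that logic; the structure and the precedence order (app pins before model pins) are assumptions for illustration, not CrabDash's actual implementation, which is configured in Settings → Routing:

```python
def pick_provider(model: str, api_key: str,
                  model_pins: dict[str, str],
                  app_pins: dict[str, str]) -> str:
    """Resolve a request to 'local' or 'cloud'. App pins (keyed by API key)
    are checked before model pins here; that order is an assumption."""
    if api_key in app_pins:
        return app_pins[api_key]
    if model in model_pins:
        return model_pins[model]
    return "cloud"

# Dev traffic stays local; everything else defaults to cloud:
app_pins = {"sk-dev-123": "local"}
model_pins = {"mlx-community/Llama-3.2-3B-Instruct-4bit": "local"}
print(pick_provider("gpt-4o", "sk-dev-123", model_pins, app_pins))   # local
print(pick_provider("gpt-4o", "sk-prod-456", model_pins, app_pins))  # cloud
```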