CrabTalk

Local Models

Run LLMs locally on Apple Silicon via MLX — no Docker, no Ollama, no separate process.

CrabDash can route to local models alongside cloud providers. Your app doesn't need to know the difference — the same 127.0.0.1:8787 endpoint serves both.
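For example, switching between a cloud model and a local one is just a change to the "model" field; the request shape and the endpoint stay the same. A minimal sketch (the cloud model name here is illustrative, not a CrabDash default):

```python
ENDPOINT = "http://127.0.0.1:8787/v1/chat/completions"  # same endpoint for local and cloud

def chat_request(model, user_message):
    """Build an OpenAI-style chat completion payload for the CrabDash endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Only the model name differs between a cloud and a local request.
cloud = chat_request("gpt-4o-mini", "Hello")  # hypothetical cloud model name
local = chat_request("mlx-community/Llama-3.2-3B-Instruct-4bit", "Hello")
assert cloud["messages"] == local["messages"]
```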

MLX (Apple Silicon native)

On Apple Silicon Macs, CrabDash runs models in-process through Apple's MLX framework, so there is no extra server to install or manage.

Pull a model

  1. Click the menubar icon
  2. Open Models → Local
  3. Browse available models or search by name
  4. Click Pull — the model downloads to ~/.crabdash/models/
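After a pull completes, you can sanity-check the download on disk. This sketch assumes each model lands in a directory named after its id under ~/.crabdash/models/; the exact on-disk layout is an assumption, so adjust to what you see locally:

```python
from pathlib import Path

def is_pulled(model_id, models_dir=None):
    """Return True if the pulled model's files exist locally.

    Assumes CrabDash stores each model in a directory named after its id
    (e.g. mlx-community/Llama-3.2-3B-Instruct-4bit) under ~/.crabdash/models/.
    """
    root = Path(models_dir) if models_dir else Path.home() / ".crabdash" / "models"
    return (root / model_id).is_dir()
```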

Route to a local model

Once pulled, the model appears in /v1/models. Use it like any other model:

curl http://127.0.0.1:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
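The same call from Python, using only the standard library. The response handling assumes the endpoint follows the OpenAI chat completions format it mimics:

```python
import json
import urllib.request

def chat(model, prompt, base="http://127.0.0.1:8787"):
    """Send a chat completion request to the local CrabDash endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# reply = chat("mlx-community/Llama-3.2-3B-Instruct-4bit", "Hello")
# print(reply["choices"][0]["message"]["content"])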
Recommended models

| Model | Size | Use case |
| --- | --- | --- |
| Llama 3.2 3B 4-bit | ~1.8 GB | Fast local chat, coding assistance |
| Qwen 2.5 7B 4-bit | ~4.5 GB | Stronger reasoning, multilingual |
| Mistral 7B 4-bit | ~4.2 GB | General purpose, good code generation |

Memory requirement: roughly 1 GB per billion parameters at 4-bit quantization. A 16 GB Mac can comfortably run 7B models alongside other apps.
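That rule of thumb as a sketch. The 8 GB reserve for macOS and other apps is an assumption for illustration, not a CrabDash figure:

```python
def estimated_memory_gb(billions_of_params):
    """Rule of thumb from the text: ~1 GB per billion parameters at 4-bit.

    Raw 4-bit weights are only ~0.5 GB per billion (4 bits = 0.5 bytes/param);
    the extra headroom covers KV cache and runtime overhead.
    """
    return float(billions_of_params)

def fits_comfortably(billions_of_params, ram_gb, os_reserve_gb=8):
    """os_reserve_gb is an assumed allowance for the OS and other apps."""
    return estimated_memory_gb(billions_of_params) <= ram_gb - os_reserve_gb

# A 7B model on a 16 GB Mac: 7 GB <= 16 - 8, so it fits.
```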

Routing between local and cloud

You can route specific models or apps to local models while keeping others on cloud providers. In Settings → Routing:

  • By model: Pin a model name to the local provider
  • By app: Route requests from a specific app (identified by API key) to local-only

This is useful for keeping development traffic local (free, private) while using cloud models for production workloads.
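The decision those two rules imply can be sketched as follows. This is a hypothetical model of the routing logic, not CrabDash's actual implementation or settings schema, and the precedence (per-app pin wins over per-model pin) is an assumption:

```python
def pick_provider(model, api_key, rules):
    """Hypothetical sketch of the routing decision described above.

    rules = {"models": {model_name: provider}, "apps": {api_key: provider}}
    Per-app routing wins over per-model pins; anything unmatched goes to cloud.
    """
    if api_key in rules.get("apps", {}):
        return rules["apps"][api_key]
    if model in rules.get("models", {}):
        return rules["models"][model]
    return "cloud"

rules = {
    "models": {"mlx-community/Llama-3.2-3B-Instruct-4bit": "local"},
    "apps": {"sk-dev-team": "local"},  # hypothetical API key for a dev app
}
```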
