Open-source AI gateway

One key. Every model.

Drop ModelMeld between your coding tools and your LLM backends. Cheap requests run on a fast local model; hard ones go to the frontier APIs you already pay for. Your keys stay on your machine.

AGPL-3.0 core OpenAI-compatible Keys stay local
How it works

Three steps to a smarter, cheaper inference path.

01

Point your tool at localhost

Claude Code, Cursor, Aider, anything OpenAI-compatible — base URL stays the same. No code changes.

OPENAI_BASE_URL=http://localhost:8000/v1
02

ModelMeld routes each request

Simple prompts (autocomplete, type hints, docstring adds) go to a fast local model. Hard prompts (debugging, architecture) go to the frontier API you already pay for.

route: scout → qwen-7b (local)
route: scout → claude-sonnet-4-6
03

Right answer, less money

Same model name on the wire. Same answer quality on the prompts that matter. A line item that's a fraction of pure-frontier.

$ /usr/bin/finops yesterday
  cloud spend  $0.27
  local spend  $0.00

Your frontier API keys are stored locally and never leave your machine. The hosted local model is the only path that touches our infrastructure.

What's validated

Real numbers from real coding-tool traffic.

Benchmarks run on the open-source gateway. No CRMs, no logos, no fluff — when a number lands here it has a reproducible test behind it.

Cross-model memory continuity

Switch from local to frontier mid-conversation, no context loss. Validated end-to-end.

Drop-in for the tools you use

Speaks Anthropic Messages for Claude Code and OpenAI Chat Completions for Cursor, Aider, Continue, Cline. No client code changes.

Polyglot from day one

Routing validated across Python, JavaScript, TypeScript, Go, Rust, and Java dev-tool prompts. The scout works on prompt shape, not language.

Pricing

Pay for what's actually expensive.

Local-routed traffic is effectively free. You pay for tokens that actually went to a frontier API, plus a small margin on hosted local. Bring-your-own everything is always free.

OSS
FreeAGPL-3.0

Self-host ModelMeld. Bring your own frontier keys and your own GPU. Everything works on a snapshot of the routing data; the live curated feed is the upgrade.

  • Self-host ModelMeld
  • Bring your own frontier API keys
  • Bring your own local GPU
  • Bundled seed routing data (stale but functional)
See details
Most popular
Hosted
$0.30/ Mtoken (local model)

Same engine, but we host the local model. No DevOps, no GPU procurement. Live curated routing feed bundled in. Your frontier keys stay on your machine.

  • We host the local model — no GPU required
  • Pay-as-you-go: top up in $20 / $50 / $100 packs
  • Your frontier keys stay on your machine
  • Live curated routing feed (Pro tier — bundled)
See details
Pro
$29/ month

For self-hosters who want the live curated routing feed without our hosted model. Always up-to-date model selection data, same engine.

  • Always up-to-date routing feed
  • Same OSS engine
  • Self-host on your own hardware
  • Cancel anytime
See details
Enterprise
Custom

Your hardware, your cloud, or hosted by us — pricing flexes to the deployment. SSO + RBAC, audit logs, custom SLAs, direct engineering support.

  • Deploy on your hardware, your cloud, or ours
  • SSO (OIDC) + RBAC
  • SOC 2-grade immutable audit log
  • Custom SLAs
See details
Trust

Built into the product.

Your API keys never leave your machine.

The local installer holds them; routing decisions happen in your process. The frontier provider sees a call from your machine, not ours.

Open-source core.

ModelMeld is AGPL-3.0 licensed. No lock-in, fork-friendly, contribute-friendly — calling the gateway over HTTP from your tools doesn't make them AGPL. The whole routing engine is public.

We store the minimum we can.

  • Always: request metadata (model, latency, tokens) plus a SHA-256 prompt hash — never the prompt body itself.
  • Only when you enable them: completion cache and tiered memory persist prompt-equivalent content — cache for repeats, memory for cross-model continuity. Both are off by default in OSS and Hosted. Tenant-scoped when on, you control retention.

Works with the tools you already use.

Two native API surfaces — Anthropic Messages and OpenAI Chat Completions — so Claude Code, Cursor, Aider, Continue, Cline, and any OpenAI or Anthropic SDK drop in unchanged. No SDK swaps, no client code rewrites.

Try it. Twenty bucks gets you a long way.

Self-host the gateway in five minutes, or skip the GPU and use our hosted pods. Either way you can be making routed requests in the time it takes to read this paragraph.