Fast Apply: Code Merging at 10,500 tok/s for AI Coding Agents

Fast Apply is a 7B model that merges AI-generated code edits into files at 10,500 tokens per second with 98% accuracy. OpenAI-compatible API, $0.80/M input tokens. Used by JetBrains, Vercel, and Webflow.

March 5, 2026 · 2 min read

Coding agents generate edits. Applying those edits to files reliably, quickly, and cheaply is the infrastructure gap that Fast Apply fills. It is a 7B model trained on code merging that runs on custom CUDA kernels at 10,500 tokens per second.

10,500 tokens per second
98% merge accuracy
$0.80 per 1M input tokens
0.8s for a 500-line file merge

The Problem: Applying Edits Is Harder Than Generating Them

Every coding agent faces the same bottleneck. The planning model generates an edit. Now you need to apply it to the file. There are three approaches, and they all have problems.

| Approach | Speed | Accuracy | Cost |
|---|---|---|---|
| Full-file rewrite | 80-100 tok/s | High (but slow) | 3,500-4,500 tokens/edit |
| Search-and-replace | Instant | 84-96% (brittle) | Minimal |
| Diff / patch | Instant | Fails on complex edits | Minimal |
| Fast Apply | 10,500 tok/s | 98-100% | 700-1,400 tokens/edit |

Full-file rewrites are accurate but slow and expensive. Asking Claude Sonnet to rewrite a 500-line file takes 5-6 seconds and costs 10-15x more per token than a specialized model. At scale, your agent spends more time and money on file I/O than on reasoning.

Search-and-replace is fast but brittle. It breaks on files with repetitive structures, nested blocks, or when the surrounding context has shifted since the edit was generated. Morph's benchmarks show 84-96% success rates across frontier models, with 2-3.5x more retry turns needed on failures.

Diff/patch works for simple line additions but fails on non-contiguous edits, refactors that move code between functions, or edits that span multiple scopes.

Fast Apply exists because code merging is a well-defined enough task that a small, purpose-trained model beats general-purpose LLMs at it.

How Fast Apply Works

The workflow has four steps. Your coding agent already does step 1, and step 2 is a single API call. Fast Apply handles step 3.

1. Agent generates edit

Your planning model (Claude, GPT, etc.) decides what to change and outputs a lazy edit snippet with '// ... existing code ...' markers for unchanged sections.

2. Call Fast Apply API

Send the original file, the edit snippet, and the instruction to the API. One HTTP request. OpenAI-compatible, so no SDK changes needed.

3. Model merges

The 7B model runs on custom CUDA kernels with speculative decoding. It merges the edit into the original file and returns the complete result. 0.8s for a 500-line file.

4. Write to disk

The response contains the full merged file. Diff it against the original for review, or write directly. Streaming supported for real-time preview.
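The four steps above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names (build_merge_prompt, build_request, send, review_diff) are hypothetical, while the endpoint, the tag format, and the Bearer authentication are the ones described on this page.

```python
import difflib
import json
import urllib.request

MORPH_URL = "https://api.morphllm.com/v1/chat/completions"


def build_merge_prompt(instruction: str, original: str, update: str) -> str:
    """Step 2a: wrap the three inputs in the XML-style tags Fast Apply expects."""
    return (
        f"<instruction>{instruction}</instruction>\n"
        f"<code>{original}</code>\n"
        f"<update>{update}</update>"
    )


def build_request(api_key: str, prompt: str, model: str = "auto") -> urllib.request.Request:
    """Step 2b: one POST request carrying a single user message."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        MORPH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Step 3: send the request and return the merged file (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def review_diff(original: str, merged: str, path: str = "file") -> str:
    """Step 4: unified diff of original vs. merged, for review before writing to disk."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            merged.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
```

Keeping the request construction separate from the network call makes the prompt and headers easy to unit-test without an API key.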

Why a 7B model?

Code merging is a well-defined task: take original + edit, produce merged output. It does not require world knowledge, multi-step reasoning, or long-horizon planning. A smaller model trained specifically for merging runs about 100x faster than a general-purpose model and is more accurate at it, because it does not spend capacity on capabilities the merge step does not need.

Models & Pricing

| Model | Speed | Input $/1M | Output $/1M | Best For |
|---|---|---|---|---|
| morph-v3-fast | 10,500+ tok/s | $0.80 | $1.20 | Real-time edits, high throughput |
| morph-v3-large | 5,000+ tok/s | $0.90 | $1.90 | Complex multi-scope changes |
| auto | 5,000-10,500 tok/s | Varies | Varies | Recommended default |

Both models support 262K token context windows. The auto model routes between morph-v3-fast and morph-v3-large based on edit complexity. Use auto unless you have a specific latency requirement.

Cost comparison

Fast Apply is cheaper per token and uses fewer tokens per edit. The combined savings are significant at scale.

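| Metric | Fast Apply | Claude Sonnet | GPT-4o |
|---|---|---|---|
| Price (input/1M) | $0.80 | $15.00 | $10.00 |
| Speed | 10,500 tok/s | ~80 tok/s | ~100 tok/s |
| Tokens per edit | 700-1,400 | 3,500-4,500 | 3,500-4,500 |
| 500-line file | 0.8s | 5-6s | 4-5s |
| Merge accuracy | 98% | 95% | 92% |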
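To see how the two savings combine, multiply tokens per edit by price per token. The sketch below uses only the figures from the comparison table and makes one simplifying assumption: every token in an edit is billed at the single listed price, whereas a real bill splits input and output tokens.

```python
def cost_per_edit(tokens: int, price_per_million: float) -> float:
    """Dollar cost of one edit when every token is billed at a single rate."""
    return tokens * price_per_million / 1_000_000


# Figures from the comparison table: (price per 1M tokens, low/high tokens per edit).
scenarios = {
    "Fast Apply": (0.80, 700, 1_400),
    "Claude Sonnet": (15.00, 3_500, 4_500),
    "GPT-4o": (10.00, 3_500, 4_500),
}

for name, (price, low, high) in scenarios.items():
    low_cost = cost_per_edit(low, price)
    high_cost = cost_per_edit(high, price)
    print(f"{name}: ${low_cost:.5f} to ${high_cost:.5f} per edit")
```

Under these assumptions one Fast Apply edit costs about $0.0006 to $0.0011, against $0.0525 to $0.0675 for Claude Sonnet and $0.035 to $0.045 for GPT-4o.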

API Reference

One endpoint. OpenAI-compatible. No new SDK required.

Endpoint

POST https://api.morphllm.com/v1/chat/completions

Authentication

Bearer token in the Authorization header. Get your API key from the dashboard.

Request body

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | morph-v3-fast, morph-v3-large, or auto |
| messages | array | Yes | Single user message with XML-formatted content |
| stream | boolean | No | Enable streaming (default: false) |
| max_tokens | integer | No | Maximum output tokens |
| temperature | number | No | Sampling temperature (default: 0) |

Response

Standard OpenAI chat completion response. The merged code is in response.choices[0].message.content. Token usage is reported in response.usage.

Response shape

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "morph-v3-fast",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "// The complete merged file content"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 847,
    "completion_tokens": 312,
    "total_tokens": 1159
  }
}
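A small parser for this response shape might look like the following. It is a sketch: parse_completion and MergeResult are hypothetical names, and the finish_reason check relies on the OpenAI convention that "length" signals output cut off at max_tokens, which is assumed to carry over here because the API is described as OpenAI-compatible.

```python
import json
from typing import NamedTuple


class MergeResult(NamedTuple):
    merged_code: str
    prompt_tokens: int
    completion_tokens: int


def parse_completion(body: str) -> MergeResult:
    """Extract the merged file and token usage from a chat completion JSON body."""
    data = json.loads(body)
    choice = data["choices"][0]
    # A finish_reason other than "stop" (for example "length") suggests the
    # merged file is truncated, so refuse to hand it back for writing to disk.
    if choice["finish_reason"] != "stop":
        raise ValueError(f"incomplete merge: finish_reason={choice['finish_reason']!r}")
    usage = data["usage"]
    return MergeResult(
        merged_code=choice["message"]["content"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
    )
```

Rejecting truncated output matters more for merging than for chat: a file that stops halfway through would silently delete everything after the cut.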

Integration Examples

TypeScript (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});

const originalCode = `function greet(name) {
  return "Hello " + name;
}`;

const editSnippet = `function greet(name: string): string {
  // ... existing code ...
}`;

const instruction = "Add TypeScript type annotations";

const response = await client.chat.completions.create({
  model: "morph-v3-fast",
  messages: [{
    role: "user",
    content: `<instruction>${instruction}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
  }],
});

// Result: the complete merged file
const mergedCode = response.choices[0].message.content;
// function greet(name: string): string {
//   return "Hello " + name;
// }

Python

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MORPH_API_KEY"],  # read the key from the environment
    base_url="https://api.morphllm.com/v1",
)

original_code = """def process(items):
    result = []
    for item in items:
        result.append(item.upper())
    return result"""

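# The whole function body changes, so the snippet carries no "existing code"
# marker: a marker here would tell the model to keep the original loop too.
edit_snippet = """def process(items: list[str]) -> list[str]:
    return [item.upper() for item in items]"""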

instruction = "Add type hints and use list comprehension"

response = client.chat.completions.create(
    model="morph-v3-fast",
    messages=[{
        "role": "user",
        "content": f"<instruction>{instruction}</instruction>\n"
                   f"<code>{original_code}</code>\n"
                   f"<update>{edit_snippet}</update>"
    }]
)

merged = response.choices[0].message.content

curl

curl -X POST "https://api.morphllm.com/v1/chat/completions" \
  -H "Authorization: Bearer $MORPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "morph-v3-fast",
    "messages": [{
      "role": "user",
      "content": "<instruction>Add error handling</instruction>\n<code>function divide(a, b) {\n  return a / b;\n}</code>\n<update>function divide(a, b) {\n  if (b === 0) throw new Error(\"Division by zero\");\n  // ... existing code ...\n}</update>"
    }],
    "stream": false
  }'

Streaming

Streaming (TypeScript)

const stream = await client.chat.completions.create({
  model: "morph-v3-fast",
  messages: [{
    role: "user",
    content: `<instruction>${instruction}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
  }],
  stream: true,
});

// Process chunks as they arrive
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
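For clients that do not use the OpenAI SDK, the stream can be read directly. The sketch below assumes the endpoint follows the OpenAI streaming wire format (server-sent events, one "data: {json}" line per chunk, terminated by "data: [DONE]"), which is implied by the OpenAI-compatible claim but not spelled out on this page. The function name iter_stream_content is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments from an OpenAI-style SSE stream, in order."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content
```

Any iterable of decoded text lines works as input, so the same function can be fed from an HTTP response body or from a recorded fixture in tests.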

The Edit Format

Fast Apply accepts three XML-style tags in the message content. This is the same format most coding agents already use when generating lazy edits.

| Tag | Required | Description |
|---|---|---|
| <instruction> | Optional | Description of the change. Should be model-generated, not hardcoded. |
| <code> | Yes | The complete original file content. |
| <update> | Yes | The edit snippet with // ... existing code ... markers for unchanged sections. |

The existing code marker

The key convention is the // ... existing code ... marker. It tells Fast Apply "keep everything that was here in the original." The edit includes only the lines that changed plus these markers.

Example: adding error handling to a function

// Original file (goes in <code> tag)
export async function fetchUser(id: string) {
  const response = await fetch(`/api/users/${id}`);
  const data = await response.json();
  return data;
}

export async function deleteUser(id: string) {
  await fetch(`/api/users/${id}`, { method: "DELETE" });
}

// Edit snippet (goes in <update> tag)
export async function fetchUser(id: string) {
  const response = await fetch(`/api/users/${id}`);
  if (!response.ok) {
    throw new Error(`Failed to fetch user: ${response.status}`);
  }
  // ... existing code ...
}

// ... existing code ...

The merged output will contain fetchUser with the new error handling and the original const data line preserved, plus deleteUser unchanged. The markers handle both "keep the rest of this function" and "keep the rest of this file."

Critical rule

Always include // ... existing code ... markers for unchanged sections. Omitting the marker tells the model to delete that section. This is intentional: it lets you remove code by simply not including it, but it means you must always mark sections you want to keep.
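Because a missing marker means deletion, it can be worth checking snippets before sending them. The following is a hedged sketch of such a pre-flight check, not part of the API: the function names are hypothetical, and the "less than half the original's line count" threshold is an arbitrary assumption to tune for your agent.

```python
import re

# Matches "// ... existing code ..." and "# ... existing code ..." on their own line.
MARKER = re.compile(
    r"^[ \t]*(?://|#)[ \t]*\.\.\.[ \t]*existing code[ \t]*\.\.\.[ \t]*$",
    re.MULTILINE,
)


def count_markers(update: str) -> int:
    """Number of existing-code markers in an edit snippet."""
    return len(MARKER.findall(update))


def looks_like_accidental_deletion(original: str, update: str) -> bool:
    """Heuristic: a marker-free snippet much shorter than the file likely forgot its markers."""
    original_lines = len(original.splitlines())
    update_lines = len(update.splitlines())
    return count_markers(update) == 0 and update_lines < original_lines / 2
```

When the check fires, the agent can ask the planning model to regenerate the snippet rather than risk a merge that drops most of the file. Intentional large deletions will also trigger it, so treat it as a warning rather than a hard failure.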

Benchmarks

Speed: tokens per second

| Model | tok/s | 500-line file |
|---|---|---|
| Morph Fast Apply | 10,500 | 0.8s |
| Llama 3.2 70B (Cerebras) | 2,600 | ~3s |
| Gemini 2.5 Flash | 275 | ~25s |
| GPT-4o | 90 | ~75s |
| Claude Sonnet | 80 | ~85s |

Accuracy: merge success rate

| Model | Accuracy |
|---|---|
| Morph (v3-large) | 98% |
| Claude 3.7 | 94% |
| GPT-4o | 92% |
| Claude 3.5 Sonnet | 91% |
| Qwen 2.5 Coder 32B | 87% |
| Gemini 2.0 Flash | 85% |
| GPT-4o-mini | 82% |

Fast Apply vs. search-and-replace

When paired with five frontier planning models, Fast Apply achieves 100% merge success. Search-and-replace with the same models achieves 84-96% success and requires 2-3.5x more retry turns on failures.

Where search-and-replace fails

Repetitive code structures (multiple similar functions), nested blocks (if/else inside try/catch inside loops), overlapping edit regions (two edits that touch the same lines), and context drift (file changed between edit generation and application). Fast Apply handles all of these because it processes the full file semantically, not by string matching.
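Two of these failure modes are easy to reproduce with plain string matching. The sketch below is a deliberately strict search-and-replace (apply_search_replace is a hypothetical helper, not any agent's actual implementation) that refuses to guess when the search text is missing or occurs more than once.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply an exact-match edit, refusing to guess when the match is not unique."""
    matches = source.count(search)
    if matches == 0:
        raise ValueError("no match: the file differs from what the planner saw")
    if matches > 1:
        raise ValueError(f"ambiguous: search text occurs {matches} times")
    return source.replace(search, replace)


source = '''def get_user(id):
    return db.query(id)

def get_order(id):
    return db.query(id)
'''

attempts = [
    "    return db.query(id)",  # repetitive structure: appears in both functions
    "def get_order(id):",       # unique anchor: succeeds
    "    return db.fetch(id)",  # context drift: text no longer in the file
]
for search in attempts:
    try:
        apply_search_replace(source, search, "# edited")
        print(f"{search.strip()!r}: applied")
    except ValueError as error:
        print(f"{search.strip()!r}: {error}")
```

Each rejected attempt corresponds to a retry turn for the planning model, which has to regenerate the edit with more surrounding context.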

Architecture

Fast Apply is a 7B parameter model. The merge task is well-defined: given an original file and an edit snippet, produce the merged output. This bounded scope means a smaller model trained specifically on code merging outperforms general-purpose models that are 10-100x larger.

Custom CUDA kernels

Purpose-built inference kernels optimized for the merge operation. Not a generic inference engine.

Speculative decoding

Predicts multiple tokens ahead and validates in parallel. This is why 10,500 tok/s is possible with a 7B model.
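As a toy illustration of the general technique (not Morph's kernels, whose details this page does not describe), greedy speculative decoding can be sketched as follows. target_next and draft_next are hypothetical stand-ins for an expensive and a cheap next-token function. Real systems verify all drafted positions in one batched forward pass, which is where the speed-up comes from; the loop here is sequential for clarity.

```python
from typing import Callable, List

NextToken = Callable[[List[str]], str]


def speculative_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[str],
    max_new_tokens: int,
    lookahead: int = 4,
    eos: str = "<eos>",
) -> List[str]:
    """Greedy speculative decoding: the output equals what target_next alone would produce."""
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1")
    out = list(prompt)
    produced = 0
    while produced < max_new_tokens:
        # 1. The cheap draft model guesses `lookahead` tokens ahead.
        guesses: List[str] = []
        for _ in range(lookahead):
            guesses.append(draft_next(out + guesses))
        # 2. The target model checks the guesses in order.
        for guess in guesses:
            expected = target_next(out)
            if expected == eos:
                return out[len(prompt):]
            out.append(expected)  # always keep the target's own token
            produced += 1
            if produced >= max_new_tokens:
                return out[len(prompt):]
            if expected != guess:
                break  # first wrong guess: discard the rest and redraft
    return out[len(prompt):]
```

The better the draft agrees with the target, the more tokens each verification round accepts. Merging is a favorable case in principle, since most of the output repeats the original file.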

262K context

Handles files up to 262K tokens. Large enough for any single-file edit in practice.

Where Fast Apply fits in the stack

Morph provides three specialized models for coding agent infrastructure. Each handles a different mechanical task so the planning model can focus on reasoning.

| Model | Task | What it replaces |
|---|---|---|
| Fast Apply | Code merging | Full-file rewrites, search-and-replace, diff/patch |
| WarpGrep | Code search | grep/ripgrep with manual result filtering |
| Embeddings | Code retrieval | Static vector search pipelines |

Use Cases

Building a coding agent

If you are building an AI coding agent or IDE plugin, Fast Apply replaces the file-writing layer. Your planning model generates edits in the lazy format (which is the default output format of Claude, GPT, and most coding-oriented models). Fast Apply merges them into the target file. This separation lets you use the best planning model for reasoning and a specialized model for the mechanical merge.

Replacing full-file rewrites

If your agent currently asks the LLM to rewrite the entire file for every edit, Fast Apply cuts token usage by 50-60% and latency by 90%+. A 1,000-line file takes 1.3 seconds instead of 10-12 seconds.

Replacing search-and-replace

If your agent uses search-and-replace to apply edits, Fast Apply eliminates the retry loops caused by context drift and ambiguous matches. The 100% merge success rate (across five frontier models) vs. 84-96% for search-and-replace means fewer failed edits and less wasted compute.

MCP tool in Claude Code or Cursor

Fast Apply is available as an MCP tool that drops into Claude Code, Cursor, and any MCP-compatible environment. The agent calls it as a tool. No code changes to your agent required.

Getting Started

Four steps. Under five minutes.

1. Get an API key

Sign up at morphllm.com/dashboard/api-keys. The free tier includes 250K credits ($2.50 worth).

2. Install the SDK

Install

# Python
pip install openai

# TypeScript / Node.js
npm install openai

3. Make your first call

Test it

curl -X POST "https://api.morphllm.com/v1/chat/completions" \
  -H "Authorization: Bearer $MORPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "morph-v3-fast",
    "messages": [{
      "role": "user",
      "content": "<instruction>Add a return type annotation</instruction>\n<code>function add(a: number, b: number) {\n  return a + b;\n}</code>\n<update>function add(a: number, b: number): number {\n  // ... existing code ...\n}</update>"
    }]
  }'

4. Try the playground

Test edits interactively at morphllm.com/dashboard/playground/apply before integrating into your agent.

Frequently Asked Questions

What is Fast Apply?

A 7B model trained on code merging. It takes an original file, an edit snippet with "existing code" markers, and an instruction. It returns the complete merged file at 10,500 tokens per second with 98% accuracy.

How much does it cost?

morph-v3-fast: $0.80/M input, $1.20/M output. morph-v3-large: $0.90/M input, $1.90/M output. The free tier includes $2.50 worth of credits. Paid plans start at $20/month with $20 in credits included.

Which model should I use?

Use auto (recommended). It routes between morph-v3-fast and morph-v3-large based on edit complexity. Use morph-v3-fast if you need consistent sub-second latency. Use morph-v3-large for complex multi-scope refactors where accuracy matters more than speed.

Do I need to change my SDK?

No. The API is OpenAI-compatible. If you already use the OpenAI SDK (Python or TypeScript), just change the baseURL and apiKey. No new dependencies.

What languages does it support?

Fast Apply works with any programming language. It is trained on code merging across all major languages: Python, TypeScript, JavaScript, Go, Rust, Java, C++, Ruby, PHP, Swift, Kotlin, and more.

Does it support streaming?

Yes. Set stream: true. Useful for showing real-time diffs as the merged file streams back, especially on files over 500 lines.

What if the merge fails?

At 98% accuracy, most edits merge cleanly on the first attempt. When a merge does fail, it is typically because the edit snippet was ambiguous or conflicted with the original. The response will still contain a best-effort merge. You can retry with a more specific instruction or fall back to the morph-v3-large model.
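Since a failed merge still returns a best-effort file, detecting failure is left to the caller. One possible shape for the retry logic is sketched below. Everything in it is an assumption for illustration: call_model is a hypothetical callable wrapping the HTTP request, and is_valid is whatever check suits your language (here, a Python syntax check).

```python
import ast
from typing import Callable, Optional, Sequence

# call_model(model_name, prompt) -> merged file text (hypothetical wrapper around the API call).
CallModel = Callable[[str, str], str]
# is_valid(merged) -> True if the merged file passes your own check.
IsValid = Callable[[str], bool]


def python_parses(source: str) -> bool:
    """Example validator: accept the merge only if it is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def merge_with_fallback(
    call_model: CallModel,
    is_valid: IsValid,
    prompt: str,
    models: Sequence[str] = ("morph-v3-fast", "morph-v3-large"),
) -> Optional[str]:
    """Try each model in order and return the first merge that passes validation."""
    for model in models:
        merged = call_model(model, prompt)
        if is_valid(merged):
            return merged
    return None  # caller decides next: e.g. re-prompt with a more specific instruction
```

A syntax check catches only malformed output. Merges that parse but drop or duplicate code need a stronger validator, such as the project's test suite.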

Is there an MCP integration?

Yes. Fast Apply is available as an MCP (Model Context Protocol) tool that works with Claude Code, Cursor, and other MCP-compatible environments. See the docs for setup instructions.

Start merging code at 10,500 tok/s

Free tier included. OpenAI-compatible API. No new SDK required. Get an API key and make your first call in under five minutes.