Coding agents generate edits. Applying those edits to files reliably, quickly, and cheaply is the infrastructure gap that Fast Apply fills: a 7B model trained on code merging, running on custom CUDA kernels at 10,500 tokens per second.
The Problem: Applying Edits Is Harder Than Generating Them
Every coding agent faces the same bottleneck. The planning model generates an edit. Now you need to apply it to the file. There are three standard approaches, and they all have problems.
| Approach | Speed | Accuracy | Cost |
|---|---|---|---|
| Full-file rewrite | 80-100 tok/s | High (but slow) | 3,500-4,500 tokens/edit |
| Search-and-replace | Instant | 84-96% (brittle) | Minimal |
| Diff / patch | Instant | Fails on complex edits | Minimal |
| Fast Apply | 10,500 tok/s | 98-100% | 700-1,400 tokens/edit |
Full-file rewrites are accurate but slow and expensive. Asking Claude Sonnet to rewrite a 500-line file takes 5-6 seconds and costs 10-15x more per token than a specialized model. At scale, your agent spends more time and money on file I/O than on reasoning.
Search-and-replace is fast but brittle. It breaks on files with repetitive structures, nested blocks, or when the surrounding context has shifted since the edit was generated. Morph's benchmarks show 84-96% success rates across frontier models, with 2-3.5x more retry turns needed on failures.
Diff/patch works for simple line additions but fails on non-contiguous edits, refactors that move code between functions, or edits that span multiple scopes.
Fast Apply exists because code merging is a well-defined enough task that a small, purpose-trained model beats general-purpose LLMs at it.
How Fast Apply Works
The workflow has four steps. Your coding agent already does steps 1 and 4; step 2 is the one integration point, and Fast Apply handles step 3.
1. Agent generates edit
Your planning model (Claude, GPT, etc.) decides what to change and outputs a lazy edit snippet with '// ... existing code ...' markers for unchanged sections.
2. Call Fast Apply API
Send the original file, the edit snippet, and the instruction to the API. One HTTP request. OpenAI-compatible, so no SDK changes needed.
3. Model merges
The 7B model runs on custom CUDA kernels with speculative decoding. It merges the edit into the original file and returns the complete result. 0.8s for a 500-line file.
4. Write to disk
The response contains the full merged file. Diff it against the original for review, or write directly. Streaming supported for real-time preview.
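As a minimal sketch of steps 2 through 4, assuming the OpenAI SDK pointed at the Morph endpoint (the same setup as the integration examples below); the empty-result guard is illustrative, not part of the API:

TypeScript

import OpenAI from "openai";
import { readFileSync, writeFileSync } from "node:fs";

const client = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});

async function applyEdit(path: string, instruction: string, editSnippet: string) {
  // Step 2: send the original file, edit snippet, and instruction.
  const originalCode = readFileSync(path, "utf8");
  const response = await client.chat.completions.create({
    model: "auto",
    messages: [{
      role: "user",
      content: `<instruction>${instruction}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
    }],
  });
  // Step 3 happens server-side; the response is the full merged file.
  const merged = response.choices[0].message.content;
  if (!merged) throw new Error("Empty merge result"); // illustrative guard
  // Step 4: write directly, or diff against originalCode for review first.
  writeFileSync(path, merged);
}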
Why a 7B model?
Code merging is a well-defined task: take original + edit, produce merged output. It does not require world knowledge, multi-step reasoning, or long-horizon planning. A smaller model trained specifically on this task runs 100x faster than a general-purpose model and achieves higher accuracy, because it does not waste capacity on capabilities the merge step does not need.
Models & Pricing
| Model | Speed | Input $/1M | Output $/1M | Best For |
|---|---|---|---|---|
| morph-v3-fast | 10,500+ tok/s | $0.80 | $1.20 | Real-time edits, high throughput |
| morph-v3-large | 5,000+ tok/s | $0.90 | $1.90 | Complex multi-scope changes |
| auto | 5,000-10,500 tok/s | Varies | Varies | Recommended default |
Both models support 262K token context windows. The auto model routes between morph-v3-fast and morph-v3-large based on edit complexity. Use auto unless you have a specific latency requirement.
Cost comparison
Fast Apply is cheaper per token and uses fewer tokens per edit. The combined savings are significant at scale.
| | Fast Apply | Claude Sonnet | GPT-4o |
|---|---|---|---|
| Price (input/1M) | $0.80 | $15.00 | $10.00 |
| Speed | 10,500 tok/s | ~80 tok/s | ~100 tok/s |
| Tokens per edit | 700-1,400 | 3,500-4,500 | 3,500-4,500 |
| 500-line file | 0.8s | 5-6s | 4-5s |
| Merge accuracy | 98% | 95% | 92% |
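To make the table concrete, here is back-of-the-envelope arithmetic for a single edit, taking the midpoint of each token range and input pricing only (output pricing for the general-purpose models is not listed above, so the real gap is larger):

TypeScript

// Input cost for one edit, using midpoints of the token ranges above.
const fastApplyInput = 1_050 * (0.80 / 1e6); // ≈ $0.00084
const sonnetInput = 4_000 * (15.00 / 1e6);   // ≈ $0.06
console.log(sonnetInput / fastApplyInput);   // ≈ 71x cheaper on input alone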
API Reference
One endpoint. OpenAI-compatible. No new SDK required.
Endpoint
POST https://api.morphllm.com/v1/chat/completions
Authentication
Bearer token in the Authorization header. Get your API key from the dashboard.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | morph-v3-fast, morph-v3-large, or auto |
| messages | array | Yes | Single user message with XML-formatted content |
| stream | boolean | No | Enable streaming (default: false) |
| max_tokens | integer | No | Maximum output tokens |
| temperature | number | No | Sampling temperature (default: 0) |
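A minimal request body matching the table (placeholder content shown):

Request shape
{
  "model": "auto",
  "messages": [{
    "role": "user",
    "content": "<instruction>...</instruction><code>...</code><update>...</update>"
  }],
  "stream": false
}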
Response
Standard OpenAI chat completion response. The merged code is in response.choices[0].message.content. Token usage is reported in response.usage.
Response shape
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "morph-v3-fast",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "// The complete merged file content"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 847,
"completion_tokens": 312,
"total_tokens": 1159
}
}

Integration Examples
TypeScript (OpenAI SDK)
TypeScript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.MORPH_API_KEY,
baseURL: "https://api.morphllm.com/v1",
});
const originalCode = `function greet(name) {
return "Hello " + name;
}`;
const editSnippet = `function greet(name: string): string {
// ... existing code ...
}`;
const instruction = "Add TypeScript type annotations";
const response = await client.chat.completions.create({
model: "morph-v3-fast",
messages: [{
role: "user",
content: `<instruction>${instruction}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
}],
});
// Result: the complete merged file
const mergedCode = response.choices[0].message.content;
// function greet(name: string): string {
// return "Hello " + name;
// }

Python
Python
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.morphllm.com/v1"
)
original_code = """def process(items):
result = []
for item in items:
result.append(item.upper())
return result"""
edit_snippet = """def process(items: list[str]) -> list[str]:
# ... existing code ...
return [item.upper() for item in items]"""
instruction = "Add type hints and use list comprehension"
response = client.chat.completions.create(
model="morph-v3-fast",
messages=[{
"role": "user",
"content": f"<instruction>{instruction}</instruction>\n"
f"<code>{original_code}</code>\n"
f"<update>{edit_snippet}</update>"
}]
)
merged = response.choices[0].message.content

curl
curl
curl -X POST "https://api.morphllm.com/v1/chat/completions" \
-H "Authorization: Bearer $MORPH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "morph-v3-fast",
"messages": [{
"role": "user",
"content": "<instruction>Add error handling</instruction>\n<code>function divide(a, b) {\n return a / b;\n}</code>\n<update>function divide(a, b) {\n if (b === 0) throw new Error(\"Division by zero\");\n // ... existing code ...\n}</update>"
}],
"stream": false
}'

Streaming
Streaming (TypeScript)
const stream = await client.chat.completions.create({
model: "morph-v3-fast",
messages: [{
role: "user",
content: `<instruction>${instruction}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
}],
stream: true,
});
// Process chunks as they arrive
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) process.stdout.write(content);
}

The Edit Format
Fast Apply accepts three XML-style tags in the message content. This is the same format most coding agents already use when generating lazy edits.
| Tag | Required | Description |
|---|---|---|
| <instruction> | Optional | Description of the change. Should be model-generated, not hardcoded. |
| <code> | Yes | The complete original file content. |
| <update> | Yes | The edit snippet with // ... existing code ... markers for unchanged sections. |
The existing code marker
The key convention is // ... existing code .... This marker tells Fast Apply "keep everything that was here in the original." The edit only includes the lines that changed plus these markers.
Example: adding error handling to a function
// Original file (goes in <code> tag)
export async function fetchUser(id: string) {
const response = await fetch(`/api/users/${id}`);
const data = await response.json();
return data;
}
export async function deleteUser(id: string) {
await fetch(`/api/users/${id}`, { method: "DELETE" });
}

// Edit snippet (goes in <update> tag)
export async function fetchUser(id: string) {
const response = await fetch(`/api/users/${id}`);
if (!response.ok) {
throw new Error(`Failed to fetch user: ${response.status}`);
}
// ... existing code ...
}
// ... existing code ...

The merged output will contain fetchUser with the new error handling and the original const data line preserved, plus deleteUser unchanged. The markers handle both "keep the rest of this function" and "keep the rest of this file."
Critical rule
Always include // ... existing code ... markers for unchanged sections. Omitting the marker tells the model to delete that section. This is intentional: it lets you remove code by simply not including it, but it means you must always mark sections you want to keep.
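For example, reusing the fetchUser/deleteUser file from the example above, this snippet deletes deleteUser simply by not covering it with a marker (a sketch of the rule, not output from the API):

// Edit snippet that deletes deleteUser: nothing after fetchUser is
// marked as existing code, so everything after it is removed.
export async function fetchUser(id: string) {
  // ... existing code ...
}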
Benchmarks
Speed: tokens per second
| Model | tok/s | 500-line file |
|---|---|---|
| Morph Fast Apply | 10,500 | 0.8s |
| Llama 3.2 70B (Cerebras) | 2,600 | ~3s |
| Gemini 2.5 Flash | 275 | ~25s |
| GPT-4o | 90 | ~75s |
| Claude Sonnet | 80 | ~85s |
Accuracy: merge success rate
| Model | Accuracy |
|---|---|
| Morph (v3-large) | 98% |
| Claude 3.7 | 94% |
| GPT-4o | 92% |
| Claude 3.5 Sonnet | 91% |
| Qwen 2.5 Coder 32B | 87% |
| Gemini 2.0 Flash | 85% |
| GPT-4o-mini | 82% |
Fast Apply vs. search-and-replace
When paired with five frontier planning models, Fast Apply achieves 100% merge success. Search-and-replace with the same models achieves 84-96% success and requires 2-3.5x more retry turns on failures.
Where search-and-replace fails
Repetitive code structures (multiple similar functions), nested blocks (if/else inside try/catch inside loops), overlapping edit regions (two edits that touch the same lines), and context drift (file changed between edit generation and application). Fast Apply handles all of these because it processes the full file semantically, not by string matching.
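A sketch of the first failure mode: in the file below, a search block targeting the shared line matches both functions, so string matching either fails or edits the wrong one, while a semantic merge sees the full file and disambiguates.

TypeScript

// Repetitive structure: "return res.json();" appears twice, so a
// search-and-replace edit targeting that line is ambiguous.
async function getUser(id: string) {
  const res = await fetch(`/users/${id}`);
  return res.json();
}

async function getTeam(id: string) {
  const res = await fetch(`/teams/${id}`);
  return res.json();
}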
Architecture
Fast Apply is a 7B parameter model. The merge task is well-defined: given an original file and an edit snippet, produce the merged output. This bounded scope means a smaller model trained specifically on code merging outperforms general-purpose models that are 10-100x larger.
Custom CUDA kernels
Purpose-built inference kernels optimized for the merge operation. Not a generic inference engine.
Speculative decoding
Predicts multiple tokens ahead and validates in parallel. This is why 10,500 tok/s is possible with a 7B model.
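For intuition, here is a conceptual sketch of draft-and-verify decoding in general; Morph's kernels are not public, so this illustrates the technique, not their implementation:

TypeScript

// Conceptual draft-and-verify loop (greedy case). A cheap draft model
// proposes k tokens; the target model checks all k positions, which a
// real engine batches into one parallel pass, and keeps the longest
// agreeing prefix. Best case: k tokens per target pass instead of 1.
type NextToken = (context: number[]) => number;

function speculativeStep(
  draft: NextToken,
  target: NextToken,
  context: number[],
  k: number,
): number[] {
  // 1. Draft k tokens autoregressively (cheap).
  const ctx = [...context];
  for (let i = 0; i < k; i++) ctx.push(draft(ctx));
  const proposed = ctx.slice(context.length);

  // 2. Verify: accept while the target agrees; on the first mismatch,
  //    emit the target's token instead and stop.
  const accepted: number[] = [];
  const verifyCtx = [...context];
  for (const t of proposed) {
    const expected = target(verifyCtx);
    if (expected !== t) { accepted.push(expected); break; }
    accepted.push(t);
    verifyCtx.push(t);
  }
  return accepted;
}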
262K context
Handles files up to 262K tokens. Large enough for any single-file edit in practice.
Where Fast Apply fits in the stack
Morph provides three specialized models for coding agent infrastructure. Each handles a different mechanical task so the planning model can focus on reasoning.
| Model | Task | What it replaces |
|---|---|---|
| Fast Apply | Code merging | Full-file rewrites, search-and-replace, diff/patch |
| WarpGrep | Code search | grep/ripgrep with manual result filtering |
| Embeddings | Code retrieval | Static vector search pipelines |
Use Cases
Building a coding agent
If you are building an AI coding agent or IDE plugin, Fast Apply replaces the file-writing layer. Your planning model generates edits in the lazy format (which is the default output format of Claude, GPT, and most coding-oriented models). Fast Apply merges them into the target file. This separation lets you use the best planning model for reasoning and a specialized model for the mechanical merge.
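A sketch of that separation, assuming an OpenAI-compatible planning model; the planner prompt and model name are illustrative, not a prescribed setup:

TypeScript

import OpenAI from "openai";

const planner = new OpenAI(); // your reasoning model (OpenAI-compatible)
const morph = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});

async function editFile(originalCode: string, request: string): Promise<string> {
  // 1. The planning model decides WHAT to change, as a lazy edit snippet.
  const plan = await planner.chat.completions.create({
    model: "gpt-4o",
    messages: [{
      role: "user",
      content: `Produce an edit snippet for this request, using\n` +
        `"// ... existing code ..." markers for unchanged sections.\n` +
        `Request: ${request}\n\n${originalCode}`,
    }],
  });
  const editSnippet = plan.choices[0].message.content ?? "";

  // 2. Fast Apply performs the mechanical merge.
  const merge = await morph.chat.completions.create({
    model: "auto",
    messages: [{
      role: "user",
      content: `<instruction>${request}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
    }],
  });
  return merge.choices[0].message.content ?? "";
}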
Replacing full-file rewrites
If your agent currently asks the LLM to rewrite the entire file for every edit, Fast Apply cuts token usage by 50-60% and latency by 90%+. A 1,000-line file takes 1.3 seconds instead of 10-12 seconds.
Replacing search-and-replace
If your agent uses search-and-replace to apply edits, Fast Apply eliminates the retry loops caused by context drift and ambiguous matches. The 100% merge success rate (across five frontier models) vs. 84-96% for search-and-replace means fewer failed edits and less wasted compute.
MCP tool in Claude Code or Cursor
Fast Apply is available as an MCP tool that drops into Claude Code, Cursor, and any MCP-compatible environment. The agent calls it as a tool. No code changes to your agent required.
Getting Started
Four steps. Under five minutes.
1. Get an API key
Sign up at morphllm.com/dashboard/api-keys. The free tier includes 250K credits ($2.50 worth).
2. Install the SDK
Install
# Python
pip install openai
# TypeScript / Node.js
npm install openai

3. Make your first call
Test it
curl -X POST "https://api.morphllm.com/v1/chat/completions" \
-H "Authorization: Bearer $MORPH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "morph-v3-fast",
"messages": [{
"role": "user",
"content": "<instruction>Add a return type annotation</instruction>\n<code>function add(a: number, b: number) {\n return a + b;\n}</code>\n<update>function add(a: number, b: number): number {\n // ... existing code ...\n}</update>"
}]
}'

4. Try the playground
Test edits interactively at morphllm.com/dashboard/playground/apply before integrating into your agent.
Frequently Asked Questions
What is Fast Apply?
A 7B model trained on code merging. It takes an original file, an edit snippet with "existing code" markers, and an instruction. It returns the complete merged file at 10,500 tokens per second with 98% accuracy.
How much does it cost?
morph-v3-fast: $0.80/M input, $1.20/M output. morph-v3-large: $0.90/M input, $1.90/M output. The free tier includes $2.50 worth of credits. Paid plans start at $20/month with $20 in credits included.
Which model should I use?
Use auto (recommended). It routes between morph-v3-fast and morph-v3-large based on edit complexity. Use morph-v3-fast if you need consistent sub-second latency. Use morph-v3-large for complex multi-scope refactors where accuracy matters more than speed.
Do I need to change my SDK?
No. The API is OpenAI-compatible. If you already use the OpenAI SDK (Python or TypeScript), just change the baseURL and apiKey. No new dependencies.
What languages does it support?
Fast Apply works with any programming language. It is trained on code merging across all major languages: Python, TypeScript, JavaScript, Go, Rust, Java, C++, Ruby, PHP, Swift, Kotlin, and more.
Does it support streaming?
Yes. Set stream: true. Useful for showing real-time diffs as the merged file streams back, especially on files over 500 lines.
What if the merge fails?
At 98% accuracy, most edits merge cleanly on the first attempt. When a merge does fail, it is typically because the edit snippet was ambiguous or conflicted with the original. The response will still contain a best-effort merge. You can retry with a more specific instruction or fall back to the morph-v3-large model.
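One way to implement that fallback, as a sketch: the validation step (a parse, type check, or diff review) is your own code, not an API feature.

TypeScript

import OpenAI from "openai";

const morph = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});

// Try the fast model first; rerun on morph-v3-large if the merged file
// fails a caller-supplied check (e.g., it no longer parses).
async function applyWithFallback(
  content: string,
  validate: (merged: string) => boolean,
): Promise<string> {
  for (const model of ["morph-v3-fast", "morph-v3-large"]) {
    const res = await morph.chat.completions.create({
      model,
      messages: [{ role: "user", content }],
    });
    const merged = res.choices[0].message.content ?? "";
    if (validate(merged)) return merged;
  }
  throw new Error("Merge failed validation on both models");
}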
Is there an MCP integration?
Yes. Fast Apply is available as an MCP (Model Context Protocol) tool that works with Claude Code, Cursor, and other MCP-compatible environments. See the docs for setup instructions.
Start merging code at 10,500 tok/s
Free tier included. OpenAI-compatible API. No new SDK required. Get an API key and make your first call in under five minutes.