Coding agents generate edits. Applying those edits to files reliably, quickly, and cheaply is the infrastructure gap that Fast Apply fills: a 7B model trained on code merging, running on custom CUDA kernels at 10,500 tokens per second.
The Problem: Applying Edits Is Harder Than Generating Them
Every coding agent faces the same bottleneck. The planning model generates an edit. Now you need to apply it to the file. There are three standard approaches, and they all have problems.
| Approach | Speed | Accuracy | Cost |
|---|---|---|---|
| Full-file rewrite | 80-100 tok/s | High (but slow) | 3,500-4,500 tokens/edit |
| Search-and-replace | Instant | 84-96% (brittle) | Minimal |
| Diff / patch | Instant | Fails on complex edits | Minimal |
| Fast Apply | 10,500 tok/s | 98-100% | 700-1,400 tokens/edit |
Full-file rewrites are accurate but slow and expensive. Asking Claude Sonnet to rewrite a 500-line file takes 5-6 seconds and costs 10-15x more per token than a specialized model. At scale, your agent spends more time and money on file I/O than on reasoning.
Search-and-replace is fast but brittle. It breaks on files with repetitive structures, nested blocks, or when the surrounding context has shifted since the edit was generated. Morph's benchmarks show 84-96% success rates across frontier models, with 2-3.5x more retry turns needed on failures.
Diff/patch works for simple line additions but fails on non-contiguous edits, refactors that move code between functions, or edits that span multiple scopes.
Fast Apply exists because code merging is a well-defined enough task that a small, purpose-trained model beats general-purpose LLMs at it.
How Fast Apply Works
The workflow has four steps. Your coding agent already does steps 1 and 4; step 2 is the one integration point, and Fast Apply handles step 3.
1. Agent generates edit
Your planning model (Claude, GPT, etc.) decides what to change and outputs a lazy edit snippet with '// ... existing code ...' markers for unchanged sections.
2. Call Fast Apply API
Send the original file, the edit snippet, and the instruction to the API. One HTTP request. OpenAI-compatible, so no SDK changes needed.
3. Model merges
The 7B model runs on custom CUDA kernels with speculative decoding. It merges the edit into the original file and returns the complete result. 0.8s for a 500-line file.
4. Write to disk
The response contains the full merged file. Diff it against the original for review, or write directly. Streaming supported for real-time preview.
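As a minimal sketch of steps 2 through 4, assuming the OpenAI SDK pointed at the Morph endpoint (the same setup as the integration examples below); the empty-result guard is illustrative, not part of the API:

TypeScript

import OpenAI from "openai";
import { readFileSync, writeFileSync } from "node:fs";

const client = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});

async function applyEdit(path: string, instruction: string, editSnippet: string) {
  // Step 2: send the original file, edit snippet, and instruction.
  const originalCode = readFileSync(path, "utf8");
  const response = await client.chat.completions.create({
    model: "auto",
    messages: [{
      role: "user",
      content: `<instruction>${instruction}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
    }],
  });
  // Step 3 happens server-side; the response is the full merged file.
  const merged = response.choices[0].message.content;
  if (!merged) throw new Error("Empty merge result"); // illustrative guard
  // Step 4: write directly, or diff against originalCode for review first.
  writeFileSync(path, merged);
}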
Why a 7B model?
Code merging is a well-defined task: take original + edit, produce merged output. It does not require world knowledge, multi-step reasoning, or long-horizon planning. A smaller model trained specifically on this task runs 100x faster than a general-purpose model and achieves higher accuracy, because it does not waste capacity on capabilities the merge step does not need.
Models & Pricing
| Model | Speed | Input $/1M | Output $/1M | Best For |
|---|---|---|---|---|
| morph-v3-fast | 10,500+ tok/s | $0.80 | $1.20 | Real-time edits, high throughput |
| morph-v3-large | 5,000+ tok/s | $0.90 | $1.90 | Complex multi-scope changes |
| auto | 5,000-10,500 tok/s | Varies | Varies | Recommended default |
Both models support 262K token context windows. The auto model routes between morph-v3-fast and morph-v3-large based on edit complexity. Use auto unless you have a specific latency requirement.
Cost comparison
Fast Apply is cheaper per token and uses fewer tokens per edit. The combined savings are significant at scale.
| | Fast Apply | Claude Sonnet | GPT-4o |
|---|---|---|---|
| Price (input/1M) | $0.80 | $15.00 | $10.00 |
| Speed | 10,500 tok/s | ~80 tok/s | ~100 tok/s |
| Tokens per edit | 700-1,400 | 3,500-4,500 | 3,500-4,500 |
| 500-line file | 0.8s | 5-6s | 4-5s |
| Merge accuracy | 98% | 95% | 92% |
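To make the table concrete, here is back-of-the-envelope arithmetic for a single edit, taking the midpoint of each token range and input pricing only (output pricing for the general-purpose models is not listed above, so the real gap is larger):

TypeScript

// Input cost for one edit, using midpoints of the token ranges above.
const fastApplyInput = 1_050 * (0.80 / 1e6); // ≈ $0.00084
const sonnetInput = 4_000 * (15.00 / 1e6);   // ≈ $0.06
console.log(sonnetInput / fastApplyInput);   // ≈ 71x cheaper on input alone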
API Reference
One endpoint. OpenAI-compatible. No new SDK required.
Endpoint
POST https://api.morphllm.com/v1/chat/completions
Authentication
Bearer token in the Authorization header. Get your API key from the dashboard.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | morph-v3-fast, morph-v3-large, or auto |
| messages | array | Yes | Single user message with XML-formatted content |
| stream | boolean | No | Enable streaming (default: false) |
| max_tokens | integer | No | Maximum output tokens |
| temperature | number | No | Sampling temperature (default: 0) |
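A minimal request body matching the table (placeholder content shown):

Request shape
{
  "model": "auto",
  "messages": [{
    "role": "user",
    "content": "<instruction>...</instruction><code>...</code><update>...</update>"
  }],
  "stream": false
}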
Response
Standard OpenAI chat completion response. The merged code is in response.choices[0].message.content. Token usage is reported in response.usage.
Response shape
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "morph-v3-fast",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "// The complete merged file content"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 847,
"completion_tokens": 312,
"total_tokens": 1159
}
}

Integration Examples
TypeScript (OpenAI SDK)
TypeScript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.MORPH_API_KEY,
baseURL: "https://api.morphllm.com/v1",
});
const originalCode = `function greet(name) {
return "Hello " + name;
}`;
const editSnippet = `function greet(name: string): string {
// ... existing code ...
}`;
const instruction = "Add TypeScript type annotations";
const response = await client.chat.completions.create({
model: "morph-v3-fast",
messages: [{
role: "user",
content: `<instruction>${instruction}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
}],
});
// Result: the complete merged file
const mergedCode = response.choices[0].message.content;
// function greet(name: string): string {
// return "Hello " + name;
// }

Python
Python
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://api.morphllm.com/v1"
)
original_code = """def process(items):
result = []
for item in items:
result.append(item.upper())
return result"""
edit_snippet = """def process(items: list[str]) -> list[str]:
# ... existing code ...
return [item.upper() for item in items]"""
instruction = "Add type hints and use list comprehension"
response = client.chat.completions.create(
model="morph-v3-fast",
messages=[{
"role": "user",
"content": f"<instruction>{instruction}</instruction>\n"
f"<code>{original_code}</code>\n"
f"<update>{edit_snippet}</update>"
}]
)
merged = response.choices[0].message.content

curl
curl
curl -X POST "https://api.morphllm.com/v1/chat/completions" \
-H "Authorization: Bearer $MORPH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "morph-v3-fast",
"messages": [{
"role": "user",
"content": "<instruction>Add error handling</instruction>\n<code>function divide(a, b) {\n return a / b;\n}</code>\n<update>function divide(a, b) {\n if (b === 0) throw new Error(\"Division by zero\");\n // ... existing code ...\n}</update>"
}],
"stream": false
}'

Streaming
Streaming (TypeScript)
const stream = await client.chat.completions.create({
model: "morph-v3-fast",
messages: [{
role: "user",
content: `<instruction>${instruction}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
}],
stream: true,
});
// Process chunks as they arrive
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) process.stdout.write(content);
}

The Edit Format
Fast Apply accepts three XML-style tags in the message content. This is the same format most coding agents already use when generating lazy edits.
| Tag | Required | Description |
|---|---|---|
| <instruction> | Optional | Description of the change. Should be model-generated, not hardcoded. |
| <code> | Yes | The complete original file content. |
| <update> | Yes | The edit snippet with // ... existing code ... markers for unchanged sections. |
The existing code marker
The key convention is // ... existing code .... This marker tells Fast Apply "keep everything that was here in the original." The edit only includes the lines that changed plus these markers.
Example: adding error handling to a function
// Original file (goes in <code> tag)
export async function fetchUser(id: string) {
const response = await fetch(`/api/users/${id}`);
const data = await response.json();
return data;
}
export async function deleteUser(id: string) {
await fetch(`/api/users/${id}`, { method: "DELETE" });
}

// Edit snippet (goes in <update> tag)
export async function fetchUser(id: string) {
const response = await fetch(`/api/users/${id}`);
if (!response.ok) {
throw new Error(`Failed to fetch user: ${response.status}`);
}
// ... existing code ...
}
// ... existing code ...

The merged output will contain fetchUser with the new error handling and the original const data line preserved, plus deleteUser unchanged. The markers handle both "keep the rest of this function" and "keep the rest of this file."
Critical rule
Always include // ... existing code ... markers for unchanged sections. Omitting the marker tells the model to delete that section. This is intentional: it lets you remove code by simply not including it, but it means you must always mark sections you want to keep.
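For example, reusing the fetchUser/deleteUser file from the example above, this snippet deletes deleteUser simply by not covering it with a marker (a sketch of the rule, not output from the API):

// Edit snippet that deletes deleteUser: nothing after fetchUser is
// marked as existing code, so everything after it is removed.
export async function fetchUser(id: string) {
  // ... existing code ...
}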
Benchmarks
Speed: tokens per second
| Model | tok/s | 500-line file |
|---|---|---|
| Morph Fast Apply | 10,500 | 0.8s |
| Llama 3.2 70B (Cerebras) | 2,600 | ~3s |
| Gemini 2.5 Flash | 275 | ~25s |
| GPT-4o | 90 | ~75s |
| Claude Sonnet | 80 | ~85s |
Accuracy: merge success rate
| Model | Accuracy |
|---|---|
| Morph (v3-large) | 98% |
| Claude 3.7 | 94% |
| GPT-4o | 92% |
| Claude 3.5 Sonnet | 91% |
| Qwen 2.5 Coder 32B | 87% |
| Gemini 2.0 Flash | 85% |
| GPT-4o-mini | 82% |
Fast Apply vs. search-and-replace
When paired with five frontier planning models, Fast Apply achieves 100% merge success. Search-and-replace with the same models achieves 84-96% success and requires 2-3.5x more retry turns on failures.
Where search-and-replace fails
Repetitive code structures (multiple similar functions), nested blocks (if/else inside try/catch inside loops), overlapping edit regions (two edits that touch the same lines), and context drift (file changed between edit generation and application). Fast Apply handles all of these because it processes the full file semantically, not by string matching.
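A sketch of the first failure mode: in the file below, a search block targeting the shared line matches both functions, so string matching either fails or edits the wrong one, while a semantic merge sees the full file and disambiguates.

TypeScript

// Repetitive structure: "return res.json();" appears twice, so a
// search-and-replace edit targeting that line is ambiguous.
async function getUser(id: string) {
  const res = await fetch(`/users/${id}`);
  return res.json();
}

async function getTeam(id: string) {
  const res = await fetch(`/teams/${id}`);
  return res.json();
}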
Architecture
Fast Apply is a 7B parameter model. The merge task is well-defined: given an original file and an edit snippet, produce the merged output. This bounded scope means a smaller model trained specifically on code merging outperforms general-purpose models that are 10-100x larger.
Custom CUDA kernels
Purpose-built inference kernels optimized for the merge operation. Not a generic inference engine.
Speculative decoding
Predicts multiple tokens ahead and validates in parallel. This is why 10,500 tok/s is possible with a 7B model.
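For intuition, here is a conceptual sketch of draft-and-verify decoding in general; Morph's kernels are not public, so this illustrates the technique, not their implementation:

TypeScript

// Conceptual draft-and-verify loop (greedy case). A cheap draft model
// proposes k tokens; the target model checks all k positions, which a
// real engine batches into one parallel pass, and keeps the longest
// agreeing prefix. Best case: k tokens per target pass instead of 1.
type NextToken = (context: number[]) => number;

function speculativeStep(
  draft: NextToken,
  target: NextToken,
  context: number[],
  k: number,
): number[] {
  // 1. Draft k tokens autoregressively (cheap).
  const ctx = [...context];
  for (let i = 0; i < k; i++) ctx.push(draft(ctx));
  const proposed = ctx.slice(context.length);

  // 2. Verify: accept while the target agrees; on the first mismatch,
  //    emit the target's token instead and stop.
  const accepted: number[] = [];
  const verifyCtx = [...context];
  for (const t of proposed) {
    const expected = target(verifyCtx);
    if (expected !== t) { accepted.push(expected); break; }
    accepted.push(t);
    verifyCtx.push(t);
  }
  return accepted;
}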
262K context
Handles files up to 262K tokens. Large enough for any single-file edit in practice.
Where Fast Apply fits in the stack
Morph provides three specialized models for coding agent infrastructure. Each handles a different mechanical task so the planning model can focus on reasoning.
| Model | Task | What it replaces |
|---|---|---|
| Fast Apply | Code merging | Full-file rewrites, search-and-replace, diff/patch |
| WarpGrep | Code search | grep/ripgrep with manual result filtering |
| Embeddings | Code retrieval | Static vector search pipelines |
Use Cases
Building a coding agent
If you are building an AI coding agent or IDE plugin, Fast Apply replaces the file-writing layer. Your planning model generates edits in the lazy format (which is the default output format of Claude, GPT, and most coding-oriented models). Fast Apply merges them into the target file. This separation lets you use the best planning model for reasoning and a specialized model for the mechanical merge.
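A sketch of that separation, assuming an OpenAI-compatible planning model; the planner prompt and model name are illustrative, not a prescribed setup:

TypeScript

import OpenAI from "openai";

const planner = new OpenAI(); // your reasoning model (OpenAI-compatible)
const morph = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});

async function editFile(originalCode: string, request: string): Promise<string> {
  // 1. The planning model decides WHAT to change, as a lazy edit snippet.
  const plan = await planner.chat.completions.create({
    model: "gpt-4o",
    messages: [{
      role: "user",
      content: `Produce an edit snippet for this request, using\n` +
        `"// ... existing code ..." markers for unchanged sections.\n` +
        `Request: ${request}\n\n${originalCode}`,
    }],
  });
  const editSnippet = plan.choices[0].message.content ?? "";

  // 2. Fast Apply performs the mechanical merge.
  const merge = await morph.chat.completions.create({
    model: "auto",
    messages: [{
      role: "user",
      content: `<instruction>${request}</instruction>
<code>${originalCode}</code>
<update>${editSnippet}</update>`,
    }],
  });
  return merge.choices[0].message.content ?? "";
}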
Replacing full-file rewrites
If your agent currently asks the LLM to rewrite the entire file for every edit, Fast Apply cuts token usage by 50-60% and latency by 90%+. A 1,000-line file takes 1.3 seconds instead of 10-12 seconds.
Replacing search-and-replace
If your agent uses search-and-replace to apply edits, Fast Apply eliminates the retry loops caused by context drift and ambiguous matches. The 100% merge success rate (across five frontier models) vs. 84-96% for search-and-replace means fewer failed edits and less wasted compute.
MCP tool in Claude Code or Cursor
Fast Apply is available as an MCP tool that drops into Claude Code, Cursor, and any MCP-compatible environment. The agent calls it as a tool. No code changes to your agent required.
Getting Started
Four steps. Under five minutes.
1. Get an API key
Sign up at morphllm.com/dashboard/api-keys. The free tier includes 250K credits ($2.50 worth).
2. Install the SDK
Install
# Python
pip install openai
# TypeScript / Node.js
npm install openai

3. Make your first call
Test it
curl -X POST "https://api.morphllm.com/v1/chat/completions" \
-H "Authorization: Bearer $MORPH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "morph-v3-fast",
"messages": [{
"role": "user",
"content": "<instruction>Add a return type annotation</instruction>\n<code>function add(a: number, b: number) {\n return a + b;\n}</code>\n<update>function add(a: number, b: number): number {\n // ... existing code ...\n}</update>"
}]
}'

4. Try the playground
Test edits interactively at morphllm.com/dashboard/playground/apply before integrating into your agent.
Frequently Asked Questions
What is Fast Apply?
A 7B model trained on code merging. It takes an original file, an edit snippet with "existing code" markers, and an instruction. It returns the complete merged file at 10,500 tokens per second with 98% accuracy.
How much does it cost?
morph-v3-fast: $0.80/M input, $1.20/M output. morph-v3-large: $0.90/M input, $1.90/M output. The free tier includes $2.50 worth of credits. Paid plans start at $20/month with $20 in credits included.
Which model should I use?
Use auto (recommended). It routes between morph-v3-fast and morph-v3-large based on edit complexity. Use morph-v3-fast if you need consistent sub-second latency. Use morph-v3-large for complex multi-scope refactors where accuracy matters more than speed.
Do I need to change my SDK?
No. The API is OpenAI-compatible. If you already use the OpenAI SDK (Python or TypeScript), just change the baseURL and apiKey. No new dependencies.
What languages does it support?
Fast Apply works with any programming language. It is trained on code merging across all major languages: Python, TypeScript, JavaScript, Go, Rust, Java, C++, Ruby, PHP, Swift, Kotlin, and more.
Does it support streaming?
Yes. Set stream: true. Useful for showing real-time diffs as the merged file streams back, especially on files over 500 lines.
What if the merge fails?
At 98% accuracy, most edits merge cleanly on the first attempt. When a merge does fail, it is typically because the edit snippet was ambiguous or conflicted with the original. The response will still contain a best-effort merge. You can retry with a more specific instruction or fall back to the morph-v3-large model.
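One way to implement that fallback, as a sketch: the validation step (a parse, type check, or diff review) is your own code, not an API feature.

TypeScript

import OpenAI from "openai";

const morph = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});

// Try the fast model first; rerun on morph-v3-large if the merged file
// fails a caller-supplied check (e.g., it no longer parses).
async function applyWithFallback(
  content: string,
  validate: (merged: string) => boolean,
): Promise<string> {
  for (const model of ["morph-v3-fast", "morph-v3-large"]) {
    const res = await morph.chat.completions.create({
      model,
      messages: [{ role: "user", content }],
    });
    const merged = res.choices[0].message.content ?? "";
    if (validate(merged)) return merged;
  }
  throw new Error("Merge failed validation on both models");
}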
Is there an MCP integration?
Yes. Fast Apply is available as an MCP (Model Context Protocol) tool that works with Claude Code, Cursor, and other MCP-compatible environments. See the docs for setup instructions.
Start merging code at 10,500 tok/s
Free tier included. OpenAI-compatible API. No new SDK required. Get an API key and make your first call in under five minutes.