Zed's code extraction covers three areas: pulling code out of the buffer, pulling code out of code blocks, and extracting metadata from LLM output. Here's how it works, illustrated with TypeScript examples:
Concept: Before sending to the LLM, Zed extracts relevant code snippets around the user's cursor, using the Buffer's API to get text and position information.
TypeScript Analogy:
```typescript
interface TextRange {
  start: number;
  end: number;
}

function extractCodeSnippet(
  bufferText: string,
  selectionRange: TextRange
): string {
  const codeBefore = bufferText.slice(0, selectionRange.start);
  const codeAfter = bufferText.slice(selectionRange.end);
  // Replace the selected text with a marker the LLM can target.
  return `${codeBefore}\n<selection>\n${codeAfter}`;
}

const bufferText = `
function add(a: number, b: number) {
  console.log("start");
  const sum = a + b;
  return sum;
}
console.log("end")
`;
// The range covers the `const sum = a + b;` line, including its surrounding newlines.
const selectionRange = { start: 61, end: 83 };
const codeSnippet = extractCodeSnippet(bufferText, selectionRange);
// `codeSnippet` will be:
// `
// function add(a: number, b: number) {
//   console.log("start");
// <selection>
//   return sum;
// }
// console.log("end")
// `
```
Zed Implementation: This is reflected in BufferCodegen's logic, which extracts the text around the selection range and inserts it into the LanguageModelRequest.
Concept: Zed parses the buffer using tree-sitter to understand the structure of code. This is particularly important when the user has a code block selected, because it lets Zed extract the text inside the code block rather than sending the surrounding fence (e.g., ```) along with it.
TypeScript Analogy:
```typescript
interface CodeBlockNode {
  start: number;
  end: number;
  kind: string;
  children?: CodeBlockNode[];
}

function extractCodeFromBlock(
  bufferText: string,
  offset: number,
  rootNode: CodeBlockNode
): string | null {
  if (rootNode.kind === "fenced_code_block") {
    for (const child of rootNode.children ?? []) {
      if (
        child.kind === "code_fence_content" &&
        child.start <= offset &&
        offset <= child.end
      ) {
        return bufferText.slice(child.start, child.end);
      }
    }
  }
  return null;
}

const bufferText = `
\`\`\`typescript
function add(a: number, b: number): number {
  return a + b;
}
\`\`\`
`;
const rootNode: CodeBlockNode = {
  kind: "fenced_code_block",
  start: 0,
  end: 82,
  children: [
    {
      kind: "code_fence_content",
      start: 15,
      end: 78,
    },
  ],
};
const code = extractCodeFromBlock(bufferText, 40, rootNode);
// `code` will be:
// `function add(a: number, b: number): number {
//   return a + b;
// }
// `
```
Zed Implementation: This logic is used in find_surrounding_code_block in crates/assistant/src/assistant_panel.rs and in crates/assistant2/src/buffer_codegen.rs.
Concept: When the LLM indicates a tool should be used, Zed extracts the tool name and input parameters from the completion.
TypeScript Analogy:
```typescript
interface ToolUseEvent {
  toolName: string;
  input: string;
}

function extractToolUse(llmOutput: string): ToolUseEvent | null {
  // Accept either quote style around attribute values.
  const toolUseRegex = /<tool name=["'](.*?)["'].*?input=["'](.*?)["']/;
  const match = toolUseRegex.exec(llmOutput);
  if (match) {
    return {
      toolName: match[1],
      input: match[2],
    };
  }
  return null;
}

const llmOutput = "This is a text with <tool name='search' input='hello world'/>";
const toolUse = extractToolUse(llmOutput);
// toolUse will be: { toolName: 'search', input: 'hello world' }
```
Zed Implementation: This process happens in crates/assistant2/src/thread.rs and crates/language_models/src/provider/anthropic.rs.
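In practice, tool inputs are usually structured JSON rather than a flat string. Here is a minimal sketch of decoding such an input; the `SearchToolInput` shape and the helper name are hypothetical illustrations, not Zed's actual types:

```typescript
interface SearchToolInput {
  query: string;
  maxResults?: number;
}

// Hypothetical helper: decode a tool's JSON input, returning null
// when the output is malformed or missing required fields.
function parseSearchInput(rawInput: string): SearchToolInput | null {
  try {
    const parsed = JSON.parse(rawInput);
    if (
      typeof parsed !== "object" ||
      parsed === null ||
      typeof parsed.query !== "string"
    ) {
      return null;
    }
    return parsed as SearchToolInput;
  } catch {
    return null;
  }
}
```

Returning null instead of throwing lets the caller fall back gracefully, for example by asking the model to retry with corrected input.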
Concept: When performing a code edit, Zed expects a JSON-like output from the LLM, which it then deserializes using serde.
TypeScript Analogy:
```typescript
interface EditInstruction {
  path: string;
  oldText: string;
  newText: string;
  operation: "insert" | "replace" | "delete";
}

function parseEditInstructions(llmOutput: string): EditInstruction[] {
  try {
    return JSON.parse(llmOutput);
  } catch (e) {
    console.error("Failed to parse edit instructions:", llmOutput);
    return [];
  }
}

const llmOutput = `
[
  {
    "path": "src/main.rs",
    "oldText": "console.log",
    "newText": "console.info",
    "operation": "replace"
  }
]
`;
const editInstructions = parseEditInstructions(llmOutput);
// `editInstructions` will be:
// [
//   {
//     path: 'src/main.rs',
//     oldText: 'console.log',
//     newText: 'console.info',
//     operation: 'replace'
//   }
// ]
```
Zed Implementation: This logic can be found in crates/assistant2/src/buffer_codegen.rs.
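Note that `JSON.parse` alone does not guarantee the result actually matches the `EditInstruction` shape. Here is a hedged sketch of runtime validation, mirroring what serde's typed deserialization gives Rust for free; the helper names are illustrative:

```typescript
interface EditInstruction {
  path: string;
  oldText: string;
  newText: string;
  operation: "insert" | "replace" | "delete";
}

// Runtime shape check: serde rejects mismatched JSON at deserialization
// time, but in TypeScript the shape must be validated by hand.
function isEditInstruction(value: unknown): value is EditInstruction {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.path === "string" &&
    typeof v.oldText === "string" &&
    typeof v.newText === "string" &&
    (v.operation === "insert" ||
      v.operation === "replace" ||
      v.operation === "delete")
  );
}

// Parse and keep only well-formed instructions.
function parseEditInstructionsStrict(llmOutput: string): EditInstruction[] {
  try {
    const parsed = JSON.parse(llmOutput);
    return Array.isArray(parsed) ? parsed.filter(isEditInstruction) : [];
  } catch {
    return [];
  }
}
```

Filtering malformed entries rather than failing the whole batch means one bad instruction from the LLM does not discard the valid ones.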
Concept: Zed also parses XML-like tags in LLM output to recover structured information about it, such as whether the output contains a patch or a title.
TypeScript Analogy:
```typescript
interface XmlTag {
  kind: string;
  range: { start: number; end: number };
  isOpenTag: boolean;
}

function parseXmlTags(text: string): XmlTag[] {
  const xmlTagRegex = /<(\/)?(\w+)(.*?)>/g;
  const tags: XmlTag[] = [];
  let match: RegExpExecArray | null;
  while ((match = xmlTagRegex.exec(text)) !== null) {
    // match[1] is "/" for closing tags and undefined for opening tags.
    const isOpenTag = match[1] !== "/";
    tags.push({
      kind: match[2],
      range: { start: match.index, end: match.index + match[0].length },
      isOpenTag,
    });
  }
  return tags;
}

const text = `
<patch>
<title>Refactor foo</title>
<edit>
<path>src/main.rs</path>
<oldText>console.log</oldText>
<newText>console.info</newText>
<operation>replace</operation>
</edit>
</patch>
`;
const tags = parseXmlTags(text);
// `tags` will contain one entry per tag in the XML-like text.
```
Zed Implementation: The parsing of these XML tags is found in crates/assistant2/src/context.rs.
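Once the tags are parsed, pairing an opening tag with its closing tag yields the text between them (e.g., a patch title). A minimal sketch of that step; the pairing logic here is illustrative, not Zed's actual implementation:

```typescript
interface XmlTag {
  kind: string;
  range: { start: number; end: number };
  isOpenTag: boolean;
}

// Return the text between the first <kind> ... </kind> pair,
// or null if either tag is missing or out of order.
function extractTagContent(
  text: string,
  tags: XmlTag[],
  kind: string
): string | null {
  const open = tags.find((t) => t.kind === kind && t.isOpenTag);
  const close = tags.find((t) => t.kind === kind && !t.isOpenTag);
  if (!open || !close || close.range.start < open.range.end) {
    return null;
  }
  return text.slice(open.range.end, close.range.start);
}
```

This is enough for flat, non-repeating tags; handling nested or repeated tags would require a stack-based matcher.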
Zed uses a PromptBuilder to dynamically create prompts. Here's how the prompt engineering happens:

- Code context is extracted from the MultiBuffer and a Range<Offset> and converted into text.
- The user_prompt is included in the generated prompt, allowing the user to guide the LLM's behavior. The user prompt can be just the key binding that is pressed or text written in the editor.
- Conversation summarization happens in crates/assistant2/src/thread.rs using the summarize function.

TypeScript Analogy:

```typescript
function buildEditPrompt(
  codeBefore: string,
  userPrompt: string,
  codeAfter: string,
  language?: string
): string {
  // Open each snippet with a language-tagged fence; close with a bare fence.
  const openFence = language ? `\`\`\`${language}` : `\`\`\``;
  const closeFence = `\`\`\``;
  return `
Given the following code snippet, perform the requested change.
${openFence}
${codeBefore}
${closeFence}
User: ${userPrompt}
${openFence}
${codeAfter}
${closeFence}
`;
}
```
```typescript
const prompt = buildEditPrompt(
  `function add(a: number, b: number) {
  console.log("start");`,
  "change console.log to console.info",
  `  return sum;
}`,
  "typescript"
);
// `prompt` combines the code context with the user's instruction.
```
Zed Implementation: Conversation summarization uses the summarize function in crates/assistant2/src/thread.rs, which adds a system prompt that instructs the LLM to summarize the conversation in a few words.
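As a rough sketch, such a summarization request can be modeled as appending a system message to the conversation; the message shape and wording here are assumptions, not Zed's actual prompt:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: build a request that asks the model for a short
// title summarizing the conversation so far. The original conversation
// array is left untouched.
function buildSummarizeRequest(conversation: ChatMessage[]): ChatMessage[] {
  return [
    ...conversation,
    {
      role: "system",
      content:
        "Summarize the conversation above in a few words, suitable as a thread title.",
    },
  ];
}
```

Keeping the summarization instruction as a trailing system message means the same conversation history can be reused unchanged for both normal completions and title generation.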