Zed's code extraction covers three areas: pulling code out of the editor buffer, extracting code blocks, and parsing metadata from LLM output. Here's how it works, illustrated with TypeScript examples:
Concept: Before sending a request to the LLM, Zed extracts the relevant code around the user's cursor, using the Buffer API to get text and position information.
TypeScript Analogy:
interface TextRange {
  start: number;
  end: number;
}
function extractCodeSnippet(
  bufferText: string,
  selectionRange: TextRange
): string {
  const codeBefore = bufferText.slice(0, selectionRange.start);
  const codeAfter = bufferText.slice(selectionRange.end);
  return `${codeBefore}\n<selection>\n${codeAfter}`;
}
const bufferText = `
function add(a: number, b: number) {
  console.log("start");
  const sum = a + b;
  return sum;
}
console.log("end")
`;
// Select the `const sum = a + b;` line (the offsets include its surrounding newlines).
const selectionRange = { start: 61, end: 83 };
const codeSnippet = extractCodeSnippet(bufferText, selectionRange);
// `codeSnippet` will be:
// `
// function add(a: number, b: number) {
//   console.log("start");
// <selection>
//   return sum;
// }
// console.log("end")
// `
Zed Implementation: This is reflected in BufferCodegen's logic, where the code extracts the text around the selection range and inserts it into the LanguageModelRequest.
Concept: Zed parses the buffer using tree-sitter to understand the structure of code. This is particularly important when the user has a code block selected, because it allows extracting the text from the code block instead of sending the surrounding fence (e.g., ```).
TypeScript Analogy:
interface CodeBlockNode {
  start: number;
  end: number;
  kind: string;
  children?: CodeBlockNode[];
}
function extractCodeFromBlock(
  bufferText: string,
  offset: number,
  rootNode: CodeBlockNode
): string | null {
  if (rootNode.kind === "fenced_code_block") {
    for (const child of rootNode.children || []) {
      if (child.kind === "code_fence_content" && child.start <= offset && offset <= child.end) {
        return bufferText.slice(child.start, child.end);
      }
    }
  }
  return null;
}
const bufferText = `
\`\`\`typescript
function add(a: number, b: number): number {
  return a + b;
}
\`\`\`
`;
const rootNode: CodeBlockNode = {
  kind: "fenced_code_block",
  start: 0,
  end: 82,
  children: [{
    kind: "code_fence_content",
    start: 15,
    end: 78
  }]
};
const code = extractCodeFromBlock(bufferText, 40, rootNode);
// `code` will be:
// `
// function add(a: number, b: number): number {
//   return a + b;
// }
// `
Zed Implementation: This logic is used in find_surrounding_code_block in crates/assistant/src/assistant_panel.rs and in crates/assistant2/src/buffer_codegen.rs.
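In a real syntax tree the lookup usually descends from the root rather than checking a single level. A recursive sketch of that search — illustrative only, not Zed's actual tree-sitter bindings:

```typescript
interface SyntaxNode {
  kind: string;
  start: number;
  end: number;
  children?: SyntaxNode[];
}

// Depth-first search for the innermost node of `kind` containing `offset`.
function findEnclosingNode(
  node: SyntaxNode,
  offset: number,
  kind: string
): SyntaxNode | null {
  if (offset < node.start || offset > node.end) return null;
  for (const child of node.children ?? []) {
    const found = findEnclosingNode(child, offset, kind);
    if (found) return found;
  }
  return node.kind === kind ? node : null;
}

const tree: SyntaxNode = {
  kind: "document",
  start: 0,
  end: 100,
  children: [
    {
      kind: "fenced_code_block",
      start: 1,
      end: 82,
      children: [{ kind: "code_fence_content", start: 15, end: 78 }],
    },
  ],
};
console.log(findEnclosingNode(tree, 40, "code_fence_content")?.start); // 15
```

Because the search recurses before testing the current node, it returns the innermost matching node, which is what you want when code blocks can nest.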
Concept: When the LLM indicates a tool should be used, Zed extracts the tool name and input parameters from the completion.
TypeScript Analogy:
interface ToolUseEvent {
  toolName: string;
  input: string;
}
function extractToolUse(llmOutput: string): ToolUseEvent | null {
  const toolUseRegex = /<tool name="(.*?)".*?input="(.*?)"/;
  const match = toolUseRegex.exec(llmOutput);
  if (match) {
    return {
      toolName: match[1],
      input: match[2],
    };
  } else {
    return null;
  }
}
const llmOutput = 'This is a text with <tool name="search" input="hello world"/>';
const toolUse = extractToolUse(llmOutput);
// toolUse will be: { toolName: 'search', input: 'hello world' }
Zed Implementation: This process is done in crates/assistant2/src/thread.rs and crates/language_models/src/provider/anthropic.rs.
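Once a ToolUseEvent has been extracted, it still has to be routed to the code that implements the tool. A minimal sketch of such a dispatch step — the registry shape is an assumption for illustration, not Zed's actual design:

```typescript
interface ToolUseEvent {
  toolName: string;
  input: string;
}

// Hypothetical registry mapping tool names to handler functions.
type ToolHandler = (input: string) => string;

const toolRegistry: Record<string, ToolHandler> = {
  search: (input) => `searched for: ${input}`,
};

function dispatchToolUse(event: ToolUseEvent): string {
  const handler = toolRegistry[event.toolName];
  if (!handler) {
    throw new Error(`Unknown tool: ${event.toolName}`);
  }
  return handler(event.input);
}

console.log(dispatchToolUse({ toolName: "search", input: "hello world" }));
// "searched for: hello world"
```

Keeping extraction and dispatch separate means a malformed tool name fails loudly at dispatch time rather than silently during parsing.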
Concept: When performing a code edit, Zed expects a JSON-like output from the LLM, which it then deserializes using serde.
TypeScript Analogy:
interface EditInstruction {
  path: string;
  oldText: string;
  newText: string;
  operation: "insert" | "replace" | "delete";
}
function parseEditInstructions(llmOutput: string): EditInstruction[] {
  try {
    return JSON.parse(llmOutput);
  } catch (e) {
    console.error("Failed to parse", llmOutput);
    return [];
  }
}
const llmOutput = `
[
{
"path": "src/main.rs",
"oldText": "console.log",
"newText": "console.info",
"operation": "replace"
}
]
`;
const editInstructions = parseEditInstructions(llmOutput);
// `editInstructions` will be:
// `
// [
// {
// path: 'src/main.rs',
// oldText: 'console.log',
// newText: 'console.info',
// operation: 'replace'
// }
// ]
// `
Zed Implementation: This logic can be found in crates/assistant2/src/buffer_codegen.rs.
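After deserialization, each instruction still has to be applied to the file contents. A minimal sketch of applying a single instruction, repeating the EditInstruction interface for self-containment — this helper is illustrative, not Zed's implementation:

```typescript
interface EditInstruction {
  path: string;
  oldText: string;
  newText: string;
  operation: "insert" | "replace" | "delete";
}

// Apply one instruction to the given file contents.
// Only the first occurrence of `oldText` is affected.
function applyEdit(text: string, edit: EditInstruction): string {
  switch (edit.operation) {
    case "replace":
      return text.replace(edit.oldText, edit.newText);
    case "delete":
      return text.replace(edit.oldText, "");
    case "insert":
      // Insert the new text immediately after `oldText`.
      return text.replace(edit.oldText, edit.oldText + edit.newText);
  }
}

const updated = applyEdit('console.log("hi");', {
  path: "src/main.rs",
  oldText: "console.log",
  newText: "console.info",
  operation: "replace",
});
console.log(updated); // 'console.info("hi");'
```

A real implementation would also verify that `oldText` actually occurs (and occurs exactly once) before editing, so a stale instruction can be rejected instead of silently doing nothing.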
Concept: Zed also parses XML-like tags in LLM output to extract structured information about it, such as whether it contains a patch or a title.
TypeScript Analogy:
interface XmlTag {
  kind: string;
  range: { start: number; end: number };
  isOpenTag: boolean;
}
function parseXmlTags(text: string): XmlTag[] {
  const xmlTagRegex = /<(\/)?(\w+)(.*?)>/g;
  const tags: XmlTag[] = [];
  let match: RegExpExecArray | null;
  while ((match = xmlTagRegex.exec(text)) !== null) {
    const isOpenTag = match[1] !== "/";
    tags.push({
      kind: match[2],
      range: { start: match.index, end: match.index + match[0].length },
      isOpenTag,
    });
  }
  return tags;
}
const text = `
<patch>
<title>Refactor foo</title>
<edit>
<path>src/main.rs</path>
<oldText>console.log</oldText>
<newText>console.info</newText>
<operation>replace</operation>
</edit>
</patch>
`;
const tags = parseXmlTags(text);
// tags will contain objects representing the different tags in the xml-like text.
Zed Implementation: The parsing of these XML tags is found in crates/assistant2/src/context.rs.
Zed uses a PromptBuilder to dynamically create prompts. Here's how the prompt engineering happens:
Context: the selected code is taken from the MultiBuffer and its Range<Offset> and converted into text.
User intent: the user_prompt is included in the generated prompt, allowing the user to guide the LLM's behavior. The user prompt can be as little as the key binding that was pressed, or text written in the editor.
Summaries: conversation summaries are generated in crates/assistant2/src/thread.rs using the summarize function.
TypeScript Analogy:
function buildEditPrompt(
  codeBefore: string,
  userPrompt: string,
  codeAfter: string,
  language?: string
): string {
  const openFence = language ? `\`\`\`${language}` : "```";
  return `
Given the following code snippet, perform the requested change.
${openFence}
${codeBefore}
\`\`\`
User: ${userPrompt}
${openFence}
${codeAfter}
\`\`\`
`;
}
const prompt = buildEditPrompt(
`
function add(a: number, b: number) {
console.log("start");`,
"change console.log to console.info",
`
return sum;
}
`,
"typescript"
);
// prompt will be a string that combines the context with the user's instruction.
Zed Implementation: The summarize function in crates/assistant2/src/thread.rs adds a system prompt that instructs the LLM to summarize the conversation in a few words.