Zed's code extraction covers code from the buffer, code blocks, and metadata from LLM output. Each step is described below and illustrated with a TypeScript analogy:
Concept: Before sending to the LLM, Zed extracts relevant code snippets around the user's cursor. This is done by using the Buffer's API to get text and position information.
TypeScript Analogy:
interface TextRange {
  start: number;
  end: number;
}
function extractCodeSnippet(
    bufferText: string,
    selectionRange: TextRange
): string {
    const codeBefore = bufferText.slice(0, selectionRange.start);
    const codeAfter = bufferText.slice(selectionRange.end);
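    // Put a marker where the selection was; the selected text itself is omitted.
    return `${codeBefore}<selection>${codeAfter}`;
}
const bufferText = `
    function add(a: number, b: number) {
      console.log("start");
      const sum = a + b;
      return sum;
    }
    console.log("end")
  `;
// Derive the offsets from the text so the range covers exactly `const sum = a + b;`.
const selectedText = "const sum = a + b;";
const selectionStart = bufferText.indexOf(selectedText);
const selectionRange = { start: selectionStart, end: selectionStart + selectedText.length };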
const codeSnippet = extractCodeSnippet(bufferText, selectionRange);
// `codeSnippet` will be:
// `
//     function add(a: number, b: number) {
//       console.log("start");
//       <selection>
//       return sum;
//     }
//     console.log("end")
//   `
Zed Implementation: This is reflected in BufferCodegen's logic, where the code extracts the text around the selection range and inserts it into the LanguageModelRequest.
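The analogy above passes the whole buffer on either side of the selection. A minimal sketch of limiting that context to a few lines around the selection follows; the helper name `extractContext`, the `SelectionContext` shape, and the line-based budget are assumptions made for illustration, not Zed's actual behavior.

```typescript
interface TextRange {
  start: number; // inclusive character offset
  end: number; // exclusive character offset
}

// Hypothetical result shape: the selection plus a bounded amount of surrounding text.
interface SelectionContext {
  before: string;
  selected: string;
  after: string;
}

// Hypothetical helper: keeps at most `maxLines` full lines on each side of the selection,
// plus the partial line the selection starts or ends on.
function extractContext(
  bufferText: string,
  selection: TextRange,
  maxLines: number
): SelectionContext {
  const beforeLines = bufferText.slice(0, selection.start).split("\n");
  const afterLines = bufferText.slice(selection.end).split("\n");
  return {
    before: beforeLines.slice(-(maxLines + 1)).join("\n"),
    selected: bufferText.slice(selection.start, selection.end),
    after: afterLines.slice(0, maxLines + 1).join("\n"),
  };
}

const sampleSource = ["line 1", "line 2", "line 3", "TARGET", "line 5", "line 6", "line 7"].join("\n");
const targetStart = sampleSource.indexOf("TARGET");
const selectionContext = extractContext(
  sampleSource,
  { start: targetStart, end: targetStart + "TARGET".length },
  1
);
console.log(selectionContext);
```

With a budget of one line, only `line 3` and `line 5` survive as context; the rest of the buffer is dropped before anything is sent to the model.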
Concept: Zed parses the buffer using tree-sitter to understand the structure of code. This is particularly important when the user has a code block selected, because it allows extracting the text from the code block instead of sending the surrounding fence (e.g., ```).
TypeScript Analogy:
interface CodeBlockNode {
  start: number;
  end: number;
  kind: string;
  children?: CodeBlockNode[];
}
function extractCodeFromBlock(bufferText: string, offset: number, rootNode: CodeBlockNode): string | null {
  if (rootNode.kind === "fenced_code_block") {
    for (const child of rootNode.children || []) {
      if (child.kind === "code_fence_content" && child.start <= offset && offset <= child.end) {
        return bufferText.slice(child.start, child.end);
      }
    }
  }
  return null;
}
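const bufferText = `
    \`\`\`typescript
    function add(a: number, b: number): number {
      return a + b;
    }
    \`\`\`
  `;
// Derive the content range from the text instead of hard-coding offsets:
// it starts after the opening fence line and ends at the newline before the closing fence.
const fence = "`".repeat(3);
const contentStart = bufferText.indexOf("\n", bufferText.indexOf(fence)) + 1;
const contentEnd = bufferText.lastIndexOf("\n", bufferText.lastIndexOf(fence));
const rootNode: CodeBlockNode = {
  kind: "fenced_code_block",
  start: bufferText.indexOf(fence),
  end: bufferText.lastIndexOf(fence) + fence.length,
  children: [{
    kind: "code_fence_content",
    start: contentStart,
    end: contentEnd,
  }],
};
const code = extractCodeFromBlock(bufferText, 40, rootNode);
// code will be the three lines between the fences (indentation preserved):
//     function add(a: number, b: number): number {
//       return a + b;
//     }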
Zed Implementation: This logic is used in find_surrounding_code_block in crates/assistant/src/assistant_panel.rs and in crates/assistant2/src/buffer_codegen.rs.
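The analogy above only checks a root node that is itself the fenced block. In a parsed document the fenced block usually sits somewhere deeper in the tree, so a depth-first search is a natural extension. The sketch below assumes a simplified `SyntaxNode` shape and a hypothetical `findFenceContent` helper; tree-sitter's real node API is different.

```typescript
// Simplified, illustrative node shape (not tree-sitter's actual API).
interface SyntaxNode {
  kind: string;
  start: number;
  end: number;
  children?: SyntaxNode[];
}

// Depth-first search for the "code_fence_content" node that contains `offset`.
function findFenceContent(node: SyntaxNode, offset: number): SyntaxNode | null {
  if (offset < node.start || offset > node.end) {
    return null; // the offset lies outside this subtree
  }
  for (const child of node.children ?? []) {
    const found = findFenceContent(child, offset);
    if (found) return found;
  }
  return node.kind === "code_fence_content" ? node : null;
}

// Build the sample text without writing a literal fence in this example.
const fenceMarker = "`".repeat(3);
const sampleMarkdown = ["Intro", fenceMarker + "ts", "let x = 1;", fenceMarker, "Outro"].join("\n");
const fenceContentStart = sampleMarkdown.indexOf("let");
const fenceContentEnd = sampleMarkdown.lastIndexOf(fenceMarker);

const documentNode: SyntaxNode = {
  kind: "document",
  start: 0,
  end: sampleMarkdown.length,
  children: [
    { kind: "paragraph", start: 0, end: 5 },
    {
      kind: "fenced_code_block",
      start: sampleMarkdown.indexOf(fenceMarker),
      end: fenceContentEnd + fenceMarker.length,
      children: [{ kind: "code_fence_content", start: fenceContentStart, end: fenceContentEnd }],
    },
  ],
};

const insideFence = findFenceContent(documentNode, fenceContentStart + 2);
const fencedCode = insideFence ? sampleMarkdown.slice(insideFence.start, insideFence.end) : null;
const outsideFence = findFenceContent(documentNode, 2); // cursor in "Intro"
console.log(fencedCode, outsideFence);
```

A cursor inside the block yields the fenced text without the fence lines; a cursor in the surrounding prose yields `null`, which a caller could treat as "no code block here".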
Concept: When the LLM indicates a tool should be used, Zed extracts the tool name and input parameters from the completion.
TypeScript Analogy:
interface ToolUseEvent {
  toolName: string;
  input: string;
}
function extractToolUse(llmOutput: string): ToolUseEvent | null {
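  // Accept single or double quotes around attribute values; \1 and \3 match the opening quote.
  const toolUseRegex = /<tool\s+name=(["'])(.*?)\1[^>]*?\binput=(["'])(.*?)\3/;
  const match = toolUseRegex.exec(llmOutput);
  if (match) {
    return {
      toolName: match[2],
      input: match[4],
    };
  }
  return null;
}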
const llmOutput = "This is a text with <tool name='search' input='hello world'/>";
const toolUse = extractToolUse(llmOutput);
// toolUse will be: { toolName: 'search', input: 'hello world' }
Zed Implementation: This process is done in crates/assistant2/src/thread.rs and crates/language_models/src/provider/anthropic.rs.
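The regex above is only an analogy. Streaming model APIs commonly deliver tool calls as structured events, with the JSON arguments arriving in pieces. The sketch below shows how such pieces could be accumulated and parsed once a call is complete. The `StreamEvent` variants, the `ToolCall` shape, and `collectToolCalls` are hypothetical names for illustration; they are not Zed's or any provider's actual types.

```typescript
// Hypothetical event shapes, invented for this example.
type StreamEvent =
  | { type: "text"; text: string }
  | { type: "tool_use_start"; id: string; toolName: string }
  | { type: "tool_input_delta"; id: string; partialJson: string }
  | { type: "tool_use_end"; id: string };

interface ToolCall {
  id: string;
  toolName: string;
  input: unknown; // parsed JSON arguments
}

// Accumulates argument fragments per tool call and parses them when the call ends.
// JSON.parse throws on malformed arguments; a real caller would decide how to handle that.
function collectToolCalls(events: StreamEvent[]): ToolCall[] {
  const pending = new Map<string, { toolName: string; json: string }>();
  const completed: ToolCall[] = [];
  for (const ev of events) {
    switch (ev.type) {
      case "tool_use_start":
        pending.set(ev.id, { toolName: ev.toolName, json: "" });
        break;
      case "tool_input_delta": {
        const entry = pending.get(ev.id);
        if (entry) entry.json += ev.partialJson;
        break;
      }
      case "tool_use_end": {
        const entry = pending.get(ev.id);
        if (!entry) break;
        pending.delete(ev.id);
        // An empty argument string is treated as an empty object.
        completed.push({ id: ev.id, toolName: entry.toolName, input: JSON.parse(entry.json || "{}") });
        break;
      }
      default:
        break; // plain text events carry no tool information
    }
  }
  return completed;
}

const sampleEvents: StreamEvent[] = [
  { type: "text", text: "Let me search for that." },
  { type: "tool_use_start", id: "t1", toolName: "search" },
  { type: "tool_input_delta", id: "t1", partialJson: '{"query":' },
  { type: "tool_input_delta", id: "t1", partialJson: '"hello world"}' },
  { type: "tool_use_end", id: "t1" },
];
const toolCalls = collectToolCalls(sampleEvents);
console.log(toolCalls);
```

Keying the pending state by call id lets fragments from several in-flight tool calls interleave without being mixed up.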
Concept: When performing a code edit, Zed expects a JSON-like output from the LLM, which it then deserializes using serde.
TypeScript Analogy:
interface EditInstruction {
  path: string;
  oldText: string;
  newText: string;
  operation: "insert" | "replace" | "delete";
}
function parseEditInstructions(llmOutput: string): EditInstruction[] {
  try {
    return JSON.parse(llmOutput);
  } catch (e) {
    console.error("Failed to parse", llmOutput);
    return [];
  }
}
const llmOutput = `
    [
      {
        "path": "src/main.rs",
        "oldText": "console.log",
        "newText": "console.info",
        "operation": "replace"
      }
    ]
  `;
const editInstructions = parseEditInstructions(llmOutput);
// `editInstructions` will be:
// `
// [
//     {
//       path: 'src/main.rs',
//       oldText: 'console.log',
//       newText: 'console.info',
//       operation: 'replace'
//     }
//   ]
// `
Zed Implementation: This logic can be found in crates/assistant2/src/buffer_codegen.rs.
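Deserializing with serde rejects output that does not match the expected structure, whereas `JSON.parse` in the analogy above accepts any valid JSON. To mirror that stricter behavior in TypeScript, the parsed value can be checked at runtime. This is a minimal sketch; `isEditInstruction` and `parseValidatedEdits` are hypothetical helpers, and dropping invalid entries rather than failing the whole parse is an assumption, not Zed's documented behavior.

```typescript
interface EditInstruction {
  path: string;
  oldText: string;
  newText: string;
  operation: "insert" | "replace" | "delete";
}

const ALLOWED_OPERATIONS = ["insert", "replace", "delete"];

// Runtime check that an unknown value has the EditInstruction shape.
function isEditInstruction(value: unknown): value is EditInstruction {
  if (typeof value !== "object" || value === null) return false;
  const { path, oldText, newText, operation } = value as Record<string, unknown>;
  return (
    typeof path === "string" &&
    typeof oldText === "string" &&
    typeof newText === "string" &&
    typeof operation === "string" &&
    ALLOWED_OPERATIONS.indexOf(operation) !== -1
  );
}

// Parses the output and keeps only well-formed instructions.
function parseValidatedEdits(llmOutput: string): EditInstruction[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(llmOutput);
  } catch {
    return []; // not JSON at all
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(isEditInstruction);
}

const mixedOutput = JSON.stringify([
  { path: "src/main.rs", oldText: "console.log", newText: "console.info", operation: "replace" },
  { path: "src/main.rs", oldText: "a", newText: "b", operation: "rename" }, // unknown operation
  { path: "src/lib.rs" }, // missing fields
]);
const validEdits = parseValidatedEdits(mixedOutput);
console.log(validEdits);
```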
Concept: Zed also uses XML-like tags in LLM output to mark up what the output contains, such as a patch or a title, and parses those tags to extract that information.
TypeScript Analogy:
interface XmlTag {
  kind: string;
  range: { start: number; end: number };
  isOpenTag: boolean;
}
function parseXmlTags(text: string): XmlTag[] {
  const xmlTagRegex = /<(\/)?(\w+)(.*?)>/g;
  const tags: XmlTag[] = [];
  let match: RegExpExecArray | null;
  while ((match = xmlTagRegex.exec(text)) !== null) {
    const isOpenTag = match[1] !== "/";
    tags.push({
      kind: match[2],
      range: { start: match.index, end: match.index + match[0].length },
      isOpenTag,
    });
  }
  return tags;
}
const text = `
<patch>
    <title>Refactor foo</title>
    <edit>
        <path>src/main.rs</path>
        <oldText>console.log</oldText>
        <newText>console.info</newText>
        <operation>replace</operation>
    </edit>
</patch>
`;
const tags = parseXmlTags(text);
// tags will contain objects representing the different tags in the xml-like text.
Zed Implementation: The parsing of these XML tags is found in crates/assistant2/src/context.rs.
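A flat list of tags is only the first step; to read a title or a path, each opening tag has to be paired with its closing tag. The sketch below does that with a stack. `extractTagContents` is a hypothetical helper, and it assumes well-formed tags with no self-closing elements; it is not how Zed's Rust parser is written.

```typescript
// Returns the text between each <kind>...</kind> pair.
// A stack matches closing tags with the most recent unclosed opening tag.
function extractTagContents(text: string, kind: string): string[] {
  const tagRegex = /<(\/)?(\w+)[^>]*>/g;
  const openStack: { kind: string; contentStart: number }[] = [];
  const contents: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = tagRegex.exec(text)) !== null) {
    const isClosing = match[1] === "/";
    const tagKind = match[2];
    if (!isClosing) {
      // Remember where this tag's content begins (right after the opening tag).
      openStack.push({ kind: tagKind, contentStart: match.index + match[0].length });
      continue;
    }
    const innermost = openStack[openStack.length - 1];
    if (innermost && innermost.kind === tagKind) {
      openStack.pop();
      if (tagKind === kind) {
        contents.push(text.slice(innermost.contentStart, match.index));
      }
    }
    // Closing tags that do not match the innermost open tag are ignored.
  }
  return contents;
}

const patchText =
  "<patch><title>Refactor foo</title><edit><path>src/main.rs</path></edit></patch>";
const patchTitles = extractTagContents(patchText, "title");
const patchPaths = extractTagContents(patchText, "path");
const patchEdits = extractTagContents(patchText, "edit");
console.log(patchTitles, patchPaths, patchEdits);
```

Because the content range is taken from the tag positions, nested markup is returned verbatim: asking for `edit` yields the inner `<path>` element as text.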
Zed uses a PromptBuilder to create prompts dynamically. The prompt engineering involves the following:
The code context is read from the MultiBuffer and a Range<Offset> and converted into text.
The user_prompt is included in the generated prompt, allowing the user to guide the LLM's behavior. The user prompt can be just the key binding that was pressed or text written in the editor.
Conversation summaries are generated in crates/assistant2/src/thread.rs using the summarize function.
TypeScript Analogy:
function buildEditPrompt(
    codeBefore: string,
    userPrompt: string,
    codeAfter: string,
    language?: string
): string {
  // Open each fence with the language tag when one is given; always close with a bare fence.
  const fence = "`".repeat(3);
  const openFence = fence + (language ?? "");
  return [
    "Given the following code snippet, perform the requested change.",
    openFence,
    codeBefore,
    fence,
    `User: ${userPrompt}`,
    openFence,
    codeAfter,
    fence,
  ].join("\n");
}
const prompt = buildEditPrompt(
`
    function add(a: number, b: number) {
      console.log("start");`,
"change console.log to console.info",
`
      return sum;
    }
`,
    "typescript"
);
  // prompt will be a string that combines the context with the user's instruction.
Zed Implementation: Summarization is handled by the summarize function in crates/assistant2/src/thread.rs, which adds a system prompt that instructs the LLM to summarize the conversation in a few words.
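The summarization step can be sketched in the same style as the other analogies. The `ChatMessage` shape, the `buildSummaryRequest` helper, the exact instruction wording, and the choice to place the instruction first are all assumptions for illustration; Zed's actual request types are Rust structs and may differ.

```typescript
// Hypothetical message shape, invented for this example.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: builds the message list for a summary request by adding
// a system instruction to a copy of the conversation.
function buildSummaryRequest(conversation: ChatMessage[]): ChatMessage[] {
  const instruction: ChatMessage = {
    role: "system",
    content: "Summarize the conversation in a few words. Reply with the summary only.",
  };
  // The spread copies the messages, so the caller's array is left untouched.
  return [instruction, ...conversation];
}

const sampleConversation: ChatMessage[] = [
  { role: "user", content: "Change console.log to console.info in add()." },
  { role: "assistant", content: "Done. I replaced the call in src/main.rs." },
];
const summaryRequest = buildSummaryRequest(sampleConversation);
console.log(summaryRequest);
```

Returning a new array keeps the summary request separate from the live thread, so the added instruction never leaks into the conversation the user sees.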