sequenceDiagram
participant User
participant MainLoop as Main Loop (tt)
participant LLM as LLM API
participant ToolBatch as Tool Batcher
participant Tool1 as ReadTool
participant Tool2 as GrepTool
participant Tool3 as EditTool
participant UI as UI Renderer
User->>MainLoop: "Search for TODO comments and update them"
MainLoop->>LLM: Stream request with context
LLM-->>MainLoop: text_delta: "I'll search for TODOs..."
MainLoop-->>UI: Update display
LLM-->>MainLoop: tool_use: GrepTool
LLM-->>MainLoop: tool_use: ReadTool (multiple files)
LLM-->>MainLoop: message_stop
MainLoop->>ToolBatch: Execute tool batch
par Parallel Execution
ToolBatch->>Tool1: ReadTool.call() [Read-only]
ToolBatch->>Tool2: GrepTool.call() [Read-only]
Tool1-->>UI: Progress: "Reading file1.js"
Tool2-->>UI: Progress: "Searching *.js"
Tool1-->>ToolBatch: Result: File contents
Tool2-->>ToolBatch: Result: 5 matches
end
ToolBatch->>MainLoop: Tool results
MainLoop->>LLM: Continue with results
LLM-->>MainLoop: tool_use: EditTool
MainLoop->>ToolBatch: Execute write tool
Note over ToolBatch, Tool3: Sequential Execution
ToolBatch->>Tool3: EditTool.call() [Write]
Tool3-->>UI: Progress: "Editing file1.js"
Tool3-->>ToolBatch: Result: Success
ToolBatch->>MainLoop: Edit complete
MainLoop->>LLM: Continue with result
LLM-->>MainLoop: text_delta: "Updated 5 TODOs..."
MainLoop-->>UI: Final response
The heart of Claude Code is the tt
async generator function—a sophisticated state machine that orchestrates the entire conversation flow. Let's examine its actual structure:
// Reconstructed main loop signature with timing annotations
async function* tt(
currentMessages: CliMessage[], // Full history - Memory: O(conversation_length)
baseSystemPromptString: string, // Static prompt - ~2KB
currentGitContext: GitContext, // Git state - ~1-5KB typically
currentClaudeMdContents: ClaudeMdContent[], // Project context - ~5-50KB
permissionGranterFn: PermissionGranter, // Permission callback
toolUseContext: ToolUseContext, // Shared context - ~10KB
activeStreamingToolUse?: ToolUseBlock, // Resume state
loopState: {
turnId: string, // UUID for this turn
turnCounter: number, // Recursion depth
compacted?: boolean, // Was history compressed?
isResuming?: boolean // Resuming from save?
}
): AsyncGenerator<CliMessage, void, void> {
// ┌─ PHASE 1: Context Preparation [~50-200ms]
// ├─ PHASE 2: Auto-compaction Check [~0-3000ms if triggered]
// ├─ PHASE 3: System Prompt Assembly [~10-50ms]
// ├─ PHASE 4: LLM Stream Processing [~2000-10000ms]
// ├─ PHASE 5: Tool Execution [~100-30000ms per tool]
// └─ PHASE 6: Recursion or Completion [~0ms]
}
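Taken together, the six phases form a recursive flow: each assistant response that requests tools triggers tool execution and then another pass through the same generator with the extended history. The sketch below illustrates that shape; the helper names (streamAssistantTurn, extractToolUseBlocks, executeToolBatch) are illustrative stand-ins, not identifiers from the minified source.
// Illustrative reconstruction of the Phase 6 recursion; not the real tt() body
async function* runTurn(
  messages: CliMessage[],
  context: ToolUseContext,
  turnCounter = 0
): AsyncGenerator<CliMessage, void, void> {
  // Phases 1-4: prepare context, check compaction, assemble the prompt, stream the reply
  const assistantMessage: CliMessage = yield* streamAssistantTurn(messages, context);
  const toolUses = extractToolUseBlocks(assistantMessage);

  if (toolUses.length === 0) {
    return; // Phase 6: no tool requests, the turn is complete
  }

  // Phase 5: run the requested tools and convert their outputs into result messages
  const toolResultMessages: CliMessage[] = yield* executeToolBatch(toolUses, context);

  // Phase 6: recurse with the extended history until the model stops requesting tools
  yield* runTurn(
    [...messages, assistantMessage, ...toolResultMessages],
    context,
    turnCounter + 1
  );
}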
The first critical decision in the control flow is whether the conversation needs compaction:
// Auto-compaction logic (inferred implementation)
class ContextCompactionController {
private static readonly COMPACTION_THRESHOLDS = {
tokenCount: 100_000, // Aggressive token limit
messageCount: 200, // Message count fallback
costThreshold: 5.00 // Cost-based trigger
};
static async shouldCompact(
messages: CliMessage[],
model: string
): Promise<boolean> {
// Fast path: check message count first
if (messages.length < 50) return false;
// Expensive path: count tokens
const tokenCount = await this.estimateTokens(messages, model);
return tokenCount > this.COMPACTION_THRESHOLDS.tokenCount ||
messages.length > this.COMPACTION_THRESHOLDS.messageCount;
}
static async compact(
messages: CliMessage[],
context: ToolUseContext
): Promise<CompactionResult> {
// Phase 1: Identify messages to preserve
const preserve = this.identifyPreservedMessages(messages);
// Phase 2: Generate summary via LLM
const summary = await this.generateSummary(
messages.filter(m => !preserve.has(m.uuid)),
context
);
// Phase 3: Reconstruct message history
return {
messages: [
this.createSummaryMessage(summary),
...messages.filter(m => preserve.has(m.uuid))
],
tokensSaved: this.calculateSavings(messages, summary)
};
}
}
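Assuming the controller above is roughly right, Phase 2 of the loop would wire it in along these lines. This is a sketch; maybeCompact is a hypothetical helper, not a name from the source.
// Hypothetical wiring of the compaction check into Phase 2 of the main loop
async function maybeCompact(
  messages: CliMessage[],
  model: string,
  context: ToolUseContext
): Promise<{ messages: CliMessage[]; compacted: boolean }> {
  if (!(await ContextCompactionController.shouldCompact(messages, model))) {
    return { messages, compacted: false };
  }
  // Replaces non-preserved history with a single LLM-generated summary message
  const result = await ContextCompactionController.compact(messages, context);
  return { messages: result.messages, compacted: true };
}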
The system prompt assembly reveals a sophisticated caching and composition strategy:
// System prompt composition pipeline
class SystemPromptAssembler {
private static cache = new Map<string, {
content: string,
hash: string,
expiry: number
}>();
static async assemble(
basePrompt: string,
claudeMd: ClaudeMdContent[],
gitContext: GitContext,
tools: ToolDefinition[],
model: string
): Promise<string | ContentBlock[]> {
// Parallel fetch of dynamic components
const [
claudeMdSection,
gitSection,
directorySection,
toolSection
] = await Promise.all([
this.formatClaudeMd(claudeMd),
this.formatGitContext(gitContext),
this.getDirectoryStructure(),
this.formatToolDefinitions(tools)
]);
// Model-specific adaptations
const modelSection = this.getModelAdaptations(model);
// Compose with smart truncation
return this.compose({
base: basePrompt, // Priority 1
model: modelSection, // Priority 2
claudeMd: claudeMdSection, // Priority 3
git: gitSection, // Priority 4
directory: directorySection, // Priority 5
tools: toolSection // Priority 6
});
}
private static getModelAdaptations(model: string): string {
// Model-specific prompt engineering
const adaptations: Record<string, { style: string; instructions: string; tokenBudget: number }> = {
'claude-3-opus': {
style: 'detailed',
instructions: 'Think step by step. Show your reasoning.',
tokenBudget: 0.3 // 30% of context for reasoning
},
'claude-3-sonnet': {
style: 'balanced',
instructions: 'Be concise but thorough.',
tokenBudget: 0.2
},
'claude-3-haiku': {
style: 'brief',
instructions: 'Get to the point quickly.',
tokenBudget: 0.1
}
};
const config = adaptations[model] || adaptations['claude-3-sonnet'];
return this.formatModelInstructions(config);
}
}
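The compose step is only referenced above, never shown. Given the priority ordering, a plausible budget-based composition would keep high-priority sections intact and truncate or drop lower-priority ones; the sketch below assumes a fixed token budget and a crude characters-per-token estimate, neither of which is confirmed by the source.
// Hypothetical priority-aware composition: lower-priority sections are dropped
// first once the assembled prompt would exceed the budget
interface PromptSection { name: string; content: string; priority: number }

function composeWithBudget(sections: PromptSection[], maxTokens = 20_000): string {
  const estimate = (s: string) => Math.ceil(s.length / 4); // rough chars-per-token heuristic
  const ordered = [...sections].sort((a, b) => a.priority - b.priority);

  const kept: string[] = [];
  let used = 0;
  for (const section of ordered) {
    const cost = estimate(section.content);
    if (used + cost <= maxTokens) {
      kept.push(section.content);
      used += cost;
    } else if (section.priority <= 2) {
      // Never drop the base prompt or model adaptations; truncate to the remaining budget
      kept.push(section.content.slice(0, Math.max(0, (maxTokens - used) * 4)));
      used = maxTokens;
    }
    // Lower-priority sections that do not fit are omitted entirely
  }
  return kept.join('\n\n');
}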
The LLM streaming phase implements a complex event-driven state machine:
// Stream event processing state machine
class StreamEventProcessor {
private state: {
phase: 'idle' | 'message_start' | 'content' | 'tool_input' | 'complete';
currentMessage: Partial<CliMessage>;
contentBlocks: ContentBlock[];
activeToolInput?: {
toolId: string;
buffer: string;
parser: StreamingToolInputParser;
};
metrics: {
firstTokenLatency?: number;
tokensPerSecond: number[];
};
};
async *processStream(
stream: AsyncIterable<StreamEvent>
): AsyncGenerator<UIEvent | CliMessage> {
const startTime = Date.now(); // Record stream start so first-token latency can be measured
for await (const event of stream) {
switch (event.type) {
case 'message_start':
this.state.phase = 'message_start';
this.state.metrics.firstTokenLatency = Date.now() - startTime;
yield { type: 'ui_state', data: { status: 'assistant_responding' } };
break;
case 'content_block_start':
yield* this.handleContentBlockStart(event);
break;
case 'content_block_delta':
yield* this.handleContentBlockDelta(event);
break;
case 'content_block_stop':
yield* this.handleContentBlockStop(event);
break;
case 'message_stop':
yield* this.finalizeMessage(event);
break;
case 'error':
yield* this.handleError(event);
break;
}
}
}
private async *handleContentBlockDelta(
event: ContentBlockDeltaEvent
): AsyncGenerator<UIEvent> {
const block = this.state.contentBlocks[event.index];
switch (event.delta.type) {
case 'text_delta':
// Direct UI update for text
block.text += event.delta.text;
yield {
type: 'ui_text_delta',
data: {
text: event.delta.text,
blockIndex: event.index
}
};
break;
case 'input_json_delta':
// Accumulate JSON for tool input
if (this.state.activeToolInput) {
this.state.activeToolInput.buffer += event.delta.partial_json;
// Try parsing at strategic points
if (event.delta.partial_json.includes('}') ||
event.delta.partial_json.includes(']')) {
const result = this.state.activeToolInput.parser.addChunk(
event.delta.partial_json
);
if (result.complete) {
block.input = result.value;
yield {
type: 'ui_tool_preview',
data: {
toolId: this.state.activeToolInput.toolId,
input: result.value
}
};
}
}
}
break;
}
}
}
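The StreamingToolInputParser used in handleContentBlockDelta is only referenced, never shown. A minimal stand-in can simply re-attempt a full JSON parse on the accumulated buffer after each chunk; the real parser is likely more tolerant of partial tokens, so treat this as an approximation.
// Minimal stand-in for the streaming tool-input parser referenced above
class StreamingToolInputParser {
  private buffer = '';

  addChunk(chunk: string): { complete: boolean; value?: unknown } {
    this.buffer += chunk;
    try {
      // Succeeds only once the buffered deltas form a complete JSON document
      return { complete: true, value: JSON.parse(this.buffer) };
    } catch {
      return { complete: false }; // Incomplete JSON: wait for more input_json_delta events
    }
  }
}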
The tool execution system implements a sophisticated parallel/sequential execution strategy:
graph TB
subgraph "Tool Request Analysis"
ToolRequests[Tool Use Blocks] --> Categorize{Categorize by Type}
Categorize -->|Read-Only| ReadQueue[Read Queue]
Categorize -->|Write/Side-Effect| WriteQueue[Write Queue]
end
subgraph "Parallel Execution Pool"
ReadQueue --> ParallelPool[Parallel Executor]
ParallelPool --> Worker1[Worker 1]
ParallelPool --> Worker2[Worker 2]
ParallelPool --> WorkerN[Worker N]
Worker1 --> Results1[Result 1]
Worker2 --> Results2[Result 2]
WorkerN --> ResultsN[Result N]
end
subgraph "Sequential Execution"
WriteQueue --> SeqExecutor[Sequential Executor]
Results1 --> SeqExecutor
Results2 --> SeqExecutor
ResultsN --> SeqExecutor
SeqExecutor --> WriteTool1[Write Tool 1]
WriteTool1 --> WriteTool2[Write Tool 2]
WriteTool2 --> FinalResults[All Results]
end
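The diagram implies a two-phase scheduler: read-only tools fan out concurrently, and write tools run one at a time over the combined results. A sketch of that strategy follows; the ToolUseBlock and RunnableTool shapes, and the assumption that every requested tool exists in the registry, are mine rather than the source's.
// Sketch of the parallel/sequential batching strategy shown in the diagram
interface ToolUseBlock { id: string; name: string; input: unknown }
interface ToolResult { toolUseId: string; output: unknown }
interface RunnableTool {
  isReadOnly(): boolean;
  call(input: unknown): Promise<unknown>;
}

async function executeToolBatch(
  requests: ToolUseBlock[],
  registry: Map<string, RunnableTool> // assumed to contain every requested tool
): Promise<ToolResult[]> {
  const reads = requests.filter(r => registry.get(r.name)?.isReadOnly());
  const writes = requests.filter(r => !registry.get(r.name)?.isReadOnly());

  // Read-only tools are safe to run concurrently
  const readResults = await Promise.all(
    reads.map(async r => ({
      toolUseId: r.id,
      output: await registry.get(r.name)!.call(r.input),
    }))
  );

  // Write/side-effect tools run strictly in order to avoid conflicting edits
  const writeResults: ToolResult[] = [];
  for (const r of writes) {
    writeResults.push({
      toolUseId: r.id,
      output: await registry.get(r.name)!.call(r.input),
    });
  }
  return [...readResults, ...writeResults];
}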