sequenceDiagram
    participant User
    participant MainLoop as Main Loop (tt)
    participant LLM as LLM API
    participant ToolBatch as Tool Batcher
    participant Tool1 as ReadTool
    participant Tool2 as GrepTool
    participant Tool3 as EditTool
    participant UI as UI Renderer
    User->>MainLoop: "Search for TODO comments and update them"
    MainLoop->>LLM: Stream request with context
    LLM-->>MainLoop: text_delta: "I'll search for TODOs..."
    MainLoop-->>UI: Update display
    LLM-->>MainLoop: tool_use: GrepTool
    LLM-->>MainLoop: tool_use: ReadTool (multiple files)
    LLM-->>MainLoop: message_stop
    MainLoop->>ToolBatch: Execute tool batch
    par Parallel Execution
        ToolBatch->>Tool1: ReadTool.call() [Read-only]
        ToolBatch->>Tool2: GrepTool.call() [Read-only]
        Tool1-->>UI: Progress: "Reading file1.js"
        Tool2-->>UI: Progress: "Searching *.js"
        Tool1-->>ToolBatch: Result: File contents
        Tool2-->>ToolBatch: Result: 5 matches
    end
    ToolBatch->>MainLoop: Tool results
    MainLoop->>LLM: Continue with results
    LLM-->>MainLoop: tool_use: EditTool
    MainLoop->>ToolBatch: Execute write tool
    Note over ToolBatch, Tool3: Sequential Execution
    ToolBatch->>Tool3: EditTool.call() [Write]
    Tool3-->>UI: Progress: "Editing file1.js"
    Tool3-->>ToolBatch: Result: Success
    ToolBatch->>MainLoop: Edit complete
    MainLoop->>LLM: Continue with result
    LLM-->>MainLoop: text_delta: "Updated 5 TODOs..."
    MainLoop-->>UI: Final response

The Main Conversation Loop: A Streaming State Machine

The heart of Claude Code is the tt async generator function: a sophisticated state machine that orchestrates the entire conversation flow. Let's examine its reconstructed structure:

// Reconstructed main loop signature with timing annotations
async function* tt(
  currentMessages: CliMessage[],         // Full history - Memory: O(conversation_length)
  baseSystemPromptString: string,        // Static prompt - ~2KB
  currentGitContext: GitContext,         // Git state - ~1-5KB typically
  currentClaudeMdContents: ClaudeMdContent[], // Project context - ~5-50KB
  permissionGranterFn: PermissionGranter, // Permission callback
  toolUseContext: ToolUseContext,         // Shared context - ~10KB
  activeStreamingToolUse?: ToolUseBlock,  // Resume state
  loopState: {
    turnId: string,        // UUID for this turn
    turnCounter: number,   // Recursion depth
    compacted?: boolean,   // Was history compressed?
    isResuming?: boolean   // Resuming from save?
  }
): AsyncGenerator<CliMessage, void, void> {
  // ┌─ PHASE 1: Context Preparation [~50-200ms]
  // ├─ PHASE 2: Auto-compaction Check [~0-3000ms if triggered]
  // ├─ PHASE 3: System Prompt Assembly [~10-50ms]
  // ├─ PHASE 4: LLM Stream Processing [~2000-10000ms]
  // ├─ PHASE 5: Tool Execution [~100-30000ms per tool]
  // └─ PHASE 6: Recursion or Completion [~0ms]
}
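
To make the generator contract concrete, here is a hypothetical driver loop (not taken from the decompiled source) showing how a caller might consume tt: each yielded CliMessage is appended to the history and rendered as it arrives, and the loop ends once Phase 6 decides not to recurse. Names like runTurn and renderMessage are illustrative placeholders.

// Illustrative caller of tt, assuming the CliMessage/ToolUseContext shapes above
async function runTurn(
  history: CliMessage[],
  userInput: CliMessage,
  deps: {
    basePrompt: string;
    gitContext: GitContext;
    claudeMd: ClaudeMdContent[];
    permissionGranter: PermissionGranter;
    toolUseContext: ToolUseContext;
    renderMessage: (m: CliMessage) => void;   // hypothetical UI hook
  }
): Promise<CliMessage[]> {
  const messages = [...history, userInput];

  const generator = tt(
    messages,
    deps.basePrompt,
    deps.gitContext,
    deps.claudeMd,
    deps.permissionGranter,
    deps.toolUseContext,
    undefined,                                // nothing to resume
    { turnId: crypto.randomUUID(), turnCounter: 0 }
  );

  // Each yielded message is persisted and rendered immediately; tool
  // execution and recursion happen inside the generator itself.
  for await (const message of generator) {
    messages.push(message);
    deps.renderMessage(message);
  }

  return messages;
}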

Phase 1: Context Window Management

The first critical decision in the control flow is whether the conversation needs compaction:

// Auto-compaction logic (inferred implementation)
class ContextCompactionController {
  private static readonly COMPACTION_THRESHOLDS = {
    tokenCount: 100_000,      // Aggressive token limit
    messageCount: 200,        // Message count fallback
    costThreshold: 5.00       // Cost-based trigger
  };

  static async shouldCompact(
    messages: CliMessage[],
    model: string
  ): Promise<boolean> {
    // Fast path: check message count first
    if (messages.length < 50) return false;

    // Expensive path: count tokens
    const tokenCount = await this.estimateTokens(messages, model);

    return tokenCount > this.COMPACTION_THRESHOLDS.tokenCount ||
           messages.length > this.COMPACTION_THRESHOLDS.messageCount;
  }

  static async compact(
    messages: CliMessage[],
    context: ToolUseContext
  ): Promise<CompactionResult> {
    // Phase 1: Identify messages to preserve
    const preserve = this.identifyPreservedMessages(messages);

    // Phase 2: Generate summary via LLM
    const summary = await this.generateSummary(
      messages.filter(m => !preserve.has(m.uuid)),
      context
    );

    // Phase 3: Reconstruct message history
    return {
      messages: [
        this.createSummaryMessage(summary),
        ...messages.filter(m => preserve.has(m.uuid))
      ],
      tokensSaved: this.calculateSavings(messages, summary)
    };
  }
}
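
A plausible call site for this controller in Phase 2 of the main loop might look like the sketch below; the exact hook point, the onProgress callback, and the in-place history swap are assumptions rather than confirmed behavior.

// Sketch of the Phase 2 auto-compaction check, assuming the controller above;
// onProgress stands in for whatever progress message the real loop emits
async function maybeCompact(
  messages: CliMessage[],
  model: string,
  context: ToolUseContext,
  onProgress: (status: string) => void
): Promise<CliMessage[]> {
  if (!(await ContextCompactionController.shouldCompact(messages, model))) {
    return messages;  // fast path: history still fits comfortably
  }

  onProgress('Compacting conversation history...');
  const { messages: compacted, tokensSaved } =
    await ContextCompactionController.compact(messages, context);
  onProgress(`Compaction complete (~${tokensSaved} tokens reclaimed)`);
  return compacted;
}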

Phase 2: Dynamic System Prompt Assembly

The system prompt assembly reveals a sophisticated caching and composition strategy:

// System prompt composition pipeline
class SystemPromptAssembler {
  // Assembled-section cache keyed by content hash; lookup and refresh
  // logic are elided in this reconstruction
  private static cache = new Map<string, {
    content: string,
    hash: string,
    expiry: number
  }>();

  static async assemble(
    basePrompt: string,
    claudeMd: ClaudeMdContent[],
    gitContext: GitContext,
    tools: ToolDefinition[],
    model: string
  ): Promise<string | ContentBlock[]> {
    // Parallel fetch of dynamic components
    const [
      claudeMdSection,
      gitSection,
      directorySection,
      toolSection
    ] = await Promise.all([
      this.formatClaudeMd(claudeMd),
      this.formatGitContext(gitContext),
      this.getDirectoryStructure(),
      this.formatToolDefinitions(tools)
    ]);

    // Model-specific adaptations
    const modelSection = this.getModelAdaptations(model);

    // Compose with smart truncation
    return this.compose({
      base: basePrompt,           // Priority 1
      model: modelSection,        // Priority 2
      claudeMd: claudeMdSection,  // Priority 3
      git: gitSection,           // Priority 4
      directory: directorySection, // Priority 5
      tools: toolSection         // Priority 6
    });
  }

  private static getModelAdaptations(model: string): string {
    // Model-specific prompt engineering
    const adaptations = {
      'claude-3-opus': {
        style: 'detailed',
        instructions: 'Think step by step. Show your reasoning.',
        tokenBudget: 0.3  // 30% of context for reasoning
      },
      'claude-3-sonnet': {
        style: 'balanced',
        instructions: 'Be concise but thorough.',
        tokenBudget: 0.2
      },
      'claude-3-haiku': {
        style: 'brief',
        instructions: 'Get to the point quickly.',
        tokenBudget: 0.1
      }
    };

    const config = adaptations[model] || adaptations['claude-3-sonnet'];
    return this.formatModelInstructions(config);
  }
}
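
The compose step above is referenced but not shown. Here is a minimal sketch of what priority-ordered composition could look like, under the assumption that lower-priority sections are dropped first when the assembled prompt would exceed a token budget; the 4-characters-per-token estimate and the 16K budget are illustrative, not measured values.

// Minimal sketch of priority-based composition with a hard token budget
interface PromptSection {
  label: string;
  content: string;
  priority: number;   // 1 = most important; higher numbers are dropped first
}

function composeWithTruncation(
  sections: PromptSection[],
  maxTokens = 16_000
): string {
  const estimateTokens = (s: string) => Math.ceil(s.length / 4);

  // Admit sections in priority order until the budget is exhausted
  const kept = new Set<PromptSection>();
  let used = 0;
  for (const section of [...sections].sort((a, b) => a.priority - b.priority)) {
    const cost = estimateTokens(section.content);
    if (used + cost <= maxTokens) {
      kept.add(section);
      used += cost;
    }
    // Sections that do not fit are dropped whole; a real implementation
    // might instead truncate the lowest-priority section to fit
  }

  // Re-emit in declaration order so the prompt reads naturally
  return sections
    .filter(s => kept.has(s))
    .map(s => s.content)
    .join('\n\n');
}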

Phase 3: The Streaming State Machine

The LLM streaming phase implements a complex event-driven state machine:

// Stream event processing state machine
class StreamEventProcessor {
  private state: {
    phase: 'idle' | 'message_start' | 'content' | 'tool_input' | 'complete';
    currentMessage: Partial<CliMessage>;
    contentBlocks: ContentBlock[];
    activeToolInput?: {
      toolId: string;
      buffer: string;
      parser: StreamingToolInputParser;
    };
    metrics: {
      firstTokenLatency?: number;
      tokensPerSecond: number[];
    };
  };

  async *processStream(
    stream: AsyncIterable<StreamEvent>
  ): AsyncGenerator<UIEvent | CliMessage> {
    const startTime = Date.now();  // anchor for first-token latency below

    for await (const event of stream) {
      switch (event.type) {
        case 'message_start':
          this.state.phase = 'message_start';
          this.state.metrics.firstTokenLatency = Date.now() - startTime;
          yield { type: 'ui_state', data: { status: 'assistant_responding' } };
          break;

        case 'content_block_start':
          yield* this.handleContentBlockStart(event);
          break;

        case 'content_block_delta':
          yield* this.handleContentBlockDelta(event);
          break;

        case 'content_block_stop':
          yield* this.handleContentBlockStop(event);
          break;

        case 'message_stop':
          yield* this.finalizeMessage(event);
          break;

        case 'error':
          yield* this.handleError(event);
          break;
      }
    }
  }

  private async *handleContentBlockDelta(
    event: ContentBlockDeltaEvent
  ): AsyncGenerator<UIEvent> {
    const block = this.state.contentBlocks[event.index];

    switch (event.delta.type) {
      case 'text_delta':
        // Direct UI update for text
        block.text += event.delta.text;
        yield {
          type: 'ui_text_delta',
          data: {
            text: event.delta.text,
            blockIndex: event.index
          }
        };
        break;

      case 'input_json_delta':
        // Accumulate JSON for tool input
        if (this.state.activeToolInput) {
          this.state.activeToolInput.buffer += event.delta.partial_json;

          // Feed every chunk to the streaming parser so no input is dropped;
          // the parse can only complete once a closing delimiter arrives
          const result = this.state.activeToolInput.parser.addChunk(
            event.delta.partial_json
          );

          if (result.complete) {
            block.input = result.value;
            yield {
              type: 'ui_tool_preview',
              data: {
                toolId: this.state.activeToolInput.toolId,
                input: result.value
              }
            };
          }
        }
        break;
    }
  }
}
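
The StreamingToolInputParser used above is referenced but never defined in the reconstruction. A minimal sketch of one plausible shape: accumulate chunks, track brace/bracket depth, and attempt a JSON.parse only when the depth returns to zero. Handling of braces inside string literals and escape sequences is deliberately omitted.

// Illustrative streaming JSON accumulator; not the actual implementation
type ParseResult =
  | { complete: true; value: unknown }
  | { complete: false };

class StreamingToolInputParser {
  private buffer = '';
  private depth = 0;

  addChunk(chunk: string): ParseResult {
    this.buffer += chunk;

    // Track nesting depth; only a balanced buffer can possibly parse
    for (const ch of chunk) {
      if (ch === '{' || ch === '[') this.depth++;
      else if (ch === '}' || ch === ']') this.depth--;
    }

    if (this.depth === 0 && this.buffer.trim().length > 0) {
      try {
        return { complete: true, value: JSON.parse(this.buffer) };
      } catch {
        // Balanced but not yet valid JSON (e.g. depth fooled by a brace
        // inside a string); keep accumulating
      }
    }
    return { complete: false };
  }
}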

Phase 4: The Tool Execution Pipeline

The tool execution system splits work into two paths, running read-only tools in parallel and write or side-effecting tools sequentially:

graph TB
    subgraph "Tool Request Analysis"
        ToolRequests[Tool Use Blocks] --> Categorize{Categorize by Type}
        Categorize -->|Read-Only| ReadQueue[Read Queue]
        Categorize -->|Write/Side-Effect| WriteQueue[Write Queue]
    end

    subgraph "Parallel Execution Pool"
        ReadQueue --> ParallelPool[Parallel Executor]
        ParallelPool --> Worker1[Worker 1]
        ParallelPool --> Worker2[Worker 2]
        ParallelPool --> WorkerN[Worker N]

        Worker1 --> Results1[Result 1]
        Worker2 --> Results2[Result 2]
        WorkerN --> ResultsN[Result N]
    end

    subgraph "Sequential Execution"
        WriteQueue --> SeqExecutor[Sequential Executor]
        Results1 --> SeqExecutor
        Results2 --> SeqExecutor
        ResultsN --> SeqExecutor

        SeqExecutor --> WriteTool1[Write Tool 1]
        WriteTool1 --> WriteTool2[Write Tool 2]
        WriteTool2 --> FinalResults[All Results]
    end