LLM API Integration Guide (OpenAI, Anthropic Claude, Google Gemini)

This comprehensive guide provides a unified TypeScript-focused reference for accessing large language models (LLMs) from OpenAI, Anthropic (Claude), and Google Gemini. It covers the latest available models, pricing and token costs, API setup, usage examples in TypeScript, advanced features (like function calling and streaming), multi-modal I/O, and known quirks. The goal is to replace scattered documentation from each provider with a clear, side-by-side developer reference.

1. Model Catalog and Capabilities

Below is a list of the currently available LLM models from OpenAI, Anthropic, and Google, along with their capabilities, context token limits, and multi-modal support. Models are grouped by provider and categorized into reasoning models (optimized for complex, multi-step reasoning) versus lightweight models (faster, cost-efficient for simpler tasks).

OpenAI Models (GPT Series)

Note: OpenAI also offers multimodal models beyond text. For image generation, OpenAI’s latest is the GPT-Image model (based on DALL·E 3), which can create images from text prompts. For speech-to-text, OpenAI provides the Whisper models. These are separate endpoints, but they can be combined with the GPT models in your application for multi-modal experiences. GPT-4o in the ChatGPT product can produce images and handle voice via built-in tools, but via the API you would call the dedicated image or audio models.
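As a concrete illustration, here is a minimal sketch of calling OpenAI’s separate image-generation endpoint alongside a chat model, using plain `fetch`. The endpoint path and `dall-e-3` model id follow OpenAI’s public API documentation; the helper function names are our own, and you should verify current model ids against OpenAI’s model list before relying on them.

```typescript
// Sketch: OpenAI image generation is a separate endpoint from chat completions.
// buildImageRequest is a hypothetical helper; "dall-e-3" and the endpoint path
// follow OpenAI's documented Images API, but double-check current model ids.

interface ImageRequest {
  model: string;
  prompt: string;
  n: number;
  size: string;
}

function buildImageRequest(prompt: string): ImageRequest {
  return { model: "dall-e-3", prompt, n: 1, size: "1024x1024" };
}

async function generateImage(apiKey: string, prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/images/generations", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildImageRequest(prompt)),
  });
  const data = await res.json();
  // The Images API returns an array of generated images; we take the first URL.
  return data.data[0].url;
}
```

The same pattern applies to Whisper transcription (`/v1/audio/transcriptions`): a distinct endpoint whose output you can feed back into a chat-model prompt.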

Anthropic Claude Models

Anthropic’s Claude models are known for large context windows and a “safety-first” approach. The Claude 3 series introduced large context lengths (200K tokens), and newer Claude versions add an extended thinking mode for step-by-step reasoning. All current Claude 3 models support multilingual text and also accept images as input (vision) with text output.
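To show what enabling extended thinking looks like, here is a sketch of the request body for Anthropic’s Messages API. The `thinking` parameter shape follows Anthropic’s documentation for models that support it; the specific model id and token budget below are assumptions you should check against Anthropic’s current model list.

```typescript
// Sketch: an Anthropic Messages API request with extended thinking enabled.
// The model id and budget_tokens value are assumptions; the "thinking"
// parameter shape follows Anthropic's documented API for supported models.

function buildThinkingRequest(prompt: string) {
  return {
    model: "claude-3-7-sonnet-20250219", // assumed model id; verify before use
    max_tokens: 16000,
    // budget_tokens caps how many tokens the model may spend reasoning
    thinking: { type: "enabled" as const, budget_tokens: 8000 },
    messages: [{ role: "user" as const, content: prompt }],
  };
}
```

This body would be POSTed to `https://api.anthropic.com/v1/messages` with your `x-api-key` and `anthropic-version` headers; the response then interleaves `thinking` and `text` content blocks.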

Note: All Claude models currently produce text-only outputs. They can take image inputs (processed by the model’s built-in vision capability), but they do not generate images. If you need image output alongside an Anthropic model, you would have to integrate a separate image-generation model. Also, Anthropic models tend to be verbose and “helpful” in style, often giving detailed explanations; developers can instruct them to be more concise via prompts if needed.
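Since image input is supported but easy to get wrong, here is a sketch of a Messages API request body with an image content block. The content-block structure (`type: "image"` with a base64 source) follows Anthropic’s documented format; the model id is an assumption, and `buildVisionMessage` is a hypothetical helper.

```typescript
// Sketch: a Claude Messages API request mixing an image block and a text block.
// The content-block shape follows Anthropic's docs; the model id is assumed.

type ContentBlock =
  | { type: "text"; text: string }
  | {
      type: "image";
      source: { type: "base64"; media_type: string; data: string };
    };

function buildVisionMessage(base64Png: string, question: string) {
  const content: ContentBlock[] = [
    {
      type: "image",
      source: { type: "base64", media_type: "image/png", data: base64Png },
    },
    { type: "text", text: question },
  ];
  return {
    model: "claude-3-5-sonnet-20241022", // assumed model id; verify before use
    max_tokens: 1024,
    messages: [{ role: "user" as const, content }],
  };
}
```

Note that the image precedes the question in the content array; Anthropic recommends placing images before the text that refers to them.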

Google Gemini Models