LLM API Integration Guide (OpenAI, Anthropic Claude, Google Gemini)
This comprehensive guide provides a unified TypeScript-focused reference for accessing large language models (LLMs) from OpenAI, Anthropic (Claude), and Google Gemini. It covers the latest available models, pricing and token costs, API setup, usage examples in TypeScript, advanced features (like function calling and streaming), multi-modal I/O, and known quirks. The goal is to replace scattered documentation from each provider with a clear, side-by-side developer reference.
1. Model Catalog and Capabilities
Below is a list of the currently available LLM models from OpenAI, Anthropic, and Google, along with their capabilities, context token limits, and multi-modal support. Models are grouped by provider and categorized into reasoning models (optimized for complex, multi-step reasoning) versus lightweight models (faster, cost-efficient for simpler tasks).
OpenAI Models (GPT Series)
- GPT-4.1 – Flagship GPT model for complex tasks. This is OpenAI’s latest high-performance model, succeeding the original GPT-4. It supports up to 1,000,000 tokens of context (a 1M-token window), a huge leap from the 128k context of its predecessor GPT-4o. GPT-4.1 has strong coding, reasoning, and instruction-following capabilities, with an updated knowledge cutoff of June 2024. It handles multi-step reasoning very well and can maintain coherence even with extremely long inputs. Multimodal support: it accepts text and images as input, and can process audio input/output in certain modes (e.g. via ChatGPT tools). By default, its API outputs text; image generation is available via a separate image model (see GPT-Image below). A TypeScript usage sketch follows this model list.
- GPT-4.1 Mini – Balanced model for speed vs. intelligence. A smaller version of GPT-4.1, offering faster responses at lower cost while still retaining strong performance. It also supports up to 1M context tokens. GPT-4.1-mini is suitable for everyday tasks that need some reasoning but lower latency. It handles text and image inputs similarly to GPT-4.1, but does not have access to certain advanced tools (for example, ChatGPT’s data analysis and file upload features are only in the full model).
- GPT-4.1 Nano – Fastest, most cost-effective model. An even smaller GPT-4.1 variant optimized for low latency and cost. It’s ideal for simple or high-volume tasks where speed is critical. It still supports large context windows (up to 1M tokens) and basic reasoning, but may not perform as well as the larger models on complex logic or coding challenges. Like other GPT-4.1 models, it is primarily a text model (with text input/output), and multimodal features may be limited.
- OpenAI o3 – Advanced reasoning model. o3 belongs to OpenAI’s dedicated reasoning line, distinct from the GPT-4.1 chat models and from GPT-4o. It is one of OpenAI’s most powerful reasoning models, with top-tier performance on complex problem solving, coding, math, science, and even vision tasks. It has a context window of up to 200k tokens. o3 can accept text and image inputs and produce text outputs. Unlike some models from other providers, o3 does not expose its chain-of-thought reasoning to the user: the model reasons internally and only outputs the final answer (OpenAI’s design hides the “thinking” process). Use o3 for the highest reliability on difficult queries where cost is less of a concern.
- OpenAI o4-mini – Cost-efficient reasoning model. A smaller, faster member of the reasoning line. It delivers strong performance on tasks like math and coding at a much lower cost than the full o3 model. It also supports a large context window (200k tokens). This model bridges the gap between GPT-4.1 and the heavier o3 in scenarios that require some chain-of-thought reasoning without the full cost. Like o3, it can handle vision input and performs its reasoning steps internally (no self-disclosed chain-of-thought).
- GPT-3.5 Turbo (legacy) – Previous-generation chat model. While largely superseded by the GPT-4.1 series, OpenAI’s GPT-3.5 models are still available for compatibility and cost-sensitive applications. They handle up to 4k or 16k tokens of context (depending on the variant) and are suitable for straightforward conversations or tasks. They are faster and cheaper but less capable of complex reasoning than GPT-4.x models. GPT-3.5 models are text-only (no image input) in the API. For reference, gpt-3.5-turbo was the model behind ChatGPT early on. OpenAI also provided an instruct-mode variant and fine-tuned versions for custom needs. Today, GPT-4.1-nano essentially takes over the role of a fast, cheap model with better quality than the original 3.5.
Note: OpenAI also offers multimodal models beyond text. For image generation, OpenAI’s latest is the GPT-Image model (the successor to DALL·E 3), which can create images from text prompts. For speech-to-text, OpenAI provides the Whisper models. These are separate endpoints but can be combined with the GPT models in your application for multi-modal experiences. In the ChatGPT product, GPT-4o can produce images and handle voice via built-in tools, but via the API you would call the dedicated image or audio models.
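To make the catalog concrete, here is a minimal TypeScript sketch of calling one of these chat models with the official openai npm package (v4+). The model ID, messages, and token limit are illustrative assumptions; swap in whichever model from the list above fits your cost and latency needs.

```typescript
// Minimal sketch using the official `openai` npm package (v4+).
// Assumes OPENAI_API_KEY is set in the environment; the model ID and
// parameters below are illustrative, not prescriptive.
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  const response = await client.chat.completions.create({
    model: "gpt-4.1", // or "gpt-4.1-mini" / "gpt-4.1-nano" for cheaper, faster calls
    messages: [
      { role: "system", content: "You are a concise technical assistant." },
      { role: "user", content: "Compare the trade-offs of GPT-4.1 vs GPT-4.1 mini in two sentences." },
    ],
    max_tokens: 300, // cap the completion length
  });

  console.log(response.choices[0].message.content);
}

main().catch(console.error);
```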
Anthropic Claude Models
Anthropic’s Claude models are known for large context windows and a “safety-first” approach. The latest Claude 3 series introduced huge context lengths (200K tokens) and an extended thinking mode for reasoning. All current Claude 3 models support multilingual text and even accept images as input (vision) with text output.
- Claude 3.7 “Sonnet” – Most intelligent Claude model (hybrid reasoning). Claude 3.7 is Anthropic’s flagship model as of early 2025, and the first hybrid reasoning model on the market. It is extremely capable on complex tasks and features a toggleable extended thinking mode. It has a 200,000-token context window for input, meaning you can supply very long documents or multiple transcripts and it can utilize all of that context. By default it can output up to 8192 tokens, but with a special setting it can produce up to 64k or even 128k tokens of output. Claude 3.7 accepts both text and image input and produces text output. Its image understanding (vision) allows it to analyze images you provide (e.g. describe an image or answer questions about it). With extended thinking enabled, Claude 3.7 first outputs a step-by-step reasoning process (“thoughts”) before its final answer, giving insight into its chain-of-thought (a TypeScript sketch of this appears after this model list). This can improve accuracy on tricky problems at the cost of additional tokens. Use Claude 3.7 for the highest-quality results on difficult questions, summarizing long documents, or tasks requiring deep reasoning with transparency.
- Claude 3.5 “Sonnet” – Previous flagship model. Claude 3.5 Sonnet was Anthropic’s top model before 3.7. It also offers a 200K context window and high intelligence, but does not support the new extended thinking mode. It is still powerful and has similar capabilities (multilingual, vision support) with slightly lower performance and an older knowledge cutoff (April 2024). In practice, 3.5 Sonnet is used if 3.7 is unavailable or for continuity with responses tuned on the older model. Pricing for 3.5 Sonnet is the same as 3.7 (see the Pricing section).
- Claude 3.5 “Haiku” – Fast lightweight model. The Haiku models in Anthropic’s lineup are optimized for speed and cost. Claude 3.5 Haiku is described by Anthropic as its fastest model. It retains the large 200K context capability and multilingual understanding, but with a smaller architecture that trades some raw “intelligence” for blazing-fast responses. It’s ideal for tasks like rapid summarization, simple Q&A, or an AI assistant that responds almost instantly. The maximum output length for Haiku is typically 8192 tokens. It does not have the extended thinking feature (only 3.7 Sonnet does), so it won’t show reasoning steps. Use Claude 3.5 Haiku when you need near-real-time answers and can sacrifice a bit of reasoning depth. It’s also significantly cheaper (see pricing: roughly $0.80 per million input tokens vs. $3 per million for Sonnet).
- Claude 3 “Opus” and others – Legacy Claude 3 models. Anthropic also provides older Claude 3 variants such as Claude 3 Opus and Claude 3 Haiku. Claude 3 Opus was a “powerful model for complex tasks” with 200K context, similar in scope to a Claude 3.0 flagship. Claude 3 Haiku was an earlier fast model. These have smaller maximum outputs (4096 tokens) and older training data (up to mid-2023). They exist mainly for continuity and backward compatibility; new development should target the Claude 3.5/3.7 models for the best results. All Claude 3 family models allow very large contexts and are trained with Anthropic’s safety rules to avoid toxic or disallowed outputs.
Note: All Claude models currently produce text-only outputs. They can take image inputs (analyzed by the model’s vision component), but they do not generate images. If you need image output alongside an Anthropic model, you would have to integrate a separate image-generation model. Also, Anthropic models tend to be verbose and “helpful” in style, often giving detailed explanations; developers can instruct them to be more concise via prompts if needed.
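For the Claude side, here is a rough TypeScript sketch using the official @anthropic-ai/sdk package, with extended thinking switched on for Claude 3.7 Sonnet. The model ID string, the thinking parameter shape, and the token budgets below are assumptions based on the description above; confirm the exact field names and minimum budgets against Anthropic’s Messages API reference.

```typescript
// Rough sketch using the official `@anthropic-ai/sdk` npm package.
// Assumes ANTHROPIC_API_KEY is set; the model ID and the `thinking`
// parameter shape are assumptions to illustrate extended thinking.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function main() {
  const message = await client.messages.create({
    model: "claude-3-7-sonnet-latest", // assumed ID; use a Haiku ID for faster, cheaper calls
    max_tokens: 2048, // must exceed the thinking budget
    // Extended thinking (3.7 Sonnet only): the model emits "thinking" blocks
    // with its step-by-step reasoning before the final "text" answer.
    thinking: { type: "enabled", budget_tokens: 1024 },
    messages: [
      { role: "user", content: "Reason step by step: is 2^31 - 1 prime?" },
    ],
  });

  // The response content is an array of blocks: thinking blocks first, then text.
  for (const block of message.content) {
    if (block.type === "thinking") console.log("[thinking]", block.thinking);
    if (block.type === "text") console.log("[answer]", block.text);
  }
}

main().catch(console.error);
```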
Google Gemini Models