<aside> 🐸

Fyi, our posts now have a new home at https://www.southbridge.ai - see you there!

</aside>

o3 DR (12 minutes)

Gemini 2.5 DR (9 minutes)

Claude 3.7 DR (54 minutes)

Prompt

Review (from o3, but Sonnet and Gemini agreed)

1 Task-following

Aspect requested in the prompt o3-DR Gemini-2.5-DR Sonnet-3.7-DR
Provider coverage (OpenAI, Anthropic, Gemini) βœ” – but invents extra fictitious model names. βœ” – sticks to doc-backed names. βœ” – sticks to doc-backed names.
Model list & pricing table βœ” – huge, but many figures wrong. βœ” – broadly correct, cites tiers & caching. βœ” – covers core prices but fewer variants.
β€œHow to access” in TypeScript Partial – OpenAI/Anthropic only; Gemini SDK never actually imported. βœ” βœ”
Reasoning vs non-reasoning models explained βœ” – but relies on invented β€œo3/o4” taxonomy. βœ” – explains chain-of-thought visibility vs hidden. β—‘ – mentions but shorter.
Thinking tokens / chain-of-thought Mentions for Gemini & Claude but conflates with OpenAI. βœ” – flags uncertainty. βœ” – demonstrates Claude thinking, notes not exposed on OpenAI.
Idiosyncrasies / known issues from forums Sparse. βœ” – has dedicated section. Minimal.
Streaming, JSON mode, schema, multimodal Talks about all, few concrete examples. βœ” – many TS examples. βœ” – code for each.
Cost-calculation formula Qualitative only. βœ” – shows formulas & pitfalls. βœ” – includes helper fn.
Contradictions or β€œwhat we could not find” Rare. βœ” – calls out thin docs & tokeniser gaps. Few.

Winner on task-adherence: report Gemini-2.5-DR


2 Detail depth

Rank order: o3-DR (highest volume) β†’ Gemini-2.5-DR (high but curated) β†’ Sonnet-3.7-DR (concise).

o3-DR is ~2Γ— longer than the others but much is repetition or imagined specs; Gemini-2.5-DR’s detail is better targeted.


3 Wasted space