o3 DR (12 minutes)

Gemini 2.5 DR (9 minutes)

Claude 3.7 DR (54 minutes)

Prompt

Review (from o3, but Sonnet and Gemini agreed)

1 Task-following

Aspect requested in the prompt | o3-DR | Gemini-2.5-DR | Sonnet-3.7-DR
Provider coverage (OpenAI, Anthropic, Gemini) | ✔ – but invents extra fictitious model names. | ✔ – sticks to doc-backed names. | ✔ – sticks to doc-backed names.
Model list & pricing table | ✔ – huge, but many figures wrong. | ✔ – broadly correct, cites tiers & caching. | ✔ – covers core prices but fewer variants.
“How to access” in TypeScript (see the sketch after this table) | Partial – OpenAI/Anthropic only; Gemini SDK never actually imported. | |
Reasoning vs non-reasoning models explained | ✔ – but relies on invented “o3/o4” taxonomy. | ✔ – explains chain-of-thought visibility vs hidden. | ◑ – mentions but shorter.
Thinking tokens / chain-of-thought | Mentions for Gemini & Claude but conflates with OpenAI. | ✔ – flags uncertainty. | ✔ – demonstrates Claude thinking, notes not exposed on OpenAI.
Idiosyncrasies / known issues from forums | Sparse. | ✔ – has dedicated section. | Minimal.
Streaming, JSON mode, schema, multimodal | Talks about all, few concrete examples. | ✔ – many TS examples. | ✔ – code for each.
Cost-calculation formula (see the helper sketch after this table) | Qualitative only. | ✔ – shows formulas & pitfalls. | ✔ – includes helper fn.
Contradictions or “what we could not find” | Rare. | ✔ – calls out thin docs & tokeniser gaps. | Few.
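
To make the “how to access” row concrete, here is a minimal TypeScript sketch covering all three SDKs. It illustrates the criterion; it is not code taken from any of the three reports, and the package choices (`openai`, `@anthropic-ai/sdk`, `@google/generative-ai`), model names, and env-var names are assumptions.

```ts
// Minimal "how to access" sketch for all three providers.
// Model names and env-var names are placeholders, not values from the reports.
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";
import { GoogleGenerativeAI } from "@google/generative-ai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const google = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY ?? "");

async function main() {
  // OpenAI – Chat Completions
  const oai = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder
    messages: [{ role: "user", content: "Say hello." }],
  });
  console.log(oai.choices[0].message.content);

  // Anthropic – Messages API (max_tokens is required)
  const claude = await anthropic.messages.create({
    model: "claude-3-7-sonnet-latest", // placeholder
    max_tokens: 256,
    messages: [{ role: "user", content: "Say hello." }],
  });
  console.log(claude.content);

  // Google – the step the o3 report skipped: actually import and call the SDK
  const gemini = google.getGenerativeModel({ model: "gemini-1.5-pro" }); // placeholder
  const result = await gemini.generateContent("Say hello.");
  console.log(result.response.text());
}

main().catch(console.error);
```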

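Similarly, a sketch of the kind of cost-calculation helper the cost row refers to. All prices and model keys below are illustrative placeholders, not figures from the reports; check each provider’s pricing page before using them.

```ts
// Hypothetical cost-estimation helper of the kind the reports were judged on.
// All prices (USD per million tokens) and model keys are illustrative placeholders.
type Pricing = { inputPerMTok: number; outputPerMTok: number };

const PRICING: Record<string, Pricing> = {
  "example-openai-model": { inputPerMTok: 2.5, outputPerMTok: 10 },
  "example-anthropic-model": { inputPerMTok: 3, outputPerMTok: 15 },
  "example-gemini-model": { inputPerMTok: 1.25, outputPerMTok: 5 },
};

// cost = (inputTokens / 1e6) * inputPrice + (outputTokens / 1e6) * outputPrice
// Caveat: reasoning/"thinking" tokens are generally billed as output tokens,
// so billed output can be much larger than the visible answer.
function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing entry for ${model}`);
  return (
    (inputTokens / 1_000_000) * p.inputPerMTok +
    (outputTokens / 1_000_000) * p.outputPerMTok
  );
}

// Example: 12k input tokens + 4k output tokens on the placeholder OpenAI entry
console.log(estimateCostUSD("example-openai-model", 12_000, 4_000).toFixed(4)); // 0.0700
```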
Winner on task-adherence: the Gemini-2.5-DR report


2 Detail depth

Rank order: o3-DR (highest volume) → Gemini-2.5-DR (high but curated) → Sonnet-3.7-DR (concise).

o3-DR is ~2× longer than the others, but much of that length is repetition or imagined specs; Gemini-2.5-DR’s detail is better targeted.


3 Wasted space


4 Correctness check (places they disagree)