Screenshot_20241203_142424_Grab.jpg

Our problem is simple. We need to estimate the macros in this thing - and this image is all we have to go on.

We’re going to give this problem to three multimodal models, then join the results with two chain-of-thought (CoT) models, and see how we do.

Prompt

Initial estimations

Claude-3-5-sonnet-20241022

GPT-4o

Gemini-exp-1121

Results 1

Let’s plot the results and look at the deviation between models. Where a model gave a range, we’ll use its midpoint.
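For anyone who wants to reproduce the comparison, here is a minimal sketch of the midpoint-and-deviation step. The function names (`to_number`, `spread`) and the sample numbers are placeholders made up for illustration; they are not the models’ actual answers. It also assumes each estimate string contains nothing but the value or range.

```python
import re
from statistics import mean, pstdev


def to_number(estimate: str) -> float:
    """Turn '45g' or '400-500 kcal' into one number.

    A range collapses to the average of its endpoints (its midpoint).
    Assumes the string holds only the estimate itself, with no other digits.
    """
    values = [float(v) for v in re.findall(r"\d+(?:\.\d+)?", estimate)]
    if not values:
        raise ValueError(f"no number found in {estimate!r}")
    return mean(values)


def spread(estimates: dict[str, str]) -> dict[str, float]:
    """Mean, population standard deviation, and relative spread across models."""
    numbers = [to_number(e) for e in estimates.values()]
    avg = mean(numbers)
    sd = pstdev(numbers)
    # Coefficient of variation: deviation relative to the size of the value,
    # so that calories and grams of fat can be compared on the same footing.
    return {"mean": avg, "stdev": sd, "cv": sd / avg if avg else 0.0}


# Placeholder inputs only, NOT the real model outputs.
calories = {"model_a": "400-500", "model_b": "450", "model_c": "500-600"}
stats = spread(calories)
print(stats)
```

The coefficient of variation is worth having alongside the raw deviation: a 50 kcal disagreement on a 500 kcal meal is a much smaller relative gap than a 5 g disagreement on 10 g of fat.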

Variance across results

Joining

What if we give all three estimates to o1-preview and QwQ?
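Mechanically, the joining step just means packing the three answers into one message for the reasoning model. A rough sketch of that assembly follows; `build_join_prompt` is a hypothetical helper, and the instruction wording and sample text in it are illustrative assumptions, not the prompt actually used here.

```python
def build_join_prompt(estimates: dict[str, str]) -> str:
    """Combine per-model estimates into a single prompt string.

    The instruction text below is a made-up example, not the original prompt.
    """
    sections = [
        f"### Estimate from {name}\n{text.strip()}"
        for name, text in estimates.items()
    ]
    header = (
        "Three models each estimated the macros of the same meal from one photo.\n"
        "Reconcile their answers into a single estimate and explain your reasoning."
    )
    return header + "\n\n" + "\n\n".join(sections)


# Placeholder text standing in for the real model responses.
prompt = build_join_prompt(
    {
        "model_a": "placeholder answer A",
        "model_b": "placeholder answer B",
        "model_c": "placeholder answer C",
    }
)
print(prompt)
```

Labelling each section with its source model lets the joining model say which estimate it leaned on, and makes its reasoning easier to check afterwards.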

Prompt

Results 2

QwQ

o1-preview

So what do we get?

The results aren’t very conclusive, but from a cursory look there’s a lot more agreement across larger values.

The bagel’s here so I might go eat it.