Our problem is simple. We need to estimate the macros (protein, carbohydrates and fat) in this thing, and this image is all we have to go on.

We’re going to give this problem to three multimodal models, then combine their results using two chain-of-thought (CoT) reasoning models, and see how we do.

Initial estimates

Results 1

Let’s plot the results (using the midpoint wherever a model gave a range) and look at how far the estimates deviate from each other.
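The post doesn't show how the ranges were collapsed or how deviation was measured, so here is a minimal sketch under two assumptions: a range is replaced by its midpoint, and deviation is the population standard deviation plus the max-min spread across models. The names `to_point` and `summarize` are hypothetical, and the numbers in the usage example are illustrative only, not the models' actual outputs. The summary per macro is what you would then feed into a plot.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple, Union

# An estimate is either a single number (grams) or a (low, high) range.
Estimate = Union[float, Tuple[float, float]]


def to_point(estimate: Estimate) -> float:
    """Collapse a (low, high) range to its midpoint; pass single values through."""
    if isinstance(estimate, tuple):
        low, high = estimate
        return (low + high) / 2
    return float(estimate)


def summarize(estimates: Sequence[Estimate]) -> dict:
    """Summarize one macro across models: point values, mean, and deviation.

    Assumption: deviation is the population standard deviation (pstdev),
    reported alongside the simple max-min spread.
    """
    points = [to_point(e) for e in estimates]
    return {
        "points": points,
        "mean": mean(points),
        "stdev": pstdev(points),
        "spread": max(points) - min(points),
    }


# Illustrative (hypothetical) carb estimates from three models, in grams.
print(summarize([(40, 50), 48, (44, 46)]))
```

With these made-up inputs the ranges collapse to 45.0, so the points are `[45.0, 48.0, 45.0]`, the mean is 46.0 and the spread is 3.0.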

What if we give all three estimates to o1-preview and QwQ?
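The post doesn't include the prompt it sent to o1-preview and QwQ, so this is only a sketch of one way to pack the three estimates into a single request. The function name `build_aggregation_prompt`, the model labels and the prompt wording are all hypothetical; estimates are kept as strings so ranges like "9-11" pass through unchanged. The resulting text would then go to each reasoning model through whatever client you already use (no specific API is assumed here).

```python
from typing import Dict

# Assumed macro set; adjust if you also ask for calories.
MACROS = ("protein", "carbs", "fat")


def build_aggregation_prompt(estimates: Dict[str, Dict[str, str]]) -> str:
    """Format several models' macro estimates into one prompt for a reasoning model.

    `estimates` maps a (hypothetical) model label to its per-macro estimate,
    kept as text so single values and ranges are both preserved as given.
    """
    lines = [
        f"{len(estimates)} vision models estimated the macronutrients of the same food photo.",
        "Their estimates (grams) are below. Reason about where they agree and differ,",
        "then give a single best estimate for each macro.",
        "",
    ]
    # Dicts keep insertion order, so models appear in the order they were added.
    for model, macros in estimates.items():
        parts = ", ".join(f"{m}: {macros[m]}" for m in MACROS)
        lines.append(f"- {model}: {parts}")
    return "\n".join(lines)


print(build_aggregation_prompt({
    "model_a": {"protein": "10", "carbs": "50", "fat": "2"},
    "model_b": {"protein": "9-11", "carbs": "48", "fat": "1.5"},
}))
```

Keeping the raw strings rather than midpoints lets the reasoning model see how uncertain each source was, at the cost of leaving the parsing to the model.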

The results aren’t very conclusive, but at a glance the models agree much more closely on the larger values.

The bagel’s here, so I might go eat it.