Casey Handmer on LLMs:

Every time one of the labs releases an updated model I give it a thorough shakedown on physics, in the style of the oral examination that is still used in Europe and a few other places. Claude, Grok, Gemini, and GPT are all advancing by leaps and bounds on a wide variety of evals, some of which include rather advanced or technical questions in both math and science, including Physics Olympiad-style problems, or grad school qualifying exams.

And yet, none of these models would be able to pass the physicist Turing test. It’s not even a matter of knowledge, I know of reasonably talented middle schoolers with no specialized physics training who could reason and infer on some of these basic questions in a much more fluent and intuitive way.

Alexander the Great had Aristotle; some poor kid will have a brain-dead version of Wheatley.

(Casey’s post is deeper than simple LLM-trashing: he lays out the actual 8-step process for reasoning through physics problems, so please do read the whole thing.)