Casey Handmer on LLMs:
Every time one of the labs releases an updated model I give it a thorough shakedown on physics, in the style of the oral examination that is still used in Europe and a few other places. Claude, Grok, Gemini, and GPT are all advancing by leaps and bounds on a wide variety of evals, some of which include rather advanced or technical questions in both math and science, including Physics Olympiad-style problems, or grad school qualifying exams.
And yet, none of these models would be able to pass the physicist Turing test. It’s not even a matter of knowledge, I know of reasonably talented middle schoolers with no specialized physics training who could reason and infer on some of these basic questions in a much more fluent and intuitive way.
Alexander the Great had Aristotle, some poor kid will have a brain-dead version of Wheatley.
(Casey’s post goes deeper than simple LLM-trashing: he lays out the actual 8-step process of reasoning through physics problems, so please do read the whole thing.)