News distortion, a case study
The headline: ChatGPT appears to pass medical school exams, educators rethinking assessments.
The article:
- They were mock, abbreviated exams,
- done incorrectly (there are no open-ended questions on the real USMLE),
- which it didn’t actually pass,
- and which were reported in a pre-print. That isn’t a complete knock against the study per se, but even a glance at it shows that some questionable choices were made regarding the scope (there were only 376 publicly available questions instead of more than 1,000 on the real exams) and the methods used to ensure the publicly available questions hadn’t already been included in ChatGPT’s training data.
To be clear: this is me complaining about misleading headlines, not saying that predictive AI won’t at some point be able to ace the USMLE; that point is just not now, for the reasons stated above. And let’s not even get into whether a high USMLE score means anything other than that the person achieving it is a good test-taker (it doesn’t).
And with that, the Twitter chapter of my life has closed.
May 2008 to January 2023. Not a bad run, considering.
Adam Mastroianni’s Experimental History newsletter has enabled paid subscriptions today, and if there is one science-oriented Substack worth paying for, it’s Adam’s. I’m sold.
It is hard to watch an animation like this (a plot of England’s population versus GDP from 1270 onwards) without a sense of awe, followed by slight discomfort about what could come next.
I have seen many people sharing links to Maciej Cegłowski’s (excellent!) case against colonizing Mars.
Of course, Werner Herzog said it first, and more succinctly. Good luck with that, indeed.
Merry Christmas to all who celebrate!
Some beautiful charts in this Pew Research overview of their 2022 findings. What they show is not as beautiful, but you can’t win them all.
For those of you completely off Twitter, it is now in the impossible-to-avoid-Elon-Musk phase, where even if you block his account there will be people re-tweeting, quote-tweeting, subtweeting… if for nothing else then to complain.
Sadly, it is still the go-to place for medical conference updates, and right now there is a big one.
The cost of the ludic fallacy…
…is $1.5 million.
A few days ago, The Washington Post wrote about two medical students who are also identical twins being accused of cheating. Their school, the Medical University of South Carolina, apparently doesn’t have anyone on staff who is both versed in statistics and willing to participate in an investigation. Enter paid consultants:
The university sent their test scores to a data forensics company, Caveon, which reported that the chances of two tests that similar being completed independently was “less than a person winning four consecutive Power Ball drawings.”
Invocation of forensics is the first red flag (see: Calculated Risks by Gerd Gigerenzer). Comparing any real-life probability to a lottery is the second. (Rule of thumb: if what you are doing professionally made it into xkcd, you should stop doing it.) The uncertainty of real-life probabilities has little to do with the known odds of games of “chance”. Confusing the two leads to the ludic fallacy, or “misuse of games to model real-life situations” (Nassim Taleb, The Black Swan, 2007).
The twins, now lawyers, sued and won the said $1.5M. Good for them.
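For a sense of scale, here is a rough sketch of the lottery side of that comparison. It assumes the published Powerball jackpot odds of 1 in 292,201,338 per ticket (the figure in effect since late 2015) and that consecutive drawings are independent; the function and variable names are mine, not Caveon’s. The point is that this number can be computed exactly only because the rules of the game are known, which is not true of two siblings sitting the same exam.

```python
from fractions import Fraction

# Assumption: jackpot odds of 1 in 292,201,338 for a single ticket.
POWERBALL_JACKPOT_ODDS = Fraction(1, 292_201_338)


def consecutive_wins(p_single: Fraction, n: int) -> Fraction:
    """Probability of n wins in a row, assuming the drawings are independent."""
    return p_single ** n


# Four consecutive jackpots: known odds, multiplied together.
p_four = consecutive_wins(POWERBALL_JACKPOT_ODDS, 4)
print(f"{float(p_four):.2e}")  # on the order of 1e-34
```

Multiplying like this is legitimate for lottery drawings. For identical twins who studied together for years, independence is exactly the assumption in dispute.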
This is the perfect number of times a year to have cranberry sauce: one.
John Roderick (or was it Ken Jennings?) on the Omnibus podcast.
Happy Thanksgiving to all who celebrate.