A brief note on AI peer review, education and bullshit
When I wrote about formalizing AI “peer” review I meant it as a tongue-in-cheek comment on the shoddy human peer review we are getting anyway. “Wittgenstein’s ruler: Unless you have confidence in the ruler’s reliability, if you use a ruler to measure a table you may also be using the table to measure the ruler. The less you trust a ruler’s reliability (in probability called the prior), the more information you are getting about the ruler and the less about the table.”, Nassim Taleb in Fooled by Randomness. Peer reviewers are the ruler, the articles are the table, and there is zero trust in the ruler’s reliability. It was also (1) a bet that the median AI review would soon be better than the median human review (and remember, the median journal article is not submitted to Nature or Cell but to a journal that’s teetering on being predatory), and (2) a prediction that the median journal is already getting “peer” reviews mostly or totally “written” by LLMs.
Things have progressed since January on both of these fronts. In a textbook example of the left hand not knowing what the right hand is doing, some journals are (unintentionally?) steering their reviewers towards using AI while at the same time prohibiting AI from being used. And some unscrupulous authors are using hidden prompts to steer LLM review their way (↬Andrew Gelman). On the other hand, I have just spent around 4 hours reviewing a paper without using any AI help whatsoever, and it was fun. More generally, despite occasionally writing about how useful LLMs can be, my use of ChatGPT has significantly decreased since I fawned over deep research.
Maybe I should be using it more. Doc Searls just wrote about LLM-driven “Education 3.0”, with some help from a sycophantic ChatGPT which framed education 1.0 as “deeply human, slow, and intimate” (think ancient Greeks, the Socratic method and the medieval universities), 2.0 as “mechanized, fast, and impersonal” (from the industrial revolution until now), and 3.0 as “fast and personal”. Should I then just let my kids use LLMs whenever, unsupervised, like Neal Stephenson’s Primer (“an interactive book that will adapt as the user grows and learns”)? But then would I want my kids hanging out with a professional bullshitter? Helen Beetham has a completely contrarian stance — that AI is the opposite of education — and her argument is more salient, at least if we take AI to mean only LLMs. Hope springs eternal that somebody somewhere is developing actual artificial intelligence which could one day lead to such wonderful things as the “Young Lady’s Illustrated Primer”.
Note the emphasis on speed in the framing of Education 3.0. I am less concerned about LLM bullshit outside of education, in a professional setting, since part of becoming a professional is learning how to identify bullshitters in your area of expertise. But bullshit is an obstacle to learning: this is why during medical school in Serbia I opted for reading textbooks in English rather than inept translations to Serbian made by professors with an aptitude for bullshitting around ambiguity. This is, I suppose, the key reason why we need LLMs there in the first place, for there is nothing stopping a motivated learner from browsing Wikipedia, reading any number of freely available masterworks online, watching university lectures on YouTube, and interacting with professionals and fellow learners via email, social networks, Reddit and whatnot. But you need to be motivated either way: to be able to wait and learn without immediate feedback in a world without LLMs, or to be able to wade through hallucinations and bullshit that LLMs can generate immediately. Education faces a bootstrapping problem here, for how can you recognize LLM hallucinations in a field you yourself are just learning?
The through-line for all this is motivation. If you review papers in order to check a career development box, to get O1 visa/EB1 green card status, and/or get brownie points from a journal, I suspect you would see it as a waste of time and take any possible shortcut. But if you review papers because of a sense of duty, for fun, or to satisfy a sadistic streak — perhaps all three! — why would you want to deprive yourself of the work? Education is the same: if you are learning for the sake of learning, why would you want to speed it up? Do you also listen to podcasts and watch YouTube lectures at 2x? Of course, many people are not into scientia gratia scientiae and are doing it to get somewhere or become something, in which case Education 2.0 should be right up their alley, along with the playback speed throttle.
A tale of two graphs
The FT and NYT both have stories about the dollar’s poor start to the year, which sounds alarming. But then the NYT shows this graph to back up the claim and you know what, it really doesn’t seem to be all that dramatic. In fact, the very beginning of the year has been quite average, as have the last two months. It is only the period from March until mid-April that saw two unusual slumps, but does that count as “dollar having its worst start to a year since 1973”, as the NYT put it? It might, depending on your definition of “worst” and “start”, but it is hardly a foregone conclusion. I know that newspapers need to prepare for the slow news week with the holiday coming up, but come on. “Worst start to a year in more than 50 years” is a bit too dramatic for what the chart shows us.
What kind of data would deserve some drama? Well, again the NYT provides the perfect example with their front page news on April 2020 US unemployment data. The headline, in much-deserved all-caps, says “U.S. UNEMPLOYMENT IS WORST SINCE DEPRESSION” and has the unemployment bar dip so far below anything in the past 50 years that it falls all the way down to the bottom of the front page. A true extreme value.
As an aside, if you thought you could call either “an outlier”, think again. Here is a 12-minute explainer on the difference from Pasquale Cirillo’s Log of Risk podcast but in short: outliers are impossible values, extreme values are, well, extreme but still in the realm of the possible. The dollar’s decline this year is neither but you wouldn’t know it if you just read the headlines.
Some of the best blog posts are rants, and Andrew Gelman just published one, about reckless disregard for the truth. Here is why he thinks the term “bullshit” does not apply:
In my post, I asked what do you call it when someone is lying but they’re doing it in such a socially-acceptable way that nobody ever calls them on it? Some commenters suggested the term “bullshit,” but that didn’t quite seem right to me, because these people seemed pretty deliberate in their factual misstatements.
I disagree. Whether the bullshitter is deliberate should not matter, and many do indeed BS with a specific goal in mind. In the examples he lists those are inflating the impact of a paper and getting paid for expert testimony in favor of big tobacco. Indeed, dig deep enough and you will find hunger for money and prestige to be at the root of much bullshit.
A few good links to start the week:
- Innovation and Repetition by René Girard
- Face it: you’re a crazy person by Adam Mastroianni
- How to build the perfect city by Chris Arnade (also in conversation with Tyler)
- Does the Pulitzer Prize Hate Substack? by Ted Gioia (note where these articles live!)
I have mentioned before that I am not a fan of IQ as a measure of anything other than un-intelligence and have linked to Taleb’s short essay on it from way back in 2019. Well, that same year Sean McClure wrote an even more thorough account of why testing intelligence as done today is pseudoscience, and you get to learn much more about models, biases, and the scientific method. Recommended long read.
Some good links from the past week:
Once a decade, I am obligated to read a book from Eric Topol. Ten years ago it was during a rotation at Georgetown where they were handing around copies of The Creative Destruction of Medicine like candy. Of course, if those books had truly been candy they would have been of the sort that quickly congeals into an inedible hard lump because nothing in The Creative Destruction… aged well.
Well, this year Topol has a book out on aging, and if it weren’t for some high-profile endorsements I would not be paying it two cents. But then I saw Nassim Taleb praising its rigor and scholarliness, highlighting as an example that Topol cites multiple trials for each claim. One can hope the trials he cites actually back up the claims, and to confirm that is indeed the case I now have Super Agers on the pile. Kindle version only: physical space in our library is too precious for Topol.
📚 Finished reading: Thinking in Systems by Donella H. Meadows. Much like Nassim Taleb who started with probability and statistics only to end in the territory of ethics and values, Meadows starts with algorithms and quantities but ends with higher purpose and transcendence. A book to be re-read.
If you say that “$1 of research investment yields $5 in returns to the economy” — as some do — but then clarify that under those $5 you have a lot of laboratory-building and infrastructure-supporting — as some did — what point exactly are you trying to make? As ever, there is much wisdom in r/Jokes.
If all we had to do is trust the scientific method, why does homeopathy still exist (but not lobotomies)?
Another good podcast episode: neurosurgeon Theodore Schwartz talking to Tyler Cowen. Dr Schwartz is a bigger believer in science than yours truly:
COWEN: Do you think there are areas of science, though, where the institutions are so screwed up that you don’t actually trust the product of what is coming out, and there’s some systematic bias in the ideas being generated?
SCHWARTZ: I think, yes, there’s always going to be politics involved, and we always come to any problem from a unique single perspective, and institutions are going to have their biases. Yes, that is true, but in the long run, the scientific method will figure it out, and there will be one right answer. That institution — whatever their bias is — will be proven wrong in the long run. Now, those people might be dead and won’t be able to apologize at that point.
The problem, of course, is that even when the scientific method does figure something out, people still keep doing things the old way, and no, generational change does not help. Witness homeopathy, kyphoplasty, vitamin C for colds, and — more relevant to Tyler’s question — the amyloid plaque hypothesis of Alzheimer’s disease. Abandoning lobotomies was an aberration; zombie medicine is the rule.