Journals should formalize AI "peer" review as soon as possible — they are getting them anyway
Two days ago I may have done some venting about peer review. Today I want to provide a solution: uber-peer review, by LLM.
The process is simple: as soon as the editor receives a manuscript and after the usual process determines it should be sent out for review, they upload it to ChatGPT (model GPT-4o, alas, since o1 doesn’t take uploads) and write the following prompt(s):
This is a manuscript submitted to the journal ABC. Our scope is XYZ and our impact factor is x. We publish y% of submissions. Please write a review of the manuscript as (choose one of the three options below):
- A neutral reviewer who is an expert in the topics covered by the article and will provide a fair and balanced review.
- A reviewer from a competing group who will focus and over-emphasize every fault of the work and minimize the positive aspects of the paper.
- A reviewer who is enthusiastic about the paper and will over-emphasize the work’s impact while neglecting to mention its shortcomings.
(the following applies to all three) The review should start with an overview of the paper, its potential impact to the field, and the overall quality (low, average or high-quality) of the idea, methodology, and the writing itself. It should follow with an itemized list of Major and Minor comments that the author(s) can respond to. All the comments should be grounded in the submitted work.
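For editors who would rather script this than paste it by hand, here is a minimal sketch of how the three prompts could be assembled. Everything in it is an assumption for illustration: the function names, the persona keys (`neutral`, `hostile`, `enthusiastic`) and the parameters are made up, and it only builds the prompt text. Uploading the manuscript and calling the model is left out on purpose.

```python
# Hypothetical helper for assembling the three reviewer prompts.
# All names here are illustrative; nothing is tied to a real journal system or API.

PERSONAS = {
    "neutral": (
        "a neutral reviewer who is an expert in the topics covered by the "
        "article and will provide a fair and balanced review."
    ),
    "hostile": (
        "a reviewer from a competing group who will focus on and over-emphasize "
        "every fault of the work and minimize the positive aspects of the paper."
    ),
    "enthusiastic": (
        "a reviewer who is enthusiastic about the paper and will over-emphasize "
        "the work's impact while neglecting to mention its shortcomings."
    ),
}

# Instructions shared by all three personas.
SHARED_INSTRUCTIONS = (
    "The review should start with an overview of the paper, its potential "
    "impact to the field, and the overall quality (low, average or "
    "high-quality) of the idea, methodology, and the writing itself. "
    "It should follow with an itemized list of Major and Minor comments "
    "that the author(s) can respond to. All the comments should be grounded "
    "in the submitted work."
)


def build_review_prompt(journal, scope, impact_factor, acceptance_pct, persona):
    """Return the full prompt text for one reviewer persona."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona!r}")
    return (
        f"This is a manuscript submitted to the journal {journal}. "
        f"Our scope is {scope} and our impact factor is {impact_factor}. "
        f"We publish {acceptance_pct}% of submissions. "
        f"Please write a review of the manuscript as {PERSONAS[persona]}\n\n"
        f"{SHARED_INSTRUCTIONS}"
    )


def build_all_prompts(journal, scope, impact_factor, acceptance_pct):
    """Return a dict mapping each persona name to its prompt."""
    return {
        name: build_review_prompt(journal, scope, impact_factor, acceptance_pct, name)
        for name in PERSONAS
    }


if __name__ == "__main__":
    # Placeholder values, same as in the prompt above.
    for name, prompt in build_all_prompts("ABC", "XYZ", "x", "y").items():
        print(f"--- {name} ---\n{prompt}\n")
```

Keeping the persona text separate from the shared instructions means the editor always knows which bias was asked for, which is the whole point of running all three.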
What comes out of prompt number 1 will be better than 80% of the peer review performed by humans, and the results of prompts 2 and 3 are informative in their own right. If the fawning review isn’t all that fawning, well, that’s helpful information regardless. A biased result can still be useful if you know the bias! Will any of it be better than the best possible human review? Absolutely not, but how many experts give their 100% for a fair review (if such a thing is even possible), and after how much poking and prodding from an editor, even for high impact factor journals?
And how many peer reviewers are already uploading the manuscripts they were sent to ChatGPT anyway, then submitting the resulting reviews under their own name with more or less editing? What model are they using? What prompt? Wouldn’t editors want to be in control there?
Let’s formalize this now, because you can be sure as hell that it is already happening.
Today’s Stratechery update from Ben Thompson is about censorship and it is too bad that there is a paywall — email me if you’d like it forwarded — because it is the best overview of our current predicament. Ada Palmer’s Tools for Thinking about Censorship is still the best historical perspective.
Here are a few links to start off 2025 (see if you can spot a pattern):
- Things we learned about LLMs in 2024 (ᔥDaring Fireball)
- The new Turing test for AI video… is absolutely horrifying (ᔥMR)
- The Ghosts in the Machine
- On skilled immigration
Happy New Year, dear reader!
Additional notes from the future
I was peripherally aware that large language models have crossed a chasm in the last year or so, but I hadn’t realized how large a jump it was until I compared ChatGPT’s answers to my standard question: “How many lymphocytes are there in the human body?”
Back in February of last year it took some effort to produce an over-inflated estimate. Today, I was served a well-reasoned and beautifully formatted response after a single prompt. Sure, I have gotten better at writing prompts, but the difference there is marginal. Not so marginal is the leap in usefulness and trustworthiness of the model, which went from being an overeager high school freshman to an all-star college senior.
And that is just the reasoning. Creating quick Word documents with tables and columns just the way I want them has become routine, even when/especially if I want to recreate a document from a badly scanned printout. My office document formatting skills are getting rusty and I couldn’t be happier for it.
In his Kefahuchi Tract trilogy, M. John Harrison conjures up alien algorithms floating around the human environment, mostly helpful, sometimes not, motives unknown. Back in the early 2000s, when the first novel came out, I was wondering what on earth he was talking about, but for better or worse we are now headed towards that world. Whether we are inching or hurtling depends entirely on your point of view.
(↬Tyler Cowen)
Mozi is a splendid idea for making serendipitous encounters happen. On the other hand, can you truly call these encounters serendipitous if they needed an app? (ᔥMatthew Haughey)
Woke up feeling like a steamroller ran me over and wondered “is this what middle age is like?” but no, Apple Watch soon notified me that my sleeping heart rate and respiratory rate both were higher than usual, so I am probably coming down with some virus or another. To which I say, bring it on.
Day one of Apple Intelligence. Trying out writing tools first and apparently the professional version of the sentence “This is a text about something, nothing in particular.” is — drumroll, please — “This text is a general discussion about various topics without a specific focus or subject matter.”
So “make professional” is code for “bullshittify”. How delightful.
I wanted the world to stop, and I wouldn’t stop until it did.
File this one under “sentences of note”. The entire essay is almost too good of a cautionary tale to be true, but who cares if it’s “true” as long as it’s good.
I tried watching a Tinderbox journaling tutorial on YouTube, and it was just way too much overhead for me. But the beauty of Tinderbox is that you can have as much or as little structure as fits your needs, and my needs are modest… for now.
Matthew Gasda for the Wisdom of Crowds:
If, in 1450, someone had gone around Florence saying, “No, no, no, we don’t live in a renaissance, culture is in decay,” I think it would have been possible to throw open the doors of the workshops, cathedrals, churches, and wealthy residences, and say, “Well, you know, I think you might be wrong about that. Take a look at this.” But in 2024, what do optimists see when you throw open the doors? Mr. Beast? Addison Rae? BAP? Talk Tuah?
I don’t know what three of those four are, but I’d show Teenage Engineering, Panic Software and this.