Posts in: science

A note on IQ

It has been more than six years now since Nassim Taleb rightfully called IQ a pseudoscientific swindle. Yet this zombie idea keeps coming back, most recently as a meandering essay by one Crémieux who, through a series of scatter plots and other data visualizations, attempts to persuade readers that “National IQs” are a valid concept and that yes, they are much lower in South Asia and sub-Saharan Africa than in the rest of the world.

This hogwash prompted another series of exchanges on IQ ending, for now, with this X post that recapped some points from Taleb’s original essay for a lay audience. That alone is worth reposting, but what I thought was even more interesting was one of the replies:

But I still prefer my doctor or pilot or professor to have an iq of over 120 (at least). I am sure it matters. Not as the only characteristic, but still.

The reply misses the point so completely that it wasn’t worth responding to, but it is a good example of another IQ-neutral human trait: hypothesizing about properties in isolation without considering nth-order effects. Let’s say your surgeon’s IQ is 160. What are the implications for their specialty of choice, fees, where they work, and bedside manner? Are they more or less of a risk-taker because of this? Does their intellectual savvy transfer more to their own bottom line, picking high-reimbursement procedures over a more conservative approach? Even if you said “all else being equal I’d prefer someone with a higher IQ”, well, why would you if everything else was equal? In that case, would it not make more sense to pick the person who did not have the benefit of acing multiple-choice questions based on pure reasoning rather than knowledge? And yes, Taleb wrote about that as well.

Another set of replies was on the theme of “well I don’t think we could even have a test that measures IQ”, showing that the writers don’t know what IQ is — it is, by definition, the thing measured by an IQ test. There is some serious confusion of terms here, and X is the worst place to have a discussion about it, with everyone shouting over each other.

Finally, since I agree with Taleb that IQ as used now is a bullshit concept, people may surmise, as they did for him, that I took the test and, disappointed in the result, am now trying to discredit it. I do think it’s BS for personal reasons, but of a different kind: some 25 years ago, as a high school freshman in Serbia, I took the test and was accepted to Mensa. Having attended a single, tedious meeting in Belgrade shortly afterward, I saw that the whole thing was indeed laughable and didn’t think about it again until reading that 2019 essay.

Having a high IQ means you are good at taking tests, and it correlates with success in life only as much as your life is geared towards test-taking. There is nothing else “there” there, and good test-takers unhappy with their lives should focus on others of life’s many questions, like how to execute a proper deadlift and whether home-made fresh pasta is better than the dried variety.


The Grim Rules of science

The founders of statistics were a bunch of drunk gamblers, so I’m used to seeing parallels between games of all kinds and science. In that vein, Andrew Gelman writing about cheating at board games made me think of cheating in clinical trials, only the right parallel there would be to cooperative games in the style of Pandemic and the many Arkham/Eldritch Horror games made by Fantasy Flight.

These are tough games — Eldritch Horror in particular — where players and rule keepers are one and the same and all on the same team. And they are easy to beat if the players are willing to fudge a rule here or re-roll the dice there. And to be clear, some of the rules are punishing, including this one from my absolute favorite, Arkham Horror: The Card Game — the Grim Rule (TGR):

If players are unable to find the answer to a rules or timing conflict in this Rules Reference, resolve the conflict in the manner that the players perceive as the worst possible at that moment with regards to winning the scenario, and continue with the game.

The Bonferroni correction for multiplicity could be a form of TGR, as could sensitivity analyses with particularly harsh assumptions. Note that TGR is a shortcut intended to enhance enjoyment of the game. Sure, your investigators may be devoured by Cthulhu or all go insane, but even that is more fun than 30 minutes spent looking up and cross-checking arcane footnotes in thick rule books, or worse yet trawling through Reddit for rule tips. Note also that clinical trials and science are not there (only) for fun and enjoyment and that applying TGR has consequences more serious than having to restart a game.
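To make the analogy concrete, here is a minimal sketch (mine, not from any trial protocol) of the arithmetic behind the Bonferroni correction: with m comparisons, each one is tested against alpha divided by m, which is about as grim as a multiplicity rule gets.

```python
# A minimal sketch of the Bonferroni correction: with m hypotheses,
# each one is tested at alpha / m, the harshest common reading of
# the family-wise error rate.

def bonferroni_reject(p_values, alpha=0.05):
    """Return which hypotheses survive the Bonferroni-adjusted threshold."""
    m = len(p_values)
    threshold = alpha / m  # every extra comparison makes the bar harsher
    return [p <= threshold for p in p_values]

# Hypothetical example: five endpoints, only the strongest result survives.
print(bonferroni_reject([0.004, 0.02, 0.03, 0.04, 0.049]))
# -> [True, False, False, False, False]
```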

Which is to say, it pays off to be more thoughtful about your analyses and assumptions when designing a study and think — am I making this assumption and doing things this way because I don’t want to fool myself, or because the alternative would take too much time? The same goes for assumptions that are too lenient, but of course you already know that.


Journals should formalize AI "peer" review as soon as possible — they are getting them anyway

Two days ago I may have done some venting about peer review. Today I want to provide a solution: uber-peer review, by LLM.

The process is simple: as soon as the editor receives a manuscript and, through the usual process, determines it should be sent out for review, they upload it to ChatGPT (model GPT-4o, alas, since o1 doesn’t take uploads) and write the following prompt(s):

This is a manuscript submitted to the journal ABC. Our scope is XYZ and our impact factor is x. We publish y% of submissions. Please write a review of the manuscript as (choose one of the three options below):

  1. A neutral reviewer who is an expert in the topics covered by the article and will provide a fair and balanced review.
  2. A reviewer from a competing group who will focus on and over-emphasize every fault of the work and minimize the positive aspects of the paper.
  3. A reviewer who is enthusiastic about the paper and will over-emphasize the work’s impact while neglecting to mention its shortcomings.

(the following applies to all three) The review should start with an overview of the paper, its potential impact on the field, and the overall quality (low, average, or high) of the idea, methodology, and the writing itself. It should follow with an itemized list of Major and Minor comments that the author(s) can respond to. All the comments should be grounded in the submitted work.
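If an editor wanted to run this as a script rather than paste it into a chat window, a rough sketch along the following lines might do it. I’m using the OpenAI Python SDK here; the helper name, persona strings, and journal placeholders are my own assumptions rather than an existing editorial workflow, and the manuscript is pasted in as plain text since uploads are the sticking point.

```python
# Hypothetical sketch of the three-persona review prompt via the OpenAI
# Python SDK; names and persona wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PERSONAS = {
    "neutral": "a neutral reviewer who is an expert in the topics covered "
               "by the article and will provide a fair and balanced review",
    "hostile": "a reviewer from a competing group who will focus on and "
               "over-emphasize every fault of the work and minimize the "
               "positive aspects of the paper",
    "fawning": "a reviewer who is enthusiastic about the paper and will "
               "over-emphasize the work's impact while neglecting to "
               "mention its shortcomings",
}

REVIEW_FORMAT = (
    "The review should start with an overview of the paper, its potential "
    "impact on the field, and the overall quality (low, average, or high) "
    "of the idea, methodology, and the writing itself. It should follow "
    "with an itemized list of Major and Minor comments that the author(s) "
    "can respond to. All comments should be grounded in the submitted work."
)

def draft_review(manuscript_text: str, persona: str, journal: str,
                 scope: str, impact_factor: float, acceptance_rate: int) -> str:
    """Ask GPT-4o for one of the three reviews described above."""
    prompt = (
        f"This is a manuscript submitted to the journal {journal}. "
        f"Our scope is {scope} and our impact factor is {impact_factor}. "
        f"We publish {acceptance_rate}% of submissions. "
        f"Please write a review of the manuscript as {PERSONAS[persona]}. "
        f"{REVIEW_FORMAT}\n\n--- MANUSCRIPT ---\n{manuscript_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```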

What comes out with prompt number 1 will be better than 80% of peer review performed by humans, and options 2 and 3 are informative in their own right. If the fawning review isn’t all that fawning, well, that’s helpful information regardless. A biased result can still be useful if you know the bias! Will any of it be better than the best possible human review? Absolutely not, but how many experts give their 100% for a fair review — if such a thing is even possible — and after how much poking and prodding from an editor, even for high impact factor journals?

And how many peer reviewers are already uploading manuscripts to ChatGPT anyway, then submitting the output under their own name with more or less editing? What model are they using? What prompt? Wouldn’t editors want to be in control there?

Let’s formalize this now, because you can be sure as hell that it is already happening.


Much has been written and said about the faults of peer review, but one thing, I think, hasn’t been emphasized enough, so I’ll state it here: journal editors need to grow a spine. And they need to grow it in two ways: first by not sending obviously flawed studies out for peer review no matter where they come from, then by saying no to reviewers’ unreasonable demands, not taking their comments at face value, and sometimes just not waiting 6+ months for a review to come back before making a decision.


An article from Matt Maldre about skipping to the popular parts of a YouTube video caught my eye:

Take this two-hour animation of a candy corn ablaze in a fireplace. This cute video is a simple loop that goes over and over. Certainly, in two hours, there’s got to be sort of Easter egg that happens, right? Maybe Santa comes down the chimney.

Roll over the Engagement Graph, and you’ll see some spikes.

I checked out the spikes. Nothing different happens. It’s the same loop. It’s just people clicking the same spikes that other people did because other people clicked it.

Because humans are humans and nature is nature. Now, how many fields of science are made of people analyzing, explaining, narrating, and writing millions upon millions of words about an equivalent of these spikes? Microbiome for sure. Much of genetics as currently practiced. Anything that relies on principal component analysis. What else?


Hippopotomonstrosesquippedaliophobia: a made-up word for a made-up condition but I'm OK with that

Thagomizer, “the distinctive arrangement of four spikes on the tails of stegosaurian dinosaurs”, is a word that made the most unusual jump from a cartoon panel into scientific texts. I recently learned of another word that is making a similar jump: hippopotomonstrosesquippedaliophobia, a faux Latin word to describe a made-up condition — fear of long words.

The Wikipedia article shows two references, one reviewed by a Doctor of Psychology, and a cursory internet search shows one more, “medically reviewed by an MD”. How these people approved articles that validate hippopoto… as a medical condition is beyond me. I heard about the word from my 2nd-grader, who in turn heard it from her science teacher (who, I assume, gets her own scientific information on TikTok), so the damage is real. This phobia doesn’t exist, people, and if you do get the symptoms listed here upon exposure to a long word, well, here is another word for you.

But here is the twist: the likely origin of the word, as noted on a BBC website, is this poem of the same name by one Aimee Nezhukumantathil (sic!), and you should click and read the whole thing, but this is how it starts:

On the first day of classes, I secretly beg
my students Don’t be afraid of me. I know
my last name on your semester schedule
is chopped off or probably misspelled—
or both. I can’t help it. I know the panic
of too many consonants rubbed up
against each other, no room for vowels
to fan some air into the room of a box
marked Instructor…

I empathize. This should be a real word! But unlike the thagomizer, which was a real part of actual dinosaurs, there is no medical condition equivalent to “fear of long words”. So let’s please find a better definition for it.


Maybe science isn't completely a strong-link problem after all

I’m familiar with Adam Mastroianni’s thesis that science is a strong-link problem — here is at least one mention on this blog — and I am certainly familiar with the recently uncovered shenanigans in Alzheimer’s disease research, but I never thought to connect the dots and see the latter as refuting the former. Well, paleontologist Matt Wedel did just that:

I’m going to wane philosophical for a minute. In general I’m very sympathetic to Adam Mastroianni’s line “don’t worry about the flood of crap that will result if we let everyone publish, publishing is already a flood of crap, but science is a strong-link problem so the good stuff rises to the top”. I certainly don’t think we need stronger pre-publication review or any more barrier guardians (although I have reluctantly concluded that having some is useful). But when fraudulent stuff like this does in fact rise to the top in what seems to be a strong-link network — lots of NIH-funded labs, papers in top journals (or, apparently, “top” journals) — then I despair a bit. Science has gotten so specialized that almost anyone could invent facts or data within their subfield that might pass muster even with close colleagues (even if those colleagues aren’t on the take, he said cynically — there is a mind-boggling amount of money floating around in the drug-development world).

There is indeed. Lots more at the link, mostly about paleobiology, ending with these wise words:

So if you want to do good work — in this metaphor, to be at the top where the good science floats (eventually, alongside a seasoning of not-yet-unmasked bad science) — then I think you have to be aware that other cells exist, and occasionally peer into them, if for no other reason than to make sure you don’t accept an idea that’s already been debunked over there. And you need to read broadly and deeply in your own cell — there’s almost certainly valuable stuff you don’t know because the relevant works are stuck to the bottom of the pot. Go knock ’em loose.


Additional notes from the future

I was peripherally aware that large language models had crossed a chasm in the last year or so, but I hadn’t realized how large a jump it was until I compared ChatGPT’s answers to my standard question: “How many lymphocytes are there in the human body?”.

Back in February of last year it took some effort to produce an over-inflated estimate. Today, I was served a well-reasoned and beautifully formatted response after a single prompt. Sure, I have gotten better at writing prompts, but the difference there is marginal. Not so marginal is the leap in usefulness and trustworthiness of the model, which went from being an overeager high school freshman to an all-star college senior.

And that is just the reasoning. Creating quick Word documents with tables and columns just the way I want them has become routine, even when/especially if I want to recreate a document from a badly scanned printout. My office document formatting skills are getting rusty and I couldn’t be happier for it.

In his Kefahuchi Tract trilogy, M. John Harrison conjures up alien algorithms floating around the human environment, mostly helpful, sometimes not, motives unknown. Back in the early 2000s when the first novel came out I wondered what on earth he was talking about, but for better or worse we are now headed towards that world. Whether we are inching or hurtling, that depends entirely on your point of view.

(↬Tyler Cowen)


FT — Valencia floods: the scandal of a disaster foretold

For some Sunday pre-holiday-week reading, here is a detailed analysis from the Financial Times of what went wrong in Valencia, showing both the human and the technical side of the flooding there earlier this year. It is excellent throughout, and really got my blood boiling near the end with this series of paragraphs:

Cutting the risk of flash floods is not impossible. After the 1957 disaster, generalísimo Francisco Franco oversaw a vast engineering project to reroute the Turia river away from Valencia’s city centre. It is the reason why the capital was largely unscathed on October 29. But dictators do not have to consult stakeholders and such poured-concrete solutions are out of fashion today.

Still, Spain has not lacked modern proposals to stop the Poyo ravine flooding. But its slow-moving state has failed to implement them. The Júcar river basin authority put forward a risk reduction plan in 1994. Three of its four parts were blocked on environmental grounds, so it only stabilised the walls of the ravine from Paiporta to the coast — a job finished in 2005.

By then the basin authority had commissioned work on an alternative plan, which was authorised by the central government in 2009. It involved restoring forests to improve soil water absorption and building a “safety” channel to siphon water from the ravine to Franco’s rerouted river.

By the time it won environmental approval in 2011, Spain was heading into austerity. A new conservative government then shelved the plan. When the socialists returned to power in 2018, the environmental approval had expired. Pedro Sánchez’s government concluded a new plan was needed, but cost-benefit studies and new environmental demands at regional level threw up fresh obstacles. On the ground, nothing was done.

Valencia is a beautiful city, as I saw for myself not long ago, and a big part of it was the dry riverbed-turned-park going straight down the center, orange groves and all. To think that what enabled it was a fascist dictator’s big project, when he probably didn’t care an iota about the park. And the people who care about the parks are clearly not capable of doing these large-scale projects. It’s the yin and yang of humanity.


A one-two punch on clinical trials from Ruxandra Teslo and Willy Chertman today: first their on-point agenda for clinical trial abundance as a guest post in Slow Boring, then Ruxandra’s longer essay, which has been so thoroughly researched that even yours truly gets a name-check. As I noted elsewhere, every US institution has made one bad tradeoff after another in how it conducts clinical trials, to the point that it’s impossible to conduct a RECOVERY trial equivalent over here. That needs to change.