Posts in: science

Also on that other site: Grok 3. I wish I could say it was pure hype and bragging from a man-child in need of attention. Alas, it’s the real deal: it gave me less BS, and faster, to my now standard set of queries. Brave new world, etc.


The “Real World Risk Institute” — RWRI — is Nassim Taleb’s answer to the question of what his Incerto would look like if it were a course. The twentieth workshop starts on July 7 and lasts for 2 weeks. This is what I wrote on that other site in response to the announcement:

Strongly endorse. Took the first one in July 2020 and if it weren’t for it I’d still be a federal employee on a visa. It’s not the knowledge you get (you have the Inecrto (sic!) for that), it’s the thinking

And I meant every misspelled word. Go if you have time: scholarships are available, math is not required.


The headline: “Cheap blood test detects pancreatic cancer before it spreads”.

The reality:

The nanosensor correctly identified healthy individuals 98% of the time, and identified people with pancreatic cancer with 73% accuracy. It always distinguished between individuals with cancer and those with other pancreatic diseases.

The 98% number means that two out of 100 healthy people who take the test would get a false positive result. The 73% number means the test misses 27 out of 100 people with cancer, giving them a false sense of security. If used in a mostly healthy population — a reasonable assumption to make for a screening test — a positive result would more likely than not be a false positive, and yet you would still miss plenty of actual cancers.
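To make the base-rate problem concrete, here is the Bayes arithmetic in a few lines of Python; the 1-in-1,000 prevalence is my illustrative assumption, not a number from the article:

```python
# Positive predictive value of the screening test via Bayes' theorem.
# Sensitivity and specificity are from the article; the prevalence is
# an assumption for illustration.
sensitivity = 0.73   # P(positive test | cancer)
specificity = 0.98   # P(negative test | healthy)
prevalence = 0.001   # assumed: 1 in 1,000 people screened has cancer

true_positives = sensitivity * prevalence
false_positives = (1 - specificity) * (1 - prevalence)
ppv = true_positives / (true_positives + false_positives)

print(f"P(cancer | positive test) = {ppv:.1%}")  # about 3.5%
```

At that prevalence, roughly 27 out of every 28 positive results are false alarms.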

These are abysmal assay characteristics and the test should never reach the clinic, but you would never know it from the headline. (↬Tyler Cowen)


Deep Research is the real deal, big changes ahead

One query in, I am convinced of the value of Deep Research and think it is well worth the $200 per month. The sources are real, the narrative less fluffy, the points it makes cogent. The narrative review is not dead yet, but it is on its way out. Here I am thinking of those reviews that are made to pad junior researchers’ CVs while they introduce themselves to a field, neutral in tone and seemingly comprehensive in scope. There will always be a place for an opinionated perspective from a leader in the field.

In a year, AI went from an overeager undergrad to a competent graduate student in every field of science, natural or social. Would o3 turn it into a post-doc, able and willing to go down any and every rabbit hole? Even now, two hundred dollars per month is a bargain — if the price stays the same with next-generation models it will be a steal.

The one snag is that it is all centralized, and yes, the not-so-open OpenAI sees all your questions and knows what you want. For now. Local processing is a few years behind, so what is preventing Nvidia or Apple or whomever from putting all their efforts into catching up? How much would you pay for your own server that would produce its in-depth reports more slowly — say 30 minutes instead of 5 — but be completely private? And one that needs no benefits, no travel and lodging to conferences, nor any of the messy HR stuff.

The brave new world is galloping ahead.

(↬Tyler Cowen)


A note on IQ

It has been more than six years now since Nassim Taleb rightfully called IQ a pseudoscientific swindle. Yet this zombie idea keeps coming back, most recently as a meandering essay by one Crémieux who, through a series of scatter plots and other data visualizations, attempts to persuade that “National IQs” are a valid concept and that yes, they are much lower in South Asia and sub-Saharan Africa than in the rest of the world.

This hogwash prompted another series of exchanges on IQ ending, for now, with this X post that recapped some points from Taleb’s original essay for a lay audience. That alone is worth reposting, but what I thought was even more interesting was one of the replies:

But I still prefer my doctor or pilot or professor to have an iq of over 120 (at least). I am sure it matters. Not as the only characteristic, but still.

While missing the point so completely that it wasn’t worth replying to, the post is a good example of another IQ-neutral human trait: hypothesizing about properties in isolation without considering nth-order effects. Let’s say your surgeon’s IQ is 160. What are the implications for their specialty of choice, fees, where they work, and bedside manner? Are they more or less of a risk-taker because of it? Does their intellectual savvy transfer to their own bottom line, picking high-reimbursement procedures over a more conservative approach? Even if you said “all else being equal I’d prefer someone with a higher IQ”, well, why would you if everything else was equal? In that case, would it not make even more sense to pick someone who did not have the benefit of acing multiple-choice questions through pure reasoning rather than knowledge? And yes, Taleb wrote about that as well.

Another set of replies was on the theme of “well I don’t think we could even have a test that measures IQ”, showing that they don’t know what IQ is — it is, by definition, the thing measured by an IQ test. There is some serious confusion of terms here, and X is the worst place to have a discussion about it, with everyone shouting over each other.

Finally, since I agree with Taleb that IQ as used now is a bullshit concept, people may surmise, as they did for him, that I took the test and, disappointed in the result, am now trying to discredit it. I do think it’s BS for personal reasons, but of a different kind: some 25 years ago, as a high school freshman in Serbia, I took the test and was accepted to Mensa. Having attended a single, tedious meeting in Belgrade shortly afterward, I saw that the whole thing was indeed laughable and didn’t think about it again until reading that 2019 essay.

Having a high IQ means you are good at taking tests, and it correlates with success in life only as much as your life is geared towards test-taking. There is nothing else “there” there, and good test-takers unhappy with their lives should focus on life’s many other questions, like how to execute a proper deadlift and whether home-made fresh pasta is better than the dried variety.


The Grim Rules of science

The founders of statistics were a bunch of drunk gamblers, so I’m used to seeing parallels between games of all kinds and science. In that vein, Andrew Gelman writing about cheating at board games made me think of cheating in clinical trials, only the right parallel there would be cooperative games in the style of Pandemic and the many Arkham/Eldritch Horror games made by Fantasy Flight.

These are tough games — Eldritch Horror in particular — where players and rule keepers are one and the same, and all on the same team. And they are easy to beat if the players are willing to fudge a rule here or re-roll the dice there. And to be clear, some of the rules are punishing, including, from my absolute favorite Arkham Horror: The Card Game, The Grim Rule:

If players are unable to find the answer to a rules or timing conflict in this Rules Reference, resolve the conflict in the manner that the players perceive as the worst possible at that moment with regards to winning the scenario, and continue with the game.

The Bonferroni correction for multiplicity could be a form of The Grim Rule (TGR), as could sensitivity analyses with particularly harsh assumptions. Note that TGR is a shortcut intended to enhance enjoyment of the game. Sure, your investigators may be devoured by Cthulhu or all go insane, but even that is more fun than 30 minutes spent looking up and cross-checking arcane footnotes in thick rule books, or worse yet trawling through Reddit for rule tips. Note also that clinical trials and science are not there (only) for fun and enjoyment, and that applying TGR has consequences more serious than having to restart a game.
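For the statistically inclined, here is the Bonferroni correction in miniature: to keep the chance of any false positive across m tests at α, judge each test against α/m, the worst-case assumption about how the tests relate. A minimal sketch, with made-up p-values:

```python
# Bonferroni correction: the Grim Rule of multiple testing.
# To control the family-wise error rate at alpha across m tests,
# compare each p-value against alpha / m (the harshest assumption).
def bonferroni(p_values, alpha=0.05):
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

p_values = [0.003, 0.02, 0.04, 0.30]   # made-up p-values for illustration
print(bonferroni(p_values))            # [True, False, False, False]
```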

Which is to say, it pays to be more thoughtful about your analyses and assumptions when designing a study and to ask: am I making this assumption and doing things this way because I don’t want to fool myself, or because the alternative would take too much time? The same goes for assumptions that are too lenient, but of course you already know that.


Journals should formalize AI "peer" review as soon as possible — they are getting it anyway

Two days ago I may have done some venting about peer review. Today I want to provide a solution: uber-peer review, by LLM.

The process is simple: as soon as the editor receives a manuscript and the usual process determines it should be sent out for review, they upload it to ChatGPT (model GPT-4o, alas, since o1 doesn’t take uploads) and write the following prompt(s):

This is a manuscript submitted to the journal ABC. Our scope is XYZ and our impact factor is x. We publish y% of submissions. Please write a review of the manuscript as (choose one of the three options below):

  1. A neutral reviewer who is an expert in the topics covered by the article and will provide a fair and balanced review.
  2. A reviewer from a competing group who will focus and over-emphasize every fault of the work and minimize the positive aspects of the paper.
  3. A reviewer who is enthusiastic about the paper and will over-emphasize the work’s impact while neglecting to mention its shortcomings.

(the following applies to all three) The review should start with an overview of the paper, its potential impact to the field, and the overall quality (low, average or high-quality) of the idea, methodology, and the writing itself. It should follow with an itemized list of Major and Minor comments that the author(s) can respond to. All the comments should be grounded in the submitted work.
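For editors who would rather script this than paste into a chat window, a minimal sketch using the OpenAI Python SDK could look like the following; the persona wording, the model choice, and the journal details are placeholders, not a tested editorial pipeline:

```python
# Sketch of LLM "uber-peer review": one manuscript, three reviewer personas.
# Assumes the OpenAI Python SDK; persona text and model are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONAS = {
    "neutral": "a neutral expert reviewer providing a fair and balanced review",
    "hostile": ("a reviewer from a competing group who over-emphasizes every "
                "fault and minimizes the positives"),
    "fawning": ("an enthusiastic reviewer who over-emphasizes impact and "
                "neglects shortcomings"),
}

def review(manuscript_text: str, persona: str) -> str:
    prompt = (
        "This is a manuscript submitted to the journal ABC. Our scope is XYZ "
        "and our impact factor is x. We publish y% of submissions. "
        f"Write a review of the manuscript as {PERSONAS[persona]}. "
        "Start with an overview of the paper, its potential impact on the "
        "field, and the overall quality (low, average or high) of the idea, "
        "methodology, and writing. Follow with an itemized list of Major and "
        "Minor comments the author(s) can respond to. Ground every comment "
        "in the submitted work.\n\n"
        f"Manuscript:\n{manuscript_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Usage: reviews = {p: review(open("manuscript.txt").read(), p) for p in PERSONAS}
```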

What comes out with option number 1 will be better than 80% of peer review performed by humans, and options 2 and 3 are informative on their own. If the fawning review isn’t all that fawning, well, that’s helpful information regardless. A biased result can still be useful if you know the bias! Will any of it be better than the best possible human review? Absolutely not, but how many experts give 100% for a fair review — if such a thing is even possible — and after how much poking and prodding from an editor, even for high-impact-factor journals?

And how many peer reviewers are already uploading manuscripts to ChatGPT anyway, then submitting the output under their own name with more or less editing? What model are they using? What prompt? Wouldn’t editors want to be in control there?

Let’s formalize this now, because you can be sure as hell that it is already happening.


Much has been written and said about the faults of peer review, but one thing I think hasn’t been emphasized enough, so I’ll state it here: journal editors need to grow a spine. And they need to grow it in two ways: first by not sending obviously flawed studies out for peer review no matter where they come from, then by saying no to reviewers’ unreasonable demands, not taking their comments at face value, and sometimes just not waiting 6+ months for a review to come back before making a decision.


An article from Matt Maldre about skipping to the popular parts of a YouTube video caught my eye:

Take this two-hour animation of a candy corn ablaze in a fireplace. This cute video is a simple loop that goes over and over. Certainly, in two hours, there’s got to be some sort of Easter egg that happens, right? Maybe Santa comes down the chimney.

Roll over the Engagement Graph, and you’ll see some spikes.

I checked out the spikes. Nothing different happens. It’s the same loop. It’s just people clicking the same spikes that other people did because other people clicked it.

Because humans are humans and nature is nature. Now, how many fields of science are made of people analyzing, explaining, narrating and writing millions upon millions of words about an equivalent of these spikes? Microbiome for sure. Much of genetics as currently practiced. Anything that relies on principal component analysis. What else?
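If you doubt how easy it is to narrate structure out of pure noise, run principal component analysis on random numbers; this toy sketch, with arbitrary sample sizes, is all it takes:

```python
# PCA on pure noise: with more features than structure, the leading
# component still "explains" several times the variance an even split
# across 50 dimensions would give (1/50 = 2%).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
noise = rng.standard_normal((100, 50))  # 100 samples, 50 features, no signal

pca = PCA(n_components=5).fit(noise)
print(pca.explained_variance_ratio_.round(3))
# The first few ratios come out well above 0.02, inviting a story about
# "components" where there is nothing but sampling variation.
```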


Hippopotomonstrosesquippedaliophobia: a made-up word for a made-up condition but I'm OK with that

Thagomizer, “the distinctive arrangement of four spikes on the tails of stegosaurian dinosaurs”, is a word that made the most unusual jump from a cartoon panel into scientific texts. I recently learned of another word that is making a similar jump: hippopotomonstrosesquippedaliophobia, a faux Latin word to describe a made-up condition — fear of long words.

The Wikipedia article shows two references, one of them reviewed by a Doctor of Psychology, and a cursory internet search shows one more, “medically reviewed by an MD”. How these people approved articles that validate hippopoto… as a medical condition is beyond me. I heard about the word from my 2nd-grader, who in turn heard it from her science teacher (who, I assume, gets her own scientific information on TikTok), so the damage is real. This phobia doesn’t exist, people, and if you do get the symptoms listed here upon exposure to a long word, well, here is another word for you.

But here is the twist: the likely origin of the word, as noted on a BBC website, is this poem of the same name by one Aimee Nezhukumantathil (sic!). You should click and read the whole thing, but this is how it starts:

On the first day of classes, I secretly beg
my students Don’t be afraid of me. I know
my last name on your semester schedule
is chopped off or probably misspelled—
or both. I can’t help it. I know the panic
of too many consonants rubbed up
against each other, no room for vowels
to fan some air into the room of a box
marked Instructor…

I empathize. This should be a real word! But unlike the thagomizer, which was a real part of actual dinosaurs, there is no medical condition equivalent to “fear of long words”. So let’s please find a better definition for it.