If it looks like a press release and reads like a press release, why is it being sold as a government report?
Doc in a Box from Alex Tabarrok links to an official state government document, from the Utah Department of Commerce. The document is titled “Key Statistics on the Doctronic Pilot Program” but reads more like a bulleted press release, full of percentages without a denominator, begging for a flow chart. Press releases are like that because you typically won’t add images (although this one randomly selected from today does indeed include one along with the full abstract submitted to the ASCO annual meeting, and good for them) but more importantly because you want to pick the best possible picture-perfect view of your shiny spotless data elephant without also acknowledging that it has a rear end, has a bunch of flies buzzing around, and smells a bit rank. Does your elephant not have an ass, Utah? Or did you just copy/paste what Doctronic, a startup whose wonky web page doesn’t even work, sent you?
Doctronic.
We have hit a technical snag. Go to Homepage to hit it again.
So how many patients could they have evaluated? This article in JAMA Forum says that “[p]hysicians hired by Doctronic will review the AI’s output for the first 250 patients before the system takes any action and will review the next 1000 patients retrospectively after the AI agent begins acting autonomously.” Are the key statistics from the first 250? The very first bullet point in the press release summary document says that the program is still in Phase One and that “the number of patients so far is limited”, so I guess not. Is it 100 at least? Surely they wouldn’t use a percentage as high as 97 if there were fewer than that involved. Except that as low as 30 will give you a percent roundable to 97. So, 30 to 249?
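For the pedants among us, that lower bound is easy to check. Here is a minimal sketch in Python, assuming the percentages were rounded half-up to whole numbers (the document does not say how they were rounded). The function names are made up for illustration. It searches for the smallest denominator that can produce a given rounded percentage.

```python
def rounded_percent(k: int, n: int) -> int:
    """Whole-number percentage of k out of n, rounded half-up.

    Integer arithmetic avoids floating-point surprises:
    floor(100*k/n + 1/2) == (200*k + n) // (2*n).
    """
    return (200 * k + n) // (2 * n)


def smallest_denominator(target: int, max_n: int = 1000):
    """Return (n, k) for the smallest n where some k of n rounds to target.

    Assumption: the reported figure is a simple k-out-of-n proportion.
    Returns None if nothing up to max_n fits.
    """
    for n in range(1, max_n + 1):
        for k in range(n + 1):
            if rounded_percent(k, n) == target:
                return n, k
    return None


if __name__ == "__main__":
    print(rounded_percent(29, 30))   # 29 of 30 patients
    print(smallest_denominator(97))  # smallest group that can report 97%
```

Under those assumptions 29 of 30 does round to 97, and so does 28 of 29, which puts the floor one patient lower still.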
Why am I being so pedantic? Well, these techniques are par for the course in the biotech world, but coming from a state agency they make me think there is a bit too much enthusiasm for the project. Compare and contrast to the shellacking LLMs got in this report from the Office of the Auditor General of Ontario, which reviewed AI Scribe functionality from 20 vendors. Their report even has absolute numbers in it! These state government officials should realize that they are prime targets for flimflam merchants and should behave accordingly.
Note that I am not against the idea in general. The project’s goal is in fact quite noble: there is no reason why plain ol’ machine learning shouldn’t be able to suss out the majority of refill requests for chronic medications and flag patients who haven’t had their bloodwork or diabetic foot assessments done, or who’ve had abnormal office blood pressure readings at prior visits. Having that easy refill option available would mean a patient coming in for an in-person visit for what should be “only” prescription refills is even more of a signal that something else may be amiss, even if the patient can’t or won’t verbalize it. So yes, LLM refills, bring ’em on. Doctronic’s end-goal of actual autonomous Shoggoths putting on white coats and replacing MDs, PAs, NPs and other credentialed humans… not so much.
Wednesday links, science and medical
- Dynomight: Is “colorectal cancer” rising in “young people”? It is, and not just colorectal cancer, and not all of it is from overscreening. The anonymous author lists many potential causes; I would group all likely suspects under “new substances”, whether in the air we breathe, the food we eat, or in every friggin’ thing we touch. The price we pay for a modern life, eh?
- Vinay Prasad: Are “healthy people” an endangered species? Speaking of overscreening, Prasad makes a good point about how accessible all these different kinds of blood tests are that will tell you that something is wrong even though you may be feeling just fine. Sometimes this is warranted — a yearly blood count picks up a small decrease in hemoglobin that is the result of intestinal blood loss from an ulcer or (see above) colon cancer — but more often this is a form of medical divination. To repeat myself from 3 years ago, I am not a fan of the “wellness” visit.
- Niko McCarty and Noah Olsman: What’s the Point of Theory in Biology? An interesting exchange between McCarty, who has a degree in bioengineering but has been more focused on the journalism and policy side of science, and Olsman, who as a practicing postdoc with an interest in theoretical biology does well to explain its limits.
- Christos Lynteris for Nautilus: Poop Cruises Are No Laughing Matter. Oh but they are, even without the poop — has Lynteris never heard of a supposedly fun thing DFW will never do again? More seriously, the observation that people pay more attention when things happen to famous people is apt: remember Tom Hanks getting covid? More tangentially, Greg Wilson recently wrote about the upper/lower deck divide as it pertains to the general overhead of living.
- Brett & Kate McKay: The Cheapest, Easiest, Most Ridiculously Effective Way to Eradicate Mosquitoes From Your Property. I wish I had known about mosquito dunks and been able to test them while we still had a back yard which, like any back yard in DC, was nigh unusable in the summer owing to constant bloodsucking.
Tuesday links bonanza
- Kevin Kelly: Your Most Improbable Life.
Your life’s goal should be to become the most improbable person you can be. Your path, your character, your life, should be the most unlikely, the most unexpected, the least predictable version you can make. Improbable lives have fewer competitors, more unique rewards, and are harder to replace with AIs, since AIs run on the predictable. This is true whether you favor traditional humanist directions or work on a frontier.
This is a nice preamble to a bit of personal news I can finally share: I will soon be going back [Note: It is a qualified “back”, as I have never actually practiced medicine full time, being either in training, doing clinical research as my main job, or being out of clinic altogether save for a few hours a week doing charity work. ] to the practice of clinical medicine. This week is in fact the last in my current position, which had been a magnificent experience but was going, as the careful reader of this blog would have already noted, in a direction not entirely suited to my preferred lifestyle and more importantly — let’s not sugarcoat it — values and beliefs. Onwards and upwards!
- Derek Robertson: 5 questions for Meredith Whittaker. [Note: ᔥVoline on Mastodon ]
Whittaker, who is the president of the Signal Foundation (as in the app), had this to say about venture capital back in 2023:
Venture capital looks at valuations and growth, not necessarily at profit or revenue. So you don’t actually have to invest in technology that works, or that even makes a profit, you simply have to have a narrative that is compelling enough to float those valuations. So you see this repetitive and exhausting hype cycle as a feature in this industry. A couple of years ago, you would have been asking me about the metaverse, then last year, you would have asked me about Web3 and crypto, and for each of these inflection points there’s an Andreessen Horowitz manifesto.
It’s not simply that one piece of technology is overhyped, it’s that hype is a necessary ingredient of the current business ecosystem of the tech industry. We should examine how often the financial incentive for hype is rewarded without any real social returns, without any meaningful progress in technology, without these tools and services and worlds ever actually manifesting. That’s key to understanding the growing chasm between the narrative of techno-optimists and the reality of our tech-encumbered world.
Emphasis is mine, as it could be transposed word-for-word into the current world of drug development. Consider it a more polite rewording of prof. Taleb’s take.
- Venkatesh Rao: Commodity Intelligence.
Commodified knowledge is “general knowledge” in the sense tested by trivia/quiz contests. In grade school, we actually had a subject on the curriculum called “GK” and kids good at it (I was one of them) got put on quiz teams to represent their class or school. General intelligence of the sort we actually have today is simply AIs trained on general (ie commodified) knowledge.
But the theological motte-and-bailey move that conflates it with some totalizing-universal divine-omniscience idea of “Artificial General Intelligence” traps a great many of even the smartest people. A category error motivated by theological yearnings, validated by second-order Labatutian psychoses, sustained by epistemic bubbles, and encouraged by sketchy business roadmaps that need a story to justify trillion-dollar investments.
This is a charitable way of justifying the AI billionaire panhandlers’ selling of large language models as AGI, even putting the term in official titles. Less charitably, they all know what Yann LeCun has been saying for years: LLMs will never reach a human level of intelligence (“ChatGPT, make me a sandwich”). Whether LeCun’s own pursuits are wise is a different matter.
Separately, Rao gives some good book tips and Benjamin Labatut’s When We Cease to Understand the World is now on the Pile.
- Andrew Gelman: Don’t cite sources you haven’t read, and don’t trust when people claim to be reporting something from the literature.
No quotes because, true to form, everything salient is already in the title. Natural continuation of the debate started last week (see the last link), although apparently written before the new arXiv policy of a 1-year ban for hallucinated references.
- Kieran Healy: Zero Sum Problems.
Healy wrote a book about data visualization so I feel somewhat foolish in writing this, but I do not find Apple Sports’ presentation the least bit confusing: the numbers are absolute, the bars show percentage of the total. If the goal is to have more of each (assists, rebounds, steals, etc.) the bigger bar shows the opposing team’s dominance. It’s fine. Healy’s proposed solutions are all notably uglier and demote low-occurrence events like blocks and steals even though they may be crucial in a game. Shows how little both Healy and Gruber (on whose post Healy riffs) know about the game of basketball.
- Sarah Kliff, Margot Sanger-Katz, Erin Schaff and Asmaa Elkeurti for The NYT: Short Naps, Long Hours: How Autism Clinics Squeeze Medicaid Dollars Out of Preschoolers.
At Compleat Kidz, a fast-growing chain of autism clinics based in North Carolina, the policy is firm: Naps cannot be longer than seven minutes before children are awakened to resume therapy. The company says this is necessary to prevent fraud since clinics can be paid only when children are awake and getting services. But it also allows the clinic to bill insurers or Medicaid for more hours.
Yes, you have read that correctly. Waking up a child after a 7-minute nap to perform “therapy” — as if anything meaningful can be accomplished in that hypnagogic state — is both cruel and unusual. But not a punishment! It is merely a way to avoid fraud while optimizing revenue under the watchful eye of private equity:
Private equity firms have acquired at least 500 clinics over the past decade. “There’s just huge opportunities to grow these businesses and help increase access to care,” said Jon Krieger, a managing partner at Calex, a financial firm that assists with autism clinic mergers and acquisitions. He estimates the market could grow to $90 billion.
Mr. Market is a bad doctor, an even worse vet and, it seems, a most diabolical nanny.
The departure of Marty Makary is looking more and more like a Murder on the Orient Express situation: everyone wanted him out. Well, everyone except for uniQure, Capricor and ImmunityBio who were named in the original version of that Endpoints News story as some of the companies lobbying for Makary’s ouster, then asked for their mentions to be removed, as the Editor’s note now helpfully clarifies. C’mon, people. Own it.
First they came for the programmers… Then they came for the doctors. But not really.
Back in September 2023 I noted that the biggest hurdle for AI completely replacing physicians is the physicality of the job. Sure, LLMs are good at giving differential diagnoses and faking empathy once somebody’s problem has been reduced to text, but the art of medicine is in the act of seeing, feeling, smelling, etc. [Note: Although increasingly less so, as doctors and trainees are becoming experts at treating patients in the chart and not those in front of them, making themselves the perfect foils for replacement; cue photo of the old man yelling at clouds. ] If clankers have any hope of replacing humans, they’d better get some senses.
At first glance, a recent Nature Medicine paper aimed to do just that by introducing what the group of authors — all of them Google employees based in the UK and California — call “multimodal reasoning” but is in fact the chatbot being able to interpret images, ECGs and lab reports in addition to the pre-digested clinical pearl. The topline result, one that the journal itself felt obligated to headline, was that “AI had superior performance compared with physicians for almost every metric (29 of 32 axes)”. But at what?
You would think that the question would have been easy to answer, this being a peer-reviewed paper and all, but no. In fact, I am still not completely certain what interactions were performed and whether they completely match what was reported. What is certain is that a set of primary care physicians and patient-actors from Canada and India (countries different from the authors’ own, and let’s wonder conspiratorially why that may be the case) interacted via an instant messaging-like service. This is the first oddity: even remote health visits are performed using video calls, and yes you may occasionally get a text through the EMR or if you are a VIP/boutique physician maybe your phone, but that is far from the norm.
The primary report is on what happened when the patients uploaded the skin photos, ECGs, lab results, etc. and then asked the physician or LLM on the other end questions about it. Pretty standard fare for a human-to-LLM interaction, but not exactly natural for a doctor-patient relationship which usually starts with questions being asked of the patient. This is the second way in which the setup was made to fit the computer and not the human.
But then the last section of the paper is about what happens when there is, in fact, a back-and-forth by way of taking a history. The extended figures (“extended” here meaning not worthy enough of being included in the main paper) say it improves the performance of the LLM. They do not say how it affected the human performance, or how the patient-actors rated humans versus LLMs in history-taking. I would call that strike three.
To the journal’s credit, they did not allow Google to get away with it completely. “To evaluate the performance of our finalized system, we conducted a randomized, blinded human evaluation that emulates an objective structured clinical examination”, says the final paragraph of the introduction, only to end with:
We note, however, that our study is not a randomized clinical trial with prespecified endpoints and preregistered statistical analysis. Rather, it is an exploratory study investigating the properties of multimodal diagnostic dialogue.
Peer review is at least good for something, even if it does result in self-contradiction.
Meanwhile, in the world without motivated reasoning, more objective assessments of the usefulness of AI in medicine show that it is in fact still quite bad. This does not prevent the massively funded hordes of AI researchers from flooding the field with sloppy work, creating the impression that the rise of the machines is imminent. Comply or relegate yourself to the permanent underclass, serf MD. But of course, relegation will only be possible to the extent that the medical profession (or any other, really) has already debased itself and abandoned its core professional principles in the service of electronic ease.
The altruist bait-and-switch
After dissecting the minutiae from the ongoing battle of the bozos [Note: To save you a click: it is about the Musk-Altman trial. ] , Andrew Sharp’s weekly column ends with this paragraph:
The reality is knottier. Had the OpenAI founders not launched with a nonprofit structure in 2015, they probably never recruit the talent required to compete with Google. And had they done anything else other than exactly what they did in 2018 and 2019, all of computing would be less interesting today, and the company probably wouldn’t exist eight years later. Musk’s trial has been clarifying on that point, at least for me.
The AI side of technology is one of those rare occasions where biotech may indeed be like tech: people with knowledge, skills and ambition to make the early steps towards creating something new generally don’t do it for the money. Accolades, titles, a few more increments on their h-indices sure, but unless they are seriously delusional a lab postdoc coming in on a weekend to split the cell culture generally has no hope of getting into the top percentile in income. Up until a few years ago AI research was much like that, until it wasn’t.
Sharp writes that OpenAI had to flip the switch if it were to survive in these shark-infested (make that Google-infested) waters once they smelled blood (or profit, or rather an opportunity to tell a new story to investors). The same can be said about any biotech: become successful enough, and there will come a time when the academic founders are asked to step away and let someone with different motivations run the show, lest they be lost in a sea of copycats, smoke-peddlers and competitive intelligence officers. The whole business has just become too expensive for some Jonas Salk-wannabe to dabble in.
A person of bad intent may propose that the adults coming to run the show once it becomes too expensive are the ones making it expensive in the first place to justify their existence, contributing to the health care cost ouroboros on the way. But that is of course nonsense. The proof is in the pudding, what with famously efficient drug development pipelines, low health care costs and improving lifespans.
So let’s do what a genuine financial scion once proposed: invert. Instead of asking ourselves how to make drug development more efficient and cost-effective, let’s see how we could make it more expensive. Number one thing to do would be making it all about the money: let’s portray people who don’t capitalize on their inventions as losers not heroes, make Nobel Prize winners notable only if they are billionaires (who won the Nobel Prize in Physiology or Medicine last year, again?), measure success of drugs in dollars earned not lives improved, extended or saved, have everyone skim a percent or five of the money swishing around in the ecosystem as their primary source of income without any penalty for ultimate failure [Note: For more on this, do read Nassim Taleb’s Skin in the Game, which is about much more than the titular phrase which has become, much like his The Black Swan, a phrase people throw around without having any idea of the underlying concepts. ] guaranteeing that they will have every incentive possible to grow the pie, and I think you see where this is going because the system functions as designed so why should you complain? After all, there is no alternative.
Except that, of course, there is. It would be a big lift, to remove incentives of skimmers to inflate the balloon, stop various influencer platforms from inducing FOMO in everyone and anyone, recalibrate the median science journalist’s value system from Mr. Market to something more reality-based. Big, but not impossible, provided there is a will.
Therein lies the problem: that kind of thinking is somewhat at odds with the shared American culture, at least as recently described by Chris Arnade, that “you can live how you want, eat what you want, live (up to a point) how you want at a thin level, as long as you ultimately believe in making big money through hard work and playing by the rules.” Determining if the other two legs of the three-legged money/work/rules American stool are performing as intended I will leave as an exercise for the reader.
Wednesday links, with many uncertainties
- Kristen French for Nautilus: New Fathers Are Dying, and We Don’t Know Why.
Oh but we do, at least superficially: “of 130,000 men who became new fathers between 2017 and 2022, almost 800 died during that same 5-year period, and 60 percent of those deaths were from potentially preventable causes like homicide, accidental injury, and suicide” which is about what you would expect for a group of men that skews younger. The authors of the paper make a comparison between fathers who died and those that survived but a more interesting one would have been with a demographically matched group of childless men. Alas, all we have is all the men in Georgia and lo, for each age range the new fathers have a lower mortality and the discussion appropriately leads with “Fatherhood appeared to be associated with reduced mortality.” [Note: Another reason to have more children. Though, if you are going to do it solely because of a misguided belief that you yourself would live longer, then perhaps don’t? ] Methinks French (or her headline writer) was fooled by randomness.
- Derek Lowe: What Success Can Look Like, Darn It.
Vepdegestrant for breast cancer seems to be another entry in the annals of approved drugs being considered failures by Mr. Market. Let it be noted that a chemist (Lowe) writing for a prestigious peer-reviewed journal (Science) dunks on a drug while citing millions and billions of dollars exchanged or promised to various stakeholders while barely mentioning, and wrongly at that, the actual trial results. “It did not really demonstrate any advantage versus the comparison in the trial, fulvestrant” is factually incorrect: median progression-free survival was 5 versus 2.1 months, which, fine, is tiny and may have been the result of statistical shenanigans; but it may also be a true and meaningful incremental improvement and if we are going to dismiss it out of hand then what are we even doing here? The rot runs deep.
- Deena Mousa: We don’t know why Malawi is poor.
It is a genuine mystery why a mostly agrarian functional democracy with no separatist movements, demographic catastrophes, curses of resource wealth or the other usual suspects of stalled growth should see its GDP completely flatline. Mousa shows compelling data and many hypotheses, though I wonder whether there is something that isn’t and can’t be measured which is keeping the country where it is. And if you are thinking that oh, GDP can’t measure happiness, I bet that at least they are happy, think again: it was the 4th least happy country last year. But then the “Happiness Report” methodology takes GDP into account (!?) so it is almost impossible for a GDP-poor country to break through in the rankings.
- Dynomight: What’s with all the slide decks?
This is about slides shared via email, never meant to be presented, but rather serving as a landscape-oriented picture book for adults. I don’t know what is behind communication-by-slide, and as a seminar-attending Tufte acolyte I abhor it. Management consultants spreading them around like a viral respiratory disease — which is the thesis of the blog post — certainly has something to do with it, but the syndrome is now bottom-up as well. My third-grader asked me just this morning why they were forced to watch and make (!?) slides at school.
Medical links, Good, Bad and Ugly
The good: How an ‘Impossible’ Idea Led to a Pancreatic Cancer Breakthrough by Gina Kolata and Rebecca Robbins for The New York Times. The breakthrough discussed is the real deal, and they manage to do it in a measured tone which correctly identifies daraxonrasib as a stepping stone and not a miracle cure. It has this important note up top and not buried down at the end:
The pills, three taken daily, are not a cure — eventually, daraxonrasib stops working. Many patients do not respond. And it has side effects that can be harsh, including rash, diarrhea, fatigue, nausea and raw, split fingertips.
How refreshing — I hope Derek Thompson takes note.
The bad: The Human Body’s Hidden Pathways by Dr. Avraham Z. Cooper, who is a pulmonary/critical care physician at the Ohio State University, for The New York Times Magazine. For the life of me I can not figure out the point of this post-modern journalistic exercise.
Nominally it is about a peer-reviewed research article which came out in 2021 under the title “Evidence for continuity of interstitial spaces across tissue and organ boundaries in humans”. The NYT Magazine staff did not deem it worthy of being linked to, but here it is in its entirety. In it, the authors showed small fragments of tattoo pigment migrating into tissues — skin and colon — deeper than they expected. We are not talking about ink being injected into a bicep and showing up in someone’s rectum here, but rather a series of biopsies of tattooed skin or the lining of the colon where there is a lot of pigment up top, and much less and in smaller pieces down at the bottom of the slide, deeper in the tissue.
Let me pull out my rarely used master’s degree in histology and note that this is hardly surprising. Connections between cells are not exactly air-tight — other than maybe in the brain and the testes — so of course there is some gel-like fluid circulating in the space. Or did the original article’s authors not realize why people tend to rub their feet when they get swollen?
But that is only the introduction. The meat of the article is Dr. Cooper’s theorizing that this has something to do with (drumroll, please) acupuncture. With no evidence, mind you, but a tingling sensation in the back of his neck or somesuch. By the time the 30th single-sentence screen scrolls by we are firmly in bullshit territory, in the formal sense of the word. Caveat lector.
The ugly: Longevity Medicine - An evidence based guide by Dr. Vinay Prasad who is out of the FDA and back making YouTube videos. And oh my, the contrast between the most recent thumbnail and the one posted just before he joined the FDA is striking. Has it only been a year? No wonder that his first topic back as an influencer is about longevity.
A sidenote here which I will put at the end: the increased interest of Silicon Valley types in longevity, and I am not thinking only about Bryan Johnson’s delusions here, reminds me of the recently quoted speech Charlie Chaplin gave at the end of The Great Dictator, the relevant quote being that “so long as men die, liberty will never perish.” Good for us that snake oil salesmen are still the longevity field’s most prevalent phenotype.
This week in hubris
What possessed me to type x.com into the address bar I can tell you not, but there I was, staring for the first time in weeks at the “For you” tab. And there it was, in all capital letters: “THIS IS HOW WE CURE PANCREATIC CANCER”, staring back.
That was the X-crement of one Derek Thompson, writer for The Atlantic, podcaster, abundance enthusiast. It was promoting his most recent blog post which, being on Substack rather than X, had a more subdued name: “How AI Could Help Cure Pancreatic Cancer”. It is, supposedly, an interview with a co-author of a paper with an even less boastful name: “Next-generation AI for visually occult pancreatic cancer detection in a low-prevalence setting with longitudinal stability and multi-institutional generalisability”. Most of the interview, however, is behind a paywall which I shall not climb.
Above the fold is Thompson’s exuberant, hyperoptimistic speculation. He approaches the problem from the perspective of the three recent developments — one from above, the other two previously discussed — but presents the areas which they are “solving”, targeting KRAS mutations, pancreatic cancer’s immune evasiveness, difficulties with early detection, as the sole reasons why the disease is so difficult to treat.
But that is disingenuous. There are so many more reasons why it is hard: the uniquely hostile, acidic, high-pressure environment of the tumor that makes drug delivery to it nigh-impossible. Its propensity to metastasize (spread to distant organs) no matter what size the original tumor is. A biochemical storm it stirs up in the body leading to rapid weight loss, blood clots and horrendous pain which are distinct even among other cancers. Why not highlight those three as the “3 broad reasons why pancreatic cancer is so hard to treat”, to use Thompson’s terminology? Well, no recent high-profile studies for those, are there?
I understand that he has some personal reasons to be interested in pancreatic cancer, and I am sure it is coming from the best of intentions, but please.
Monday links, in concurrence
- Cory Doctorow: The enshittification multiverse, in which Doctorow proposes a general theory of enshittification to match his initial, special theory. I enthusiastically concur.
- Anonymous on the Marginal Revolution comments section: On health care price transparency. The only non-Xified content you can find on Marginal Revolution these days is in the comments, so I am glad that Cowen highlighted this minute dissection of the madness called American medical billing. Needless to say, I concur.
- Reese Richardson: A do-or-die moment for the scientific enterprise. [Note: ᔥAndrew Gelman, who sure loves his mile-long headlines. ] This is the author’s summary of a more detailed paper in the academic journal PNAS which points to a looming catastrophe of LLM-boosted scientific paper mills holding hands with pliant journal editors to decimate the signal-to-noise ratio of the literature. Of course I concur!
- Cory Doctorow, again: Ada Palmer’s “Inventing the Renaissance”. His review after actually reading the whole book, and yep.