
OpenEvidence is a technological Trojan horse at the gates of clinical practice

Go to openevidence.com and you will see, right under the elegant logo and a free text box prompting you to ask a medical question, an immodest tag line: “America’s Official Medical Knowledge Platform”. The boast sits above an enviable lineup of official partners: The New England Journal of Medicine, Journal of the American Medical Association, National Comprehensive Cancer Network, Cochrane Systematic Reviews. If you were a clinician in need of information these would be the first places to go, [Note: Save, perhaps, for a few journals in the JAMA network, and I write this as someone who has published in and reviewed for JAMA. ] but now there is no need because OpenEvidence will do it for you, for free and — unlike those poor community doctors whose practices can’t afford an NEJM subscription — with full access to all those journals.

Their About page is even more effusive. “Our mission is to help doctors save lives and improve patient care.” Great! It goes on:

This year, more than 100 million Americans will be treated by a clinician using OpenEvidence. As a product, OpenEvidence is an AI copilot for doctors that helps them make high-stakes decisions at the point of care. OpenEvidence is the most widely used medical AI among verified U.S. clinicians. To date, we have supported over 200 million AI-powered clinical consultations from U.S. doctors and other frontline clinicians.

In a remarkably short period of time, OpenEvidence has become the default operating system of medical knowledge in the United States.

Underneath lies the Team, laden with Harvard and MIT affiliations, and a long list of medical advisors ranging from Mayo, Hopkins and Mass General staff to prominent YouTubers.

It was a rather obvious idea, to create a specialized LLM chatbot which restricts its data sources to medical literature only, so when I first saw OpenEvidence, the way it presented itself (partnership with NEJM and JAMA, MIT affiliation) and the price (free for everyone with an NPI) I was pleasantly surprised that these institutions came together for the common good, to create our generation’s PubMed.

Hardy har har.

Scroll further down and under another immodest headline — “Supported by the Best” — sit the logos of Sequoia Capital, Kleiner Perkins, Blackstone, Andreessen Horowitz, Nvidia, Google Ventures and the like. Not listed on the website because there is no “Investor relations” page — that may spook the clinicians! — is the financial history. Earlier this year it raised $250 million in a Series D round at a $12 billion valuation. Just three months before that it raised $200 million at a $6 billion valuation. In total, it has received close to $700 million in funding over its four years of existence.

Yes, OpenEvidence, “the default operating system of medical knowledge in the United States” (their words, emphasis included), is a tech startup zipping through the first phase of enshittification, i.e. attracting users with a high-quality offering. I would argue that even the “high-quality offering” is a bit of a crock, but we’ll come back to that shortly. Let’s, for the purposes of this paragraph, go with the premise that the unique thing that OE provides is the “artificial intelligence” portion. Well, from what I understand the company relies on OpenAI, Anthropic and others for the actual compute, and if that is the case they are one step removed from the absolute carnage whose genesis Ed Zitron and others have been diligently chronicling. The default operating system of American medicine is an earnings miss away from the blue screen of death.

I won’t cry for the billionaires involved. I will, however, mourn the opportunity cost of so many smart physicians and programmers on their medical and technical teams spending their time on point-one-percenter enrichment instead of truly building our generation’s PubMed. It would not even require compute! The true value of OE is the curated collection and unrestricted access to peer-reviewed journals, treatment guidelines, and systematic reviews, supplements and all. Let me google all that — or better yet, look it up on Kagi — and I will not care at all for the LLM-generated veneer glued onto man-made knowledge. But good luck having NEJM, JAMA et al. open their vaults without the VC-backed carrot of (I suspect) God knows how many millions of dollars for access rights combined with the FOMO stick that Anthropic and OpenAI’s PR teams have been so diligently whittling.

Trigger warning for an LLM-sounding phrase: the mounds of AI slop added to OE search results aren’t just wasteful, they are dangerous. Back in the Triassic era when shmucks like yours truly were nursing their middle-finger calluses writing progress notes by hand you knew that every part of that note contained useful knowledge. With the electronic medical record mandate — thanks, Obama — much of it became an unreadable mix of computer-generated charts and copypasta; you had to look at the end of the note to find actual human thought, whether it is in the Assessment and Plan or the Attending Addendum section. Well, I can report from the front lines that much of the time even that one meager paragraph has become a copy/paste job carrying with it that distinct LLM waft.

I am not against using LLMs for progress notes — we have been using human scribes for decades to write up the facts of the doctor-patient encounter. But those are costly and your rural primary care physician certainly won’t have one, so why not delegate that work to AI? The assessment and plan, however, are where you infuse those facts with meaning and then act on them, which is the entire purpose of the physician’s job. Writing is thinking and millions of US medical professionals have decided to delegate the one job they have to AI while keeping all the moral and legal responsibility, reverse-centauring themselves willingly and with eyes wide open.

This may seem like a “the food is horrible and the portions are too small” joke — have I not just written that the whole thing will soon be dead? If you are a physician who values their brain and doesn’t copy off a clanker, why should you care if others start relying on them and then get a rug-pull? Three reasons:

  • Expectation-setting: those who copy will need 15 minutes per encounter, then 10, then 5, continuing to ingest slop and regurgitate it over patient notes even as it gets increasingly bad from more and more expensive compute.
  • Asbestos exposure: as in, AI is the asbestos we are shoveling into the walls of our society, only the asbestos here is in the form of regurgitated slop we are putting into patient medical records. That, too, will take our descendants some time to dig out, although human life span being what it is it should be less than a whole generation.
  • Thinking of the kids: some of my own highest yield learning moments were reading the attending addendum on my note, or the dictation of a particularly skilled specialist’s consult note; will the incoming generations of medical students and residents have the same opportunity?

So if your mission truly was to help doctors save lives and you weren’t a greedy son of a bitch would you not have made a non-profit to achieve that goal? It may not have been as slick as something coming out of Silicon Valley, but it would also not have the risk of blowing up if the financial winds turn and the funding flywheel stops spinning. After all, there have been many attempts to replace the government-funded Medline/PubMed combo, but none of them were that much (if at all) better to justify the cost.


Correcting a handful of misconceptions, inaccuracies and falsehoods in "The blood cancer that became solvable" by Ruxandra Teslo and Amol Punjabi

As a fan of Ruxandra Teslo’s writing — 25 mentions to date! — it pains me to write that her recent article in “Works in Progress”, for which she shares the byline with Open Evidence chief product officer Amol Punjabi, had me wince about a half-dozen times too many to ignore. Worse yet, I agree with the thrust of the article: that China is eating America’s lunch in cell and gene therapy and will soon come for the rest of biomedicine. Heck, that is one of the main reasons I am soon going back to clinical medicine, seeing too many business flights to Shanghai and Beijing time zone Zoom meetings in my future [Note: In case you were wondering, the correct number of each for me personally is exactly zero. ] had I continued down the industry path.

Alas, Teslo, Punjabi and whichever LLM did their research cut too many corners on the way to the largely appropriate destination. Let’s count a few of them.

The old, cheap generic chemotherapy drugs still rock. A combination of two or three chemotherapy drugs developed in the 1970s and 80s is still the gold standard for treating testicular cancer. Chemotherapy has tamed what was once a pancreatic cancer-level death sentence into a diagnosis that doesn’t even have a “stage IV”. Speaking of pancreatic cancer: daraxonrasib, the K-Ras inhibitor which Teslo just a few weeks ago deemed a turning point, [Note: That article was, however, still of much higher quality than the one discussed here. Or maybe I don’t know as much about K-Ras and pancreatic cancer as I do about CAR T-cells and myeloma. Could this be a case of reverse Gell-Mann amnesia on my end? ] doesn’t even come close to what bleomycin, etoposide and cisplatin did for testicular. I guess they don’t make turning points like they used to.

The transformation of oncology started long before the mid-2010s. The article paints a simplistic picture of oncology’s history. First there was surgery, followed, in the 1890s and with the discovery of X-rays, by radiation therapy. Blunt and unsophisticated chemotherapy which relies purely on the cancer cells’ propensity to divide faster than non-cancer cells came in the 1940s and 1950s. Finally, in the mid-2010s, after we learned more about the molecular biology of cancer [Note: I guess that, if you wanted to show off your academic status, you should use a 10-dollar word like “underpinnings” here instead of the plain, grade school level “biology”, much in the same way you should find and replace every “use” with “utilize”. But that would, of course, make you a 10-dollar ass. ] came “immunotherapies”, by which the article largely means CAR T-cell therapies in general and one in particular, ciltacabtagene autoleucel, known to friends as cilta-cel (generic name) or Carvykti (brand name and the one used throughout the essay; this is telling).

Look, I am no fan of Siddhartha Mukherjee’s but at least his history of cancer, The Emperor of All Maladies, got the sequence right. Rituximab, a monoclonal antibody which some still consider the original immunotherapy — after all, it acts mainly by siccing patients’ own immune cells and complement on the target lymphoma and leukemia cells — was approved in 1997 after a Phase 1 trial that started in 1994. Trastuzumab, another monoclonal, was approved for Her2-positive breast cancer in 1998. Imatinib, a revolutionary wonder-drug which inspired dozens of me-too small molecule competitors, had its first-in-human study in 1998 and was approved just three years later, in 2001. Each needed just 3 years to get from the very first patient being dosed to FDA approval; remember that factoid, as it may become relevant in a few paragraphs. These were actual cures for lethal, aggressive cancers. But if the narrative is that China has accelerated the development of the first true advancement in cancer cures since the advent of chemotherapy, let’s just pretend they don’t exist.

Myeloma treatment is not as brutal as painted. Although, of course, everything is in the eyes of the beholder, or rather the mind of the patient having to suffer through it. I do take issue with all three of the specific side effects that the essay highlights, as well as with how the time burden of myeloma is described. To wit:

Patients come in and out of the clinic for injections, take pills at home and undergo repeated blood tests, living according to a calendar organized around treatment days and recovery days. They also have to contend with the side effects of the medications. Dexamethasone can produce a sleepless agitation followed by a physical and emotional crash. Bortezomib often damages peripheral nerves, causing tingling and a burning pain in the hands. Daratumumab often leads to immune suppression, leaving patients more vulnerable to infections.

Dexamethasone is given in bursts and, thanks to the decidedly non-industry funded trials led by S. Vincent Rajkumar, at a much lower dose than before, minimizing these sorts of side effects. Similarly, bortezomib is now given less frequently and in different ways (under the skin instead of intravenously) to minimize nerve damage. And if you think immunosuppression is bad with daratumumab, well, try wiping out every antibody-producing cell in your body then waiting until you can get all of them back, and yes that includes needing to receive all your childhood vaccines again.

Separately, repeated blood tests are a sine qua non of multiple myeloma management, or really of any cancer management, even after a “cure”. If we aren’t monitoring for recurrence of the primary disease we are fussing over other cancers which may or may not be the result of the treatment itself, or of a person’s general propensity to have cancer. [Note: In fact, the two biggest risk factors for having cancer, other than a genetic mutation/hereditary syndrome, are age and prior personal history of cancer. ] And yes, that goes even for patients whose CAR-T treatment leads to durable complete remissions. Especially with CAR-T treatments which are known to cause cancer.

Speaking of which, cilta-cel/Carvykti is not a walk in the park either. Cytokine release syndrome (CRS) and Immune effector cell-associated neurotoxicity syndrome (ICANS) are two particularly nice side effects of all conventional CAR-T therapies, Carvykti included. They are frequent and severe enough that most patients need to be treated in the hospital and be within driving distance for the next four weeks. Many end up being admitted to the critical care unit. CAR-Ts that target BCMA, like Carvykti, also cause profound immunosuppression (vide supra) and require patients to repeat their childhood vaccination series. Carvykti, however, is in a league of its own as on top of all that it can also cause Parkinsonism. This is not to throw shade at CAR-Ts, they truly are revolutionary. But let’s not condemn other myeloma treatments for their toxicity when the alternative is worse in some ways, about the same in others.

BCMA CAR-Ts are, for most patients with multiple myeloma, not a cure. The essay cites 12-month results of the CARTITUDE-1 trial, where 76% of participants who received the cells [Note: But not including those enrolled to the trial who never got them, whether because they couldn’t be made, they were too sick to get them, or just plain died. This is how you play the denominator game. ] had no signs of myeloma at 12 months. Quote:

But what happened afterwards is perhaps even more striking: in the Abecma progression free survival curve, the line falls continuously. By contrast, in Carvykti, the line starts to plateau. Extended follow-up at five years confirmed that 33 percent of Carvykti patients remained disease-free.

This is false: there is no plateau. Figure 2 of the NEJM article describing these results has some numbers at the bottom not included in the Works in Progress essay. These represent the “number at risk” — participants who were still available for follow-up at a given time point; others have either progressed, resulting in an unwanted “drop” in the curve, or have not yet been followed for that long [Note: There are actually more reasons for a participant to be marked as a “tick” without dropping the line, i.e. to be “censored”, some more nefarious than others. For a good primer on this “informative censoring” see, for example, this article ] and are marked with a triangle here though more commonly they are merely a tick. The “plateau” is an artifact of too few participants getting to 24 months, only 9. It completely disappears in extended follow-up, with the curve continuing its descent at and past 24 months in Figure 2A, all the way to 60 months where a cluster of vertical tick marks precedes yet another mirage of a plateau, again with only a handful of patients being at risk. Let’s pray it ain’t so but I suspect that, if we were to continue following these participants to 10 years, the curve would continue going down and down and down.
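For readers who have never built one of these curves, here is a minimal sketch of how a flat tail can appear and then vanish. Everything in it is an assumption made for illustration: the twelve participants, their months of follow-up and the function names are invented toy values, not the CARTITUDE-1 data, and the estimator is the plain textbook Kaplan-Meier product with no confidence intervals.

```python
def kaplan_meier(times, events):
    """Textbook Kaplan-Meier estimate.

    times  -- months of follow-up per participant (toy values)
    events -- 1 if the participant progressed at that time, 0 if censored
    Returns a list of (month, number_at_risk, survival) at each progression.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        month = data[i][0]
        progressed = 0
        leaving = 0
        # Group everyone whose follow-up ends at this month.
        while i < len(data) and data[i][0] == month:
            progressed += data[i][1]
            leaving += 1
            i += 1
        if progressed:
            # The curve only drops when somebody progresses.
            survival *= 1 - progressed / n_at_risk
            curve.append((month, n_at_risk, survival))
        # Censored participants leave the risk set without a drop.
        n_at_risk -= leaving
    return curve


def at_risk(times, month):
    """How many participants are still being followed at a given month."""
    return sum(1 for t in times if t >= month)


# Hypothetical early data cut: six progressions, six people censored
# simply because they have not been followed any longer yet.
times_early = [3, 6, 9, 12, 15, 18, 20, 21, 22, 24, 25, 26]
events_early = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The same hypothetical cohort after extended follow-up.
times_late = [3, 6, 9, 12, 15, 18, 30, 36, 42, 48, 60, 60]
events_late = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

curve_early = kaplan_meier(times_early, events_early)
curve_late = kaplan_meier(times_late, events_late)

print("early cut, last drop:", curve_early[-1])
print("early cut, at risk at month 24:", at_risk(times_early, 24))
print("extended follow-up, last drop:", curve_late[-1])
```

In this toy cohort the early cut sits flat at 50% after month 18, yet only three participants are still at risk at month 24; with longer follow-up of the same people the estimate keeps falling, to about 17%. A flat line resting on a handful of people says very little about what happens next.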

You could tell the same story of artefact plateaus about daratumumab as well. It, too, has been pushed up all the way to first-line treatment and even before overt disease; concerns about longer follow-up needed for what is usually a slow-burning disease remain. Compare and contrast to imatinib in CML in this recent essay from Vinay Prasad, who concludes with:

There is progress in both diseases but more in CML. CML is more clearly a success story. There is much room for progress in myeloma. Myeloma is not yet curative, sadly. Presenting survival over time is misleading and masks more complicated narratives.

Carvykti’s approval timeline was not gobsmackingly fast. Most misleading is the side-by-side comparison of the development pathways of the Chinese cilta-cel and its American predecessor, ide-cel. Ide-cel’s includes the development of CAR T-cells in general (1989–2012) and the first BCMA targeting proof-of-concept (2013). Cilta-cel emerged from Zeus’s head in 2014, like it didn’t require both CAR-Ts to be developed and BCMA to be validated as a target. The tag line of the figure is that “China’s BCMA CAR-T reached FDA approval just 11 months after the US, despite starting decades later.” Hogwash.

Note the development timelines: ide-cel’s first-in-human study started in 2014. [Note: I should know: I was there! Funnily enough I was the in-house fellow on call on the days when two of the first 3 participants received their cells and had the honor of escorting them to the intensive care unit that very night. Both had both their myeloma and all of their bone marrow wiped out in the process. ] It received FDA approval in 2021, for a total of 7 years of clinical trials. Cilta-cel’s first-in-human was in 2016 with a 2022 approval; 6 years. Let’s finish up our mini-mental test: how long did it take for the FDA to approve rituximab, trastuzumab and imatinib, from the first patient dosed?


These are only the highlights, but going much deeper would be nitpicking. I don’t know whether this amount of laxity with the truth was intentional, but the essay is almost as misleading as a Seattle lady’s GPS, taking her straight onto light rail tracks. Once there, you can only go in two directions: forward, towards loosening up regulations to match China’s Wild West, or backwards, tightening up regulatory requirements for Chinese assets and trials and punishing companies for doing business there. [Note: See how I tied going “forward” with less regulation and “backward” with more. This can easily be flipped to portray less regulation as going backwards, but I leave doing that in full as a fun exercise for you, dear Reader. ] What happened to going sideways? Diagonally? Up or down? What if the time to approve revolutionary cancer treatments has doubled because the follow-ups aren’t as revolutionary? And then get drowned out further by the me-toos and the ghost drugs which make much better competitors in the biotech beauty pageant, where whom you know and where you came from is more important than the increasingly pliant, malleable and quicksand-appearing ground truth?

But sure. China.


If it looks like a press release and reads like a press release, why is it being sold as a government report?

Doc in a Box from Alex Tabarrok links to an official state government document, from the Utah Department of Commerce. The document is titled “Key Statistics on the Doctronic Pilot Program” but reads more like a bulleted press release, full of percentages without a denominator, begging for a flow chart. Press releases are like that because you typically won’t add images — although this one randomly selected from today does indeed include it along with the full abstract submitted to the ASCO annual meeting, and good for them — but more importantly because you want to pick the best possible picture-perfect view of your shiny spotless data elephant without also acknowledging that it has a rear end, a bunch of flies buzzing around, smells a bit rank. Does your elephant not have an ass, Utah? Or did you just copy/paste what Doctronic — a startup whose wonky web page doesn’t even work — sent you?

[Screenshot of the Doctronic homepage showing the message: “We hit a technical snag. Our engine encountered an issue; we are resolving it now.”]

So how many patients could they have evaluated? This article in JAMA Forum says that “[p]hysicians hired by Doctronic will review the AI’s output for the first 250 patients before the system takes any action and will review the next 1000 patients retrospectively after the AI agent begins acting autonomously.” Are the key statistics from the first 250? The very first bullet point in the press release summary document says that the program is still in Phase One and that “the number of patients so far is limited”, so I guess not. Is it 100 at least? Surely they wouldn’t use a percentage as high as 97 if there were fewer than that involved. Except that as few as 29 will give you a percentage that rounds to 97 (28 out of 29 is 96.6%). So, 29 to 249?
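The arithmetic is easy to check. The sketch below is mine and assumes that the 97 is a whole-number percentage, rounded half up, of some count of successes over some count of patients; the function names are made up for the occasion and nothing here comes from the Utah document itself.

```python
def rounds_to(successes: int, total: int, target: int) -> bool:
    """True if 100 * successes / total rounds (half up) to `target` percent."""
    # Integer form of: target - 0.5 <= 100 * s / n < target + 0.5,
    # which sidesteps floating-point surprises at the boundaries.
    return (2 * target - 1) * total <= 200 * successes < (2 * target + 1) * total


def smallest_denominator(target: int, limit: int = 1000):
    """Smallest patient count (and matching success count) that can
    produce `target` percent after rounding, or None if none is found."""
    for total in range(1, limit + 1):
        for successes in range(total + 1):
            if rounds_to(successes, total, target):
                return total, successes
    return None


result = smallest_denominator(97)
print(result)
```

Under those assumptions the smallest workable denominator comes out at 29 (28 of 29 is roughly 96.6%), so about thirty patients is indeed all it takes to print a confident-looking 97%.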

Why am I being so pedantic? Well, these techniques are par for the course in the biotech world, but coming from a state agency they make me think there is a bit too much enthusiasm for it. Compare and contrast to the shellacking LLMs got in this report from the Office of the Auditor General of Ontario, which reviewed AI Scribe functionality from 20 vendors. Their report even has absolute numbers in it! These state government officials should realize that they are prime targets for flim flam merchants and should behave accordingly.

Note that I am not against the idea in general. The project’s goal is in fact quite noble: there is no reason why plain ol’ machine learning shouldn’t be able to suss out the majority of refill requests for chronic medications and flag patients who haven’t had their bloodwork or diabetic foot assessments done, or who’ve had abnormal office blood pressure readings at prior visits. Having that easy refill option available would mean a patient coming in for an in-person visit for what should be “only” prescription refills is even more of a signal that something else may be amiss, even if the patient can’t or won’t verbalize it. So yes, LLM refills, bring ’em on. Doctronic’s end-goal of actual autonomous Shoggoths putting on white coats and replacing MDs, PAs, NPs and other credentialed humans… not so much.


Wednesday links, science and medical


Tuesday links bonanza

Your life’s goal should be to become the most improbable person you can be. Your path, your character, your life, should be the most unlikely, the most unexpected, the least predictable version you can make. Improbable lives have fewer competitors, more unique rewards, and are harder to replace with AIs, since AIs run on the predictable. This is true whether you favor traditional humanist directions or work on a frontier.

This is a nice preamble to a bit of personal news I can finally share: I will soon be going back [Note: It is a qualified “back”, as I have never actually practiced medicine full time, being either in training, doing clinical research as my main job, or being out of clinic altogether save for a few hours a week doing charity work. ] to the practice of clinical medicine. This week is in fact the last in my current position, which had been a magnificent experience but was going, as the careful reader of this blog would have already noted, in a direction not entirely suited to my preferred lifestyle and more importantly — let’s not sugarcoat it — values and beliefs. Onwards and upwards!

Whittaker, who is the president of the Signal Foundation (as in the app), had this to say about venture capital back in 2023:

Venture capital looks at valuations and growth, not necessarily at profit or revenue. So you don’t actually have to invest in technology that works, or that even makes a profit, you simply have to have a narrative that is compelling enough to float those valuations. So you see this repetitive and exhausting hype cycle as a feature in this industry. A couple of years ago, you would have been asking me about the metaverse, then last year, you would have asked me about Web3 and crypto, and for each of these inflection points there’s an Andreessen Horowitz manifesto.

It’s not simply that one piece of technology is overhyped, it’s that hype is a necessary ingredient of the current business ecosystem of the tech industry. We should examine how often the financial incentive for hype is rewarded without any real social returns, without any meaningful progress in technology, without these tools and services and worlds ever actually manifesting. That’s key to understanding the growing chasm between the narrative of techno-optimists and the reality of our tech-encumbered world.

Emphasis is mine, as it could be transposed word-for-word into the current world of drug development. Consider it a more polite rewording of prof. Taleb’s take.

Commodified knowledge is “general knowledge” in the sense tested by trivia/quiz contests. In grade school, we actually had a subject on the curriculum called “GK” and kids good at it (I was one of them) got put on quiz teams to represent their class or school. General intelligence of the sort we actually have today is simply AIs trained on general (ie commodified) knowledge.

But the theological motte-and-bailey move that conflates it with some totalizing-universal divine-omniscience idea of “Artificial General Intelligence” traps a great many of even the smartest people. A category error motivated by theological yearnings, validated by second-order Labatutian psychoses, sustained by epistemic bubbles, and encouraged by sketchy business roadmaps that need a story to justify trillion-dollar investments.

This is a charitable way of justifying the AI billionaire panhandlers’ selling of large language models as AGI, even putting the term in official titles. Less charitably, they all know what Yann LeCun has been saying for years: LLMs will never reach human level of intelligence (“ChatGPT, make me a sandwich”). Whether LeCun’s own pursuits are wise is a different matter.

Separately, Rao gives some good book tips and Benjamin Labatut’s When We Cease to Understand the World is now on the Pile.

No quotes because, true to form, everything salient is already in the title. Natural continuation of the debate started last week (see the last link), although apparently written before the new arXiv policy for a 1-year ban for hallucinated references.

Healy wrote a book about data visualization so I feel somewhat foolish in writing this, but I do not find Apple Sports’ presentation the least bit confusing: the numbers are absolute, the bars show percentage of the total. If the goal is to have more of each (assists, rebounds, steals, etc.) the bigger bar shows the opposing team’s dominance. It’s fine. Healy’s proposed solutions are all notably uglier and demote low-occurrence events like blocks and steals even though they may be crucial in a game. Shows how little both Healy and Gruber — on whose post Healy riffs — know about the game of basketball.

At Compleat Kidz, a fast-growing chain of autism clinics based in North Carolina, the policy is firm: Naps cannot be longer than seven minutes before children are awakened to resume therapy. The company says this is necessary to prevent fraud since clinics can be paid only when children are awake and getting services. But it also allows the clinic to bill insurers or Medicaid for more hours.

Yes, you have read that correctly. Waking up a child after a 7-minute nap to perform “therapy” — as if anything meaningful can be accomplished in that hypnagogic state — is both cruel and unusual. But not a punishment! It is merely a way to avoid fraud while optimizing revenue under the watchful eye of private equity:

Private equity firms have acquired at least 500 clinics over the past decade. “There’s just huge opportunities to grow these businesses and help increase access to care,” said Jon Krieger, a managing partner at Calex, a financial firm that assists with autism clinic mergers and acquisitions. He estimates the market could grow to $90 billion.

Mr. Market is a bad doctor, an even worse vet and, it seems, a most diabolical nanny.


The departure of Marty Makary is looking more and more like a Murder on the Orient Express situation: everyone wanted him out. Well, everyone except for uniQure, Capricor and ImmunityBio who were named in the original version of that Endpoints News story as some of the companies lobbying for Makary’s ouster, then asked for their mentions to be removed, as the Editor’s note now helpfully clarifies. C’mon, people. Own it.


First they came for the programmers… Then they came for the doctors. But not really.

Back in September 2023 I noted that the biggest hurdle for AI completely replacing physicians is the physicality of the job. Sure, LLMs are good at giving differential diagnoses and faking empathy once somebody’s problem has been reduced to text, but the art of medicine is in the act of seeing, feeling, smelling, etc. [Note: Although increasingly less so, as doctors and trainees are becoming experts at treating patients in the chart and not those in front of them, making themselves the perfect foils for replacement; cue photo of the old man yelling at clouds. ] If clankers have any hope of replacing humans, they’d better get some senses.

At first glance, a recent Nature Medicine paper aimed to do just that by introducing what the group of authors — all of them Google employees based in the UK and California — call “multimodal reasoning” but is in fact the chatbot being able to interpret images, ECGs and lab reports in addition to the pre-digested clinical pearl. The topline result, one that the journal itself felt obligated to headline, was that “AI had superior performance compared with physicians for almost every metric (29 of 32 axes)”. But at what?

You would think that the question would have been easy to answer, this being a peer-reviewed paper and all, but no. In fact, I am still not completely certain what interactions were performed and whether they completely match what was reported. What is certain is that a set of primary care physicians and patient-actors from Canada and India — countries different from the authors’ own, and let’s wonder conspiratorially why that may be the case — interacted via an instant messaging-like service. This is the first oddity: even remote health visits are performed using video calls, and yes you may occasionally get a text through the EMR or if you are a VIP/boutique physician maybe your phone, but that is far from the norm.

The primary report is on what happened when the patients uploaded the skin photos, ECGs, lab results, etc. and then asked the physician or LLM on the other end questions about it. Pretty standard fare for a human-to-LLM interaction, but not exactly natural for a doctor-patient relationship which usually starts with questions being asked of the patient. This is the second way in which the setup was made to fit the computer and not the human.

But then the last section of the paper is about what happens when there is, in fact, a back-and-forth by way of taking a history. The extended figures — “extended” here meaning not worthy enough of being included in the main paper — say it improves the performance of the LLM. They do not say how it affected the human performance, or how the patient-actors rated humans versus LLMs in history-taking. I would call that strike three.

To the journal’s credit, they did not allow Google to get away with it completely. “To evaluate the performance of our finalized system, we conducted a randomized, blinded human evaluation that emulates an objective structured clinical examination”, says the final paragraph of the introduction, only to end with:

We note, however, that our study is not a randomized clinical trial with prespecified endpoints and preregistered statistical analysis. Rather, it is an exploratory study investigating the properties of multimodal diagnostic dialogue.

Peer review is at least good for something, even if it does result in self-contradiction.

Meanwhile, in the world without motivated reasoning, more objective assessments of the usefulness of AI in medicine show that it is in fact still quite bad. This does not prevent the massively funded hordes of AI researchers from flooding the field with sloppy work, creating the impression that the rise of the machines is imminent. Comply or relegate yourself to the permanent underclass, serf MD. But of course, relegation will only be possible to the extent that doctors, or members of any other profession, really, have already debased themselves and abandoned their core professional principles in the service of electronic ease.


The altruist bait-and-switch

After dissecting the minutiae from the ongoing battle of the bozos [Note: To save you a click: it is about the Musk-Altman trial. ] , Andrew Sharp’s weekly column ends with this paragraph:

The reality is knottier. Had the OpenAI founders not launched with a nonprofit structure in 2015, they probably never recruit the talent required to compete with Google. And had they done anything else other than exactly what they did in 2018 and 2019, all of computing would be less interesting today, and the company probably wouldn’t exist eight years later. Musk’s trial has been clarifying on that point, at least for me.

The AI side of technology is one of those rare occasions where biotech may indeed be like tech: people with knowledge, skills and ambition to make the early steps towards creating something new generally don’t do it for the money. Accolades, titles, a few more increments on their h-indices sure, but unless seriously delusional, a lab postdoc coming in on a weekend to split the cell culture generally has no hope of getting into the top percentile in income. Up until a few years ago AI research was much like that, until it wasn’t.

Sharp writes that OpenAI had to flip the switch if it were to survive in these Google-infested waters once they smelled an opportunity to tell a new story to investors. The same can be said about any biotech: become successful enough, and there will come a time when the academic founders are asked to step away and let someone with different motivations run the show, lest they be lost in a sea of copycats, smoke-peddlers and competitive intelligence officers. The whole business has just become too expensive for some Jonas Salk-wannabe to dabble in.

A person of bad intent may propose that the adults coming to run the show once it becomes too expensive are the ones making it expensive in the first place to justify their existence, contributing to the health care cost ouroboros on the way. But that is of course nonsense. The proof is in the pudding, what with famously efficient drug development pipelines, low health care costs and improving lifespans.

So let’s do what a genuine financial scion once proposed: invert. Instead of asking ourselves how to make drug development more efficient and cost-effective, let’s see how we could make it more expensive. Number one thing to do would be making it all about the money: let’s portray people who don’t capitalize on their inventions as losers not heroes, make Nobel Prize winners notable only if they are billionaires (who won the Nobel Prize in Physiology or Medicine last year, again?), measure success of drugs in dollars earned not lives improved, extended or saved, have everyone skim a percent or five of the money swishing around in the ecosystem as their primary source of income without any penalty for ultimate failure [Note: For more on this, do read Nassim Taleb’s Skin in the Game, which is about much more than the titular phrase which has become, much like his The Black Swan, a phrase people throw around without having any idea of the underlying concepts. ] guaranteeing that they will have every incentive possible to grow the pie, and I think you see where this is going because the system functions as designed so why should you complain? After all, there is no alternative.

Except that, of course, there is. It would be a big lift, to remove incentives of skimmers to inflate the balloon, stop various influencer platforms from inducing FOMO in everyone and anyone, recalibrate the median science journalist’s value system from Mr. Market to something more reality-based. Big, but not impossible, provided there is a will.

Therein lies the problem: that kind of thinking is somewhat at odds with the shared American culture, at least as recently described by Chris Arnade, that “you can live how you want, eat what you want, live (up to a point) how you want at a thin level, as long as you ultimately believe in making big money through hard work and playing by the rules.” Determining if the other two legs of the three-legged money/work/rules American stool are performing as intended I will leave as an exercise for the reader.


Wednesday links, with many uncertainties

Oh but we do, at least superficially: “of 130,000 men who became new fathers between 2017 and 2022, almost 800 died during that same 5-year period, and 60 percent of those deaths were from potentially preventable causes like homicide, accidental injury, and suicide” which is about what you would expect for a group of men that skews younger. The authors of the paper make a comparison between fathers who died and those who survived, but a more interesting one would have been with a demographically matched group of childless men. Alas, all we have is all the men in Georgia and lo, for each age range the new fathers have a lower mortality and the discussion appropriately leads with “Fatherhood appeared to be associated with reduced mortality.” [Note: Another reason to have more children. Though, if you are going to do it solely because of a misguided belief that you yourself would live longer, then perhaps don’t? ] Methinks French, or her headline writer, was fooled by randomness.

Vepdegestrant for breast cancer seems to be another entry in the annals of approved drugs being considered failures by Mr. Market. Let it be noted that a chemist (Lowe) writing for a prestigious peer-reviewed journal (Science) dunks on a drug while citing millions and billions of dollars exchanged or promised to various stakeholders while barely mentioning, and wrongly at that, the actual trial results. “It did not really demonstrate any advantage versus the comparison in the trial, fulvestrant” is factually incorrect: median progression-free survival was 5 versus 2.1 months, which, fine, is tiny and may have been the result of statistical shenanigans; but it may also be a true and meaningful incremental improvement and if we are going to dismiss it out of hand then what are we even doing here? The rot runs deep.

It is a genuine mystery why a mostly agrarian functional democracy with no separatist movements, demographic catastrophes, curses of resource wealth and the other usual suspects of stalled growth should completely flatline its GDP. Mousa shows compelling data and many hypotheses, though I wonder whether there is something that isn’t and can’t be measured which is keeping the country where it is. And if you are thinking that oh, GDP can’t measure happiness, I bet that at least they are happy, think again: it was the 4th least happy country last year. But then the “Happiness Report” methodology takes GDP into account (!?) so it is almost impossible for a GDP-poor country to break through in the rankings.

This is about slides shared via email, never meant to be presented, but rather serving as a landscape-oriented picture book for adults. I don’t know what is behind communication-by-slide, and as a seminar-attending Tufte acolyte I abhor it. Management consultants spreading them around like a viral respiratory disease — which is the thesis of the blog post — certainly has something to do with it, but the syndrome is now bottom-up as well. My third-grader asked me just this morning why they were forced to watch and make (!?) slides at school.


Medical links, Good, Bad and Ugly

The good: How an ‘Impossible’ Idea Led to a Pancreatic Cancer Breakthrough by Gina Kolata and Rebecca Robbins for The New York Times. The breakthrough discussed is the real deal, and they manage to do it in a measured tone which correctly identifies daraxonrasib as a stepping stone and not a miracle cure. It has this important note up top and not buried down at the end:

The pills, three taken daily, are not a cure — eventually, daraxonrasib stops working. Many patients do not respond. And it has side effects that can be harsh, including rash, diarrhea, fatigue, nausea and raw, split fingertips.

How refreshing — I hope Derek Thompson takes note.

The bad: The Human Body’s Hidden Pathways by Dr. Avraham Z. Cooper, who is a pulmonary/critical care physician at the Ohio State University, for The New York Times Magazine. For the life of me I cannot figure out the point of this post-modern journalistic exercise.

Nominally it is about a peer-reviewed research article which came out in 2021 under the title “Evidence for continuity of interstitial spaces across tissue and organ boundaries in humans”. The NYT Magazine staff did not deem it worthy of being linked to, but here it is in its entirety. In it, the authors showed small fragments of tattoo pigment migrating into tissues — skin and colon — deeper than they expected. We are not talking about ink being injected into a bicep and showing up in someone’s rectum here, but rather a series of biopsies of tattooed skin or the lining of the colon where there is a lot of pigment up top, and much less and in smaller pieces down at the bottom of the slide, deeper in the tissue.

Let me pull out my rarely used master’s degree in histology and note that this is hardly surprising. Connections between cells are not exactly air-tight — other than maybe in the brain and the testes — so of course there is some gel-like fluid circulating in the space. Or did the original article’s authors not realize why people tend to rub their feet when they get swollen?

But that is only the introduction. The meat of the article is Dr. Cooper’s theorizing that this has something to do with (drumroll, please) acupuncture. With no evidence, mind you, but a tingling sensation in the back of his neck or somesuch. By the time the 30th single-sentence screen scrolls by we are firmly in bullshit territory, in the formal sense of the word. Caveat lector.

The ugly: Longevity Medicine - An evidence based guide by Dr. Vinay Prasad who is out of the FDA and back making YouTube videos. And oh my, the contrast between the most recent thumbnail and the one posted just before he joined the FDA is striking. Has it only been a year? No wonder that his first topic back as an influencer is about longevity.

A sidenote here which I will put at the end: the increased interest of Silicon Valley types in longevity, and I am not thinking only about Bryan Johnson’s delusions here, reminds me of the recently quoted speech Charlie Chaplin gave at the end of The Great Dictator, the relevant quote being that “so long as men die, liberty will never perish.” Good for us that snake oil salesmen are still the longevity field’s most prevalent phenotype.