
There is both a science and an art to medicine. The “art” part usually comes into play when we talk about bedside manner and the doctor-patient relationship, but recognizing and naming diseases — diagnosis — is also up there. José Luis Ricón wrote recently about a fairly discrete entity, Alzheimer’s disease, and how several different paths may lead to a similar phenotype. This is true for most diseases.

But take something like “cytokine release syndrome”, or “HLH”, or any other syndromic disease that is more of a suitcase phrase than anything, and that can present as a spectrum of symptoms. Different paths to different phenotypes, with only a sleight of molecular storytelling to tie them together. Yet somehow it (mostly) works. It’s quite an art.


No one is hiding the miracle cures

So, who wants to dismantle the FDA, you ask? Some patient advocacy groups, among others, aided by a few senators:

We need the FDA to be more insulated from these forces. Instead, every few years, legislators offer bills that amount to death by a thousand cuts for the agency. The latest is the Promising Pathways Act, which offers “conditional approval” of new drugs, without even the need for the preliminary evidence that accelerated approval requires (i.e., some indication that biomarkers associated with real outcomes like disease progression or survival are moving in the right direction in early drug studies).

This bill is being pushed by powerful patient groups and has the support of Democratic senators like Kirsten Gillibrand and Raphael Warnock, who should know better.

The bill would codify using “real-world data” and unvalidated surrogate endpoints for something called “provisional approval”, a level below the already tenuous accelerated approval.

I can see how it may appeal to patients: you may get a promising new drug for your life-threatening, debilitating disease sooner via this pathway. On the other hand, there are already mechanisms in place that enable access to these: a clinical trial, for one. Or expanded access (a.k.a. “compassionate use”) for those who may not be eligible for a trial.

So how would “provisional approval” help? If anything, wouldn’t it transfer the risks and — importantly — costs of drug development from the drug manufacturer/sponsor/study investigator to the patient?

Ultimately, the reason why there aren’t many cures for rare, terminal diseases is not because the big bad FDA is keeping the already developed drugs away from patients but rather because they are devilishly difficult to develop at our current level of technology. Wouldn’t it then make more sense to work on advancing the technology that would lead to those new cures? (The careful reader will note that the opposite is being done, and I write this as no great fan of AI.) I worry that the Promising Pathways Act would solve a problem that doesn’t exist by adding to the already skyrocketing costs of American health care. But that could be just me.

(↬Derek Lowe)


Term confusion alert 2: outcome versus endpoint

Our clinical trials course at UMBC is well under way, and we are getting some terrific questions from students. Here is one!

Q: Are outcomes surrogate endpoints or is there a distinction between the two?

The terms “outcome” and “endpoint” are not strictly defined and some people use them interchangeably. However:

  • Outcomes are broader, and include any change in health that is considered important enough to measure in a patient (such as “overall survival” — the amount of time between enrolling in the trial and death, or “quality of life” — a certain score on a specified scale that the patient fills out or the doctor administers).
  • Endpoints are more specific than outcomes, consider the whole study population instead of individual patients, and need to have a precisely defined way of measurement and time points at which they are measured (e.g. “median overall survival”, “3-year overall survival rate”, and “5-year overall survival rate” are three different endpoints that are different ways of aggregating and evaluating the same individual patient outcome — overall survival; see the sketch just below).
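A toy sketch of that distinction (Python, with made-up survival times purely for illustration; real trials would use Kaplan-Meier estimates to handle censoring):

```python
# Toy illustration: one patient-level outcome (overall survival, in months
# from enrollment to death) aggregated into three different endpoints.
# Survival times are invented; no censoring is modeled here.
from statistics import median

overall_survival_months = [4, 11, 19, 25, 31, 38, 44, 52, 60, 71]
n = len(overall_survival_months)

median_os = median(overall_survival_months)                  # "median overall survival"
os_3yr = sum(t >= 36 for t in overall_survival_months) / n   # "3-year overall survival rate"
os_5yr = sum(t >= 60 for t in overall_survival_months) / n   # "5-year overall survival rate"

print(f"Median OS: {median_os} months")   # 34.5 months
print(f"3-year OS rate: {os_3yr:.0%}")    # 50%
print(f"5-year OS rate: {os_5yr:.0%}")    # 20%
```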

It reminds me of the confusion between efficacy and effectiveness, only it’s worse: there is no agreed-upon text that describes the distinction, so it is really a terminological free-for-all. Indeed, what I wrote above may end up not being true — caveat lector! As always, it is best to ask people to clarify what they meant when they said this or that. Regardless, if someone tells you that “overall survival” (or, worse yet, “survival”) was the primary endpoint, it clearly can’t be the case. Endpoints need to be more specific than that.

Surrogate outcomes and surrogate endpoints are those which are stand-ins for what we actually care about. E.g. when we give chemotherapy to someone with cancer, we do it so that they would live longer and/or better. However, it is quicker and easier to measure if the tumor shrinks after chemotherapy (i.e. “responds” to treatment), and we believe that the tumor shrinking will lead to the patient living longer or better (which may not necessarily be the case!), so we use the response as a surrogate outcome for survival and quality of life (by how much did the tumor shrink? was it a complete or a partial response according to pre-specified criteria?). Study-level surrogate endpoints would be the overall response rate, partial response rate, complete response rate, etc. Here is a good video on surrogate endpoints in oncology.
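A continuation of the toy sketch above, with invented best-response categories (simplified; real studies would grade responses against pre-specified criteria such as RECIST):

```python
# Toy illustration: per-patient best responses (the surrogate outcome)
# aggregated into study-level surrogate endpoints. Data are invented.
# CR = complete response, PR = partial response, SD = stable disease, PD = progression.
best_response = ["CR", "PR", "PR", "SD", "PD", "CR", "SD", "PR", "PD", "PR"]
n = len(best_response)

complete_response_rate = best_response.count("CR") / n
partial_response_rate = best_response.count("PR") / n
overall_response_rate = complete_response_rate + partial_response_rate  # ORR = CR + PR

print(f"Complete response rate: {complete_response_rate:.0%}")  # 20%
print(f"Partial response rate:  {partial_response_rate:.0%}")   # 40%
print(f"Overall response rate:  {overall_response_rate:.0%}")   # 60%
```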

We have created so much confusion here that it is a small miracle we can communicate amongst ourselves at all.


As We May Think is one of the greatest essays ever written, and I am all for popularizing it, but one thing about the most recent mention rubbed me the wrong way, namely how it presented its author, Vannevar Bush:

Bush was part of the Oppenheimer set; he was an engineer whose work was critical to the creation of the atomic bomb.

This paints the picture of an engineer working at Los Alamos under Oppenheimer to make the bomb, when in fact Bush was leading the United States' nuclear program for two whole years before Oppenheimer became involved. Oppenheimer’s predecessor? Sure. Part of his set? Misleading.

I suspect it was presented this way because of that movie; the more I keep seeing these kinds of distortions as a result, the less I think of it. This is why I will keep recommending The Making of the Atomic Bomb to everyone and anyone who was tickled by the Los Alamos scenes — the only ones worth watching.


November lectures of note

Next Wednesday looks busy, but there is a Thanksgiving-sized gap in the calendar.


As a long-time fan of Goodhart’s law — see here, here, and of course here — I can only nod my head in appreciation of Adam Mastroianni’s concept of self-Goodharting. My first encounter with the concept, and still an excellent introduction, was an article on Ribbonfarm.


Term confusion alert: efficacy versus effectiveness

We like to do things in medicine, and medicine’s big contribution to science was figuring out how best to answer the question of whether the things we do actually work. But of course things aren’t so simple, because “Does it work?” is actually two questions: “Can it work?”, i.e. will an intervention do more good than harm under ideal circumstances, and “Does it work in practice?”, i.e. will an intervention do more good than harm in usual practice.

We also like to complicate things in medicine, so the person who first delineated this distinction, Archie Cochrane of the eponymous collaboration, named them efficacy and effectiveness respectively — just similar enough to cause confusion. He also added efficiency for good measure (“Is it worth it?”). Fifty years later, people are still grappling with these concepts and talking over each other’s heads when discussing value in health care. Which is to say, it’s best not to use the same prefix for overlapping terms, but if you had to, “eff” is most appropriate.

The most recent example is masks. The Cochrane Collaboration’s review said they didn’t “work” for preventing respiratory infections. (The paper caused an uproar and the language has since been toned down, but that was the gist.) Now, knowing what Cochrane was all about, the first question to ask is: what sense of “work” did the authors intend? This particular group is all about effectiveness (working in “the real world”), not about efficacy (working under ideal conditions). This caused some major cognitive dissonance among the covid-19 commenters. Vox had the typical sentiment:

Furthermore, neither of those studies [included in the meta-analysis] looked directly at whether people wear masks, but instead at whether people were encouraged or told to wear masks by researchers. If telling people to wear masks doesn’t lead to reduced infections, it may be because masks just don’t work, or it could be because people don’t wear masks when they’re told, or aren’t wearing them correctly.

There’s no clear way to distinguish between those possibilities without more original research — which is not what a meta-analysis of existing work can do.

But this is the difference between ideal (you force a person to wear a mask and monitor their compliance) and typical conditions (you tell the person to wear a mask and keep your fingers crossed), and Cochrane is interested in the latter, which is the one more important to policy-makers. (Though of course, the chasm between ideal and typical circumstances varies by country, and some can do more than others to bring the circumstances closer to ideal, by more or less savory means.)
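To make the distinction concrete, here is a minimal sketch (Python, invented numbers, not the Cochrane data): an intention-to-treat analysis answers the effectiveness question, i.e. what happened to everyone who was told to mask, compliant or not, while a per-protocol analysis of the people who actually wore masks gets closer to, though never quite reaches, the efficacy question.

```python
# Hypothetical illustration: telling people to mask vs. actually masking.
# Numbers are invented; the point is the two different denominators.

# Intention-to-treat: analyze by what people were *told*, regardless of compliance.
assigned_mask    = {"n": 1000, "infected": 90}   # told to mask (only some complied)
assigned_control = {"n": 1000, "infected": 100}

itt_rr = (assigned_mask["infected"] / assigned_mask["n"]) / (
    assigned_control["infected"] / assigned_control["n"]
)
print(f"ITT risk ratio (effectiveness-ish): {itt_rr:.2f}")   # 0.90

# Per-protocol: analyze only those who actually wore masks as instructed.
# Closer to efficacy, but compliers may differ from non-compliers, so still imperfect.
wore_mask = {"n": 600, "infected": 42}
per_protocol_rr = (wore_mask["infected"] / wore_mask["n"]) / (
    assigned_control["infected"] / assigned_control["n"]
)
print(f"Per-protocol risk ratio (efficacy-ish): {per_protocol_rr:.2f}")  # 0.70
```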

This is an important point: policy makers make broad choices at a population level, and thus (do? should?) care more about effectiveness. Clinicians, on the other hand, make individual recommendations for which they generally need to know both things: how would this work under ideal conditions, how does it work typically, and — if there is a large discrepancy — what should I do to make the conditions for this particular person closer to the ideal? We could discuss bringing circumstances closer to ideal at the population level as well, but you can ask the people of Australia how well that went.

The great colonoscopy debate is another good example of efficacy versus effectiveness. There is no doubt that a perfectly performed colonoscopy at regular intervals will bring the possibility of having colon cancer very close to zero, i.e. the efficacy is as good as you can hope for a medical intervention. But: perfection is contingent on anatomy, behavior, and technique; “regular intervals” can be anything from every 3 months to every 10 years; and there are risks of both the endoscopy and the sedation involved, or major discomfort without the sedation. And thus you get large randomized controlled trials with “negative” results that don’t end up changing practice (though they do provide plenty of fodder for podcasts and blogs, so, thanks?).

So with all that in mind, it was… amusing? to see some top-notch mathematicians — including Nassim Taleb! — trying to extrapolate efficacy data out of a data set created to analyze effectiveness. The link is to the preprint. Yaneer Bar-Yam, the paper’s first author, has a good X thread as an overview. To be clear, this is a worthwhile contribution and I’ll read the paper in depth to see whether its methods can be applied to cases where effectiveness data is easier to come by than efficacy (i.e. most of actual clinical practice). But it is also an example of term confusion, where efficacy and effectiveness are for the most part used interchangeably, except in the legend for Table 1, which says, and I quote:

The two by two table provides the incidence rates of interest in a study of the efficacy (trial) or effectiveness (observational study) of an intervention to reduce risk of infection from an airborne pathogen.

Which seems to imply that you measure efficacy exclusively in trials and effectiveness in observational studies, but that is just not the case (the colonoscopy RCT being the perfect example of an effectiveness trial). And of course it is a spectrum: efficacy can only be perfectly measured in impossible-to-achieve conditions of 100% adherence and a sample which is completely representative of the population in question, so any clinical trial is “tainted” with effectiveness, though of course the further down you are on the Phase 1 to Phase 4 rollercoaster, the closer you are to 100% effectiveness.
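For what it’s worth, the two-by-two table the legend describes boils down to arithmetic like this (a sketch with invented counts; whether the resulting number reads as efficacy or effectiveness depends on how the study was conducted, not on the calculation):

```python
# Hypothetical two-by-two table: intervention vs. control, infected vs. not.
# The arithmetic is identical for a randomized trial and an observational study;
# the study design, not the formula, determines what the estimate means.
table = {
    "intervention": {"infected": 30, "not_infected": 970},
    "control":      {"infected": 60, "not_infected": 940},
}

def incidence(arm: str) -> float:
    counts = table[arm]
    return counts["infected"] / (counts["infected"] + counts["not_infected"])

risk_ratio = incidence("intervention") / incidence("control")
print(f"Incidence (intervention): {incidence('intervention'):.3f}")  # 0.030
print(f"Incidence (control):      {incidence('control'):.3f}")       # 0.060
print(f"Risk ratio:               {risk_ratio:.2f}")                  # 0.50
```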

I wonder how much less ill will there would be if the authors on either side realized they were talking about different things. The same amount, most likely, but one could hope…

Update: Not two seconds after I posted this, a JAMA Network Open article titled “Masks During Pandemics Caused by Respiratory Pathogens—Evidence and Implications for Action” popped into my timeline and wouldn’t you know it, it also uses efficacy and effectiveness interchangeably, as a matter of style. This is in a peer-reviewed publication, mind you. They shouldn’t have bothered.


This list of NCI’s Lasker Clinical Research Scholars has some familiar faces, and I couldn’t be more proud. These are all MDs and MD/PhDs who are forgoing lucrative careers in industry and private practice and exposing themselves to metric tonnes of federal red tape, all to find cures for rare and neglected cancers (looking at you, T-cell lymphomas and AIDS-related malignancies). May their Tartars show up.


I’ve just spent 45 minutes teaching a dozen and a half first-graders how to use a microscope — with mixed success — and it was the best time I’ve had all week. We looked at frog’s blood, the leg of a housefly, paramecia, and some pollen, all of which sound like something a witch would have on hand. Perfect for Halloween!


The Get Shorty style of peer review

Serbia in the 1990s had a peculiar mass media landscape in that movies were rarely officially released, yet were shown on TV days after premiering through the magic of pirating, practiced by both broadcast and cable networks. The most successful of these was TV Pink, now a horror show of reality TV, and one of the many peculiar features of TV Pink was that its daytime content would rely heavily on those E! making-of fillers and patter interviews with exhausted celebrities on their movie-promoting circuit. While home-bound sick kids of America filled their days with Bob Barker and Jerry Springer, in Serbia it was all Hollywood all the time — unless you were the weirdo who liked to watch reruns of 1960s kids shows and their poorly made hyperinflation-era remakes, which was the only thing state TV was capable of producing.

But that wasn’t me! So when I got bacterial pneumonia back in 6th grade and was stuck at home for two whole weeks while receiving twice-daily intramuscular right-into-the-gluteus aminoglycoside antibiotics — the lackadaisical attitude of Serbian pediatricians towards dosing and toxicities is a different story — Pink took up more of my time than I care to admit, and making-of videos from that period got engrained in my memory more so than the movies themselves. Topping the list was the 1995 comedy Get Shorty starring John Travolta, which back then I thought must have been the biggest blockbuster ever if they were talking about it so much.

All this is a preamble to what I heard said by Travolta, or his co-star Danny DeVito, or maybe it was the director Barry Sonnenfeld, and it was this: the movie was based on a book, and the book was outstanding and written by Elmore Leonard, who had a way with writing dialogue, and whenever they had an urge to improvise their lines they would hold back because Leonard must have already thought about the things that came to the actors’ minds first and decided that, no, this thing on the page was better.

And I have heard that line so many times — my pre-frontal cortex still developing — that I have now completely internalized it and act on it unconsciously. When evaluating someone’s work — outside of grading papers, for that is a different matter entirely — I start with the assumption that they have thought long and hard about the paper they’ve submitted for my peer review, certainly longer than the few hours I can dedicate to reviewing it, and I give them the benefit of the doubt. My first impression is probably something they thought about and dismissed, to come up with what they are submitting. Now, some papers are so egregiously wrong that they will still be red all over after I’m done; but if there is a small difference of opinion, or a nit to pick with style, or something I would maybe have done slightly differently, I just let it be, in deference to the authors' work and respect to their, the editor’s, and — why hide it? — my own, time.

After being on the receiving end of quite a few paper and grant reviews myself (oh, and meetings, so many meetings), I am beginning to suspect that not everyone is following the Get Shorty ethos.

Now, the worst peer review I have ever received was also the shortest. It was for a paper about a clinical study in a rare disease that got a one-sentence rejection from the first journal where it was submitted: “Only 11 patients, they need more”. But most other reviews are not “bad” in that sense, but rather overly verbose and nit-picky about the tiniest of details, with dozens of comments per review, the purpose of which is not to improve the article but rather to show the editor of the prestigious journal — the higher the impact factor, the more nits to pick — that the reviewer was worthy of the invitation to provide his or her services free of charge to the academic publishing machine. Look at me, ma, I’m paying attention!

Which is fine for papers, I guess, since the reviewers will be in the ballpark of your field (those that aren’t won’t accept the review), and you may at least get a chance to respond. Grants are worse: not only are the reviewers forced into it for the prestige of being on a study section, but there is little chance, if any, that they will have the knowledge of your field. (This is why, I suspect, the best predictor of receiving an NIH grant is already having received an NIH grant. Not only have you been stamped as a success for the “educated lay-people” on the study section, but if you reapply to the study section their knowledge of your field will have been what you told them, and the Program Officer, in prior grant applications.) The problem is trebled if you apply with a clinical trial, because all the people with clinical trial expertise across all of the NIH study sections could probably fit in a Mini. But how much more difficult could designing a clinical trial be from running a lab, eh?

The examples are many and I’m not at liberty to discuss most of them, but back when I was opening trials in T-cell lymphoma, a grant was not funded mostly because they didn’t think we could enroll patients for this “rarest of the rare” disease in time (we were well over half-way done with enrollment by the time we heard of this decision). In case you were wondering why all the money goes to breast, colon, and lung cancer research: well, no one ever had a problem recruiting for those!

There is a role for peer review: to weed out the impossible and the truly un-fundable. But after that it may as well be a lottery: why would rolling the dice be worse than adding up laundry lists of small, irrelevant issues that could sink an application which gets assigned to two or three particularly detail-oriented reviewers? It would also save a hell of a lot of time for everyone involved.