Why not use machine learning to rank residency applicants?

I just finished attending a 1-hour career panel for UMBC undergrads thinking about medical school, and the one thing anyone interested in practicing medicine in America should know is that you really, really, really need to know how to answer multiple-choice questions. It doesn’t matter how smart, knowledgeable, or hard-working you are: if you don’t have the skills needed to pick the one correct answer out of the four to six usually given, be ready to take a hit on how, where, and whether you can practice medicine in the US at all.

To be clear, this is a condemnation of the current system! Yes, there are always tradeoffs: the oral exams so prevalent at my own medical school in Serbia weigh against the socially awkward and those who second-guess themselves. But multiple-choice questions (MCQs) are so pervasive in every aspect of evaluating doctors-to-be (and practicing physicians!) that you have to wonder about all the ways, seen and unseen, in which Goodhart’s law (when a measure becomes a target, it ceases to be a good measure) is affecting healthcare.

What would the ideal evaluation of medical students look like? It wouldn’t rely on a single method, for one. Or, to be more precise, it wouldn’t make a single method the only one that mattered. Whether it’s the MCAT to get into medical school, the USMLE to get into residency and fellowship, or board exams to get and maintain certification, it is always the same method for the majority of (sub)specialties. Different organizations, at different levels of medical education, zeroing in on the same method could indeed mean that the method is really good; see: carcinisation. (To save you a click: it is “a form of convergent evolution in which non-crab crustaceans evolve a crab-like body plan”, as per Wikipedia. In other words, the crab-like body plan is so good that it evolved at least five different times.) But then, if it is so great to be shaped like a crab, where are our crab-like overlords?

Being a crab is a great solution for a beach-dwelling predatory crustacean with no great ambitions, and MCQs are a great solution for quickly separating the abysmal from everyone else when you are pressed for time and resources. But both could also be signs of giving up on life, the way moving into your parents’ basement is the convergence point for many different kinds of failed ambition.

Behind the overuse of MCQs is the urge to rank. Which, mind you, is not why tests like the USMLE were created. Much like IQ tests, they were meant to separate low-performing students from the rest. But the test spits out a number, and since a higher number is by definition, well, higher than a lower one, the ranking began, and with it the Goodhartization of medical education. The ranking became especially useful as every step of the process grew more competitive and programs started drowning in thousands of applications, all with different kinds of transcripts, personal statements, and letters of recommendation. The golden thread tying them all together, the one component to rule them all, was the number they all shared: the USMLE score.

But then programs started competing for the same limited pool of good test-takers, neglecting the particulars of why a lower-scoring candidate might actually be a better match for them. Bad experience all around, unless you are good at taking tests, in which case good for you, but also look up “bear favor”. On the other hand, there is all this other information (words, not numbers) that gets misused or ignored. If only there were a way for medical schools and residency programs to analyze the applications of the students and residents they found successful, by whatever metric, and build a tailor-made prediction engine.

Which is kind of what machine learning is, and it was such a logical thing to do that of course people tried it, several times, with mixed success. It was encouraging to see that two of these three papers were published in Academic Medicine, the AAMC’s own journal. One can only hope that this will lead to a multitude of different methods of analysis, a thousand flowers blooming, etc. The alternative, one algorithm to rule them all, could be as bad as the USMLE.
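
To make the “tailor-made prediction engine” idea a little more concrete, here is a toy sketch in Python. It is not the method from any of the papers mentioned above; it swaps in a deliberately simple technique, multinomial naive Bayes over the words of an application, trained on a program’s own past applicants labeled “successful” or not by whatever metric that program picks. The class name `NaiveBayesRanker`, the crude tokenizer, and all of the example texts and labels are hypothetical, made up purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Crude tokenizer: lowercase, keep runs of ASCII letters only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRanker:
    """Hypothetical ranker: multinomial naive Bayes with add-alpha smoothing.

    score() returns the log-odds that an application's wording comes from
    the "successful" class, as defined by the program's own past labels.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, texts: list[str], labels: list[bool]) -> "NaiveBayesRanker":
        if not any(labels) or all(labels):
            raise ValueError("need at least one past applicant in each class")
        self.counts = {True: Counter(), False: Counter()}
        for text, label in zip(texts, labels):
            self.counts[bool(label)].update(tokenize(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        n_success = sum(1 for label in labels if label)
        # Prior log-odds of "successful" vs. "not", from past class frequencies
        self.prior = math.log(n_success / (len(labels) - n_success))
        return self

    def _word_prob(self, word: str, cls: bool) -> float:
        """Smoothed P(word | class)."""
        v = len(self.vocab)
        return (self.counts[cls][word] + self.alpha) / (self.totals[cls] + self.alpha * v)

    def score(self, text: str) -> float:
        log_odds = self.prior
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training carry no evidence
                log_odds += math.log(self._word_prob(word, True) / self._word_prob(word, False))
        return log_odds

    def rank(self, applicants: dict[str, str]) -> list[tuple[str, float]]:
        """Return (name, score) pairs, highest score first."""
        scored = [(name, self.score(text)) for name, text in applicants.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, made-up training data: past applicants' text and whether the program
# later judged them "successful" by whatever metric it chose.
past_texts = [
    "volunteered at a rural clinic and mentored first-year students",
    "organized a community health fair and taught peers",
    "research assistant in a basic science lab",
    "shadowed a surgeon for one summer",
]
past_labels = [True, True, False, False]

ranker = NaiveBayesRanker().fit(past_texts, past_labels)
ranked = ranker.rank({
    "applicant_A": "mentored students at a rural clinic",
    "applicant_B": "summer research assistant in a lab",
})
for name, s in ranked:
    print(f"{name}: {s:+.2f}")  # applicant_A is listed first
```

Naive Bayes is used here only because it fits in a few dozen lines and its scores are easy to inspect word by word; a model trained on a program’s past choices would also learn whatever biases shaped those choices.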

The caveat is that Americans are litigious. Algorithmic hiring has already raised some alarm, so I can readily imagine the first lawsuit from an unmatched but well-moneyed candidate complaining that no human ever laid eyes on their application. But if that’s the worst thing that could happen, it’s well worth trying.
