
Why not use machine learning to rank residency applicants?

I just finished attending a 1-hour career panel for UMBC undergrads thinking about medical school, and the one thing anyone interested in practicing medicine in America should know is that you really, really, really need to know how to answer multiple choice questions. It doesn’t matter how smart, knowledgeable, or hard-working you are: if you don’t have the skills needed to pick the one correct answer out of the four to six usually given, be ready to take a hit on how, where, and whether at all you can practice medicine in the US.

To be clear, this is a condemnation of the current system! Yes, there are always tradeoffs: the oral exams so prevalent in my own medical school in Serbia weigh against the socially awkward and those who second-guess themselves. But the MCQs are so pervasive in every aspect of evaluating doctors-to-be (and practicing physicians!) that you have to wonder about all the ways seen and unseen in which Goodhart’s law is affecting healthcare.

What would the ideal evaluation of medical students look like? It wouldn’t rely on a single method, for one. Or, to be more precise, it wouldn’t make a single method the only one that mattered. Whether it’s the MCAT to get into medical school, USMLE to get into residency and fellowship, or board exams to get and maintain certification, it is always the same method for the majority of (sub)specialties. Different organizations, at different levels of medical education, zeroing in on the same method could indeed mean that the method is really good — see: carcinisation. (To save you a click: it is “a form of convergent evolution in which non-crab crustaceans evolve a crab-like body plan”, as per Wikipedia. In other words, the crab-like body plan is so good that it evolved at least five different times.) But then, if it is so great to be shaped like a crab, where are our crab-like overlords?

Being a crab is a great solution for a beach-dwelling predatory crustacean with no great ambitions, and MCQs are a great solution to quickly triage the abysmal from everyone else when you are pressed for resources and time. But, both could also be signs of giving up on life, like how moving to your parents' basement is the convergence point for many different kinds of failed ambition.

Behind the overuse of MCQs is the urge to rank. Which, mind you, is not why tests like USMLE were created. They were, much like the IQ tests, meant to triage the low-performing students from the others. But the test spits out a number, and since a higher number is by definition, well, higher than the lower ones, the ranking began, and with it the Goodhartization of medical education. The ranking became especially useful as every step of the process became more competitive and the programs started getting drowned in thousands of applications, all with different kinds of transcripts, personal statements, and letters of recommendation. The golden thread tying them all together, the one component to rule them all, was the number they all shared — the USMLE score.

But then the programs started competing for the same limited pool of good test-takers, neglecting the particulars of why a lower-scoring candidate may actually be a better match for their program. Bad experience all around, unless you are good at taking tests, in which case good for you, but also look up bear favor. On the other hand, there is all this other information — words, not numbers — that gets misused or ignored. If only there were a way for medical schools and residency programs to analyze the applications of students/residents that they found successful by whatever metric and make a tailor-made prediction engine.

Which is kind of like what machine learning is, and it was such a logical thing to do that of course people tried it, several times, with mixed success. It was encouraging to see that two of these three papers were published in Academic Medicine, which is AAMC’s own journal. One can only hope that this will lead to a multitude of different methods of analysis, a thousand flowers blooming, etc. The alternative — one algorithm to rule them all — could be as bad as USMLE.
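For the curious, here is roughly what a "tailor-made prediction engine" could look like at its very simplest. This is only a sketch and not the method of any of the papers mentioned above: a tiny naive Bayes model, trained on the words of one program's own past applications, that scores new applications by how much they resemble the ones that program considered successful. The function names (train, log_odds) and every example application are invented for illustration; a real program would bring its own data and its own definition of success.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(examples):
    """examples: list of (application_text, succeeded) pairs from ONE program.
    Assumes at least one success and one non-success."""
    words = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, succeeded in examples:
        docs[succeeded] += 1
        words[succeeded].update(tokenize(text))
    vocab = set(words[True]) | set(words[False])
    return {"words": words, "docs": docs, "vocab": vocab}

def log_odds(model, text):
    """Log-odds that an application resembles this program's past successes.
    Positive means closer to the successes; Laplace smoothing avoids zeros."""
    words, docs, v = model["words"], model["docs"], len(model["vocab"])
    totals = {label: sum(words[label].values()) for label in (True, False)}
    score = math.log(docs[True] / docs[False])
    for tok in tokenize(text):
        if tok not in model["vocab"]:
            continue  # words never seen in training carry no information
        p_yes = (words[True][tok] + 1) / (totals[True] + v)
        p_no = (words[False][tok] + 1) / (totals[False] + v)
        score += math.log(p_yes / p_no)
    return score

# Invented toy data for a hypothetical program that values rural outreach.
past = [
    ("rural clinic volunteer research poster", True),
    ("rural health outreach teaching", True),
    ("high score honors research", False),
    ("high score honors publications", False),
]
model = train(past)

applicants = ["high score publications", "rural outreach volunteer"]
ranked = sorted(applicants, key=lambda a: log_odds(model, a), reverse=True)
```

The point of the sketch is that the model belongs to the program: a different program, with different past successes, would end up with different word weights and a different ranking of the same two applicants.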

The caveat is that Americans are litigious. Algorithmic hiring has already raised some alarm, so I can readily imagine the first lawsuit from an unmatched but well-moneyed candidate complaining about no human laying their eyes on the application. But if that’s the worst thing that could happen, it’s well worth trying.


Last week was my 1-year anniversary using MarsEdit, and I feel not an iota of urge to switch to anything else for publishing. Now, if only there were a general-purpose editor that was as fast but that I could use as an nvAlt replacement. When is nvUltra coming out, again?


I agree wholeheartedly with Nicolas Magand’s answer to why he liked monospaced fonts. Any time I changed the editor to a serif font, it ended poorly. I would add the IBM Plex family to the list of favorites: they are clean, readable, and underused despite being widely available via (ick, I know) Google Fonts.


🕹️ The 25th anniversary of Valve’s Half-Life is tomorrow, and they have a one-hour documentary out. The whole game is also available on Steam for free. There goes my weekend! (ᔥwaxy.org)


Booting up an Intel MacBook Pro for the first time in months, and the fans spin up as soon as I enter my password, and they are louder than the AC, and the screen freezes before getting to the desktop, and how on Earth did we ever tolerate this garbage? Isn’t technology grand?


Janice Kai Chen at the Washington Post on pigeons versus the internet:

At certain data volumes and distances, the pigeon is a quicker option for large swaths of rural America, where internet speeds can lag far behind the national average.

And not just rural America. As I write this from the nation’s capital, speedtest.net reports 24 Mbps up. Federal agencies should bring back pigeons for sending large files back and forth.
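A back-of-the-envelope check of that claim, as a rough sketch. The only number taken from this post is the 24 Mbps uplink; the 80 km/h pigeon speed, the decimal gigabytes, and the function names are my assumptions, and loading and unloading the bird is ignored.

```python
def upload_hours(size_gb, uplink_mbps=24.0):
    """Hours needed to push size_gb (decimal gigabytes) through the uplink."""
    bits = size_gb * 8e9
    return bits / (uplink_mbps * 1e6) / 3600

def pigeon_hours(distance_km, speed_kmh=80.0):
    """Flight time in hours; 80 km/h is an assumed cruising speed."""
    return distance_km / speed_kmh

def pigeon_wins(size_gb, distance_km, uplink_mbps=24.0, speed_kmh=80.0):
    """True if the bird lands before the upload would finish."""
    return pigeon_hours(distance_km, speed_kmh) < upload_hours(size_gb, uplink_mbps)

def break_even_gb(distance_km, uplink_mbps=24.0, speed_kmh=80.0):
    """File size (GB) above which the pigeon is the faster option."""
    seconds = pigeon_hours(distance_km, speed_kmh) * 3600
    return seconds * uplink_mbps * 1e6 / 8e9
```

Under those assumptions, a 1 TB card takes about 93 hours to upload at 24 Mbps, while a pigeon covers 100 km in an hour and a quarter; over that distance the bird wins for anything larger than roughly 13.5 GB.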


The problem of optimization and scale

They are converting a modern office building into condos a few blocks down from my apartment, and by the looks of it they may as well have torn everything down and built it anew. I hope they will do that with all the brutalist federal garbage downtown, the FBI building first. Meanwhile, the late 19th-early 20th century townhouses scattered around DC have been switching seamlessly from commercial to residential and back for a hundred years now for little to no cost.

Optimization and scale: they work great, until they don’t. Just ask a salaried physician working for a conglomerate in the medical-industrial complex, a large-scale operation which is being optimized to death (sadly not its own, but that of its component parts — patients and health care workers alike). All those large reptiles and mammals are extinct for a reason.

We discussed the problem of scale at the first RWRI I attended back in August 2020, the Beirut explosion still fresh in everyone’s mind. Less than a year later, a big ship blocked the Suez Canal, as if to reinforce the message. I expect Nassim Taleb’s next book will have a chapter or three on the problem, even if “scale” doesn’t make it into the title.

What goes for biology, architecture, and logistics also goes for industry, and if there is one hyper-optimized massive-scale operation around, it’s Apple’s iPhone production. If and when its production chain comes toppling down, it will not be a black or a gray swan event, it will be snow-white, which is why I suspect (or, as an iPhone user, hope) they have contingencies.

And in practicing what I preach, I have slowly been transitioning away from GTD levels of hyper-productivity and into a 4,000 weeks mindset. Whether this is a sign of wisdom, experience, or just plain old age, well, who is to say? Why not all three?


The holiday season starts as soon as I get my first email from the Wikimedia Foundation, reminding me that “last year, you donated x dollars…”. And those emails work, despite some recent made-up controversies. Wikipedia is the wonder of the modern world.


Skandalfreude: on the joys of being scandalized

There are books you read once and toss out (see also: anything by Malcolm Gladwell, Michael Lewis, or any other permanent airport book store resident) and those which keep on giving, and René Girard’s I See Satan Fall Like Lightning falls firmly in the latter category.

One concept I have come to appreciate more thanks to the book is that of the scandal. As in: “the reaction of moral outrage and indignation about a real or perceived transgression of social norms”, not the TV show. (Though I’m sure the show has its fans.) Girard has a book — or rather, a collection of essays — with “scandal” in the name, but both he and Luke Burgis focus more on mimetic desire and how it can lead to conflict; the build-up of scandal is “just” a stepping stone, something natural and completely expected of humans. This may be related to their Catholicism, but I am neither a theologian nor a philosopher, so I’ll refrain from speculating further.

What I won’t refrain from, however, is flipping my thinking from human desire causing scandal to the human desire for scandal. It is a phenomenon not exactly like, but closely related to, schadenfreude — the pleasure in the misfortune of others. In fact, a search for Skandalfreude does return some relevant hits — one of them from Stefan Zweig, no less — so let’s use that word to describe the pleasure in being scandalized or, more broadly, the desire to be scandalized. And I suspect that, similar to schadenfreude, there is a Gaussian curve of people’s propensity for experiencing it in general and, flipping the axes, a Gaussian curve of the number of people with the propensity for it when it comes to a particular topic.

The first bell curve pits the “Karens”, scandalized by everything, opposite the phlegmatics, scandalized by nothing. (I am misusing the term Karen here, and contributing to it becoming a suitcase word. For this, I apologize. You could replace it with “the woke” in your mind’s eye — again, a misuse! — and get the same intended result. Funny how that works.) On the ends of the second curve lie the haters — who think that a particular company, person, ideology, etc. is evil — and the fanboys, to whom that same entity can do no wrong.

Regarding the second, entity-based curve: the higher the profile, the fatter the tails. This we all know intuitively. When Apple causes an uproar for their “shot on the iPhone” Scary Fast event, it is the skandalfreude fat tail poking its head. Knowing how many people pore over every one of Apple’s actions just wanting to be scandalized — to the point of paying to be scandalized — it is a small miracle these storms in a teacup don’t happen more often.

So with those two curves in mind, Girard’s insight is this: mimetic desire makes only one of their ends “sticky”, the one that “likes” scandal. With a critical mass, it is no longer a bell-shaped curve at all, but leans towards the exponential. The entity to which this happens becomes a scapegoat and is stoned to death, in the true sense millennia ago, nowadays only metaphorically. This is why Apple and the rest of Big Tech should tread lightly: their tails are fat enough that even small missteps, or perceived missteps, cause controversy. A full-blown mistake or, worse yet, a true transgression, would hurl the skandalfreudeian mass towards the deep end (or has that already happened?) and make them into a scapegoat for all of society’s ills.

The practice of doomscrolling could be viewed in this light: trawling through the timeline, waiting for the next opportunity to experience some skandalfreude, maybe even jump onto a stoning bandwagon or two, at no personal cost. Much of online behavior seems less bizarre when viewed through this mental model, and for that alone it is a good one to have.


On this day 85 years ago, at 8pm Eastern Time, Orson Welles performed The War of the Worlds live on radio. Things escalated quickly. I will give major nerd points to Apple for even an oblique reference to the radio drama at tonight’s event, but I am not holding my breath.