Monday, 14 July 2025

Trying not to be a Gell-Mann Amnesiac

I sometimes wonder how much Gell-Mann Amnesia people experience. Paraphrasing Crichton, when you're a domain expert, you'll sometimes read an article that gets every aspect of your field completely and absurdly wrong, have a little laugh about it... then keep on reading and trusting articles that are about other fields, even from the same publication or writer.

As if they're some pure spring of wisdom which only coughed out a lump of mud when it came to the thing you happen to know about.

It's just an idea from a novelist, not a cognitive bias backed by real-world studies as far as I know, but you have to admit it has a kind of... truthiness to it.

Stack this up with Dunning-Kruger and it's easy to become cynical. You might decide that actually, all the loudest voices are talking complete nonsense, all of the time. That might be too far. But I do think it pays to put deliberate hard effort into distinguishing domain experts from overconfident bullshitting pundits.

Now, anyone with their ear to the ground and a weather eye out for Gell-Mann Amnesia should have arrived at the obvious conclusion about generative AI. To wit: in its current state, the technology is an overconfident bullshitter.

On being a piece of software and being confidently wrong

The case studies are easy to find, and the ones from domain experts sound pretty different from the ones from the tech industry and the reporters too busy and/or demoralised to do more than repackage their press releases as articles.

➡️ I am not a historian. The historians I've read say genAI gets softball history questions mostly right and deep ones mostly wrong. Sometimes subtly, sometimes dramatically. It just makes things up when the evidence is scarce, makes errors of commission and omission, misplaces its focus, and draws weird conclusions from its premises.

➡️ I am not an artist. The artists I listen to say genAI art looks bland and awful and organic because it doesn't understand composition or anatomy or the boundaries between objects (because it doesn't 'understand' anything). It can't make an image that isn't well-represented in the training data, like a camel and a steampunk automaton jousting from the backs of sumo wrestlers. Same in other kinds of media: filmmakers say genAI can't do film because it can't take direction or keep track of characters or hold a consistent shot.

➡️ I am not a Wikipedia editor (except incidentally). Earlier this year there was a wretched moment when the Wikipedia editors were going to have genAI article summaries foisted on them, although I think that's turned around now. The skilled editors pointed out that the LLM summaries generally ranged from 'bad' to 'worthless' by Wiki standards: they didn't meet the tone requirements, left out key details or included incidental ones, injected "information" that wasn't in the article, and so on.

➡️ I am not a manager. The managers say genAI can't even collate timesheets reliably.

➡️ I am not a novelist. The novelists say a genAI book reads like a statistical summary of all creative writing anyone has ever done, including all the embarrassing teenage fanfiction. It sucks at originality. And because it doesn't have an internal model or understanding of its outputs, it can't keep track of things and make a coherent satisfying story. Things are vague, tropey, or contradictory.

➡️ I am not a lawyer. The lawyers are, um, well, by the sound of it a lot of them are being sanctioned for using generative AI to cite completely nonexistent caselaw. (☉__☉”)

➡️ I am not a public policy wonk. The bureaucratic wonks note that genAI can't summarise text. It shortens it and fills in the gaps with median seems-plausible-to-me pablum. The kind you get when you average out everything anyone has ever written on the internet. If you try to have an LLM summarise or draw conclusions from a study, it will usually do a bad job, fabricating statements more along the lines of what an average person would guess if they'd only read the study's title.

➡️ I am not a software engineer. The software engineers seem to have mixed opinions. They say that genAI works as code autocomplete (something that has existed for fifty years, but this new kind has pretty sophisticated lookahead, neat). At least some are saying it can't do principled software engineering, it introduces security flaws, its performance drops off for obscure languages, it overconfidently generates bad code, it plagiarises from code repositories that it doesn't have the rights to...

I could go on.

I'm no longer a domain expert in anything, this many years after my stint in academia. I think I'm halfway to being an expert in a few different areas, though. I deliberately concocted some thoughtful questions at the intersection of those areas, just to see.

For example, I asked about the (obvious) mapping of choose-your-path text adventure books onto mathematical graph structures, which the LLM chatbot identified. I followed up with technical questions about the features of those graphs in context: what would the game be like if they weren't digraphs, would you expect cyclic vs acyclic, would a finite state machine be more appropriate and if so why, etc.
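That mapping is easy to make concrete. Here's a minimal sketch of a gamebook as a directed graph, with a depth-first check for cycles; the `Gamebook` class and the section numbers are my own illustration of the idea, not anything from the chatbot exchange.

```python
# A choose-your-path gamebook as a digraph: nodes are numbered sections,
# directed edges are "turn to section N" choices.
from collections import defaultdict

class Gamebook:
    def __init__(self):
        self.choices = defaultdict(list)  # section -> reachable sections

    def add_choice(self, src, dst):
        self.choices[src].append(dst)

    def is_acyclic(self):
        """True if no sequence of choices can revisit a section.
        Many gamebooks are acyclic (you only move 'forward'), but any
        'return to section 12' loop makes the digraph cyclic."""
        WHITE, GREY, BLACK = 0, 1, 2  # unvisited / in progress / done
        colour = defaultdict(int)

        def visit(node):
            colour[node] = GREY
            for nxt in self.choices[node]:
                if colour[nxt] == GREY:
                    return False  # back edge found: there is a cycle
                if colour[nxt] == WHITE and not visit(nxt):
                    return False
            colour[node] = BLACK
            return True

        return all(visit(n) for n in list(self.choices)
                   if colour[n] == WHITE)

book = Gamebook()
book.add_choice(1, 2)  # "turn to section 2"
book.add_choice(1, 3)
book.add_choice(3, 1)  # a loop back to section 1: cyclic
print(book.is_acyclic())  # → False
```

Whether you'd rather model this as a bare digraph or as a finite state machine mostly depends on whether the reader carries state (inventory, stats) beyond their current section; the plain graph above assumes they don't.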

And lo, the generative AI output was absurdly, hopelessly, and confidently wrong when given questions that needed expertise.

A lot of people with a lot of money would like you to think that genAI chatbots are going to fundamentally change the world by being brilliant at everything. From the sidelines, it doesn't feel like that's going to work out.

Sometimes I read posts from experts along the lines of

"I've noticed it's almost worthless at [my field], but it sounds like it's pretty useful for [other thing]."

But less so lately, maybe?

So I'm left wondering: are people experiencing massive Gell-Mann Amnesia about these chatbots? Or does everybody know that the emperor has no clothes?

(But oh no, we've invested so, so, so very much money into the emperor's finery, and all the wealthiest people at the imperial court agree: pleeeease could you keep squinting to see this amazing new clothing?)

 
