Want to wade into the snowy surf of the abyss? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid.
Welcome to the Stubsack, your first port of call for learning fresh Awful you'll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut'n'paste it into its own post; there's no quota for posting and the bar really isn't that high.
The post-Xitter web has spawned so many "esoteric" right-wing freaks, but there's no appropriate sneer-space for them. I'm talking redscare-ish, reality-challenged "culture critics" who write about everything but understand nothing. I'm talking about reply-guys who make the same 6 tweets about the same 3 subjects. They're inescapable at this point, yet I don't see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn't be surgeons because they didn't believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up, and if I can't escape them, I would love to sneer at them.
(Credit and/or blame to David Gerard for starting this. A lot of people didn't survive January, but at least we did. This also ended up going up on my account's cake day, so that's cool.)


I liked this takedown of METR's task-horizon "research": https://arachnemag.substack.com/p/the-metr-graph-is-hot-garbage
On top of all the complaints I already knew of and had, METR's methodology for human-baselining the tasks was even worse than I realized.
And you know… I actually kind of respect METR relative to a lot of boosters and doomers for at least attempting hard numbers instead of just vibes and anecdotes (METR is the group that did the study showing LLMs actually reduced coders' productivity even as they made them think it had increased). But the standard for quantifying LLM performance in practical terms is absurdly low.
I thought they were yet another rationalist offshoot.
They absolutely are. I'm just giving them a tiny bit of credit for at least attempting academic research on LLM performance. But only a tiny bit: as the blog post I linked discusses, their methodology is really sloppy, well below the standard of most academic research, and wouldn't get through peer review at most decent journals.
that one METR graph was also a big part of the AI 2027 "prediction"
It was basically the only "empirical" data (scare quotes well earned) they actually used in their "model". And even then, they decided exponential improvement wasn't good enough, so they plugged it into a hyper-exponential model that shoots off to infinity within just a few years regardless of the inputs.
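For anyone curious why that kind of hyper-exponential blows up no matter what you feed it: if each successive doubling of the "task horizon" is assumed to take a fixed fraction less time than the last, the doubling times form a geometric series with a finite sum, so the curve hits a vertical asymptote after a fixed multiple of the first doubling time. A quick sketch (the shrink factor and starting values here are my own illustrative picks, not AI 2027's exact parameters):

```python
# Why a "superexponential" with shrinking doubling times diverges in
# finite time: the doubling times 1, s, s^2, ... sum to 1/(1-s).

def singularity_years(first_doubling_years: float, shrink: float) -> float:
    """Years until the horizon diverges, assuming each doubling takes
    `shrink` times as long as the previous one (0 < shrink < 1)."""
    # first * (1 + shrink + shrink**2 + ...) = first / (1 - shrink)
    return first_doubling_years / (1.0 - shrink)

# Whatever the first doubling time, divergence is only 10x it away
# when each doubling is assumed to be 10% faster than the last:
for first in (0.5, 1.0, 2.0):
    print(first, "->", round(singularity_years(first, 0.9), 2), "years")
```

With a 10% shrink per doubling, the singularity always lands at exactly 10× the first doubling time, so any plausible starting point puts it within a handful of years; that is the whole trick.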
yeah lmfao it was bad. I thoroughly enjoyed titotal's takedown of that graph. I can't believe the documentary versions of that paper on YouTube have millions of views, with people eating it up
Those comment sections are gonna be a joy when 2027 and 2028 roll around