this post was submitted on 18 Nov 2024

21 points (100.0% liked)

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago

MODERATORS

dgerard@awful.systems

Stubsack: weekly thread for sneers not worth an entire post, week ending 24th November 2024 (awful.systems)

submitted 4 days ago by BlueMonday1984@awful.systems to c/techtakes@awful.systems

140 comments

Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned so many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).

Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

Last week's thread

(Semi-obligatory thanks to @dgerard for starting this)

[–] khalid_salad@awful.systems 1 points 23 minutes ago

how come every academic I have worked with has given me some variation of

they already have all of my data, I don't really care about my privacy

i'm in computer science 🙃

[–] YourNetworkIsHaunted@awful.systems 4 points 2 hours ago (1 children)

Never thought I'd die fighting alongside a League of Legends fan.

How about an artist valuer?

Aye. That I could do.

[–] BlueMonday1984@awful.systems 3 points 2 hours ago (1 children)

You just know Netflix's inbox is getting flooded with the absolute worst shit League of Legends players can come up with right now

[–] YourNetworkIsHaunted@awful.systems 4 points 2 hours ago

And having played more LoL than I care to admit in high school, that's some truly vile shit. If only it actually made it through the filters to whoever actually made the relevant choices.

[–] gerikson@awful.systems 9 points 15 hours ago (5 children)

Dude discovers that one LLM model is not entirely shit at chess, spends time and tokens proving that other models are actually also not shit at chess.

The irony? He's comparing it against Stockfish, a computer chess engine. Computers playing chess at a superhuman level is a solved problem. LLMs have now slightly approached that level.

For one, gpt-3.5-turbo-instruct rarely suggests illegal moves,

Writeup https://dynomight.net/more-chess/

HN discussion https://news.ycombinator.com/item?id=42206817

[–] BigMuffin69@awful.systems 4 points 2 hours ago* (last edited 1 hour ago)

I remember several months ago (a year ago?) when the news got out that gpt-3.5-turbo-papillion-grumpalumpgus could play chess at around 1600 Elo. I was skeptical the apparent skill wasn't just a hacked-on patch to stop folks from clowning on their models on xitter. Like if an LLM had just read the instructions of chess and started playing like a competent player, that would be genuinely impressive. But if what happened is they generated 10^12 synthetic games of chess played by stonk fish and used that to train the model, that ain't an emergent ability, that's just brute forcing chess. The fact that larger, open-source models that perform better on other benchmarks still flail at chess is just a glaring red flag that something funky was going on w/ gpt-3.5-turbo-instruct to drive home the "eMeRgEnCe" narrative. I'd bet decent odds if you played with modified rules (knights move in an L shape that's one space longer, you cannot move a pawn 2 moves after it last moved, etc), gpt-3.5 would fuckin suck.

Edit: the author asks "why skill go down tho" on later models. Like isn't it obvious? At that moment in time, chess skills weren't a priority so the trillions of synthetic games weren't included in the training? Like this isn't that big of a mystery...? It's not like other NNs haven't been trained to play chess...

[–] YourNetworkIsHaunted@awful.systems 4 points 2 hours ago

Particularly hilarious how thoroughly they're missing the point. The fact that it suggests illegal moves at all means that no matter how good its openings are, the scaling laws and emergent behaviors haven't magicked up an internal model of the game of chess or even the state of the chess board it's working with. I feel like playing games is a particularly powerful example of this because the game rules provide a very clear structure to model and it's very obvious when that model doesn't exist.

[–] sailor_sega_saturn@awful.systems 5 points 4 hours ago* (last edited 4 hours ago)

Here are the results of these three models against Stockfish—a standard chess AI—on level 1, with a maximum of 0.01 seconds to make each move

I'm not a chess person or familiar with Stockfish so take this with a grain of salt, but I found a few interesting things perusing the code / docs which I think make useful context.

Skill Level

I assume "level" refers to Stockfish's Skill Level option.

If I mathed right, Stockfish roughly estimates Skill Level 1 to be around 1445 Elo (source). However it says "This Elo rating has been calibrated at a time control of 60s+0.6s" so it may be significantly lower here.

Skill Level affects the search depth (appears to use depth of 1 at Skill Level 1). It also enables MultiPV 4 to compute the four best principal variations and randomly pick from them (more randomly at lower skill levels).

Move Time & Hardware

This is all independent of move time. This author used a move time of 10 milliseconds (for Stockfish, no mention of how much time the LLMs got). ... or at least they did if they accounted for the "Move Overhead" option defaulting to 10 milliseconds. If they left that at its default then 10ms - 10ms = 0ms so 🤷‍♀️.

There is also no information about the hardware or number of threads they ran this on, which I feel is important information.

Evaluation Function

After the game was over, I calculated the score after each turn in “centipawns” where a pawn is worth 100 points, and ±1500 indicates a win or loss.

Stockfish's FAQ mentions that they have gone beyond centipawns for evaluating positions, because it's strong enough that material advantage is much less relevant than it used to be. I assume it doesn't really matter at level 1 with ~0 seconds to produce moves though.

Still, since the author has Stockfish handy anyway, it'd be interesting to use it in its non-handicapped form to evaluate who won.
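For anyone wanting to poke at the scoring scheme in that quote, here's a rough sketch of how a per-turn score like that could be produced. This is my guess at the mapping, not the author's code: the name `turn_score`, the clamping of large evaluations, and the handling of mate scores are all assumptions. It only relies on the fact that UCI engines report either a centipawn score or a mate distance.

```python
from typing import Optional

WIN_SCORE = 1500  # the quoted writeup's marker for a won or lost game


def turn_score(centipawns: Optional[int] = None,
               mate_in: Optional[int] = None) -> int:
    """Map an engine evaluation onto the range [-1500, 1500].

    Hypothetical helper: pass either a centipawn score or a mate
    distance (positive if the side being scored is the one mating).
    """
    if mate_in is not None:
        # Assumption: a forced mate counts as a decided game.
        return WIN_SCORE if mate_in > 0 else -WIN_SCORE
    if centipawns is None:
        raise ValueError("need either centipawns or mate_in")
    # Assumption: huge material swings are clamped to the win/loss marker.
    return max(-WIN_SCORE, min(WIN_SCORE, centipawns))
```

So `turn_score(250)` stays at 250 (two and a half pawns up), while `turn_score(4000)` gets clamped to 1500. Whether the writeup clamps or only uses ±1500 for finished games isn't stated, which is part of why a full-strength evaluation would be the more interesting number.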

[–] sc_griffith@awful.systems 10 points 7 hours ago

LLMs sometimes struggle to give legal moves. In these experiments, I try 10 times and if there’s still no legal move, I just pick one at random.

uhh
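To spell out what that quote describes, here's a minimal sketch of the retry-then-random fallback. Everything named here is hypothetical: `pick_move`, the `suggest` callable standing in for the LLM call, and the plain list of legal move strings are my stand-ins, not the writeup's actual code.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def pick_move(suggest: Callable[[], str],
              legal_moves: Sequence[str],
              max_tries: int = 10,
              rng: Optional[random.Random] = None) -> Tuple[str, bool]:
    """Return (move, came_from_model).

    Ask the model up to max_tries times; if it never produces a legal
    move, fall back to a uniformly random legal move.
    """
    rng = rng or random.Random()
    legal = set(legal_moves)
    for _ in range(max_tries):
        move = suggest().strip()
        if move in legal:
            return move, True
    # The model failed every attempt, so the "player" here is the RNG.
    return rng.choice(list(legal_moves)), False
```

Returning the boolean alongside the move is my addition: counting how often it comes back `False` would show how many of a model's moves were really just random legal moves.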

[–] pikesley@mastodon.me.uk 6 points 15 hours ago

@gerikson @BlueMonday1984 the only analysis of computer chess anybody needs https://youtu.be/DpXy041BIlA?si=a1vU3zmOWs8UqlSQ

[–] misterbngo@awful.systems 9 points 20 hours ago

Stack Overflow, now with the sponsored crypto blogspam: "Joining forces: How Web2 and Web3 developers can build together"

I really love the byline here. "Kindest view of one another". Seething rage at the bullshittery these "web3" fuckheads keep producing certainly isn't kind for sure.

[–] swlabr@awful.systems 8 points 1 day ago (1 children)

Strap in and start blasting the Depeche Mode.

[–] sailor_sega_saturn@awful.systems 10 points 21 hours ago

When the reporter entered the confessional, AI Jesus warned, “Do not disclose personal information under any circumstances. Use this service at your own risk.”

Do not worry my child, for everything you say in this hallowed chamber is between you, AI Jesus, and the army of contractors OpenAI hires to evaluate the quality of their LLM output.

[–] self@awful.systems 11 points 1 day ago* (last edited 1 day ago)

a better-thought-out announcement is coming later today, but our WriteFreely instance at gibberish.awful.systems has reached a roughly production-ready state (and you can hack on its frontend by modifying the templates, pages, static, and less directories in this repo and opening a PR)! awful.systems regulars can ask for an account and I'll DM an invite link!

[–] swlabr@awful.systems 6 points 1 day ago

https://xcancel.com/booritney/status/1851717036424233437#m

[–] dgerard@awful.systems 14 points 1 day ago (7 children)

The mask comes off at LWN, as two editors (jake and corbet) dive in to frantically defend the honour of Justine fucking Tunney against multiple people pointing out she's a Nazi who fills her projects with racist dogwhistles

https://lwn.net/Articles/998196/

[–] saucerwizard@awful.systems 1 points 17 minutes ago

Not the only trans NRXer to pull this I’m afraid.

[–] slopjockey@awful.systems 8 points 7 hours ago

[–] slopjockey@awful.systems 6 points 7 hours ago

Is Google lacing their free coffee??? How could a woman with at least one college degree believe that the government is even mechanically capable of dissolving into a throne for Eric Schmidt.

[–] self@awful.systems 13 points 1 day ago (1 children)

fuck me that is some awful fucking moderation. I can’t imagine being so fucking bad at this that I:

dole out a ban for being rude to a fascist
dole out a second ban because somebody in the community did some basic fucking due diligence and found out one of the accounts defending the above fascist has been just a gigantic racist piece of shit elsewhere, surprise
in the process of the above, I create a safe space for a fascist and her friends

but for so many of these people, somehow that’s what moderation is? fucking wild, how the fuck did we get here

[–] YourNetworkIsHaunted@awful.systems 8 points 18 hours ago* (last edited 15 hours ago)

See, you're assuming the goal of moderation is to maintain a healthy social space online. By definition this excludes fascists. It's that old story about how to make sure your punk bar doesn't turn into a nazi punk bar. But what if instead my goal is to keep the peace in my nazi punk bar so that the normies and casuals keep filtering in and out and making me enough money that I can stay in business? Then this strategy makes more sense.

[–] FRACTRANS@awful.systems 6 points 1 day ago (1 children)

https://lwn.net/Articles/998435/ sigh

[–] dgerard@awful.systems 7 points 1 day ago (1 children)

Centrists Don't Fucking Be Like This challenge not achieved yet again

https://social.kernel.org/notice/AoGpED4fw3LSGhxTLU

[–] froztbyte@awful.systems 5 points 1 day ago (1 children)

fwiw this link didn't jump me to a specific reply (if you meant to highlight a particular one)

[–] FRACTRANS@awful.systems 7 points 1 day ago

It didn’t scroll for me either but there’s a reply by this corbet person with a highlighted background which I assume is the one intended to be linked to

[–] Soyweiser@awful.systems 8 points 1 day ago (2 children)

Post by Corbet the editor. "We get it: people wish that we had not highlighted work by this particular author. Had we known more about the person in question, we might have shied away from the topic. But the article is out now, it describes a bit of interesting technology, people have had their say, please let's leave it at that."

So you updated the article to reflect this right? padme.jpg

[–] mii@awful.systems 9 points 11 hours ago (1 children)

Seems like they've actually done this now. There's a preface note now.

This topic was chosen based on the technical merit of the project before we were aware of its author's political views and controversies. Our coverage of technical projects is never an endorsement of the developers' political views. The moderation of comments here is not meant to defend, or defame, anybody, but is in keeping with our longstanding policy against personal attacks. We could certainly have handled both topic selection and moderation better, and will endeavor to do so going forward.

Which is better than nothing, I guess, but still feels like a cheap cop-out.

Side-note: I can actually believe that they didn't know about Justine being a fucking nazi when publishing this, because I remember stumbling across some of her projects and actually being impressed by them, and then I found out what an absolute rabbit hole of weird shit this person is. So I kinda get seeing the portable executables project, thinking, wow, this is actually neat, and running with it.

Not that this is an excuse, because when you write articles for a website that should come with a bit of research about the people and topic you choose to cover and you have a bit more responsibility than someone who's just browsing around, but what do I know.

[–] Soyweiser@awful.systems 5 points 8 hours ago

Well, at least they put down something. More than I expected.

And doing research on people? In this economy?

[–] froztbyte@awful.systems 8 points 1 day ago

so is corbet the same kind of fucker that'll complain "everything is so political nowadays"? it seems like they are

[–] pikesley@mastodon.me.uk 7 points 1 day ago (1 children)

@dgerard @BlueMonday1984 also, and I know this is way beside the point, update the design of your website, motherfuckers

[–] BlueMonday1984@awful.systems 3 points 9 hours ago

I don't run any websites, what are you coming at me for
