how come every academic I have worked with has given me some variation of
they already have all of my data, I don't really care about my privacy
i'm in computer science 🙃
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
how come every academic I have worked with has given me some variation of
they already have all of my data, I don't really care about my privacy
i'm in computer science 🙃
Never thought I'd die fighting alongside a League of Legends fan.
Aye. That I could do.
You just know Netflix's inbox is getting flooded with the absolute worst shit League of Legends players can come up with right now
And having played more LoL than I care to admit in high school, that's some truly vile shit. If only it actually made it through the filters to whoever actually made the relevant choices.
Dude discovers that one LLM model is not entirely shit at chess, spends time and tokens proving that other models are actually also not shit at chess.
The irony? He's comparing it against Stockfish, a computer chess engine. Computers playing chess at a superhuman level is a solved problem. LLMs have now slightly approached that level.
For one, gpt-3.5-turbo-instruct rarely suggests illegal moves,
Writeup https://dynomight.net/more-chess/
HN discussion https://news.ycombinator.com/item?id=42206817
I remember when several months (a year ago?) when the news got out that gpt-3.5-turbo-papillion-grumpalumpgus could play chess around ~1600 elo. I was skeptical the apparent skill wasn't just a hacked-on patch to stop folks from clowning on their models on xitter. Like if an LLM had just read the instructions of chess and started playing like a competent player, that would be genuinely impressive. But if what happened is they generated 10^12 synthetic games of chess played by stonk fish and used that to train the model- that ain't an emergent ability, that's just brute forcing chess. The fact that larger, open-source models that perform better on other benchmarks, still flail at chess is just a glaring red flag that something funky was going on w/ gpt-3.5-turbo-instruct to drive home the "eMeRgEnCe" narrative. I'd bet decent odds if you played with modified rules, (knights move a one space longer L shape, you cannot move a pawn 2 moves after it last moved, etc), gpt-3.5 would fuckin suck.
Edit: the author asks "why skill go down tho" on later models. Like isn't it obvious? At that moment of time, chess skills weren't a priority so the trillions of synthetic games weren't included in the training? Like this isn't that big of a mystery...? It's not like other NN haven't been trained to play chess...
Particularly hilarious at how thoroughly they're missing the point. The fact that it suggests illegal moves at all means that no matter how good it's openings are the scaling laws and emergent behaviors haven't magicked up an internal model of the game of Chess or even the state of the chess board it's working with. I feel like playing games is a particularly powerful example of this because the game rules provide a very clear structure to model and it's very obvious when that model doesn't exist.
Here are the results of these three models against Stockfish—a standard chess AI—on level 1, with a maximum of 0.01 seconds to make each move
I'm not a Chess person or familiar with Stockfish so take this with a grain of salt, but I found a few interesting things perusing the code / docs which I think makes useful context.
Skill Level
I assume "level" refers to Stockfish's Skill Level option.
If I mathed right, Stockfish roughly estimates Skill Level 1 to be around 1445 ELO (source). However it says "This Elo rating has been calibrated at a time control of 60s+0.6s" so it may be significantly lower here.
Skill Level affects the search depth (appears to use depth of 1 at Skill Level 1). It also enables MultiPV 4 to compute the four best principle variations and randomly pick from them (more randomly at lower skill levels).
Move Time & Hardware
This is all independent of move time. This author used a move time of 10 milliseconds (for stockfish, no mention on how much time the LLMs got). ... or at least they did if they accounted for the "Move Overhead" option defaulting to 10 milliseconds. If they left that at it's default then 10ms - 10ms = 0ms so 🤷♀️.
There is also no information about the hardware or number of threads they ran this one, which I feel is important information.
Evaluation Function
After the game was over, I calculated the score after each turn in “centipawns” where a pawn is worth 100 points, and ±1500 indicates a win or loss.
Stockfish's FAQ mentions that they have gone beyond centipawns for evaluating positions, because it's strong enough that material advantage is much less relevant than it used to be. I assume it doesn't really matter at level 1 with ~0 seconds to produce moves though.
Still since the author has Stockfish handy anyway, it'd be interesting to use it in it's not handicapped form to evaluate who won.
LLMs sometimes struggle to give legal moves. In these experiments, I try 10 times and if there’s still no legal move, I just pick one at random.
uhh
@gerikson @BlueMonday1984 the only analysis of computer chess anybody needs https://youtu.be/DpXy041BIlA?si=a1vU3zmOWs8UqlSQ
Stack overflow now with the sponsored crypto blogspam Joining forces: How Web2 and Web3 developers can build together
I really love the byline here. "Kindest view of one another". Seething rage at the bullshittery these "web3" fuckheads keep producing certainly isn't kind for sure.
Strap in and start blasting the Depeche Mode.
When the reporter entered the confessional, AI Jesus warned, “Do not disclose personal information under any circumstances. Use this service at your own risk.
Do not worry my child, for everything you say in this hallowed chamber is between you, AI Jesus, and the army of contractors OpenAI hires to evaluate the quality of their LLM output.
a better-thought-out announcement is coming later today, but our WriteFreely instance at gibberish.awful.systems has reached a roughly production-ready state (and you can hack on its frontend by modifying the templates
, pages
, static
, and less
directories in this repo and opening a PR)! awful.systems regulars can ask for an account and I'll DM an invite link!
The mask comes off at LWN, as two editors (jake and corbet) dive in to frantically defend the honour of Justine fucking Tunney against multiple people pointing out she's a Nazi who fills her projects with racist dogwhistles
Not the only trans NRXer to pull this I’m afraid.
Is Google lacing their free coffee??? How could a woman with at least one college degree believe that the government is even mechanically capable of dissolving into a throne for Eric Schmidt.
fuck me that is some awful fucking moderation. I can’t imagine being so fucking bad at this that I:
but for so many of these people, somehow that’s what moderation is? fucking wild, how the fuck did we get here
See, you're assuming the goal of moderation is to maintain a healthy social space online. By definition this excludes fascists. It's that old story about how to make sure your punk bar doesn't turn into a nazi punk bar. But what if instead my goal is to keep the peace in my nazi punk bar so that the normies and casuals keep filtering in and out and making me enough money that I can stay in business? Then this strategy makes more sense.
Centrists Don't Fucking Be Like This challenge not achieved yet again
fwiw this link didn't jump me to a specific reply (if you meant to highlight a particular one)
It didn’t scroll for me either but there’s a reply by this corbet person with a highlighted background which I assume is the one intended to be linked to
Post by Corbet the editor. "We get it: people wish that we had not highlighted work by this particular author. Had we known more about the person in question, we might have shied away from the topic. But the article is out now, it describes a bit of interesting technology, people have had their say, please let's leave it at that."
So you updated the article to reflect this right? padme.jpg
Seems like they've actually done this now. There's a preface note now.
This topic was chosen based on the technical merit of the project before we were aware of its author's political views and controversies. Our coverage of technical projects is never an endorsement of the developers' political views. The moderation of comments here is not meant to defend, or defame, anybody, but is in keeping with our longstanding policy against personal attacks. We could certainly have handled both topic selection and moderation better, and will endeavor to do so going forward.
Which is better than nothing, I guess, but still feels like a cheap cop-out.
Side-note: I can actually believe that they didn't know about Justine being a fucking nazi when publishing this, because I remember stumbling across some of her projects and actually being impressed by it, and then I found out what an absolute rabbit hole of weird shit this person is. So I kinda get seeing the portable executables project, thinking, wow, this is actually neat, and running with it.
Not that this is an excuse, because when you write articles for a website that should come with a bit of research about the people and topic you choose to cover and you have a bit more responsibility than someone who's just browsing around, but what do I know.
Well, at least they put down something. More than I expected.
And doing research on people? In this economy?
so is corbet the same kind of fucker that'll complain "everything is so political nowadays"? it seems like they are
@dgerard @BlueMonday1984 also, and I know this is way beside the point, update the design of your website, motherfuckers
I don't run any websites, what are you coming at me for