come see all the popular super-duper-autocomplete systems failing hard at really simple reasoning questions and babbling nonsense from latent space! (arxiv.org)

submitted 5 months ago by dgerard@awful.systems to c/techtakes@awful.systems

[–] 200fifty@awful.systems 31 points 5 months ago* (last edited 5 months ago)

This is my favorite LLM response from the paper I think:

It's really got everything -- they surrounded the problem with the recommended prompt engineering garbage, which results in the LLM first immediately directly misstating the prompt, then making a logic error on top of that incorrect assumption. Then when it tries to consider alternate possibilities it devolves into some kind of corporate-speak nonsense about 'inclusive language', misinterprets the phrase 'inclusive language', gets distracted and starts talking about gender identity, then makes another reasoning error on top of that! (Three to five? What? Why?)

And then as the icing on the cake, it goes back to its initial faulty restatement of the problem and confidently plonks that down as the correct answer surrounded by a bunch of irrelevant waffle that doesn't even relate to the question but sounds superficially thoughtful. (It doesn't matter how many of her nb siblings might identify as sisters because we already know exactly how many sisters she has! Their precise gender identity completely doesn't matter!)

Truly a perfect storm of AI nonsense.

[–] gerikson@awful.systems 24 points 5 months ago

But ChatGPT is like a really bright high-schooler, according to the AGI investment firm bro with the lin log chart!

[–] kbal@fedia.io 23 points 5 months ago (4 children)

This is why it's best to never admit that you're wrong on the Internet. If we start doing that the LLMs trained on our comments might learn to do the same, and then where would we be?

[–] MotoAsh@lemmy.world 13 points 5 months ago

It's OK, the pride of stupid people will guarantee there is always a large swathe of confidently wrong answers out there even if the "AI"s don't hallucinate them.

That's why I knew LLMs alone would never cut it. They do ZERO logic, and even humans who DO execute logic still get it horribly wrong a lot of the time. It takes more than the equivalent of a dreamer's illogical dreamscape of relationships to produce logic, and LLMs are a far cry short of a dreamer...

[–] Soyweiser@awful.systems 12 points 5 months ago

Soon saying 'GPT, write me a speech' will end up giving you a speech that ends with "please like and subscribe, and don't forget to click the bell"

[–] froztbyte@awful.systems 8 points 5 months ago

Nah it’s all good. You can trip the dumb pieces of shit up with simple math - imagine what you could do with double negatives. And that’s presuming you stick to a single language…

the copypasta machine is just real bad in many ways, and it doesn’t take much to shove it over the edge[0]

[0] - reducing the surface area of this is one of oai’s primary actions/tasks, but it’s a losing battle: there’s always more humanity than they’ll have gotten around to coding synth rules for

[–] prex@aussie.zone 3 points 5 months ago

That does it: I'm boycotting /s

[–] swlabr@awful.systems 10 points 5 months ago* (last edited 5 months ago) (1 children)

Let them cook bro, another few billion dollars, maybe a few 10↑↑10 watt-hours, see what it says then

[–] sinedpick@awful.systems 4 points 5 months ago (1 children)

more ooms! MORE OOMS!

[–] blakestacey@awful.systems 7 points 5 months ago

OOM-pa lOOM-pa dOOM-pa dee doo / I've got a waste of carbon for you

[–] sinedpick@awful.systems 9 points 5 months ago (1 children)

This all but confirms that all those benchmark evals are in the training set, right?

[–] dgerard@awful.systems 13 points 5 months ago

Some forms are - but many are not! The fun stuff is in Appendix 2, the responses.