this post was submitted on 18 Mar 2024
550 points (98.1% liked)

Science Memes


Welcome to c/science_memes @ Mander.xyz!

A place for majestic STEMLORD peacocking, as well as memes about the realities of working in a lab.



Rules

  1. Don't throw mud. Behave like an intellectual and remember the human.
  2. Keep it rooted (on topic).
  3. No spam.
  4. Infographics welcome, get schooled.


top 48 comments
[–] Pyro@programming.dev 172 points 6 months ago* (last edited 6 months ago) (3 children)

GPT doesn't really learn from people; it's the over-correction by OpenAI in the name of "safety" that likely caused this.

[–] lugal@sopuli.xyz 67 points 6 months ago (1 children)

I assumed they reduced capacity to save power due to the high demand

[–] MalReynolds@slrpnk.net 49 points 6 months ago (1 children)

This. They could obviously reset to original performance (what, they don't have backups?), it's just more cost-efficient to have crappier answers. Yay, turbo AI enshittification...

[–] CommanderCloon@lemmy.ml 40 points 6 months ago (1 children)

Well, they probably did dial down the performance a bit, but censorship is known to nuke LLMs' performance as well

[–] MalReynolds@slrpnk.net 11 points 6 months ago

True, but it's hard to separate, I guess.

[–] rtxn@lemmy.world 46 points 6 months ago* (last edited 6 months ago) (1 children)

Sounds good, let's put it in charge of cars, bombs, and nuclear power plants!

[–] OpenStars@startrek.website 10 points 6 months ago (2 children)

Even getting 2+2=2 98% of the time is good enough for that. :-P

spoiler: (wait, 2+2 is what now?)

[–] lugal@sopuli.xyz 18 points 6 months ago (1 children)

2+2 isn't 5 anymore? Literally 1985

[–] OpenStars@startrek.website 9 points 6 months ago

Stop trying to tell the computer what to do - it should be free to act however it wants to! :-P

[–] FiniteBanjo@lemmy.today 2 points 6 months ago (1 children)

It used to get 98%, now it only gets 2%.

2% is not good enough.

[–] OpenStars@startrek.website 2 points 6 months ago

I mean... some might argue that even 98% wasn't enough!? :-D

What are people supposed to do - ask every question 3 times and take the best 2 out of 3, like this was kindergarten? (and that is the best-case scenario, where the errors are entirely evenly distributed across the entire problem space, which is the least likely model there - much more often some problems would be wrong 100% of the time, while others may be correct more like 99% of the time, but importantly you will never know in advance which is which)
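(For what it's worth, even under that rosiest assumption - independent, evenly distributed errors at 98% per attempt - the best-two-out-of-three arithmetic works out like this; a toy calculation, not anything from the study:)

```python
p = 0.98  # assumed per-question accuracy, with independent errors (unrealistic)
best_two_of_three = p**3 + 3 * p**2 * (1 - p)
print(round(best_two_of_three, 4))  # 0.9988 -- and only if errors really are independent
```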

Actually that touches on a real issue: some schools teach the model of "upholding standards" where like the kids actually have to know stuff (& like, junk, yeah totally) - whereas another, competing model is that if they just learn something, anything at all, during the year, that is good enough to pass them and make them someone else's problem down the line (it's a good thing that professionals don't need to uh... "uphold standards", right? anyway, the important thing there is that the school still receives the federal funding in the latter case but not the former, and I am sure we can all agree that when it comes to the next generation of our children, the profits for the school administrators are all that matters... right? /s)

All of this came up when Trump appointed one of his top donors, Betsy DeVos, to be in charge of all edumacashium in America, and she had literally never set foot inside a public school in her entire lifetime. I am not kidding you, watch the Barbara Walters special to hear it from her own mouth. Appropriately (somehow), she had never even so much as heard of either of these two main competing models. Yet she still stepped up and acknowledged that somehow she, as an extremely wealthy (read: successful) white woman, could do that task better than literally all of the educators in the entire nation - plus all those with PhDs in education too, ~~jeering~~ cheering her on from the sidelines.

Anyway, why we should expect "correctness" from an artificial intelligence, when we cannot seem to find it anywhere among humans either, is beyond me. These were marketing gimmicks to begin with, then we all rushed to ask it to save us from the enshittification of the internet. It was never going to happen - not this soon, not this easily, not this painlessly. Results take real effort.

[–] Redward@yiffit.net 16 points 6 months ago

Just for the fun of it, I argued with ChatGPT, saying it's not really a self-learning AI. 3.5 agreed that it's not a fully functional AI and has limited powers. 4.0, on the other hand, was very adamant about being a fully fledged AI.

[–] AnUnusualRelic@lemmy.world 95 points 6 months ago (2 children)

Amazing, it's getting closer to human intelligence all the time!

[–] MotoAsh@lemmy.world 13 points 6 months ago

The more I talk to people, the more I realize how low that bar is. If AI doesn't take over soon, we'll kill ourselves anyway.

[–] Dasus@lemmy.world 1 points 6 months ago

I mean, I could argue that it learned not to piss off stupid people by showing them math the stoopids didn't understand.

[–] Limeey@lemmy.world 75 points 6 months ago (1 children)

It all comes down to the fact that LLMs are not AGI - they have no clue what they’re saying or why or to whom. They have no concept of “context” and as a result have no ability to “know” if they’re giving right info or just hallucinating.

[–] Benaaasaaas@lemmy.world 1 points 6 months ago

Hey, but if Sam says it might be AGI he might get a trillion dollars so shut it /s

[–] UnRelatedBurner@sh.itjust.works 30 points 6 months ago (1 children)

Kind of a clickbait title

"In March, GPT-4 correctly identified the number 17077 as a prime number in 97.6% of the cases. Surprisingly, just three months later, this accuracy plunged dramatically to a mere 2.4%. Conversely, the GPT-3.5 model showed contrasting results. The March version only managed to answer the same question correctly 7.4% of the time, while the June version exhibited a remarkable improvement, achieving an 86.8% accuracy rate."

source: https://techstartups.com/2023/07/20/chatgpts-accuracy-in-solving-basic-math-declined-drastically-dropping-from-98-to-2-within-a-few-months-study-finds/
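(For context, the question the study kept re-asking is the kind of thing a few lines of ordinary code settle deterministically; a quick sketch, not from the study itself:)

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

print(is_prime(17077))  # True -- 17077 is prime
```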

[–] angrymouse@lemmy.world 46 points 6 months ago (4 children)

Not everything is click bait. Your explanation is great, but the tittle is not lying, it's just a simplification; titles could not contain every detail of the news, they are still tittles, and what the tittle says can be confirmed in your explanation. The only thing I could've done differently is specify that it was a GPT-4 issue.

Click bait would be "chat gpt is dying" or so.

[–] andrewta@lemmy.world 9 points 6 months ago (1 children)

I think that's title not tittle

[–] Krauerking@lemy.lol 5 points 6 months ago

I said tittle out loud for each tittle in that comment. I think they got it right cause it was very titillating.

[–] TrickDacy@lemmy.world 9 points 6 months ago* (last edited 6 months ago)

Mmmmm, titt les

[–] overcast5348@lemmy.world 8 points 6 months ago (1 children)

Tittles are the little dots above i and j, that's why you weren't autocorrected. You're looking for "title" though.

[–] angrymouse@lemmy.world 4 points 6 months ago

Thanks for pointing that out, I actually learned something.

[–] A_Very_Big_Fan@lemmy.world 2 points 6 months ago

Oversimplified to the point of lying, you could say

[–] BennyHill@lemmy.ml 30 points 6 months ago

ChatGPT went from high school student to boomer brain in record time.

[–] Hotzilla@sopuli.xyz 22 points 6 months ago* (last edited 6 months ago)

I have seen the same thing: GPT-4 was originally able to handle more complex coding tasks, but GPT-4 Turbo is not able to do it anymore. I have a creative coding test that I have tried many LLMs with, and only the original GPT-4 was able to solve it. The current one fails miserably at it.

[–] helpImTrappedOnline@lemmy.world 16 points 6 months ago

Perhaps this AI thing is just a sham and there are tiny gnomes in the servers answering all the questions as fast as they can. Unfortunately, there are not enough qualified tiny gnomes to handle the increased workload. They have begun to outsource to the leprechauns who run the random text generators.

Luckily the artistic hypersonic orcs seem to be doing fine...for the most part

[–] Mikufan@ani.social 15 points 6 months ago (1 children)

Yeah, it now shows the mathematics as a Python script so you can see where it goes wrong.
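(Roughly like this - a made-up illustration of the kind of worked-out script it prints, not an actual ChatGPT transcript:)

```python
# "A train covers 120 km in 1.5 hours. What is its average speed?"
distance_km = 120
time_h = 1.5

speed_kmh = distance_km / time_h  # every intermediate value is visible
print(f"Average speed: {speed_kmh} km/h")  # Average speed: 80.0 km/h
```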

[–] OpenStars@startrek.website 11 points 6 months ago (2 children)

How ironic... people now need to learn a computer language in order to understand the computer? (instead of so that the computer can understand people)

[–] Mikufan@ani.social 8 points 6 months ago

Eh, it's not that hard to understand those scripts, it's basically math...

But yes.

[–] Wanderer@lemm.ee 1 points 6 months ago (1 children)

I get how ChatGPT works [really I don't], but what I don't get is why they don't put add-ons into it.

Like: is this a math question? Okay, it goes to the Wolfram Alpha system; otherwise it goes to the LLM.
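Something like this, maybe (a toy sketch - the math backend and the LLM call here are hypothetical stand-ins, not anything OpenAI actually exposes):

```python
import re

def looks_like_math(question: str) -> bool:
    # Crude heuristic: only digits, operators, and whitespace => formal math.
    return bool(re.fullmatch(r"[\d\s.+\-*/^()=?]+", question))

def solve_with_math_engine(expr: str) -> str:
    # Stand-in for a Wolfram-Alpha-style backend; here just bare arithmetic.
    cleaned = expr.replace("=", "").replace("?", "").replace("^", "**")
    return str(eval(cleaned, {"__builtins__": {}}, {}))

def ask_llm(question: str) -> str:
    # Stand-in for the actual LLM call.
    return f"[LLM answer to: {question!r}]"

def answer(question: str) -> str:
    return solve_with_math_engine(question) if looks_like_math(question) else ask_llm(question)

print(answer("2+2=?"))                 # routed to the math engine -> 4
print(answer("two plus two equals?"))  # contains words, so it goes to the LLM
```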

[–] OpenStars@startrek.website 2 points 6 months ago

That would only solve the purely math parts. So it would solve "2+2=?", but it would not solve "two plus two equals?".

And even if it did, don't miss the fact that this is an indicator of more foundational problems that lie beneath. Like if you ever wake up and your clock is wrong, you might want to find out why - perhaps its battery is low, and if so, it will never get any better as long as you and it live, until you deal with that. Or maybe you had a power outage, and a bunch of things could have gone wrong in relation to that (is your pilot light out, are you now leaking gas everywhere?)

Here's a funny popular-culture take on that: https://www.youtube.com/watch?v=VemLkVbsmz0.

[–] shiroininja@lemmy.world 15 points 6 months ago (2 children)

Originally, it was people answering the questions. Now it’s the actual tech doing it Lmao

[–] Omega_Haxors@lemmy.ml 6 points 6 months ago* (last edited 6 months ago) (1 children)

AI fudging is notoriously common. Just ask anyone who has lived in the third world what work was like in their country and they'll light up with stories of how many times they were approached by big tech companies to roleplay as an AI.

[–] NigelFrobisher@aussie.zone 3 points 6 months ago (1 children)

A colleague of mine worked for an AI firm a few years back. The AI was a big room of older women with keyboards.

[–] Omega_Haxors@lemmy.ml 2 points 6 months ago

PLEASE IGNORE THE NANS BEHIND THE CURTAIN

[–] TurtleJoe@lemmy.world 2 points 6 months ago

It's often still people in developing countries answering the questions.

[–] Omega_Haxors@lemmy.ml 8 points 6 months ago (1 children)

This is a result of what is known as oversampling. When you zoom in really close and make one part of a wave look good, it makes the rest of the wave go crazy. That's what you're seeing: the team at OpenAI tried super hard to make a good first impression and nailed that, but once some time passed, things quickly started to fall apart.

[–] someguy3@lemmy.ca 1 points 6 months ago (1 children)

So they made the math good, and now that they're trying to make the rest good it's screwing up the math?

[–] Omega_Haxors@lemmy.ml 1 points 6 months ago

It's more that they focused so hard on making a good first impression that they gave no consideration to what would happen long-term.

[–] Jumi@lemmy.world 4 points 6 months ago

The AI feels good, much slower than before

human want make thing dumb

[–] EarMaster@lemmy.world 2 points 6 months ago (2 children)

I am wondering why it adds up to exactly 100%. There has to have been some creative data handling with these numbers.

[–] MeatPilot@lemmy.world 5 points 6 months ago* (last edited 6 months ago)

Maybe the article/stats were generated using another LLM?

[–] Gabu@lemmy.world 1 points 6 months ago* (last edited 6 months ago) (1 children)

People like you are why Mt. Everest had two feet added to its actual height so as to not seem too perfect.

[–] EarMaster@lemmy.world 1 points 6 months ago (1 children)

No I'm not. Why would I use feet to measure a mountain's height?

[–] Gabu@lemmy.world 1 points 6 months ago

Peak XV (measured in feet) was calculated to be exactly 29,000 ft (8,839.2 m) high, but was publicly declared to be 29,002 ft (8,839.8 m) in order to avoid the impression that an exact height of 29,000 feet (8,839.2 m) was nothing more than a rounded estimate.

https://en.wikipedia.org/wiki/Mount_Everest#surveys