this post was submitted on 15 Dec 2023
504 points (97.9% liked)

memes


Community rules

1. Be civil: No trolling, bigotry, or other insulting or annoying behaviour.

2. No politics: This is a non-politics community. For political memes, please go to !politicalmemes@lemmy.world

3. No recent reposts: Check for reposts when posting a meme; you can only repost after 1 month.

4. No bots: No bots without the express approval of the mods or the admins.

5. No spam/ads: No advertisements or spam. This is an instance rule and the only way to live.

founded 1 year ago
AllonzeeLV@lemmy.world 33 points 10 months ago (3 children)

As humanity has found yet another way to pass the buck, it'll be interesting to see the diminishing returns of LLMs as they begin to feed more and more on derivative content made by LLMs.

jacksilver@lemmy.world 25 points 10 months ago (3 children)

It's interesting, because people say they can only get better, but I'm not sure that's true. What happens when most new text data is generated by LLMs, or when we accidentally start labeling images created through diffusion as real? It seems like these models could implode.

FierySpectre@lemmy.world 11 points 10 months ago (2 children)

They actually tested that: they trained a model using only the outputs of the previous generation of the model. It takes fewer iterations of that than you'd think for it to lose quality completely.
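The comment doesn't say which study or setup was used, so here is only a toy stand-in, not that experiment: instead of an LLM, each "generation" is a normal distribution fitted to a small sample drawn from the previous generation's fit. The function name `simulate_collapse` and its parameters are hypothetical choices for illustration. Because every fit sees only a finite sample of the previous model's output, the fitted spread tends to drift downward, so the distribution narrows over generations.

```python
import random
import statistics


def simulate_collapse(generations: int = 500, samples_per_gen: int = 20,
                      seed: int = 0) -> list[float]:
    """Hypothetical toy model of recursive training.

    Generation 0 is the "real" data distribution N(0, 1). Each later
    generation is a normal distribution fitted (maximum likelihood) to a
    finite sample drawn only from the previous generation's fit.
    Returns the fitted standard deviation of every generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        # "Train" only on outputs of the previous model: sample from it...
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_gen)]
        # ...then fit the next model to those samples (population std = MLE).
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)
        history.append(sigma)
    return history


if __name__ == "__main__":
    stds = simulate_collapse()
    for gen in (0, 10, 50, 100, 500):
        print(f"generation {gen:3d}: std = {stds[gen]:.3g}")
```

With small samples per generation the spread shrinks quickly; raising `samples_per_gen` slows the collapse but, in this toy model, does not remove the downward drift.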

jacksilver@lemmy.world 4 points 10 months ago

Do you have any links on that? It's something I've wanted to explore, but I never had the time or money.

WarmSoda@lemm.ee 3 points 10 months ago

They go insane pretty quickly, don't they? As in, it all just becomes a jumble.

Ilovethebomb@lemm.ee 5 points 10 months ago

Given that people quite frequently try to present AI-generated content as real, I'd say this will be a huge problem in the future.

danielbln@lemmy.world 1 point 10 months ago

Microsoft has shown with Phi-2 (https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) that synthetic data generation can be a great source for training data.

livus@kbin.social 13 points 10 months ago

Even before LLMs, back when I was on Reddit, I would sometimes see conversations of 3 or 4 bots replying to each other with scraped content (usually in the personal advice subs) and getting upvotes.

I only noticed because I used to hunt bots as a hobby.

Diplomjodler@feddit.de 6 points 10 months ago

It's cat farts all the way down.