this post was submitted on 06 Feb 2025
659 points (99.7% liked)

Technology

[–] meowmeowbeanz@sh.itjust.works 9 points 1 hour ago

Oh look, another tech giant treating open knowledge initiatives like their personal data buffet. Let me translate this corporate nonsense for you:

Meta: "We need training data for our AI!" Also Meta: Let's leech 81.7TB from a community project without contributing anything back.

The absolute audacity of downloading terabytes through torrents while their employees were internally admitting it was "legally problematic". And the best part? They couldn't even be bothered to seed properly - just grab and go, classic corporate behavior.

Remember when companies actually contributed to open source instead of just parasitically consuming it? But no, they'd rather burden volunteer-run projects with massive bandwidth costs while their lawyers probably bill more per hour than these projects' entire monthly budget.

Pro tip Meta: If you're going to pilfer knowledge from the commons, at least seed back properly. Your "move fast and break things" motto isn't supposed to apply to community archives.

[–] ad_on_is@lemm.ee 4 points 4 hours ago

If buying ain't owning, then downloading...

oh wait, that's our slogan

[–] bungalowtill@lemmy.dbzer0.com 14 points 7 hours ago

The Pirates of the Crown

[–] drascus@sh.itjust.works 45 points 10 hours ago

Just gotta love these big tech companies and their bullshit double standards.

[–] njordomir@lemmy.world 46 points 20 hours ago (2 children)

If someone were to acquire a few hundred gigs of books and feed them to something like paperless-ngx, would it work as a sort of Google for books? Are there any software projects better suited to this that understand synonyms and perhaps some context? I guess AI search, but guided, for the intermediate user.

Google is so bad lately. Basically every result is official sponsored corporate biased BS. It would be nice to be able to instantly query a bunch of ebooks.

[–] rumba@lemmy.zip 6 points 4 hours ago

GPT, Meta, Deepseek and Google have probably all been trained on the data.

The problem is, training on the data and actually training for knowledge of the data are VERY different things.

https://www.youtube.com/watch?v=_GkHZQYFOGM

[–] werefreeatlast@lemmy.world 4 points 17 hours ago

Yes. This exactly.

[–] SpikesOtherDog@ani.social 73 points 23 hours ago (2 children)

https://phys.org/news/2010-11-million-dollar-verdict-music-piracy-case.html

In all fairness, Meta should be assessed a fee of $250k for EACH pirated work.

This would amount to forfeiting all assets to doge.

[–] Grunt4019@lemm.ee 2 points 3 hours ago (1 children)

Assuming 2.6 MB per book.

81 TB would be 32,667,175 books.

At $250k per book that would come out to:

$8.17 trillion.
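For anyone who wants to check the math, here is a rough sketch of the estimate above. It assumes binary units (1 TB = 1024^4 bytes, 1 MB = 1024^2 bytes), since that is what reproduces the 32,667,175 figure; the 2.6 MB average and the $250k fee are just the numbers from this thread, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate using the figures quoted in the thread.
# Assumption: binary units, which is what yields 32,667,175 books.
TOTAL_TB = 81                 # size of the download, as quoted above
AVG_BOOK_MB = 2.6             # assumed average size of one book
FEE_PER_WORK_USD = 250_000    # hypothetical per-work fee from the parent comment


def estimate_books(total_tb: float, avg_book_mb: float) -> int:
    """Number of whole books that fit in total_tb at avg_book_mb each."""
    total_bytes = total_tb * 1024**4
    book_bytes = avg_book_mb * 1024**2
    return int(total_bytes / book_bytes)


def estimate_fee(books: int, fee_per_work: int) -> int:
    """Total fee in dollars if every book is charged fee_per_work."""
    return books * fee_per_work


books = estimate_books(TOTAL_TB, AVG_BOOK_MB)
total = estimate_fee(books, FEE_PER_WORK_USD)
print(f"{books:,} books")
print(f"${total / 1e12:.2f} trillion")
```

With decimal units instead (1 TB = 10^12 bytes), the count drops to roughly 31.2 million books and the total to about $7.8 trillion, so the ballpark holds either way.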

[–] nyan@lemmy.cafe 6 points 8 hours ago (1 children)

They might end up having to pay more money than exists on the planet at that rate.

[–] SpikesOtherDog@ani.social 5 points 4 hours ago* (last edited 4 hours ago)

Good

Edit - See Gary Bowser

[–] LEVI@feddit.org 146 points 1 day ago (1 children)

Anna's Archive: Mirror our database, help us preserve Humanity's knowledge

Facebook: I'll just torrent what I need, see yaa

These big tech monopolies are a curse to humanity..

[–] mox@lemmy.sdf.org 47 points 23 hours ago (1 children)

Facebook: I’ll just ~~torrent what I need~~ burden your underfunded project and volunteers with over 81 TB of bandwidth costs without contributing anything in return, see yaa

FTFY

[–] C126@sh.itjust.works 10 points 9 hours ago (1 children)

Yeah the least they could do is seed forever.

[–] Knock_Knock_Lemmy_In@lemmy.world 1 points 3 hours ago* (last edited 3 hours ago)

Agreed. Seed forever and release the AI weights and model. That would be fair payment.

The entirety of Anna's Archive would be an excellent benchmark training set, particularly as a cleaned, processed dataset.

[–] Telorand@reddthat.com 287 points 1 day ago (5 children)

Do it, Judge. Protect the wealthy and say it's not piracy. Do it.

[–] roofuskit@lemmy.world 3 points 4 hours ago

He already referred them to the Justice Department. This is a civil case, so he cannot sentence them criminally.

[–] Lexam@lemmy.world 146 points 1 day ago (3 children)

It's not piracy. For corporations. For you and me, believe it or not, straight to jail!

[–] shittydwarf@lemmy.dbzer0.com 193 points 1 day ago (1 children)
[–] personalthought381@lemm.ee 21 points 17 hours ago

Rules for thee, not for me

[–] akilou@sh.itjust.works 158 points 1 day ago (7 children)

But did they keep a good ratio though?

[–] empireOfLove2@lemmy.dbzer0.com 135 points 1 day ago (4 children)

1000% guarantee those mf's had their upload choked to 20kbps

[–] Tregetour@lemdro.id 3 points 11 hours ago* (last edited 11 hours ago)

20 was the lead engineer 'mishearing' Zuck after he said 2.

[–] SnotFlickerman@lemmy.blahaj.zone 112 points 1 day ago (3 children)

“Meta downloaded millions of pirated books from LibGen through the bit torrent protocol using a platform called LibTorrent. Internally, Meta acknowledged that using this protocol was legally problematic,” the third amended complaint noted.

Just want to make clear that Libtorrent is just the torrent application they were using, while the Libgen torrents are easily accessible on the libgen site, not through a separate "platform" called Libtorrent.

I wish people like us could help with these complaints, because then they might actually get the details more accurate to reality.

https://libgen.is/repository_torrent/

https://www.libtorrent.org/

The amended complaint makes it sound like Libtorrent is a private tracker website when it's just the application they were using on the publicly available torrents.

[–] jaybone@lemmy.world 32 points 1 day ago (1 children)
[–] misk@sopuli.xyz 45 points 1 day ago (7 children)

It’s a popular search engine that works with shadow libraries like Sci-Hub or Library Genesis. Shadow libraries host copies of works of literature and science. Their legal status is murky at best, but it’s incredibly impractical to prosecute those accessing them.

[–] MonkderVierte@lemmy.ml 3 points 10 hours ago

it’s incredibly impractical to prosecute those accessing them.

Always was. If you're serious, prosecute those hosting it.
