This post was submitted on 04 Jul 2023

Google Says It'll Scrape Everything You Post Online for AI (gizmodo.com)

submitted 1 year ago by L4s@lemmy.world to c/technology@lemmy.world

An update to Google's privacy policy suggests that the entire public internet is fair game for its AI projects.

Gsus4@lemmy.one 26 points 1 year ago

I don't see why this is a problem (apart from supposedly private data like email). It's not just Google that can do this: all this data is available to everyone who can use it to benefit. If you want to make Google pay for a publicly available good, tax them accordingly. That's the point of taxes: if you are successful enough to take advantage in any way of a country's public roads, education system, access to a labour market and a functioning society generally, taxing the massive profits from using that system is fair. Enclosing everything and holding the content we contributed hostage is not.

drmoose@lemmy.world 10 points 1 year ago

Yeah public data is public. If anyone doesn't want their shitty comments or whatever to be used for AI training then put it behind a login or something.

CIA_chatbot@lemmy.world 1 point 1 year ago

Except that’s not true: publicly posting content does not trump copyright protection. Google using content for AI purposes is almost certainly a copyright issue. I may post content for human consumption, but that does not mean I allow it to be used by a private corporation for profit.

drmoose@lemmy.world 1 point 1 year ago

Can we please not empower copyright to such a silly extent? Copyright is already utter garbage and some want to extend it to tweets, comments and whatnot.

Also, AI is copying the same way we copy everything: by learning. So we shouldn't be allowed to quote and refer to stuff we learn about online? This argument makes no sense. Down with copyright.

CIA_chatbot@lemmy.world 2 points 1 year ago

That’s not empowering copyright. That’s literally how it works. Copyright is automatic, and if you do not have a prior agreement assigning copyright it is awarded to the person who created said content, be it a tweet, blog post, etc.

If I make a blog post and Google scrapes it and uses that data for profit, that’s copyright infringement, unless they can prove fair use, which has narrow definitions that training an AI for profit definitely doesn’t fall under.

drmoose@lemmy.world 0 points 1 year ago

I dread the reality you're describing, where every bit of information is proprietary. I think the world is a better place with free information. What you're describing sounds a whole lot like throwing the baby out with the bathwater just because big tech corporations are "bad".

CIA_chatbot@lemmy.world 1 point 1 year ago

I mean, you can dread it all you want, because that is LITERALLY how it works today. Google, OpenAI and Microsoft already face multiple lawsuits for stealing people’s copyrighted work to train their LLMs.

Copyright is assigned automatically. If I make a blog post, that is automatically my copyrighted material. As the creator, I get to choose how it’s used, not Google.

If I took some proprietary Google code and used it without permission you know damn well they would sue my ass into oblivion. Copyright has to protect the small as well as the giant.

drmoose@lemmy.world 1 point 1 year ago (last edited 1 year ago)

I don't think you understand.

Let's imagine everything is copyrighted. Who will be able to create LLMs then? Google/Meta, who can afford to literally hire thousands of people at below minimum wage to create annotations, or smaller companies and free projects? You are literally empowering the thing you're complaining about.

Public data is public, and that's good for the general balance. It removes the moats.

CIA_chatbot@lemmy.world 1 point 1 year ago

I don’t think you understand? You’re talking about some “information must be free Star Trek future” that doesn’t exist. I’m talking about the exact legal framework that exists today.

If I write a short story somewhere, why the fuck should someone be able to profit off of it because they pointed a bot at my site? How do you prevent giant corps from eventually squashing and owning everything?

Just because something is publicly accessible doesn’t mean it’s public. I would maybe start here:

https://www.copyright.gov/help/faq/faq-general.html#protect

If Google or Meta wants to make an LLM off my content, they can fucking have the decency to ask, or pound sand. Adding a clause to some policy somewhere doesn’t auto-magically remove my legal rights.

drmoose@lemmy.world 1 point 1 year ago

If I write a short story somewhere,

It's a two-way street: if you want to benefit from the free flow of information (your story being public), you should also bear the costs. I feel we've reached the end of this thread, so let's just agree to disagree. Maybe my distaste for copyrighting information is too great here for you to convince me otherwise :)

boonhet@lemm.ee 4 points 1 year ago

If you want to make Google pay for a publicly available good, tax them accordingly.

Tax them where? In the US? But a lot of the content they scrape would be European. So does EU get to tax them for content scraped from EU users and US for content scraped from US users? Actually, how DO we define the locality of online content? By host server? Site owning company/person's legal location? Content poster's location?

Much as I'd love to see Google pay more taxes, I'm not sure how this would play out.

Cordoro@lemmy.world 2 points 1 year ago

I was with the post until the taxes. That came out of nowhere.

Gsus4@lemmy.one 1 point 1 year ago (last edited 1 year ago)

It did not come out of nowhere; it's right in there. I mentioned taxes because using a public good/service is only freeloading (as people imply about Google scraping public data, or as Elon does when he talks about data pillaging) if you don't pay for its upkeep.

SamB@lemmy.world 2 points 1 year ago

As long as they don’t present that data as their own, I am fine with it. But wait, that’s exactly what they’re doing... I have a vision of a thousand lawsuits shoved down the throat of the mighty Alphabet.