this post was submitted on 22 Dec 2024

Make illegally trained LLMs public domain as punishment (www.theregister.com)

submitted 1 month ago by Joker@sh.itjust.works to c/technology@lemmy.world

It's all made from our data, anyway, so it should be ours to use as we want

[–] gazter@aussie.zone 4 points 1 month ago (5 children)

I've never really delved into the AI copyright debate before, so forgive my ignorance on the matter.

I don't understand how an AI reading a bunch of books and rearranging some of those words into a new story, is different to a human author reading a bunch of books and rearranging those words into a new story.

Most AI art I've seen has been... unique, to say the least. To me, the results tend to be different enough from the art they were trained on to not be a direct ripoff, so personally I don't see the issue.

[–] lemmyaccount01@lemm.ee 4 points 1 month ago* (last edited 1 month ago)

I think the main difference is that one is a human author, and this is how humans function. We cannot unsee or unhear things, but we can be compelled not to use that information if the law requires it: company secrets, inadmissible evidence in jury duty, or the plagiarism laws that already exist. The other is a machine that does not have agency or personhood, and that has this information (created by other people) fed to it for the sole purpose of creating a closed system for a company so its shareholders can make money. This "open for me but not for thee" approach is the main problem people have. You have this proprietary "open ai" that Microsoft invested 25 or so billion in so they can scrape other people's work and charge you money for variations of it. I don't mind abolishing IP or patent laws altogether so everyone can use and improve ChatGPT with whatever they have. But if you yourself are hiding behind IP laws to protect your software while disrespecting other people's copyright, that's what people see as problematic.

[–] ClamDrinker@lemmy.world 3 points 1 month ago* (last edited 1 month ago)

Yes, this is my exact issue with some framing of AI. Creative people love their influences to the point that you can ask them and they will point to the parts where they reference or nod to an influence they partially credit for getting to that result. It's also extremely normal that when you make something new, you brainstorm and analyze any kind of material (copyrighted or not) you can find that gives the same feelings you want to create. As is said, somewhat ironically, to reassure beginning creatives that it's okay to be inspired by others: "Good artists copy, great artists steal."

And often people who are very anti-AI don't see an issue with this, yet it is in essence the same as what the AI does, which is to detach the work from the ideas it was built on and then re-use those ideas. And just like anyone who has the ability to create has the ability to plagiarize or infringe, so does the AI. As human users of AI we must be the ones to ethically guide it away from that (since it can't do that itself), just like you would not copy-paste your influences into a new human-made work.

[–] catloaf@lemm.ee 2 points 1 month ago (1 children)

The for-profit large-scale media blender is the problem. When it's a human writing Harry Potter fan fiction, it's fine. When a company sells a tool for you to write thousands of trash "books" for profit, it's a problem.

[–] ClamDrinker@lemmy.world 2 points 1 month ago

Which is why the technology itself isn't the issue, but those willing to use it in unethical ways. AI is an invaluable tool to those with limited means, unlike big corporations.

[–] patatahooligan@lemmy.world 0 points 1 month ago

I don’t understand how an AI reading a bunch of books and rearranging some of those words into a new story, is different to a human author reading a bunch of books and rearranging those words into a new story.

Ok, let's say for now that these things are actually similar. Is a human legally allowed to "rearrange those words" in any way they want? Not really, because they can't copy stuff like characters or plot structure. Even if the copy is not verbatim, it has to avoid being "too similar". It's not always clear where the threshold is; that will be judged in court. But imagine if you were being sued for copyright infringement because of perceived similarities between your work and another creator's. You go to court and say "Well I torrented the plaintiff's work and studied it with the express intent to copy discernible patterns in it, then sell my work based on those patterns". As long as the similarities are found to be valid, you're very likely to lose. The fact that you've spent years campaigning about how companies can save a lot of money by firing artists and hiring your pattern-replicating service instead probably wouldn't help your case either. Well, that's basically what an honest defense of AI against copyright infringement would be. So the question is, does AI actually produce output too similar to its training data? Well, this is an example of articles you can find on the topic...

So based on the above thoughts, do you feel like we hold AI generation to the same standard as we do human creators? It doesn't seem so to me.

But there are a lot of reasons why we should hold AIs to higher standards instead. Off the top of my head:

AIs have been created exclusively to replicate patterns in existing works. This is not the only function people have. So we don't have to wonder whether similarities between AI inputs and outputs are coincidental. We don't have to worry about whether overbearing restrictions might inadvertently affect some other function.
AIs have no feelings or needs. We don't have to worry about causing direct harm to them and about protecting their rights. Forbidding a person from reading a book just in case they copy elements from it is obviously problematic, but restricting AI's access to copyrighted work is not directly harmful in the same way.

[–] trashgirlfriend@lemmy.world -2 points 1 month ago (1 children)

ML algorithms aren't capable of producing anything new; they can only ever produce a mishmash of copies of existing works.

If you feed a generative model a bunch of physics research papers, it won't create a new valid physics research paper, just a mishmash of jargon from existing papers.

[–] ClamDrinker@lemmy.world 1 points 1 month ago* (last edited 1 month ago) (1 children)

You say it's not capable of producing anything new, but then give an example of it creating something new. You just changed the goal from "new" to "valid" in the next sentence. Looking at AI for "valid" information is silly, but looking at it for "new" information is not. Humans do this kind of information mixing all the time. It's why fan works are a thing, and why most creative people have influences they credit with being where they are today.

Nobody alive today isn't tainted by the ideas they've consumed in copyrighted works, but we do not bat an eye if you use that in a transformative manner. And AI already does this transformation much better than humans do since it's trained on that much more information, diluting the pool of sources, which effectively means less information from a single source is used.

[–] trashgirlfriend@lemmy.world 1 points 1 month ago* (last edited 1 month ago) (1 children)

It doesn't give you new information.

If I write the sentence "Hello, I just got home" and use an algorithm to jumble it into "got Hello, just I home" there's nothing new there.
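For illustration, a minimal Python sketch of that kind of jumbler. The function name `jumble` and its `seed` parameter are made up for this example and are not taken from any real system; it only shuffles word order, which is the simple case described above and not a claim about how generative models work internally.

```python
import random


def jumble(sentence, seed=None):
    """Return the words of `sentence` in a shuffled order."""
    words = sentence.split()
    rng = random.Random(seed)  # a seed makes the shuffle repeatable
    rng.shuffle(words)         # reorders the list in place
    return " ".join(words)


original = "Hello, I just got home"
shuffled = jumble(original, seed=0)
print(shuffled)

# The output contains exactly the same words as the input;
# only their order can differ.
assert sorted(shuffled.split()) == sorted(original.split())
```

Whatever order comes out, no word is added or removed.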

There's no transformation, it's not capable of transformation, it's just a very complicated text jumbler that's supposed to jumble text so that the output is readable by humans.

You're taking investment advice from a parrot that had the entirety of reddit investment meme subreddits beamed into its brain.

[–] ClamDrinker@lemmy.world 3 points 1 month ago* (last edited 1 month ago) (1 children)

That's a very short example, but it is a new arrangement of the existing information. It's not a new valuable arrangement of information, but new nonetheless. And yes, rearrangement is transformation. It's a very low-entropy transformation, but transformation nonetheless. Collages and summaries are, in fact, a thing that humans make too.

Unless you mean "new" as in, something nobody's ever written before, in which case not even you can create new information, since pretty much everything you will ever say or write down can be broken down into pieces that have been spoken or written before, which is not exactly a useful distinction.

There’s no transformation, it’s not capable of transformation, it’s just a very complicated text jumbler that’s supposed to jumble text so that the output is readable by humans.

Saying it doesn't make it true, especially when you follow it up with a self-debunk by saying it transforms the text by jumbling it in specific ways that keep it readable to humans. That requires transformation since, as you just demonstrated, randomly swapping words does not make legible text.

You’re taking investment advice from a parrot that had the entirety of reddit investment meme subreddits beamed into its brain.

???

[–] trashgirlfriend@lemmy.world 0 points 1 month ago

https://youtu.be/2TRmaAxHDDU?si=Vp_xuXzNEOqOOmSA