this post was submitted on 30 Jul 2023
224 points (100.0% liked)
Technology
37735 readers
394 users here now
A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.
Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.
Subcommunities on Beehaw:
This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
This is factually untrue. For example, Stable Diffusion models are in the range of 2GB to 8GB, trained on a set of 5.85 billion images. If it was storing the images, that would allow approximately 1 byte for each image, and there are only 256 possibilities for a single byte. Images are downloaded as part of training the model, but they're eventually "destroyed"; the model doesn't contain them at all, and it doesn't need to refer back to them to generate new images.
It's absolutely true that the training process requires downloading and storing images, but the product of training is a model that doesn't contain any of the original images.
None of that is to say that there is absolutely no valid copyright claim, but it seems like either option is pretty bad, long term. AI generated content is going to put a lot of people out of work and result in a lot of money for a few rich people, based off of the work of others who aren't getting a cut. That's bad.
But the converse, where we say that copyright is maintained even if a work is only stored as weights in a neural network is also pretty bad; you're going to have a very hard time defining that in such a way that it doesn't cover the way humans store information and integrate it to create new art. That's also bad. I'm pretty sure that nobody who creates art wants to have to pay Disney a cut because one time you looked at some images they own.
The best you're likely to do in that situation is say it's ok if a human does it, but not a computer. But that still hits a lot of stumbling blocks around definitions, especially where computers are used to create art constantly. And if we ever hit the point where digital consciousness is possible, that adds a whole host of civil rights issues.
This is the process I was referring to when I said it makes copies. We're on the same page there.
I don't know what the solution to the problem is, and I doubt I'm the right person to propose one. I don't think copyright law applies here, but I'm certainly not arguing that copyright should be expanded to include the statistical matrices used in LLMs and DPMs. I suppose plagiarism law might apply for copying a specific style, but that's not the argument I'm trying to make, either.
The argument I'm trying to make is that while it might be true that artificial minds should have the same rights as human minds, the LLMs and DPMs of today absolutely aren't artificial minds. Allowing them to run amok as if they were is not just unfair to living artists... it could deal irreparable damage to our culture because those LLMs and DPMs of today cannot take up the mantle of the artists they hedge out or pass down their knowledge to the next generation.
Thanks for clarifying. There are a lot of misconceptions about how this technology works, and I think it's worth making sure that everyone in these thorny conversations has the right information.
I completely agree with your larger point about culture; to the best of my knowledge we haven't seen any real ability to innovate, because the current models are built to replicate the form and structure of what they've seen before. They're getting extremely good at combining those elements, but they can't really create anything new without a person involved. There's a risk of significant stagnation if we leave art to the machines, especially since we're already seeing issues with new models including the output of existing models in their training data. I don't know how likely that is; I think it's much more likely that we see these tools used to replace humans for more mundane, "boring" tasks, not really creative work.
And you're absolutely right that these are not artificial minds; the language models remind me of a quote from David Langford in his short story Answering Machine: "It's so very hard to realize something that talks is not intelligent." But we are getting to the point where the question of "how will we know" isn't purely theoretical anymore.