Shortly after their first public release, text-to-image AI models such as Stable Diffusion and Midjourney became the subject of debate about the ethics of their use. Anton Troinikov is the co-founder of Chroma, a startup working to improve AI interpretability, that is, to make what goes on under the hood of AI systems less mysterious. As AI art generators took off, Troinikov and others at Chroma saw an opportunity to build a tool that would make it easier to resolve some of the burning attribution questions that were emerging. Troinikov answered five quick questions about the project, called Stable Attribution, and about how he thinks artists and AI engineers can stop talking past each other about AI-generated art.
What were your first impressions of AI art generators when they were released?
Anton Troinikov: I started paying attention to the AI art discourse after the release of Stable Diffusion, when a lot more people got access to the model. I quickly realized that people on both sides of the conversation were talking past each other. I wanted to see if there was a technical solution to the problem, one that would keep technologists and creatives from being antagonistic toward each other.
What is your goal with Stable Attribution?
Troinikov: I wanted to demonstrate that this problem is not technically impossible to solve. After talking to a lot of people, especially on the creative side but also on the technology and research side, we decided that the right thing to do was to just go ahead and see what kind of reaction we got when we launched it.
What is the short version of how Stable Attribution works?
Troinikov: Stable Diffusion belongs to a class of models called latent diffusion models. Latent diffusion models encode images and their text captions into vectors (essentially a unique numeric representation for each image). During training, the model adds random values (noise) to the vectors. Then you train the model to go from a slightly noisier vector to a slightly less noisy vector. In other words, the model attempts to reproduce the original numerical representation of each image in its training set, based on the accompanying text caption for that image.
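The noising step described above can be sketched in a few lines. This is a minimal illustration using the standard diffusion formulation, in which a clean vector is blended with Gaussian noise; the function names, the three-number vector, and the `alpha_bar` value are assumptions made for the example, not details taken from Stable Diffusion or from the interview.

```python
import math
import random


def add_noise(latent, noise, alpha_bar):
    """Blend a clean latent vector with noise.

    alpha_bar is the share of the original signal that is kept (0 < alpha_bar <= 1).
    """
    keep = math.sqrt(alpha_bar)
    mix = math.sqrt(1.0 - alpha_bar)
    return [keep * x + mix * n for x, n in zip(latent, noise)]


def recover_latent(noisy, predicted_noise, alpha_bar):
    """Invert add_noise, given a prediction of the noise that was added."""
    keep = math.sqrt(alpha_bar)
    mix = math.sqrt(1.0 - alpha_bar)
    return [(y - mix * n) / keep for y, n in zip(noisy, predicted_noise)]


rng = random.Random(0)
latent = [0.5, -1.0, 2.0]  # hypothetical encoded image
noise = [rng.gauss(0.0, 1.0) for _ in latent]
noisy = add_noise(latent, noise, alpha_bar=0.9)

# A trained model only estimates the noise. Passing the true noise here
# shows what a perfect estimate would recover.
restored = recover_latent(noisy, noise, alpha_bar=0.9)
```

In a real model the noise estimate comes from a neural network conditioned on the caption, and training pushes that estimate toward the noise that was actually added.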
Since these numerical representations come from pre-trained models that turn images into vectors and back again, the idea is basically, “OK, it’s trying to reproduce the images as closely as possible.” So a generated image should be similar to the images that influenced it most, by virtue of having a similar numerical representation. That is the very short explanation.
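The idea of finding the training images whose vectors sit closest to a generated image can be illustrated with a brute-force search. This is only a sketch: the interview does not say which similarity measure or index Stable Attribution uses, so cosine similarity, the function names, the placeholder URLs, and the three-number vectors are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, training_vectors, k=2):
    """Return the k (score, url) pairs most similar to the query vector."""
    scored = [
        (cosine_similarity(query, vec), url)
        for url, vec in training_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical training set: image URL -> encoded vector.
training_vectors = {
    "https://example.com/a.jpg": [1.0, 0.0, 0.0],
    "https://example.com/b.jpg": [0.6, 0.8, 0.0],
    "https://example.com/c.jpg": [0.0, 1.0, 0.0],
}
generated = [1.0, 0.05, 0.0]  # hypothetical vector for a generated image

top = most_similar(generated, training_vectors, k=2)
for score, url in top:
    print(f"{score:.3f}  {url}")
```

At the scale of a real training set, a search like this would rely on an approximate nearest-neighbor index rather than comparing against every vector.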
How do you take that last step and determine who the artists and creators are?
Troinikov: We would really like to be able to attribute images directly to the person who created them. What we have, and what’s available in the public Stable Diffusion training dataset, are URLs for images, and these URLs often come from a CDN [content delivery network]. The owners of the sites where these images appear, as well as the owners and operators of these CDNs, could make this connection.
We have a small submission form on the site. If people find out who the creator is, they can send it to us and we’ll try to link it back.
How do you think such generative AI — along with the ability to attribute original images to their creators — impacts artistic creation?
Troinikov: I think you could do two things. First, with the ability to do attribution, you can proportionally reward the contributors to your training set based on their contribution to any given generation. Another really cool thing is that if you have attribution in generative models, it turns them from a simple generator into a search engine. You can iteratively find an aesthetic you like and then link back to the things that contribute to that image.
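As a rough illustration of what proportional reward could look like, the sketch below splits a payout across source images according to a contribution score. The interview does not specify how contribution would be measured or paid out; the function name, the scores, and the payout amount are hypothetical.

```python
def proportional_shares(contributions, payout):
    """Split payout across sources in proportion to their contribution scores."""
    total = sum(contributions.values())
    if total <= 0:
        # No measurable contribution: nobody is paid.
        return {source: 0.0 for source in contributions}
    return {
        source: payout * score / total
        for source, score in contributions.items()
    }


# Hypothetical contribution scores for one generated image.
scores = {"image_a": 0.5, "image_b": 0.25, "image_c": 0.25}
shares = proportional_shares(scores, payout=10.0)
print(shares)
```

Any real scheme would also need to map each source image to its creator, which, as noted above, is the part the training data's URLs alone do not settle.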
Anton Troinikov is the co-founder of Chroma, an artificial intelligence company that studies AI behavior through data. Previously, Troinikov worked in robotics with a focus on 3D computer vision. He doesn’t believe AI will kill us all.