DALL-E 2 uses descriptions to produce images, but is it art?
A picture can be worth a thousand words, but thanks to an artificial intelligence program called DALL-E 2, you can have a professional-looking picture for far fewer words than that.
DALL-E 2 is a new neural network algorithm that creates an image from a short phrase or sentence that you provide. The program, which was announced by the artificial intelligence research lab OpenAI in April 2022, has not been made public. But a small and growing number of people – myself included – have been given access to experiment with it.
As a researcher studying the connection between technology and art, I was eager to see how well the program worked. After hours of experimentation, it is clear that DALL-E, while not without flaws, is a step ahead of existing image generation technology. This raises immediate questions about how these technologies will change the way art is made and consumed. It also raises questions about what it means to be creative when DALL-E 2 seems to automate much of the creative process itself.
A Staggering Range of Styles and Subjects
OpenAI researchers built DALL-E 2 from a huge collection of images with captions. They collected some of the images online and licensed others.
Using DALL-E 2 is a lot like searching the web for an image: you type a short phrase into a text box and it returns six images.
But instead of being pulled from the web, the program creates six new images, each reflecting a version of the entered phrase. (Until recently, the program produced 10 images per prompt.) For example, when friends and I gave DALL-E 2 the text prompt “cats in devo hats”, it produced 10 images that came in a variety of different styles.
Almost all of them could plausibly pass for professional photographs or drawings. While the algorithm didn’t quite grasp the “Devo hat” – the strange helmets worn by the New Wave band Devo – the headgear in the images it produced came close.
In recent years, a small community of artists has used neural network algorithms to produce art. Many of these artworks have distinctive qualities: they almost look like real images, but with strange distortions of space – a kind of cyberpunk cubism. Newer text-to-image systems often produce dreamlike, fantastical images that may be delightful but rarely seem real.
DALL-E 2 offers a significant leap in image quality and realism. It can also imitate specific styles with remarkable precision. If you want images that look like real photographs, it will produce six realistic images. If you want prehistoric cave paintings of Shrek, it will generate six images of Shrek as if drawn by a prehistoric artist.
It’s amazing that an algorithm can do this. Each set of images takes less than a minute to generate. Not all images will be pleasing to the eye or necessarily reflect what you had in mind. But, even with the need to sift through lots of outputs or try different text prompts, there’s no other way to get so many results so quickly, not even by hiring an artist. And, sometimes, unexpected results are the best.
In principle, anyone with enough resources and expertise can build a system like this. Google Research recently announced an impressive, similar text-to-image system, and one independent developer is publicly developing their own version that anyone can try right now on the web, although it’s still not as good as DALL-E 2 or Google’s system.
It’s easy to imagine these tools transforming the way people create images and communicate, whether through memes, greeting cards, advertising and, yes, art.
Where’s the art in that?
Early on, I had a moment while using DALL-E 2 to generate different kinds of paintings, in all different styles – like “Odilon Redon painting of Seattle” – when it hit me that this was better than any painting algorithm I’ve ever developed. Then I realized that it is, in a way, a better painter than I am.
In fact, no human being can do what DALL-E 2 does: create such a wide range of high-quality images in just seconds. If someone told you that a person created all these images, of course you would say that they were creative.
But that does not make DALL-E 2 an artist. Even though it sometimes feels like magic, under the hood it is still a computer algorithm, strictly following instructions from the algorithm’s authors at OpenAI.
If these images succeed as art, they are a product of how the algorithm was designed, the images it was trained on, and more importantly, how the artists use it.
You might be inclined to say that there is little artistic value in an image produced by a few keystrokes. But in my opinion, this line of thinking echoes the classic idea that photography can’t be art because a machine has done all the work. Today, the human authorship and craftsmanship involved in fine art photography is recognized, and critics understand that the best photography involves more than just pushing a button.
Even so, we often discuss artworks as if they came directly from the artist’s intent. The artist intended to show something, or express an emotion, and so they created this image. DALL-E 2 seems to short-circuit this process entirely: you have an idea, you type it in, and you’re done.
But when I paint the old-fashioned way, I find that my paintings come from the exploratory process, not just from executing my initial goals. And this is true for many artists.
Take Paul McCartney, who came up with the track “Get Back” during a jam session. He didn’t start with a plan for the song; he just started playing and experimenting, and the band developed it from there.
Picasso described his process in the same way: “I don’t know in advance what I’m going to put on the canvas any more than I decide in advance what colors I’m going to use … Each time I undertake to paint a picture, I have the feeling of jumping into space.”
In my own explorations with DALL-E 2, one idea led to another which led to another, and eventually I found myself in a completely unexpected and magical new land, very far from where I had started.
Prompting as art
I would say that the art, using a system like DALL-E 2, comes not just from the final text prompt, but from the whole creative process leading up to that prompt. Different artists will follow different processes and achieve different results that reflect their own approaches, skills and obsessions.
I started to see my experiments as a set of series, each a cohesive dive into a single theme, rather than a set of stand-alone wacky images.
The ideas for these images and series came from everywhere, often linked by a set of stepping stones. At one point, while making images based on the work of contemporary artists, I wanted to generate a site-specific installation art image in the style of the contemporary Japanese artist Yayoi Kusama. After trying a few unsatisfactory locations, I came up with the idea of placing it in La Mezquita, a historic mosque and cathedral in Córdoba, Spain. I sent the image to an architect colleague, Manuel Ladron de Guevara, who is from Córdoba, and we started developing other architectural ideas together.
This became a series on imaginary new buildings in the styles of different architects.
So I began to see what I was doing with DALL-E 2 as both a form of exploration and a form of art, even if it’s often amateur art, like the sketches I make on my iPad.
Indeed, some artists, like Ryan Murdoch, have advocated for prompt-based image-making to be recognized as art. He points to the experienced artificial intelligence artist Helena Sarin as an example.
“When I look at most stuff from Midjourney” – another popular text-to-image system – “a lot of it will be interesting or funny,” Murdoch told me in an interview. “But with [Sarin’s] work, there is a through line. It’s easy to see that she put a lot of thought into it and worked on the craft, as the result is more visually appealing and interesting, and consistently follows her style.”
Working with DALL-E 2, or any of the new text-to-image systems, means learning its quirks and developing strategies for avoiding common pitfalls. It’s also important to be aware of its potential harms, such as its reliance on stereotypes and its potential uses for misinformation. Using DALL-E 2, you will also discover surprising correlations, like how everything becomes old-timey when you use the style of an old painter, filmmaker or photographer.
When I have something very specific in mind, DALL-E 2 often can’t do it; the results would require a lot of difficult manual editing afterward. It’s when my goals are vague that the process is most enjoyable, offering up surprises that lead to new ideas, which in turn lead to more ideas, and so on.
Likewise, the artist Mario Klingemann’s architectural renderings featuring the tents of homeless people could be taken as a riposte to my architectural renderings of whimsical dream homes.
It is too early to judge the significance of this art form. I keep thinking of a phrase from the excellent book “Art in the After-Culture”: “The dominant aesthetic of AI is novelty.”
This would surely be true, to some extent, of any new technology used for art. The Lumière brothers’ early films in the 1890s were novelties, not cinematic masterpieces; people were amazed simply to see images move at all.
AI art software is developing so rapidly that there is continuous technical and artistic novelty. It’s as if, every year, there’s an opportunity to explore an exciting new technology, each one more powerful than the last, and each one seemingly poised to transform art and society.
Aaron Hertzmann is an Affiliate Professor of Computer Science at the University of Washington.
This article is republished from The Conversation under a Creative Commons license. Read the original article.