ChatGPT-3 X Midjourney
Dear Mensch subscribers and readers, do you remember our ‘Fireside Chat: Toast to the Art in Artificial Intelligence’? Think of this post as a dotted line connecting then and now: I took inspiration from our previous talks and went for a more playful approach.
Recently, posts and news across every kind of media have been talking about ChatGPT-3 and the like. But you might wonder: what is that exactly? GPT stands for Generative (What) Pre-trained (How) Transformer (Who), and it was developed by OpenAI.
In more detail:
- Generative: ChatGPT can generate new responses independently rather than just choosing from pre-existing options.
- Pre-trained: ChatGPT is trained on massive amounts of text data, allowing it to learn patterns and structures in natural language.
- Transformer: This is the name of the specific neural network architecture that ChatGPT is based on.
As a designer myself, I have spent the past months discussing with my peers: How can we use this flood of technology? How can we collaborate with it?
Cadavre Exquis
Okay, as a designer and coder, I need a bit of a metaphor to explore and structure my thinking. Do you know “Cadavre Exquis”? It is a collaborative drawing game invented by Surrealist artists in the 1920s and played by figures such as André Breton, Yves Tanguy, and Salvador Dalí. In the game, each artist draws a section of an image without seeing the previous sections, resulting in a surreal and often bizarre final image.
Instead of collaborating solely with other designers and artists, imagine creating an image by having ChatGPT-3 and a generative image AI converse with each other. A designer’s creative input would still be incorporated, but ChatGPT-3 would generate the detailed prompt, the image AI would render it, CLIP would translate the resulting image back into keywords, and the process would continue until the desired image is achieved.
CLIP – ChatGPT – Midjourney
What is CLIP?
CLIP (Contrastive Language-Image Pre-Training) is an artificial intelligence model developed by OpenAI and trained to understand the relationship between natural language and visual content. CLIP can read and understand natural language descriptions of images, and it can also classify and recognize objects and scenes within visual content. In this article, CLIP Interrogator 2.3 by @pharmapsychotic was used.
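To make that concrete, here is a minimal sketch of the idea underneath: scoring an image against candidate captions. It assumes the publicly available openai/clip-vit-base-patch32 checkpoint loaded through the Hugging Face transformers library; the image path and the captions are just illustrative placeholders. The article itself used CLIP Interrogator 2.3, which builds a much richer prompt on top of this same image–text similarity.

```python
# Minimal sketch: scoring candidate captions against an image with CLIP.
# Uses the public openai/clip-vit-base-patch32 checkpoint via Hugging Face
# transformers; the file name and captions below are made-up examples.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("product_photo.jpg")  # e.g. a shot of a facial massager
candidates = ["a facial massager", "a camera", "a product render, studio lighting"]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # similarity of the image to each caption

for caption, p in zip(candidates, probs[0].tolist()):
    print(f"{p:.2f}  {caption}")
```

The caption with the highest score is the one CLIP considers closest to the image; roughly speaking, CLIP Interrogator ranks a large bank of such phrases this way and stitches the best matches into a prompt.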
We’re essentially creating a human-driven version of the process used to develop DALL-E, which relied on CLIP, but with added creative input. In this exploration, CLIP is used to extract text from an image, and ChatGPT-3 then modifies that text to create a prompt for AI image generation. This enables interactive and iterative AI-generated content with human-like creativity. The approach can also be part of a more extensive pipeline, such as the VQGAN+CLIP image synthesis pipeline, that generates unique and creative images from the modified prompts. Below, you can see the rough structure of the process.
The CLIP model identifies and extracts text from an image; ChatGPT-3 then modifies that text to produce a new prompt. This prompt is used as input for the generative image model to create a new image, which is processed again by CLIP to extract new text, and the cycle continues.
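For readers who think in code, here is a rough Python sketch of that cycle. The three helper functions are placeholders, not real APIs: in the experiment described here, CLIP Interrogator 2.3 produced the keywords, ChatGPT-3 rewrote them into a prompt, and Midjourney rendered the image by hand (Midjourney has no official API). The point is only the shape of the iteration.

```python
# Hypothetical sketch of the CLIP -> ChatGPT -> image-generator loop.
# clip_interrogate(), rewrite_prompt() and generate_image() are placeholders
# for whatever tools you actually use (CLIP Interrogator, the OpenAI API,
# Midjourney or another text-to-image model).

def clip_interrogate(image_path: str) -> str:
    """Return a text description / keywords for the image."""
    raise NotImplementedError

def rewrite_prompt(description: str, creative_input: str) -> str:
    """Ask a language model to turn keywords plus the designer's intent into a prompt."""
    raise NotImplementedError

def generate_image(prompt: str, step: int) -> str:
    """Render the prompt with a text-to-image model and return the new image path."""
    raise NotImplementedError

def cadavre_exquis(start_image: str, creative_input: str, iterations: int = 4) -> list[str]:
    images = [start_image]
    current = start_image
    for step in range(iterations):
        description = clip_interrogate(current)                # image -> text
        prompt = rewrite_prompt(description, creative_input)   # text -> new prompt
        current = generate_image(prompt, step)                 # prompt -> new image
        images.append(current)
    return images
```

The designer’s creative input enters at every turn of the loop through the prompt rewrite, which is what keeps the game from drifting entirely on its own.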
It is important to note, however, that this iterative process does not always produce coherent or meaningful results. If the initial image is not identified accurately, the process descends into chaos from the very first iteration. Ensuring that the generated images correspond to the modified prompts, and that the prompts align with the original images, can be challenging. The following picture shows what happens when a facial massager is the initial image. I assume this is because far fewer images of facial massagers exist than of a common object like a camera. You can see that the first iteration already turned chaotic, and the design became a different object.
Invite your AI as a third person.
There are many ways to use tools like those mentioned above with real-time feedback to create stunning designs. During a client meeting, you could mix a product sample image with keywords and turn the client’s input into a prompt for generative images, or you could start from a mood board built with actual product images. Real-time feedback becomes a powerful tool for collaboration and creativity: working together, designers and clients can exchange ideas and refine their vision, resulting in a final product that truly reflects their shared goals.
Technology is a fascinating and ever-changing aspect of our lives, but it can also be daunting. If we take a moment to reflect, however, we can see how new technologies have shaped our world and the way we live. Just think about when the railway was introduced (this inspired Einstein’s famous thought experiment, the ‘Gedankenexperiment’ on the relativity of simultaneity) or when the iPhone hit the market: we gained access to incredible capabilities we couldn’t have imagined before. Some people argue that this rapid pace of technological advancement is both good and bad. But when there is a positive flow or a strong drive behind it, why should we be afraid?
Personally, I am less terrified than thrilled. The symbiosis of artificial intelligence with human gut feeling and experience-based, rational decision-making could lead to something great.
Keep on reading
How do pre-trained models work? – https://towardsdatascience.com/how-do-pretrained-models-work-11fe2f64eaa2
Transformer, Google Blog – https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
A few stories to understand tomorrow in the AI era – https://www.youtube.com/watch?v=g9iWYxNfYpo
More on Cadavre Exquis (in German) – https://www.kunstlinks.de/material/walch/galerie/cadavre_exquis/
Jeongwoo Jang
Senior Expert