
Stability AI Announces Stable Diffusion 3: All We Know So Far

Find out about the new updates to Stable Diffusion and discover the capabilities of the version 3 text-to-image model.
Feb 2024

Stability AI announced an early preview of Stable Diffusion 3, their text-to-image generative AI model. Unlike OpenAI's Sora text-to-video announcement last week, Stability AI provided only limited demonstrations of the model's new capabilities, though it did share some details. Here, we explore what the announcement means, how the new model works, and some implications for the advancement of image generation.

What is Stable Diffusion 3?

Stable Diffusion is a series of text-to-image generative AI models. That is, you write a prompt describing what you want to see, and the model creates an image matching your description. There is a web user interface for easy access to the AI.
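
For a feel of that workflow in code rather than the web interface, here is a minimal sketch using Hugging Face's diffusers library. Since Stable Diffusion 3's weights were not yet available at the time of the announcement, the checkpoint below is an earlier open-weights release standing in for the general pattern.

```python
# A minimal text-to-image sketch with the diffusers library, using an
# earlier open-weights Stable Diffusion checkpoint (SD3 weights were
# not public at announcement time, so this stands in for the workflow).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # earlier open-weights release
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # move to a GPU if one is available

prompt = "a watercolor painting of a lighthouse at sunrise"
image = pipe(prompt).images[0]  # run the full text-to-image pipeline
image.save("lighthouse.png")
```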

One major difference from OpenAI's rival image generation AI, DALL·E, is that Stable Diffusion has "open weights". That is, the details of the neural network that performs the model's computations are publicly available. This provides some transparency into how the model works and makes it possible for researchers to adapt and build on Stability AI's work.

Stable Diffusion 3 is not one model but a whole family of models, with sizes ranging from 800 million to 8 billion parameters. More parameters generally yield higher-quality output, but with the side effect that images are more expensive and slower to create. Versions of the model with fewer parameters are better for creating simple images, and versions with more parameters are better suited to creating higher-quality or more complex images.
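
For a rough sense of what those parameter counts mean in practice, here is a back-of-the-envelope estimate of the memory needed just to hold the weights, assuming 16-bit (2-byte) parameters. This is an illustrative lower bound, not an official requirement: real usage also needs memory for text encoders, activations, and other components.

```python
# Back-of-the-envelope weight memory for the announced model sizes,
# assuming 2 bytes per parameter (16-bit precision).
for name, params in [("smallest", 800e6), ("largest", 8e9)]:
    gb = params * 2 / 1e9
    print(f"{name}: {params / 1e9:.1f}B params -> about {gb:.1f} GB")

# smallest: 0.8B params -> about 1.6 GB
# largest: 8.0B params -> about 16.0 GB
```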

How does Stable Diffusion 3 work?

Stable Diffusion 3 uses a diffusion transformer architecture, similar to the one used by Sora. Previous versions of Stable Diffusion, like most current image generation AIs, use a pure diffusion model. Large language models for text generation, like GPT, use a transformer architecture. Combining the two architectures is a recent innovation and promises to harness the best of both.

Diffusion models perform well at creating detail in small regions but are poor at generating the overall layout of an image. Conversely, transformers are good at layout but poor at creating detail. So it is likely that Stable Diffusion 3 will use a transformer to lay out the overall picture and then use diffusion to generate the detail within patches.

That means that we can expect Stable Diffusion 3 to perform better than its predecessors in organizing complex scenes.
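
To make the pattern concrete, here is a minimal, untrained sketch of a diffusion transformer in PyTorch: the noisy latent is split into patch tokens, a transformer attends across all patches at once (which is where the global layout ability comes from), and each token is mapped back to a patch-sized denoising prediction. This illustrates the general idea only; Stability AI has not published Stable Diffusion 3's architectural details.

```python
# Illustrative diffusion transformer: a transformer over patch tokens,
# conditioned on the diffusion timestep. Not SD3's actual architecture.
import torch
import torch.nn as nn

class TinyDiffusionTransformer(nn.Module):
    def __init__(self, size=32, patch=4, channels=4, dim=256):
        super().__init__()
        self.patch = patch
        num_patches = (size // patch) ** 2
        patch_dim = channels * patch * patch
        self.embed = nn.Linear(patch_dim, dim)    # patch -> token
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.time = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(block, num_layers=4)
        self.unembed = nn.Linear(dim, patch_dim)  # token -> patch prediction

    def forward(self, x, t):
        b, c, h, w = x.shape
        p = self.patch
        # patchify: split the noisy latent into non-overlapping patches
        tokens = x.unfold(2, p, p).unfold(3, p, p)       # b, c, h/p, w/p, p, p
        tokens = tokens.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        tokens = self.embed(tokens) + self.pos           # add position info
        tokens = tokens + self.time(t.view(b, 1)).unsqueeze(1)  # timestep
        tokens = self.transformer(tokens)                # global layout reasoning
        out = self.unembed(tokens).view(b, h // p, w // p, c, p, p)
        # un-patchify: reassemble patch predictions into an image-shaped output
        return out.permute(0, 3, 1, 4, 2, 5).reshape(b, c, h, w)
```

In a real system, the input would be a VAE latent and the output a noise or velocity prediction; the point is that the attention layers see every patch at once, which is what helps with global layout.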

The announcement also states that Stable Diffusion 3 uses a technique called flow matching. This is a more computationally efficient way of training models, and of creating images from them, than the current diffusion technique. That means the model is cheaper to train, and images are cheaper to generate.
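
The announcement doesn't give implementation details, but the core of flow matching (in its rectified-flow form) is simple enough to sketch. Rather than learning to undo many small noising steps, the model learns a velocity field that carries noise toward data along straight paths, so each training step is a single regression and sampling needs only a few integration steps. The code below is an illustrative assumption about how such an objective looks, not Stability AI's implementation; it assumes a model(x, t) that predicts a velocity.

```python
# Illustrative flow-matching (rectified flow) objective and Euler sampler.
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1):
    """x1: a batch of training images or latents."""
    x0 = torch.randn_like(x1)              # noise endpoint of the path
    t = torch.rand(x1.shape[0], 1, 1, 1)   # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1             # point on the straight path
    v_target = x1 - x0                     # constant velocity along the path
    v_pred = model(xt, t.flatten())        # model's velocity estimate
    return F.mse_loss(v_pred, v_target)    # one regression per training step

@torch.no_grad()
def sample(model, shape, steps=20):
    x = torch.randn(shape)                 # start from pure noise at t = 0
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)
        x = x + model(x, t) * dt           # Euler step along the learned flow
    return x                               # approximate data sample at t = 1
```

The small number of cheap sampling steps is where the savings at image-creation time come from.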

What are the limitations of Stable Diffusion 3?

One of the current limitations of image generation AI is the ability to generate text. Notably, the Stability AI announcement began with an image that included the name of the model, "Stable Diffusion 3". The positioning of the letters in the text is good but not perfect: notice that the distance between the "B" and the "L" in Stable is wider than the distance between the "L" and the "E". Similarly, the two "F"s in Diffusion are too close together. However, overall, this is a noticeable improvement over the previous generation of models.

Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy

Another issue with these models is that because the diffusion process generates patches of the image separately, inconsistencies can occur between regions of the image. This is mostly a problem when trying to generate realistic images. The announcement post didn't include many realistic examples, but an image of a bus on a city street reveals a few instances of these problems. Notice that the shadow underneath the bus suggests light coming from behind the bus, but the shadow of a building on the street indicates light coming from the left of the image. Similarly, the positioning of the windows in the building at the top right of the image is slightly inconsistent across different regions of the building. The bus also has no driver, though this may be fixable with more careful prompting.

[Image: a bus on a city street, from the Stable Diffusion 3 announcement]

How can I access Stable Diffusion 3?

Stable Diffusion 3 is in an "early preview" state, which means it is only available to researchers for testing purposes. The preview period allows Stability AI to gather feedback about the performance and safety of the model before releasing it to the public.

You can join the waiting list for access to the AI here.

What are the use cases of Stable Diffusion 3?

Image generation AIs have already found many use cases, from illustrations to graphic design to marketing materials. Stable Diffusion 3 promises to be usable in the same ways, with the added advantage that it is likely to be able to create images with more complex layouts.

What are the risks of Stable Diffusion 3?

The dataset that Stable Diffusion was trained on included some copyrighted images, which has resulted in several as-yet-unresolved lawsuits. It is unclear what the outcome of these lawsuits will be, but it is theoretically possible that images created with Stable Diffusion could also be considered in breach of copyright.

What don't we know yet?

The full technical details of Stable Diffusion 3 have not been released yet, and in particular, there is no way to test the performance of the AI. Once the model is publicly available and benchmarks are established, it will be possible to determine how much of an improvement the AI is over previous models. Other factors such as the time and cost to generate an image will also become clear.

One technical development that was heavily championed by OpenAI in their DALL·E 3 paper, but was not mentioned in the Stability AI announcement, is recaptioning. This is a form of automatic prompt engineering, where the text written by the user is restructured and given extra detail to provide clearer instructions to the model. It is unknown whether Stable Diffusion 3 makes use of this technique.
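
For illustration, recaptioning can be approximated by putting any capable chat model in front of the image generator. The sketch below uses the OpenAI Python client; the model name and system instructions are assumptions for demonstration purposes, and nothing here implies that Stable Diffusion 3 works this way.

```python
# Illustrative recaptioning step: an LLM expands a terse user prompt into
# a detailed caption before it is sent to the image model. The model name
# and instructions are assumptions, not from the SD3 announcement.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recaption(user_prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model could be used here
        messages=[
            {
                "role": "system",
                "content": "Rewrite the user's image prompt as one detailed "
                           "caption, specifying subject, style, lighting, "
                           "and composition. Reply with the caption only.",
            },
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

print(recaption("a wizard casting a spell on a mountain at night"))
```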

Closing thoughts

Stable Diffusion 3 promises to be another step forward in the progress of text-to-image generative AI. Once the AI is publicly released, we'll be able to test it further and discover new use cases. If you’re eager to get started in the world of generative AI, our AI Fundamentals skill track will help you get up to speed with machine learning, deep learning, NLP, generative models, and more.

For more resources on the latest in the world of AI, check out the list below:


Author
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Related

Blog: What is Stable Code 3B? Discover everything you need to know about Stable Code 3B, the latest product of Stability AI, specifically designed for accurate and responsive coding. (Javier Canales Luna, 11 min)

Blog: Everything We Know About GPT-5. Predicting what the next evolution in OpenAI's AI technology might look like and what advancements the GPT-5 model might have. (Josep Ferrer, 10 min)

Tutorial: How to Use the Stable Diffusion 3 API. Learn how to use the Stable Diffusion 3 API for image generation with practical steps and insights on new features and enhancements. (Kurtis Pykes, 12 min)

Tutorial: Stable Diffusion Web UI: A Comprehensive User Guide for Beginners. Learn how to easily install and use Stable Diffusion Web UI for generating high-quality images on your laptop. (Abid Ali Awan, 13 min)

Tutorial: How to Run Stable Diffusion. Explore generative AI with our introductory tutorial on Stable Diffusion. Learn how to run the deep learning model online and locally to generate detailed images. (Kurtis Pykes, 7 min)

Tutorial: Fine-tuning Stable Diffusion XL with DreamBooth and LoRA. Learn how to successfully fine-tune Stable Diffusion XL on personal photos using Hugging Face AutoTrain Advance, DreamBooth, and LoRA for customized, high-quality image generation. (Abid Ali Awan, 14 min)