
Stability AI Announces Stable Diffusion 3: All We Know So Far

Find out about the new updates to Stable Diffusion and discover the capabilities of the version 3 text-to-image model.
Feb 2024

Stability AI announced an early preview of Stable Diffusion 3, their text-to-image generative AI model. Unlike last week's Sora text-to-video announcement from OpenAI, there were limited demonstrations of the model's new capabilities, but some details were provided. Here, we explore what the announcement means, how the new model works, and some implications for the advancement of image generation.

What is Stable Diffusion 3?

Stable Diffusion is a series of text-to-image generative AI models. That is, you write a prompt describing what you want to see, and the model creates an image matching your description. There is a web user interface for easy access to the AI.

One major difference from OpenAI's rival image generation AI, DALL·E, is that Stable Diffusion has "open weights". That is, the parameters of the neural network that performs the model's computations are publicly available. This provides some transparency into how the model works, and it makes it possible for researchers to adapt and build on the work of Stability AI.

Stable Diffusion 3 is not one model, but a whole family of models, with sizes ranging from 800 million parameters to 8 billion parameters. More parameters generally result in higher-quality output, but images become more expensive and slower to create. Versions of the model with fewer parameters are better for creating simple images, and versions with more parameters are better suited to creating higher quality or more complex images.
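As a rough illustration of what that size range means in practice (my own back-of-envelope arithmetic, not figures from the announcement), weight storage alone can be estimated by multiplying parameter count by bytes per parameter:

```python
# Rough estimate of weight storage for the smallest and largest announced
# model sizes, assuming 16-bit (2-byte) floating-point weights.
# These are illustrative numbers, not official figures from Stability AI.

def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in gigabytes."""
    return n_params * bytes_per_param / 1e9

small = weights_gb(800e6)   # 800 million parameters
large = weights_gb(8e9)     # 8 billion parameters

print(f"800M model: ~{small:.1f} GB")   # ~1.6 GB
print(f"8B model:   ~{large:.1f} GB")   # ~16.0 GB
```

Actual memory use during generation is higher (activations, text encoders, and so on), but the tenfold gap in weights alone explains why the smaller variants are cheaper and faster.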

How does Stable Diffusion 3 work?

Stable Diffusion 3 uses a diffusion transformer architecture, similar to the one used by Sora. Previous versions of Stable Diffusion—and most current image generation AIs—use a diffusion model. Large language models for text generation, like GPT, use a transformer architecture. Combining the two architectures is a recent innovation that promises to harness the best of both.

Diffusion models perform well at creating detail in small regions but are poorer at generating the overall layout of an image. Conversely, transformers are good at layout but weaker at fine detail. So it is likely that Stable Diffusion 3 uses a transformer to lay out the overall picture and then uses diffusion to generate the individual patches.
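To make the "patches" idea concrete, here is a minimal sketch (my own illustration; SD3's actual patch size and embedding scheme are not public) of how a diffusion transformer splits an image latent into a sequence of patch tokens that a transformer can reason about globally:

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) array into a sequence of flattened patch tokens,
    the input format a diffusion transformer operates on."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "patch size must divide H and W"
    tokens = (
        image.reshape(h // patch, patch, w // patch, patch, c)
             .transpose(0, 2, 1, 3, 4)   # group each patch's pixels together
             .reshape((h // patch) * (w // patch), patch * patch * c)
    )
    return tokens  # shape: (num_patches, patch * patch * C)

latent = np.random.randn(64, 64, 4)   # e.g. a 64x64, 4-channel latent
tokens = patchify(latent, patch=8)
print(tokens.shape)                   # (64, 256): 64 tokens of 256 values
```

The transformer attends across all 64 tokens at once, which is what gives it a global view of the layout; the diffusion process then refines the pixel detail within each patch.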

That means that we can expect Stable Diffusion 3 to perform better than its predecessors in organizing complex scenes.

The announcement also states that Stable Diffusion 3 uses a technique called flow matching. This is a more computationally efficient way of training models, and of generating images from them, than the diffusion-path technique used by previous versions. The result is that the model is cheaper to train and images are cheaper to generate.
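The announcement doesn't spell out SD3's exact formulation, but the commonly used linear-interpolation (rectified flow) variant of flow matching is simple to sketch. Instead of simulating a long noising process, training draws a data sample and a noise sample, interpolates between them at a random time, and regresses the model against the constant velocity of that straight path:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch of a flow-matching training target (rectified-flow variant).
# Shown for illustration only; SD3's precise formulation is not public.

x0 = rng.normal(size=(4, 2))   # samples standing in for the data distribution
x1 = rng.normal(size=(4, 2))   # samples from the noise prior
t = rng.uniform(size=(4, 1))   # random timesteps in [0, 1]

# Interpolate along the straight path between data and noise...
x_t = (1 - t) * x0 + t * x1

# ...and the regression target is the constant velocity of that path.
# A network v_theta(x_t, t) would be trained with an MSE loss against
# this target; here we just compute the quantities themselves.
target_velocity = x1 - x0

print(x_t.shape, target_velocity.shape)   # (4, 2) (4, 2)
```

Because the target is available in closed form for any `t`, each training step is cheap, and at generation time the straight paths can be followed in fewer steps than a curved diffusion trajectory.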

What are the limitations of Stable Diffusion 3?

One of the current limitations of image generation AI is rendering legible text within images. Notably, the Stability AI announcement began with an image that included the name of the model, "Stable Diffusion 3". The positioning of the letters in the text is good but not perfect: notice that the distance between the "B" and the "L" in Stable is wider than the distance between the "L" and the "E". Similarly, the two "F"s in Diffusion are too close together. However, overall, this is a noticeable improvement over the previous generation of models.

Prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy

Another issue with the models is that because diffusers generate patches of the image separately, inconsistencies can occur between regions of the image. This is mostly a problem when trying to generate realistic images. The announcement post didn't include many realistic examples, but an image of a bus in a city street reveals a few instances of these problems. Notice that the shadow underneath the bus suggests light coming from behind the bus, but the shadow of a building on the street indicates light coming from the left of the image. Similarly, the positioning of the windows in the building at the top right of the image is slightly inconsistent across different regions of the building. The bus also has no driver, though this may be fixable with more careful prompting.


How can I access Stable Diffusion 3?

Stable Diffusion 3 is in an "early preview" state. That means it is only available to researchers for testing purposes. The preview state is to allow Stability AI to gather feedback about the performance and safety of the model before it is released to the public.

You can join the waiting list for access to the AI here.

What are the use cases of Stable Diffusion 3?

Image generation AIs have already found many use cases, from illustrations to graphic design to marketing materials. Stable Diffusion 3 promises to be usable in the same ways, with the added advantage that it is likely to be able to create images with more complex layouts.

What are the risks of Stable Diffusion 3?

The dataset that Stable Diffusion was trained on included some copyrighted images, which has resulted in several as-yet-unresolved lawsuits. It is unclear what the outcome of these lawsuits will be, but it is theoretically possible that any images created by Stable Diffusion will also be considered in breach of copyright.

What Don't We Know Yet?

The full technical details of Stable Diffusion 3 have not been released yet, and in particular, there is no way to test the performance of the AI. Once the model is publicly available and benchmarks are established, it will be possible to determine how much of an improvement the AI is over previous models. Other factors such as the time and cost to generate an image will also become clear.

One technical development that was heavily championed by OpenAI in their DALL·E 3 paper, but was not mentioned in the Stability AI announcement, is recaptioning. This is a form of automatic prompt engineering, where the text written by the user is restructured and given extra detail to provide clearer instructions to the model. It is unknown whether Stable Diffusion 3 makes use of this technique.
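A toy, deterministic sketch of the recaptioning idea (real systems use a captioning model or LLM for this step; the helper below is purely hypothetical and just appends fixed hints):

```python
def recaption(prompt: str, style_hints: list[str]) -> str:
    """Toy illustration of recaptioning: enrich a terse user prompt with
    extra descriptive detail before it reaches the image model.
    Real recaptioning pipelines generate these details with a language
    model rather than a fixed list."""
    details = ", ".join(style_hints)
    return f"{prompt}, {details}"

expanded = recaption(
    "a bus in a city street",
    ["photorealistic", "consistent lighting from the left", "sharp focus"],
)
print(expanded)
```

The point is that the image model sees the expanded prompt, not the user's original one, which tends to improve prompt adherence without any effort from the user.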

Closing thoughts

Stable Diffusion 3 promises to be another step forward in the progress of text-to-image generative AI. Once the AI is publicly released, we'll be able to test it further and discover new use cases. If you're eager to get started in the world of generative AI, our AI Fundamentals skill track will help you get up to speed with machine learning, deep learning, NLP, generative models, and more.


Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

