Skip to main content
HomeTutorialsArtificial Intelligence (AI)

Fine-tuning Stable Diffusion XL with DreamBooth and LoRA

Learn how to successfully fine-tune Stable Diffusion XL on personal photos using Hugging Face AutoTrain Advance, DreamBooth, and LoRA for customized, high-quality image generation.
Nov 2023  · 14 min read

In this tutorial, we will learn about Stable Diffusion XL and Dream Booth, and how to access the image generation model using the diffuser library. Additionally, we will learn to fine-tune the model on personal photos and evaluate its performance. If you're a newcomer to AI, we suggest taking an AI Fundamentals course to give you a primer. 

Understanding Stable Diffusion XL

The Stability AI team has released Stable Diffusion XL (SDXL) 1.0, representing the next evolution in AI text-to-image generation models. This open-source model builds on the previous research-only SDXL 0.9 model to become the world's most capable publicly available image creation model.

Both quantitative analysis and qualitative human evaluations over weeks of experimentation demonstrate SDXL's capabilities to generate the highest quality and most preferred images compared to other open-source models.

Image from arxiv.org

Image from arxiv.org

This high quality was achieved by using an ensemble of two models - a 3.5 billion parameter base generator and a 6.6 billion parameter refiner. This dual pipeline maximizes image quality while remaining efficient enough to run on consumer GPUs.

Now, with SDXL 1.0, users no longer need long, complex prompts to generate stunning images. Its intelligence allows the creation of intricate images from just a few words.

Fine-tuning SDXL on custom datasets and tasks has become even easier. It allows us granular control over structure, style, and composition.

What is DreamBooth?

DreamBooth, introduced in 2022 by the Google research team, represents a significant advancement in the field of generative AI, particularly in the realm of text-to-image models like Stable Diffusion.

It is called DreamBooth because, in the words of the Google researchers:

"It's like a photo booth but captures the subject in a way that allows it to be synthesized wherever your dreams take you."

image1.png

Image from DreamBooth

DreamBooth allows you to inject a specific custom subject that the fine-tuned model then becomes specialized at rendering in different ways. So, in a sense, it opens up the possibility to create your own image generator focused on a particular person, character, object, or scene.

DreamBooth requires only a few (typically 3-5) images of the subject to train the model effectively. Once trained, the model can place the subject in a myriad of settings, scenes, and poses, limited only by the user's imagination.

DreamBooth Use Cases

Fine-tuning the image generation model with DreamBooth can be beneficial for many fields. It provides more freedom to experiment and generate high-quality images without Photoshop skills. Here are some examples of how it can easily add value to the workspace.

  1. Creative industries such as graphic design, advertising, and entertainment can greatly benefit from DreamBooth, as it offers a high level of customization and uniqueness in visual content creation.
  2. DreamBooth allows for personalization by creating scenarios that may be difficult to replicate in real life or purely fictional.
  3. It can also be applied in educational and research fields where visual representation is crucial, as it can be used to create personalized educational content or for research purposes.

In the following sections, we will delve deeper into the process of accessing and fine-tuning the SDXL model on a custom dataset using free Kaggle GPUs.

Accessing Stable Diffusion XL

We can try the Stable Diffusion XL Spaces demo on Hugging Face, which quickly generates four images based on your input. Play around before deciding if it's right for your application.

Image from Stable Diffusion XL on TPUv5e

Image from Stable Diffusion XL on TPUv5e

The other way is to use the Python library diffusers to generate the image using the custom prompt.

Setting Up

Before we jump into running the code, make sure you are running a GPU machine with the CUDA support.

!nvidia-smi

image4.png

After that, install the diffuser package using PIP.

%pip install --upgrade diffusers[torch] -q

Loading the Base Model and VAE Decoder

We will load the custom VAE decoder that is modified to run in fp16 precision without generating NaNs.

After that, we will build a pipeline by loading the base SDXL model with fp16 precision by integrating the VAE decoder in the diffuser's workflow.

We could have loaded base mode directly without “fp16”, but it would have caused GPU memory issues. So, to run it on Kaggle, Colab, and Laptop GPUs, you have to access the model in the fp16 variant.

from diffusers import DiffusionPipeline, AutoencoderKL
import torch

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda");

Running the Image Generation Pipeline

To generate the image, you have to provide a simple prompt, inference steps, and the number of images per prompt.

In our case, we will be generating four images from the single prompt.

prompt = "A man in a spacesuit is running a marathon in the jungle."

image = pipe(prompt=prompt, num_inference_steps=25, num_images_per_prompt = 4)

To display images in a grid, we can write a simple Python function that resizes the images and displays them in a grid.

from PIL import Image

def image_grid(imgs, rows, cols, resize=256):
    assert len(imgs) == rows * cols

    if resize is not None:
        imgs = [img.resize((resize, resize)) for img in imgs]

    w, h = imgs[0].size
    grid_w, grid_h = cols * w, rows * h
    grid = Image.new("RGB", size=(grid_w, grid_h))

    for i, img in enumerate(imgs):
        x = i % cols * w
        y = i // cols * h
        grid.paste(img, box=(x, y))

    return grid

We will display four images in a 2x2 grid to easily compare model output.

image_grid(image.images, 2, 2)

The results are amazing. Certainly, SDXL is a lot better than Stable Diffusion 1.6.

image3.png

Let’s write another prompt and generate three images of a monkey playing with fireworks.

prompt = "A playful monkey handling fireworks with ease."
image = pipe(prompt=prompt, num_inference_steps=25, num_images_per_prompt = 3)
image_grid(image.images, 1, 3)

image14.png

Learn how to access the Stable Diffusion model online and locally by following the How to Run Stable Diffusion tutorial.

Improve the Results with Refiner

To further improve the image quality and model accuracy, we will use Refiner.

  1. Load SDXL refiner 1.0 using diffusion pipeline.
  2. Generate the image with the base SDXL model. We will change the output type to “latent” which will return the latent representation of the input, rather than the reconstructed image.
  3. Provide the refiner with the prompt and latent representation of generated image.
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda")

n_steps = 40
high_noise_frac = 0.7

image = pipe(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_end=high_noise_frac,
    output_type="latent",
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_start=high_noise_frac,
    image=image,
).images[0]

image

The refined image is high quality and follows the prompt accurately.

image17.png

Learn how to generate photorealistic images using Python with state-of-the-art diffusion models by watching the Generating Photorealistic Images using AI with Diffusers in Python code-along.

Fine-Tuning SDXL using AutoTrain Advanced

Fine-tuning SDXL has become much easier with time. Thanks to AutoTrain Advance, we can now fine-tune our model with just one Python script. This Automatic Machine learning library is designed for training and deploying state-of-the-art machine-learning models with minimal code.

You can install the Python package using PIP:

%pip install -U autotrain-advanced

If you're interested in fine-tuning the SDXL model with accelerate and transformers, check out the SDXL DreamBoot LoRA Colab Notebook. The notebook is a bit outdated, and you might have to make changes in order to run it properly.

Setting Up

Before we run the DreamBooth script, we should set up some variables that we are going to use to run the script.

  1. Name the project.
  2. Location of the base model.
  3. Directory of the dataset.
  4. Repo ID for pushing the model to Hugging Face.
PROJECT_NAME = "Dreambooth_SDXL"
MODEL_NAME = "stabilityai/stable-diffusion-xl-base-1.0"
DATA_DIR = "/kaggle/input/abids-photos"
REPO_ID = "kingabzpro/sdxl-lora-abid"

Next, you will set up a Hugging Face token by using the Kaggle’s Secret feature. Learn how to set up secrets by reading Mistral 7B Tutorial: Step-by-Step Guide for Using Mistral 7B.

from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
HF_TOKEN = user_secrets.get_secret("HUGGINGFACE_TOKEN")

Creating the Private Dataset in Kaggle

To create the private dataset of your selfies in Kaggle, you have to click on the upload button on the right-side panel of the notebook.

image10.png

Add the dataset title and upload the zip file containing five photos of the selfies taken at different locations. In this case, it's the photos of the author.

image7.png

To view all five images in the Jupyter Notebook, we will load the images using PIL and feed it to our image_grid function.

from PIL import Image

def image_grid(imgs, rows, cols, resize=256):
    assert len(imgs) == rows * cols

    if resize is not None:
        imgs = [img.resize((resize, resize)) for img in imgs]

    w, h = imgs[0].size
    grid_w, grid_h = cols * w, rows * h
    grid = Image.new("RGB", size=(grid_w, grid_h))

    for i, img in enumerate(imgs):
        x = i % cols * w
        y = i // cols * h
        grid.paste(img, box=(x, y))

    return grid

import glob

We displayed five high-quality images of the author at different locations. Please ensure that all photos are high quality and of similar size.

imgs = [Image.open(path) for path in glob.glob("/kaggle/input/abids-photos/*.jpg")]
image_grid(imgs, 1, 5)

image11.png

AutoTrain Dream Booth Training Script

It is time to fine-tune our model on these five images using the AutoTrain DreamBooth script.

  1. Provide base model, project name, and data directory.
  2. Ensure the instance prompt is unique, describing the person or the object. Use the person's full name, "Abid Ali Awan," and describe the photo in detail.
  3. Keep the rest of the argument the same to get the best performance.
  4. Provide the Hugging Face access token and repository ID to upload model LoRA adopter to the Hugging Face Hub.
!autotrain dreambooth \
--model $MODEL_NAME \
--project-name $PROJECT_NAME \
--image-path $DATA_DIR \
--prompt "A photo of Abid Ali Awan wearing casual clothes, taking a selfie, and smiling." \
--resolution 1024 \
--batch-size 1 \
--num-steps 500 \
--gradient-accumulation 4 \
--lr 1e-4 \
--fp16 \
--gradient-checkpointing \
--push-to-hub \
--token $HF_TOKEN \
--repo-id $REPO_ID

It took 2 hours and 15 minutes to train the model and push the LoRA weights to Hugging Face.

LoRA, short for Low-Rank Adaptation, is a technique that involves adding small trainable layers to an existing model without modifying the original weights.

LoRA is beneficial because it allows the introduction of new concepts, such as art styles, characters, or themes, to the model without requiring extensive computation or memory usage.

Steps: 100%|█████████| 500/500 [2:15:11<00:00, 14.49s/it, loss=0.119, lr=0.0001]Model weights saved in Dreambooth_SDXL/pytorch_lora_weights.safetensors
Steps: 100%|█████████| 500/500 [2:15:11<00:00, 16.22s/it, loss=0.119, lr=0.0001]
pytorch_lora_weights.safetensors: 100%|████| 23.4M/23.4M [00:01<00:00, 11.9MB/s]

Instead of saving a 7GB model file, we will save a LoRA adapter of 23.4MB that will be attached to the base model during inference.

Image from kingabzpro/sdxl-lora-abid

Image from kingabzpro/sdxl-lora-abid

SDXL Fine-tuned Model Inference

We will create a diffusion pipeline to generate the image, using a custom VAE decoder and SDXL base model with FP16 precision.

Then, we will load the LoRA weights and add them to the base model using our Hugging Face repo ID.

In the end, we will run the pipeline with the prompt and generate three images.

from diffusers import DiffusionPipeline, AutoencoderKL, StableDiffusionXLImg2ImgPipeline

import torch

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda");
pipe.load_lora_weights(REPO_ID, weight_name="pytorch_lora_weights.safetensors")


prompt = "A photo of Abid Ali Awan participating in a marathon."

image = pipe(prompt=prompt, num_inference_steps=25, num_images_per_prompt = 3)
image_grid(image.images, 1, 3)

The results are impressive. We have successfully fine-tuned our model to generate images of the author “Abid Ali Awan”.

image6.png

Let’s try again with another prompt.

prompt = "A photo of Abid Ali Awan giving a press conference."

image = pipe(prompt=prompt, num_inference_steps=25, num_images_per_prompt = 3)
image_grid(image.images, 1, 3)

The results are once again exceptional.

image9.png

Using Refiner

In this section, we will improve the generated image with the SDXL Refiner 1.0.

First, we will generate the image using the Stable Diffusion pipeline and provide a manual seed to ensure reproducibility.

prompt = "A photo of Abid Ali Awan wearing a business suit at an office meeting."

seed = 65
generator = torch.Generator("cuda").manual_seed(seed)
image = pipe(prompt=prompt, num_inference_steps=25, generator=generator).images[0]
image.resize((300, 300))

image12.png

Next, we will load SDXL refiner and run the refiner pipeline with the same prompt, generator, and generated image.

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda");

image = refiner(prompt=prompt, num_inference_steps=25, generator=generator, image=image)
image.images[0].resize((300, 300))

The image refiner has enhanced the image quality, but the resulting image bears no resemblance to the original photo of the author. Instead, it features a generic businessman from South Asia. Perhaps, for fine-tuned models, it's better that we avoid using the refiner.

image16.png

After fine-tuning the SDXL model, the next step is to build a Generative AI application. For inspiration on creating your project, check out the 5 Projects Built with Generative Models and Open Source Tools blog.

Conclusion

In this tutorial, we have learned about the Stable Difusion XL model and DreamBooth technique. Additionally, we have learned to fine-tune Stable Diffusion XL using AutoTrain Advance’s DreamBooth script for personalized image generation.

After loading the base SDXL model, we trained a lightweight LoRA adapter using just five photos of the author. Then. LoRA weights were attached to base SDXL, allowing it to generate high-quality images of the author in various imagined scenarios while retaining the capabilities of the full multi-billion parameter model.

Although the SDXL Refiner enhanced the image quality, it was most effective for generic prompts and appeared to override some personalization achieved through fine-tuning. Nonetheless, the tutorial demonstrated how simple it is to fine-tune an SDXL model using AutoTrain Advance with just a few images.

If you're new to the world of AI and want to understand what all the hype is about, we suggest taking an AI Fundamentals course. This will help you learn about Generative AI and complex language models. Additionally, you can kickstart your career in AI engineering by enrolling in the Machine Learning Scientist with Python career track. This complete program will help you learn natural language processing and image processing, as well as popular Python machine learning packages such as scikit-learn, PySpark, and Keras.


Photo of Abid Ali Awan
Author
Abid Ali Awan

I am a certified data scientist who enjoys building machine learning applications and writing blogs on data science. I am currently focusing on content creation, editing, and working with large language models.

Topics

Start Your AI Journey Today!

Track

AI Fundamentals

10 hours hr
Discover the fundamentals of AI, dive into models like ChatGPT, and decode generative AI secrets to navigate the dynamic AI landscape.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

Generative AI Certifications in 2024: Options, Certificates and Top Courses

Unlock your potential with generative AI certifications. Explore career benefits and our guide to advancing in AI technology. Elevate your career today.
Adel Nehme's photo

Adel Nehme

6 min

What is DeepMind AlphaGeometry?

Discover AphaGeometry, an innovative AI model with unprecedented performance to solve geometry problems.
Javier Canales Luna's photo

Javier Canales Luna

8 min

[AI and the Modern Data Stack] Accelerating AI Workflows with Nuri Cankaya, VP of AI Marketing & La Tiffaney Santucci, AI Marketing Director at Intel

Richie, Nuri, and La Tiffaney explore AI’s impact on marketing analytics, how AI is being integrated into existing products, the workflow for implementing AI into business processes and the challenges that come with it, the democratization of AI, what the state of AGI might look like in the near future, and much more.
Richie Cotton's photo

Richie Cotton

52 min

OpenCV Tutorial: Unlock the Power of Visual Data Processing

This article provides a comprehensive guide on utilizing the OpenCV library for image and video processing within a Python environment. We dive into the wide range of image processing functionalities OpenCV offers, from basic techniques to more advanced applications.
Richmond Alake's photo

Richmond Alake

13 min

Building Intelligent Applications with Pinecone Canopy: A Beginner's Guide

Explore using Canopy as an open-source Retrieval Augmented Generation (RAG) framework and context built on top of the Pinecone vector database.
Kurtis Pykes 's photo

Kurtis Pykes

12 min

Semantic Search with Pinecone and OpenAI

A step-by-step guide to building semantic search applications using OpenAI and Pinecone in Python.
Moez Ali's photo

Moez Ali

13 min

See MoreSee More