How to Generate AI Images with Stable Diffusion XL in 5 Minutes

Jeremy Morgan - Mar 12 - Dev Community

Want to generate awesome AI images from your machine?

Stable Diffusion XL is one of the best local image generators out there, and here's how you can set it up in minutes.

Note: you'll need a capable graphics card for this. At least 4 GB of VRAM is required, and you'll be much better off with 8 GB or more.

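We'll use Stable Diffusion XL on a Linux system. The instructions are the same on Windows if you use WSL. On a Mac most of the steps carry over, but Macs don't support CUDA, so the "cuda" device used in the code below would need to change.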

Step 1: Create a Python Virtual Environment

Let's set up a Python Virtual Environment. This helps us manage dependencies and keep our project clean.



python -m venv stablediff



Activate the virtual environment with this command:



source stablediff/bin/activate



If it worked, you'll see the environment name in parentheses, (stablediff), before your prompt.
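
If you'd rather confirm it from Python itself, a venv-style environment can be detected by comparing sys.prefix with sys.base_prefix: inside a venv they differ. This is a minimal sketch; the in_virtualenv name is just illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; run 'source stablediff/bin/activate' first.")
```

Save it as a file and run it with the environment's python to check that activation worked.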

Now, we'll install our dependencies!

Step 2: Install Dependencies

First, install these packages:



pip install invisible_watermark transformers accelerate safetensors xformers



Lastly, install diffusers, since it will downgrade packages to make everything work together:



pip install diffusers
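
Before moving on, you can check that everything installed. This sketch uses the standard library's importlib.metadata to look up each package's version without importing it. The package list mirrors the pip commands above, plus torch, which I'm assuming was pulled in as a dependency of xformers (worth verifying on your setup). The installed_versions helper is a hypothetical name.

```python
from __future__ import annotations

from importlib import metadata

# Packages from Step 2 (PyPI names), plus torch, assumed to come in via xformers.
REQUIRED = [
    "diffusers",
    "transformers",
    "accelerate",
    "safetensors",
    "xformers",
    "invisible-watermark",
    "torch",
]


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each package name to its installed version, or None if it's missing."""
    versions: dict[str, str | None] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:20} {version or 'NOT INSTALLED'}")
```

Anything listed as NOT INSTALLED should be installed with pip before continuing.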



Now you're ready to create your Python file and generate some images!

Step 3: Create a Simple Generator

Create a file named app.py (or whatever you like).

In that file, let's import our libraries:



from diffusers import DiffusionPipeline
import torch



Next, we want to initialize the pipeline to generate images. We're going to use the stable-diffusion-xl-base-1.0 pretrained model.

We'll set the data type to float16 for memory efficiency, download the fp16 variant of the weights, and load them from safetensors files:



pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
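
Why float16 helps: model weights take roughly (number of parameters) × (bytes per parameter) of memory, and float16 stores each value in 2 bytes instead of float32's 4, so it halves the weight footprint. Here's a back-of-the-envelope helper. It uses a hypothetical 1-billion-parameter model, not SDXL's actual size, and it ignores activations and other runtime overhead; the weights_gb name is illustrative.

```python
# Bytes per value for a few common PyTorch dtypes (plain strings here).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_gb(num_params: int, dtype: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical model size, not SDXL's real parameter count
    print(f"float32: {weights_gb(n, 'float32'):.1f} GB")  # 4.0 GB
    print(f"float16: {weights_gb(n, 'float16'):.1f} GB")  # 2.0 GB
```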



Next, we'll send the pipeline to the GPU:



pipe.to("cuda")
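
pipe.to("cuda") assumes an NVIDIA GPU. If you want the same script to run on other machines, a small helper can pick a device at runtime. This sketch assumes PyTorch's torch.cuda.is_available() and torch.backends.mps.is_available() checks (MPS is the Apple silicon backend); the pick_device name is hypothetical, and generating on the CPU would be very slow.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what this machine supports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to use.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists on newer PyTorch releases, so guard it.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"Using device: {device}")
    # In app.py you would then call: pipe.to(device)
```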



Next, define the text prompt to send to the model. It can be anything you want.



prompt = "An anthropomorphic poodle riding a dirt bike through the forest"
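
The pipeline call accepts more than just the prompt. This sketch collects a few commonly used SDXL pipeline arguments (negative_prompt, num_inference_steps, guidance_scale, and a seeded generator for repeatable results) into a dict that you can unpack with pipe(**kwargs). The generation_kwargs helper and its default values are my own illustrative choices, not the library's defaults.

```python
from __future__ import annotations


def generation_kwargs(
    prompt: str,
    negative_prompt: str | None = None,
    steps: int = 30,
    guidance_scale: float = 7.0,
    seed: int | None = None,
    device: str = "cuda",
) -> dict:
    """Build keyword arguments for an SDXL pipeline call (values are examples)."""
    kwargs = {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    if negative_prompt:
        # Things you'd like the model to steer away from.
        kwargs["negative_prompt"] = negative_prompt
    if seed is not None:
        import torch  # only needed when a fixed seed is requested

        # A seeded generator makes repeated runs with the same settings repeatable.
        kwargs["generator"] = torch.Generator(device=device).manual_seed(seed)
    return kwargs


if __name__ == "__main__":
    kwargs = generation_kwargs(
        "An anthropomorphic poodle riding a dirt bike through the forest",
        negative_prompt="blurry, low quality",
    )
    print(kwargs)
    # In app.py: image = pipe(**kwargs).images[0]
```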



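You may also want to enable xformers for memory-efficient attention. This is optional: with PyTorch 2.0 or newer, diffusers already uses an efficient attention implementation by default.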



pipe.enable_xformers_memory_efficient_attention()
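
enable_xformers_memory_efficient_attention() can raise an error if xformers isn't installed or doesn't match your PyTorch build. On cards near the 4 GB minimum, you may also need stronger savings. The sketch below is hedged: the helper names and the 8 GB threshold (borrowed from the recommendation at the top of this article) are my own illustrative choices. enable_model_cpu_offload() is a diffusers pipeline method that keeps submodules on the CPU until they're needed, and it replaces the pipe.to("cuda") call rather than adding to it.

```python
from __future__ import annotations

LOW_VRAM_GB = 8.0  # hypothetical threshold, taken from the "8 GB or more" advice


def total_vram_gb() -> float | None:
    """Total memory of the first CUDA GPU in GB, or None if there isn't one."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


def choose_strategy(vram_gb: float) -> str:
    """Pick "gpu" for roomy cards and "cpu_offload" for smaller ones."""
    return "gpu" if vram_gb >= LOW_VRAM_GB else "cpu_offload"


def configure_memory(pipe, strategy: str) -> None:
    """Apply the chosen strategy to an already-loaded diffusers pipeline."""
    if strategy == "gpu":
        pipe.to("cuda")
    else:
        # Moves submodules to the GPU only while they run; requires accelerate.
        pipe.enable_model_cpu_offload()
    try:
        pipe.enable_xformers_memory_efficient_attention()
    except Exception as err:  # xformers missing or incompatible
        print(f"xformers not enabled ({err}); using default attention.")


if __name__ == "__main__":
    vram = total_vram_gb()
    if vram is None:
        print("No CUDA GPU detected.")
    else:
        print(f"{vram:.1f} GB VRAM, strategy: {choose_strategy(vram)}")
```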



Now, we can generate our image!



image = pipe(prompt=prompt).images[0]

Once the image is generated, we can save it:

image.save("output.png")
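
Saving to a fixed name means each run overwrites the previous output.png. If you'd like to keep every result, a small helper can add a timestamp to the filename. The unique_output_path name and naming scheme are just one possible choice.

```python
from datetime import datetime
from pathlib import Path


def unique_output_path(directory: str = ".", stem: str = "output", suffix: str = ".png") -> Path:
    """Return a path like output_20240312-153000.png that doesn't exist yet."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = folder / f"{stem}_{timestamp}{suffix}"
    counter = 1
    # Two saves within the same second would collide, so append a counter.
    while path.exists():
        path = folder / f"{stem}_{timestamp}_{counter}{suffix}"
        counter += 1
    return path


if __name__ == "__main__":
    print(unique_output_path())
    # In app.py, pass this path to the generated image's save() method.
```

PIL's save() accepts a pathlib.Path as well as a string, so the returned path can be passed straight in.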



Step 4: Run it!

Now run the file with python app.py, and you'll get a cool image!

On the first run, the model weights download from Hugging Face, and then a progress bar tracks the denoising steps. When it finishes, you'll have a generated image (output.png) in the same folder!

Pretty awesome, right?
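
Editing app.py every time you want a new prompt gets old quickly. One option, sketched here with the standard library's argparse, is to read the prompt and output filename from the command line, so you can run something like python app.py "a cat astronaut" --output cat.png. The argument names and defaults are my own illustrative choices.

```python
from __future__ import annotations

import argparse


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the prompt, output path, and step count for app.py."""
    parser = argparse.ArgumentParser(description="Generate an image with SDXL.")
    parser.add_argument(
        "prompt",
        nargs="?",
        default="An anthropomorphic poodle riding a dirt bike through the forest",
        help="Text prompt describing the image",
    )
    parser.add_argument("--output", default="output.png", help="Where to save the image")
    parser.add_argument("--steps", type=int, default=30, help="Denoising steps (example default)")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args)
    # In app.py:
    # image = pipe(prompt=args.prompt, num_inference_steps=args.steps).images[0]
    # image.save(args.output)
```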

Using Base + Refiner

You can get even better quality images using an "ensemble of experts" pattern with a base and a refiner:



from diffusers import DiffusionPipeline
import torch

# load both base & refiner
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
base.to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda")

# Define the total number of steps and the fraction run by each expert (80/20 here)
n_steps = 40
high_noise_frac = 0.8

prompt = "An anthropomorphic poodle riding a dirt bike through the forest"

# run both experts
image = base(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_end=high_noise_frac,
    output_type="latent",
).images
image = refiner(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_start=high_noise_frac,
    image=image,
).images[0]

# save the refined image
image.save("output.png")



This produces some nice results, and you can tweak n_steps and high_noise_frac to change how the work is split between the two models.
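
To get a feel for what n_steps and high_noise_frac do: the base model handles roughly the first high_noise_frac of the denoising schedule, and the refiner handles the rest. The helper below is only an approximation (the exact counts diffusers uses depend on the scheduler's timesteps), and the split_steps name is illustrative.

```python
from __future__ import annotations


def split_steps(n_steps: int, high_noise_frac: float) -> tuple[int, int]:
    """Approximate (base_steps, refiner_steps) for the base + refiner handoff."""
    if not 0.0 < high_noise_frac < 1.0:
        raise ValueError("high_noise_frac should be between 0 and 1")
    base_steps = round(n_steps * high_noise_frac)
    return base_steps, n_steps - base_steps


if __name__ == "__main__":
    for frac in (0.7, 0.8, 0.9):
        base, refiner = split_steps(40, frac)
        print(f"high_noise_frac={frac}: base ~{base} steps, refiner ~{refiner} steps")
```

Raising high_noise_frac gives the base model more of the work; lowering it hands more of the fine detail to the refiner.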

Awesome!

Conclusion

This is the easiest, lowest-overhead way I know to run Stable Diffusion XL. If you prefer, you can also use the Stable Diffusion Web UI, which gives you easy access to lots of controls and lets you swap models and refiners easily.

Follow me for more cool stuff like this!
