If you're interested in generative AI, you've likely seen tools like Runway that generate video from an image. They're cool tools, and some of you, like me, want to run them yourself, on your own machine.
If that's you, you've come to the right place. We're going to install a tool to generate AI videos today. Stable Video Diffusion was just released, and we're going to try it out.
What we'll do:
- Install some pre-requisites
- Configure our system for Stable Video Diffusion
- Install the Stable Video Diffusion code and checkpoints, and run it all with Streamlit
- Generate a short video of our own
In this tutorial, I'm using Linux and Anaconda to run this. The directions should be mostly the same on Windows or macOS.
If this is your first time doing this kind of thing, don't worry. We'll start by setting up your environment with the right tools and dependencies and installing a few things. We aren't going to dive too deep into how this product works. We'll just install it and try it out.
Step 1: Install Dependencies
We're going to grab this repo from Stability AI.
git clone https://github.com/Stability-AI/generative-models.git && cd generative-models
Next, we'll create a virtual environment. It's essential to use Python 3.10; for some reason, the dependencies were fussy with newer versions.
conda create --name stablevideodiff python=3.10.0
conda activate stablevideodiff
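Before moving on, it's worth confirming the new environment picked up the interpreter we asked for (a quick sanity check, nothing specific to this repo):
python --version   # should report Python 3.10.x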
And then install all the requirements:
pip3 install -r requirements/pt2.txt
You'll see all the dependencies get installed.
But you may see an error like this:
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
This is sometimes due to the lack of a Rust compiler. You can install one with this command (on Linux):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Once it's done installing, you should see no errors.
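If you did hit the tokenizers error, here's a minimal recovery sequence, assuming a bash shell and that the rustup installer finished cleanly:
source "$HOME/.cargo/env"              # put the Rust toolchain on this shell's PATH
pip3 install -r requirements/pt2.txt   # retry the dependency install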
Step 2: A couple of finishing touches
First, in your application folder (generative-models), type the following:
pip3 install .
Then we want to install sdata for training:
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
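At this point, a quick smoke test helps confirm both installs worked. This assumes the repo's package is named sgm, which is what its demo scripts import:
python -c "import sgm, sdata; print('imports look good')"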
And now we're just about ready to let this thing cook.
Step 3: Running the Demo
OK, now let's run the demo.
Log in to Hugging Face. You can create an account for free if you don't have one.
You will need to download svd.safetensors into the checkpoints/ directory at the root of the app. If that directory doesn't exist, create it.
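One way to do that from the command line, assuming you have the huggingface_hub CLI installed and have accepted the model license on the stabilityai/stable-video-diffusion-img2vid page, is something like this (downloading the file in your browser and dropping it into checkpoints/ works just as well):
mkdir -p checkpoints
huggingface-cli login   # paste an access token from your Hugging Face account
huggingface-cli download stabilityai/stable-video-diffusion-img2vid svd.safetensors --local-dir checkpoints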
Note: I am running an RTX 4090 with 24G VRAM and have 64GB of RAM on the PC. I still had to do this:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
This tweaks how PyTorch's CUDA allocator manages memory, so you're less likely to run out as you generate. I did a few times before adding it.
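If you want to keep an eye on how close you're getting to the limit, watching the GPU from a second terminal is an easy way to do it (standard NVIDIA tooling, nothing specific to this project):
watch -n 1 nvidia-smi   # refresh GPU memory usage every second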
Now we can fire up the web app:
PYTHONPATH=. streamlit run scripts/demo/video_sampling.py --server.port 8080
Open http://localhost:8080 in your browser, and you should see the Streamlit demo interface.
So let's try it out. Click "load model".
It will take a little while to load the model. Once it's done, you'll see a message confirming the model is ready. Awesome.
Now I'll load up an image sample using the "Browse files" button in the demo interface.
I also want to set the number of frames decoded at a time down to 2, so I don't run out of VRAM. Again, I have 24G to work with, but I have to be careful; PyTorch is happy to use all of it.
Then click sample, and you'll see it start generating.
And there we have it!! We produced a video.
It's only 2 seconds long but pretty awesome. You can download it here
I love it and can't wait to do more of these.
What did we make?
So here, I generated exactly 14 frames from an image. At 7 frames per second, that works out to 2 seconds of video.
You can change that in the demo's settings, where you set how many total frames to generate. I've managed to get up to 3 seconds before running out of VRAM.
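As a rough rule of thumb (just arithmetic, nothing specific to this demo): total frames = seconds × frames per second, so the default 14 frames at 7 fps is 2 seconds, and 3 seconds at 7 fps means asking for roughly 21 frames.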
Your results may vary.
Conclusion
Congratulations, you did it! In this tutorial, we set up an environment for Stable Video Diffusion, installed it, and ran it. This is an excellent way to get familiar with generative AI models and how to tune them. You can produce some awesome stuff once you know your way around. Experiment with different settings, frame counts, and images to see how far you can push this thing. Your results may vary based on hardware limitations; I'm running a 4090, as I mentioned, and it still gets taxed. But like with all these things, it's about the learning experience.
What are you doing with Stable Diffusion or Generative AI today? Let me know! Let's talk.
Also, if you have any questions or comments, feel free to reach out.
Happy hacking!