This is a simplified guide to an AI model called Stable-Diffusion-XL-Base-1.0, maintained by Stability AI. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.
Model overview
The stable-diffusion-xl-base-1.0 model is a text-to-image generative model developed by Stability AI. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders, OpenCLIP-ViT/G and CLIP-ViT/L. The model can be used to generate and modify images based on text prompts, and it forms the base of the SDXL ensemble-of-experts pipeline for latent diffusion, which adds a specialized refinement model for the final denoising steps.
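As a rough illustration of basic usage, the model can be loaded through Hugging Face's diffusers library. This is a minimal sketch, assuming a recent diffusers release with SDXL support and a CUDA-capable GPU; the model ID is the official Stability AI repository on Hugging Face.

```python
import torch
from diffusers import DiffusionPipeline

# Load the SDXL base model; both text encoders
# (OpenCLIP-ViT/G and CLIP-ViT/L) ship inside the pipeline.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

# Generate an image from a text prompt.
image = pipe(prompt="a beautiful sunset over a mountain landscape").images[0]
image.save("sunset.png")
```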
Similar models include oot_diffusion_dc, a full-body version; kandinsky-2, a text2img model trained on LAION HighRes and fine-tuned on internal datasets; and pixart-sigma, a model for 4K text-to-image generation.
Model inputs and outputs
The stable-diffusion-xl-base-1.0 model takes text prompts as input and generates corresponding images as output. It can be used as a standalone module or as part of a two-stage pipeline, where the base model generates latents that are then further refined by a specialized high-resolution model.
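Here is a sketch of that two-stage flow, again assuming the diffusers library; the 0.8 handoff point and 40-step schedule mirror the values used in the diffusers documentation and are tunable rather than required.

```python
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # share weights with the base model
    vae=base.vae,
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

prompt = "a beautiful sunset over a mountain landscape"

# Stage 1: the base model runs the first 80% of the denoising steps
# and returns latents rather than a decoded image.
latents = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="latent",
).images

# Stage 2: the refiner takes over for the final 20% of the steps.
image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=latents,
).images[0]
image.save("sunset_refined.png")
```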
Inputs
- Text prompt: A description of the desired image, such as "a beautiful sunset over a mountain landscape".
Outputs
- Generated image: An image that corresponds to the input text prompt, generated using the model's diffusion-based text-to-image capabilities.
Capabilities
The stable-diffusion-xl-base-1.0 mod...