Diffusion Models
What are Diffusion Models?
Diffusion models learn to make images by starting with random noise and gradually cleaning it up, step by step, until a coherent picture emerges that matches a text prompt or other instructions.
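The loop described above, start from noise and clean it up step by step, can be sketched in a few lines. This is a toy illustration and not how any particular product is implemented. It assumes one-dimensional "data" drawn from a Gaussian so that the ideal denoiser has a closed form and no neural network is needed. The names `DATA_MEAN`, `DATA_STD`, `NUM_STEPS`, `alpha_bar`, `predict_clean`, and `sample` are hypothetical, and the cosine noise schedule and deterministic (DDIM-style) update are choices made only for this sketch.

```python
import math
import random

# Toy assumptions: the "dataset" is a 1-D Gaussian, so the best possible
# denoiser can be written as a formula instead of a trained network.
DATA_MEAN, DATA_STD = 3.0, 0.5
NUM_STEPS = 50


def alpha_bar(step):
    """Share of clean signal left at a step: 1.0 at step 0, about 0 at NUM_STEPS."""
    return math.cos(0.5 * math.pi * step / NUM_STEPS) ** 2


def predict_clean(x_t, step):
    """Stand-in for a trained denoiser: the expected clean value given noisy x_t."""
    ab = alpha_bar(step)
    var_t = ab * DATA_STD ** 2 + (1.0 - ab)  # variance of noisy data at this step
    return DATA_MEAN + math.sqrt(ab) * DATA_STD ** 2 * (
        x_t - math.sqrt(ab) * DATA_MEAN
    ) / var_t


def sample(rng):
    """Generate one value by starting from pure noise and denoising step by step."""
    x = rng.gauss(0.0, 1.0)  # pure noise
    for step in range(NUM_STEPS, 0, -1):
        ab, ab_prev = alpha_bar(step), alpha_bar(step - 1)
        x0_hat = predict_clean(x, step)  # current guess of the clean value
        noise_hat = (x - math.sqrt(ab) * x0_hat) / math.sqrt(1.0 - ab)
        # Re-mix the guess and the estimated noise at the next, lower noise level.
        x = math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * noise_hat
    return x


rng = random.Random(0)
samples = [sample(rng) for _ in range(1000)]
print("mean of generated samples:", sum(samples) / len(samples))
```

In a real image model, `predict_clean` would be a large neural network conditioned on the text prompt, and `x` would be an image or latent tensor rather than a single number. The structure of the loop is the part that carries over.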
At a glance
- Also known as: Denoising diffusion models, score-based generative models, latent diffusion models (for the latent space variant)
- Used for: Text-to-image generation, image editing and inpainting, video generation, audio generation, custom model fine-tuning
- Common tools: Stable Diffusion, DALL-E 2, DALL-E 3, Midjourney, Imagen, AI video generation platforms
How it compares
Generative adversarial networks (GANs) were the dominant image generation architecture before diffusion models. A GAN trains two competing networks, a generator and a discriminator, against each other. GANs can produce sharp images, but they are often unstable to train, prone to mode collapse, and tend to produce less diverse outputs. Diffusion models are generally more stable to train, produce more diverse outputs, handle conditioning more reliably, and scale better with additional compute, which is why they have largely replaced GANs as the dominant approach for high-quality image and video generation.
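One commonly cited reason for the difference in training stability is that a diffusion model is trained with a plain regression target and no opposing network: noise is mixed into a clean example, and the model is scored on how well it predicts that noise. The sketch below shows how one such training pair could be built. The function names `noisy_training_pair` and `mse`, and the parameter `signal_kept`, are hypothetical and chosen for illustration; a real training loop would use image tensors and a neural network.

```python
import math
import random


def noisy_training_pair(clean, signal_kept, rng):
    """Mix Gaussian noise into clean data; return (noisy input, noise target).

    signal_kept is the share of clean signal that survives (1.0 = no noise).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [
        math.sqrt(signal_kept) * c + math.sqrt(1.0 - signal_kept) * n
        for c, n in zip(clean, noise)
    ]
    return noisy, noise


def mse(predicted, target):
    """Mean squared error: the regression loss between predicted and true noise."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


rng = random.Random(0)
clean = [0.2, -0.5, 0.9]  # stand-in for pixel values
noisy, noise = noisy_training_pair(clean, signal_kept=0.6, rng=rng)

# A model that always guesses "no noise" gets a nonzero loss; a perfect
# prediction gets zero. Training pushes the model toward the latter.
print("loss for an all-zero guess:", mse([0.0] * len(noise), noise))
print("loss for a perfect guess:", mse(noise, noise))
```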
Pro tip
When using diffusion-based tools, the number of denoising steps, often called inference steps or sampling steps in the interface, directly affects both quality and generation time. More steps give the model more opportunities to refine the image, generally producing better detail and coherence, but each step takes time. For rapid concept exploration, lower step counts produce usable results quickly. For final-quality generations, higher step counts extract more detail from the model. Finding the minimum step count that produces acceptable quality for your use case is a practical way to balance speed and output quality.
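To make the trade-off concrete, here is a small sketch of two ideas from the tip above: picking a reduced set of denoising steps, and searching for the smallest step count that is good enough. Everything in it is an assumption for illustration. `pick_timesteps`, `smallest_acceptable_steps`, and `made_up_quality` are hypothetical names, the 1000-step training schedule is an assumed default, and the quality curve is invented; in practice the quality score would come from looking at real outputs.

```python
import math


def pick_timesteps(num_inference_steps, num_train_steps=1000):
    """Evenly spaced noise levels, ordered from most noisy to least noisy."""
    if not 1 <= num_inference_steps <= num_train_steps:
        raise ValueError("num_inference_steps must be between 1 and num_train_steps")
    stride = num_train_steps / num_inference_steps
    return [int(i * stride) for i in range(num_inference_steps)][::-1]


def smallest_acceptable_steps(candidates, quality_of, threshold):
    """Return the lowest step count whose quality score meets the threshold."""
    for steps in sorted(candidates):
        if quality_of(steps) >= threshold:
            return steps
    return None  # no candidate was good enough


def made_up_quality(steps):
    """Invented curve with diminishing returns; replace with a real evaluation."""
    return 1.0 - math.exp(-steps / 10.0)


print(pick_timesteps(4))
print(smallest_acceptable_steps([10, 20, 30, 50], made_up_quality, threshold=0.9))
```

Because each step runs the model once, generation time grows roughly in proportion to the length of the list that `pick_timesteps` returns.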
Types and variations
- Pixel-space diffusion models operate directly on full-resolution image pixels, requiring significant computational resources.
- Latent diffusion models, including Stable Diffusion, operate in a compressed latent space rather than on pixels directly, substantially reducing computational requirements while maintaining output quality.
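- Score-based models learn the gradient of the log-density of the data (the score) at different noise levels and generate samples by following it from noise toward data. The formulation is mathematically closely related to denoising diffusion and achieves similar generation quality.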
- Video diffusion models extend the architecture to the temporal dimension, generating coherent sequences of frames rather than individual images.
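The difference between the pixel-space and latent-space variants above is easiest to see as arithmetic. The sketch below counts how many values the model has to denoise in each case. It assumes the figures commonly cited for the original Stable Diffusion autoencoder (8x downsampling per side, 4 latent channels); other models use other settings, and `tensor_size` and `latent_shape` are hypothetical helper names.

```python
def tensor_size(height, width, channels):
    """Number of values in an image or latent of the given shape."""
    return height * width * channels


def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the compressed latent for an image (assumed autoencoder settings)."""
    return height // downsample, width // downsample, latent_channels


pixel_values = tensor_size(512, 512, 3)                 # RGB image
latent_values = tensor_size(*latent_shape(512, 512))    # compressed latent
print("pixel-space values:", pixel_values)
print("latent-space values:", latent_values)
print("reduction factor:", pixel_values // latent_values)
```

Compute cost does not scale exactly with the number of values, but the size of the gap gives a sense of why latent diffusion runs on far more modest hardware.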
Common use cases
- Generating images from text prompts across creative, commercial, and research applications.
- Inpainting and outpainting existing images by replacing or extending regions using diffusion-based generation.
- Fine-tuning pre-trained diffusion models on custom datasets to produce specialized character models, style-consistent generators, or domain-specific tools.
- Video generation using temporal diffusion model architectures that produce coherent motion across multiple frames.
- Research into generative AI capabilities, alignment, and safety using diffusion model frameworks.