DALL-E
What is DALL-E?
DALL-E is OpenAI's first text-to-image model. It showed that an AI system could generate new pictures from written descriptions rather than only retrieving or editing existing ones.
At a glance
- Type of model: Text-to-image generation model
- Developed by: OpenAI
- Key capability: Generating coherent images from natural language prompts, including novel combinations of concepts not seen during training
- How it fits in AI workflow: The original DALL-E established text-to-image generation as a practical modality and is the ancestor of DALL-E 2 and DALL-E 3, which are the versions currently used in production creative workflows
How it compares
DALL-E is a proprietary model developed and controlled by OpenAI, accessed through its API and products. Stable Diffusion, by contrast, is an open-source model whose weights are publicly available, enabling community customization, local deployment, and a wide ecosystem of fine-tuned variants. DALL-E prioritizes commercial safety and ease of use; Stable Diffusion prioritizes openness, flexibility, and community extension.
Pro tip
Understanding DALL-E's historical role helps contextualize the entire text-to-image generation field. When encountering literature, tutorials, or discussions about AI image generation from 2021 and 2022, DALL-E references typically mean the original model or DALL-E 2. Distinguishing between the three generations by their release context avoids confusion when evaluating older capability claims against current model performance.
Types and variations
- The original DALL-E used a transformer-based autoregressive architecture and produced lower-resolution outputs relative to its successors.
- DALL-E 2 replaced the architecture with a diffusion-based approach, significantly improving quality and enabling inpainting and outpainting.
- DALL-E 3 further advanced prompt adherence, text rendering, and compositional sophistication.
- Each version represents a distinct model with different capabilities, though they share the same founding concept and naming lineage.
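To make the autoregressive approach of the original model more concrete, the toy sketch below treats the text prompt and the image as one token sequence and samples image tokens one at a time, each conditioned on everything before it. The function `toy_model` is a hypothetical stand-in for the transformer, the prompt token ids are made up, and the default sizes are figures reported for the original DALL-E used here only for illustration. In the real system a separate decoder turns the finished token grid into pixels, which is not shown.

```python
import random

# Sequence sizes reported for the original DALL-E (illustrative defaults only).
TEXT_LEN = 256          # text tokens per prompt
IMAGE_LEN = 32 * 32     # image tokens per picture, a 32x32 grid
IMAGE_VOCAB = 8192      # size of the image-token codebook


def toy_model(sequence, vocab_size):
    """Hypothetical stand-in for the transformer.

    Returns one non-negative weight per candidate image token, derived
    from the tokens seen so far. A real model would compute these with
    attention over the whole joint text-and-image sequence.
    """
    rng = random.Random(sum(sequence) + len(sequence))
    return [rng.random() for _ in range(vocab_size)]


def generate_image_tokens(text_tokens, image_len=IMAGE_LEN,
                          image_vocab=IMAGE_VOCAB, seed=0):
    """Sample image tokens one at a time, conditioned on text and prior tokens."""
    sampler = random.Random(seed)
    sequence = list(text_tokens)            # the joint sequence starts as text only
    for _ in range(image_len):
        weights = toy_model(sequence, image_vocab)
        next_token = sampler.choices(range(image_vocab), weights=weights, k=1)[0]
        sequence.append(next_token)         # each new token becomes context
    return sequence[len(text_tokens):]      # keep only the image part


if __name__ == "__main__":
    prompt_tokens = [12, 845, 77]           # made-up token ids for a short prompt
    # Small sizes keep the demo fast; the defaults mirror the reported scale.
    grid = generate_image_tokens(prompt_tokens, image_len=16, image_vocab=32)
    print(len(grid), "image tokens sampled")
```

The loop is the point of the sketch: generation is strictly sequential, which is one reason diffusion-based successors, which refine the whole image over repeated denoising steps, behave so differently.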
Common use cases
- Research and education contexts where the original model's historical significance and foundational capabilities are the subject of study.
- Early commercial creative workflows where DALL-E outputs were used for concept exploration and ideation before higher-quality successors were available.
- Demonstrations of AI creative capability to audiences unfamiliar with text-to-image generation.
- The original DALL-E is less commonly used for current production work, which typically relies on DALL-E 2, DALL-E 3, or third-party models.
FAQs
What is DALL-E?
DALL-E is OpenAI's original text-to-image generation model, announced in January 2021. It demonstrated that an AI trained on image-text pairs could generate coherent new images from natural language descriptions, including novel combinations of concepts not present in training data.
Who developed DALL-E?
DALL-E was developed by OpenAI. The name combines references to Salvador Dalí and the Pixar character WALL-E, reflecting the project's creative and technological ambitions.
How does the original DALL-E differ from DALL-E 2 and DALL-E 3?
The original DALL-E used a transformer-based autoregressive architecture and produced lower-resolution outputs. DALL-E 2 switched to a diffusion-based approach for significantly improved quality. DALL-E 3 added major advances in prompt adherence and text rendering. Each is a distinct model with different capabilities.
What architecture does DALL-E use?
The original DALL-E used a transformer architecture that processed text and image tokens together as a single joint sequence. DALL-E 2 and DALL-E 3 use diffusion-based architectures, which have become the dominant approach in text-to-image generation.
Is DALL-E open source?
No. DALL-E and its successors are proprietary models developed and controlled by OpenAI. They are accessed through OpenAI's API and integrated products rather than being available as downloadable model weights.
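As a rough illustration of what API access looks like in practice, the sketch below builds an HTTP request for OpenAI's image generation endpoint using only the Python standard library. The helper name `build_image_request` is hypothetical, and the model name, image size, and response fields are assumptions that should be checked against OpenAI's current API reference, since available models change over time. The request is only sent if an `OPENAI_API_KEY` environment variable is set.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt, api_key, model="dall-e-3", size="1024x1024"):
    """Build (but do not send) an HTTP request for one generated image.

    The model name and size are assumptions; check the current API
    reference for the values that are actually supported.
    """
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        request = build_image_request("an armchair in the shape of an avocado", key)
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
        # Assumed response shape: a "data" list whose items carry an image URL.
        print(body["data"][0].get("url"))
    else:
        print("Set OPENAI_API_KEY to send the request.")
```

Separating request construction from sending keeps the example inspectable without credentials; the official client libraries wrap the same endpoint.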
Why was DALL-E significant?
DALL-E was significant because it was one of the first publicly demonstrated AI systems capable of generating coherent, creative images from open-ended natural language descriptions at scale. It sparked widespread interest in generative AI's creative potential and established natural language as a creative interface for image generation.
Is the original DALL-E still used today?
The original DALL-E is primarily of historical and educational significance today. Current creative workflows typically use DALL-E 3, which is integrated into ChatGPT and Microsoft creative tools, or third-party models that have surpassed the original in quality and capability.
What could the original DALL-E generate?
The original DALL-E could generate a wide range of images from text prompts, including novel conceptual combinations such as objects in unusual forms or settings. Its outputs were lower in resolution and consistency than current models but demonstrated the core principle of compositional generalization from language to imagery.