DALL-E

What is DALL-E?

DALL-E is OpenAI's first AI model that could generate images from text descriptions, proving that a computer could create new pictures from written instructions.

At a glance

Type of model: Text-to-image generation model
Developed by: OpenAI
Key capability: Generating coherent images from natural language prompts, including novel combinations of concepts not seen during training
How it fits in AI workflow: The original DALL-E established text-to-image generation as a practical modality and is the ancestor of DALL-E 2 and DALL-E 3, which are the versions currently used in production creative workflows

How it compares

DALL-E is a proprietary model developed and controlled by OpenAI, accessed through their API and products. Stable Diffusion, by contrast, is a model whose weights are publicly available, enabling community customization, local deployment, and a wide ecosystem of fine-tuned variants. DALL-E prioritizes a managed, safety-filtered service and ease of use; Stable Diffusion prioritizes openness, flexibility, and community extension.


Pro tip

Understanding DALL-E's historical role helps contextualize the entire text-to-image generation field. When encountering literature, tutorials, or discussions about AI image generation from 2021 and 2022, DALL-E references typically mean the original model or DALL-E 2. Distinguishing between the three generations by their release context avoids confusion when evaluating older capability claims against current model performance.

Types and variations

  • The original DALL-E used a transformer-based autoregressive architecture and produced lower-resolution outputs relative to its successors.
  • DALL-E 2 replaced the architecture with a diffusion-based approach, significantly improving quality and enabling inpainting and outpainting.
  • DALL-E 3 further advanced prompt adherence, text rendering, and compositional sophistication.
  • Each version represents a distinct model with different capabilities, though they share the same founding concept and naming lineage.

Common use cases

  • Research and education contexts where the original model's historical significance and foundational capabilities are the subject of study.
  • Early commercial creative workflows where DALL-E outputs were used for concept exploration and ideation before higher-quality successors were available.
  • Demonstrations of AI creative capability to audiences unfamiliar with text-to-image generation.

The original DALL-E is rarely used for current production work, which typically relies on DALL-E 2, DALL-E 3, or third-party models.

FAQs

What is DALL-E?

DALL-E is OpenAI's original text-to-image generation model, announced in January 2021. It demonstrated that an AI trained on image-text pairs could generate coherent new images from natural language descriptions, including novel combinations of concepts not present in training data.

Who made DALL-E?

DALL-E was developed by OpenAI. The name combines references to Salvador Dalí and the Pixar character WALL-E, reflecting the project's creative and technological ambitions.

How is DALL-E different from DALL-E 2 and DALL-E 3?

The original DALL-E used a transformer-based autoregressive architecture and produced lower-resolution outputs. DALL-E 2 switched to a diffusion-based approach for significantly improved quality. DALL-E 3 added major advances in prompt adherence and text rendering. Each is a distinct model with different capabilities.

What architecture does DALL-E use?

The original DALL-E used a transformer architecture that processed image and text tokens together as a joint sequence. DALL-E 2 and DALL-E 3 use diffusion-based architectures, which have become the dominant approach in text-to-image generation.
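
The joint-sequence idea can be illustrated with a toy sketch. The sizes below (256 text tokens, a 32x32 grid of image tokens, vocabularies of 16,384 and 8,192) follow the figures reported for the original model; everything else is an assumption made for illustration, including the function names, the padding id, the id offset used to keep image codes separate from text ids, and the random stand-in that takes the place of the transformer. It is not OpenAI's implementation.

```python
import random

# Sizes reported for the original DALL-E; used here as illustrative constants.
TEXT_VOCAB = 16384      # BPE text vocabulary size
IMAGE_VOCAB = 8192      # discrete image codebook size
TEXT_LEN = 256          # maximum text tokens per caption
GRID = 32               # image tokens form a GRID x GRID layout
IMAGE_LEN = GRID * GRID
PAD_ID = 0              # hypothetical padding id for short captions


def build_prefix(text_tokens):
    """Pad or truncate the caption to a fixed-length text prefix."""
    clipped = list(text_tokens[:TEXT_LEN])
    return clipped + [PAD_ID] * (TEXT_LEN - len(clipped))


def toy_next_image_token(sequence, rng):
    """Stand-in for the transformer: returns a random image code.

    A real model would predict a distribution conditioned on `sequence`.
    """
    return rng.randrange(IMAGE_VOCAB)


def generate_joint_sequence(text_tokens, seed=0):
    """Append image tokens one at a time after the text prefix."""
    rng = random.Random(seed)
    sequence = build_prefix(text_tokens)
    for _ in range(IMAGE_LEN):
        code = toy_next_image_token(sequence, rng)
        # Offset image codes so they occupy an id range separate from text ids
        # (an assumption of this sketch).
        sequence.append(TEXT_VOCAB + code)
    return sequence


def split_image_grid(sequence):
    """Recover the GRID x GRID layout of image codes from the joint sequence."""
    codes = [tok - TEXT_VOCAB for tok in sequence[TEXT_LEN:]]
    return [codes[row * GRID:(row + 1) * GRID] for row in range(GRID)]
```

Calling `generate_joint_sequence([5, 9, 12])` yields a single list of 1,280 ids: the padded caption followed by 1,024 image codes. In the real system a separate decoder would turn that grid of codes back into pixels; that step is omitted here.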

Is DALL-E open source?

No. DALL-E and its successors are proprietary models developed and controlled by OpenAI. They are accessed through OpenAI's API and integrated products rather than being available as downloadable model weights.
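
As a rough illustration of what API access looks like, the sketch below builds (but does not send) an HTTP request for OpenAI's image generation endpoint using only the Python standard library. The helper name `build_image_request` is hypothetical, and the model identifiers and size options reflect what OpenAI has documented for DALL-E 2 and DALL-E 3; availability may change, so treat them as assumptions and check the current API reference before relying on them.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/images/generations"

# Size options as documented per model; assumptions that may go out of date.
ALLOWED_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}


def build_image_request(prompt, api_key, model="dall-e-3", size="1024x1024"):
    """Build (but do not send) a request asking for one generated image."""
    if model not in ALLOWED_SIZES:
        raise ValueError(f"unknown model: {model}")
    if size not in ALLOWED_SIZES[model]:
        raise ValueError(f"size {size} is not supported by {model}")
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` with a valid key would submit the request. There is no equivalent step for downloading the model itself, which is the practical meaning of "not open source" here.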

Why was DALL-E significant when it was released?

DALL-E was significant because it was one of the first publicly demonstrated AI systems capable of generating coherent, creative images from open-ended natural language descriptions at scale. It sparked widespread interest in generative AI's creative potential and established natural language as a creative interface for image generation.

What is DALL-E used for today?

The original DALL-E is primarily of historical and educational significance today. Current creative workflows typically use DALL-E 3, which is integrated into ChatGPT and Microsoft creative tools, or third-party models that have surpassed the original in quality and capability.

What kinds of images could the original DALL-E generate?

The original DALL-E could generate a wide range of images from text prompts, including novel conceptual combinations such as objects in unusual forms or settings. Its outputs were lower in resolution and consistency than current models but demonstrated the core principle of compositional generalization from language to imagery.
