Multi-modal

Available now

Grok Imagine

by xAI

xAI's cross‑modal model.
Text‑to‑image, image edits, text‑to‑video, image‑to‑video, and video‑to‑video in one consistent visual system.

Key features

True cross-modal generation

One of the few models strong at both image and video. Create stills and motion with the same model, with a consistent visual style.

Five generation modes

A broad range of input/output paths: text-to-image, image edits, text-to-video, image-to-video, and video-to-video are all available in one model.

Video-to-video transformation

Transform existing videos. Change the style, rework the look, or alter the environment while keeping the original motion and timing intact.

Image editing & enhancement

Edit and enhance existing images with text instructions. Modify elements, shift style, or apply broad changes via image-to-image.

Strong prompt adherence

Reliable interpretation of creative directions across all five modes. Complex text descriptions land as accurate visual output.

Seamless format pipeline

Design an image, then animate it. Or transform a video, pull frames, and re-edit. The cross-modal design keeps the workflow fluid.
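The five modes and the "design, then animate" pipeline described above can be pictured as a small type check: each mode takes one kind of input and produces one kind of output, and a chain works only when each step's output matches the next step's input. The sketch below is purely illustrative. The mode labels come from the feature list, while the `MODES` table and the `chain_is_valid` helper are hypothetical names, not part of any xAI or Morphic API.

```python
# Each mode maps an input type to an output type.
# Labels follow the feature list; the table itself is an illustrative assumption.
MODES = {
    "text-to-image": ("text", "image"),
    "image-edit": ("image", "image"),
    "text-to-video": ("text", "video"),
    "image-to-video": ("image", "video"),
    "video-to-video": ("video", "video"),
}


def chain_is_valid(steps: list[str]) -> bool:
    """Return True if every step's output type is the next step's input type."""
    for current, following in zip(steps, steps[1:]):
        if MODES[current][1] != MODES[following][0]:
            return False
    return True


# Design a still, refine it, then animate it.
print(chain_is_valid(["text-to-image", "image-edit", "image-to-video"]))

# A video cannot feed an image edit directly; frames would have to be
# pulled out first, which is a separate step outside these five modes.
print(chain_is_valid(["text-to-video", "image-edit"]))
```

The first chain type-checks and the second does not, which is why the "pull frames" step sits between transforming a video and re-editing it.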

A jellyfish drifting through a desert sky at sunset, ambient audio

Technical specifications

1080p

Full HD output

6–10s

6–10 seconds for video

TTI, ITI, TTV, ITV, VTV

Text-to-image, image-to-image, text-to-video, image-to-video, video-to-video

24 fps

Standard frame rate
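As a rough illustration of what these numbers imply, the sketch below works out how many frames a clip contains at the listed length and frame rate. It assumes "1080p" means 1920×1080 and that clips run at exactly 24 fps for their full duration. The `clip_frames` helper is a hypothetical name for this example, not part of any xAI or Morphic API.

```python
FPS = 24                     # frame rate listed in the specifications
WIDTH, HEIGHT = 1920, 1080   # assumption: "1080p" means 1920x1080


def clip_frames(seconds: float, fps: int = FPS) -> int:
    """Number of frames in a clip of the given length."""
    if not 6 <= seconds <= 10:
        raise ValueError("listed video length is 6 to 10 seconds")
    return round(seconds * fps)


shortest = clip_frames(6)    # 6 s * 24 fps
longest = clip_frames(10)    # 10 s * 24 fps
pixels_per_frame = WIDTH * HEIGHT

print(shortest, longest, pixels_per_frame)
```

Under these assumptions a clip spans 144 to 240 frames, each about 2.07 million pixels.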

Use cases

Multi-format campaigns

Generate matching images and videos from one concept. Design a still hero, then animate it to video with consistent visual style.

Video transformation

Restyle existing footage with AI. Transform the look and feel of video clips while preserving the original motion, timing, and structure.

Image-to-video pipeline

Design a still, refine it with image editing, then animate it to video. The cross-modal flow keeps creative control precise.

Creative exploration

Experiment across image and video without switching models. Explore ideas in stills and motion from one creative brief.

Style transfer

Apply new styles to existing images and videos. Turn photos into paintings, footage into animation, or restyle for new audiences.

Content repurposing

Transform existing assets across formats. Turn product photos into promo videos, animate illustrations, or restyle existing clips.

Prompt examples

Image generation

A cyberpunk city at night with holographic advertisements floating between buildings, reflections on wet streets, detailed neon colors, cinematic wide angle

Video from image

Upload a landscape photo and describe: Bring this scene to life with gentle wind through trees, moving clouds, and soft light changes

Video transformation

Upload a video clip and describe: Transform to watercolor painting style, maintain all original motion, soft pastel colors

Simple pricing

Get started for free today, with the option to upgrade or cancel anytime.

Basic

900 monthly credits

1 user only

All models

Workflows

Standard

3200 monthly credits

1 user only

All models

Workflows

Pro

6200 shared monthly credits

1 user, plus up to 4 more at extra cost

All models

Workflows

Pro Max

24000 shared monthly credits

1 user, plus up to 9 more at extra cost

All models

Workflows

Enterprise

For higher limits

Custom pricing and billing terms

High-volume credits

Custom seat limits

All models

Workflows

Free

For playing around

forever free

Up to 20 credits

1 user only

Limited models

Workflows

FAQs

What is Grok Imagine?

Grok Imagine is xAI's multimodal AI model that generates both images and videos. It supports five modes: text-to-image, image-to-image editing, text-to-video, image-to-video animation, and video-to-video transformation.

Can Grok Imagine create both images and videos?

Yes. Grok Imagine is one of the few models that handles both image and video generation in a single model, which makes it a versatile choice for creators who work across formats.

What is video-to-video?

Video-to-video lets you transform existing video clips: change visual styles, edit aesthetics, or reimagine footage while preserving the original motion, structure, and timing.

How long are Grok Imagine videos?

Grok Imagine generates videos between 6 and 10 seconds at 1080p, suitable for social media clips, creative shorts, and promotional content.

What makes Grok Imagine unique?

Its cross-modal versatility. While most models specialize in either image or video, Grok Imagine handles both and also offers video-to-video transformation, for five modes from a single model.

How do I use Grok Imagine on Morphic?

Open Copilot, describe what you want, and pick Grok Imagine. You can generate stills, animate an image, or transform an existing video clip, all from one model, which keeps cross-format work in a single conversation.
