Multi-modal

Gemini Omni

by Google DeepMind

Google's first any-to-any AI model. Text, images, audio, and video in, a single video out.

Technical specifications

Omni Flash

First model in Google's Gemini Omni family

Video

Image and audio output planned on the Gemini Omni roadmap

Up to 10s

Flash clips capped at 10 seconds at launch to widen access

Any mix

Text, image, audio, and video in one prompt

Voice references

Voice samples supported first; full audio inputs coming later

SynthID

Imperceptible AI-provenance watermark on every clip

May 19, 2026

Announced at Google I/O 2026

Google DeepMind

Positioned as the successor to Veo for any-to-any video creation

Use cases

Multi-input storyboarding

A character image, location photo, music cue, and beat go in; the model builds the shot and iterates.

Conversational video editing

Edit any clip in plain language: swap wardrobe, change a background, or retime a beat. The rest stays steady.

Marketing video

Ad cuts that respect brand colors, product shape, and on-screen text. One photo, one brief, one finished spot.

Educational explainers

Visualize science, history, and engineering with built-in physics. The science stays honest, the footage clean.

Spokesperson video

A portrait plus a voice reference gives the same on-camera presenter across shorts, courses, and walkthroughs.

Social shorts

10-second clips fit YouTube Shorts, Reels, and TikTok. Generate variations, then publish the one that lands.

Prompt examples

Cinematic noir

Detective in a rain-soaked Tokyo alley, sodium-lamp glow, teal-amber noir

Product launch

Avant-garde sneaker mid-air over a titanium plinth, hard key light, launch mood

Nature explainer

Droplet frozen as a crystalline crown on a dewy leaf, backlit sunrise macro

Avatar spokesperson

Poised studio host addressing the lens, warm three-point light, 85mm bokeh

Architectural walkthrough

Golden-hour light through a brutalist concrete villa, long shadows, drifting dust

Story beat

Woman by a rain-flecked window reading a letter, worry easing into relief

Simple pricing

Get started for free today, with the option to upgrade or cancel anytime.

Basic

$0/month
billed as $0 per year

500 monthly credits

1 user only

All models

Workflows

Standard

$0/month
billed as $0 per year

2800 monthly credits

1 user only

All models

Workflows

Pro

$0/month
billed as $0 per year

6000 shared monthly credits

1 user + up to 4 more at extra cost

All models

Workflows

Pro Max

$0/month
billed as $0 per year

24000 shared monthly credits

1 user + up to 9 more at extra cost

All models

Workflows

Enterprise

For higher limits

Custom pricing and billing terms

Unlimited credits
Custom seat limits
All models
Workflows

Free

For playing around

$0

forever free

Up to 20 credits
1 user only
Limited models
Workflows
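
The credit tiers above can be read as a simple lookup. The sketch below is illustrative only: `PLAN_CREDITS` and `smallest_plan` are hypothetical names, not part of any Morphic API. The credit figures are the monthly allowances listed on this page. The Free tier is left out because the page does not say whether its 20 credits renew monthly.

```python
# Monthly credit allowances as listed on this page, smallest first.
# The names below are hypothetical and exist only for this example.
PLAN_CREDITS = [
    ("Basic", 500),
    ("Standard", 2800),
    ("Pro", 6000),
    ("Pro Max", 24000),
]


def smallest_plan(credits_needed: int) -> str:
    """Return the smallest listed plan whose monthly credits cover the need."""
    for name, credits in PLAN_CREDITS:
        if credits_needed <= credits:
            return name
    # Above the largest listed allowance, the page points to Enterprise.
    return "Enterprise"
```

For example, `smallest_plan(3000)` walks past Basic and Standard and returns `"Pro"`.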

FAQs

What is Gemini Omni?
Gemini Omni is Google's first any-to-any multimodal model, announced at Google I/O 2026. The first release, Gemini Omni Flash, accepts text, images, audio, and video as input and produces video as output, with conversational editing, character consistency, and SynthID watermarking on every clip.
Is Gemini Omni an image model?
No. Gemini Omni outputs video. The model accepts images as input alongside text, audio, and video, but the generated output is a video clip. Google has said image and audio output modalities are on the Gemini Omni roadmap but are not part of the initial launch.
How do I use Gemini Omni on Morphic?
Open Morphic, switch the prompt bar to Video mode, and pick Gemini Omni from the model picker. Attach any combination of text, image, audio, and video references, then run the prompt. To revise the result, describe the change in your next message; the scene keeps its prior context.
How long are Gemini Omni videos?
Gemini Omni Flash clips are capped at 10 seconds at launch. Google has framed the cap as a deployment decision to widen access during the initial rollout, not a hard model limit, so longer Gemini Omni durations are possible in future releases.
What inputs does Gemini Omni accept?
Gemini Omni accepts text, images, audio, and video in any combination within a single prompt. Voice references are the first audio input supported; broader audio inputs and additional output modalities are planned.
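
The input rules described in these answers can be expressed as a small validation sketch. Everything named here (`Reference`, `OmniRequest`, `validate`) is hypothetical and does not correspond to any Google or Morphic API. Only the four input modalities, the 10-second launch cap, and the voice-reference-first audio support come from this page. The requirement of at least one input is an added assumption.

```python
from dataclasses import dataclass, field

# Limits taken from this page; all names are illustrative only.
MAX_CLIP_SECONDS = 10
INPUT_MODALITIES = {"text", "image", "audio", "video"}


@dataclass
class Reference:
    modality: str                  # expected to be one of INPUT_MODALITIES
    is_voice_sample: bool = False  # only meaningful for audio references


@dataclass
class OmniRequest:
    references: list[Reference] = field(default_factory=list)
    duration_seconds: float = MAX_CLIP_SECONDS


def validate(request: OmniRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if not request.references:
        # Assumption: a prompt needs at least one input of some kind.
        problems.append("at least one input reference is required")
    if not 0 < request.duration_seconds <= MAX_CLIP_SECONDS:
        problems.append(f"duration must be between 0 and {MAX_CLIP_SECONDS} seconds")
    for ref in request.references:
        if ref.modality not in INPUT_MODALITIES:
            problems.append(f"unsupported modality: {ref.modality}")
        elif ref.modality == "audio" and not ref.is_voice_sample:
            problems.append("audio input is limited to voice references at launch")
    return problems
```

A request mixing text, an image, and a voice sample at 8 seconds returns no problems, while a 12-second request with a non-voice audio file returns two.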
How does Gemini Omni compare to Veo 3.1?
Veo 3.1 is Google DeepMind's photorealistic video model with 4K resolution, native audio, and 8-second clips, tuned for broadcast-quality realism. Gemini Omni Flash is the any-to-any sibling, with clips of up to 10 seconds, focused on multi-input reasoning, conversational editing, and persistent character consistency across edits. Veo is the realism specialist; Gemini Omni is the multimodal director.
How does Gemini Omni compare to Seedance 2.0?
Both Gemini Omni and Seedance 2.0 are multimodal video models. Seedance 2.0 accepts up to 12 mixed assets per generation, with native audio synthesis and music beat sync, and outputs 1080p clips of 4 to 15 seconds. Gemini Omni Flash focuses on turn-by-turn conversational editing and on Google's physics and real-world reasoning, and is currently capped at 10 seconds.
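
The clip-length figures quoted in the two comparisons above can be collected into a small lookup. This is a sketch under stated assumptions: `ClipLimits`, `LIMITS`, and `models_for` are hypothetical names, and Veo 3.1's "8-second clips" is treated here as a maximum length. A minimum is recorded only where this page states one.

```python
from typing import NamedTuple, Optional


class ClipLimits(NamedTuple):
    min_seconds: Optional[float]  # None where this page states no minimum
    max_seconds: float


# Figures as quoted on this page; treating Veo's 8 seconds as a cap is an assumption.
LIMITS = {
    "Gemini Omni Flash": ClipLimits(None, 10),
    "Veo 3.1": ClipLimits(None, 8),
    "Seedance 2.0": ClipLimits(4, 15),
}


def models_for(duration_seconds: float) -> list[str]:
    """List models whose quoted clip range includes the requested length."""
    return sorted(
        name
        for name, limits in LIMITS.items()
        if (limits.min_seconds is None or duration_seconds >= limits.min_seconds)
        and duration_seconds <= limits.max_seconds
    )
```

Under these assumptions, a 12-second clip is only within Seedance 2.0's quoted range, while a 10-second clip fits both Gemini Omni Flash and Seedance 2.0.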
Does Gemini Omni include a watermark?
Yes. Every video generated by Gemini Omni carries Google's imperceptible SynthID watermark for AI provenance. The watermark is invisible to viewers and survives common edits like re-encoding and resizing.
Does Gemini Omni support character consistency?
Yes. Characters introduced in one Gemini Omni shot retain their face, clothing, and voice across cuts and across subsequent edits in the same conversation, without re-uploading the reference each turn.
When was Gemini Omni released?
Google announced Gemini Omni at Google I/O 2026 on May 19, 2026. Gemini Omni Flash is the first release in the family, with image and audio output described as planned future additions.