Multi-modal

Gemini Omni

by Google DeepMind

Google's first any-to-any AI model. Text, images, audio, and video in. Text, images, audio, and video out.

Any-to-any input, Text-to-video, Image-to-video, Audio-to-video, Video-to-video, Conversational editing, Character consistency, Physics-accurate motion, Voice reference audio, SynthID watermark
Key features

What makes Gemini Omni stand out from other AI models

Technical specifications

Key specs and capabilities at a glance

Model: Omni Flash (first model in Google's Gemini Omni family)
Output: Video (image and audio output planned in the Gemini Omni roadmap)
Clip length: Up to 10s (Flash clips capped at 10 seconds at launch to widen access)
Inputs: Text, image, audio, video (any combination in one Gemini Omni prompt)
Audio input: Voice references (voice samples supported first; full audio inputs coming later)
Watermark: SynthID (imperceptible AI-provenance watermark on every Gemini Omni output)
Announced: May 19, 2026 (at Google I/O 2026)
Developer: Google DeepMind (positioned as the successor to Veo for any-to-any video creation)
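For readers who script their prompt preparation, the launch limits listed above can be written down as a small pre-flight check. This is only an illustrative sketch: `ClipRequest`, `validate`, and the constant names are hypothetical and are not part of any Google or Morphic API, and the rule that a request needs at least one input is an assumption rather than something the specs state.

```python
from dataclasses import dataclass
from typing import List

# Limits copied from the launch specs; every name below is hypothetical.
ALLOWED_INPUTS = {"text", "image", "audio", "video"}
LAUNCH_OUTPUTS = {"video"}   # image and audio output are planned, not launched
MAX_CLIP_SECONDS = 10        # Omni Flash cap at launch


@dataclass(frozen=True)
class ClipRequest:
    inputs: frozenset
    output: str = "video"
    duration_s: float = 10.0


def validate(req: ClipRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not req.inputs:
        # Assumption: a prompt needs at least one input modality.
        problems.append("at least one input is required")
    unknown = req.inputs - ALLOWED_INPUTS
    if unknown:
        problems.append(f"unsupported inputs: {sorted(unknown)}")
    if req.output not in LAUNCH_OUTPUTS:
        problems.append(f"output '{req.output}' is not available at launch")
    if not 0 < req.duration_s <= MAX_CLIP_SECONDS:
        problems.append(f"duration must be in (0, {MAX_CLIP_SECONDS}] seconds")
    return problems


# Usage: a text-plus-image request for an 8-second clip passes the check.
ok = ClipRequest(inputs=frozenset({"text", "image"}), duration_s=8)
print(validate(ok))  # []

# A 15-second request is flagged because it exceeds the launch cap.
too_long = ClipRequest(inputs=frozenset({"text"}), duration_s=15)
print(validate(too_long))
```

If Google lifts the 10-second cap or ships image and audio output, only the constants at the top would need to change.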

Use cases

How creators and businesses use Gemini Omni on Morphic

Multi-input storyboarding

Drop in a character image, a location photo, a music cue, and one story beat. The model assembles the shot; follow-ups iterate on the scene.

Conversational video editing

Edit any clip in plain language. Swap wardrobe, replace a background, adjust lighting, retime a beat; the rest of the shot stays steady.

Marketing video

Ad cuts that respect brand colors, product shape, and on-screen text. One product photo, one voice-over, one brief, one finished spot.

Educational explainers

Visualize science, history, and engineering concepts with built-in physics. The science stays honest while the footage stays clean.

Avatar and spokesperson video

A portrait plus a voice reference gives you the same on-camera presenter across multiple shorts. Fits courses, walkthroughs, and social.

Social shorts

10-second clips fit YouTube Shorts, Reels, and TikTok. Generate variations through conversation, then publish the version that lands.

Prompt examples

Open any of these to tweak and generate

Cinematic action

Detective walking through neon-lit Tokyo alley at night, rain reflections on wet pavement, low-angle tracking shot, gritty noir

Product launch

Matte-black wireless earbuds rotating above a marble pedestal, soft rim light, subtle haze, premium commercial mood

Nature explainer

Slow-motion water droplet hitting a leaf and bouncing, macro lens, soft morning light, accurate fluid behavior

Avatar spokesperson

Confident host in front of warm studio backdrop, eye contact, calm gestures, soft three-point lighting, broadcast feel

Architectural walkthrough

Slow dolly through a minimalist concrete house at golden hour, long shadows, dust in the sunlight, calm score-ready pacing

Story beat

Same character from earlier shot now seated by a window, reading a letter, expression shifting from worry to relief, soft natural light

Simple pricing

Get started for free today, with the option to upgrade or cancel anytime.

Basic

500 monthly credits

1 user only

All models

Workflows

Standard

2800 monthly credits

1 user only

All models

Workflows

Pro

6000 shared monthly credits

1 user

+ up to 4 more at extra cost

All models

Workflows

Pro Max

24000 shared monthly credits

1 user

+ up to 9 more at extra cost

All models

Workflows

Enterprise

For higher limits

Custom pricing and billing terms

Unlimited credits
Custom seat limits
All models
Workflows

Free

For playing around

$0

forever free

Up to 20 credits
1 user only
Limited models
Workflows

FAQs

What is Gemini Omni?
Gemini Omni is Google's first any-to-any multimodal model, announced at Google I/O 2026. The first release, Gemini Omni Flash, accepts text, images, audio, and video as input and produces video as output, with conversational editing, character consistency, and SynthID watermarking on every clip.
Is Gemini Omni an image model?
No. Gemini Omni outputs video. The model accepts images as input alongside text, audio, and video, but the generated output is a video clip. Google has said image and audio output modalities are on the Gemini Omni roadmap but are not part of the initial launch.
How do I use Gemini Omni on Morphic?
Open Morphic, switch the prompt bar to Video mode, and pick Gemini Omni from the model picker. Attach any combination of text, image, audio, and video references, then run the prompt. To revise the result, describe the change in your next message; the scene keeps its prior context.
How long are Gemini Omni videos?
Gemini Omni Flash clips are capped at 10 seconds at launch. Google has framed the cap as a deployment decision to widen access during the initial rollout, not a hard model limit, so longer Gemini Omni durations are possible in future releases.
What inputs does Gemini Omni accept?
Gemini Omni accepts text, images, audio, and video in any combination within a single prompt. Voice references are the first audio input supported; broader audio inputs and additional output modalities are planned.
How does Gemini Omni compare to Veo 3.1?
Veo 3.1 is Google DeepMind's photorealistic video model with 4K resolution, native audio, and 8-second clips, tuned for broadcast-quality realism. Gemini Omni Flash is its any-to-any sibling: it generates clips of up to 10 seconds and focuses on multi-input reasoning, conversational editing, and persistent character consistency across edits. Veo is the realism specialist; Gemini Omni is the multimodal director.
How does Gemini Omni compare to Seedance 2.0?
Both Gemini Omni and Seedance 2.0 are multimodal video models. Seedance 2.0 accepts up to 12 mixed assets per generation, with native audio synthesis and music beat sync, and outputs 1080p clips of 4 to 15 seconds. Gemini Omni Flash focuses on turn-by-turn conversational editing and on Google's physics and real-world reasoning, and is currently capped at 10 seconds.
Does Gemini Omni include a watermark?
Yes. Every video generated by Gemini Omni carries Google's imperceptible SynthID watermark for AI provenance. The watermark is invisible to viewers and survives common edits like re-encoding and resizing.
Does Gemini Omni support character consistency?
Yes. Characters introduced in one Gemini Omni shot retain their face, clothing, and voice across cuts and across subsequent edits in the same conversation, without re-uploading the reference each turn.
When was Gemini Omni released?
Google announced Gemini Omni at Google I/O 2026 on May 19, 2026. Gemini Omni Flash is the first release in the family, with image and audio output described as planned future additions.

Try Gemini Omni on Morphic

Sign up for Morphic to start creating with Gemini Omni. No downloads and no setup: just describe what you want and generate.