Question 1

What is Gemini Omni?

Accepted Answer

Gemini Omni is Google's first any-to-any multimodal model, announced at Google I/O 2026. The first release, Gemini Omni Flash, accepts text, images, audio, and video as input and produces video as output, with conversational editing, character consistency, and SynthID watermarking on every clip.

Question 2

Is Gemini Omni an image model?

Accepted Answer

No. Gemini Omni outputs video. The model accepts images as input alongside text, audio, and video, but the generated output is a video clip. Google has said image and audio output modalities are on the Gemini Omni roadmap but are not part of the initial launch.

Question 3

How do I use Gemini Omni on Morphic?

Accepted Answer

Open Morphic, switch the prompt bar to Video mode, and pick Gemini Omni from the model picker. Attach any combination of text, image, audio, and video references, then run the prompt. To revise the result, describe the change in your next message; the scene keeps the prior context.

Question 4

How long are Gemini Omni videos?

Accepted Answer

Gemini Omni Flash clips are capped at 10 seconds at launch. Google has framed the cap as a deployment decision to widen access during the initial rollout, not a hard model limit, so longer Gemini Omni durations are possible in future releases.

Question 5

What inputs does Gemini Omni accept?

Accepted Answer

Gemini Omni accepts text, images, audio, and video in any combination within a single prompt. Voice references are the first audio input supported; broader audio inputs and additional output modalities are planned.

Question 6

How does Gemini Omni compare to Veo 3.1?

Accepted Answer

Veo 3.1 is Google DeepMind's photorealistic video model with 4K resolution, native audio, and 8-second clips, tuned for broadcast-quality realism. Gemini Omni Flash is the any-to-any sibling, with clips of up to 10 seconds, focused on multi-input reasoning, conversational editing, and persistent character consistency across edits. Veo is the realism specialist; Gemini Omni is the multimodal director.

Question 7

How does Gemini Omni compare to Seedance 2.0?

Accepted Answer

Both Gemini Omni and Seedance 2.0 are multimodal video models. Seedance 2.0 accepts up to 12 mixed assets per generation, with native audio synthesis and music beat sync, and outputs 1080p clips of 4 to 15 seconds. Gemini Omni Flash focuses on turn-by-turn conversational editing and on Google's physics and real-world reasoning, and is currently capped at 10 seconds.

Question 8

Does Gemini Omni include a watermark?

Accepted Answer

Yes. Every video generated by Gemini Omni carries Google's SynthID watermark for AI provenance. The watermark is imperceptible to viewers and survives common edits like re-encoding and resizing.

Question 9

Does Gemini Omni support character consistency?

Accepted Answer

Yes. Characters introduced in one Gemini Omni shot retain their face, clothing, and voice across cuts and across subsequent edits in the same conversation, without re-uploading the reference each turn.

Question 10

When was Gemini Omni released?

Accepted Answer

Google announced Gemini Omni at Google I/O 2026 on May 19, 2026. Gemini Omni Flash is the first release in the family, with image and audio output described as planned future additions.

Gemini Omni

Key features

Any-to-any input

Conversational editing

Character consistency

Physics and realism

Native audio

SynthID watermarking

Technical specifications

Use cases

Multi-input storyboarding

Conversational video editing

Marketing video

Educational explainers

Spokesperson video

Social shorts

Prompt examples

Cinematic noir

Product launch

Nature explainer

Avatar spokesperson

Architectural walkthrough

Story beat
