Drop in a portrait reference, a location photo, a voice sample, and a one-line beat. Gemini Omni reads all four together and generates a single cohesive video, then keeps editing the same scene through conversation.
How to use Gemini Omni on Morphic
1. Open Video mode in Morphic
From the prompt bar at the bottom of Morphic, switch to Video mode.
2. Pick Gemini Omni in the model picker
Open the model picker and choose Gemini Omni by Google from the video models list. The first available release is Gemini Omni Flash, the entry point to Google's Omni family.
3. Drop in your inputs
Attach the references you want Gemini Omni to combine: text, an image, an audio file, a video clip, or any mix. The model reasons across every input together rather than stitching them end to end, so the final shot reflects every reference at once.
4. Generate, then keep editing in conversation
Run the prompt. Gemini Omni produces a clip of up to 10 seconds. To change a costume, swap a setting, or retime an action, ask in the next message. The scene remembers what came before, so edits land on the existing shot.
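For anyone planning shots before opening Morphic, the workflow above can be sketched as plain data. This is a hypothetical planning aid only, not Morphic's or Google's API: the names Reference, Scene, edit, and the file paths are all assumptions made for illustration. The only facts taken from this page are the four input types and the 10-second launch cap.

```python
from dataclasses import dataclass, field

MAX_CLIP_SECONDS = 10  # launch cap for Gemini Omni Flash, as stated on this page
INPUT_KINDS = {"text", "image", "audio", "video"}  # input types named on this page


@dataclass
class Reference:
    kind: str    # one of INPUT_KINDS
    source: str  # prompt text or a local file path (hypothetical)


@dataclass
class Scene:
    """Hypothetical planning record: one brief plus the edits made in conversation."""
    references: list[Reference]
    seconds: int = MAX_CLIP_SECONDS
    edits: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.references:
            raise ValueError("attach at least one input")
        for ref in self.references:
            if ref.kind not in INPUT_KINDS:
                raise ValueError(f"unsupported input kind: {ref.kind}")
        if not 0 < self.seconds <= MAX_CLIP_SECONDS:
            raise ValueError(f"clip length must be 1 to {MAX_CLIP_SECONDS} seconds")

    def edit(self, instruction: str) -> "Scene":
        # Each instruction is appended to the same scene, mirroring follow-up
        # messages that build on the existing shot instead of starting over.
        self.edits.append(instruction)
        return self


scene = Scene(
    references=[
        Reference("image", "portrait.jpg"),
        Reference("image", "location.jpg"),
        Reference("audio", "voice_sample.wav"),
        Reference("text", "She turns from the window and smiles."),
    ]
)
scene.edit("Change her coat to red.").edit("Slow the turn slightly.")
print(len(scene.references), scene.seconds, scene.edits)
```

Keeping the edits as an ordered list on one scene, rather than creating a new scene per change, reflects the way each message in the conversation builds on the shot that already exists.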
What is Gemini Omni?
Gemini Omni is Google's first any-to-any multimodal model, announced at Google I/O 2026 on May 19, 2026. The first release, Gemini Omni Flash, accepts text, images, audio, and video as input and generates video as output, with conversational editing, character consistency, and SynthID watermarking on every clip. Google has described image and audio output as future additions to the Gemini Omni family.
On Morphic, Gemini Omni sits in the video model picker alongside Veo 3.1, Seedance 2.0, Kling, and the rest of the video catalog.
Gemini Omni features and capabilities
Gemini Omni any-to-any input
A single Gemini Omni prompt accepts text, images, audio, and video at the same time. Instead of stitching the inputs sequentially, the model reasons across them as one brief, so a portrait reference, a location photo, a voice sample, and a written beat all shape the same generated shot. Voice references are the first audio input supported at launch; broader audio inputs are on the roadmap.
Conversational Gemini Omni editing
Every instruction in Gemini Omni builds on the last. Change a costume, swap a background, retime an action, or extend the scene by describing it in the next message. The shot remembers what came before, so edits land on the existing scene rather than starting a new generation.
Gemini Omni character and scene consistency
Characters introduced in one Gemini Omni shot keep their face, clothing, and voice across cuts and across follow-up edits in the same conversation. The model also holds lighting and continuity between turns, so a character introduced in shot one is still recognizable in shot three.
Physics-accurate motion and real-world reasoning
Gemini Omni applies an understanding of physics, culture, history, and science to the scenes it generates. Gravity, weight, collisions, and fluid behavior follow real-world rules; historical and cultural detail holds rather than drifting into generic AI texture. The result is footage where motion looks correct, not just smooth.
Voice references for consistent on-screen voices
Provide a short voice sample alongside text and images, and Gemini Omni keeps the voice consistent in the generated video. This is useful for avatar-led explainers, branded spokesperson clips, and short-form social content where the same presenter appears across multiple videos.
SynthID watermark on every Gemini Omni video
Every clip Gemini Omni produces carries Google's imperceptible SynthID digital watermark for AI provenance. The watermark is invisible to viewers and survives common transforms like re-encoding and resizing, so AI-generated material stays identifiable down the chain.
FAQs
What is Gemini Omni?
Gemini Omni is Google's first any-to-any multimodal model. The first release, Gemini Omni Flash, accepts text, images, audio, and video as input and produces video as output, with conversational editing, character consistency, accurate physics, and SynthID watermarking on every clip.
How do I use Gemini Omni on Morphic?
Open Morphic, switch the prompt bar to Video mode, and pick Gemini Omni from the model picker. Attach text, an image, an audio clip, a video, or any mix, then run the prompt. To revise the result, ask in the next message and the scene keeps the prior context.
Can Gemini Omni generate images?
No. Gemini Omni's output is video. The model accepts images as one of its input modalities alongside text, audio, and video, but the generated result is a video clip. Google has said image and audio output are planned future additions to the Gemini Omni family.
How long can Gemini Omni videos be?
Gemini Omni Flash clips are capped at 10 seconds at launch. Google has framed the cap as a deployment decision rather than a model constraint, so longer Gemini Omni durations are possible in future releases.
What inputs does Gemini Omni accept?
Gemini Omni accepts text, images, audio, and video in any combination within a single prompt. Voice references are the first audio input supported at launch; broader audio inputs are on the roadmap.
How is Gemini Omni different from Veo 3.1?
Veo 3.1 is Google DeepMind's photorealistic video model with 4K resolution, native audio synthesis, and 8-second clips, tuned for broadcast-quality realism. Gemini Omni Flash is the any-to-any sibling, capped at 10 seconds, focused on multi-input reasoning, conversational editing, and persistent character consistency across edits.
How does Gemini Omni compare to Seedance 2.0?
Both Gemini Omni and Seedance 2.0 are multimodal video models. Seedance 2.0 accepts up to 12 mixed assets per generation, with native audio synthesis and music beat sync, at 1080p and 4 to 15 seconds per clip. Gemini Omni Flash focuses on conversational, turn-by-turn editing and on Google's physics and real-world reasoning, and is currently capped at 10 seconds.
What is SynthID?
SynthID is Google's imperceptible watermark for AI-generated content. Every video Gemini Omni produces carries it by default. The watermark is invisible to viewers and survives common edits like re-encoding and resizing, so AI-generated material stays identifiable through the production chain.
When was Gemini Omni announced?
Google announced Gemini Omni at Google I/O 2026 on May 19, 2026. Gemini Omni Flash is the first release in the family, with image and audio output described as planned future additions.




