Gemini Omni by Google on Morphic

Use Gemini Omni by Google on Morphic. Google's any-to-any AI model accepts text, images, audio, and video in one prompt and generates video as output, with conversational editing, character consistency, accurate physics, and SynthID watermarking.

Drop in a portrait reference, a location photo, a voice sample, and a one-line beat. Gemini Omni reads all four together and generates a single cohesive video, then keeps editing the same scene through conversation.

How to use Gemini Omni on Morphic

1. Open Video mode in Morphic

From the prompt bar at the bottom of Morphic, switch to Video mode.

2. Pick Gemini Omni in the model picker

Open the model picker and choose Gemini Omni from the video models list. The first available release is Gemini Omni Flash, the entry point to Google's Omni family.

3. Drop in your inputs

Attach the references you want Gemini Omni to combine: text, an image, an audio file, a video clip, or any mix. The model reasons across every input together rather than stitching them in sequence, so the final shot reflects every reference at once.

4. Generate, then keep editing in conversation

Run the prompt. Gemini Omni produces a clip of up to 10 seconds. To change a costume, swap a setting, or retime an action, ask in the next message. The scene remembers what came before, so edits land on the existing shot.

What is Gemini Omni?

Gemini Omni is Google's first any-to-any multimodal model, announced at Google I/O 2026 on May 19, 2026. The first release, Gemini Omni Flash, accepts text, images, audio, and video as input and generates video as output, with conversational editing, character consistency, and SynthID watermarking on every clip. Google has described image and audio output as future additions to the Gemini Omni family.

On Morphic, Gemini Omni sits in the video model picker alongside Veo 3.1, Seedance 2.0, Kling, and the rest of the video catalog.

Gemini Omni features and capabilities

Gemini Omni any-to-any input

A single Gemini Omni prompt accepts text, images, audio, and video at the same time. Instead of stitching the inputs sequentially, the model reasons across them as one brief, so a portrait reference, a location photo, a voice sample, and a written beat all shape the same generated shot. Voice references are the first audio input supported at launch; broader audio inputs are on the roadmap.

Conversational Gemini Omni editing

Every instruction in Gemini Omni builds on the last. Change a costume, swap a background, retime an action, or extend the scene by describing it in the next message. The shot remembers what came before, so edits land on the existing scene rather than starting a new generation.

Gemini Omni character and scene consistency

Characters introduced in one Gemini Omni shot keep their face, clothing, and voice across cuts and across follow-up edits in the same conversation. The model also holds lighting and continuity between turns, so a character introduced in shot one is still recognizable in shot three.

Physics-accurate motion and real-world reasoning

Gemini Omni applies an understanding of physics, culture, history, and science to the scenes it generates. Gravity, weight, collisions, and fluid behavior follow real-world rules; historical and cultural detail holds rather than drifting into generic AI texture. The result is footage where motion looks correct, not just smooth.

Voice references for consistent on-screen voices

Provide a short voice sample alongside text and images, and Gemini Omni keeps the voice consistent in the generated video. This is useful for avatar-led explainers, branded spokesperson clips, and short-form social content where the same presenter appears across multiple videos.

SynthID watermark on every Gemini Omni video

Every clip Gemini Omni produces carries Google's imperceptible SynthID digital watermark for AI provenance. The watermark is invisible to viewers and survives common transforms like re-encoding and resizing, so AI-generated material stays identifiable down the chain.

FAQs

What is Gemini Omni?

Gemini Omni is Google's first any-to-any multimodal model. The first release, Gemini Omni Flash, accepts text, images, audio, and video as input and produces video as output, with conversational editing, character consistency, accurate physics, and SynthID watermarking on every clip.

How do I use Gemini Omni on Morphic?

Open Morphic, switch the prompt bar to Video mode, and pick Gemini Omni from the model picker. Attach text, an image, an audio clip, a video, or any mix, then run the prompt. To revise the result, ask in the next message and the scene keeps the prior context.

Is Gemini Omni an image model?

No. Gemini Omni's output is video. The model accepts images as one of its input modalities alongside text, audio, and video, but the generated result is a video clip. Google has said image and audio output are planned future additions to the Gemini Omni family.

How long can Gemini Omni videos be?

Gemini Omni Flash clips are capped at 10 seconds at launch. Google has framed the cap as a deployment decision rather than a model constraint, so longer Gemini Omni durations are possible in future releases.

What inputs does Gemini Omni accept?

Gemini Omni accepts text, images, audio, and video in any combination within a single prompt. Voice references are the first audio input supported at launch; broader audio inputs are on the roadmap.

How does Gemini Omni compare to Veo 3.1?

Veo 3.1 is Google DeepMind's photorealistic video model with 4K resolution, native audio synthesis, and 8-second clips, tuned for broadcast-quality realism. Gemini Omni Flash is the any-to-any sibling, capped at 10 seconds, focused on multi-input reasoning, conversational editing, and persistent character consistency across edits.

How does Gemini Omni compare to Seedance 2.0?

Both Gemini Omni and Seedance 2.0 are multimodal video models. Seedance 2.0 accepts up to 12 mixed assets per generation, with native audio synthesis and music beat sync, and produces 1080p clips of 4 to 15 seconds. Gemini Omni Flash focuses on conversational, turn-by-turn editing and on Google's physics and real-world reasoning, and is currently capped at 10 seconds.

What is SynthID and why does Gemini Omni include it?

SynthID is Google's imperceptible watermark for AI-generated content. Every video Gemini Omni produces carries it by default. The watermark is invisible to viewers and survives common edits like re-encoding and resizing, so AI-generated material stays identifiable through the production chain.

When was Gemini Omni announced?

Google announced Gemini Omni at Google I/O 2026 on May 19, 2026. Gemini Omni Flash is the first release in the family, with image and audio output described as planned future additions.
