Audio generation
Available now

Gemini 3.1 Flash TTS

by Google DeepMind

Google's most expressive text-to-speech model, with audio tags and multi-speaker dialogue.

Key features

Technical specifications

Multilingual

Style, pace, and accent control across many languages

Up to 2 speakers

Two distinct voices in one multi-speaker generation

Audio tags

Natural-language notes plus inline bracket cues

SynthID

Imperceptible AI-provenance watermark on output

Use cases

Video narration and voice-over

Add natural narration to AI or live-action video, with the tone and pacing set in plain language.

Character dialogue

Voice two-speaker scenes for shorts, games, and explainers, each character with its own voice.

Localized voice-over

Narrate the same script across many languages with native pacing and accent control.

Audiobook and long-form

Keep delivery natural and consistent across long passages of narration.

Explainers and tutorials

Clear, directable narration for product walkthroughs, lessons, and how-tos.

Ad reads and promos

Expressive, on-brand voice reads with the energy and emphasis you direct.

Prompt examples

Warm narration

Say this warmly and slowly, like comforting a child: The storm has passed. You're safe now.

Inline reaction

I can't believe you did that [laughs]. Best surprise all year.

Whisper to normal

[whispering] Don't make a sound. [normal voice] Okay, we're clear.

Accent control

Read this in a British accent: Lovely weather we're having, isn't it?

Dramatic pacing

Read this slowly and deliberately: Every. Word. Matters.

Two-speaker scene

Maya: Did you hear back about the job? Tom: I did. I start Monday.

Simple pricing

Get started for free today, with the option to upgrade or cancel anytime.

Basic

$0/month
billed as $0 per year

500 monthly credits

1 user only

All models

Workflows

Standard

$0/month
billed as $0 per year

2800 monthly credits

1 user only

All models

Workflows

Pro

$0/month
billed as $0 per year

6000 shared monthly credits

1 user

+ up to 4 more at extra cost

All models

Workflows

Pro Max

$0/month
billed as $0 per year

24000 shared monthly credits

1 user

+ up to 9 more at extra cost

All models

Workflows

Enterprise

For higher limits

Custom pricing and billing terms

Unlimited credits
Custom seat limits
All models
Workflows

Free

For playing around

$0

forever free

Up to 20 credits
1 user only
Limited models
Workflows

FAQs

What is Gemini 3.1 Flash TTS?
Gemini 3.1 Flash TTS is Google's text-to-speech model, announced on April 15, 2026. It produces expressive, natural narration that you direct with plain-language instructions and inline audio tags, supports multi-speaker dialogue, and watermarks every clip with SynthID.
What can I create with it on Morphic?
Use Gemini 3.1 Flash TTS for voice-over, narration, character dialogue, localized reads, and expressive ad reads. Generate the audio on Morphic, then drop it into Canvas alongside your video clips in the same workflow.
How do I direct the voice?
Two ways, and you can combine them. Write a plain-language instruction before your line, like 'Say this warmly and slowly:', and add inline cues in square brackets, like [laughs] or [whispering], where you want them. Gemini performs the cue instead of reading it aloud.
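
If you assemble scripts programmatically before pasting them into the prompt bar, the two kinds of direction can be combined in a few lines. The following is a minimal sketch in plain Python: `build_prompt` and `inline_cues` are hypothetical helper names, not part of Morphic or any Google SDK, and the code only prepares and inspects text. It does not call a speech model.

```python
import re

# Matches inline cues written in square brackets, e.g. [laughs] or [whispering].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def build_prompt(line: str, direction: str = "") -> str:
    """Prefix the spoken line with an optional plain-language direction.

    Hypothetical helper: it only formats text in the
    'direction: line' shape described above.
    """
    direction = direction.strip().rstrip(":")
    return f"{direction}: {line}" if direction else line


def inline_cues(prompt: str) -> list[str]:
    """Return the bracketed cues in the order they appear in the prompt."""
    return [cue.strip() for cue in TAG_PATTERN.findall(prompt)]


prompt = build_prompt(
    "I can't believe you did that [laughs]. Best surprise all year.",
    direction="Say this warmly and slowly",
)
print(prompt)
print(inline_cues(prompt))
```

Listing the cues before generating is a cheap way to catch a typo such as an unclosed bracket, which would otherwise risk being read aloud as text.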
Does it support multiple speakers?
Yes. Gemini 3.1 Flash TTS can voice a back-and-forth between two speakers in a single generation, giving each speaker a distinct voice. Label each line with the speaker's name and assign a voice to each one before you generate.
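
A two-speaker script can be checked before generation with a short script. This sketch assumes one `Name: line` turn per line of text, as the answer above describes. `parse_script`, `assign_voices`, and the voice names `voice_a` and `voice_b` are hypothetical placeholders, not real Morphic or Gemini identifiers. The two-speaker cap reflects the limit stated on this page.

```python
MAX_SPEAKERS = 2  # this page describes two distinct voices per generation


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a script into (speaker, text) turns, one 'Name: line' per line."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        speaker, sep, text = line.partition(":")
        if not sep or not speaker.strip() or not text.strip():
            raise ValueError(f"Expected 'Name: line', got: {line!r}")
        turns.append((speaker.strip(), text.strip()))
    return turns


def assign_voices(turns: list[tuple[str, str]], voices: dict[str, str]) -> dict[str, str]:
    """Map each speaker to a voice, enforcing the two-speaker limit."""
    # dict.fromkeys keeps first-appearance order while removing duplicates.
    speakers = list(dict.fromkeys(name for name, _ in turns))
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(f"At most {MAX_SPEAKERS} speakers, found {len(speakers)}")
    missing = [name for name in speakers if name not in voices]
    if missing:
        raise ValueError(f"No voice assigned for: {', '.join(missing)}")
    return {name: voices[name] for name in speakers}


script = """Maya: Did you hear back about the job?
Tom: I did. I start Monday."""

turns = parse_script(script)
# The voice names below are placeholders for whatever voices you pick in the UI.
print(assign_voices(turns, {"Maya": "voice_a", "Tom": "voice_b"}))
```

Because the parser splits on the first colon only, a colon later in a spoken line stays part of that line's text.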
How many languages does it support?
Gemini 3.1 Flash TTS narrates across many languages, with control over accent, pacing, and style in each. Pick the voice and language that suit your script before generating.
How is it different from ElevenLabs on Morphic?
Both produce human-quality voice on Morphic. ElevenLabs is a full audio suite spanning speech, music, and sound effects with fine voice-tuning controls. Gemini 3.1 Flash TTS focuses on expressive, directable speech, with plain-language direction, inline audio tags, and multi-speaker dialogue. Many creators use both: one for voice, the other for music and effects.
Does it watermark the audio?
Yes. Every clip generated by Gemini 3.1 Flash TTS carries Google's imperceptible SynthID watermark for AI provenance. It is inaudible to listeners and built to survive common edits like re-encoding.
How do I use Gemini 3.1 Flash TTS on Morphic?
Open Morphic, switch the prompt bar to Audio, and choose Speech. Pick Gemini 3.1 Flash TTS as the audio model, write your script with any direction or tags, choose a voice and language, then generate.