Seed Audio 1.0: the complete guide

Learn how to use Seed Audio 1.0: generate voice, music, and sound effects in one pass, write better prompts, clone voices, and edit audio in place.

Hear Seed Audio 1.0

  • Documentary narration: speech, warm and measured
  • Thriller voice-over: speech, hushed and tense
  • Spice-market ambience: sound effects, open-air bed
  • Thunderstorm: sound effects, storm to a clap
  • Orchestral cue: music, rising strings and brass
  • Lo-fi beat: music, soft keys and vinyl

Seed Audio 1.0 use cases

One-pass video audio

Give a video clip its narration, sound design, and music in one generation. Describe the scene, who speaks, what happens, and the mood, and the model handles the full audio track.

Narrated explainers and tutorials

A composed voice with room tone and a light music bed in one output. The narration carries the content, and the model fills the acoustic space so it sounds placed and finished.

Short ads and promos

Spoken line, sound effects, and music as one ready-to-use track. Write the timing into the prompt, and the model hits the beat on the right word and fades the music on cue.

Scripted dialogue and audio drama

Multi-character scenes with distinct voices, accurate emotional delivery, and matching ambience, all in a single prompt. Write the script, label the speakers, and the model casts and directs.

Consistent voice across a series

Clone a character or narrator voice from a reference clip and carry it across every episode or chapter. Voice consistency across hours of content from a single short sample.

Audio editing and repair

Extend a take, fill a gap, swap a line, or stitch two segments. The same model that generates original audio handles revision without re-recording the whole track.

How to write a Seed Audio 1.0 prompt

A strong prompt reads like a short scene brief rather than a single text-to-speech line, which lets the model fit voice, music, and effects into one scene. Run it through the SPACE checklist before you send it.

  • Speaker: voice character, age, emotion. Example: Calm male narrator, mid-30s, warm
  • Phrasing: the exact line, in quotes. Example: 'Combine the flour and the butter.'
  • Ambience: acoustic space and background. Example: Soft kitchen ambience, a low oven-fan hum
  • Composition: music mood, genre, or tempo. Example: Light acoustic guitar, under the voice
  • Extra cues: timing, effects, transitions. Example: A brief chime at the end, then silence

Two habits separate strong prompts from generic ones. First, name the setting: with no place given, the model defaults to flat room tone. Second, cue the music timing: "fades in after the first line" works better than a bare "upbeat music."
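To illustrate how the five SPACE fields combine into one brief, here is a minimal Python sketch. `SpaceBrief` and `build_prompt` are hypothetical helpers, not part of any Seed Audio SDK, and the joining wording is an assumption; the model only ever sees the final text. The builder refuses an empty field, so the setting cannot be left out.

```python
from dataclasses import dataclass, fields


@dataclass
class SpaceBrief:
    # Hypothetical container: field names follow the SPACE checklist,
    # not an official Seed Audio request schema.
    speaker: str      # voice character, age, emotion
    phrasing: str     # the exact line to be spoken
    ambience: str     # acoustic space and background
    composition: str  # music mood, genre, or tempo
    extra_cues: str   # timing, effects, transitions


def _clause(text: str) -> str:
    """Trim whitespace and any trailing period so clauses join cleanly."""
    return text.strip().rstrip(".")


def build_prompt(brief: SpaceBrief) -> str:
    """Join the SPACE fields into a single scene-brief string."""
    # Refuse empty fields; an empty ambience would leave the setting unnamed.
    missing = [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]
    if missing:
        raise ValueError("empty SPACE fields: " + ", ".join(missing))
    return (
        f"{_clause(brief.speaker)}, says: '{brief.phrasing.strip()}' "
        f"Setting: {_clause(brief.ambience)}. "
        f"Music: {_clause(brief.composition)}. "
        f"{_clause(brief.extra_cues)}."
    )


brief = SpaceBrief(
    speaker="Calm male narrator, mid-30s, warm",
    phrasing="Combine the flour and the butter.",
    ambience="Soft kitchen ambience, a low oven-fan hum",
    composition="Light acoustic guitar, under the voice",
    extra_cues="A brief chime at the end, then silence",
)
print(build_prompt(brief))
```

Keeping the fields structured makes it easy to swap one element, such as the music cue, while leaving the rest of the brief unchanged.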

Voice cloning with Seed Audio 1.0

Zero-shot voice cloning works from up to three reference clips of up to 30 seconds each, with no training required. Prepare clips against the CLEAR checklist:

  • Clean recording, with little background noise
  • Length under 30 seconds per clip
  • Emotion aligned to the delivery you want
  • Accent consistent within each clip
  • Room tone steady across clips

The model reads the vocal character and carries it across the whole generation.
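The measurable parts of CLEAR, clip count and clip length, can be checked before upload. This sketch assumes the clips are PCM WAV files and uses only Python's standard `wave` module; the limits come from the text above, the function names are hypothetical, and the remaining items (noise, emotion, accent, room tone) still need a listening check.

```python
import wave
from pathlib import Path

MAX_CLIPS = 3            # "up to three reference clips"
MAX_CLIP_SECONDS = 30.0  # "Length under 30 seconds per clip"


def clip_seconds(path: Path) -> float:
    """Duration of a PCM WAV file, in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clips(paths: list[Path]) -> list[str]:
    """Check clip count and length; an empty result means both checks pass.

    Noise, emotion, accent, and room tone are not checked here.
    """
    problems = []
    if not paths:
        problems.append("no reference clips given")
    if len(paths) > MAX_CLIPS:
        problems.append(f"{len(paths)} clips given, at most {MAX_CLIPS} allowed")
    for path in paths:
        seconds = clip_seconds(path)
        if seconds >= MAX_CLIP_SECONDS:
            problems.append(
                f"{path.name}: {seconds:.1f} s, should be under {MAX_CLIP_SECONDS:.0f} s"
            )
    return problems
```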

Without a reference clip, describe the voice in text, giving concrete traits such as age, accent, and pace rather than words like "nice" or "professional." A character image also works: the model derives a matching voice from the character's apparent age and personality, which is useful for fictional or animated speakers.
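A small helper can enforce the same rule for text descriptions: require age, accent, and pace, and reject the vague words quoted above. This is a hypothetical sketch; `describe_voice` is not a Seed Audio function, and the `VAGUE_WORDS` set is an illustrative assumption, not a list the model publishes.

```python
import re

# Illustrative only: words the guide calls too vague; extend as needed.
VAGUE_WORDS = {"nice", "professional"}


def describe_voice(age: str, accent: str, pace: str, extra: str = "") -> str:
    """Build a text voice description from concrete attributes.

    Raises ValueError if age, accent, or pace is missing, or if any part
    relies on a vague word such as "nice" or "professional".
    """
    required = {"age": age, "accent": accent, "pace": pace}
    missing = [name for name, value in required.items() if not value.strip()]
    if missing:
        raise ValueError("missing voice attributes: " + ", ".join(missing))
    parts = [p.strip() for p in (age, accent, pace, extra) if p.strip()]
    text = ", ".join(parts)
    # Compare lowercase words against the vague list.
    vague = sorted(set(re.findall(r"[a-z]+", text.lower())) & VAGUE_WORDS)
    if vague:
        raise ValueError("replace vague words with concrete traits: " + ", ".join(vague))
    return text
```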

How to use Seed Audio 1.0

Getting a finished track takes four steps, and none of them need a separate editor.

  1. Write the scene brief. Describe who speaks, what they say, the setting, and the mood, following the SPACE checklist above.
  2. Set the voice. Clone it from a short reference clip, or define it with a text description or a character image.
  3. Generate. One pass returns the voice, music, and sound effects together, already mixed, up to two minutes long.
  4. Refine in place. Extend the clip, swap a line, or fill a gap with the editing modes, with no re-recording.

FAQs

What is audio inpainting in Seed Audio 1.0?

Inpainting fills a gap between two existing audio segments without re-generating the content around it. You provide the surrounding audio as context, and the model generates only the missing part, matched in voice character and acoustic space to what surrounds it.
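The inpainting itself happens inside the model. If the generated fill comes back as a separate WAV file in the same format as the surrounding audio (an assumption, not documented behavior), the final track can be assembled locally with Python's standard `wave` module, as in this sketch. `splice_wavs` is a hypothetical helper; it does a plain join with no crossfade and relies on the model to match the edges.

```python
import wave
from pathlib import Path


def splice_wavs(parts: list[Path], out_path: Path) -> None:
    """Concatenate WAV files that share one format into a single file.

    For inpainting, parts would be: audio before the gap, the generated
    fill, and audio after the gap, in that order.
    """
    fmt = None
    chunks = []
    for path in parts:
        with wave.open(str(path), "rb") as wav:
            this_fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            # Refuse mismatched formats rather than silently corrupting audio.
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path.name} has format {this_fmt}, expected {fmt}")
            chunks.append(wav.readframes(wav.getnframes()))
    if fmt is None:
        raise ValueError("no parts to splice")
    channels, sampwidth, framerate = fmt
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sampwidth)
        out.setframerate(framerate)
        out.writeframes(b"".join(chunks))
```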

What languages does Seed Audio 1.0 support?

English and Chinese at launch, with broader language support planned. For voice cloning, matching the reference clip language to the output language gives the most consistent result.

Can Seed Audio 1.0 edit existing audio?

Yes. Beyond generating from scratch, the same model extends a clip, fills a gap, swaps a single line, or stitches two takes into one continuous piece, so you can revise a track without re-recording it.

Can Seed Audio 1.0 generate multiple speakers at once?

Yes. Label each line in the prompt, for example Host: ... and Guest: ..., and the model gives each speaker a distinct voice, emotion, and pacing in a single generation. Define additional voices by reference clip, text description, or character image.
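To keep labels consistent in a long script, the prompt text can be generated from structured turns. This is a minimal sketch with hypothetical helpers: the `Host:` and `Guest:` label style comes from the answer above, while one turn per line is an assumed layout. `speakers` lists each distinct label once, in order of first appearance, so each can be given a voice.

```python
def format_dialogue(turns: list[tuple[str, str]]) -> str:
    """Render (speaker, line) turns as a labeled script, one turn per line."""
    rendered = []
    for speaker, line in turns:
        label = speaker.strip()
        # A label must be non-empty and must not contain the ':' separator.
        if not label or ":" in label:
            raise ValueError(f"invalid speaker label: {speaker!r}")
        rendered.append(f"{label}: {line.strip()}")
    return "\n".join(rendered)


def speakers(turns: list[tuple[str, str]]) -> list[str]:
    """Distinct speaker labels in order of first appearance."""
    return list(dict.fromkeys(speaker.strip() for speaker, _ in turns))
```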

How long can a Seed Audio 1.0 generation be?

Up to two minutes in a single pass. For longer productions, continuation mode extends the output while preserving voice character, musical style, and consistency with what came before.
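For planning longer productions, a script can be split into passes that each fit the two-minute limit. This sketch estimates duration from word count at an assumed 150 words per minute (a rough narration rate; measure your own voice) and groups paragraphs greedily; each pass after the first would then go through continuation mode. `plan_passes` is a hypothetical helper, and music or pauses that lengthen a pass are not modeled.

```python
MAX_PASS_SECONDS = 120.0  # two-minute single-pass limit stated above
WORDS_PER_MINUTE = 150.0  # assumed narration rate; measure your own voice


def estimate_seconds(text: str, wpm: float = WORDS_PER_MINUTE) -> float:
    """Rough spoken duration from word count alone."""
    return len(text.split()) * 60.0 / wpm


def plan_passes(paragraphs: list[str],
                max_seconds: float = MAX_PASS_SECONDS,
                wpm: float = WORDS_PER_MINUTE) -> list[str]:
    """Greedily group paragraphs into passes that each fit within max_seconds."""
    passes: list[str] = []
    current: list[str] = []
    current_seconds = 0.0
    for para in paragraphs:
        seconds = estimate_seconds(para, wpm)
        if seconds > max_seconds:
            raise ValueError("a paragraph alone exceeds one pass; split it first")
        # Close the current pass if this paragraph would push it over the limit.
        if current and current_seconds + seconds > max_seconds:
            passes.append("\n\n".join(current))
            current, current_seconds = [], 0.0
        current.append(para)
        current_seconds += seconds
    if current:
        passes.append("\n\n".join(current))
    return passes
```

Splitting at paragraph boundaries keeps each pass ending on a natural pause, which tends to give continuation mode a cleaner seam than cutting mid-sentence.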

Is Seed Audio 1.0 different from text-to-speech?

Yes, significantly. Text-to-speech produces a single voice track from written text. Seed Audio 1.0 generates the full scene (voice, background music, and sound effects) together in one output, with editing tools to revise specific sections afterward. The difference in scope is the entire audio production versus the voice alone.