Video generation

Available now

Happy Horse 1.0

by Alibaba

Alibaba's video model that generates video and synced audio in one pass, with native lip‑sync across 7 languages.

Key features

Joint audio and video in one pass

Synchronized video and audio from one unified transformer. Lip movement, ambient sound, and music land in sync from frame one.

Multilingual native lip-sync

Speaks and lip-syncs in English, Mandarin, Cantonese, Japanese, Korean, German, and French. Mouth shapes match phonetics.

Reference-driven control

Feed up to 5 reference images to lock characters, wardrobe, scene layout, or style. The model reproduces what you show with fidelity.

Video editing with natural language

Edit existing video with text. Replace characters, swap scenes, adjust details, or apply transformations without a full re-render.

#1 on the Artificial Analysis Video Arena

Top Elo on the Artificial Analysis Video Arena in both Text-to-Video and Image-to-Video at launch, as ranked by blind human preference votes.

1080p output at 24 fps

Full HD at cinematic frame rate with stable motion, realistic body dynamics, and clean multi-character interactions.

Two friends laughing at a Paris café, French dialogue, golden-hour window light

Technical specifications

Resolution: 1080p (Full HD output at 24 fps)

Duration: 3–15s per generation

Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4

Languages: native lip-sync across 7 languages

Reference images: up to 5, for reference-to-video and video editing

Model: 15B, unified 40-layer transformer
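
The limits listed here can be checked before a request is submitted. The following is a minimal Python sketch, not an official client: the GenerationRequest class, its field names, and the lowercase language keys are hypothetical. Only the numeric limits (3–15 seconds, 24 fps, five aspect ratios, seven languages, up to 5 reference images) come from this page.

```python
from dataclasses import dataclass, field

# Limits taken from the specifications on this page.
FPS = 24
MIN_SECONDS, MAX_SECONDS = 3, 15
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
MAX_REFERENCE_IMAGES = 5
# Language names come from the page; the lowercase keys are an assumption.
LIP_SYNC_LANGUAGES = {
    "english", "mandarin", "cantonese", "japanese",
    "korean", "german", "french",
}


@dataclass
class GenerationRequest:
    """Hypothetical request shape for illustration; not an official schema."""
    prompt: str
    seconds: int = 5
    aspect_ratio: str = "16:9"
    language: str = "english"
    reference_images: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if not self.prompt.strip():
            problems.append("prompt is empty")
        if not MIN_SECONDS <= self.seconds <= MAX_SECONDS:
            problems.append(f"seconds must be {MIN_SECONDS}-{MAX_SECONDS}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            problems.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.language.lower() not in LIP_SYNC_LANGUAGES:
            problems.append(f"no native lip-sync for: {self.language}")
        if len(self.reference_images) > MAX_REFERENCE_IMAGES:
            problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images")
        return problems

    def frame_count(self) -> int:
        """Number of frames in the clip at the stated 24 fps."""
        return self.seconds * FPS
```

At 24 fps, the 3 to 15 second range corresponds to 72 to 360 frames per generation.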

Use cases

Dialogue-driven scenes

Scenes where characters speak in any of 7 languages with synced lip movement, ambient sound, and timing.

Music videos and performance clips

Generating video and audio together means motion lands on the beat from the first pass, with no manual sync work needed.

Ad and campaign spots

Reference-driven control keeps product, talent, and brand visuals consistent across multiple shots.

Character-consistent storytelling

Lock in characters with reference images and carry them across multiple scenes for narrative video work.

Multilingual content localization

Same scene, same characters, dialogue swapped across languages with native lip-sync, suited for global campaigns.

Video editing without full re-renders

Adjust details, swap elements, or restyle existing footage through text instructions instead of starting over.

Prompt examples

Dialogue scene

Two friends laughing in a Paris café, French dialogue, handheld

Performance clip

Cellist on a rooftop at sunset, sweeping orchestral score

Product spot

Sneakers spin on glossy floor, hip-hop beat, macro lens

Simple pricing

Get started for free today, with the option to upgrade or cancel anytime.

Basic

/ month

billed as $0 per year

900 monthly credits

1 user only

All models

Workflows

Standard

/ month

billed as $0 per year

3200 monthly credits

1 user only

All models

Workflows

Pro

/ month

billed as $0 per year

6200 shared monthly credits

1 user, plus up to 4 more at extra cost

All models

Workflows

Pro Max

/ month

billed as $0 per year

24000 shared monthly credits

1 user, plus up to 9 more at extra cost

All models

Workflows

Enterprise

For higher limits

Custom

pricing and billing terms

High-volume credits

Custom seat limits

All models

Workflows

Free

For playing around

forever free

Up to 20 credits

1 user only

Limited models

Workflows

FAQs

What is Happy Horse 1.0?

Happy Horse 1.0 is Alibaba's video generation model from the Taotian Future Life Lab, released April 2026. It generates video and synchronized audio together in a single pass and held the #1 Elo on the Artificial Analysis Video Arena at launch.

Which languages does Happy Horse support for lip-sync?

Seven languages with native lip-sync: English, Mandarin, Cantonese, Japanese, Korean, German, and French.

How long can a Happy Horse 1.0 video be?

Each generation is 3 to 15 seconds at 1080p, in one of five aspect ratios: 16:9, 9:16, 1:1, 4:3, or 3:4.

How is it different from Seedance 2.0 or Veo 3?

Happy Horse generates video and audio jointly in one pass with native multilingual lip-sync across 7 languages. Seedance 2.0 emphasizes multimodal inputs and music beat sync. Veo focuses on cinematic photorealism.

Can I edit existing video with Happy Horse?

Yes. The video-edit endpoint accepts natural language instructions and up to 5 reference images to modify existing footage without a full re-render.
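
As an illustration of what such an edit request might carry, here is a small Python sketch that assembles a JSON body and enforces the 5-image limit. The function name build_edit_payload and every key in the payload are assumptions made for this example; the actual request schema of the video-edit endpoint is not documented on this page.

```python
import json

MAX_REFERENCE_IMAGES = 5  # limit stated in the answer above


def build_edit_payload(video_url, instruction, reference_images=()):
    """Assemble a JSON body for a hypothetical video-edit request."""
    refs = list(reference_images)
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    if len(refs) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"at most {MAX_REFERENCE_IMAGES} reference images are allowed"
        )
    # All keys below are illustrative assumptions, not a documented schema.
    return json.dumps({
        "video": video_url,
        "instruction": instruction,
        "reference_images": refs,
    })
```

The instruction is plain natural language, for example "Replace the red jacket with a blue one", and the reference images are optional.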

How do I use Happy Horse 1.0 on Morphic?

Open Copilot, describe your scene (and optionally attach a still or audio reference), and select Happy Horse 1.0. Because it generates synchronized audio in the same pass, you don't need a separate audio model; the model returns a clip with native lip-sync and sound.
