How do I get the best results from Happy Horse 1.1?

Name the audio in every prompt, since Happy Horse 1.1 generates sound with the video. Describe motion rather than a still frame, and give a shot type with one camera move. For multi-character scenes, index each subject as character1, character2, and keep spoken lines short for clean lip-sync. Draft at 720p, then re-run the keeper at 1080p.

Does Happy Horse 1.1 generate audio?

Yes. Happy Horse 1.1 generates audio with the video in a single pass, so it stays in sync with the motion. A generation can include lip-synced dialogue, sound effects, ambience, and music, with native lip-sync across seven languages and no separate audio step.

How does reference-to-video work in Happy Horse 1.1?

Pass up to nine reference images and refer to each by index, as character1 through character9, matching the order you supply them. State which subject comes from which image, then describe the scene and action. Happy Horse 1.1 carries each subject into the new scene so a cast stays recognizable from shot to shot.

What resolutions, durations, and aspect ratios does Happy Horse 1.1 support?

Happy Horse 1.1 outputs 720p or 1080p in clips of 3 to 15 seconds, with a 5-second default. It supports nine aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4, ultrawide 21:9, 9:21, 5:4, and 4:5. Choose the ratio first, since framing changes how you stage the action.

How do I use Happy Horse 1.1 on Morphic?

Open Morphic, switch the prompt bar to Video mode, and pick Happy Horse 1.1. Describe the scene, attach a still for image-to-video or up to nine reference images for reference-to-video, choose a resolution and aspect ratio, then run the prompt. Audio generates in the same pass.

Happy Horse 1.1: Complete Guide, Prompts & Features

Happy Horse 1.1 features and capabilities

Happy Horse 1.1 is Alibaba's video model, served on fal and available on Morphic. It generates video and audio together in a single pass, with native lip-sync across seven languages, and supports reference-to-video with up to nine subjects, nine aspect ratios, and 1080p output.

Feature	What it does	Best for
Joint audio and video	Generates the clip and its synchronized audio in one pass, with no separate audio step	Dialogue scenes, music clips, talking heads
Multilingual lip-sync	Speaks and lip-syncs across 7 languages, with mouth shapes that match the phonetics	Localized ads, multilingual presenters
Reference-to-video, up to 9	Carries up to nine reference subjects into a new scene, each called by index	Ensemble scenes, character-consistent series
Image-to-video	Animates a still first frame into a moving 1080p clip with audio	Product shots, key art, photo animation
Nine aspect ratios	Delivers in nine ratios, from 16:9 and 9:16 to ultrawide 21:9	Cinematic, vertical, and square delivery

Joint audio and video in one pass

Happy Horse generates the picture and its sound together rather than adding audio afterward. Spoken dialogue with lip-sync, ambient room tone, sound effects, and music all come out of the same generation, so motion and sound line up from the first frame. You describe the sound in the same prompt as the action.

Multilingual native lip-sync

The model speaks and lip-syncs across English, Mandarin, Cantonese, Japanese, Korean, German, and French. The mouth shapes follow the phonetics of the spoken language rather than being approximated, which makes it a fit for dialogue scenes and localized versions of the same shot.

Reference-to-video with up to 9 subjects

Pass up to nine reference images and refer to each by index in the prompt, as character1 through character9, matching the order you supply them. With up to nine subjects, a full cast can stay recognizable across shots. State which subject comes from which image, then describe the scene and the action.
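The indexing rule can be sketched in a few lines of Python. This is a minimal illustration, not part of any Happy Horse, fal, or Morphic API: the function names and the file names are hypothetical, and the only facts taken from this guide are the nine-image limit and the character1 to character9 labels assigned in supply order.

```python
MAX_REFERENCES = 9  # limit stated in this guide


def index_references(image_paths: list[str]) -> dict[str, str]:
    """Map character1..characterN to images in the order they are supplied."""
    if not 1 <= len(image_paths) <= MAX_REFERENCES:
        raise ValueError(f"supply between 1 and {MAX_REFERENCES} reference images")
    return {f"character{i}": path for i, path in enumerate(image_paths, start=1)}


def unused_labels(prompt: str, labels: dict[str, str]) -> list[str]:
    """List the labels that the prompt never mentions."""
    return [label for label in labels if label not in prompt]


# Hypothetical file names, for illustration only.
labels = index_references(["anchor.png", "reporter.png"])
print(labels)
print(unused_labels("character1 greets the camera in a studio.", labels))
```

Checking for unused labels before a run is a cheap way to catch a reference image that was attached but never placed in the scene.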

Image-to-video

Provide a still first frame, such as a product shot or a character frame, add a prompt describing the motion and the sound, and the model animates outward from that image while holding its lighting and detail. It also runs text-to-video when you have no starting image.

Nine aspect ratios

Deliver in nine ratios: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, 9:21, 5:4, and 4:5. The same prompt framework produces an ultrawide cinematic cut and a vertical social cut without a separate workflow per format.

Happy Horse 1.1 technical specs

Spec	Happy Horse 1.1
Provider	Alibaba (served on fal)
Modes	Text-to-video, image-to-video, reference-to-video
Audio	Native, synchronized, with multilingual lip-sync
Languages	7 (English, Mandarin, Cantonese, Japanese, Korean, German, French)
Resolution	720p or 1080p
Duration	3 to 15 seconds (default 5)
Aspect ratios	16:9, 9:16, 1:1, 4:3, 3:4, 21:9, 9:21, 5:4, 4:5
Reference images	Up to 9 (character1 to character9)
Prompt length	Up to 2,500 characters
Released	June 2026
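The limits in the table can be checked before a run. The sketch below is an assumption-laden illustration: `ClipSettings` and `validate` are hypothetical names that do not mirror any real request schema, and only the numeric limits and allowed values come from the table.

```python
from dataclasses import dataclass

# Values copied from the spec table; everything else here is hypothetical.
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "9:21", "5:4", "4:5"}
MIN_SECONDS, MAX_SECONDS, DEFAULT_SECONDS = 3, 15, 5
MAX_REFERENCES = 9
MAX_PROMPT_CHARS = 2500


@dataclass
class ClipSettings:
    prompt: str
    resolution: str
    aspect_ratio: str
    duration_seconds: int = DEFAULT_SECONDS
    reference_count: int = 0


def validate(settings: ClipSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the table."""
    problems = []
    if not settings.prompt.strip():
        problems.append("prompt is empty")
    if len(settings.prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if settings.resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {settings.resolution}")
    if settings.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    if not MIN_SECONDS <= settings.duration_seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    if not 0 <= settings.reference_count <= MAX_REFERENCES:
        problems.append(f"at most {MAX_REFERENCES} reference images")
    return problems


draft = ClipSettings(prompt="A news anchor reads the headline.",
                     resolution="720p", aspect_ratio="16:9")
print(validate(draft))
```

Returning a list of problems rather than raising on the first one lets a caller report every out-of-range setting at once.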

A news anchor reads the evening headline, synced studio audio

Happy Horse 1.1 use cases

Dialogue and talking-head scenes

Characters speak with synced lip movement, room tone, and timing, generated in one pass. Write the line in the prompt and the audio comes back with the motion.

Multi-character ensemble scenes

Carry up to nine subjects from reference images into a single scene, calling each by index so the whole cast stays recognizable from shot to shot.

Music videos and performance clips

Because video and audio generate together, motion lands on beat from the first pass. Build a performance clip with a score and synced movement in one generation.

Ultrawide cinematic cuts

Use the 21:9 ratio for a widescreen, cinematic frame, then deliver the same scene as a 9:16 vertical from the same prompt.

Multilingual ad localization

Keep the same scene and characters and swap the dialogue across languages with native lip-sync, so one treatment ships in several markets.
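One way to organize a localization run is to hold the scene text fixed and vary only the spoken line. The sketch below assumes you already have translated lines; `localized_prompts` is a hypothetical helper, and the "says in <language>" phrasing follows the timed-dialogue example in this guide rather than any documented syntax requirement.

```python
# The seven lip-sync languages listed in this guide.
SUPPORTED_LANGUAGES = {
    "English", "Mandarin", "Cantonese", "Japanese", "Korean", "German", "French",
}


def localized_prompts(scene: str, speaker: str, lines: dict[str, str]) -> dict[str, str]:
    """Build one prompt per language, keeping the scene description identical."""
    unsupported = sorted(set(lines) - SUPPORTED_LANGUAGES)
    if unsupported:
        raise ValueError("unsupported languages: " + ", ".join(unsupported))
    return {
        language: f'{scene} {speaker} says in {language}, "{line}"'
        for language, line in lines.items()
    }


variants = localized_prompts(
    scene="Medium shot of character1 at a kitchen counter, soft morning ambience.",
    speaker="character1",
    lines={"English": "Breakfast is ready.", "German": "Das Frühstück ist fertig."},
)
for language, prompt in variants.items():
    print(language, "->", prompt)
```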

How to get the best out of Happy Horse 1.1

Happy Horse rewards a brief that names the motion and the sound together, and a clean set of reference images when characters have to stay consistent. A few practices carry most of the quality:

Always name the audio. Describe the dialogue, sound effects, ambience, or music in plain language, so the model generates sound with the motion instead of a silent clip.
Write motion, not a photo. Describe how the subject and camera move over the clip, not just how the frame looks at a single instant.
Index your references. For reference-to-video, refer to each subject as character1, character2, and so on, matching the order you supply the reference images.
Keep lines short for clean lip-sync. For talking characters, use a front-facing frame with the mouth visible and keep each spoken line brief.
One beat per clip. Pack a single action into a few seconds rather than crowding several into one generation.
Pick the ratio up front. Choose 21:9 for a cinematic cut or 9:16 for vertical, since the framing changes how you stage the action.

Happy Horse 1.1 prompt guide

A strong prompt reads like a short shot brief, not a caption. Two things drive the result: a clear list of what the shot contains, and concrete wording in place of vague wording.

What goes in a prompt

Element	What to include	Example
Subject	Who or what is in frame, described concretely	a news anchor in a navy suit at a glass desk
Motion	What moves, and how	he turns to a second camera and gestures
Camera	Shot type plus one move	medium shot, slow push-in
Audio	Dialogue, effects, ambience, or music	he says, 'Good evening'; soft studio room tone
Format	Duration and aspect ratio	10 seconds, 16:9
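The five elements in the table can be assembled mechanically. The following is a minimal sketch, assuming a simple sentence order of camera, subject, motion, audio, then format; `build_prompt` is a hypothetical helper, and the ordering is an illustration rather than a layout the model is known to require. On Morphic the duration and ratio may be set in the interface instead of in the text.

```python
def build_prompt(subject: str, motion: str, camera: str, audio: str,
                 duration_seconds: int, aspect_ratio: str) -> str:
    """Join the five prompt elements into one short shot brief."""
    elements = {"subject": subject, "motion": motion, "camera": camera, "audio": audio}
    missing = [name for name, value in elements.items() if not value.strip()]
    if missing:
        raise ValueError("missing prompt elements: " + ", ".join(missing))
    camera = camera.strip()
    shot = camera[:1].upper() + camera[1:]  # capitalize only the first letter
    return (
        f"{shot}: {subject.strip()}; {motion.strip()}. "
        f"Audio: {audio.strip()}. {duration_seconds} seconds, {aspect_ratio}."
    )


# Values taken from the Example column of the table.
prompt = build_prompt(
    subject="a news anchor in a navy suit at a glass desk",
    motion="he turns to a second camera and gestures",
    camera="medium shot, slow push-in",
    audio="he says, 'Good evening'; soft studio room tone",
    duration_seconds=10,
    aspect_ratio="16:9",
)
print(prompt)
```

Raising on a missing element makes it hard to send a prompt with no audio or no camera direction by accident.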

Reference and dialogue syntax

For reference-to-video, refer to each subject as character1, character2, and so on, matching the order you supply the reference images. For timed dialogue, mark the spoken lines against the clip's timeline so the lip-sync lands where you want it.

Reference and timed dialogue

character1 and character2 sit across a café table, warm window light. 0-4s: character1 says in French, "Tu as vu ça?"; 4-8s: character2 laughs and replies, "Incroyable." Soft café ambience, gentle handheld.
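The timed segments in the example above can be generated from a list of beats, which also makes it easy to catch overlaps. This is a sketch under stated assumptions: `Beat` and `timed_dialogue` are hypothetical names, and the "0-4s:" notation simply copies the example rather than a documented grammar.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    start: int  # seconds from the start of the clip
    end: int    # seconds from the start of the clip
    text: str   # who speaks or acts, and the line itself


def timed_dialogue(beats: list[Beat], clip_seconds: int) -> str:
    """Format beats as 'start-ends: text', rejecting overlaps and overruns."""
    parts = []
    previous_end = 0
    for beat in beats:
        if beat.end <= beat.start:
            raise ValueError(f"beat {beat.start}-{beat.end}s has no duration")
        if beat.start < previous_end:
            raise ValueError(f"beat {beat.start}-{beat.end}s overlaps the previous beat")
        if beat.end > clip_seconds:
            raise ValueError(f"beat {beat.start}-{beat.end}s runs past the clip")
        parts.append(f"{beat.start}-{beat.end}s: {beat.text}")
        previous_end = beat.end
    return "; ".join(parts)


beats = [
    Beat(0, 4, 'character1 says in French, "Tu as vu ça?"'),
    Beat(4, 8, 'character2 laughs and replies, "Incroyable."'),
]
print(timed_dialogue(beats, clip_seconds=8))
```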

Weak vs strong prompts

Name the camera, the motion and its timing, and the audio rather than leaving them to chance.

Focus	Weak	Strong
Camera	A woman in a city at night	Handheld tracking shot following a woman through rain-slicked streets, shop lights reflecting on the pavement, shallow depth of field
Motion and timing	The door opens and someone walks in	The door swings open slowly, a figure steps through after a beat, then the camera settles into a medium shot
Audio	A chef plating a dish	Close-up of a chef plating a dish, steam rising. Audio: pan sizzle, soft kitchen ambience, and 'Service.'

Common mistakes

Leaving the prompt silent: always write at least one sound cue, since the model generates audio with the video.
Vague camera: "cinematic" tells the model nothing; name the shot and the move.
Unindexed references: for reference-to-video, label each subject as character1, character2, rather than "use these references."
Too much in one clip: keep one action per clip, and keep spoken lines short for clean lip-sync.
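Three of the mistakes above (no sound cue, vague camera, unindexed references) can be flagged with a rough keyword check before a run. The sketch below is a heuristic only: `lint_prompt` and both keyword lists are assumptions made for illustration, and a prompt can pass the check and still be weak, or fail it and still be fine.

```python
import re

# Hypothetical keyword lists; extend them to match your own wording.
AUDIO_CUES = ("audio", "says", "sound", "ambience", "music", "room tone")
CAMERA_TERMS = ("shot", "close-up", "push-in", "tracking", "handheld", "dolly")


def lint_prompt(prompt: str, reference_count: int = 0) -> list[str]:
    """Return warnings for common prompt mistakes; empty means none were found."""
    text = prompt.lower()
    warnings = []
    if not any(cue in text for cue in AUDIO_CUES):
        warnings.append("no sound cue")
    if not any(term in text for term in CAMERA_TERMS):
        warnings.append("no named shot or camera move")
    # Count the distinct character1..character9 labels the prompt mentions.
    indexed = set(re.findall(r"character([1-9])\b", text))
    if reference_count and len(indexed) < reference_count:
        warnings.append("unindexed references")
    return warnings


print(lint_prompt("A chef plating a dish"))
print(lint_prompt(
    "Close-up of a chef plating a dish, steam rising. "
    "Audio: pan sizzle, soft kitchen ambience, and 'Service.'"
))
```

Run on the weak and strong chef prompts from the table, the check flags the first for missing sound and camera cues and returns no warnings for the second.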
