Happy Horse 1.1: complete guide, prompts, and features

The complete Happy Horse 1.1 guide on Morphic: what Alibaba's joint audio-video model does, its specs, native audio and lip-sync, reference-to-video with up to 9 subjects, and prompting with examples.

Happy Horse 1.1 features and capabilities

Happy Horse 1.1 is Alibaba's video model, served on fal and available on Morphic. It generates video and audio together in a single pass, with native lip-sync across seven languages, and supports reference-to-video with up to nine subjects, nine aspect ratios, and 1080p output.

Feature | What it does | Best for
Joint audio and video | Generates the clip and its synchronized audio in one pass, with no separate audio step | Dialogue scenes, music clips, talking heads
Multilingual lip-sync | Speaks and lip-syncs across 7 languages, with mouth shapes that match the phonetics | Localized ads, multilingual presenters
Reference-to-video, up to 9 | Carries up to nine reference subjects into a new scene, each referred to by index | Ensemble scenes, character-consistent series
Image-to-video | Animates a still first frame into a moving clip with audio, at up to 1080p | Product shots, key art, photo animation
Nine aspect ratios | Outputs in nine ratios, from 16:9 and 9:16 to ultrawide 21:9 | Cinematic, vertical, and square delivery

Joint audio and video in one pass

Happy Horse generates the picture and its sound together rather than adding audio afterward. Spoken dialogue with lip-sync, ambient room tone, sound effects, and music all come out of the same generation, so motion and sound line up from the first frame. You describe the sound in the same prompt as the action.

Multilingual native lip-sync

The model speaks and lip-syncs across English, Mandarin, Cantonese, Japanese, Korean, German, and French. The mouth shapes follow the phonetics of the spoken language rather than being approximated, which makes it a fit for dialogue scenes and localized versions of the same shot.

Reference-to-video with up to 9 subjects

Pass up to nine reference images and refer to each by index in the prompt, as character1 through character9 matching the order you supply them. With up to nine subjects, a full cast can stay recognizable across shots. Describe each subject, then the scene and the action.
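The indexing rule can be sketched in Python. This is a minimal sketch under stated assumptions: `Reference`, `label_references`, and `reference_prompt` are hypothetical helpers, not part of any Happy Horse, fal, or Morphic API, and the "characterN is ..." phrasing is just one way to describe each subject before the scene. The only rule taken from this guide is that labels follow the order in which the images are supplied, up to nine.

```python
from dataclasses import dataclass

MAX_REFERENCES = 9  # limit stated in this guide


@dataclass
class Reference:
    image: str        # hypothetical path or URL of a reference image
    description: str  # short, concrete description of the subject in it


def label_references(refs: list[Reference]) -> list[tuple[str, Reference]]:
    """Pair each reference with character1..characterN in the order supplied."""
    if not 1 <= len(refs) <= MAX_REFERENCES:
        raise ValueError(f"expected 1 to {MAX_REFERENCES} references, got {len(refs)}")
    return [(f"character{i}", ref) for i, ref in enumerate(refs, start=1)]


def reference_prompt(refs: list[Reference], scene: str) -> str:
    """Describe each subject by its label first, then append the scene and action."""
    subjects = " ".join(f"{label} is {ref.description}." for label, ref in label_references(refs))
    return f"{subjects} {scene}"


refs = [
    Reference("anchor.png", "a news anchor in a navy suit"),
    Reference("guest.png", "a guest in a grey sweater"),
]
print(reference_prompt(refs, "character1 turns to character2 across a glass desk, medium shot."))
```

Reordering the list changes which subject each label points to, so the upload order and the labels in the prompt have to stay in step.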

Image-to-video

Provide a still first frame, such as a product shot or a character frame, add a prompt describing the motion and the sound, and the model animates outward from that image while holding its lighting and detail. It also runs text-to-video when you have no starting image.

Nine aspect ratios

Deliver in nine ratios: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, 9:21, 5:4, and 4:5. The same prompt framework produces an ultrawide cinematic cut and a vertical social cut without a separate workflow per format.

Happy Horse 1.1 technical specs

Spec | Happy Horse 1.1
Provider | Alibaba (served on fal)
Modes | Text-to-video, image-to-video, reference-to-video
Audio | Native and synchronized, with multilingual lip-sync
Languages | 7 (English, Mandarin, Cantonese, Japanese, Korean, German, French)
Resolution | 720p or 1080p
Duration | 3 to 15 seconds (default 5)
Aspect ratios | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, 9:21, 5:4, 4:5
Reference images | Up to 9 (character1 to character9)
Prompt length | Up to 2,500 characters
Released | June 2026
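To catch out-of-range settings before running a generation, a small validator can check them against this table. The sketch below is hypothetical: `GenerationSettings` and `validate` are illustrative names, the limits are copied from the table rather than from a published API schema, and treating duration as whole seconds is an assumption.

```python
from dataclasses import dataclass

# Limits copied from the spec table in this guide (not from a published API schema).
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "9:21", "5:4", "4:5"}
MIN_SECONDS, MAX_SECONDS, DEFAULT_SECONDS = 3, 15, 5
MAX_PROMPT_CHARS = 2500
MAX_REFERENCES = 9


@dataclass
class GenerationSettings:
    # Hypothetical container; field names are illustrative only.
    prompt: str
    resolution: str
    aspect_ratio: str
    duration: int = DEFAULT_SECONDS  # whole seconds (assumption)
    references: int = 0              # number of reference images supplied


def validate(s: GenerationSettings) -> list[str]:
    """Return every problem found; an empty list means the settings fit the table."""
    problems = []
    if not s.prompt.strip():
        problems.append("prompt is empty")
    # len() counts Python characters (code points); the service may count differently.
    if len(s.prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt is {len(s.prompt)} characters; the limit is {MAX_PROMPT_CHARS}")
    if s.resolution not in RESOLUTIONS:
        problems.append(f"resolution {s.resolution!r} is not 720p or 1080p")
    if s.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"aspect ratio {s.aspect_ratio!r} is not one of the nine supported")
    if not MIN_SECONDS <= s.duration <= MAX_SECONDS:
        problems.append(f"duration {s.duration}s is outside {MIN_SECONDS} to {MAX_SECONDS}s")
    if s.references > MAX_REFERENCES:
        problems.append(f"{s.references} reference images; the limit is {MAX_REFERENCES}")
    return problems


print(validate(GenerationSettings("A chef plating a dish. Audio: pan sizzle.", "1080p", "21:9", duration=10)))  # []
print(validate(GenerationSettings("A chef plating a dish.", "4k", "2:1", duration=20)))
```

Returning a list rather than raising on the first error reports every problem at once, which is handy when several settings change together.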

Happy Horse 1.1 use cases

Dialogue and talking-head scenes

Characters speak with synced lip movement, room tone, and timing, generated in one pass. Write the line in the prompt and the audio comes back with the motion.

Multi-character ensemble scenes

Carry up to nine subjects from reference images into a single scene, calling each by index so the whole cast stays recognizable from shot to shot.

Music videos and performance clips

Because video and audio generate together, motion lands on beat from the first pass. Build a performance clip with a score and synced movement in one generation.

Ultrawide cinematic cuts

Use the 21:9 ratio for a widescreen, cinematic frame, then deliver the same scene as a 9:16 vertical from the same prompt.

Multilingual ad localization

Keep the same scene and characters and swap the dialogue across languages with native lip-sync, so one treatment ships in several markets.

How to get the best out of Happy Horse 1.1

Happy Horse rewards a brief that names the motion and the sound together, and a clean set of reference images when characters have to stay consistent. A few practices carry most of the quality:

  • Always name the audio. Describe dialogue, sound effects, ambience, or music in plain language, so the model generates sound with the motion instead of a silent clip.
  • Write motion, not a photo. Describe how the subject and camera move over the clip, not just how the frame looks at a single instant.
  • Index your references. For reference-to-video, refer to each subject as character1, character2, and so on, matching the order you supply the reference images.
  • Keep lines short for clean lip-sync. For talking characters, use a front-facing frame with the mouth visible and keep each spoken line brief.
  • One beat per clip. Pack a single action into a few seconds rather than crowding several into one generation.
  • Pick the ratio up front. Choose 21:9 for a cinematic cut or 9:16 for vertical, since the framing changes how you stage the action.

Happy Horse 1.1 prompt guide

A strong prompt reads like a short shot brief, not a caption. Two things drive the result: a clear list of what the shot contains, and concrete wording in place of vague wording.

What goes in a prompt

Element | What to include | Example
Subject | Who or what is in frame, described concretely | a news anchor in a navy suit at a glass desk
Motion | What moves, and how | he turns to a second camera and gestures
Camera | Shot type plus one move | medium shot, slow push-in
Audio | Dialogue, effects, ambience, or music | he says, 'Good evening'; soft studio room tone
Format | Duration and aspect ratio | 10 seconds, 16:9
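The five elements can be assembled into a prompt string from structured fields. In this sketch, `ShotBrief` and `to_prompt` are hypothetical, and the ordering and punctuation are one assumed layout, not a format the model is known to require. Aspect ratio is also chosen as a setting on Morphic, so writing it into the text mirrors the table rather than replacing that control.

```python
from dataclasses import dataclass


@dataclass
class ShotBrief:
    # Hypothetical structure mirroring the five prompt elements in the table.
    subject: str       # who or what is in frame, described concretely
    motion: str        # what moves, and how
    camera: str        # shot type plus one move
    audio: str         # dialogue, effects, ambience, or music
    duration_s: int    # clip length in seconds
    aspect_ratio: str  # e.g. "16:9"

    def to_prompt(self) -> str:
        # Refuse a silent brief, since the model generates audio with the video.
        if not self.audio.strip():
            raise ValueError("name at least one sound cue")
        # Uppercase only the first letter so proper nouns later in the string survive.
        camera = self.camera[:1].upper() + self.camera[1:]
        return (
            f"{camera}: {self.subject}; {self.motion}. "
            f"Audio: {self.audio}. {self.duration_s} seconds, {self.aspect_ratio}."
        )


brief = ShotBrief(
    subject="a news anchor in a navy suit at a glass desk",
    motion="he turns to a second camera and gestures",
    camera="medium shot, slow push-in",
    audio="he says, 'Good evening'; soft studio room tone",
    duration_s=10,
    aspect_ratio="16:9",
)
print(brief.to_prompt())
```

Keeping the elements as separate fields makes it easy to swap one (for example, the camera move) while holding the rest of the brief fixed between runs.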

Reference and dialogue syntax

For reference-to-video, refer to each subject as character1, character2, and so on, matching the order you supply the reference images. For timed dialogue, prefix each spoken line with its time range on the clip's timeline (for example, 0-4s:) so the lip-sync lands where you want it.

Example prompt: reference and timed dialogue

character1 and character2 sit across a café table, warm window light. 0-4s: character1 says in French, "Tu as vu ça?"; 4-8s: character2 laughs and replies, "Incroyable." Soft café ambience, gentle handheld.
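The timed-dialogue pattern in the example above can be generated from a list of beats. This is a minimal sketch assuming whole-second boundaries and the "start-ends:" notation shown in the example; `Beat` and `timed_dialogue` are hypothetical names. It rejects beats that overlap, have no length, or end past the 15-second maximum from the spec table.

```python
from dataclasses import dataclass

MAX_SECONDS = 15  # longest clip in the spec table


@dataclass
class Beat:
    start: int   # seconds from the start of the clip (whole seconds assumed)
    end: int
    action: str  # what happens or is said in this window


def timed_dialogue(beats: list[Beat]) -> str:
    """Join beats as 'start-ends: action' segments after checking their timing."""
    parts = []
    previous_end = 0
    for beat in beats:
        if beat.start < previous_end or beat.end <= beat.start:
            raise ValueError(f"beat {beat.start}-{beat.end}s overlaps another or has no length")
        if beat.end > MAX_SECONDS:
            raise ValueError(f"beat ends at {beat.end}s, past the {MAX_SECONDS}s maximum")
        parts.append(f"{beat.start}-{beat.end}s: {beat.action}")
        previous_end = beat.end
    return "; ".join(parts)


beats = [
    Beat(0, 4, 'character1 says in French, "Tu as vu ça?"'),
    Beat(4, 8, 'character2 laughs and replies, "Incroyable."'),
]
prompt = (
    "character1 and character2 sit across a café table, warm window light. "
    f"{timed_dialogue(beats)} Soft café ambience, gentle handheld."
)
print(prompt)
```

With these two beats, the printed prompt matches the example prompt above character for character.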

Weak vs strong prompts

Name the camera, the motion and its timing, and the audio rather than leaving them to chance.

Focus | Weak | Strong
Camera | A woman in a city at night | Handheld tracking shot following a woman through rain-slicked streets, shop lights reflecting on the pavement, shallow depth of field
Motion and timing | The door opens and someone walks in | The door swings open slowly, a figure steps through after a beat, then the camera settles into a medium shot
Audio | A chef plating a dish | Close-up of a chef plating a dish, steam rising. Audio: pan sizzle, soft kitchen ambience, and 'Service.'

Common mistakes

  • Leaving the prompt silent: always write at least one sound cue, since the model generates audio with the video.
  • Vague camera: "cinematic" tells the model nothing; name the shot and the move.
  • Unindexed references: for reference-to-video, label each subject as character1, character2, and so on, rather than writing "use these references."
  • Too much in one clip: keep one action per clip, and keep spoken lines short for clean lip-sync.
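These checks can be roughed out as a prompt linter. The keyword lists below are guesses made for illustration, not rules published for the model, so expect both false positives and misses; `lint_prompt` is a hypothetical helper, and the one-action-per-clip rule is left to human judgment because it is hard to detect from text alone.

```python
import re

# Illustrative keyword lists; these are guesses, not rules published for the model.
AUDIO_CUES = ("audio", "says", "sound", "ambience", "music", "score", "room tone", "sizzle")
SHOT_TERMS = ("close-up", "medium shot", "wide shot", "tracking shot", "push-in", "handheld", "dolly")


def lint_prompt(prompt: str, num_references: int = 0) -> list[str]:
    """Flag the common mistakes listed above using simple keyword heuristics."""
    text = prompt.lower()
    warnings = []
    if not any(cue in text for cue in AUDIO_CUES):
        warnings.append("no sound cue: name dialogue, effects, ambience, or music")
    if "cinematic" in text and not any(term in text for term in SHOT_TERMS):
        warnings.append("'cinematic' used without a named shot or camera move")
    if num_references:
        # Labels follow upload order, so character1..characterN should all appear.
        used = set(re.findall(r"character([1-9])\b", text))
        missing = [f"character{i}" for i in range(1, num_references + 1) if str(i) not in used]
        if missing:
            warnings.append("unindexed references: " + ", ".join(missing))
    return warnings


print(lint_prompt("A woman in a city at night"))
print(lint_prompt("Cinematic shot of a horse. Music swells."))
```

Treat any warning as a prompt to reread the brief, not as a hard failure.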

FAQs

How do I get the best results from Happy Horse 1.1?

Name the audio in every prompt, since Happy Horse 1.1 generates sound with the video. Describe motion rather than a still frame, and give a shot type with one camera move. For multi-character scenes, index each subject as character1, character2, and so on, and keep spoken lines short for clean lip-sync. Draft at 720p, then re-run the keeper at 1080p.

Does Happy Horse 1.1 generate audio?

Yes. Happy Horse 1.1 generates audio with the video in a single pass, so it stays in sync with the motion. A generation can include lip-synced dialogue, sound effects, ambience, and music, with native lip-sync across seven languages and no separate audio step.

How does reference-to-video work in Happy Horse 1.1?

Pass up to nine reference images and refer to each by index, as character1 through character9, matching the order you supply them. State which subject comes from which image, then describe the scene and action. Happy Horse 1.1 carries each subject into the new scene so a cast stays recognizable from shot to shot.

What resolutions, durations, and aspect ratios does Happy Horse 1.1 support?

Happy Horse 1.1 outputs 720p or 1080p in clips of 3 to 15 seconds, with a 5-second default. It supports nine aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4, ultrawide 21:9, 9:21, 5:4, and 4:5. Choose the ratio first, since framing changes how you stage the action.

How do I use Happy Horse 1.1 on Morphic?

Open Morphic, switch the prompt bar to Video mode, and pick Happy Horse 1.1. Describe the scene, attach a still for image-to-video or up to nine reference images for reference-to-video, choose a resolution and aspect ratio, then run the prompt. Audio generates in the same pass.
