Happy Horse 1.0 is the #1 ranked AI video model on the Artificial Analysis Video Arena as of April 2026. It generates video and synchronized audio from text, images, or visual references in a single pass, with native multilingual lip-sync. This guide walks you through creating your first video with Happy Horse 1.0 on Morphic.
How to use Happy Horse 1.0 on Morphic
1. Open a file in Morphic
Start a new file or open an existing one inside any project. Happy Horse 1.0 generations live in the same canvas as the rest of your work, so you can iterate against your references without switching tools.
2. Switch to video mode
Switch the prompt bar to Video, then pick Text/Image to Video, Frames to Video, or Lip Sync.

3. Select Happy Horse from the model picker
Click the model selector and choose "Happy Horse." Morphic exposes Happy Horse 1.0 alongside other video models, so you can swap and compare without leaving the file.

4. Write your prompt and generate
Write your prompt, optionally drop in visual references for character or style consistency, and click "Generate." Happy Horse 1.0 produces high-resolution video with synchronized audio in roughly half a minute. Preview and download from Morphic, or keep editing in the same file.
What Happy Horse 1.0 can do
Happy Horse 1.0 is a 15-billion-parameter unified Transformer model built by Alibaba. It produces video and audio together from a single prompt, which sets it apart from models that require separate pipelines for visuals and sound.
Joint audio and video in Happy Horse 1.0
Happy Horse produces video and audio (dialogue, ambient sound, Foley effects) in a single forward pass. There is no need for a separate dubbing or sound design step. It supports native lip-sync in seven languages:
- English
- Mandarin
- Cantonese
- Japanese
- Korean
- German
- French
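The language list above can double as a pre-flight check. This is an illustrative sketch, not part of any Morphic or Happy Horse API; the helper name and error format are assumptions.

```python
# Hypothetical helper: validate a requested lip-sync language against the
# seven languages Happy Horse 1.0 supports natively.
SUPPORTED_LIPSYNC_LANGUAGES = {
    "English", "Mandarin", "Cantonese", "Japanese", "Korean", "German", "French",
}

def check_lipsync_language(language: str) -> str:
    """Return the language if supported, otherwise raise with the full list."""
    if language not in SUPPORTED_LIPSYNC_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LIPSYNC_LANGUAGES))
        raise ValueError(
            f"Unsupported lip-sync language {language!r}; choose one of: {supported}"
        )
    return language
```

Specify the dialogue language directly in the prompt; a check like this simply catches a typo before you spend a generation on it.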
Happy Horse 1.0 camera control
The model is unusually responsive to camera language. Specific terms produce distinct, reliable results:
| Camera cue | What it produces | Best for |
|---|---|---|
| Steadicam push | Smooth forward movement | Walking scenes, reveals |
| Slow dolly-in | Gradual zoom from medium to close | Emotional moments, product focus |
| Lateral orbit | Side-to-side rotation with parallax | Product showcases, architecture |
| Helicopter aerial | Bird's-eye sweeping shot | Landscapes, establishing shots |
| Locked-off framing | Static camera, subject moves | Dialogue, interviews |
| Tracking shot | Camera follows subject movement | Action, walking sequences |
Put the camera cue at the end of your prompt. That is where Happy Horse 1.0 gives it the most weight.
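The "cue goes last" rule is easy to enforce mechanically. Here is a minimal sketch, assuming the six cue names from the table above; the function itself is hypothetical, not a Morphic API.

```python
# Illustrative sketch: build a prompt that places the camera cue at the end,
# where Happy Horse 1.0 gives it the most weight.
CAMERA_CUES = {
    "steadicam push", "slow dolly-in", "lateral orbit",
    "helicopter aerial", "locked-off framing", "tracking shot",
}

def with_camera_cue(prompt: str, cue: str) -> str:
    """Append a known camera cue after the scene description."""
    if cue.lower() not in CAMERA_CUES:
        raise ValueError(f"Unknown camera cue: {cue!r}")
    # Strip any trailing period so the cue reads as part of one sentence.
    return f"{prompt.rstrip('. ')}, {cue}."

print(with_camera_cue(
    "A woman in a red coat walks through a rain-soaked Tokyo alley at night",
    "steadicam push",
))
```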
Happy Horse 1.0 motion consistency
Subjects stay stable throughout the clip:
- Faces do not drift or deform mid-shot
- Products maintain their shape and proportions
- Gait and body movement stay natural across the full duration
This makes Happy Horse 1.0 particularly reliable for anything that requires subject fidelity, from product demos to character-driven scenes.
Multi-shot storytelling in Happy Horse 1.0
Happy Horse 1.0 is the only AI video model with native multi-shot support. From a single prompt, it can:
- Generate coherent scene sequences with multiple beats
- Maintain persistent character identity across shots
- Synchronize audio across the full sequence
- Follow shot lists with timecodes (e.g., Shot 1 wide 0-1s, Shot 2 mid tracking 1-4s)
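The shot-list format above is regular enough to generate programmatically. A minimal sketch, assuming the "Shot N (start-end s): description" convention from the examples; the `Shot` class and builder are illustrative, not an official tool.

```python
# Hypothetical helper: format a multi-shot prompt with timecodes in the
# "Shot N (start-end s): description" style the guide recommends.
from dataclasses import dataclass

@dataclass
class Shot:
    description: str
    start: int  # seconds
    end: int    # seconds

def build_shot_list(shots: list[Shot]) -> str:
    """Number the shots and join them into a single multi-shot prompt."""
    parts = [
        f"Shot {i} ({shot.start}-{shot.end}s): {shot.description}"
        for i, shot in enumerate(shots, start=1)
    ]
    return " ".join(parts)

prompt = build_shot_list([
    Shot("Wide shot of a coffee shop interior, morning light", 0, 2),
    Shot("Mid tracking shot follows a barista preparing a latte", 2, 5),
    Shot("Close-up of the finished latte on the counter", 5, 8),
])
```

Keeping timecodes inside the model's 5-to-8-second clip window keeps each beat distinct instead of compressing into one motion.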
Happy Horse 1.0 modes and output specs
| Mode | What it does | Best for |
|---|---|---|
| Text-to-video | Generate video from a written description | Starting from scratch, creative concepts |
| Image-to-video | Animate a still image with motion | Product photos, existing artwork, portraits |
| Lip sync | Generate or apply lip-synced dialogue to video | Multilingual voiceover, talking-head content |
| Visual references | Use reference images for style and character consistency | Maintaining a consistent look across clips |

| Spec | Details |
|---|---|
| Resolution | Up to 1080p |
| Clip duration | 5 to 8 seconds |
| Aspect ratios | 16:9, 9:16, 4:3, 21:9, 1:1 |
| Audio | Native joint generation (dialogue, Foley, ambient) |
| Generation speed | Roughly half a minute for 1080p |
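The published specs can be encoded as a simple pre-flight check before you queue a generation. A sketch under assumptions: the dictionary keys and function are hypothetical, not a Morphic API; the values come from the table above.

```python
# Illustrative sketch: sanity-check a request against Happy Horse 1.0's
# published output specs (1080p max, 5-8 s clips, five aspect ratios).
SPECS = {
    "max_resolution": "1080p",
    "duration_s": range(5, 9),  # 5 to 8 seconds, inclusive
    "aspect_ratios": {"16:9", "9:16", "4:3", "21:9", "1:1"},
}

def validate_request(duration_s: int, aspect_ratio: str) -> None:
    """Raise ValueError if the request falls outside the published specs."""
    if duration_s not in SPECS["duration_s"]:
        raise ValueError("Clip duration must be 5-8 seconds")
    if aspect_ratio not in SPECS["aspect_ratios"]:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio!r}")

validate_request(8, "21:9")  # within spec, no error
```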
Prompting tips for Happy Horse 1.0
| Do | Don't |
|---|---|
| Start with your subject, action, and setting, then add one camera or lighting cue. Shorter prompts produce cleaner results, especially for single-character scenes. | Stack five cinematography cues in one prompt. They cancel each other out. |
| Use concrete lighting terms ("warm amber backlight," "overcast daylight," "sodium vapor street lamps"). | Use vague modifiers like "good lighting" or "beautiful atmosphere." |
| Put camera direction at the end of the prompt, where it gets the most weight. | Use slop words: stunning, epic, breathtaking, masterpiece, hyperrealistic, ultra detailed. |
| Use shot lists with timecodes for multi-beat scenes (e.g., Shot 1 wide 0-1s, Shot 2 mid tracking 1-4s). | Write multi-step action as plain prose ("first X, then Y, then Z"). It compresses into one motion. |
| Describe a visual style in technique terms ("backlit silhouette, soft natural haze, cool desaturated palette"). | Drop a director or DP name alone ("Roger Deakins cinematography") and expect it to carry the look. |
Happy Horse 1.0 responds best to plain English prose. In testing, structured formats like booru tags, JSON, and weighted parentheses underperformed against the same content written as a sentence. For longer prompts, shot lists with timecodes or markdown sections (Subject, Action, Setting, Camera, Lighting, Mood) help the model parse each element cleanly.
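The markdown-section layout above can be assembled from individual fields. A minimal sketch: the section names (Subject, Action, Setting, Camera, Lighting, Mood) are from this guide, while the function and its keyword interface are hypothetical.

```python
# Illustrative sketch: assemble a longer prompt from the markdown sections
# the guide suggests, skipping any section you leave empty.
SECTIONS = ("Subject", "Action", "Setting", "Camera", "Lighting", "Mood")

def build_sectioned_prompt(**fields: str) -> str:
    """Emit '## Section' blocks in the guide's recommended order."""
    lines = []
    for section in SECTIONS:
        value = fields.get(section.lower())
        if value:
            lines.append(f"## {section}\n{value}")
    return "\n\n".join(lines)

print(build_sectioned_prompt(
    subject="A ceramic pour-over coffee set",
    action="Steam rises as water is poured in a slow spiral",
    camera="Lateral orbit",
    lighting="Soft overcast daylight",
))
```

For short single-character scenes, skip the structure entirely and write one plain sentence; the sections earn their keep only on longer, multi-element prompts.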
Example Happy Horse 1.0 prompts
A woman in a red coat walks through a rain-soaked Tokyo alley at night, neon signs reflecting off wet pavement, slow steadicam push forward.
Camera slowly orbits the subject, wind catches their hair, warm golden hour backlight, ambient city noise.
Shot 1 (0-2s): Wide shot of a coffee shop interior, morning light through windows, ambient chatter. Shot 2 (2-5s): Mid tracking shot follows a barista preparing a latte, sound of steaming milk. Shot 3 (5-8s): Close-up of the finished latte placed on the counter, soft piano in background.
For a deeper dive on prompting, see our complete Happy Horse 1.0 guide.
Frequently asked questions
Where can I use Happy Horse 1.0?
Happy Horse 1.0 is available on Morphic alongside other leading video models. Open a file in any project, switch the prompt bar to "Video," and select "Happy Horse" from the model menu.
Do I need to download any software?
No. Morphic runs in your browser. There is no software to download and no technical setup required.
How many languages does Happy Horse 1.0 lip-sync support?
Seven: English, Mandarin, Cantonese, Japanese, Korean, German, and French. Specify the dialogue language directly in your prompt.
How long can generated clips be?
Happy Horse 1.0 generates clips of five to eight seconds. For longer sequences, combine multiple clips using Morphic's editing tools, or use the multi-shot prompt format to keep characters and audio consistent across cuts.
What is the difference between text-to-video and image-to-video?
Text-to-video generates a clip entirely from your written description. Image-to-video takes a still image as the visual starting point and animates it based on your prompt. For image-to-video, focus your prompt on the motion you want rather than describing what is already in the image.
Can I keep a consistent character or style across generations?
Yes. Happy Horse 1.0 supports visual references, which let you pass in reference images so the model maintains a consistent character or style across multiple generations.


