ByteDance Bernini: complete guide to AI video editing and prompts

The complete guide to ByteDance Bernini, the open-source AI video model: what it does, its specs, how it reads a prompt, the consistency lock behind clean edits, and prompt structure by task.

Bernini is ByteDance's open-source video model, built around editing as much as generation. An MLLM planner reads your instruction and works out what should change, then a DiT renderer built on Wan2.2 paints the pixels, so it can alter a real clip while leaving everything you didn't mention untouched. This guide covers what Bernini does, its specs, how it reads a prompt, the consistency lock behind its clean edits, and the prompt structure for each task.

What can Bernini do? Editing, subject-to-video, and generation

Capability | What it does | Best for
Consistency-locked editing | Adds, removes, or alters elements in a clip while untouched regions stay frozen | Object add/remove, clean retouches
Reference-guided editing | Applies a reference image or a second clip to the source video | Garment swaps, product or screen insertion
Subject-to-video | Places a person or character from reference images into a new scene | Avatars, character work, serialized content
Motion editing | Changes what a subject is doing inside a clip | Re-posing an action without re-shooting
Unified image + video | One model spans text-to-image, image editing, text-to-video, and video editing | Stills and motion from one prompt language

Consistency-locked editing

Because the planner settles the semantics before the renderer paints, Bernini holds the parts of a clip you didn't ask to change. Name the edit, then name what stays fixed, and untouched regions keep still across the whole video with no flicker or drift. It is the model's strongest editing trait.

Reference-guided editing

Feed a reference image or a second clip and Bernini applies it to the source video. Swap a garment onto a moving subject from a single still, or insert a product or on-screen video so it tracks the original footage. The rest of the source clip stays intact around the change.

Subject-to-video

Pass reference images and refer to each by index in the prompt (image0, image1), saying which subject or attribute comes from which. Bernini carries the subject into a new scene and keeps the face recognizable as it moves. This is its standout result in ByteDance's subject-to-video evaluations.

Motion editing

Change what a subject is doing inside an existing clip (for example, a person crouches instead of bending over) while their identity, the framing, the lighting, and the background stay put. This re-blocks an action without re-shooting the take.

Unified image + video

One model spans text-to-image, image editing, text-to-video, and video editing, so a still and a moving edit come from the same prompt language. You learn one way to instruct it and apply it across both formats.

Bernini use cases

Clean up footage you already shot

Remove a distraction, add a missing element, or restyle a detail in a real clip, without re-shooting it. The consistency lock keeps the rest of the shot identical.

Before and after: a distraction removed from a lakeside clip while the rest of the scene stays unchanged

Build a character that recurs

Keep the same face across episodes, ads, or an avatar series. Subject-to-video carries a person's identity from a few reference images into new scenes.

The same character with a consistent face shown across three different scenes and outfits

Try-on and product placement

Swap a garment onto a moving subject from a reference image, or drop a product or an on-screen video into a shot, with the source clip kept intact.

Before and after: a model's tee swapped for a tailored blazer while the pose, lighting, and background stay the same

Change a performance

Re-block an action or adjust a subject's motion in a take, instead of filming it again, while identity, framing, and lighting stay fixed.

Before and after: a subject's pose changed from bending to crouching while the scene, framing, and lighting stay the same

How to prompt Bernini

Two habits carry most of the quality on Bernini.

  • Write an instruction, not just a description. For edits you are changing an existing clip, so the prompt is a directive: what to add, remove, or alter, and where. For generation (text-to-video, text-to-image) you describe the whole scene as usual.
  • Name what changes, then name what stays. The renderer can touch any region, so the most reliable edits state the change and then pin everything that should not move. That second habit is the consistency lock, covered next.

A detailed, structured instruction beats a terse one. Bernini's planner does better when you spell out size, placement, materials, and how the new element's lighting matches the scene, rather than leaning on a one-liner.

The consistency lock: edit one thing, keep the rest

The renderer holds untouched regions well, but only if the prompt tells it what they are. The pattern is to state the edit precisely, then list everything that must stay unchanged, ending the sentence on the word "unchanged." Removal works the same way: describe what should fill the gap, then lock the surroundings.

Edit | Weak | Strong
Add an object | Put a snowman in the video | Add a three-snowball snowman in the mid-right ground beside the dog, carrot nose and coal buttons, matching the overcast light and soft shadows. Keep the dog, road, and trees unchanged.
Garment swap | Change the shirt | Replace the outer shirt with the one in the reference image, worn with realistic drape. Keep the pose, camera, lighting, background, and motion exactly as they are.
Subject-to-video | Use these references in a beach video | The statue from image0, in the shorts from image3, on the bench from image4 at sunset, gently swaying to music. Keep the statue's stone body from image0 and the beach scene from image4 unchanged.

Skip the lock and the model is free to redraw the background. Spend a sentence on it and the edit reads as native to the original shot.
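The edit-then-lock pattern is regular enough to assemble with a small helper. The sketch below is only an illustration of the phrasing: `build_locked_edit_prompt` is a hypothetical name, not part of any Bernini tooling, and it only builds a string in the shape of the "Strong" column above.

```python
def build_locked_edit_prompt(edit: str, keep: list[str]) -> str:
    """Join an edit instruction with a lock sentence that ends on 'unchanged.'

    Hypothetical helper: it formats text only and calls no model.
    """
    edit = edit.strip()
    if not edit:
        raise ValueError("describe the edit before locking anything")
    if not keep:
        raise ValueError("name at least one region that must stay unchanged")
    if not edit.endswith("."):
        edit += "."

    # Join the locked regions as a readable list: "a", "a and b", "a, b, and c".
    if len(keep) == 1:
        locked = keep[0]
    elif len(keep) == 2:
        locked = f"{keep[0]} and {keep[1]}"
    else:
        locked = ", ".join(keep[:-1]) + f", and {keep[-1]}"

    return f"{edit} Keep the {locked} unchanged."


prompt = build_locked_edit_prompt(
    "Add a three-snowball snowman in the mid-right ground beside the dog, "
    "carrot nose and coal buttons, matching the overcast light and soft shadows",
    ["dog", "road", "trees"],
)
print(prompt)
```

Called this way, the helper reproduces the strong "Add an object" prompt from the table. Refusing an empty `keep` list is a deliberate choice: it makes the lock impossible to forget.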

Common Bernini prompt mistakes (and how to fix them)

  • No lock: name what stays unchanged, or the edit bleeds into the rest of the frame.
  • A terse instruction: describe the new element fully, its size, placement, materials, and lighting, instead of a three-word command.
  • Vague references: for subject-to-video, reference each image by index (image0, image1) and say which attribute comes from which, rather than "use these references."
  • Motion edits that move identity: when changing motion, pin the person, wardrobe, position, and camera so only the action changes.
  • Expecting 4K: the default render is 480p at 16fps, tuned for editing fidelity over resolution. Judge it on how cleanly it holds the untouched regions.
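Some of these mistakes can be caught before a render with a rough text check. The following is a minimal sketch under loud assumptions: `lint_prompt`, the lock keywords, and the 12-word threshold are all invented for illustration, and a keyword heuristic like this cannot judge whether a prompt is actually good.

```python
import re

# Assumed heuristics, not Bernini rules: phrases that suggest a lock is present,
# and an arbitrary word count below which a prompt is treated as terse.
LOCK_WORDS = ("unchanged", "keep", "exactly as")
MIN_WORDS = 12


def lint_prompt(prompt: str, num_reference_images: int = 0) -> list[str]:
    """Return warnings for a missing lock, a terse prompt, or unused image indices."""
    warnings = []
    lowered = prompt.lower()

    if not any(word in lowered for word in LOCK_WORDS):
        warnings.append("no lock: name what stays unchanged")

    if len(prompt.split()) < MIN_WORDS:
        warnings.append("terse: describe size, placement, materials, and lighting")

    if num_reference_images > 0:
        # Collect every index mentioned as image0, image1, ...
        used = {int(n) for n in re.findall(r"\bimage(\d+)\b", lowered)}
        missing = sorted(set(range(num_reference_images)) - used)
        if missing:
            names = ", ".join(f"image{i}" for i in missing)
            warnings.append(f"vague references: never mentions {names}")

    return warnings


for warning in lint_prompt("Change the shirt"):
    print(warning)
```

The weak garment-swap prompt trips both the lock and the terseness checks, while the strong version from the table passes cleanly.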

Bernini specs and architecture

Spec | Bernini
Provider | ByteDance
Architecture | MLLM planner (Qwen2.5-VL) + 14B DiT renderer (Wan2.2)
Modes | Text-to-image, image editing, text-to-video, video editing, motion editing, reference editing, subject-to-video
Resolution | 480p (default)
Frame rate | 16 fps
License | Apache 2.0, open weights

FAQs

How do I get the best results from Bernini?

State the change precisely, then explicitly lock everything that should stay unchanged: the subject, camera, lighting, background, and shadows. Write detail rather than a one-liner, and make one edit per pass.

What is the consistency lock?

It is the phrasing habit behind Bernini's cleanest edits: after you describe the edit, you pin the untouched regions as unchanged. Bernini holds those regions well, but only if the prompt tells it what they are.

How do I reference images for subject-to-video?

Pass several reference images and refer to each by index in the prompt (image0, image1, image2). State which subject or attribute comes from which image, then describe the new scene and the motion.
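That ordering (sources by index, then the scene, then the motion) can be sketched as a string builder. `build_subject_prompt` and its parameters are hypothetical names chosen for this example. It assumes only what the answer above states, that images are referred to as image0, image1, and so on, and it does not call Bernini.

```python
def _sentence(text: str) -> str:
    """Trim the text and make sure it ends with a period."""
    text = text.strip()
    return text if text.endswith(".") else text + "."


def build_subject_prompt(sources: dict[int, str], scene: str, motion: str) -> str:
    """Build a subject-to-video prompt: indexed sources, then scene, then motion.

    `sources` maps a reference image index to what is taken from that image.
    Hypothetical helper for illustration; it only formats text.
    """
    if not sources:
        raise ValueError("map at least one image index to a subject or attribute")
    if any(index < 0 for index in sources):
        raise ValueError("image indices start at 0")

    # Tie each subject or attribute to its image, in index order.
    parts = [f"{what} from image{index}" for index, what in sorted(sources.items())]
    if len(parts) == 1:
        joined = parts[0]
    elif len(parts) == 2:
        joined = f"{parts[0]} and {parts[1]}"
    else:
        joined = ", ".join(parts[:-1]) + f", and {parts[-1]}"

    return " ".join([f"Use {joined}.", _sentence(scene), _sentence(motion)])


print(build_subject_prompt(
    {1: "the jacket", 0: "the woman"},
    "She stands on a rainy street at night",
    "She turns toward the camera and smiles",
))
```

Keying the mapping by index rather than by list position makes it harder to write "use these references" by accident: every attribute has to be tied to a specific image.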

What inputs does Bernini accept?

Text alone for generation, a video plus text for editing and motion editing, a video plus a reference image or clip for reference-guided edits, and a set of reference images plus text for subject-to-video.
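As a quick reference, the same mapping can be written as a lookup table with a check for missing inputs. The task names and input labels below are shorthand invented for this sketch, not parameter names from any Bernini interface.

```python
# Inputs per task, as listed in the answer above. Keys and labels are
# illustrative shorthand, not real API parameters.
REQUIRED_INPUTS = {
    "generation": {"text"},
    "editing": {"video", "text"},
    "motion_editing": {"video", "text"},
    "reference_editing": {"video", "reference"},  # reference image or clip
    "subject_to_video": {"reference_images", "text"},
}


def missing_inputs(task: str, provided: set[str]) -> list[str]:
    """Return the inputs a task still needs, sorted alphabetically."""
    if task not in REQUIRED_INPUTS:
        raise KeyError(f"unknown task: {task}")
    return sorted(REQUIRED_INPUTS[task] - provided)


print(missing_inputs("editing", {"text"}))
```

Here an editing request that supplies only text is reported as missing its source video.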

What resolution and frame rate does Bernini output?

The default render setting is 480p at 16fps. The release prioritizes editing fidelity and consistency over maximum resolution, and higher settings are possible at greater compute cost.
