You have spent time on the idea, uploaded your reference images, written out the scene. Then the generation gets flagged. Or it comes back looking nothing like what you described. You tweak a word or two, try again, and hit the same wall.
It is not your idea that is the problem. It is not even the content of your scene in most cases. The problem is that Seedance 2.0 is reading your prompt differently than you intended. Once you understand how that reading works, the fix becomes obvious.
This guide covers the full picture: how the input system works, why the filter behaves the way it does, how to structure prompts that pass cleanly, how to handle image uploads correctly, and how to unlock the advanced techniques most users never reach. Most people who apply these changes see a significant improvement in their very next generation.
Understanding Seedance 2.0's input system
Before anything else, it helps to understand exactly what Seedance 2.0 can accept and where most people go wrong before they even write a prompt.
- Images: Up to 9. Used as opening frames, character references, scene environments, or style anchors.
- Video clips: Up to 3 clips, combined duration no longer than 15 seconds. Used to reference camera movements, replicate motion, or as source footage to extend or edit.
- Audio files: Up to 3 files, combined duration no longer than 15 seconds. Used for background music, sound design, or voiceover tone reference.
- Text: Your prompt, in natural language or structured JSON.
The combined file count across all input types cannot exceed 12. When approaching that ceiling, deprioritize audio and secondary visual references first, as those elements are easier to describe in text. Reserve upload slots for inputs that most directly determine what the generation looks like.
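As a concrete illustration of that ceiling, here is one way an allocation can play out. The field names below are just a counting sketch, not part of any Seedance input format:

```json
{
  "images": 9,
  "video_clips": 3,
  "audio_files": 0,
  "total_files": 12
}
```

Nine images plus three video clips already hits the 12-file cap, which is why the soundtrack in a setup like this has to be described in the prompt text rather than uploaded.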
[Reference] Choosing the right entry point
- First and last frames: Use for single-image plus text generations. Straightforward and fast for simple shots.
- All-in-One Reference: Required for any combination of images, video, and audio. This is the only mode where @ tag referencing works. If you are mixing input types at all, this is the mode you need to be in.
Note: Smart Multi-Frame and Subject Reference are not currently available in Seedance 2.0.
The real reason your prompt got flagged
Most people assume flagged prompts contain a specific word or phrase that tripped a filter. That assumption leads to an endless loop of swapping out words, adding disclaimers, or stripping the prompt down, none of which actually solves the problem.
Seedance 2.0's content filter does not work that way. It uses a language model to read your entire prompt as a single scene and make a judgment about what that scene represents. It is evaluating intent and context, not scanning for individual terms.
Think of it like a security guard at a film studio versus one at a bank. The same prop gun gets waved through the studio gate without a second thought, because the context makes the purpose obvious. At the bank, it is a completely different story. The object has not changed. The context has.
What this means in practice: a word that looks sensitive in isolation can sit inside a well-constructed cinematic prompt without any issue at all. The filter reads the full picture. A prompt with no picture to read (no setting, no visual purpose, no narrative logic) gives it nothing to work with. When the filter cannot confidently interpret what is being made, it errs on the side of caution.
That is the core of almost every flagged prompt that should not have been flagged. Not bad content. Not a bad idea. Just a prompt that did not give the filter enough to understand.
The practical shift is this: a prompt that reads like a filmmaker describing a shot tends to pass. A prompt that reads like a note to a friend tends to get flagged.
One category is a hard block, not a fixable flag. Two types of content are rejected at the image scan stage before the prompt is even read, and no amount of cinematic framing will pass them:
- Real faces of identifiable people: celebrities, politicians, and public figures
- Named copyrighted characters: branded superheroes, Disney characters, recognizable fictional IP
If your generation is failing on an uploaded photo of a real person, that is a platform-level restriction, not a prompt problem.
How to write prompts that the filter reads as clearly creative
[Filter] Frame the whole scene, not just the action
The most common structure in flagged prompts is a single action with no surrounding context. Something happens, but there is no location, no visual atmosphere, no reason for the camera to be there. The filter cannot tell if this is a film set or something else.
The solution is not to remove the action. It is to build the scene around it until the intent is self-evident.
| Avoid this | Use this instead |
|---|---|
| a soldier shoots someone in the street | wide shot, war-torn Eastern European street in the 1940s, a soldier in a grey uniform fires toward an off-screen position during an active firefight, smoke rising from collapsed buildings in the background, overcast flat light, 35mm grain, documentary-style handheld framing, debris scattered across the foreground |
The action is identical. The first gives the filter one thing to evaluate. The second gives it a war context, a historical period, a camera position, and a full visual atmosphere. One reads like a report. The other reads like a film brief.
Build outward from the action and answer these four questions in your prompt:
- Where is this happening?
- What does it look like visually?
- What is the camera doing?
- What is the overall atmosphere?
Answer all four and most flagging issues resolve on their own.
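In the structured JSON form that Seedance 2.0 accepts, those four answers map cleanly onto separate fields. A minimal sketch; the field names here are illustrative, not a fixed schema:

```json
{
  "setting": "abandoned rail yard on the edge of a port city, dusk, light rain",
  "visuals": "rusted freight cars, puddles reflecting sodium-vapor lamps, chain-link fencing",
  "camera": "slow dolly in to a medium shot, shallow depth of field",
  "atmosphere": "tense, quiet, cold blue-grey palette"
}
```

Each field answers one of the four questions, which makes it easy to spot which one a flagged prompt is missing.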
[Prompt] Treat your prompt as a series of visual facts, not a story
One of the less obvious causes of flagging is prompt text that reads emotionally or narratively rather than visually. These elements add interpretive noise that the model has to work around:
- Character motivations
- Dramatic backstory
- Relationship context
- Emotional explanations
The filter cares about what the camera would see if this scene existed. It does not need to know why.
A screenplay has two parts: the scene description and the subtext. Seedance 2.0 only needs the scene description. The emotional undercurrent, the backstory, the reason the character is running — that is subtext. It belongs in the writer's head, not in the prompt.
Before including any sentence in your prompt, ask one question: if this were a real film shoot, would this sentence appear on the shot list? If not, it almost certainly does not belong in the prompt.
This discipline also improves generation quality significantly. The model executes what it can see, not what it can infer. Dense, specific, visual prompts outperform long, narrative ones almost every time.
For multi-shot sequences, structuring the prompt as JSON makes this discipline automatic. Seedance 2.0 accepts it natively:
```json
{
  "visual_world": {
    "light": "soft overcast, diffused shadows, no hard edges",
    "color": "muted naturals, cold whites, desaturated tones",
    "film": "35mm grain, anamorphic lens, soft halation on highlights",
    "atmosphere": "quiet, isolated, expansive"
  },
  "sequence": {
    "duration": "10 seconds",
    "pacing": "slow build to rapid cuts, ends in stillness",
    "shots": {
      "shot_1": {
        "duration": "3 seconds",
        "camera": "locked off wide shot, low angle",
        "action": "Lone rider on horseback crests a snowfield ridge",
        "transition": "SMASH CUT"
      },
      "shot_2": {
        "duration": "4 seconds",
        "camera": "tracking shot from behind, handheld feel",
        "action": "Horse and rider gallop through deep snow, cloak whipping in wind",
        "transition": "SMASH CUT"
      },
      "shot_3": {
        "duration": "3 seconds",
        "camera": "static wide, fully locked off",
        "action": "Empty snowfield, a wolf standing motionless on a distant ridge"
      }
    }
  }
}
```
Setting up a visual_world block establishes the cinematic register of the whole sequence. Each shot then needs only to describe what the camera sees at that moment.
[Prompt] Use production language to establish what kind of content this is
There is a reliable pattern worth knowing: prompts that include the vocabulary of film production tend to be evaluated with significantly more latitude than prompts written in plain language.
The reason is straightforward. When a prompt contains shot types, lens specs, lighting setups, and aspect ratios, the model interprets it as a production brief. Film productions are allowed to depict dramatic, intense, and morally complex material. That context shifts how the filter weighs the content.
It works a bit like wearing a hard hat on a construction site. The hat does not change what you are doing, but it immediately signals to everyone around you what kind of environment this is and what the rules are. Two or three production terms in a prompt do the same thing: they establish the register before the filter evaluates anything else.
This does not mean loading every prompt with technical jargon. It means including enough production language that the framing is unambiguous. Here is a reference list organized by category:
Shot types
- Wide shot, medium shot, close-up, extreme close-up
- Over-the-shoulder, POV, bird's-eye view, two-shot
- Low angle, high angle, Dutch angle
Camera movements
- Dolly in / dolly out
- Tracking shot, pan, tilt, crane shot
- Locked off, low-angle push, circling shot, handheld
Lens and format
- 35mm grain, anamorphic lens, 2.39:1 aspect ratio, 1.85:1
- Vintage glass, soft halation, shallow depth of field
- Lens flare, rack focus
Lighting
- Overcast diffused light, volumetric rays through haze
- Practical lighting, side backlight, motivated shadow
- Golden hour, hard directional light, rim light
Color and tone
- Muted desaturated palette, high contrast, bleach bypass
- Cold blue tones, warm amber, crushed blacks
- Washed-out highlights, flat low-contrast grade
Adding two or three terms from any of these categories to a prompt establishes the production context clearly. That is often all it takes.
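Applied to an otherwise plain scene, a handful of these terms is all the framing needs. A sketch reusing the visual_world structure from the JSON example earlier; the top-level action field is illustrative:

```json
{
  "visual_world": {
    "film": "35mm grain, 2.39:1 aspect ratio",
    "light": "golden hour, hard directional light, rim light",
    "color": "warm amber, crushed blacks"
  },
  "action": "a woman walks along an empty beach toward a distant pier"
}
```

Without the visual_world block, "a woman walks along an empty beach" is exactly the kind of sparse prompt the filter struggles to place.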
Why your prompt is getting flagged even with no sensitive content
Sometimes a generation gets flagged and there is nothing remotely sensitive in the prompt. No action, no drama, no difficult subject matter. Just a scene that should be completely fine.
This happens when the prompt is too sparse. A short, plain description without any cinematic framing, scene context, or visual specificity gives the filter an incomplete picture. It is like sending someone a single sentence from the middle of a script with no title page, no scene heading, and no stage direction. They cannot tell if it is a thriller, a comedy, or something else entirely. Incomplete pictures do not get approved; they get held.
| Avoid this | Use this instead |
|---|---|
| a person holds a knife | close-up, a chef's hands grip a cleaver over a wooden chopping board, motion blur as the blade comes down on a whole fish, kitchen environment with steam rising in the background, warm tungsten lighting, shallow depth of field, cinematic food documentary style |
Same object, completely different read. The first gives the filter one object and one action. The second gives it an environment, a purpose, a production context, and a camera description.
The fix is straightforward. Even a simple scene benefits from these additions:
- A specific setting and time period
- An atmosphere or mood descriptor
- A camera position or shot type
- One or two production language terms to establish context
The @ reference system: why uploads fail silently
A significant number of Seedance 2.0 problems are not filter issues at all. They are reference issues. Users upload images and video clips expecting the model to understand what each file is for, and the model does not assume anything.
Uploading a video does not make it a camera reference. Uploading an image does not make it the opening frame. Think of it like handing a stack of unlabelled photos to a director on set. They can see what is in each picture, but they have no idea which one you want as the opening frame, which is a costume reference, and which is just background inspiration. Without labels, they guess. Seedance 2.0 is no different. Every uploaded file needs an explicit role stated in the prompt using @ tags, or it gets processed ambiguously.
Activate @ tagging by typing @ in the prompt field (a reference selector will appear) or by clicking the @ icon in the toolbar. Then state exactly what each file is for before describing the action.
| What you want | How to write it |
|---|---|
| Set the opening frame | @Image 1 as the first frame |
| Reference camera movement | reference all camera movements from @Video 1 |
| Match character appearance | character appearance based on @Image 2 |
| Set background music | use @Audio 1 as the background score |
| Replicate motion choreography | replicate the movement style from @Video 1 |
| Define the environment | the setting is based on @Image 3 |
| Reference voiceover style | match the voiceover tone of @Video 2 |
When running multiple references, list every role assignment at the top of the prompt before any scene description. Any @ tag without an explicit role is one of the most reliable causes of inconsistent or unexpected output.
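One way to keep that discipline visible is to separate the role assignments from the scene itself. The references block below is an illustrative structure, not a documented Seedance schema; it simply mirrors the roles-first ordering described above:

```json
{
  "references": {
    "@Image 1": "first frame",
    "@Image 2": "character appearance",
    "@Video 1": "camera movement reference",
    "@Audio 1": "background score"
  },
  "scene": "wide shot, rain-soaked city street at night, the character walks toward camera under neon signage"
}
```

Whether you write the assignments as a block like this or as plain sentences, the point is the same: every tag gets exactly one stated job before the action begins.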
How Seedance 2.0 reads image uploads (and where it breaks down)
[Image] When you upload a character image, let it do the work
There is a natural temptation, when uploading a character reference image, to describe that character in the prompt as well. Resist it. The image has already done that work. Repeating it in text does not reinforce the output. It introduces a second, competing layer of information the model has to reconcile.
What the prompt needs to do is describe the scene clearly:
- What is happening in the shot
- How the camera is positioned
- What the environment looks like
- How the shot moves
The image handles appearance. The prompt handles everything the camera sees.
This is also where flagging becomes an image issue. Seedance 2.0 applies stricter evaluation to any prompt where a character is likely to be read as a minor. Words that signal youth ("child," "kid," "young," "boy," "girl") trigger heightened scrutiny across the entire prompt, not just the phrase where they appear, and regardless of what any uploaded image shows.
The cleaner approach is to describe characters by their role in the scene. The image handles everything about who they are. The prompt handles what happens and what the camera sees.
| Avoid this | Use this instead |
|---|---|
| a young boy watches a building burn down | a small figure in a dark coat stands at the edge of a crowd watching a building consumed by fire, medium shot from behind, warm orange glow from the flames, thick smoke rising into a dark sky, cinematic, 2.39:1 anamorphic, documentary style |
The word "young" in the first version raises the sensitivity threshold for the entire prompt. The second version lets the uploaded image carry the character's identity. The prompt describes only what the camera sees.
[Image] Flagged before you even submitted? The image itself is the issue
There is an image evaluation layer in Seedance 2.0 that runs independently of the prompt filter. If an uploaded image contains a clearly visible face, it can trigger a rejection before the model processes any text at all. This explains the pattern where rewriting the prompt repeatedly makes no difference. The prompt is not being read.
How to get around it:
- Face away from camera. Frame the subject from behind or at an angle where facial features are not visible. Clothing, posture, hair, and environment carry enough information for most reference purposes.
- Go wide. Pull the shot wide enough that the figure reads as a silhouette or small element in the frame rather than the dominant subject.
- Use illustration over photography. Swap photographic references for illustrated or stylized ones. The evaluation applies differently and illustrated images pass more reliably.
- Shift the reference purpose. Use the image for clothing, setting, color palette, or spatial composition rather than for the face or identity.
If a generation keeps failing with no clear prompt-side explanation, adjust the image before rewriting any text.
Advanced techniques worth knowing
[Advanced] Extending existing footage
State the extension length and describe what the new segment contains:
Extend @Video 1 by 6 seconds. [Description of new segment content.]
Set the generation duration to the length of the new segment only, not the combined total. Extending by 6 seconds means setting the duration to 6 seconds.
[Advanced] Bridging two clips with a generated middle
Generate a connecting scene between @Video 1 and @Video 2. The transition shows [describe the action, environment, or movement that links the two clips].
The generated segment will be inserted between the two uploaded clips, so describe it as if it were its own short scene.
[Advanced] Copying camera style from a reference clip
Upload any clip with the movement style you want and name it directly:
Reference all camera movements from @Video 1, including the low-angle circling shot and the push into close-up.
The model pulls movement rhythm, framing logic, and transition pacing from the reference clip. Precise technique names are helpful but not required.
[Advanced] Syncing edits to music
Scene transitions should align with the beat positions of @Audio 1. Apply visual style changes at each cut.
Seedance 2.0 can synchronize cuts, lighting changes, and scene transitions to the rhythm of an uploaded audio track.
[Advanced] Using audio from an existing video clip
No separate audio upload is needed if the clip you are already referencing has the audio you want:
Use the audio embedded in @Video 1 as the background score.
[Advanced] Using negative prompts to reduce common generation failures
Seedance 2.0 accepts negative prompt instructions alongside the main description. These are not a fix for filter flags, but they are effective at reducing visual artifacts that keep appearing despite solid prompting.
Keep them short and specific to the failure you are actually seeing rather than listing every possible problem:
negative: no jitter, no warping, no flickering, no identity drift
negative: no text morphing, no garbled logos, no color shift
negative: no motion blur on face, no floating limbs, no background collapse
A long negative prompt can backfire or simply get ignored. Two or three targeted terms tied to what is actually going wrong in your output tend to outperform exhaustive lists.
[Community] One thing people have been experimenting with
Some users have reported better pass rates by writing their scene description in Chinese while keeping any dialogue or on-screen text in English. The reasoning is that Seedance 2.0 was originally developed with strong Chinese-language training, so prompts written in Chinese may be interpreted with slightly different filter thresholds.
This is not a guaranteed fix and results vary, but if a well-constructed prompt keeps getting flagged despite solid cinematic framing, it is a low-effort thing to try. Run your scene description through any translator, keep the dialogue lines in English, and see if the output differs.
Input limits at a glance
| Input type | Limit |
|---|---|
| Images | Up to 9 |
| Video clips | Up to 3, combined total up to 15s |
| Audio files | Up to 3, combined total up to 15s |
| All files combined | Up to 12 |
| Generation duration | 4 to 15 seconds |
Before you generate: a quick checklist
- [ ] Am I using All-in-One Reference mode? (Required any time you mix input types)
- [ ] Does every @ tag have a role explicitly stated in the prompt?
- [ ] Does the prompt describe a visual scene, not a narrative or backstory?
- [ ] Have I included at least one production language element: a shot type, camera move, or lighting description?
- [ ] Does every sentence describe what the camera sees or establish a cinematic context?
- [ ] Does the prompt refer to characters by role rather than age?
- [ ] Is my reference image free of prominent faces, or cropped/illustrated?
- [ ] Is my reference image free of real identifiable people or named copyrighted characters?
- [ ] Is my total file count 12 or under?
Frequently asked questions
Why is my prompt getting flagged when it contains nothing sensitive?
The filter needs enough visual context to confidently interpret what is being made. Short, plain prompts without cinematic framing or scene detail give it an incomplete picture, so it defaults to caution. Adding setting, atmosphere, camera position, and a sense of the production context usually resolves this.
Why do my prompt rewrites make no difference?
If prompt edits are making no difference, the image is likely the issue. Seedance 2.0 runs face detection on uploads before the prompt filter activates. If a face is detected in a reference image, the generation is rejected at that stage. Edit the image first (crop it, widen the shot, or swap in an illustration) before revising the prompt further.
Why was my photo of a real person rejected?
This is a hard platform-level block, not a prompt issue. Seedance 2.0 scans uploaded images for identifiable real faces before the prompt is processed. Photos of celebrities, public figures, or anyone with a recognizable likeness are rejected at that stage. Switching to an illustrated reference or a non-identifiable image is the only way around it.
Does production language really change how the filter reads a prompt?
Yes. Production language signals to the model that this is a film creation context, which is evaluated with more latitude than plain-language descriptions. Including shot types, lens specs, and lighting descriptions shifts how the filter interprets the intent of the whole prompt.
When do I need All-in-One Reference instead of First and Last Frames?
First and Last Frames is for single-image plus text generations. All-in-One Reference is required any time you combine multiple input types: images, video clips, and audio. It is also the only mode where @ tagging works.
Can one uploaded image serve more than one role?
Yes. Tag it multiple times with different roles. For example: @Image 1 as the first frame, environment and lighting also based on @Image 1. Each role needs to be stated explicitly.
How do I extend an existing video clip?
Upload it, reference it with @Video 1, state the extension duration, and describe the new content. Set the generation duration to the length of the new segment only, not the combined total.
Can I write my prompt as JSON?
Yes. JSON is accepted natively and works well for multi-shot sequences. Use a visual_world block for the overall cinematic context and individual shot blocks for each camera position and action. This structure prevents the ambiguity that causes inconsistent generations.
Do negative prompts help with content flags?
Negative prompts are not useful for getting past the content filter, but they are worth using when your generations consistently show the same visual artifact. Keep them short and specific to your actual problem. Two or three targeted terms outperform a long generic list.
What happens if I exceed the 12-file limit?
The system will not accept more than 12 files across all input types. Plan allocations before uploading. Secondary style references and audio descriptors are usually more efficiently handled as text in the prompt than as uploaded files.
Start generating on Morphic
The best way to test what is covered here is to open a generation and try it. Seedance 2.0 on Morphic gives you full multimodal access: images, video, audio, and text, with nothing to install.
