ControlNet
What is ControlNet?
ControlNet lets you give an AI image generator a structural reference, like a pose or a depth map, so the output follows that exact spatial layout.
At a glance
- Also known as: Spatial control for diffusion models, Conditional image generation control
- Used for: Pose-controlled generation, Depth-constrained composition, Edge-guided image synthesis, Precise layout control
- Common tools: Stable Diffusion with ControlNet extension, ComfyUI, Automatic1111
- Related terms: Diffusion model, Image-to-image, Pose estimation, Depth map, Inpainting
How it compares
Image-to-image generation uses a reference image directly as a visual starting point, influencing both the structure and the visual content of the output. ControlNet extracts specific structural information from a reference, such as pose or edges, and uses that as a spatial constraint while leaving the visual content and style to the text prompt and base model. ControlNet provides structural precision without requiring the full visual content of the reference to appear in the output.
Think of it like…
Imagine you are drawing a picture and someone gives you a colouring book outline showing exactly where all the lines and shapes should go. You can still choose any colours and textures you like for each area, but the shapes are already decided for you. ControlNet works like that outline. It gives the AI a structural skeleton to follow, whether that is the pose of a person, the edges of a composition, or the depth of a scene, while still letting the AI choose all the visual details, textures, and style within that structure.
How it works in simple terms: a separate neural network module processes the structural control image and passes spatial conditioning information to the main generation model during the diffusion process. The control module constrains where things are; the main model decides what they look like.
Where you encounter this: ControlNet is used in open-source AI generation pipelines for character pose matching, architectural render generation, illustration-to-render conversion, and any workflow requiring precise compositional control over AI-generated imagery.
Pro tip
When using multiple ControlNet inputs simultaneously, adjust the weight of each control module rather than applying all at full strength. A pose control at 0.8 weight combined with a depth control at 0.6 weight typically produces better results than both at 1.0, because it leaves the base model room to produce a coherent image within the structural constraints instead of forcing it to reconcile competing full-strength control signals.
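To make the role of the weights concrete, here is a minimal toy sketch of how per-control weights could enter the computation, assuming each control module contributes a residual that is scaled by its weight and summed with the others. The function `blend_controls` and the residual values are hypothetical and chosen only for illustration; they are not taken from any particular tool.

```python
# Toy sketch: names and numbers here are illustrative, not a library API.

def blend_controls(residuals, weights):
    """Scale each control module's residual by its weight, then sum per position."""
    if not residuals or residuals.keys() != weights.keys():
        raise ValueError("residuals and weights need the same, non-empty set of names")
    length = len(next(iter(residuals.values())))
    total = [0.0] * length
    for name, values in residuals.items():
        if len(values) != length:
            raise ValueError("all residuals must have the same length")
        for i, value in enumerate(values):
            total[i] += weights[name] * value
    return total


# Hypothetical residuals that disagree at positions 0 and 1 and agree at position 2.
residuals = {"pose": [1.0, -1.0, 0.5], "depth": [-1.0, 1.0, 0.5]}

full = blend_controls(residuals, {"pose": 1.0, "depth": 1.0})
tuned = blend_controls(residuals, {"pose": 0.8, "depth": 0.6})

print("both at 1.0:        ", [round(v, 2) for v in full])
print("pose 0.8, depth 0.6:", [round(v, 2) for v in tuned])
```

In this toy, lowering the weights shrinks the combined push where the two controls agree and lets the higher-weighted pose signal win where they disagree. It says nothing about image quality, which depends on the actual models involved.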
Types and variations
- Pose ControlNet uses skeleton keypoint maps to control character body position.
- Edge ControlNet uses contour detection maps to constrain the structural lines of the output.
- Depth ControlNet uses depth maps to preserve spatial depth relationships from a reference.
- Segmentation ControlNet uses region labels to control what type of content appears in each area of the frame.
- Normal map ControlNet uses surface normal data to constrain the three-dimensional character of surfaces in the output.
- Multiple ControlNet modules can be used simultaneously with weighted blending between control inputs.
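Each of the control types above starts from a preprocessed map derived from a reference image. As a rough illustration of what an edge map is, the sketch below marks pixels where brightness changes sharply, using a plain gradient threshold as a stand-in for the contour detectors real pipelines use. The function name `edge_map`, the threshold, and the tiny test image are all assumptions made for this example.

```python
# Minimal sketch of an edge-style control map on a tiny grayscale grid.
# A simple gradient threshold stands in for a real contour detector.

def edge_map(gray, threshold=0.5):
    """Return a 0/1 grid marking pixels with a strong brightness change
    towards their right or lower neighbour."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            here = gray[y][x]
            right = gray[y][min(x + 1, width - 1)]   # clamp at the border
            below = gray[min(y + 1, height - 1)][x]  # clamp at the border
            gx, gy = right - here, below - here
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


# Hypothetical 4x4 image: dark left half, bright right half.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

for row in edge_map(image):
    print(row)
```

The result keeps only the boundary between the dark and bright regions and discards the brightness values themselves, which is the sense in which an edge control carries structure without appearance.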
Common use cases
- Character pose matching uses pose ControlNet to generate characters in specific body positions defined by a reference image or skeleton.
- Layout preservation uses edge or depth ControlNet to generate stylised versions of an existing composition while maintaining its structural logic.
- Product placement uses segmentation ControlNet to control where specific content types appear in a generated scene.
- Architecture visualisation uses depth and edge control to generate design renders that preserve the spatial structure of an existing model or sketch.
FAQs
ControlNet is a neural network architecture that adds spatial control to image generation models by conditioning the generation process on structural input images such as pose maps, edge maps, or depth maps. It allows creators to specify the compositional and spatial structure of generated outputs with far greater precision than text prompts alone.
ControlNet trains additional neural network modules that process structural control images alongside the base diffusion model. These modules extract spatial information from the control input and pass it as conditioning to the generation process, constraining where elements appear in the output without overriding the base model's visual style.
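The conditioning path described above can be sketched as a toy, assuming the control module's output is added to the base model's output as a weighted residual. Everything here is a stand-in: `base_step`, `control_residual`, and the `gain` parameter are hypothetical names, and lists of numbers take the place of real feature maps. Setting `gain` to zero mimics the zero-initialised output layers used in the original ControlNet design, where the control branch contributes nothing until it has been trained.

```python
# Toy sketch of ControlNet-style conditioning. Not a real diffusion model.

def base_step(latent):
    """Stand-in for the frozen base model: decides what things look like."""
    return [0.9 * value for value in latent]


def control_residual(control_map, gain):
    """Stand-in for the trained control module: one correction per position."""
    return [gain * value for value in control_map]


def controlled_step(latent, control_map, gain, weight=1.0):
    """Base output plus the weighted residual from the control module."""
    base = base_step(latent)
    residual = control_residual(control_map, gain)
    return [b + weight * r for b, r in zip(base, residual)]


latent = [1.0, 2.0, 3.0]
control_map = [0.0, 1.0, 0.0]  # structure is present only at position 1

untrained = controlled_step(latent, control_map, gain=0.0)
trained = controlled_step(latent, control_map, gain=0.5)

print("base only:    ", base_step(latent))
print("untrained ctl:", untrained)
print("trained ctl:  ", trained)
```

The untrained control branch leaves the base output unchanged, and the trained one changes the result only where the control map has structure. That is the division of labour the paragraph describes: the control path constrains where things are, the base model handles the rest.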
ControlNet supports pose maps for body position control, edge maps for structural line control, depth maps for spatial depth relationships, segmentation maps for regional content control, and normal maps for surface geometry control, among others. Multiple control types can be used simultaneously.
Image-to-image uses a reference image directly, influencing both structure and visual content. ControlNet extracts specific structural information from a reference and uses only that as a spatial constraint, allowing the text prompt and base model to determine visual content and style independently of the reference's appearance.
Pose ControlNet uses skeleton keypoint maps to ensure generated characters match a specific body position. It is widely used for generating character variations in identical poses, matching a reference pose for product or fashion visualisation, and ensuring consistent character stance across multiple generations.
ControlNet modules are architecture-specific and must be compatible with the base model. Most ControlNet development has been for Stable Diffusion and its variants. Each base model architecture requires its own ControlNet modules trained for that specific architecture.
ControlNet weight controls how strongly the control module's spatial conditioning influences the generation output. Higher weights produce outputs that follow the control image more precisely but may reduce visual quality. Lower weights allow more generative freedom while still applying directional spatial guidance.
ControlNet principles are used or referenced in many commercial AI generation tools, though implementations vary. The architecture originated in the open-source Stable Diffusion ecosystem and has influenced how spatial control features are developed across a broader range of commercial and research AI generation platforms.