Inference
What is Inference?
Inference is what happens when you click 'generate' — the AI applies everything it learned during training to produce a new image or video based on your prompt.
At a glance
- Also known as: Model inference, Generation, Forward pass
- Used for: Generating images and video from prompts; running AI models to produce new outputs; applying trained model knowledge to user inputs
- Common tools: Stable Diffusion, Midjourney, Runway, Kling, any AI generation platform
- Related terms: Diffusion models, Sampling, CFG scale, Latent space, Model distillation
How it works in simple terms
A trained AI model stores the patterns it learned as parameters. During inference, the model takes your input (a text prompt, a reference image, or other conditioning) and runs it through those fixed parameters in a forward pass. Diffusion-based generators repeat this pass at least once per sampling step. The output reflects both the patterns in the training data and the specific guidance you provided.
Where you encounter this
Inference occurs every time you generate content with an AI tool. The wait between submitting a prompt and receiving a result is the inference time. Cost-per-generation pricing on AI platforms reflects the computational cost of running inference. When platforms offer speed options, such as draft versus high quality or different model sizes, they are offering different inference configurations.
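The idea of running an input through fixed, already-learned parameters can be illustrated with a toy loop. This is only a minimal sketch, not how any real generator is implemented: `WEIGHTS`, `forward_pass`, `run_inference`, and the numeric `prompt_signal` are all hypothetical stand-ins. It assumes a diffusion-style process in which the forward pass is repeated once per sampling step, starting from random noise.

```python
import random

# Hypothetical stand-in for a trained model: its "learned parameters" are
# frozen at inference time and are never updated by the code below.
WEIGHTS = {"pull": 0.5}

def forward_pass(sample, conditioning, weights):
    """One forward pass: move every value part of the way toward the conditioning."""
    pull = weights["pull"]
    return [x + pull * (c - x) for x, c in zip(sample, conditioning)]

def run_inference(conditioning, steps, seed=0):
    """Start from random noise and refine it with `steps` forward passes."""
    rng = random.Random(seed)
    sample = [rng.uniform(-1.0, 1.0) for _ in conditioning]
    for _ in range(steps):
        sample = forward_pass(sample, conditioning, WEIGHTS)
    return sample

def error(sample, conditioning):
    """Largest remaining gap between the output and the conditioning signal."""
    return max(abs(x - c) for x, c in zip(sample, conditioning))

prompt_signal = [0.2, 0.8, -0.4]  # made-up numeric encoding of a prompt
draft = run_inference(prompt_signal, steps=4)
final = run_inference(prompt_signal, steps=20)
print(error(draft, prompt_signal), error(final, prompt_signal))
```

In this toy setup, more steps leave a smaller gap between the output and the conditioning signal, at the price of more forward passes, which loosely mirrors the wait time you see on a real platform.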
How it compares
Compared with related concepts
Inference is the operational counterpart to training. Training is the computationally massive, one-time process of building a model's capabilities over millions of examples; inference is the comparatively smaller computation that runs the trained model to produce individual outputs. A model trained once can then be used for countless inference runs, which is why large companies invest heavily in training but can offer inference at relatively low per-generation costs.
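The amortisation argument can be made concrete with a little arithmetic. The figures below are invented purely for illustration (no real training or inference costs are implied), and `cost_per_generation` is a hypothetical helper: it spreads a one-time training cost across every inference run and adds the cost of the run itself.

```python
def cost_per_generation(training_cost, inference_cost, generations):
    """Average total cost per output when training is paid once up front."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return training_cost / generations + inference_cost

TRAINING_COST = 1_000_000.0  # hypothetical one-time cost, arbitrary units
INFERENCE_COST = 0.01        # hypothetical cost of a single run, same units

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, cost_per_generation(TRAINING_COST, INFERENCE_COST, n))
```

As the number of generations grows, the training share of each output shrinks toward zero and the per-generation cost approaches the cost of inference alone.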
Pro tip
When you encounter slow generation times or want to reduce costs, look for settings that control inference steps or quality levels. Reducing steps from the default can produce faster, lower-fidelity outputs suitable for concept exploration, while maximising steps and resolution uses more compute to produce the highest quality result for final production.
Types and variations
- Inference configurations vary by the number of sampling steps used (more steps generally produce higher quality but take longer), the guidance scale applied (how closely the model follows the prompt), the image resolution requested, and the underlying model architecture.
- Batch inference allows multiple generations to run simultaneously, improving throughput.
- Real-time inference optimises for speed above quality, enabling near-instantaneous generation for interactive applications.
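The settings listed above can be pictured as a small configuration object. The sketch below is hypothetical: `InferenceConfig`, its default values, and `relative_compute` are illustrative names rather than any platform's API. The estimate assumes compute grows linearly with steps, pixel count, and batch size, which is a deliberate simplification; real models often scale worse than linearly with resolution, and the guidance scale is left out of the estimate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical bundle of common inference settings (defaults are made up)."""
    steps: int = 30              # sampling steps
    guidance_scale: float = 7.0  # how closely the model follows the prompt
    width: int = 1024
    height: int = 1024
    batch_size: int = 1          # generations run together

def relative_compute(config, baseline):
    """Rough compute ratio versus a baseline, under the linear-scaling assumption."""
    def work(c):
        return c.steps * c.width * c.height * c.batch_size
    return work(config) / work(baseline)

final = InferenceConfig()
draft = InferenceConfig(steps=10, width=512, height=512)
batch = InferenceConfig(batch_size=4)

print(relative_compute(draft, final))  # fraction of the baseline work
print(relative_compute(batch, final))  # multiple of the baseline work
```

Under these assumptions, a draft at a third of the steps and a quarter of the pixels needs about one twelfth of the baseline work, while a batch of four needs four times as much in total but produces four outputs in one run.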
Common use cases
- Inference is central to every AI generation workflow.
- It is what occurs when generating images from prompts, creating video from text or reference images, running style transfers, performing inpainting, upscaling images, or using any AI model to produce new content.
- Understanding inference helps creators manage generation costs, interpret speed and quality tradeoffs, and make informed choices about which models and settings to use for different tasks.