Question 1

What is inference in the context of AI generation?

Accepted Answer

Inference is the process of running a trained AI model to generate new outputs (images, video, text, or other content) from user inputs such as prompts or reference images. It is the operational phase that follows training, and it is what runs each time a creator requests a generation.

Question 2

How is inference different from training?

Accepted Answer

Training builds a model's capabilities by exposing it to large datasets and adjusting its parameters over many iterations. It is computationally massive and largely a one-time cost for a given model. Inference uses the already-trained model to generate new outputs. A single inference request needs far less compute than a training run, but large models still require significant GPU resources to serve.

Question 3

Why does inference take time?

Accepted Answer

Inference time is determined by the number of processing steps the model performs, the resolution of the output, and the size of the model itself. Diffusion models, which iteratively refine noise over multiple denoising steps, are particularly computationally intensive because each step requires a full forward pass through the model, and that pass is repeated tens or hundreds of times per generation.
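
To illustrate why the step count dominates, here is a minimal sketch of an iterative denoising loop. The `toy_denoiser` function is a hypothetical stand-in for a real model's forward pass (it simply shrinks values toward zero), and `generate` is an illustrative name, not any platform's API. The point is only that the forward pass runs once per step.

```python
import random


def toy_denoiser(x, step, total_steps):
    """Hypothetical stand-in for one full forward pass of a diffusion model."""
    remaining = total_steps - step
    # Remove 1/remaining of the current "noise" so the final step reaches zero.
    return [v * (1.0 - 1.0 / remaining) for v in x]


def generate(num_steps, size=4, seed=0):
    """Start from random noise and refine it over num_steps denoising steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]
    forward_passes = 0
    for step in range(num_steps):
        x = toy_denoiser(x, step, num_steps)  # one forward pass per step
        forward_passes += 1
    return x, forward_passes


if __name__ == "__main__":
    for steps in (10, 50):
        _, passes = generate(steps)
        print(f"{steps} steps -> {passes} forward passes")
```

In a real model each forward pass is the expensive part, so doubling the step count roughly doubles generation time.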

Question 4

What factors affect inference speed?

Accepted Answer

The main factors are model size (larger models require more compute per step), the number of denoising steps (more steps generally improve quality, with diminishing returns, but lengthen generation time), output resolution (higher resolution requires more memory and computation), and the hardware available (faster GPUs significantly reduce inference time).
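
As a rough way to reason about how these factors combine, the sketch below estimates generation time relative to a baseline. It assumes cost scales linearly with step count, pixel count, and model size, and inversely with hardware speed. That is a first-order simplification (real models often scale worse than linearly with resolution), and the baseline of 30 steps at 512x512 and all names are assumptions chosen for illustration.

```python
BASELINE_STEPS = 30          # assumed reference step count
BASELINE_PIXELS = 512 * 512  # assumed reference resolution


def relative_inference_time(steps, width, height,
                            model_scale=1.0, hardware_speedup=1.0):
    """Estimate generation time as a multiple of the baseline configuration.

    model_scale: compute per step relative to the baseline model.
    hardware_speedup: throughput relative to the baseline GPU.
    """
    if min(steps, width, height) <= 0:
        raise ValueError("steps, width and height must be positive")
    if model_scale <= 0 or hardware_speedup <= 0:
        raise ValueError("model_scale and hardware_speedup must be positive")
    step_factor = steps / BASELINE_STEPS
    pixel_factor = (width * height) / BASELINE_PIXELS
    return step_factor * pixel_factor * model_scale / hardware_speedup


if __name__ == "__main__":
    # Twice the steps at four times the pixels: about 8x the baseline time.
    print(relative_inference_time(60, 1024, 1024))
    # The same job on hardware assumed to be twice as fast.
    print(relative_inference_time(60, 1024, 1024, hardware_speedup=2.0))
```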

Question 5

How do inference costs work on AI generation platforms?

Accepted Answer

Most platforms charge per generation based on the computational cost of running inference, which varies with the model chosen, the output resolution, and, for video, the duration of the clip. Premium models with higher output quality typically cost more per generation because they consume more compute during inference.
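
The sketch below shows one way such pricing could be structured. It is not any platform's actual pricing formula: the `ModelTier` type, the per-megapixel credit rates, and the rule of charging video per second are all hypothetical, chosen only to show cost rising with model tier, resolution, and duration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelTier:
    name: str
    credits_per_megapixel: float  # hypothetical rate, not a real price


def generation_cost(tier: ModelTier, width: int, height: int,
                    duration_seconds: Optional[float] = None) -> float:
    """Return a cost in credits for one image, or one video if a duration is given."""
    megapixels = (width * height) / 1_000_000
    cost = tier.credits_per_megapixel * megapixels
    if duration_seconds is not None:
        # Assumption: video is billed per second of output at the image rate.
        cost *= duration_seconds
    return cost


if __name__ == "__main__":
    standard = ModelTier("standard", 2.0)
    premium = ModelTier("premium", 5.0)
    print(generation_cost(standard, 1000, 1000))
    print(generation_cost(premium, 1000, 1000))
    print(generation_cost(standard, 1000, 1000, duration_seconds=5))
```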

Question 6

What is model distillation and how does it relate to inference?

Accepted Answer

Model distillation is a technique for creating smaller, faster models that approximate the behaviour of larger, more capable ones. Distilled models run inference significantly faster and at lower cost while attempting to maintain most of the quality of the original. Many platforms offer distilled model variants for use cases where speed is more important than maximum quality.
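
One common formulation of distillation, sketched here under simplifying assumptions, trains the smaller "student" model to match the softened output distribution of the larger "teacher". The example computes that matching loss (a KL divergence over temperature-scaled softmax outputs) for a single set of scores. It is a classification-style illustration. Distilling generative models such as diffusion models uses different objectives, and the function names and temperature value are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student matches teacher
    print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what training the student minimises.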

Question 7

Can inference quality be controlled by the user?

Accepted Answer

Yes. On most platforms, users can control inference quality through parameters such as the number of sampling steps, the guidance scale, and the choice of sampler. More steps generally produce higher quality at the cost of longer generation times. Some platforms abstract these controls into simple quality presets (draft, standard, and high quality) that adjust the underlying inference settings automatically.
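
A preset layer of this kind can be sketched as a lookup table that expands a preset name into concrete settings while still allowing individual overrides. The specific step counts, guidance scales, and sampler names below are placeholder values, not the settings of any real platform, and `resolve_settings` is a hypothetical helper.

```python
# Placeholder values for illustration only.
PRESETS = {
    "draft": {"steps": 15, "guidance_scale": 5.0, "sampler": "euler"},
    "standard": {"steps": 30, "guidance_scale": 7.0, "sampler": "euler"},
    "high_quality": {"steps": 60, "guidance_scale": 7.5, "sampler": "euler"},
}


def resolve_settings(preset, **overrides):
    """Expand a preset name into inference settings, applying any user overrides."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    settings = dict(PRESETS[preset])  # copy so the table is never mutated
    for key, value in overrides.items():
        if key not in settings:
            raise ValueError(f"unknown setting: {key!r}")
        settings[key] = value
    return settings


if __name__ == "__main__":
    print(resolve_settings("draft"))
    print(resolve_settings("standard", steps=40))
```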

Question 8

What does 'real-time inference' mean?

Accepted Answer

Real-time inference refers to configurations optimised to produce outputs fast enough for interactive applications, in some cases near-instantaneously. Achieving this typically requires using smaller, distilled models and reducing output resolution or quality, which makes it suitable for live previews, interactive experiences, or rapid iteration rather than final production.
