Inference

Inference is the process of using a trained AI model to generate outputs from new inputs, as distinct from the training phase where the model learns patterns from data. During inference, the model applies its learned knowledge to produce images, video, or other content based on prompts or conditioning inputs provided by the user.
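The training/inference split can be sketched in a few lines. The toy model below is purely illustrative (not any particular image or video model): training iteratively updates weights to fit data, while inference simply applies the frozen weights to a new input.

```python
import numpy as np

# --- Training phase (hypothetical toy linear model) ---
# The model learns its weights by repeatedly adjusting them to fit the data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # training inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                          # training targets

w = np.zeros(3)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(X)   # gradient of mean squared error
    w -= 0.1 * grad                     # weight update: this is "learning"

# --- Inference phase ---
# The weights are now frozen; a new input just flows forward through them.
new_input = np.array([1.0, 1.0, 1.0])
prediction = new_input @ w              # no learning happens here
```

The same separation holds for generative models at any scale: inference never modifies the weights, which is why a single trained model can serve many users' prompts concurrently.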

Inference is computationally intensive, particularly for large models generating high-resolution images or video, and requires significant GPU processing power and memory. Inference speed determines how long users wait for generations to complete, so optimizing inference performance is a major focus in making AI generation practical for real-time or high-volume applications. Techniques like model distillation, quantization, and specialized inference engines are used to reduce computational requirements and speed up generation times.
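Quantization, one of the optimization techniques mentioned above, can be illustrated concretely. This is a minimal sketch of symmetric int8 quantization on a stand-in weight matrix (real inference engines use more sophisticated per-channel or per-group schemes): storing weights as 8-bit integers plus one scale factor cuts memory to a quarter of float32, at the cost of a small, bounded rounding error.

```python
import numpy as np

# Stand-in for a float32 weight tensor from a trained model.
weights = np.random.default_rng(1).normal(size=(1024, 1024)).astype(np.float32)

# Symmetric int8 quantization: map values into [-127, 127] with one scale.
scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)

# At inference time the weights are dequantized (or used directly by
# integer kernels); the reconstruction error is at most scale / 2.
dequantized = q_weights.astype(np.float32) * scale

memory_saving = weights.nbytes / q_weights.nbytes   # int8 is 4x smaller
max_error = float(np.abs(weights - dequantized).max())
```

Smaller weights mean less GPU memory traffic per generation step, which is a large part of why quantized models run faster, not just smaller.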

Understanding inference as distinct from training helps creators grasp why some models are faster than others, why certain modifications affect generation speed, and how computational resources impact practical workflows. For platforms like Morphic that offer multiple models, inference costs and speeds are factors in how credits are allocated and which models are appropriate for different use cases.
