Token
What is a token?
A token is a small chunk of text (roughly a word or part of a word) that AI models use as their basic unit of processing, like the individual bricks a model builds its understanding from.
At a glance
- Also known as
  - Text token
  - Input token
  - Output token
  - Visual token
- Used for
  - Measuring prompt length and context window consumption in AI models
  - Calculating the cost of AI API usage based on tokens processed
  - Representing image patches as visual tokens in multimodal architectures
  - Understanding how model attention is distributed across prompt content
- Key features
  - Basic unit of text processing: roughly one word or part of a word
  - Token limits define maximum prompt length, output length, and session memory
  - Extended to visual tokens in multimodal models for image and video inputs
  - Token position and proximity influence how strongly concepts are associated
How it compares
Compared with related concepts
Tokens are related to but distinct from words, characters, and parameters. Words are the human unit of language that tokens approximate; characters are the raw letter-level units that tokens aggregate; parameters are the learned weights within a model's neural network, an entirely different concept that is sometimes confused with tokens in casual discussion. A model's parameter count describes its size and learning capacity, while its token limit (its context window) describes the length of text it can process at once. A model with more parameters does not necessarily have a larger context window, and a larger context window does not imply more model knowledge or capability. The distinction matters when evaluating AI tools: parameter count is a rough measure of how much a model can learn and store; token limits are a measure of how much it can attend to at once.
Think of it like…
Think of a token as a puzzle piece in a very large jigsaw. A word is often one piece, but an unusual or technical word might need to be broken into two or three smaller pieces that the model assembles into meaning from context. The model can only hold a certain number of pieces on the table at once: its context window. If you pour too many pieces onto the table, the oldest ones slide off the edge and are forgotten. This is one reason long prompts and long sessions sometimes lose track of instructions specified far from the current generation point: those tokens have either fallen outside the window or receive less of the model's attention.
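The jigsaw picture can be sketched in a few lines of Python. This is a toy illustration only: the vocabulary below is hypothetical and tiny, the greedy longest-match split is a simplification of how real tokenizers work, and the function names are made up for this example.

```python
from collections import deque

# Hypothetical toy vocabulary; real tokenizers learn far larger ones from data.
VOCAB = {"token", "ization", "un", "believ", "able", "the", "cat"}


def tokenize(word: str) -> list[str]:
    """Split one word into vocabulary pieces using greedy longest match."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in VOCAB:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Nothing in the vocabulary matches: fall back to one character.
            pieces.append(word[start])
            start += 1
    return pieces


def fill_context(tokens: list[str], window_size: int) -> list[str]:
    """Keep only the most recent `window_size` tokens; older ones fall off."""
    window = deque(maxlen=window_size)
    for tok in tokens:
        window.append(tok)
    return list(window)


print(tokenize("cat"))            # a common word stays as one piece
print(tokenize("unbelievable"))   # a longer word is split into several pieces
print(fill_context(["a", "b", "c", "d", "e"], window_size=3))
```

The bounded `deque` plays the role of the table: once it is full, each new piece pushes the oldest one off the far edge.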
Pro tip
When writing prompts for AI video or image generation, treat the opening twenty to thirty tokens as prime real estate. Lead with the most critical creative decisions (subject, camera treatment, visual style, lighting) before adding secondary details like background elements, colour temperature, or mood. Models often weight earlier tokens more consistently than later ones, and a long prompt that buries the key instruction in paragraph three will often under-execute on that instruction while faithfully following the details described early. If your prompts are consistently long, try a trimming pass that removes any phrase that could be inferred from context, freeing tokens for the genuinely specific creative direction that the model cannot guess.
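One way to apply this tip is a quick check that flags key phrases appearing too late in a prompt. The sketch below is an approximation under stated assumptions: it counts words rather than real tokens (actual token counts depend on the model's tokenizer), and `late_terms` and its `prime_tokens` parameter are hypothetical names invented for this example.

```python
import re


def _words(text: str) -> list[str]:
    """Lowercase word split, used here as a rough stand-in for tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def late_terms(prompt: str, key_terms: list[str], prime_tokens: int = 30) -> list[str]:
    """Return key terms that are missing or first appear after the opening span."""
    words = _words(prompt)
    late = []
    for term in key_terms:
        term_words = _words(term)
        n = len(term_words)
        # Index of the first place the term's words appear consecutively, if any.
        position = next(
            (i for i in range(len(words) - n + 1) if words[i:i + n] == term_words),
            None,
        )
        if position is None or position >= prime_tokens:
            late.append(term)
    return late


prompt = "A red fox running through snow, handheld camera, golden hour lighting"
print(late_terms(prompt, ["red fox", "handheld camera"], prime_tokens=5))
```

With a deliberately small `prime_tokens` of 5, the call reports that "handheld camera" arrives too late, which suggests moving it closer to the start of the prompt.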
Types and variations
Tokens take different forms depending on the modality and context in which they are used.
- Text tokens are the standard form: units of language produced by a tokenizer from input text and processed as a sequence by the model's attention layers.
- Input tokens are those submitted by the user as part of the prompt; output tokens are those generated by the model as its response. The two are often priced differently in commercial AI APIs because output generation is computationally more intensive than input processing.
- Visual tokens extend the concept to image data, where an image is divided into fixed-size spatial patches and each patch is converted into a numerical vector that the model processes alongside text tokens.
- In video models, temporal tokens represent sequences of frames, adding a time dimension to the spatial patch structure.
- Special tokens, such as those marking the beginning or end of a sequence or separating different content types, are used internally by models to manage context structure.
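The patch-based idea behind visual tokens comes down to simple arithmetic, sketched below. The function name and parameters are hypothetical, and the sketch assumes one token per patch and one full set of patches per frame; real models may add special tokens, resize inputs, or compress frames over time.

```python
def count_visual_tokens(height: int, width: int, patch_size: int, frames: int = 1) -> int:
    """Count patch tokens for an image, or for a clip of `frames` frames."""
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    # Each frame is cut into a grid of patch_size x patch_size squares.
    patches_per_frame = (height // patch_size) * (width // patch_size)
    return patches_per_frame * frames


# A 224x224 image with 16x16 patches gives a 14x14 grid of patches.
print(count_visual_tokens(224, 224, 16))            # 196
# Under the one-grid-per-frame assumption, 8 frames give 8 times as many.
print(count_visual_tokens(224, 224, 16, frames=8))  # 1568
```

The count grows with the square of the resolution and linearly with the number of frames, which is why image and video inputs can consume a context window much faster than text.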
Common use cases
- Token awareness is most directly relevant when working with AI models through APIs, where usage is billed per token and where context window limits require careful management of prompt length and conversation history.
- Developers building AI-powered applications must track cumulative token counts across a session to avoid exceeding context limits and to manage API costs.
- For creators using AI generation interfaces directly, token considerations become relevant when constructing long, detailed prompts, particularly for complex scenes with multiple subjects, specific stylistic references, and detailed technical instructions, where there is a risk that the prompt's later content will be under-attended by the model.
- Understanding token allocation also helps explain why multi-subject scenes sometimes under-specify one subject: if the prompt spends many tokens establishing the first subject in detail, fewer tokens remain to describe the second, resulting in unequal generation quality across the composition.
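The session tracking described above can be sketched as a small running tally. Everything here is illustrative: the `TokenBudget` class, the context limit, and the prices are hypothetical values rather than any provider's actual API or rates. The sketch also assumes each token is billed once; APIs that resend the conversation history on every call would bill that history again each time.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Running tally of tokens and cost for one session (illustrative only)."""
    context_limit: int            # hypothetical maximum tokens in the context window
    input_price_per_1k: float     # hypothetical price per 1,000 input tokens
    output_price_per_1k: float    # hypothetical price per 1,000 output tokens
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one exchange's token counts to the session totals."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def remaining(self) -> int:
        """Tokens left before the context limit is reached (never negative)."""
        return max(self.context_limit - self.total_tokens, 0)

    @property
    def over_limit(self) -> bool:
        return self.total_tokens > self.context_limit

    @property
    def cost(self) -> float:
        """Input and output tokens are priced separately."""
        return (self.input_tokens / 1000 * self.input_price_per_1k
                + self.output_tokens / 1000 * self.output_price_per_1k)


budget = TokenBudget(context_limit=4000, input_price_per_1k=0.5, output_price_per_1k=1.5)
budget.record(input_tokens=1000, output_tokens=500)
print(budget.total_tokens, budget.remaining, budget.cost)
```

Checking `remaining` before each new request gives an application the chance to trim or summarise older history instead of silently exceeding the limit.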