A wide shot is a camera framing that shows a subject ( typically a person or group of people ) within their surrounding environment, with sufficient visual field to establish the spatial context, scale, and geography of the scene. In a standard wide shot of a person, the full body is visible and occupies a portion of the frame, with the environment surrounding them clearly legible: the room they are in, the landscape they stand within, the street or square they inhabit. The wide shot tells the viewer where the action is taking place and establishes the spatial world within which closer shots and tighter detail can then be understood. It is the visual grammar equivalent of beginning a sentence with its subject placed in its context: grounding the viewer before moving into detail.
The wide shot occupies a specific position in the continuum of shot sizes that ranges from the extreme wide shot (EWS), which reduces subjects to small figures within a vast environment, through the wide shot, to the medium wide shot, the medium shot, the medium close-up, the close-up, and the extreme close-up. Each step inward along this scale trades environmental context for personal or emotional detail. The wide shot strikes a balance between these poles: it includes enough environment to be spatially informative without sacrificing legibility of the subject's actions and body language. In traditional coverage-based filmmaking, the wide shot: often called the master shot when it captures an entire scene: is typically filmed first, providing a complete record of the scene's action from which closer coverage can be selected and assembled in editing. The wide shot is the safety net that editors return to when tighter shots create continuity problems or when the full spatial relationship between characters needs to be restated for clarity.
In AI video and image generation, specifying a wide shot gives the model clear compositional guidance that significantly improves the relevance and spatial logic of the output. Without explicit framing specification, generation models default to whatever composition they find most aesthetically compelling for the described subject, which is often a medium shot or close-up rather than the wider environmental view that may be required for establishing purposes. Language like 'wide shot of a figure walking through a crowded market,' 'full-body wide shot showing the character in their surroundings,' or 'master shot establishing the interior of the cathedral' communicates both the scale of the framing and the environmental-contextual function of the shot, producing outputs better suited to their intended editorial role. Adding descriptions of what should be visible in the environment around the subject further strengthens the compositional specificity, helping the model understand that the environment is as important as the subject in this particular frame.