Activity 6.1 Monocular Depth Cues
When you look at the photograph at left (of the Hawthorne Bridge crossing the Willamette River in Portland, Oregon), you get a compelling sense of depth—a sense that the various objects in the picture are at different distances from you—despite the fact that you’re really just looking at an array of light specks on your computer monitor, all of which are exactly the same distance from your eyes.
We infer object distances in photos using pictorial depth cues. As with the Gestalt principles we learned about in Chapter 4, no single depth cue is always available or always reliable, but by combining multiple pictorial cues, we are usually able to work out depth relations quite well. We’ll cover seven pictorial cues in this activity: occlusion, relative size, familiar size, relative height, texture gradients, linear perspective, and aerial perspective.
The pictorial depth cues are a subset of our collection of monocular depth cues: cues that we need only one eye to use. The other two monocular cues we’ll cover in this activity are motion parallax and accommodation/convergence.
The other major depth cue, stereopsis, is binocular (you need both eyes to use it). Binocular vision and stereopsis will be the subject of the other three activities in this chapter.
Occlusion
Occlusion is probably the most pervasive and most reliable of all the depth cues. It is also perhaps the simplest: If part of object A is covering part of object B, A is almost certainly closer to you than B. In Image 1, we perceive the bicycle to be in front of the railing because the bicycle’s parts cover the railing’s metal tubes.
Image 2 shows another example in a more complex scene, a cityscape from Boston, Massachusetts. Here we perceive two clusters of tall buildings in the distance, one on the left and one on the right. In each cluster, one building is clearly perceived to be behind the rest because of occlusion: the John Hancock building on the left and the Prudential building on the right. Image 3 highlights the occlusion relations and makes it clear that we also perceive the trees as closer to us than the skyscrapers, because the trees occlude the buildings.
Image 4 shows the Portland street scene again. Can you identify the places in the scene where occlusion allows us to disambiguate the depth relations? Image 5 highlights some of them.
Relative Size and Familiar Size
Remember the relationship among object size, distance, and retinal size discussed back in the Chapter 3 activity on Visual Angle? We learned that visual angle is proportional to object size divided by distance from the observer. Rearranging that formula, we find that the distance from an object to our eye is proportional to the object’s size divided by its retinal size (Image 1).
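The relationship can be made concrete with a few lines of Python. This is a minimal sketch. It assumes sizes and distances are in the same units (meters here), and the function names are our own. The code uses the exact visual-angle formula, theta = 2 * arctan(S / (2D)). For small angles this reduces to theta ≈ S / D (in radians), which is the proportionality described above. Inverting it gives D ≈ S / theta.

```python
import math

def visual_angle_deg(object_size, distance):
    """Exact visual angle, in degrees, of an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(object_size / (2 * distance)))

def distance_from_visual_angle(object_size, angle_deg):
    """Invert the exact formula: distance = size / (2 * tan(angle / 2))."""
    return object_size / (2 * math.tan(math.radians(angle_deg) / 2))

def distance_small_angle(object_size, angle_deg):
    """Small-angle approximation: distance ~ size / angle, with the angle in radians."""
    return object_size / math.radians(angle_deg)

# Example: a 1.7 m tall person seen from 10 m away
angle = visual_angle_deg(1.7, 10.0)
print(f"visual angle: {angle:.2f} deg")
print(f"recovered distance: {distance_from_visual_angle(1.7, angle):.2f} m")
print(f"small-angle estimate: {distance_small_angle(1.7, angle):.2f} m")
```

Doubling the distance roughly halves the visual angle. This is why retinal size alone can order same-sized objects in depth.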
One consequence of this relationship is that the farther away an object is, the smaller it appears on the retina. Furthermore, if there are two of the same type of object present in the scene you’re looking at, the object whose retinal image size is smallest must be farthest away (and vice-versa: the object with the largest retinal image must be closest). Thus, in Image 2, the disembodied eyeball knows that the red flower is farther away than the purple flower because the red flower projects a smaller retinal image.
This monocular depth cue, called relative size, is quite effective because in many real-world visual scenes we see multiple objects that can be assumed to be about the same size. For example, in Image 3, one of the ways we know that the statue in the center of the photo is closer than the three statues on the left is that the central statue’s retinal image is much larger.
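The snippet below sketches the relative size logic by ranking objects that are assumed to share the same physical size. The angular sizes are hypothetical stand-ins for the statues in Image 3, not measurements from the photo. Distance is proportional to object size divided by retinal size. So when object sizes are equal, the ratio of two distances is simply the inverse ratio of their retinal sizes.

```python
# Hypothetical angular sizes (degrees) of objects assumed to be the same physical size
retinal_sizes = {
    "central statue": 6.0,
    "left statue A": 2.5,
    "left statue B": 2.0,
    "left statue C": 1.5,
}

def nearest_first(sizes):
    """Order names nearest-first: with equal object sizes, a larger retinal size means a smaller distance."""
    return sorted(sizes, key=sizes.get, reverse=True)

def distance_ratios(sizes, reference):
    """Each object's distance as a multiple of the reference object's distance.

    Distance is proportional to object size / retinal size, so with equal
    object sizes d_i / d_ref = retinal_ref / retinal_i.
    """
    ref = sizes[reference]
    return {name: ref / size for name, size in sizes.items()}

print(nearest_first(retinal_sizes))
print(distance_ratios(retinal_sizes, "central statue"))
```

The output is only relative. The sketch can say that statue C is four times as far away as the central statue, but not how many meters away either one is.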
As its name implies, the relative size cue tells us how far away different objects are relative to each other, but it can’t, on its own, tell us exactly how far away any of the objects are. However, if we know how big an object really is, our brains can solve the distance-equals-object-size-divided-by-retinal-size relationship to determine absolute distances. In this case we are using the familiar size depth cue.
Thus, adding people to the statues in Image 4 improves the sense of depth compared to Image 3, because now you can use your knowledge of how tall people are to estimate how far away they are. And since the people are standing next to the central statue, you can use them to judge the height of this statue (this is an alternative use of the relative size cue).
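Familiar size supplies the missing scale. The sketch below assumes a typical adult height of 1.7 m, which is an assumption and not a value from the activity. It also uses hypothetical visual angles for a person and for the statue beside them. The person's angle fixes the distance. Because the statue stands at the same distance, its angle then fixes its height.

```python
import math

ASSUMED_PERSON_HEIGHT_M = 1.7  # assumption: typical adult height

def distance_from_known_size(known_size_m, angle_deg):
    """Familiar size: solve distance = size / (2 * tan(angle / 2)) for a known size."""
    return known_size_m / (2 * math.tan(math.radians(angle_deg) / 2))

def size_at_distance(angle_deg, distance_m):
    """Object size implied by a visual angle at a known distance."""
    return 2 * distance_m * math.tan(math.radians(angle_deg) / 2)

# Hypothetical visual angles for a person and the statue standing next to them
person_angle_deg = 4.0
statue_angle_deg = 12.0

distance_m = distance_from_known_size(ASSUMED_PERSON_HEIGHT_M, person_angle_deg)
statue_height_m = size_at_distance(statue_angle_deg, distance_m)
print(f"person (and statue) distance: {distance_m:.1f} m")
print(f"implied statue height: {statue_height_m:.1f} m")
```

Suppose the assumed height is wrong, for example because the figures are children. Every distance and size estimate then scales by the same factor, so familiar size is only as good as the familiarity behind it.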
Returning to our Portland street scene again (Image 5), can you identify some sets of objects whose relative or absolute distances can be inferred from the size cues? Image 6 highlights two such sets of objects: The cars circled in green must be at different distances, since their retinal sizes are different, whereas the traffic lights must all be at about the same distance, since their retinal sizes are nearly identical.
Relative Height
In Image 1, how far away is the boy in the center of the photo (call him Bob) relative to the other boys? Bob is not occluding any of the boys, and since we can’t say for sure how old any of them are, the size cues don’t help us either. Nevertheless, it should be quite clear that Bob is the closest of the boys.
We know this because of another depth cue, relative height. Physics tells us (or, at least, our brain’s implicit knowledge of physics tells us) that for objects standing on the ground, the higher an object is in the visual field (that is, the closer its base is to the horizon), the farther away it is.
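The geometry behind relative height can be sketched for the simplest case: a flat ground plane viewed from an eye height of 1.6 m, an illustrative value. The base of an object at ground distance d appears arctan(h / d) below the horizon. Farther objects therefore sit closer to the horizon, which means higher in the visual field. Inverting the formula recovers distance from that angle.

```python
import math

EYE_HEIGHT_M = 1.6  # assumed eye height above flat ground

def angle_below_horizon_deg(ground_distance_m, eye_height_m=EYE_HEIGHT_M):
    """Angle between the horizon and the base of an object resting on the ground."""
    return math.degrees(math.atan2(eye_height_m, ground_distance_m))

def ground_distance_from_angle(angle_deg, eye_height_m=EYE_HEIGHT_M):
    """Invert: distance = eye height / tan(angle below horizon)."""
    return eye_height_m / math.tan(math.radians(angle_deg))

for d in (2, 5, 10, 50):
    print(f"{d:>3} m away -> base {angle_below_horizon_deg(d):.1f} deg below the horizon")
```

The inversion only makes sense for objects that actually rest on the ground. A floating object gives no base to measure from.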
Image 2 makes the relative heights of the boys explicit: Bob (“1”) is closest, followed by the boys whose shoes are labeled “2” and “3,” then “4,” and finally “5.”
In Image 3, you should see that relative height alone is enough to provide a fairly powerful sense of one object being closer to you than another. At first, the red cube should appear farther away than the blue cube. Click and drag the red cube to move it around the yellow frame, and you should easily be able to make the red cube appear to be closer. Note that the shadows are crucial to getting a sense of depth in this image; without them, you might perceive the cubes as floating in midair, and the brain (appropriately) does not apply the relative height cue when objects aren’t rooted to the ground.
Image 4 shows our Portland street scene again. Can you identify the objects whose relative heights allow us to order them in depth? Image 5 highlights some of these objects.
Texture Gradients and Linear Perspective
Our next two depth cues are really just special (but common and effective) combinations of relative size and relative height. In Image 1, it is readily apparent that the cup on the right is closer than the cup on the left. In part, this is due to the relative sizes and heights of the cups themselves. But the strong sense of depth in this picture is conveyed even more by the other objects in the image—the bricks.
For the most part, our brains consciously ignore the bricks because they are part of the background of the scene, and we’re usually more interested in objects in the foreground. Unconsciously, though, our brains notice that some bricks are considerably smaller and higher in the visual field than others (Image 2). Together, the bricks form a texture gradient. The distance to any object sitting on the texture can be judged by comparing it to the part of the texture (i.e., the bricks) the object happens to be sitting on.
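Simple ground-plane geometry shows why the bricks shrink with distance. For illustration only, assume an eye height of 1.6 m and rows of bricks 0.1 m deep lying on flat ground. The vertical visual angle taken up by one row falls off roughly as (eye height × row depth) / distance², so equal-sized rows get rapidly smaller toward the horizon.

```python
import math

EYE_HEIGHT_M = 1.6   # assumed eye height above the textured surface
ROW_DEPTH_M = 0.1    # assumed depth of one row of bricks

def row_extent_deg(near_edge_m, row_depth_m=ROW_DEPTH_M, eye_height_m=EYE_HEIGHT_M):
    """Vertical visual angle spanned by one texture row lying between near_edge_m and near_edge_m + row_depth_m."""
    near = math.atan2(eye_height_m, near_edge_m)
    far = math.atan2(eye_height_m, near_edge_m + row_depth_m)
    return math.degrees(near - far)

def far_field_approx_deg(near_edge_m, row_depth_m=ROW_DEPTH_M, eye_height_m=EYE_HEIGHT_M):
    """Approximation for distant rows: extent ~ h * depth / distance**2 (radians), converted to degrees."""
    return math.degrees(eye_height_m * row_depth_m / near_edge_m ** 2)

for d in (1, 2, 4, 8, 16):
    print(f"row starting {d:>2} m away spans {row_extent_deg(d):.3f} deg")
```

Read in reverse, the gradient places any object whose base sits on a row of a given angular size at that row's distance.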
In Image 3, depth is conveyed by a similar cue, linear perspective. If we assume that the two sides of the road are parallel to each other, we know that the actual three-dimensional distance across the road is the same everywhere in the image. Therefore, the fact that the retinal distance across the road shrinks as the road goes on (Image 4) tells us that the road must be winding away from us into the distance. By extension, we can judge the relative distances of the motorcycles driving on the road.
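Linear perspective follows directly from perspective projection. The sketch below assumes an idealized pinhole camera (image position x = f * X / Z) and a straight road of constant, hypothetical width. It shows the projected width shrinking in proportion to 1 / Z. It then inverts that relationship to estimate how far away a point on the road is, which is how an object such as a motorcycle on the road could be placed in depth.

```python
FOCAL_LENGTH = 1.0  # arbitrary image-plane units for an idealized pinhole camera
ROAD_WIDTH_M = 7.0  # hypothetical, assumed constant along the road

def project_x(lateral_m, depth_m, focal_length=FOCAL_LENGTH):
    """Pinhole projection of a lateral offset X at depth Z: x = f * X / Z."""
    return focal_length * lateral_m / depth_m

def projected_road_width(depth_m, road_width_m=ROAD_WIDTH_M, focal_length=FOCAL_LENGTH):
    """Image-plane distance between the two road edges at a given depth."""
    left_edge = project_x(-road_width_m / 2, depth_m, focal_length)
    right_edge = project_x(road_width_m / 2, depth_m, focal_length)
    return right_edge - left_edge

def depth_from_projected_width(image_width, road_width_m=ROAD_WIDTH_M, focal_length=FOCAL_LENGTH):
    """Invert the projection: Z = f * W / w."""
    return focal_length * road_width_m / image_width

for z in (10, 50, 250, 1000):
    print(f"depth {z:>4} m -> projected width {projected_road_width(z):.4f}")
```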
You will never find texture gradients or linear perspective in a scene without also encountering the relative size and relative height cues, since, as we saw above, relative size and relative height effectively define texture gradients and linear perspective. However, when a scene includes a texture gradient and/or linear perspective, the sense of depth increases dramatically. For example, Image 5 shows our two cubes from the Relative Height part of this activity, with some background elements that provide a texture gradient and linear perspective. In this image, you should get a much more powerful sense that the cubes are at different distances.
There are two more interesting things to note in this image. First, you may perceive the red cube to be larger than the blue one. Click and drag the red cube so that it is next to the blue one, though, and you can confirm that the two cubes are the same size. When the cubes were in their original position, you perceived the red one to be farther away than the blue one, so your brain solved the distance–object size–retinal size equation and concluded that the red cube must be bigger than the blue cube. The retinal sizes of the two cubes are equal, but the red cube’s distance is greater; therefore, the red cube’s object size must also be greater.
Second, if you start with the red cube back in its original position and drag it off to the right side of the window, you will probably perceive it as floating in midair right above the blue cube. Here, in the absence of a texture gradient cue, your brain sees that the retinal sizes of the two cubes are the same, assumes that the actual object sizes are the same, and therefore concludes that the distance to the two objects must be the same. This would mean that the red cube floating in the air must be at the same distance as the blue cube, which is a reasonable interpretation since there isn’t a shadow cue to indicate that the red cube is anchored to the wall.
Aerial Perspective
The atmosphere is mostly empty, but every molecule in the air scatters a little bit of sunlight, and over a long distance this scattering adds up to make distant objects appear hazier and less distinct than closer objects. This provides our last pictorial depth cue: aerial perspective, illustrated in Image 1. Look closely and you will see that the buildings in the lower-right portion of the photo are sharpest, the buildings across the river on the left are fainter, and the buildings above the trees in the center-right of the photo are hazier still (Image 2 shows close-ups of these three areas). Thus the distances of the three sets of buildings must be ordered accordingly (closest, medium distance, farthest).
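Atmospheric optics offers a standard model of this haze, commonly attributed to Koschmieder; the activity itself does not introduce it. In this model, the apparent contrast of a distant object against the sky decays exponentially with distance: C = C0 * exp(-beta * d), where beta is the extinction coefficient of the air. The sketch below uses a hypothetical beta of 0.05 per km. It inverts the formula to turn a contrast into a distance estimate. Real values of beta vary widely with humidity and pollution.

```python
import math

EXTINCTION_PER_KM = 0.05  # hypothetical extinction coefficient for fairly clear air

def apparent_contrast(intrinsic_contrast, distance_km, beta=EXTINCTION_PER_KM):
    """Contrast remaining after light travels distance_km through scattering air."""
    return intrinsic_contrast * math.exp(-beta * distance_km)

def distance_from_contrast(apparent, intrinsic, beta=EXTINCTION_PER_KM):
    """Invert the exponential: d = ln(C0 / C) / beta."""
    return math.log(intrinsic / apparent) / beta

# Three hypothetical groups of buildings with the same intrinsic contrast (0.8)
for label, d in (("near", 1.0), ("across the river", 3.0), ("far", 10.0)):
    print(f"{label:>16}: contrast {apparent_contrast(0.8, d):.3f}")
```

The inversion needs the object's intrinsic contrast, which a viewer can only guess. As a result, haze supports a coarse ordering (near, farther, farthest) far more readily than a precise distance.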
Image 3 shows our Portland street scene one more time. The object whose distance is determinable via aerial perspective should be obvious here: Mt. Hood (Image 4), which is located some 60 miles to the east of the point where this picture was taken.
Motion Parallax
The last two depth cues we will discuss in this activity are not available in static pictures. We will illustrate the first of these monocular-but-non-pictorial cues with the situation diagrammed in Image 1. You are on a train (top) traveling through the countryside, looking out the window (bottom) at a flower, a cow, and a tree. A pictorial depth cue, relative height, already tells us that the flower is closest, the cow is in the middle, and the tree is farthest away. But when we set the train in motion (Image 2), the motion parallax depth cue kicks in, and the depth relations jump out in a much more compelling way.
Motion parallax is based on the idea that objects that are closer to you move more quickly across your field of view than objects that are farther away. When your head moves, every object in the scene you’re looking at shifts position on your retina. The fact that objects that are closer to you shift position more than objects that are farther away is just a consequence of projective geometry. And once again, we find that your brain knows more about geometry than you probably realized, because it instantly compares the relative magnitudes of these position shifts to order the objects in depth.
You can also see motion parallax in action with the following demonstration: Hold your right index finger up about a foot in front of your face, then hold your left index finger up at arm’s length. Now close one eye and move your head back and forth from right to left (you have to actually move your head, not just shift your gaze). You will see your right finger move much farther across your field of vision than your left finger, and the depth relation between the two fingers will pop right out.
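Both situations can be put into numbers. On the train, a stationary object directly abeam of an observer moving at speed v sweeps across the field of view at v / d radians per second. In the finger demonstration, a sideways head shift s changes the direction of a point straight ahead at distance d by arctan(s / d). The speeds and distances below are hypothetical and chosen only to illustrate the ordering.

```python
import math

def angular_speed_deg_s(observer_speed_m_s, distance_m):
    """Angular speed of a stationary object directly abeam of a moving observer (v / d)."""
    return math.degrees(observer_speed_m_s / distance_m)

def angular_shift_deg(head_shift_m, distance_m):
    """Change in visual direction of a point straight ahead after a sideways head shift."""
    return math.degrees(math.atan2(head_shift_m, distance_m))

TRAIN_SPEED_M_S = 25.0  # hypothetical, about 90 km/h
scene_m = {"flower": 5.0, "cow": 40.0, "tree": 200.0}  # hypothetical distances

speeds = {name: angular_speed_deg_s(TRAIN_SPEED_M_S, d) for name, d in scene_m.items()}
nearest_first = sorted(speeds, key=speeds.get, reverse=True)  # fastest-moving = nearest
print(nearest_first)

# Finger demo: near finger ~0.3 m (about a foot), far finger ~0.7 m, head moves 0.1 m (assumed)
print(f"near finger shifts {angular_shift_deg(0.1, 0.3):.1f} deg")
print(f"far finger shifts {angular_shift_deg(0.1, 0.7):.1f} deg")
```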
Accommodation and Convergence
Our final monocular depth cue, which, like motion parallax, is only available in the real world (not in pictures), is a byproduct of the process of focusing our eyes. When you are focused on a distant point (Image 1), your eyes are pointed at a certain angle toward each other and your lens is relatively thin.
As you shift your gaze to a nearer spot (Image 2), your eyes turn inward and your lens gets fatter. The eye-turning is called convergence and the lens fattening is called accommodation. If you reshift your gaze to the more distant spot (Image 3), your eyes diverge and your lens gets thinner again.
Thus, assuming your brain has access to the state of the muscles controlling your lens shape and gaze direction, the extent to which your eyes are converged and your lens is accommodated should provide cues to the distance of the object you are focused on. If your lens is fat, you must be looking at something near; if your lens is thin, the object must be farther away.
Note that accommodation and convergence almost always occur in concert. This is why they are listed as a single depth cue.
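Standard optics and geometry show how much distance information these two signals carry. The sketch assumes an interpupillary distance of about 63 mm, a typical adult value used here for illustration, and a target straight ahead. The vergence angle between the two lines of sight is then 2 * arctan(IPD / (2d)). The accommodation needed to focus at distance d is 1 / d diopters, with d in meters. Both values shrink quickly as distance grows, so the signals change very little between, say, 6 m and infinity.

```python
import math

ASSUMED_IPD_M = 0.063  # assumed interpupillary distance

def vergence_angle_deg(distance_m, ipd_m=ASSUMED_IPD_M):
    """Angle between the two eyes' lines of sight when fixating a point straight ahead."""
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))

def distance_from_vergence(angle_deg, ipd_m=ASSUMED_IPD_M):
    """Invert the vergence geometry: d = IPD / (2 * tan(angle / 2))."""
    return ipd_m / (2 * math.tan(math.radians(angle_deg) / 2))

def accommodation_diopters(distance_m):
    """Focusing demand on the lens: 1 / distance, in diopters."""
    return 1.0 / distance_m

for d in (0.25, 0.5, 2.0, 6.0):
    print(f"{d:>4} m: vergence {vergence_angle_deg(d):5.2f} deg, accommodation {accommodation_diopters(d):.2f} D")
```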
One way that we perceive depth in the world around us is through what are known as monocular cues: cues to depth that require only one eye. If you close one eye, it might be more difficult to judge depth, but you're still able to tell how near or far objects are relative to your position.
Depth perception allows us to perceive the world around us in three dimensions and to gauge the distance of objects from ourselves and from other objects.
You can contrast monocular cues with binocular cues, which, as you might expect, are those that require the use of both eyes.
These are some of the common monocular cues that we use to help perceive depth:
Relative Size
The relative size of an object serves as an important monocular cue for depth perception. It works like this: If two objects are roughly the same size, the one that looks larger will be judged as closer to the observer. This applies to three-dimensional scenes as well as two-dimensional images. Two objects on a piece of paper are the same distance away from you, yet a difference in size can make the larger object appear closer and the smaller object appear farther away.
Absolute Size and Familiar Size
Absolute size, or the actual size of an object, also contributes to the perception of depth. A smaller object, even if we don't know exactly how big it is, will look farther away than a larger object placed in the same spot.
Our perceptions of size can also be influenced by our familiarity with these objects. While driving, your familiarity with the typical size of a car helps you determine how close or far away other vehicles on the road are from your location.
Relative Height
An object's position in relation to the horizon can also serve as a monocular cue.
Objects located closer to the horizon tend to be perceived as farther away, while those that are farther from the horizon are usually seen as being closer.
Texture Gradient
Another essential monocular cue is the use of texture to gauge depth and distance. When you're looking at an object that extends into the distance, such as a grassy field, the texture becomes less and less apparent the farther it goes into the distance. As you look out over a scene, the objects in the foreground have a much more apparent texture. The asphalt of the road looks rough and bumpy. The vegetation in the field looks distinctive, and you can easily distinguish one plant from another.
As the scene recedes into the distance, these texture cues become less and less apparent. You cannot detect every single tree on the mountain in the distance. Instead, the vegetation covering the mountains simply looks like an indistinct patch of green color. These texture differences serve as important monocular cues for gauging the depth of objects that are both near and far.
Motion Parallax
The perception of moving objects can also serve as a monocular cue for depth. As you're moving, objects that are closer seem to zoom by faster than do objects in the distance.
When you're riding in a car, for example, the nearby telephone poles rush by much faster than the trees in the distance. This visual clue allows you to perceive the fast-moving objects in the foreground as closer than the slower-moving objects off in the distance.
Aerial Perspective
Objects that are farther away seem to be blurred or slightly hazy due to the atmosphere. As you look off into the horizon, closer objects seem more distinct while those in the distance might be obscured by dust, fog, or water vapor. Because objects in the distance tend to appear hazier, this cue tells us that blurry objects tend to be farther away.
Linear Perspective
Parallel lines appear to meet as they travel into the distance. For example, the outer edges of a road seem to grow closer and closer until they appear to meet. The closer together the two lines appear, the greater the distance will seem.
Overlap (or Interposition)
When one object overlaps another, the object that is partially obscured is perceived as being farther away. For example, if you see two figures standing in the distance and one figure overlaps and occludes the other one, you will perceive the occluded figure as being behind the non-occluded one. This allows you to judge how objects are placed in relation to one another and contributes to your experience of depth in the world around you.
Shading and Lighting
The way light falls on objects and the amount of shading present can also be an important monocular cue. Objects that are darkened and obscured may appear further off in the distance than those that are brightly lit.
Accommodation
In order to focus on close-up objects, certain muscles in your eye contract, altering the shape of your lens. When looking at objects that are far away, these same muscles relax. This accommodation can serve as a monocular cue, even though we are often unaware of it.
How Monocular Cues Are Used
When perceiving the world around us, many of these monocular cues work together to contribute to our experience of depth. The corner of a building looks larger and more textured, causing it to seem closer. Objects farther down the street appear smaller, so we judge them as being farther away. The parallel lines of the highway appear to draw progressively closer together as they disappear into the distance, and the mountains in the distance seem fuzzy and indistinct.
All of these monocular cues contribute to our total experience of the scene, our perception of depth and distance, and our interpretation of our position in relation to other objects in the scene.
A Word From Verywell
Monocular cues can play an important role in the detection of depth in the world around us. Unlike binocular cues, which involve the use of both eyes, monocular cues only require the use of one eye and can be presented in two dimensions. Because of this, many of these cues are used in art to create the illusion of depth in a two-dimensional space.