the structure of vision (i)

 
This page and the next explore the visual processes that transform our binocular, retinal images into a flowing, rich, three dimensional experience of the world.

We won't completely ignore the topic of color, because the perception of form and space affects our color perceptions. However, our main interest is to clarify or even discover artistic strategies or resources in the way vision works.

I explain the basic visual processes and the evidence for their importance to vision, then review the implications of these processes for artistic representation. I especially try to clarify the relationships among the many aspects of color vision. By alternating between science and art, I hope to keep the discussion focused and useful.

This page and the next are technical, but include much information relevant to the principles of composition and design.

 
the weave of vision
 
A common idea is that vision constructs and interprets what we see. This is usually said to assert that the influence of experience and culture is closely intertwined with our biological or cognitive capabilities. But it more accurately means that one visual capability builds on or flows out of another, regardless of how much that capability depends on experience, and in ways that are most often unconscious and far below the complexity of attitudes or beliefs.

How can we describe this process? The metaphor that seems to apply everywhere is the weaving of contrasting sources of information. This process begins in the retina, with the network of retinal interconnections among receptor cones and rods, and expands until literally most of the cortex contributes to visual experience. At each step in this process, there is a competitive separation or combination of related sensory signals: differences are accentuated and similarities smoothed together.  

This weaving occurs across a sequence of visual tasks, each task building on and partly influencing the outcomes of preceding tasks. A highly simplified description of this sequence is shown below.

 


 
the major stages of visual processing
after Stephen Palmer (1999)

 
In the first stage, the eye forms the basic retinal image from the response of the millions of separate cones or rods to the optical image projected onto the back inner surface of the eye, as discussed in the page on light and the eye. Each of these cones independently codes a very tiny area of the total optical image, like a single pixel in a computer image. At the same time, these autonomous cones are linked together through a network of other retinal cells, so that the output from neighboring cells can be combined into center/surround receptive fields. This retinal network creates the opponent coding of color, sharpens edge contrast, and performs a basic frequency analysis of the retinal image.  

The next step involves the construction of a primal sketch. The retinal image is "filtered" to eliminate noise and increase contrast through the complex cells in the visual cortex, which also perform edge & region detection that simplifies the two dimensional image into a "line drawing" of edges enclosing continuous areas. Simple principles of two dimensional geometry are used to connect broken or interrupted edges across the entire image. This primal sketch is created using the retinal location as the basic framework, and by aligning and merging the retinal images from both eyes, which results in a rendering of the visual image that we can think of as a moderately detailed line drawing.  

The third stage shown in the diagram elaborates the primal sketch into textures and surfaces. The resulting surface layout identifies separate surfaces through movement, luminosity, reflectivity, texture and color, approximately determines the slant or curvature of these surfaces through perspective gradients and illuminant shading, and uses a variety of depth or distance cues (binocular disparity, motion parallax, retinal image location, occlusion, visual fusion, etc.) to locate the surfaces as near or far from the viewer in three dimensional space. This analysis also guides and clarifies the edge and region detection that preceded it. The result is a 2.5 dimensional sketch, which shows the tilt, slant and distance of visible surfaces in relation to the viewer's location and orientation in space. We can think of this rendering as a bas relief image of the world.

The next, object representation stage completes the 2.5 dimensional sketch in three dimensions by identifying discrete objects or continuous surfaces in space. This depth and volume analysis uses basic knowledge of three dimensional geometry and object grouping heuristics (such as common movement, connectedness or visual similarity) to join together surfaces interrupted or hidden from view behind closer objects; infer the unseen, back sides of objects; and conceptualize three dimensional forms having volume and shape. Again, this object representation guides the surface layout analyses that come before it, is highly dependent on our experience with things, and begins to transform the visual image into intellectual or conceptual content.  

In the penultimate or categorical representation, objects are recognized or categorized in terms of their physical properties or the functional possibilities (or affordances) that were encountered in similar objects in past experience. That is, the purely surface and volumetric characteristics of objects are merged with abstract ideas of color, form, hardness, weight, stability, temperature, function and origin derived from past experience. At this point even unseen features (such as teeth inside a dog's mouth), anticipated outcomes (the fear of falling when looking down from a high balcony) or even false perceptions (an open doorway that is actually a closed glass sliding door) are "seen" as real.

Only after all these tasks are completed does the weave of vision enter consciousness and become visual experience. At this level the physical scene is already joined to our awareness of our own body; we have emotional reactions to what we see, assign words or labels to things, direct our gaze to explore specific aspects of the scene, and use what we see to guide our behavior.
 

Two cautions. First, although described as separate steps in visual perception, these various tasks are highly interdependent. Second, the separation into tasks and their description depends heavily For example, color perception appears to require contributions from the retina, the lateral geniculate nucleus, the primary visual cortex, and several secondary regions of the visual and language cortex. The essential property of vision is that it forms a continuous weave: no part can be separated from any other without damaging or destroying the whole.

The problem with much of the art design literature is that it assumes our visual capabilities work in a fixed or mechanical way, and therefore can be translated into simple design principles. In fact, our visual capabilities mutually influence or compete with one another, yielding or asserting their contribution to our visual experience depending on the relative importance of the others. These dynamic design principles are much less well understood, but they are probably the most important considerations in a visual design. Images determine the balance of visual interpretation by how much perceptual material they offer to arouse these interacting visual capabilities.

Our understanding of human vision relies on four types of evidence: (1) detailed anatomical knowledge of how nerves and regions of the human eye and brain interconnect; (2) measurements of the response of single cells to simple visual stimuli in the visual pathways of small mammals (cats and monkeys); (3) extensive experimental study of human visual performance, including susceptibility to visual illusions; and (4) practical experience in building electronic sensors and programming computer neural networks to perform equivalent visual analysis tasks. The resulting picture of human vision is plausible, though it is also largely speculative — habitual scientific explanations applied to ambiguous facts.

An excellent overview of the most recent research and concepts in vision science is Stephen Palmer's Vision Science: Photons to Phenomenology (MIT Press, 1999), in text hardback. A much shorter, somewhat older but thoughtful survey is Richard Gregory's Eye and Brain: The Psychology of Seeing (Princeton University Press, 1997), currently in its 5th paperback edition.

 
center/surround receptive fields
 
Center/surround organization of visual fields, opponent organization.

Competition is an essential component of visual processing: it sharpens, or provides contrast enhancement, in a neural representation, adjusts the network's total activity to a constant overall level of stimulation, and prevents cells from becoming saturated in response to variable inputs.

In pursuit of competition, the cones and rods of the retina are systematically interconnected as groups that can signal when some but not all of them are stimulated by light — that is, when a contrasty edge falls across them. These clusters are themselves clustered, and those clusters clustered, to create a spatial frequency interpretation of the visual image.
 

Center/Surround Design. Schematically, the process works as follows. A spatial array or row of receptor cells responds to a sharply defined light stimulus (diagram at right, top). This produces both direct (stimulating) outputs to the nearest bipolar cell, and indirect (inhibiting) outputs to nearby horizontal cells (middle). As these separate outputs are averaged in deeper layers of vision, edges become accented by increasing the light and dark contrast at the edge location (bottom). These edge contrasts are very easy to induce — on lightness, chroma or hue — in what are called Mach bands.
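
The averaging step can be sketched in a few lines of Python. This is only a toy model, not a description of actual retinal wiring: the function name center_surround_response and the weights (1.0 for the direct output, 0.25 for the inhibition from each neighbor) are assumptions chosen for illustration. Each cell reports its own input minus a fraction of the input to its two immediate neighbors.

```python
def center_surround_response(signal, center_weight=1.0, surround_weight=0.25):
    """Toy lateral inhibition: own input minus a fraction of each neighbor's input."""
    out = []
    n = len(signal)
    for i in range(n):
        left = signal[max(i - 1, 0)]        # repeat the end value at the borders
        right = signal[min(i + 1, n - 1)]
        out.append(center_weight * signal[i] - surround_weight * (left + right))
    return out

# A sharply defined light stimulus: dark on the left, light on the right.
stimulus = [10.0] * 6 + [20.0] * 6
response = center_surround_response(stimulus)
print(response)
```

In this sketch the response is flat wherever the stimulus is uniform. Only the two cells on either side of the edge depart from it: the last dark cell dips below the dark plateau and the first light cell rises above the light plateau, which is the pattern of a Mach band.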

Through the appropriate neural network these contrasts can occur across different retinal distances (visual angles), different orientations or shapes, even different patterns of movement. This basically transforms the retinal image into a spatial frequency analysis (diagram at right), which can block out large areas of value or provide a detailed outline of form (illustration at right). The very broad utility of our pattern and spatial perception capabilities is suggested by the fact that they underlie many of our most civilized capabilities, from reading text to catching a fly ball. They are also intimately connected to, and strongly influence, our perception of color.

 

responses of an "off center/on surround" receptive field to different patterns of light and dark

 
These response fields are in turn sensitive to changes in the light stimulus, producing a visual fusion across time, or motion blurring in quickly moving objects, regardless of their size. The boundary at which this fusion occurs is defined by the number of responses the cell can emit across time, how quickly the cell changes its responses over time, and the synchronization of the changes.
 

design of an "on center/off
surround" retinal receptive field

Center/Surround Contrasts. If the retinal receptors are joined into these larger center/surround fields, how do individual cones convey color or luminosity information? The answer seems to be that individual cones contribute to multiple fields, and the fields themselves are combined to produce the opponent contrasts.

We've seen that there are three basic opponent contrasts — w/k, y/b and r/g — but these can be represented in two ways, resulting in six types of center/surround fields:

w/k contrast:   +W-K (white)    +K-W (black)
r/g contrast:   +R-G (red)      +G-R (green)
y/b contrast:   +Y-B (yellow)   +B-Y (blue)

As there appear to be about 1.25 million ganglion cells in the human retina, but about 6 million cones, roughly 5 cones, on average, must feed into each ganglion cell, though this ratio is probably much lower in the fovea and much higher in the periphery. In fact, as many as 25 different types of ganglion cell exist in human retinas, with different cell body sizes and different connection patterns to other cells. (The dendritic connections are smallest in the fovea and may be as much as 10 times larger toward the periphery of the retina.)

In the human retina, the three commonest ganglion cell types are the large parasol, small parasol and the midget ganglion cells, which connect to separate layers of the lateral geniculate nucleus (LGN) between the eyes and the brain. These connections have been measured not in humans but in monkeys with demonstrably equivalent visual responses, by recording the rate of synaptic impulses (cell firing rate) in single LGN or visual cortex cells receiving information from a single center/surround field.

 

response curves of six center/surround LGN color cells
changes in firing rate of center/surround cell with changes in monochromatic wavelength shining on its receptive field in the retina; horizontal lines show baseline firing rates (primate visual cortex, from De Valois & De Valois, 1997)

 
If there is a fundamental harmonic or mathematical structure to color vision, I think it would closely resemble these curves. Because the retinal cells are always producing a baserate neural signal, each of the contrast fields has a resting or adapted value shown by the horizontal lines; these appear to be different for each type of contrast, not unlike the different fundamental frequencies of musical tones.

The colored curves show a number of peaks and crossings that do not correspond to the cone sensitivity peaks or the hue cancellation curves, but do identify significant color landmarks of their own. The +Y-B peak, at the crossing of the +R-G and +G-R curves, is very close to the warmest red orange hue, while the complementary +G-R peak above the crossing of the +B-Y and +Y-B curves is close to the coolest blue green hue. Unfortunately, we lack a clear picture of these contrasts in human subjects, and a clear understanding of how they are combined and transformed to produce conscious color experience.

To make things even more complicated, response fields react somewhat differently to patterns of color differences as opposed to patterns of luminance or brightness differences.

 

center/surround visual fields respond differently to color or luminance

 
As this diagram shows, a typical +R-G center/surround can code either for color (at left) or for luminosity (at right), depending on the specific size, color and movement of the visual stimulus. In addition, the color response has a lower resolution than the luminosity contrast, and lags somewhat behind the luminosity response in time. Color is a fuzzier and slightly tardy arrival in visual experience, while luminosity contrast is prompt and crisp.

Although the center/surround organization of the retina has not been demonstrated by explicitly tracing retinal nerve connections, it does explain a large range of visual phenomena, including visual illusions that are hard to explain in any other way. One of the most popular is the Hermann grid.

 

a hermann grid and a scintillating grid
the lower diagram shows that an "on center/off surround" receptor field receives greater inhibition at line intersections, and therefore reports a darker color

 
Direct your gaze at any intersection of horizontal and vertical lines in the grid on the left. You may notice the appearance of diffuse, faint dots in the intersections around it, but if you look directly at them, they disappear, only to reappear at the intersection you were just looking at.

The schematic underneath shows the center/surround explanation for this effect. The dots disappear at the center of attention because the center/surround fields there are extremely small, and so report all the edges accurately. However the peripheral intersections fall on peripheral center/surround fields which are on average of a larger diameter. Some of these have a diameter roughly twice the thickness of the lines in the grid, which causes an "on center/off surround" field to be more inhibited at the intersections than elsewhere: this inhibition produces the appearance of a darker, fuzzy dot.
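
The inhibition argument can be checked with a small numerical sketch. Everything here is an assumption made for illustration: the grid dimensions, the helper names make_grid and on_center_response, and the simple rule that the response is the mean light in the center minus the mean light in the surround. The field diameter is set to roughly twice the street width, as described above.

```python
def make_grid(size, period, line_width):
    """White streets (1.0) separating black squares (0.0)."""
    return [[1.0 if (x % period) < line_width or (y % period) < line_width else 0.0
             for x in range(size)] for y in range(size)]

def on_center_response(img, cx, cy, center_r, surround_r, inhibition=1.0):
    """Mean light in the center disk minus mean light in the surrounding ring."""
    center, surround = [], []
    for y in range(cy - surround_r, cy + surround_r + 1):
        for x in range(cx - surround_r, cx + surround_r + 1):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= center_r ** 2:
                center.append(img[y][x])
            elif d2 <= surround_r ** 2:
                surround.append(img[y][x])
    return sum(center) / len(center) - inhibition * sum(surround) / len(surround)

grid = make_grid(size=60, period=20, line_width=5)
# Streets occupy pixels 20..24, so 22 is the middle of a street.
at_intersection = on_center_response(grid, 22, 22, center_r=2, surround_r=5)
along_street = on_center_response(grid, 22, 32, center_r=2, surround_r=5)
print(at_intersection, along_street)
```

In both positions the center of the field is fully lit, but at the intersection the surround collects light from four directions rather than two, so the net response is lower there. That lower response corresponds to the darker, fuzzy dot.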

You will notice that the red colored half of the grid produces pinkish colored dots, although these may appear slightly fainter than the dots in the black and white half.

A striking variant is the scintillating grid (above, at right), in which the illusory darkening of the gray line intersections competes with white dots placed at the same locations. Fixating directly on any single dot shows it to be white, but dots in the parafoveal field appear to be filled with gray dots, while dots in the peripheral field seem to be replaced by black dots. This makes the illusory nature of the dots even more obvious: some say that this grid is actually the Bush administration's secret map showing the location of Saddam's weapons of mass destruction!  

Spatial Frequency Analysis. So far the retinal center/surround fields have been described in terms of simple, center vs. surround contrasts of luminosity or color. By the time these responses reach the visual cortex, however, the fundamental emphasis has been radically altered. This brings us to the next major stage of vision, the image frequency analysis. Let's look first at what a frequency analysis does, then look at how this is probably performed by our visual system.

We start by inspecting the perception of spatial frequencies in our conscious visual experience. This is commonly done with a contrast sensitivity plane, similar to the one pictured below. (Unfortunately, a computer monitor is limited by its pixel spacing and contrast settings; the best contrast stimuli are printed on very fine grain, high contrast photographic papers.)

 

a contrast sensitivity stimulus
the number of vertical stripes in a constant horizontal visual angle (spatial frequency) increases left to right, and lightness contrast (luminance amplitude) increases from top to bottom
 
This contrast sensitivity stimulus shows alternating black and white sinusoidal waves that change visually in two ways: the spacing between the waves decreases from left to right, and the contrast or lightness difference between the peaks and troughs of the waves decreases from bottom to top. Eventually the contrast between light and dark becomes so subtle, or the spacing between the bands so narrow or so wide, that the stripes seem to disappear altogether.
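
A stimulus of this kind is straightforward to generate. The sketch below is a hedged reconstruction, not the procedure used for the image on this page: the function name, the image size, the frequency range (1 to 40 cycles across the image) and the contrast range are all assumed values. Spatial frequency rises exponentially from left to right and contrast rises exponentially from top to bottom.

```python
import math

def contrast_sensitivity_stimulus(width=256, height=128, f_start=1.0, f_end=40.0,
                                  c_min=0.005, c_max=1.0):
    """Return rows of luminance values in the range 0..1 (0.5 is mid gray)."""
    ratio = f_end / f_start
    rows = []
    for y in range(height):
        # Contrast grows exponentially from c_min (top row) to c_max (bottom row).
        contrast = c_min * (c_max / c_min) ** (y / (height - 1))
        row = []
        for x in range(width):
            u = x / (width - 1)
            # The phase is the integral of the exponentially rising frequency.
            phase = 2 * math.pi * f_start * (ratio ** u - 1) / math.log(ratio)
            row.append(0.5 + 0.5 * contrast * math.sin(phase))
        rows.append(row)
    return rows

image = contrast_sensitivity_stimulus()
print(len(image), "rows of", len(image[0]), "luminance values")
```

The rows can be written out with any image library. The phase is computed as the integral of the frequency, not as frequency times position, so that the stripe spacing changes smoothly across the image.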
 

This is most evident at the top left and top right corners of the stimulus. The boundary where the waves disappear shows the minimum amount of contrast (the contrast sensitivity threshold) necessary to perceive alternating bands at each spatial frequency. This frequency/contrast relationship becomes more obvious if you view this image from a distance of 1, 3, 6 and 12 feet (your room or cubicle allowing): the stripes at far right will disappear and the stripes at center will become shorter, because the visual frequency of the bands increases with distance.
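
The effect of viewing distance follows from the geometry of visual angle. As a rough sketch, assuming a hypothetical stripe pattern in which one dark plus one light stripe is 0.1 inch wide (the function name and that width are illustrative, not measurements of the image above):

```python
import math

def cycles_per_degree(cycle_width, distance):
    """Spatial frequency of a stripe pattern; both arguments in the same unit."""
    degrees_per_cycle = math.degrees(2 * math.atan(cycle_width / (2 * distance)))
    return 1.0 / degrees_per_cycle

CYCLE_WIDTH_IN = 0.1   # hypothetical: one dark plus one light stripe, in inches
frequencies = {feet: cycles_per_degree(CYCLE_WIDTH_IN, feet * 12) for feet in (1, 3, 6, 12)}
for feet, cpd in frequencies.items():
    print(f"{feet:2d} ft: {cpd:5.1f} cycles per degree")
```

For small angles the frequency is almost exactly proportional to distance, so stepping back from 1 foot to 12 feet multiplies the visual frequency of every stripe by about 12.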

The boundary where the bands dissolve into continuous gray defines a characteristic contrast sensitivity function (right): contrast sensitivity is greatest (extends to the most subtle lightness differences) at around 4 to 5 cycles (black stripes) per visual degree. (A visual degree is roughly the apparent size of your thumbnail viewed at arm's length.) At higher spatial frequencies, greater contrast is necessary to see the bands until, at around 60 cycles per degree, even pure black and white stripes become invisible: visual fusion occurs for all textures.

The contrast sensitivity function is actually made up of separate, narrow frequency functions corresponding to complex cell wavelet fields of different spatial frequency. Each wavelet contributes a narrow sensitivity to a specific spatial frequency, but the combined fields overlap to produce the continuous contrast sensitivity function. (Note that the contrast stimulus is designed with vertical stripes; different groups of complex cells provide contrast sensitivity tuned to other orientations or directions of movement.) And, as we've seen demonstrated with the scintillating grid, the highest visual resolutions (the smallest response fields) are only available in the fovea, while the coarsest resolutions span a broad area of the retina and are limited to peripheral vision.
 

the contrast sensitivity function
from Palmer (1999)

Two images of Groucho Marx, at low and high spatial frequencies (right), show how the highest frequencies are important to identify detailed textures, edges and rapid changes of contrast, while low frequencies make broad contrasts between areas of different lightness or color.

This is a radical difference between human (mammalian) vision and, say, a digital camera. The camera records a scene as a constant matrix of equal sized pixels, equally spaced across the whole image: resolution is consistent across the whole image. The eye records a scene as many layers of overlapping visual frequencies, with the highest frequencies concentrated at the center of view (the foveal field): resolution depends on what we look at.  

Wavelets, Orientation & Motion. The visual cells of the cortex, first investigated by Hubel and Wiesel in 1964, respond primarily to the spacing, orientation and movement of edges or isolated dots or lines.
 

the face of groucho marx as
low and high spatial frequencies
from Palmer (1999)

The diagram at right suggests how this is done. Separate but overlapping center/surround fields in the retina produce a series of peaks and valleys along the axis of their grouping, producing a wave function in visual sensitivity. These fields map directly to relay cells in the lateral geniculate nucleus (LGN), which in turn feed to a single complex cell in the visual cortex. However, the inputs from the LGN are themselves arranged in a center/surround pattern, which unites the separate retinal fields into a single waveform receptive field or Gabor wavelet. This is shown in cross section as a tapering wave function, and in two dimensions as a retinal pattern of excitation and inhibition (light and dark).

What good is this? Well, this type of field would be maximally stimulated or maximally inhibited by a pattern of light and dark lines that exactly matches the spacing of excitation and inhibition. Other spacings would produce a smaller response, while very discrepant spacings would produce no response at all.

Also, spaced lines will stimulate this wavelet only if they are close to parallel to the waveform orientation — in this case, the lines would have to be vertical. Horizontal lines, whatever their spacing, would cross all the fields equally, and produce no response.
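
Both kinds of selectivity, for spacing and for orientation, can be illustrated with a simple Gabor function: a cosine wave multiplied by a Gaussian envelope. The sketch below is an idealized model, not a measured cortical field. The function name gabor_response, the 8 pixel wavelength and the envelope width (sigma equal to one wavelength) are assumptions made for illustration.

```python
import math

def gabor_response(filter_wavelength, stripe_wavelength, stripes="vertical", sigma=None):
    """Sum of a vertically oriented Gabor field multiplied by a cosine stripe pattern."""
    sigma = sigma or filter_wavelength
    radius = int(3 * sigma)
    total = 0.0
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            wavelet = envelope * math.cos(2 * math.pi * x / filter_wavelength)
            # Vertical stripes vary in lightness along x, horizontal stripes along y.
            axis = x if stripes == "vertical" else y
            stimulus = math.cos(2 * math.pi * axis / stripe_wavelength)
            total += wavelet * stimulus
    return total

matched = gabor_response(8, 8)                           # same spacing, same orientation
near = gabor_response(8, 10)                             # slightly wider spacing
far = gabor_response(8, 24)                              # very different spacing
horizontal = gabor_response(8, 8, stripes="horizontal")  # right spacing, wrong orientation
print(matched, near, far, horizontal)
```

Under these assumptions the matched stripes give the largest response, the slightly mismatched spacing gives a reduced one, and both the very discrepant spacing and the horizontal stripes give a response that is close to zero compared with the matched case.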

Finally, the firing of LGN cells comes in spikes rather than the continuous signal of the cones, and these spikes importantly define a pattern across time. If the complex cell is tuned to a particular temporal pattern, then the movement of a single line along the waveform axis would produce a series of spikes to the complex cell depending on the speed of the line. It also appears that the complex cells are tuned to different frequencies in time, which have been studied using flicker stimuli of flashing, stationary lights.

These wavelet structures are actually embedded in the large structure of the cortex itself. Each region of the retina feeds into a large population of complex cells, organized into compact columns about 2mm wide and 10mm deep. Within each column, complex cells are organized around the circumference like the points on a compass, to respond to different spatial orientations or directions of movement; from top to bottom within the column, the cells are tuned to respond to different visual spacings or visual frequencies. Columns from the two eyes are layered side by side in matching areas of the binocular field, and the columns are arranged across the cortex according to their overall location in the retina.

Neural organization that precise and extensive implies processing tasks of incredible importance to visual experience.

It is not clear how color is involved in these contrasts, but one account is that the entire visual scene is first "outlined" using luminosity information alone, then "painted" with the color information brought through separate pathways.

 

the gradation of visual information

 
I described earlier the importance of the primate visual design for life among the trees, and in that context the emphasis on motion and spatial frequency information makes sense in the resolution of distance and body movement information. The superficial resemblance between the contrast sensitivity diagram and the inescapable perspective gradient suggests the usefulness of spatial frequency analysis for distance perception, especially via the systematic changes in spatial frequency that characterize an approaching object. The fact that spatial frequency analysis is so powerful and flexible in the definition of edges and textures indicates it is truly the foundation of visual perception.

 
edge & region detection
 

overlapping retinal/LGN receptive
fields summed in a wavelet filter cell
in the visual cortex

 

a single foveal cone (blue) contributing to different center/surround clusters

Now edges and regions.  

Partitive Mixing. These boundaries on visual frequency mean that human vision divides roughly into three categories: colors, textures and object surfaces. Each is determined by the visual fusion for the available visual size and contrast.

The human contrast sensitivity curve shows that any contrasting elements smaller than about 1 pixel (roughly 1/80th of an inch) viewed from about 48 inches will appear to be a single color. However, a variety of effects appear in vision depending on the exact spacing of the elements and their contrasting hues or values, as the next example shows.
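
As a rough check on that figure, the size of one light and dark cycle at the fusion limit can be computed from the viewing distance. The sketch assumes the limit of about 60 cycles per degree given earlier for pure black and white stripes; the function name is illustrative only.

```python
import math

def cycle_width_at_limit(distance, limit_cpd=60.0):
    """Width of one full cycle (one dark plus one light element) at the fusion limit."""
    return 2 * distance * math.tan(math.radians(1.0 / limit_cpd) / 2)

width = cycle_width_at_limit(48.0)   # viewing distance in inches
print(f"one cycle at 60 cycles/degree from 48 in: 1/{1 / width:.0f} inch")
```

Under these assumptions one full cycle spans about 1/72 inch at 48 inches, which is the same order of magnitude as the rough pixel size quoted above.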

 

color shifts caused by differences in spatial frequency

 
At relatively large visual sizes (left), the gray color interacts with the intense orange background according to the principles of simultaneous color contrast: the color shifts toward the complement of bright orange (dull blue) and appears darker and cooler.

At moderately small sizes (center), the spreading effect causes the gray to retain its identity, but now the shift is toward (rather than away from) the surrounding color, making the gray appear lighter and warmer.
 

At extremely small sizes (right), as in the tiny dots in one of Seurat's Neo-Impressionist paintings or modern halftone color reproductions, partitive mixing (also called visual fusion) occurs, and we see the additive mixture of the orange and gray light to produce a grayed maroon. Visual fusion was exploited particularly by the Neo-Impressionists to create color mixtures through partitive mixing or optical color mixing, though examples of the technique can be readily found in the paintings of Delacroix, Watteau and Rubens. More dramatic effects due to spatial frequency have been exploited by Op Art painters such as Bridget Riley.
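
Partitive mixing can be approximated as an average of the light from each element, weighted by the area it covers. The sketch below assumes the colors are given as 8 bit sRGB values and uses the standard sRGB transfer function to average in linear light; the specific orange and gray values are hypothetical, not sampled from the illustration.

```python
def srgb_to_linear(c):
    """Convert one 8 bit sRGB channel to linear light (0..1)."""
    c = c / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(v):
    """Convert linear light (0..1) back to an 8 bit sRGB channel."""
    s = 12.92 * v if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055
    return round(255 * s)

def partitive_mix(colors, areas):
    """Area weighted average of the light from each color, channel by channel."""
    total = sum(areas)
    return tuple(
        linear_to_srgb(sum(srgb_to_linear(col[ch]) * a for col, a in zip(colors, areas)) / total)
        for ch in range(3)
    )

orange, gray = (255, 128, 0), (128, 128, 128)   # hypothetical example colors
fused = partitive_mix([orange, gray], [1, 1])   # equal areas of each
print(fused)
```

Averaging in linear light matters here: equal areas of black and white fuse to a gray noticeably lighter than the numerical midpoint of the 8 bit values.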

It's difficult to demonstrate here, but temporal frequency also has an effect on color: the usual demonstration is that a completely black and white pattern, if rotated rapidly under an intense light, can produce phantom reds, greens and blues.

Anyone who has put together jigsaw puzzles knows the strategy of finding and fitting together the edge pieces first. The edge shapes are easy to recognize, their connections are limited to two sides only, and they determine the location of shapes inside the puzzle. Our visual system seems to work along similar lines: identify the obvious edges, then use these to define shapes or regions in the image.
 

Edge Contrast Effects. Various mechanisms are active to sharpen edges by increasing the color difference between the two areas on either side.

 

the relationship between area and texture

 
And so.  

There are various ways to characterize the complexity of edges and forms.

The first method is by the number of edge crossings by a straight line between two randomly or regularly selected points. The simplest area colors have few or no edge crossings between two points inside the form, and only one edge crossing between a point inside and a point outside the form. Most texture colors have a large number of edge crossings, as shown in the examples.
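
Counting edge crossings is simple to sketch on a small binary image. The tiny patterns below are made up for illustration, and the function name edge_crossings is an assumption; the line between the two points is sampled one pixel step at a time, and each change of value along it counts as one crossing.

```python
def edge_crossings(img, p0, p1):
    """Count changes of pixel value along the straight line from p0 to p1 (x, y)."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0))
    crossings = 0
    prev = img[y0][x0]
    for i in range(1, steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        if img[y][x] != prev:
            crossings += 1
            prev = img[y][x]
    return crossings

solid = [              # an area color: one filled form on a plain ground
    "........",
    ".######.",
    ".######.",
    ".######.",
    "........",
]
striped = ["#.#.#.#."]  # a texture color: alternating elements

print(edge_crossings(solid, (1, 1), (6, 3)))    # inside to inside
print(edge_crossings(solid, (3, 2), (7, 2)))    # inside to outside
print(edge_crossings(striped, (0, 0), (7, 0)))  # across the texture
```

The solid form gives no crossings between two interior points and a single crossing from inside to outside, while the striped texture gives a crossing at nearly every step.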

 

high crossing and low crossing areas and borders

 
A second method is by grid scaling. A square grid of fixed spacing is placed over the surface of a pattern or texture, and the number of squares that contain an element are counted. The spacing is changed, the grid is placed at random on the texture, and the process is repeated. After several counts, the logarithm of the number of texture containing squares, N, is plotted against the logarithm of the inverse of the square width, 1/W; the slope of the resulting line is the fractal (box counting) dimension of the texture. In natural textures and forms, such as gravel, coastlines and lightning, this value is about 1.3.
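
Grid scaling can be tried on simple test shapes. The sketch below uses the standard box counting estimate, the slope of log N against log (1/W), which is my assumption about what the grid counts are meant to yield. For reproducibility the grid is kept at a fixed origin and not placed at random, and the function names and grid widths are illustrative.

```python
import math

def box_count(points, width):
    """Number of grid squares of the given width that contain at least one point."""
    return len({(math.floor(x / width), math.floor(y / width)) for x, y in points})

def box_dimension(points, widths):
    """Least squares slope of log N against log (1/W)."""
    xs = [math.log(1.0 / w) for w in widths]
    ys = [math.log(box_count(points, w)) for w in widths]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

widths = [1 / 4, 1 / 8, 1 / 16, 1 / 32]
line = [(i / 1000, i / 1000) for i in range(1000)]                      # a plain edge
filled = [(i / 100, j / 100) for i in range(100) for j in range(100)]  # a filled area

print(box_dimension(line, widths), box_dimension(filled, widths))
```

A plain line comes out with a dimension of 1 and a filled area with a dimension of 2. Irregular natural outlines fall in between, which is where a value such as 1.3 sits.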

First of all, contrast is most enhanced along an edge, especially between areas that differ slightly in lightness, chroma or hue. This makes the edge easier to see. The first example involves Mach bands (first described by Ernst Mach in 1866) appearing clearly in areas that increase slightly in lightness across the page.

 

center/surround response to a Mach band

 
Within each band, the edge against a darker band (on the left) appears lighter, and the edge against a lighter band (on the right) appears darker. The lightness shift increases as we near the edge on either side, which gives the appearance of scalloped grooves, such as the fluting on an Ionic column.
 

chevreul illusion for small changes in lightness (top), chroma (middle) or hue (bottom)

 
The second example shows a chroma transition for a constant hue and lightness of red, and the third a hue transition from yellow green to red at a constant lightness and saturation of 70% (which makes the yellow appear dull green). There is a weaker Mach effect for chroma and a very weak effect for hue, as we would expect, because hue is a weaker contrast stimulus than lightness or chroma. Within each chroma band the edge against a more intense color (on the right) shifts toward gray (less intense) and the edge against a less intense color appears more intense; in the hue demonstration the edge against a redder hue (on the right) shifts toward green (cooler) and the edge against a cooler color shifts toward red or yellow (warmer).

Notice that the Mach effect is more apparent in some of the chroma or hue bands than in others, while the effect across value gradations appears equal. Because value structure is essential to our perception of three dimensional form, the eye has adapted to discriminate equally even slight differences in edge lightness across a wide range of tonal values.

This induced shift in the quality of color along an edge was also described by Michel-Eugène Chevreul, who influenced a few contemporary artists to use it as an artificial accenting device, as Georges Seurat did extensively in his pointillist Un dimanche après-midi à l'Île de la Grande Jatte (1884). But the trick is in fact old, and easily found in works by Rembrandt or Tintoretto.

Area Filling. Perhaps the simplest and most pervasive of these illusions, which is related to the "filling in" responsible for the spreading effect, is the Craik-O'Brien effect, crudely illustrated in the figure below.

 

the craik-o'brien (cornsweet) effect

 
The subtle gradation in value necessary to make the illusion more dramatic can't be fully achieved in a browser digital image, but the gist is easy to explain. The mind relies on the difference in lightness at the edges of forms or surfaces in order to determine the visual appearance of the areas bounded by (inside or outside) the edges. In this case, the center of the dark circle, and the outside edge of the surrounding ring, are exactly the same lightness, but they appear to be very different values because the mind adjusts them to correspond to the strong light/dark contrast at their common edge.
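
The structure of the stimulus is easiest to see in a one dimensional cross section through the edge. The sketch below is an assumed construction, not the recipe for the figure above: the function name, the base lightness and the exponential shape of the ramps are illustrative choices. Both sides sit at the same base lightness, and the only difference is a pair of opposite ramps that meet at the edge.

```python
import math

def cornsweet_profile(n=200, base=0.5, amplitude=0.15, decay=12.0):
    """Lightness along a line crossing the edge at the midpoint of the profile."""
    profile = []
    for i in range(n):
        x = i - (n - 1) / 2                          # signed distance from the edge
        ramp = amplitude * math.exp(-abs(x) / decay)  # fades to nothing away from the edge
        profile.append(base - ramp if x < 0 else base + ramp)
    return profile

profile = cornsweet_profile()
print(f"far left {profile[0]:.3f}, far right {profile[-1]:.3f}")
print(f"step at the edge {profile[100] - profile[99]:.3f}")
```

Away from the edge the two sides are practically identical in lightness, yet the sharp step where the ramps meet is what the eye uses to assign a different value to each whole area.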

Borders improve yellow/blue discrimination, degrade white/black discrimination, and have no effect on red/green discrimination.

Pattern or Border Effects. Many complementary color effects were first systematically described in On the Law of Simultaneous Contrast of Colors (1839) by Michel-Eugène Chevreul, for many years the chief scientist at the Gobelins (Paris) textile and weaving factory. The next illustration goes back to the earliest modern color studies, which examined color effects induced by changing the colored threads within the same textile pattern.

 

spreading effects in complex patterns

 
These color shifts are called the spreading effect, which is produced by changes in a single color within a larger pattern of interlocking colors. In the top pattern, the background reddish brown and the blue scrolling pattern seem to change hue and value when a black border or a white border is added between them. In the bottom pattern, the blue seems lighter when combined with white than with dark brown, and darkest when the brown is the background rather than the tracery.

These illusions seem to contradict the "center of gravity" contrast principle, but in fact they define its limits. Seen in the metaphor of a computer, the mind must perform a series of complex analysis tasks in order to understand a visual image. One of these is the identification of edges, which we've seen leads to edge contrast effects such as Mach bands.


Roger Hanlon, a marine biologist at Woods Hole, Massachusetts, who studies the camouflage skills of the cuttlefish and octopus, has found that natural camouflage systems correspond to three disguise templates: uniform color, random patterns within a single color variation, and disruptive patterning that disguises the body outlines. These correspond to color, texture and outline as the three most important perceptual mechanisms for object perception. The first two mechanisms degrade the color and texture cues for outline, and the last degrades the cues for object recognition from perceptible outline.


Equally important is the identification of surfaces, because these define solid planes and three dimensional forms. Surface recognition occurs in part through textures, patterns or gradients appearing on the surfaces. This is how the "spreading effect" seems to arise, because it is strongest when the eye interprets a pattern as lying on a single surface. The surface is made more consistent or unified by bringing its various colors toward the average color.
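The idea of "bringing its various colors toward the average color" can be written as a simple blend. This is a toy sketch only: the `assimilate` function, the `strength` value and the RGB triples are assumptions chosen for illustration, and real spreading effects depend on pattern scale and viewing distance in ways this ignores.

```python
def assimilate(colors, strength=0.25):
    """Shift each RGB color part of the way toward the pattern's mean color.

    `colors` is a list of (r, g, b) tuples with channels in 0..1.
    `strength` of 0 leaves colors unchanged; 1 collapses them to the mean.
    """
    count = len(colors)
    mean = tuple(sum(color[k] for color in colors) / count for k in range(3))
    return [
        tuple(color[k] + strength * (mean[k] - color[k]) for k in range(3))
        for color in colors
    ]


# Hypothetical thread colors, loosely following the textile example.
blue = (0.2, 0.3, 0.8)
white = (1.0, 1.0, 1.0)
brown = (0.35, 0.2, 0.1)

blue_with_white = assimilate([blue, white])[0]
blue_with_brown = assimilate([blue, brown])[0]
```

In this sketch `blue_with_white` comes out lighter than `blue_with_brown`, in the same direction as the shift described for the bottom textile pattern.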

 
texture & surface analysis
 

mach bands
a gradation from light to dark appears bordered by light and dark bands, which become more prominent as the gradation becomes steeper

The next step in visual analysis is believed to involve resolution of the edge and region analysis into depth and volume. An important bridge in this transformation is resolution of surface texture.

Surface texture is important for three reasons: it provides important visual clues to the slope and recession in space of surfaces; it provides (via visual fusion) an estimate of the distance in space from the viewer; and it helps to resolve motion when gross features and edges present ambiguous information.

Studies of the movement of plaid patterns show that textures are separated by visual frequency, and only merge if the frequency is roughly the same in all dimensions.

Where edges are lacking as region boundaries, textures can provide information to separate surfaces.

A simple but intriguing question is, how can we "recognize" or distinguish one kind of texture from another? For example, if you were shown a piece of paper printed all over with a single visual texture that was interrupted by a second, different texture, what visual characteristics would make that second texture easy or difficult to see? Some examples suggest the answers.

 

artificial texture contrasts
top row: difference in values, difference in spatial frequency, difference in orientation; bottom row: similar elements with random rotation, mirror reversal with random rotation, mirror reversal

 
These are some of the many kinds of artificial textures used in perceptual experiments. Most people find the contrasting textures in the top row immediately visible, while those in the bottom row require some scrutiny to identify. What seems to characterize textures that are easy or difficult to identify in this way?

As a rule, the "easy" texture contrasts rely on elements that can be discriminated with "low level" visual capabilities: contrasts in lightness, saturation, hue, spatial frequency, size or length, orientation, thickness, density, grouping and movement. The "difficult" textures are composed of separate elements that have to be discriminated in terms of how they are shaped or combined, and the most difficult are those that have to be discriminated in terms of orientation or mirror reversal.
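A small numerical illustration may help show why mirror reversal is hard to detect with "low level" measures. The sketch below assumes mean lightness as a stand-in for such a measure; the `L_SHAPE` element and the helper names are hypothetical and are not the stimuli used in the experiments described above.

```python
# A tiny binary texture element: 1 is light, 0 is dark.
L_SHAPE = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]


def mirror(element):
    """Flip an element left to right."""
    return [row[::-1] for row in element]


def tile(element, reps, gain=1.0):
    """Repeat an element reps x reps times, scaling its lightness by gain."""
    rows = []
    for _ in range(reps):
        for row in element:
            rows.append([value * gain for value in row] * reps)
    return rows


def mean_lightness(texture):
    """Average pixel value: a simple first-order texture statistic."""
    flat = [value for row in texture for value in row]
    return sum(flat) / len(flat)


base = tile(L_SHAPE, reps=4)
dimmer = tile(L_SHAPE, reps=4, gain=0.5)      # difference in value
mirrored = tile(mirror(L_SHAPE), reps=4)      # mirror reversal only
```

The `dimmer` texture differs from `base` in mean lightness, so a simple statistic separates them at once. The `mirrored` texture has exactly the same mean lightness as `base` even though its pixels are arranged differently, so telling the two apart requires attention to the shape of the elements.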
 

But these units become visible if they are related to each other in larger patterns. Quite often a simple linear or spectral frequency pattern is sufficient to do this. The example at right makes it easier to see the rotated elements in terms of the contrasting pattern they form with the parallel lines.

Lights are rather poor depth cues, and they cannot be used to distinguish distance unless luminance corresponds very closely to a perspective gradient. Thus the droplets in a fountain illuminated by the sun, or fireflies at night, appear as a scattered mass of vague depth; stars in the sky, despite variations in brightness, seem to shine from the interior surface of a dome. City lights receding over flat land, such as Los Angeles viewed from the Hollywood Hills, convey an atmospheric sense of perspective; randomly spaced lights, or lights across a hilly geography, seem vaguely at the same distance.
 

The Boundary of Texture and Form. This brings us to perhaps the most important attribute of texture: its boundary with form. In visual perception any kind of boundary or edge is a form of focus or attention created by what we see clearly and what we don't, and textures help us actually see the boundary between seeing and recognizing or remembering.

 

surfaces dominate lights and shadows
(left) image of corrugated metal projected onto a flat surface; (right) image of corrugated metal projected onto corrugated metal

 
The basic feature of the natural world that is at issue here is that most natural forms have surfaces which are themselves natural forms (such as planes, bumps, ridges, crevices, curves, grooves) and usually this kind of surface texture is composed of miniature versions of itself. Planes are faceted, bumps are rough, ridges are creased, crevices are cracked, curves are variable, grooves are furrowed.

The example at right shows how these natural surfaces can be studied with a drawing or image processing program such as Adobe Photoshop. The starting point is an enclosed figure suggestive of the shape of a rock lying on the ground. This form is reduced by 66%, tripled, and arranged in a triangular pattern below; this pattern is reduced by 33% and tripled again, and so on for two more levels. Then all the layers are arranged into a single image that allows occlusion of overlapping forms. Many other natural forms are possible, depending on the shape of the basic unit, the amount of reduction between layers, and whether occlusion or perspectival changes in viewing angle are inserted.
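The layering procedure can also be sketched as a short program that only computes where each form goes and how large it is. This is a loose approximation rather than a reconstruction of the example: it assumes a single `scale` factor for every level, three children placed at the corners of a triangle around each parent, and larger forms drawn last so that they occlude smaller ones. The names `layered_forms`, `paint_order` and `spread` are hypothetical.

```python
import math


def layered_forms(levels=4, scale=0.34, spread=1.2):
    """Return (x, y, size, level) for every form in the layered pattern.

    Level 0 is the single starting form. Each later level triples every
    form of the previous level, reduces it by `scale`, and places the
    copies at the corners of a triangle around the parent.
    """
    forms = [(0.0, 0.0, 1.0, 0)]
    frontier = [(0.0, 0.0, 1.0)]
    for level in range(1, levels):
        next_frontier = []
        for px, py, parent_size in frontier:
            child_size = parent_size * scale
            for k in range(3):
                angle = math.pi / 2 + k * 2 * math.pi / 3
                cx = px + spread * parent_size * math.cos(angle)
                cy = py + spread * parent_size * math.sin(angle)
                forms.append((cx, cy, child_size, level))
                next_frontier.append((cx, cy, child_size))
        frontier = next_frontier
    return forms


def paint_order(forms):
    """Sort smallest first so larger forms are drawn on top (an assumption)."""
    return sorted(forms, key=lambda form: form[2])


forms = layered_forms()
counts = [sum(1 for form in forms if form[3] == level) for level in range(4)]
```

With four levels this yields 1, 3, 9 and 27 forms. Changing `scale`, `spread` or the number of children per parent corresponds to the variations mentioned above: the amount of reduction between layers and the arrangement of the basic unit.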

The result is not simply a random texture of large and small units but an image highly suggestive of a rocky covering on flat earth, in which the units of different sizes play different roles. And this allows us to see more clearly the fundamental boundary between textures and forms.

Simulated patterns show two distinct boundaries between object and texture: in the example at right it is between (1) the forms that we can count at a glance or a few seconds of viewing (the largest two sizes of rocks in the example), and (2) the largest forms that define a texture (the smallest two sizes of rocks in the example). We can see at a glance that there are about a half dozen large rocks in the image, and below that a pebbly texture of various sized units; we get an impression of the largest forms in this texture and the amount of size reduction or texture gradient into the smaller sizes.

This boundary would show itself in a simple recognition test over a time delay of several hours, in which we are shown two images constructed in different ways from the same image elements. I believe we would find the same discrimination dimensions in these more complex images that apply to the textures described above. That is, if the smallest two texture elements were changed in lightness, saturation, hue, spatial frequency, size or length, orientation, thickness, density, grouping or movement, the image would be recognized as different. At the same time, the rectangle formed of the back rocks would perhaps not be noticed if it were rotated 90° in perspective, but omitting a rock would be.  

Stabilized Images. A final condition in the definition of edges and regions is the situation where edges and regions completely disappear.

The Ganzfeld effect (German for "whole field") arises when the eye is presented with a continuous, monochromatic visual field. After a minute or two, the perception of edges and colors disappears completely. (In snowstorms or heavy fog the condition is called a whiteout.)

To simulate a Ganzfeld,

This is a final indication of the importance of contrast and change in color vision.

The Craik-O'Brien effect reveals how the mind interprets the contrast at the edge to mean the two circular areas are different, then "fills in" each of the areas as suitably contrasting values. This "interpret the edge, then color the area" perceptual strategy is fundamental to the way we perceive color and light around three dimensional forms.

N E X T :   the structure of vision (ii)

 

Last revised 03.28.2004 • © 2004 Bruce MacEvoy

a texture example created by layering repeated forms of different sizes