Some thoughts on visual vocabulary, grammar, and rhetoric

by Geoff Hart

Previously published as: Hart, G. 2007. Some thoughts on visual vocabulary, grammar, and rhetoric. Intercom May 2007:36–38.

Most STC members are writers, so when communication problems arise, our first thought is to solve them with words. But visual literacy is important too, because sometimes graphics, or a combination of graphics and words, will be more effective than words alone. Before you can decide whether words, images, or both would be appropriate, you must understand their advantages and disadvantages. In this article, I'll describe some of what you need to know to make an informed decision. Armed with that knowledge, you'll find it easier to choose between the two media, or to combine them when neither is sufficient on its own. For simplicity, I'll focus on static images; moving images may become the subject of a future column.

Any form of communication relies on three tools: a set of well-defined symbols that precisely convey distinct concepts (a vocabulary), rules for how to (and how not to) combine these symbols to construct meaning (a grammar), and strategies for choosing and presenting symbols to communicate the message (a rhetoric). In this sense, words are no different from images: you can't communicate effectively with either if you don't understand how to use all three tools. But it's their differences that are most interesting.

Visual vocabulary

To communicate, we must first possess a vocabulary large enough to cover all the concepts we need to convey, and we must share our audience's understanding of that vocabulary. Vocabulary is a familiar concept when applied to words, since we communicate primarily using words and since we have standard references (dictionaries and usage guides) that provide a shared basis for using our language's vocabulary. In addition, we have style guides that further specify which words are acceptable in a given context (e.g., computer documentation) and which are not.

Unfortunately, few of us have an equally sophisticated visual vocabulary, and there are no broadly acknowledged standards for visual images. Yes, there are visual equivalents of dictionaries, such as Transport Canada's standard warning symbols required during the transportation of hazardous materials (http://www.tc.gc.ca/tdg/who.htm), but there are a great many such visual dictionaries, and few of them agree. This means that in practice, we must learn a new visual dialect for each new situation; for example, the icons used by Microsoft Word differ from those used by WordPerfect and FrameMaker. Yet despite this obstacle, visual vocabularies function much like textual vocabularies. Even though we have potentially millions of words or images at our disposal, we use only a small subset of them for most technical communication, and we strive to use them consistently to minimize ambiguity.

Categories also help us to manage our textual and visual vocabularies. Just as words can be divided into function-based categories called "parts of speech," such as nouns and adjectives, we can categorize visual images based on their function. For example, the visual equivalent of a noun would be the image of an object, whereas the visual equivalent of an adjective would be that object's size, color, or position. Once we define these categories, we can develop rules for how to use them.

Visual grammar

The grammar of any language defines the relationships between the various parts of speech in that language. For example, English generally places adjectives before the nouns they modify (e.g., a "red icon," not an "icon red"), and uses punctuation to group words and show how the groups relate to each other. Images also have an underlying visual grammar based on their shape, size, color, pattern, position, texture, and other characteristics. Some of this grammar is as obvious to us as the relationships between nouns and adjectives; for example, we require no formal instruction to understand the concept of "larger than." Other aspects of this grammar are arbitrary and must be learned. For example, we must learn that green means go and red means stop, rather than vice versa. Similarly, once we learn that English text flows from left to right across the page, we must unlearn that habit to read right-to-left languages such as Hebrew.

As these examples illustrate, the problem with visual grammar is not that it doesn't exist, but rather that there are many different visual languages, and few of us have studied the grammar of each one. Spend an hour in an art museum with a guide trained in the different ways that visual images are constructed, and you'll find the experience both literally and metaphorically eye-opening. Even reading the comics in the daily newspaper requires some training in the visual conventions of this genre. For example, we must learn that a series of horizontal lines trailing behind an image indicates motion (i.e., the lines function as a verb), and that more or longer lines mean faster motion (i.e., they function as an adverb). Figure 1 shows some grammatical functions of images.

Figure 1. An example of nouns, verbs, adjectives, and adverbs: a large black square moving more rapidly to the right than a small white square.

Because we cannot assume that our audiences have learned a given form of visual grammar, we face the challenge of identifying and using visual conventions so ubiquitous that anyone should be able to understand them. Fortunately, many conventions build on perceptual abilities that all viewers share. For example, viewers from any culture can distinguish differences in the perceived size of objects, and because size differences clearly represent the concept of "larger than," viewers can learn to interpret bar graphs (which depend on differences in bar size to communicate) with little formal training. Because lines lead the viewer's eye along a path and define borders between objects, most viewers quickly learn that arrows indicate a direction of motion or point at something important, and that circles and squares can enclose an area to group things or focus attention.

However, colors lack any inherent order. Scientists use the spectrum of visible light (from red at one end to violet at the other) to define an order, whereas graphic artists might instead relate colors based on their complements. Shades of a single color can communicate order more clearly, since there is a direct relationship between the intensity of a shade and the concept of an "amount": if white represents the complete absence of black, then solid black represents the maximum amount of that color, and intermediate values (shades of grey) represent intermediate amounts of black. With a little thought, you can learn to see the shallow areas around the edges of Figure 2 and the deep hole towards the right side.

Figure 2. An example of a direct relationship between color quantity and actual quantity.
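
The shading convention behind Figure 2 amounts to a simple linear mapping from a quantity to a grey level. The following Python sketch is illustrative only; it is not how Figure 2 was produced. It assumes 8-bit greyscale values (0 = solid black, 255 = white), with the shallowest depth drawn as white and the maximum depth as solid black, and the function name depth_to_grey is hypothetical.

```python
def depth_to_grey(depth: float, max_depth: float) -> int:
    """Map a depth to an 8-bit grey level (hypothetical helper).

    Assumption: 0 is solid black and 255 is white, so the shallowest
    depth (0) is drawn white and max_depth is drawn solid black.
    """
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")
    # Fraction of the maximum depth, clamped to [0, 1] so that
    # out-of-range depths still receive a valid shade.
    fraction = min(max(depth / max_depth, 0.0), 1.0)
    # Deeper water gets "more black": 255 (white) falls linearly to 0 (black).
    return round(255 * (1.0 - fraction))


depths = [0, 2.5, 5, 7.5, 10]
print([depth_to_grey(d, 10) for d in depths])  # [255, 191, 128, 64, 0]
```

Because each equal step in depth produces an equal step in darkness, readers who know that darker means deeper can compare depths at a glance, which is the property that an arbitrary set of hues lacks.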

Visual rhetoric

Classical rhetoric focused (broadly speaking) on the art of persuasion, whereas the modern definition of rhetoric has expanded to encompass writing intended to accomplish any given goal, including persuasion. Writers have many rhetorical tools available, including voice (e.g., active vs. passive), metaphor (comparison by means of analogy), example, and appeals to logic or emotion. Our goals in using these tools may be instruction, persuasion, or entertainment, among others. Similar, though less familiar, rhetorical tools and goals exist for visual images. For example, images may be:

Textual versus visual communication

Words are the best choice to communicate abstract concepts that nobody can see, but because these concepts are not part of anyone's direct experience, they must be learned. Conversely, images outperform words for exact depictions of visual reality; a color swatch communicates far more clearly than the words "cerise" or "cerulean." Where a standardized vocabulary exists, words communicate consistently because everyone agrees on the denotation (the dictionary meaning). However, a word's connotation can differ greatly among audiences; the word "love" may be interpreted as a sign of humanity or of weakness. Similarly, as my January 2007 article ("Combining words plus pictures") showed, even relatively abstract images can communicate precisely if we eliminate enough detail that all viewers notice the few details that remain. Yet such images still retain considerable subjectivity: should we focus on a vase's elegant shape and color, or on its meaning (a container for liquids and flowers)?

These examples illustrate the kinds of ambiguity that help us decide when to use words, images, or both. Where words convey our meaning unequivocally, we can use words alone; where they cannot, we may need to rely on graphics. Where neither is sufficient, we combine them: a color image of a graceful chartreuse vase can be made specific by adding the words "Note: The diameter must be at least half the height to ensure stability" to focus readers on the important factor (relative dimensions) rather than on other possibilities (the graceful shape or unusual color).

©2004–2014 Geoffrey Hart. All rights reserved