I just finished Edward Tufte’s The Visual Display of Quantitative Information (2nd ed.), a modern classic (or so I hear) on how to design data graphics, which “visually display measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading, and color”.
This is a great book that I’m sorry I didn’t read sooner. Some key points I took away:
- Tables are better suited to small data sets of 20 numbers or fewer, while graphics are better suited to summarizing large amounts of information.
- Color often muddles rather than clarifies a data graphic, because the eye does not naturally put colors in any order. Gray-scale shading, by contrast, carries a natural visual hierarchy (lighter to darker), so it represents varying quantities better than color does.
- If color is used, avoid red/green contrasts out of consideration for color-blind viewers (5–10% of the population). I am VERY guilty of this: green/yellow/red scales are often my default. Contrasts with blue are a safer bet, since color-blind people can generally distinguish blue from all other colors.
- Regarding typography: the more the letters are differentiated from one another, the easier the reading. This means that serif fonts are preferable to sans-serif fonts, and that all-caps text should be avoided (capital letters are more uniform in height, width, and volume, which makes them harder to read).
- Do not vary the design of the graphic (i.e., the scale, symbology, etc.), because this distorts how the viewer perceives variation in the data. Variation in the data is, after all, what the graphic is there to illustrate, and to illustrate truthfully.
- The number of dimensions in the graphic should not exceed the number of dimensions in the data. For example, don’t use area (2-D, e.g., differently sized circles) to represent a 1-D measure such as the value of a dollar over time.
- Less is more; in other words, maximize the data-ink ratio. This is the ratio of the ink used to represent the data (the essential part of the graphic) to the total ink used to print the graphic (including grid, frame, axis and scale-bar ticks, etc.).
- Maximize the “data density” of the graphic, i.e., the number of entries in the data matrix divided by the area of the graphic.
- “Small multiples”, or a series/block of small graphics indexed by changes in a particular variable, are an effective graphic format because they are inherently comparative and tend to have high data densities.
- Pie charts should never be used, because of their low data density and because the human eye is not adept at judging differences in angles.
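The two ratios in the list above are simple enough to write down directly. Here is a minimal sketch in Python, assuming "ink" can be measured in any consistent unit (for a digital chart, say, a count of non-background pixels) and area in square inches; the function names and the example numbers are hypothetical, just for illustration, not taken from the book.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Data-ink ratio: share of the graphic's total ink that encodes data.

    Units are an assumption: any consistent measure of "ink" works,
    e.g. a count of non-background pixels in a rendered chart.
    """
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must be between 0 and total_ink")
    return data_ink / total_ink


def data_density(num_entries: int, area: float) -> float:
    """Data density: entries in the data matrix per unit area of the graphic."""
    if area <= 0:
        raise ValueError("area must be positive")
    return num_entries / area


# Hypothetical chart: gridlines, frame and ticks use 40 of 100 units of ink.
ratio = data_ink_ratio(data_ink=60.0, total_ink=100.0)
print(ratio)

# Hypothetical chart: 365 daily (date, value) pairs drawn in a 5 in x 4 in area.
density = data_density(num_entries=365 * 2, area=5.0 * 4.0)
print(density)
```

Erasing the gridlines and frame in the first example would push the ratio toward 1.0, which is exactly the "less is more" advice; shrinking the plot area in the second example raises the density without losing any data.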
Maps in the History of Information Visualization:
One of the most interesting parts of the book outlines the history of data graphics. If you have any doubts that geography is awesome and always has been, consider this: before charts, before graphs and plots, there were MAPS!
Geographic maps were the first form of data graphics, at least as far as historians can tell. While the first maps, found on clay tablets, date to before 3500 BC, thousands of years passed before precise cartographic maps with full grids were created (in the 1100s AD in China, and around 1550 AD in Western civilization), and it wasn’t until 1686 that cartography and statistics merged to create the first thematic map. (Tufte calls this a “data map”, but that term has come to mean something else in the IT world, so I avoid it for clarity’s sake.) This early thematic map, courtesy of Edmond Halley, shows the locations of trade winds and monsoons.
Geographic analysis really blossomed after John Snow’s famous, even mythologized, 1854 map of cholera deaths and water pumps in London; some say it kick-started the fields of health geography and spatial epidemiology. Charles Joseph Minard’s multivariate map of Napoleon’s 1812 Russian campaign (published 1869, shown below) is another famous merger of cartography and data visualization, one that Tufte says “may well be the best statistical graphic ever drawn.”