Data Visualization: attribute types and their graphical elements

June 9, 2021

There are two main types of data: categorical and ordered. Of all the graphical elements that can be used to present data visually, some are naturally better suited to categorical data and others to ordered data.

By identifying your data types and using the right graphical elements to display them, you can communicate information clearly.

Data types are not mutually exclusive, nor do they stop you from treating a numeric value as a nominal quantity, but they let you categorize the data and better understand which visual representations of the data set will be useful.

Categorical Data

Data is categorical when you can define discrete categories or "buckets" to group items in the data set. Examples are:

Categorical data is also called nominal or qualitative data.

Categories can only tell whether two things are the same or different (e.g., apples versus oranges), so the only operation you can perform between two categorical values is an equality test: is A == B?
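Here's a minimal Python sketch of that idea, assuming categories are modeled with the standard library's `Enum` (the `Fruit` class and its members are hypothetical). Members can be compared for equality and counted per bucket, but `<` raises a `TypeError` because plain enums define no order.

```python
from collections import Counter
from enum import Enum


class Fruit(Enum):
    """Hypothetical categorical attribute: each member is a bucket."""
    APPLE = "apple"
    ORANGE = "orange"
    PEAR = "pear"


def same_category(a: Fruit, b: Fruit) -> bool:
    # Equality is the only meaningful comparison between categories.
    return a == b


def has_order(a: Fruit, b: Fruit) -> bool:
    # Plain Enum members do not define <, so Python raises TypeError.
    try:
        a < b
    except TypeError:
        return False
    return True


basket = [Fruit.APPLE, Fruit.ORANGE, Fruit.APPLE, Fruit.PEAR]
counts = Counter(basket)  # group items into their buckets

print(same_category(Fruit.APPLE, Fruit.ORANGE))  # False
print(has_order(Fruit.APPLE, Fruit.ORANGE))      # False
print(counts[Fruit.APPLE])                       # 2
```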

Categorical data often has a hierarchical structure, for example:

Ordered Data

Ordered data has a natural ranking that can be used to compare items in the data set: it lets you ask, is A < B?

The naming might be a bit confusing, but ordered data can be further divided into ordinal and quantitative data.

Ordinal data has an intrinsic order without necessarily being numeric. Good examples are clothing sizes (Small < Medium < Large), mood, socio-economic status, etc.
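A small sketch of ordinal data in Python, assuming the order is simply the position of each value in a list (`SIZES`, `size_rank` and `is_smaller` are hypothetical names). Sorting and "is A < B?" work through the ranks, but the ranks themselves are only labels for the order.

```python
# Hypothetical ordinal scale: the list position defines the order.
SIZES = ["Small", "Medium", "Large"]


def size_rank(size: str) -> int:
    """Return the position of a size in the ordinal scale."""
    return SIZES.index(size)


def is_smaller(a: str, b: str) -> bool:
    # Ordinal data supports "is A < B?" through the ranks.
    # Differences between ranks (e.g. Large - Small == 2) carry no meaning.
    return size_rank(a) < size_rank(b)


orders = ["Large", "Small", "Medium", "Small"]
print(sorted(orders, key=size_rank))  # ['Small', 'Small', 'Medium', 'Large']
print(is_smaller("Medium", "Large"))  # True
```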

Quantitative attributes, on the other hand, are numeric, so in addition to having an order they also support arithmetic operations such as difference and ratio.
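The three attribute types and the operations each one supports can be summarized in a few lines of Python. This is a sketch of the definitions in this article, not an API from any library; `AttributeType`, `SUPPORTED_OPERATIONS` and `supports` are hypothetical names, and the numbers are made up.

```python
from enum import Enum


class AttributeType(Enum):
    CATEGORICAL = "categorical"
    ORDINAL = "ordinal"
    QUANTITATIVE = "quantitative"


# Each type supports everything the previous one does, plus more.
SUPPORTED_OPERATIONS = {
    AttributeType.CATEGORICAL: {"equality"},
    AttributeType.ORDINAL: {"equality", "order"},
    AttributeType.QUANTITATIVE: {"equality", "order", "difference", "ratio"},
}


def supports(attr_type: AttributeType, operation: str) -> bool:
    return operation in SUPPORTED_OPERATIONS[attr_type]


# Quantitative values (hypothetical counts) support difference and ratio.
a, b = 120, 60
print(a - b)  # 60
print(a / b)  # 2.0
print(supports(AttributeType.ORDINAL, "difference"))  # False
```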

The graphical elements

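A lot of amazing work has been done on figuring out the best ways to present data with graphics. There are two main components working together to convey information: marks, the basic graphical elements that represent items in the data set, and channels, the visual properties (position, size, color, shape, and so on) that control how marks look.

Examples of marks: dots, lines and areas.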

There's an inherent relationship between data types and channels: some channels are naturally better at conveying categorical data (e.g., hue, shape) and others are better at conveying ordered data (e.g., size, saturation).

While some channels are naturally better than others at displaying certain data types, there isn't a hard rule for any pair: some channels work reasonably well for both categorical and ordered data, and sometimes the line is fuzzy and you must choose what is best for the visualization at hand.

Let's take a look at some well-defined marks and channels, along with some examples.

Position & Spatial Region

A bar chart is a great example of position used to compare ordered data and spatial region used to distinguish between categories:

A bar chart comparing the number of lines spoken by different characters in the U.S. television series "The Office".

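To make the two channels concrete, here's a minimal text-mode bar chart in Python. It's a sketch assuming non-negative values; `text_bar_chart` is a hypothetical helper and the data is made up, not taken from the chart above. Each category gets its own row (spatial region), and where the bar ends is proportional to the value (position).

```python
def text_bar_chart(data: dict[str, float], width: int = 20, char: str = "#") -> list[str]:
    """Render one row per category; bar length encodes the value.

    Assumes non-negative values; the largest value fills `width` characters.
    """
    if not data:
        return []
    max_value = max(data.values())
    label_width = max(len(label) for label in data)
    rows = []
    for label, value in data.items():
        # Position channel: where the bar ends is proportional to the value.
        length = round(value / max_value * width) if max_value > 0 else 0
        # Spatial region: each category gets its own row.
        rows.append(f"{label.ljust(label_width)} | {char * length}")
    return rows


# Hypothetical data, not real counts from any show.
for row in text_bar_chart({"A": 40, "B": 100, "C": 75}, width=20):
    print(row)
```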
 Example: Cumulative confirmed COVID-19 cases per million vs. GDP per capita

A scatterplot uses the position channel to compare two ordered attributes: each point is placed according to its values on both axes, so the overall pattern of points reveals how the two attributes are correlated.

This chart also makes use of hue to group countries by category.

Source: Our World In Data
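Under the hood, the position channel in a chart like this is usually a linear mapping from data values to screen coordinates. Here's a minimal Python sketch of such a scale; `linear_scale`, the 400x300 plot size and the data points are all hypothetical. The y range is inverted because screen coordinates grow downward.

```python
from typing import Callable


def linear_scale(domain: tuple[float, float],
                 range_: tuple[float, float]) -> Callable[[float], float]:
    """Map a data value from `domain` linearly onto pixel coordinates in `range_`."""
    d0, d1 = domain
    r0, r1 = range_
    if d0 == d1:
        raise ValueError("domain must span a non-zero interval")

    def scale(value: float) -> float:
        t = (value - d0) / (d1 - d0)  # 0 at d0, 1 at d1
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical 400x300 px plot area.
x = linear_scale((0, 100), (0, 400))
y = linear_scale((0, 50), (300, 0))  # inverted: larger values go up

points = [(10, 5), (50, 25), (90, 45)]  # hypothetical (x, y) data pairs
pixels = [(x(a), y(b)) for a, b in points]
print(pixels)
```

Charts with values spanning several orders of magnitude often swap this for a logarithmic mapping; the linear version is just the simplest case.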

 Example: Where Are America’s Winters Warming the Most? In Cold Places.

This chart makes great use of position and spatial region to show differences in temperature across a time period.

Source: New York Times

Size

Size is a good fit for quantitative data: it lets the viewer easily compare different magnitudes side by side. As the examples below show, size is usually combined with hue to convey both category and magnitude together.

 Example: Share of men vs. share of women who drank alcohol in 2010

This scatterplot uses size as an additional channel to convey extra information: at a glance we can see the relationship between the values on both axes and also compare how many people drank alcohol across countries.

This chart also makes use of hue to group countries by category.

Source: Our World In Data

 Example: The Words Men and Women Use When They Write About Love

This chart is very similar to a word cloud: it conveys how frequently each word is used through the area (2D size) of its circle.

Hue is also used to separate the circles into two categories.

Source: New York Times
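When size is encoded as the area of a circle, as in the chart above, the radius should grow with the square root of the value; scaling the radius linearly would make the area grow with the square of the value and exaggerate large values. A minimal Python sketch, where `radius_for_area` and its parameters are hypothetical:

```python
import math


def radius_for_area(value: float, max_value: float, max_radius: float) -> float:
    """Radius of a circle whose AREA is proportional to `value`.

    The largest value gets `max_radius`; everything else scales by sqrt.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("values must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


r_small = radius_for_area(25, 100, 40)   # 20.0
r_large = radius_for_area(100, 100, 40)  # 40.0
# A value 4x larger gets 4x the area (and only 2x the radius).
print(math.pi * r_large**2 / (math.pi * r_small**2))
```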

Hue

Hue is great for displaying categorical data: two shapes with different colors side by side immediately tell the viewer that they belong to different categories.
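In code, a hue encoding is usually just a lookup table from category to color. Here's a minimal Python sketch; the palette and `hue_mapping` are illustrative assumptions. It refuses to reuse colors, since two categories sharing a hue would read as one.

```python
# Illustrative categorical palette (an assumption): hues that are easy to tell apart.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def hue_mapping(categories: list[str], palette: list[str] = PALETTE) -> dict[str, str]:
    """Assign each distinct category its own color, in first-seen order."""
    distinct = list(dict.fromkeys(categories))  # dedupe while keeping order
    if len(distinct) > len(palette):
        # Reusing a color would make two categories look like one.
        raise ValueError(f"{len(distinct)} categories but only {len(palette)} colors")
    return dict(zip(distinct, palette))


colors = hue_mapping(["Asia", "Europe", "Asia", "Africa"])
print(colors)  # {'Asia': '#1f77b4', 'Europe': '#ff7f0e', 'Africa': '#2ca02c'}
```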

 Example: Outdoor air pollution deaths by age, World, 1990 to 2016

In this chart, different hues distinguish the age-range categories.

The thickness of each band, combined with the stacking of the different colors, makes it easy to compare categories and identify trends over time.

Source: Our World In Data
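A stacked area chart like this one needs, for every time point, a bottom and a top edge for each colored layer. Here's a minimal Python sketch of that computation, assuming every series has the same length; `stack_layers` and the numbers are hypothetical, not the data behind the chart.

```python
from itertools import accumulate


def stack_layers(series: dict[str, list[float]]) -> dict[str, list[tuple[float, float]]]:
    """Compute (bottom, top) bounds for each layer of a stacked area chart.

    Each layer starts where the previous one ended, so the band thickness
    equals the layer's own value and the top edge shows the running total.
    """
    names = list(series)
    n_points = len(next(iter(series.values()), []))
    bounds = {name: [] for name in names}
    for i in range(n_points):
        tops = list(accumulate(series[name][i] for name in names))
        bottoms = [0.0] + tops[:-1]
        for name, bottom, top in zip(names, bottoms, tops):
            bounds[name].append((bottom, top))
    return bounds


# Hypothetical values for three categories at two time points.
layers = stack_layers({"0-14": [1, 2], "15-49": [3, 3], "50+": [4, 5]})
print(layers["50+"])  # [(4, 8), (5, 10)]
```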

 Example: Does livestock antibiotic use exceed suggested target?

Hue is used in this chart along with geographic information to convey three different categories. Using distinct hues makes it easy to pinpoint areas with high consumption of antibiotics.

Source: Our World In Data

Luminance and Saturation

Luminance and saturation are a good fit for quantitative data: when comparing items side by side, an item with more saturation conveys "more" of the value it represents.

Each row shows variations in luminance (top) and saturation (bottom) for the same hue.

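Here's a minimal Python sketch of both ramps using the standard library's `colorsys` module: the hue stays fixed while either lightness or saturation follows the value. The function name, the default hue and the lightness range are assumptions; the luminance variant also assumes a light background, where darker reads as "more".

```python
import colorsys


def sequential_color(value: float, vmin: float, vmax: float,
                     channel: str = "luminance", hue: float = 0.33) -> str:
    """Map a quantitative value to a hex color for a single, fixed hue."""
    if vmax == vmin:
        raise ValueError("vmin and vmax must differ")
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)  # clamp to [0, 1]
    if channel == "luminance":
        # Lightness goes from 0.9 (pale) down to 0.3 (dark) as the value grows.
        lightness, saturation = 0.9 - 0.6 * t, 0.6
    elif channel == "saturation":
        # Lightness stays put; the color goes from gray to vivid.
        lightness, saturation = 0.5, t
    else:
        raise ValueError(f"unknown channel: {channel!r}")
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in (r, g, b)))


print([sequential_color(v, 0, 4) for v in range(5)])
print([sequential_color(v, 0, 4, channel="saturation") for v in range(5)])
```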
 Example: GitHub contributions heatmap

In this graphic we can see two categorical variables (day of the week vs month) and the use of luminance to convey an ordered attribute (number of contributions). The rectangles get brighter when the number is higher.

Source: @mbostock GitHub profile
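A heatmap like this one typically bins raw counts into a handful of levels, each with its own shade, and lays the days out on a week-by-weekday grid. Here's a minimal Python sketch; the thresholds are hypothetical (GitHub's actual bucket boundaries are not assumed), and `calendar_cells` assumes the first count falls on the first weekday row.

```python
from bisect import bisect_right

# Hypothetical thresholds: counts at which the next level starts.
THRESHOLDS = [1, 4, 8, 12]


def contribution_level(count: int, thresholds: list[int] = THRESHOLDS) -> int:
    """Quantize a daily count into a level from 0 to len(thresholds)."""
    return bisect_right(thresholds, count)


def calendar_cells(daily_counts: list[int]) -> list[tuple[int, int, int]]:
    """Lay out consecutive days as (week column, weekday row, level)."""
    return [(i // 7, i % 7, contribution_level(c)) for i, c in enumerate(daily_counts)]


print([contribution_level(c) for c in (0, 1, 3, 4, 8, 12, 50)])  # [0, 1, 1, 2, 3, 4, 4]
```

Each level would then get one shade from a luminance ramp, so darker or brighter cells stand for busier days.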

 Example: Disability-adjusted life years (DALYs) from particulate pollution

Luminance and saturation play very well with geographic data, allowing you to see geographic hot-spots easily.

Source: Our World In Data

Motion

When used carefully, motion can be a great channel for conveying information; it is especially good at displaying the passage of time and periodicity.
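Animated charts show the passage of time by drawing a sequence of frames. One common building block is linear interpolation ("tweening") between two data states, sketched below in Python; `lerp` and `tween_frames` are hypothetical names, and real animation libraries usually add easing curves on top of this.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: t=0 gives a, t=1 gives b."""
    return a + (b - a) * t


def tween_frames(start: list[float], end: list[float], n_frames: int) -> list[list[float]]:
    """Intermediate states for animating a list of values from start to end."""
    if n_frames < 2:
        raise ValueError("need at least a first and a last frame")
    frames = []
    for f in range(n_frames):
        t = f / (n_frames - 1)
        frames.append([lerp(a, b, t) for a, b in zip(start, end)])
    return frames


print(tween_frames([0, 10], [10, 30], 3))  # [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
```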

 Example: How to visualize periodicity?

Source: @pierreleripoll - Observable

 Example: Hang On, Northeast. In Some Parts, Spring Has Already Sprung.

Source: New York Times

Angle

Angles can be used to express quantitative data and are often used to show the magnitude of a change.
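In a slope-style chart, the angle of the line joining a "before" and an "after" value encodes the size and direction of the change. A minimal Python sketch, assuming both axes use the same scale (`change_angle` and `run` are hypothetical names):

```python
import math


def change_angle(before: float, after: float, run: float = 1.0) -> float:
    """Angle in degrees of the segment joining two values `run` units apart.

    Positive angles mean an increase, negative a decrease; a steeper
    line means a bigger change.
    """
    return math.degrees(math.atan2(after - before, run))


print(change_angle(10, 11))  # an increase of one unit over one unit: about 45 degrees
print(change_angle(10, 8))   # a decrease: negative angle
```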

 Example: Religion & Attitudes Towards Homosexuality

Source: @gordon.hack - Tableau

 Example: Premier League 2017-18 review: Predictions vs. Reality

Source: @brian7311 - Tableau

Shape

Shape is often used for categorical data, as we naturally tend to see things that have the same shape as belonging to the same group.
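Here's a minimal Python sketch of a shape encoding: every item is drawn with its category's symbol, in its original order, so grouping comes from shape alone rather than from position. The symbols, the names and the member list are hypothetical.

```python
# Illustrative symbol set (an assumption); any clearly distinct shapes work.
SHAPES = ["●", "▲", "■", "◆"]


def shape_legend(categories: list[str], shapes: list[str] = SHAPES) -> dict[str, str]:
    """Assign each distinct category its own shape, in first-seen order."""
    distinct = list(dict.fromkeys(categories))
    if len(distinct) > len(shapes):
        raise ValueError("not enough distinct shapes for the categories")
    return dict(zip(distinct, shapes))


def encode_shapes(items: list[str], legend: dict[str, str]) -> str:
    """Draw every item with its category's shape, keeping the original order."""
    return " ".join(legend[item] for item in items)


members = ["Party A", "Party B", "Party A", "Party C", "Party B"]  # hypothetical
legend = shape_legend(members)
print(encode_shapes(members, legend))  # ● ▲ ● ■ ▲
```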

 Example: Women in the German Bundestag by Party

Source: @terezaif - Observable

 Example: By age group: The growth of the population to 2100

Source: Our World In Data

Choosing the right elements

The problem of choosing the best marks and channels to convey your analysis of a given data set has been explored for a long time: from Jacques Bertin's Semiology of Graphics, written in 1967, through Leland Wilkinson's Grammar of Graphics and some of Edward Tufte's work, to many others, including many papers from the UW Interactive Data Lab, some of which even explore computer-generated visualizations based on these principles.

At the end of the day, it's a balance between matching the right channel to the right data type and picking the most effective channel.
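The pairings discussed in this article can be turned into a small checklist. Here's a Python sketch that flags encodings where a channel isn't a natural fit for the data type; the channel sets come from the sections above, motion is left out because it is paired with time rather than a data type, and `channel_fits` and `review_encoding` are hypothetical names. It returns warnings instead of raising errors, since there is no hard rule for any pair.

```python
# Channel groupings as described in this article's sections.
CATEGORICAL_CHANNELS = {"spatial region", "hue", "shape"}
ORDERED_CHANNELS = {"position", "size", "angle", "luminance", "saturation"}


def channel_fits(data_type: str, channel: str) -> bool:
    """True if the channel is a natural fit for the data type."""
    if data_type == "categorical":
        return channel in CATEGORICAL_CHANNELS
    if data_type in ("ordinal", "quantitative"):
        return channel in ORDERED_CHANNELS
    raise ValueError(f"unknown data type: {data_type!r}")


def review_encoding(encoding: dict[str, tuple[str, str]]) -> list[str]:
    """Return warnings for attributes mapped to a poorly matched channel."""
    warnings = []
    for attribute, (data_type, channel) in encoding.items():
        if not channel_fits(data_type, channel):
            warnings.append(f"{attribute}: {channel} is not a natural fit for {data_type} data")
    return warnings


# A scatterplot encoding like the COVID-19 vs. GDP example: no warnings expected.
print(review_encoding({
    "GDP per capita": ("quantitative", "position"),
    "cases per million": ("quantitative", "position"),
    "continent": ("categorical", "hue"),
}))
print(review_encoding({"continent": ("categorical", "size")}))
```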

Resources