Parallel Sets

A visualisation technique for multidimensional categorical data.

Titanic Survivors

Data: Robert J. MacG. Dawson.

Explanation

For each dimension (Survived, Sex, Age and Class), a horizontal bar is shown for each of its possible categories. The width of each bar is proportional to the absolute number of records in that category.
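To make the bar-width rule concrete, here is a minimal sketch of how one dimension's bars could be laid out. `layoutBars` is a hypothetical helper, not part of d3.parsets, and the fixed gap between bars (`spacing`) is an assumption made for illustration.

```javascript
// Hypothetical helper, not the d3.parsets API: lay out one dimension's
// category bars left to right, each bar's width proportional to its count.
// `spacing` (the gap between adjacent bars) is an assumed parameter.
function layoutBars(categories, width, spacing) {
  const total = categories.reduce((sum, c) => sum + c.count, 0);
  // Width left over for the bars themselves once the gaps are removed.
  const available = width - spacing * (categories.length - 1);
  let x = 0;
  return categories.map((c) => {
    const bar = { name: c.name, x: x, width: (c.count / total) * available };
    x += bar.width + spacing;
    return bar;
  });
}

// Toy counts for illustration only (not the Titanic figures).
const bars = layoutBars([{ name: "Yes", count: 1 }, { name: "No", count: 3 }], 410, 10);
console.log(bars);
// → "Yes" at x = 0 (width 100), "No" at x = 110 (width 300)
```

Because every dimension partitions the same set of records, each dimension's bars together span the same total width.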

Starting with the first dimension (Survived), each of its categories is connected to a number of categories in the next dimension, showing how that category is subdivided. This subdivision is repeated recursively for each subsequent dimension, producing a tree of “ribbons”.
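The recursive subdivision can be sketched as nested grouping: split the records by the first dimension, then split each group by the next dimension, and so on. This is a minimal illustration assuming records are plain objects keyed by dimension name; `buildTree` is a hypothetical helper, not the d3.parsets source.

```javascript
// Hypothetical helper, not part of d3.parsets: builds the nested
// "tree of ribbons" by recursively grouping rows on each dimension in turn.
function buildTree(rows, dimensions) {
  function split(subset, depth) {
    const node = { count: subset.length, children: new Map() };
    if (depth === dimensions.length) return node; // leaf: no more dimensions
    const dim = dimensions[depth];
    // Group the current subset by its category in this dimension.
    const groups = new Map();
    for (const row of subset) {
      const key = row[dim];
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(row);
    }
    // Recurse: each category is subdivided by the next dimension.
    for (const [key, group] of groups) {
      node.children.set(key, split(group, depth + 1));
    }
    return node;
  }
  return split(rows, 0);
}

// Toy rows for illustration only (not the Titanic data).
const rows = [
  { Survived: "Yes", Sex: "Female" },
  { Survived: "Yes", Sex: "Male" },
  { Survived: "No", Sex: "Male" },
  { Survived: "No", Sex: "Male" },
];
const tree = buildTree(rows, ["Survived", "Sex"]);
console.log(tree.children.get("No").children.get("Male").count); // 2
```

Each node's `count` is the width of the corresponding ribbon; the children of a node are the ribbons it splits into at the next dimension.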

In fact, you can think of Parallel Sets as an icicle plot in which icicles of the same category are “bundled” together.

Drag the dimensions and categories to reorder them. You can also click the “alpha” or “size” links that appear next to a dimension name on mouseover to sort its categories by name or by frequency.
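The two sort orders can be sketched as comparators over category counts. This is a hypothetical illustration, not the chart's own code; in particular, it assumes “size” means descending frequency, with ties broken by name.

```javascript
// Hypothetical sketch: order a dimension's categories either
// alphabetically ("alpha") or by descending frequency ("size").
function orderCategories(rows, dimension, mode) {
  // Tally how often each category occurs in this dimension.
  const counts = new Map();
  for (const row of rows) {
    const key = row[dimension];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const categories = Array.from(counts.keys());
  if (mode === "alpha") {
    return categories.sort((a, b) => String(a).localeCompare(String(b)));
  }
  // "size": most frequent first (assumed), ties broken by name.
  return categories.sort(
    (a, b) => counts.get(b) - counts.get(a) || String(a).localeCompare(String(b))
  );
}

// Toy rows for illustration only.
const rows = [{ Class: "Third" }, { Class: "Crew" }, { Class: "Third" }, { Class: "First" }];
const alpha = orderCategories(rows, "Class", "alpha"); // Crew, First, Third
const size = orderCategories(rows, "Class", "size");   // Third, Crew, First
console.log(alpha, size);
```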

Women and Children First?

We can see at a glance that the relative proportion of surviving women is far greater than that of the men.

As for children, the picture becomes clearer when we drag the Age dimension up: around half of the children survived. This is a smaller proportion than for the women, but a larger one than for the men. Can you spot anything else interesting?
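The comparisons above are about proportions within each category, not absolute counts. A minimal sketch of that calculation follows; `rateBy` is a hypothetical helper, and the rows and category labels (“Child”, “Adult”, “Yes”, “No”) are toy values assumed for illustration, not the actual dataset.

```javascript
// Hypothetical helper: for each category of `dimension`, the fraction of its
// rows for which `isPositive(row)` is true (e.g. the passenger survived).
function rateBy(rows, dimension, isPositive) {
  const tallies = new Map();
  for (const row of rows) {
    const key = row[dimension];
    const t = tallies.get(key) || { hits: 0, total: 0 };
    t.total += 1;
    if (isPositive(row)) t.hits += 1;
    tallies.set(key, t);
  }
  const rates = {};
  for (const [category, t] of tallies) rates[category] = t.hits / t.total;
  return rates;
}

// Toy rows for illustration only; the category labels are assumptions.
const rows = [
  { Age: "Child", Survived: "Yes" }, { Age: "Child", Survived: "No" },
  { Age: "Adult", Survived: "Yes" }, { Age: "Adult", Survived: "No" },
  { Age: "Adult", Survived: "No" }, { Age: "Adult", Survived: "No" },
];
const rates = rateBy(rows, "Age", (row) => row.Survived === "Yes");
console.log(rates); // { Child: 0.5, Adult: 0.25 }
```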

Do It Yourself

The code is available as a reusable D3.js chart: d3.parsets. This is a configurable function, which can be called on a D3 selection to produce an interactive SVG visualisation.

The input data should be bound to the target selection. You can supply either an array of pre-aggregated objects (a pivot table) together with a value accessor, or simply the full dataset, in which case the grouped frequencies are calculated automatically (this is the default).

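// Configure a parallel sets chart over four dimensions, in display order.
var chart = d3.parsets()
    .dimensions(["Survived", "Sex", "Age", "Class"]);

// Create an SVG element matching the chart's configured width and height.
var vis = d3.select("#vis").append("svg")
    .attr("width", chart.width())
    .attr("height", chart.height());

// Load the raw (unaggregated) rows, bind them to the SVG and render.
d3.csv("titanic.csv", function(error, csv) {
  if (error) throw error;
  vis.datum(csv).call(chart);
});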
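If you prefer the pre-aggregated (pivot table) input described above, the raw rows first need to be collapsed into one object per distinct combination of categories. Here is a minimal sketch; `pivot` and the `count` field name are hypothetical choices for illustration. With input shaped like this, the value accessor would return each entry's count, e.g. `function(d) { return d.count; }`.

```javascript
// Hypothetical sketch: collapse raw rows into one object per distinct
// combination of categories, storing the frequency in a `count` field.
function pivot(rows, dimensions) {
  const groups = new Map();
  for (const row of rows) {
    const values = dimensions.map((d) => row[d]);
    const key = JSON.stringify(values); // unambiguous composite key
    if (!groups.has(key)) {
      const entry = { count: 0 };
      dimensions.forEach((d, i) => { entry[d] = values[i]; });
      groups.set(key, entry);
    }
    groups.get(key).count += 1;
  }
  return Array.from(groups.values());
}

// Toy rows for illustration only.
const rows = [
  { Survived: "No", Sex: "Male" },
  { Survived: "No", Sex: "Male" },
  { Survived: "Yes", Sex: "Female" },
];
const table = pivot(rows, ["Survived", "Sex"]);
console.log(table); // two entries: No/Male with count 2, Yes/Female with count 1
```

Pre-aggregating mainly pays off for large datasets, since only one object per combination needs to be loaded.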

Alternatives

For multivariate categorical data, the mosaic plot (or Marimekko chart) is a powerful alternative. Personally, I think it’s easier to see the order in which the subsets were derived in a parallel sets visualisation. On the other hand, small disparities seem easier to spot in a mosaic plot, because the subsets are laid out side by side. Here is a Marimekko chart in D3.js by Mike Bostock.

For multivariate ordinal or numeric data, parallel coordinates are more appropriate, although you can often derive meaningful categories from such data (for example, by binning numeric values) for use with parallel sets.
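Binning is one straightforward way to turn a numeric field into a categorical dimension. The sketch below is hypothetical: `binCategory` is an illustrative helper, and the age thresholds (18 and 65) and labels are arbitrary assumptions, not the cut-offs used in the Titanic data.

```javascript
// Hypothetical sketch: map a numeric value onto a labelled bin so the field
// can be used as a categorical dimension. `thresholds` must be ascending and
// `labels` must have exactly one more entry than `thresholds`.
function binCategory(value, thresholds, labels) {
  for (let i = 0; i < thresholds.length; i++) {
    if (value < thresholds[i]) return labels[i];
  }
  return labels[labels.length - 1]; // at or above the last threshold
}

// Arbitrary example thresholds, for illustration only.
const ages = [4, 17, 18, 45, 70];
const groups = ages.map((a) => binCategory(a, [18, 65], ["Child", "Adult", "Senior"]));
console.log(groups); // [ 'Child', 'Child', 'Adult', 'Adult', 'Senior' ]
```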

Implementation Notes

Probably the most interesting part of implementing this was supporting multiple concurrent transitions on the ribbons. Strictly speaking, this wasn’t necessary, as it’s unlikely that anyone would drag two things within a single transition’s duration. But who would pass up an opportunity to use a custom tween?

This allows the x- and y-components of the ribbons to be animated independently, so that you can drag a dimension vertically even while a horizontal category animation is in progress.
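The idea of independent x and y animation can be modelled without D3. The following is a plain-JavaScript sketch of the concept, not the d3.parsets source (which uses a custom D3 tween): each coordinate component keeps its own start time, duration and endpoints, so retargeting one component never interrupts the other. Linear interpolation stands in for D3’s easing, and `Component` and `Ribbon` are hypothetical names.

```javascript
// Hypothetical sketch (not the d3.parsets source): each coordinate
// component has its own transition, so a new vertical animation can start
// while a horizontal one is still running.
function lerp(a, b, t) { return a + (b - a) * t; }

class Component {
  constructor(value) {
    this.from = value; this.to = value; this.start = 0; this.duration = 0;
  }
  // Current value at time `now`, clamped to the transition's endpoints.
  valueAt(now) {
    if (this.duration <= 0) return this.to;
    const t = Math.min(1, Math.max(0, (now - this.start) / this.duration));
    return lerp(this.from, this.to, t); // linear stand-in for easing
  }
  // Retarget from wherever this component currently is.
  transitionTo(target, now, duration) {
    this.from = this.valueAt(now);
    this.to = target; this.start = now; this.duration = duration;
  }
}

class Ribbon {
  constructor(x, y) { this.x = new Component(x); this.y = new Component(y); }
  positionAt(now) { return { x: this.x.valueAt(now), y: this.y.valueAt(now) }; }
}

const ribbon = new Ribbon(0, 0);
ribbon.x.transitionTo(100, 0, 1000);  // horizontal category move starts at t = 0
ribbon.y.transitionTo(50, 500, 1000); // vertical drag starts half-way through
console.log(ribbon.positionAt(500));  // { x: 50, y: 0 }
console.log(ribbon.positionAt(1000)); // { x: 100, y: 25 }
console.log(ribbon.positionAt(1500)); // { x: 100, y: 50 }
```

Because `transitionTo` starts from the component’s current value, a second drag on the same axis also blends smoothly rather than jumping.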

In case you missed it, be sure to click on “icicle plots” in the Explanation section to see the animated transition.


Further Reading