Why I hate Pie Charts

The pie chart should be avoided at all costs. Why?

  • you can only display a limited number of slices (although that doesn’t really stop people from creating pie charts with dozens of slices)
  • they take up a lot of space
  • the human brain cannot easily compare slices based on radial areas. We’re better at comparing areas such as squares and rectangles, but comparing lengths, as in a bar chart, works best.
  • to make it more effective, you have to add labels and/or a legend, which only makes it worse
  • sometimes people forget that pie charts represent a “part of a whole” relationship, and if you add the percentages together, you end up over 100%

For these reasons, it’s almost always better to replace the pie chart with a bar or column chart. The only exception is when you have a very limited number of slices (maximum 3 but I prefer 2). In this case, the “part of a whole” relationship can be quite accurately displayed. It’s useful for “omg look how big that slice is compared to the other tiny slice”. An example:

[Image: pacman]

If you do have to create a pie chart, adhere to these simple rules:

  • start the first slice at 12 o’clock. Thanks to human evolution, we can at least somewhat decently read an analog clock. This means we can quite accurately read the size of the first slice.
  • sort the slices. Preferably by size, starting with the biggest one, because that’s probably where you want the focus to be.
  • keep the number of slices down. Combine the smallest slices into one bigger slice and label it “Other…” or something like that.
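
The three rules above can be sketched in Python. This is only a minimal illustration, not a recommendation of a specific tool: the function names prepare_slices and draw_pie are made up for this example, the default limit of 3 slices follows the maximum mentioned earlier, and the drawing part assumes matplotlib is installed.

```python
def prepare_slices(values, max_slices=3, other_label="Other"):
    """Sort categories by size (largest first) and merge the tail into one slice.

    `values` maps a category name to its size. Hypothetical helper,
    written only to illustrate the rules in this post.
    """
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    if len(ordered) <= max_slices:
        return ordered
    # Keep the biggest slices and combine everything else into "Other".
    head = ordered[: max_slices - 1]
    tail_total = sum(size for _, size in ordered[max_slices - 1:])
    return head + [(other_label, tail_total)]


def draw_pie(slices):
    """Draw the slices clockwise, starting at 12 o'clock (needs matplotlib)."""
    import matplotlib.pyplot as plt

    labels = [label for label, _ in slices]
    sizes = [size for _, size in slices]
    fig, ax = plt.subplots()
    # startangle=90 starts the first slice at 12 o'clock;
    # counterclock=False lays the slices out clockwise, like a clock face.
    ax.pie(sizes, labels=labels, startangle=90, counterclock=False)
    ax.set_aspect("equal")
    return fig
```

With made-up input such as {"A": 50, "B": 5, "C": 30, "D": 10, "E": 5}, prepare_slices keeps A and C and merges the rest into a single “Other” slice, which is placed last even if it ends up larger than a named slice.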

So why this blog post, when most of this has already been described before? I was recently reading the blog post What’s More Popular: SQL Server 2014, or SQL Server 2005? by Brent Ozar, and it linked to yet another fine specimen of pie chart junk (note: the chart was not created by Brent, just to be clear). It’s a very interesting post, with an interesting discussion in the comments as well, about the adoption rate of the different SQL Server versions among a large sample of Dell servers.

[Image: sql-server-versions]

The problems are quite clear, since the chart doesn’t follow any of the rules I explained earlier. There are too many slices and thus too many colors as well. Could you tell that 2014 is twice the size of 2000 without using the labels?

I imported the data in Power BI and quickly created this column chart:

[Image: barchart_sqlversions]

Much clearer, isn’t it? And only one color is needed. In the discussion on Brent’s blog, someone suggested sorting the data by release date rather than by size, which actually makes this graph much better, since the data has a time aspect. It’s now obvious that 2005 is still more popular than 2008 and 2014. You can also clearly see that 2014 is bigger than 2000.

The chart resembles a normal distribution skewed to the right, which might be expected for adoption rates of a technology product, but you can see that 2005 and 2014 are somewhat outliers. I tend to believe SQL Server 2014 didn’t really have anything substantial to offer, certainly not for BI: aside from the clustered columnstore index, which in itself is not a reason to upgrade, there was no real reason to upgrade if your shop didn’t need in-memory OLTP.

SQL Server 2016 on the other hand, will be awesome 🙂



Koen Verbeeck

Koen Verbeeck is a Microsoft Business Intelligence consultant at AE, helping clients to get insight in their data. Koen has a comprehensive knowledge of the SQL Server BI stack, with a particular love for Integration Services. He's also a speaker at various conferences.
