Linear Digressions

Anscombe's Quartet

Anscombe's Quartet is a set of four datasets that share nearly identical summary statistics (the same means and variances of x and y, the same correlation between x and y, and the same least-squares regression line) but look completely different when plotted. It's easy to think that a good set of summary statistics can tell you everything important about a dataset, or at least enough to know whether two datasets are extremely similar or extremely different, but Anscombe's Quartet will always be standing behind you, laughing at how silly that idea is.

Anscombe devised the quartet in 1973 as an example of how summary statistics can be misleading, but today we can do one better: the Datasaurus Dozen is a set of twelve datasets, all visually distinct, that share the same summary statistics as a source dataset that, there's no other way to put this, looks like a dinosaur. It shows that datasets can be generated to look like almost anything while still preserving chosen summary statistics. In other words, Anscombe's Quartets can be generated at will, and we should all be reminded to visualize our data (not just compute summary statistics) if we want to claim to really understand it.
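To check the claim numerically, here is a minimal Python sketch using the values from Anscombe's 1973 paper as they are commonly reproduced. It computes each dataset's means, sample variances, Pearson correlation and least-squares line using only the standard library; the names `QUARTET`, `pearson` and `summarize` are illustrative helpers, not part of any library.

```python
from statistics import mean, variance

# Anscombe's quartet (1973); datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Means, sample variances, correlation, and least-squares line y = a + b*x."""
    r = pearson(xs, ys)
    # OLS slope is r scaled by the ratio of standard deviations.
    slope = r * (variance(ys) / variance(xs)) ** 0.5
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": r,
        "slope": slope,
        "intercept": mean(ys) - slope * mean(xs),
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} r={s['r']:.3f} "
          f"y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

Every row prints essentially the same numbers: a mean of 9 and variance of 11 for x, a mean of 7.50 and variance of about 4.12 for y, r of about 0.816, and a line near y = 3.00 + 0.500x. Yet a scatter plot of each pair shows, respectively, a noisy line, a smooth curve, a tight line with one outlier, and a vertical cluster with a single high-leverage point.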

Next Episodes

Traffic Metering Algorithms @ Linear Digressions

📆 2017-06-12 05:01 / 00:18:34


Page Rank @ Linear Digressions

📆 2017-06-05 03:46 / 00:19:58


Fractional Dimensions @ Linear Digressions

📆 2017-05-29 04:54 / 00:20:28


How to Find New Things to Learn @ Linear Digressions

📆 2017-05-15 03:49 / 00:17:54