Tuesday, February 15, 2011

About Jaanga: John Tukey + Exploratory Data Analysis = Awesomeness

John Tukey - courtesy Wikipedia

John Wilder Tukey (June 16, 1915 – July 26, 2000) was an American statistician.

I am not a statistician and, as yet, have only the vaguest notion of the history and development of statistics and the visualization of data. Therefore there may well be even more wonderful figures in the world of statistics to honor. But for the moment John Tukey provides me with more than enough delight and inspiration.

Why is Tukey so cool?

Tukey coins the computer term "bit" and is credited with the first published use of "software". Actually there's a byte more than that.

Tukey writes a book titled Exploratory Data Analysis (EDA). EDA takes some very complex mathematics, a lot of statistical mumbo-jumbo and the insights of Hercule Poirot, Miss Marple and Sherlock Holmes and combines them into a college textbook that is a real page turner (if, maybe, your brain is just a little bit kinky ;-). From this book I derive the three things to be concerned with in this post:

1. Accessibility

EDA makes some really complex stuff accessible to fairly normal humans on three levels:
  1. Most any educated and diligent person could do the stuff he suggests you can do. 
  2. Most any normal person can grasp that the data Tukey presents could be instrumental in understanding "what is going on here?"
  3. The tools are free as in beer and free as in liberty. You can do this stuff yourself, at home, without doing any harm, needing special stuff or spending any money.

2. No Reductions

Tukey is the opposite of a reductionist. Tukey takes a holistic approach or even one of Cognitive Additivism. Reductionism is not necessarily a bad thing. Reductionism means trying to understand complex things by reducing them either to the interactions of their parts or to simpler, more fundamental things. We are all reductionists some of the time (said he reductionistically). Tukey's work, however, points towards augmenting and supplementing the data at hand. Here is an example:

You can see all the data points. Tukey's thing is to show you how to calculate that bold line in the middle (often called a regression line). Note that he almost always keeps the supporting data on the page together with his overlays. His followers will drop the data and just show the regression lines.

Why is this important?

With Tukey you still can see the outliers, the Black Swan, the hair in the ointment. Tukey does not simplify reality. Tukey likes to augment reality.
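For the curious, here is a minimal sketch of the "line plus data" idea in Python. It is not lifted from the book: it assumes one common variant of the resistant (median-based) line associated with Tukey, and the names `resistant_line` and `residuals` and the sample data are made up for illustration. The point is that the line is fitted, but every original point is kept alongside it, so the odd one out stays visible.

```python
from statistics import median


def resistant_line(points):
    """Fit a line to (x, y) pairs using medians of the outer thirds.

    An illustrative variant, not necessarily the exact procedure in EDA.
    Returns (slope, intercept).
    """
    pts = sorted(points)              # order the points by x
    third = len(pts) // 3             # simple split into outer thirds
    if third == 0:
        raise ValueError("need at least three points")
    left, right = pts[:third], pts[-third:]

    # Summarise each outer group by its median x and median y.
    x_left = median([x for x, _ in left])
    y_left = median([y for _, y in left])
    x_right = median([x for x, _ in right])
    y_right = median([y for _, y in right])
    if x_right == x_left:
        raise ValueError("outer groups need different x medians")

    slope = (y_right - y_left) / (x_right - x_left)
    # Choose the intercept so that the median leftover is zero.
    intercept = median([y - slope * x for x, y in pts])
    return slope, intercept


def residuals(points, slope, intercept):
    """Keep every point, paired with its distance from the line."""
    return [(x, y, y - (slope * x + intercept)) for x, y in points]


# Made-up data: points on y = 2x + 1, plus one wild value at x = 4.
data = [(x, 2 * x + 1) for x in range(9)]
data[4] = (4, 100)

slope, intercept = resistant_line(data)
rows = residuals(data, slope, intercept)
for x, y, r in rows:
    print(x, y, round(r, 2))
```

Because the fit leans on medians, the wild point at x = 4 does not drag the line towards itself, and because the data is never thrown away, that point shows up with a large residual instead of disappearing.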

3. New Ways of Seeing

The final cool aspect of Tukey's work to be presented here is that he not only shows you how to build the tools and add them to your production process, but also provides notions and strategies for using these tools to make things you could never make before.

Many of us can agree with the quote attributed to Pablo Picasso: "Computers are useless, they only give you answers!"

What Tukey does is help us use computers to create the picture worth a thousand words, the diagram that begs the question, the chart that forces you to say "what is going on here?"


In further posts I will continue the exploration into Tukey's world. The idea that we can build new ways of seeing that add to our reality, ideas that most anybody can create and appreciate, is an integral notion of this web site. And Tukey is a great source, if not the father, of such a vision.