I got me some data!
Done gone put it in a graph!
Added meself a TRENDLINE!
And me answer is THE TREND IS DOWNWARDS!
This is what happens when people get data and have Excel.
They stick it in and press that inviting “Add Trendline” button and hey presto, INSTANT ANALYSIS. It lets you cut through all those silly binary comparisons and, possibly, separate the noise from the signal beneath the data to see which way it is moving.
I should know, cos that’s what I did for years. But here’s why it is ALL WRONG.
Have a look at that data again, with added trendline…
Trendlines make you think the data HAS been moving in that direction and WILL CONTINUE TO. It is a slope that starts and doesn’t stop. It doesn’t matter if that isn’t strictly what it means to a hardened statistician. If this is presented to a manager, what will they think? They will think the data shows that whatever is being measured has gone down and will therefore continue to go down.
If they thought that, they would be dead wrong.
Here is why…
I put the data above into a control chart, using Winchart. Here it is in all its clunky 80s glory…
The blue line on it is the mean, the red line at the top is the upper control limit. But that’s not important. What IS important is that pressing the handy “diagnose” button runs a set of tests on the data. These tests look at how the data behaves in relation to the average and the upper and lower lines. They check for any signs of a change. An actual change in the system, as indicated by the data.
When I pressed the button, it said “look at data point 37, cos from there there were 12 points below the average line”, and there were. This indicates that there was a change in the data, big enough to suggest a change in the system that produced the data.
So I clicked another clever button that split the data into two parts, to show the system before the change around data point 37, and the system after the change around data point 37.
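For anyone without Winchart, here is a rough sketch of the kind of run test described above: scan for a stretch of consecutive points that all sit below the mean. I don't know which rules Winchart actually applies, so the function name `find_run_below_mean` and the default `run_length` of 8 (a commonly quoted threshold) are my assumptions, not Winchart's settings.

```python
from statistics import mean


def find_run_below_mean(values, run_length=8):
    """Return the 1-based position where the first run of at least
    `run_length` consecutive points below the overall mean starts,
    or None if there is no such run.

    run_length=8 is an assumed threshold, not taken from Winchart.
    """
    centre = mean(values)
    run_start = None
    count = 0
    for position, value in enumerate(values, start=1):
        if value < centre:
            if count == 0:
                run_start = position  # remember where this run began
            count += 1
            if count >= run_length:
                return run_start
        else:
            count = 0  # a point on or above the mean breaks the run
    return None
```

A run like that is very unlikely if the system is just wobbling around one stable average, which is why it gets flagged as a signal rather than noise.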
Then I done gone and put a label to show what I had done gone and done.
This chart shown to a manager will lead them to think that there was ONE change somewhere in the middle of the data, a single step change. One that occurred once, and not again.
This is different from the message that the trendline communicates. The trendline says “it’s decreasing and will continue to”. Not a step change, but an incremental, continuous change.
Now I’m no statistician, but these are two different messages.
Deming said “Management is prediction”. It relies on interpreting the past to make theories about how the future might be if you act a certain way.
“Management is prediction.
The simplest plan – how may I go home tonight – requires prediction that my automobile will start and run, or that the bus will come, or the train. Knowledge is built on theory.
The theory of knowledge teaches us that a statement, if it conveys knowledge, predicts future outcome, with risk of being wrong, and that it fits without failure observations of the past.”
This means a manager relies on the data being presented and interpreted in a way that most accurately reflects reality. The map is NOT the territory, but a useful map MOST accurately reflects the territory.
In the Grand Battle Of The Charts, which of these is the MOST accurate map of the territory?
You can rarely directly see the territory behind the map, or the generator of the numbers…
History is opaque. You see what comes out, not the script that produces events, […] The generator of historical events is different from the events themselves
But I can cos I cheated.
I made up these numbers myself, using my own random number generator.
I used Excel to produce two sets of 30 numbers.
- 30 whole numbers between 4 and 8
- 30 whole numbers between 3 and 6
Then I put them in a row, the first 30 then the second 30.
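If you want to cheat the same way, here is a minimal sketch of that recipe. It is not the spreadsheet I used: `make_step_data` is a made-up name, and I am assuming "between" means both ends are included and every whole number is equally likely.

```python
import random


def make_step_data(seed=None):
    """Build 60 points from two different 'systems', one after the other.

    Assumption: both ranges are inclusive and uniformly distributed.
    """
    rng = random.Random(seed)  # pass a seed to get the same numbers again
    before = [rng.randint(4, 8) for _ in range(30)]  # system before the change
    after = [rng.randint(3, 6) for _ in range(30)]   # system after the change
    return before + after  # first 30, then second 30, in one row
```

The important bit is that the generator changes exactly once, between point 30 and point 31, and is perfectly stable either side of that.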
So who is the winner in this instance? Which map is closer to the territory?
The control chart!
It said look at point 37, and I did.
I split the data there, because the data pointed to a change around that point in what was being produced, and therefore a change in the actual system producing it.
It was correct! (Almost. The change was at point 31. Control charts are a heuristic, but they’re the best heuristic we’ve got.)
If I used that control chart to make predictions about the system producing that data, my theory would be that there would be no further changes in the system. I would be correct in that theory.
If I used the trendline I would think there had been continuous decrease and there would be further decrease in the data, and I would be wrong.
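To put rough numbers on those two predictions, here is a sketch that fits a plain least-squares line and compares its forecast with a "the system changed once and then stayed put" forecast. The function names are made up for illustration, and the `idealised` data is an assumption: it replaces the random numbers with each half's long-run average (6 for whole numbers 4 to 8, 4.5 for 3 to 6, assuming each value is equally likely), so it is not my actual data.

```python
from statistics import mean


def fit_trendline(values):
    """Ordinary least-squares line through (1, y1), (2, y2), ...
    Returns (slope, intercept)."""
    xs = list(range(1, len(values) + 1))
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def trendline_forecast(values, point):
    """Extend the fitted line out to a future point number."""
    slope, intercept = fit_trendline(values)
    return slope * point + intercept


def step_forecast(values, change_point):
    """Forecast 'more of the same': the mean of everything from
    change_point (1-based) onwards."""
    return mean(values[change_point - 1:])


# Idealised stand-in for the made-up data: flat at 6, then flat at 4.5.
idealised = [6] * 30 + [4.5] * 30
```

On that idealised data the fitted slope is negative, and if you carry the line out to point 120 it lands below 3, a value the generator can never produce. The step forecast just says 4.5, whether you split at point 31 or at point 37.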
This matters. It matters because if I am making potential improvements to that system, I would want to see the effects of my changes. If I was managing that system without making changes to it, then I would still want to know if it changed in any fundamental way.
This is why I don’t like trendlines. They don’t come from thinking that wants to understand the system producing the data, they are just talk about the numbers themselves. “Hey look, it’s going downwards”, as opposed to any insight into the behaviour of the system itself.
As per previous incoherent rants, don’t analyse numbers, analyse the system producing the numbers instead.
An analyst’s job is not to analyse numbers, it is to solve problems by analysing numbers.
Choosing the right tool to do this follows from asking the right question: not “what’s happening with this data?” but “what is happening with this system?”