Sat 26 May 2007
Data is no good unless it gives you the Right Information
Posted by shoff under Uncategorized
Data is great. The ability to collect and store oceans of data has been the greatest achievement of my lifetime. We now have more data on our PDAs than we had in banks of filing cabinets 30 years ago. But smart companies realize that collecting data is not enough. It is important to collect the right data and, most important, to transform that data into the correct information: information that allows processes to be improved, holds employees accountable, and gives management the tools to make timely decisions and improve profits.
Using a baseball analogy (as I like to do), take a look at these two sets of data:
- A:0,3,3,1,0,1,2,1,1,3,3,0,0,0,0,0,1,0,1,0,0,1,4,0,1,0,3,0,0,2,4,1,2,0
- B:1,0,2,0,0,0,1,0,4,0,2,1,0,0,0,4,0,0,0,2,0,0,0,1,0,0,0,0,0,0,2,0,0,0,5,0,2,0,0,0,1,2,0
The data above represents the runs given up, game by game, by two particular pitchers during a season.
Now that we know what the data represents, let's analyze it and present some information:
- Pitcher A gave up 38 runs in 34 games
- Pitcher B gave up 30 runs in 43 games
Analyzing that information, we determine that:
- Pitcher A gave up 1.12 runs/game
- Pitcher B gave up 0.70 runs/game
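If you want to check the arithmetic yourself, here is a minimal Python sketch. The two strings are pasted verbatim from the lists above; the helpers `parse` and `runs_per_game` are hypothetical names made up for this illustration.

```python
# Runs given up in each game, pasted verbatim from the two lists above
RUNS_A = "0,3,3,1,0,1,2,1,1,3,3,0,0,0,0,0,1,0,1,0,0,1,4,0,1,0,3,0,0,2,4,1,2,0"
RUNS_B = "1,0,2,0,0,0,1,0,4,0,2,1,0,0,0,4,0,0,0,2,0,0,0,1,0,0,0,0,0,0,2,0,0,0,5,0,2,0,0,0,1,2,0"


def parse(csv: str) -> list[int]:
    """Turn '0,3,3,...' into [0, 3, 3, ...]."""
    return [int(x) for x in csv.split(",")]


def runs_per_game(runs: list[int]) -> tuple[int, int, float]:
    """Return (total runs, games pitched, runs per game)."""
    total, games = sum(runs), len(runs)
    return total, games, total / games


for name, csv in (("A", RUNS_A), ("B", RUNS_B)):
    total, games, rpg = runs_per_game(parse(csv))
    print(f"Pitcher {name}: {total} runs in {games} games = {rpg:.2f} runs/game")
```

It prints 38 runs in 34 games (1.12 runs/game) for A and 30 runs in 43 games (0.70 runs/game) for B.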
Therefore, based on this information, we would assume that Pitcher B is the better pitcher.
Analyzing the data further, we find:
- Pitcher A gave up 0 runs in 15 of 34 games (44%)
- Pitcher B gave up 0 runs in 29 of 43 games (67%)
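The scoreless-game numbers come from the same lists: count the zeros and divide by the number of games. A minimal sketch, again with hypothetical helper names (`parse`, `scoreless_share`):

```python
# Runs given up in each game, pasted verbatim from the two lists above
RUNS_A = "0,3,3,1,0,1,2,1,1,3,3,0,0,0,0,0,1,0,1,0,0,1,4,0,1,0,3,0,0,2,4,1,2,0"
RUNS_B = "1,0,2,0,0,0,1,0,4,0,2,1,0,0,0,4,0,0,0,2,0,0,0,1,0,0,0,0,0,0,2,0,0,0,5,0,2,0,0,0,1,2,0"


def parse(csv: str) -> list[int]:
    """Turn '0,3,3,...' into [0, 3, 3, ...]."""
    return [int(x) for x in csv.split(",")]


def scoreless_share(runs: list[int]) -> tuple[int, int, float]:
    """Return (games with 0 runs allowed, games pitched, percentage)."""
    scoreless = runs.count(0)
    return scoreless, len(runs), 100 * scoreless / len(runs)


for name, csv in (("A", RUNS_A), ("B", RUNS_B)):
    zeros, games, pct = scoreless_share(parse(csv))
    print(f"Pitcher {name}: 0 runs in {zeros} of {games} games ({pct:.0f}%)")
```

For B this works out to 29/43, about 67.4%.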
So, once again, B is better.
OK, let's add some more data:
- A: 1,1,0,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,1,0,0,1
- B: 0,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,0,0,0,1,0,0
A 1 represents a win by the pitcher's team and a 0 represents a loss, for each game the pitcher appeared in.
- Pitcher A's team won 24/34 games (71%)
- Pitcher B's team won 31/43 games (72%)
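The team winning percentage is just the number of 1s divided by the number of games. A minimal sketch with hypothetical helpers (`parse`, `win_share`), using the win/loss lists pasted verbatim from above:

```python
# 1 = the pitcher's team won that game, 0 = it lost (pasted from the lists above)
WINS_A = "1,1,0,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,0,1,0,0,1"
WINS_B = "0,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,0,0,0,1,0,0"


def parse(csv: str) -> list[int]:
    """Turn '1,1,0,...' into [1, 1, 0, ...]."""
    return [int(x) for x in csv.split(",")]


def win_share(results: list[int]) -> tuple[int, int, float]:
    """Return (team wins, games, winning percentage)."""
    wins = sum(results)
    return wins, len(results), 100 * wins / len(results)


for name, csv in (("A", WINS_A), ("B", WINS_B)):
    wins, games, pct = win_share(parse(csv))
    print(f"Pitcher {name}'s team won {wins}/{games} games ({pct:.0f}%)")
```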
Once again, Pitcher B compares favorably.
So the data above would tell you that B is the better pitcher.
The following data is critical to getting the correct information for the pitchers’ analysis:
- A: 7,7,9,9,12,11,8,9.2,8,8,9,9,9,9,9,9,9,9,9,9,9,9,11,9,9,9,9,9,10,8,9,8,8,9
- B: 1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,.2,1.1,1,1,1,1,1,1,1.1,1.1,1,1,1.2,1.1,.1,.1,1.2,1,.2,1,2
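The above data represents the number of innings each pitcher threw in each appearance, written in baseball's usual notation, where .1 and .2 mean one and two extra outs (1/3 and 2/3 of an inning). Therefore: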
- Pitcher A threw 304.2 innings
- Pitcher B threw 45.2 innings
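Adding up innings takes a little care, because 9.2 does not mean 9.2 innings: the digit after the point counts outs, so 9.2 is 9 innings plus 2 outs. The sketch below converts every appearance to outs before summing. It assumes the lists are pasted exactly as shown above, and the helpers `to_outs`, `total_outs`, and `to_notation` are hypothetical names for this illustration.

```python
# Innings per appearance, pasted verbatim from the lists above
INNINGS_A = "7,7,9,9,12,11,8,9.2,8,8,9,9,9,9,9,9,9,9,9,9,9,9,11,9,9,9,9,9,10,8,9,8,8,9"
INNINGS_B = "1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,.2,1.1,1,1,1,1,1,1,1.1,1.1,1,1,1.2,1.1,.1,.1,1.2,1,.2,1,2"


def to_outs(ip: str) -> int:
    """Convert innings notation to outs: '9.2' -> 9 * 3 + 2 = 29, '.1' -> 1."""
    whole, _, partial = ip.strip().partition(".")
    return 3 * int(whole or 0) + int(partial or 0)


def total_outs(innings_csv: str) -> int:
    """Sum a comma-separated list of appearances, measured in outs."""
    return sum(to_outs(ip) for ip in innings_csv.split(","))


def to_notation(outs: int) -> str:
    """Convert outs back to innings notation, e.g. 914 -> '304.2'."""
    return f"{outs // 3}.{outs % 3}"


for name, csv in (("A", INNINGS_A), ("B", INNINGS_B)):
    print(f"Pitcher {name} threw {to_notation(total_outs(csv))} innings")
```

Working in whole outs avoids the floating-point trap of adding 9.2 + 8 + ... as decimals, which would give a meaningless total.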
More importantly:
- Pitcher A had an ERA of 1.12 (runs given up per nine innings pitched)
- Pitcher B had an ERA of 5.91
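ERA scales runs to a nine-inning game: ERA = 9 × runs ÷ innings. Because innings come in thirds, it is easiest to work in outs (innings = outs ÷ 3), which gives ERA = 27 × runs ÷ outs. A minimal sketch using the run and innings totals above; the `era` and `to_outs` helpers are hypothetical. Official ERA counts earned runs only, and here the post's run totals are used as given.

```python
def to_outs(ip: str) -> int:
    """Convert innings notation to outs: '304.2' -> 304 * 3 + 2 = 914."""
    whole, _, partial = ip.strip().partition(".")
    return 3 * int(whole or 0) + int(partial or 0)


def era(runs: int, innings: str) -> float:
    """Runs per nine innings: 9 * runs / (outs / 3) == 27 * runs / outs."""
    return 27 * runs / to_outs(innings)


# Totals from the post: runs given up and innings pitched
print(f"Pitcher A ERA: {era(38, '304.2'):.2f}")  # Pitcher A ERA: 1.12
print(f"Pitcher B ERA: {era(30, '45.2'):.2f}")   # Pitcher B ERA: 5.91
```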
Pitcher A was Bob Gibson in 1968. His ERA that year was the 4th lowest of all time (every other season in the top 41 happened before 1920!).
Pitcher B was Keith Foulke, closer for the Boston Red Sox, in 2005.
As you can see in the above example, having data but not the right information can be very deceiving. It might even make you think that Keith Foulke was a better pitcher than Bob Gibson!