We have broken Kona down year-by-year and split-by-split. We’ve looked at the trends, found a few nuggets of wisdom, and picked at a few whys and wherefores. But dissecting the beast and understanding its nature are two different things. To accomplish the latter, we’ve got to reassemble it.
The whole is greater than its parts, and now we’re going to mash the numbers back together and look at the big picture using some new statistical measures. Let’s begin the reassembly process by going back to the very first assessment that started this series: the bike split analysis written by the group at Austin Tri Cyclist. Their investigation took two sets of data:
- the average of the bike splits of the top ten finishers at Kona and
- the fastest split out of each year.
They plotted those on a graph and then drew a regression line through the data indicating a linear trend of decreasing times. They did the same for the run.
This method is called regression analysis. The authors used Microsoft Excel to accomplish the data crunch and their numbers and application are correct. There was just one tiny problem. As you can see in the visual data, the lines approximate a trend through data sets whose points are pretty spread out.
Another way to put it is that the data doesn’t fit the trend very well. And that’s actually what’s indicated by the little “R²” values in the image. We’ll spare the readers a lengthy explanation of what the term means or how it’s derived. Suffice it to say, for your line to be accurate, you want an R² value close to 1.0. The values expressed here mean that the lines don’t describe the data well. However, the group only took a small range of data, using averages and top times. We’ve dived as deep as we can into the numbers. Will greater detail yield a more definitive perspective?
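For readers who would rather see the idea than take our word for it, here is a minimal sketch of a straight-line fit and its R² in plain Python. The years and split times are made up for illustration and are not Kona results, and the function names are our own. The point is only that a tidy series scores near 1.0 while a scattered one scores near zero.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: race year vs. fastest bike split in minutes (made up).
years = [2008, 2009, 2010, 2011, 2012, 2013]
tight = [272.0, 271.0, 270.0, 269.0, 268.0, 267.0]  # falls exactly on a line
noisy = [272.0, 265.0, 274.0, 266.0, 271.0, 267.0]  # similar range, scattered

slope_tight, intercept_tight = linear_fit(years, tight)
slope_noisy, intercept_noisy = linear_fit(years, noisy)
r2_tight = r_squared(years, tight, slope_tight, intercept_tight)
r2_noisy = r_squared(years, noisy, slope_noisy, intercept_noisy)

print(f"tight series: slope {slope_tight:.2f} min/year, R squared {r2_tight:.2f}")
print(f"noisy series: slope {slope_noisy:.2f} min/year, R squared {r2_noisy:.2f}")
```

Both series slope downward, so both "show" times getting faster. Only the first one gives you any reason to believe it.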
The answer is no. We get a slightly stronger R² value, but not enough to say for sure that the men are getting faster from year to year. Because our data have been so similar for the men’s run as well as the corresponding women’s events, we’ll forgo making extraneous pretty pictures.
So what relevant pretty pictures can we make? At first blush, the results seem rather chaotic over the years. Can mathematics and that little R² term help us make any rhyme or reason of Kona? Indeed they can. As Carmichael Training Systems’ Nick White explained to us in a previous instalment, there’s an interplay between the bike and run that quite often determines who wins and who DNFs at any Ironman, especially Kona. It’s therefore worthwhile to examine the correlation between bike split, run split, and total finish time.
The correlation coefficient (“regular R”) for this series is 0.92, and the vaunted R² value is 0.85, meaning we can put a lot more faith in this trend line. And the graph makes it intuitive that there’s a distinct relationship between bike split and finish time. You don’t have to be Joe Friel to know that (although it helps), but what makes things really interesting is a comparison to how closely run splits correlate to finish times.
The data points scatter out a bit more from the trend line in this graph, indicating that run performance is a bit more varied among athletes. We’re all too aware of why after seeing many a great contender or early leader fall apart running through the Energy Lab. Consequently, the correlation is weaker, at 0.73. (R and R² cannot both be 0.73: if R is 0.73, R² is about 0.53, and if R² is 0.73, R is about 0.85. Either reading puts the run behind the bike.) So not only does our line fit the results less closely, it’s less reliable.
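If you want to check how R and R² relate on your own numbers, the sketch below computes the ordinary Pearson correlation by hand and then squares it. For a straight-line fit with a single predictor, R² is simply R multiplied by itself. The splits here are hypothetical values in minutes, invented for the example, and `pearson_r` is our own helper name rather than anything from a library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical splits in minutes (made up for illustration, not Kona results).
bike_minutes = [265, 268, 270, 272, 275, 278]
finish_minutes = [495, 500, 498, 505, 510, 512]

r = pearson_r(bike_minutes, finish_minutes)
print(f"R = {r:.2f}, R squared = {r * r:.2f}")
```

Swap in run splits for `bike_minutes` and you can make the same bike-versus-run comparison we make here.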
Therefore, statistically speaking, the bike is a better indicator of overall performance at Kona than the run. This is interesting since we tend to think that athletes who falter on the run are overtaken by their pursuers. While that’s generally the case, the math says you’re safer betting on a bike course hero to hold on than a fast runner to make the comeback. One final pretty picture gives us a window into why.
Our chart here shows the correlation between bike and run splits among the top ten men over the last 35 years. We have to use the term “correlation” loosely because it’s actually quite weak (R=0.62, R²=0.39). Normann Stadler and Craig Alexander have skewed the data over time. However, we still see a growing concentration down in the bottom-left of the graph, indicating that this is becoming more of an all-around competition between athletes trained specifically for triathlon rather than for swimming, biking, and running as separate sports.
I got curious about this and strayed from statistical edicts to follow a hunch. I took the bike splits of each athlete and divided them by their run splits. The average ratio was 1.598. For top-ten finishers in the last five years, the average ratio was 1.582. For top-ten finishers in the first five years of Ironman, it was 1.593. Without any idea if this holds any validity, and in honor of the man who inspired me to look into it, I officially dub this number “White’s Ratio.”
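For anyone who wants to play along at home, here is a rough sketch of the arithmetic. The three athletes and their splits are hypothetical, and the helper names are ours. The last line is only a back-of-the-envelope use of the 1.598 average quoted above, dividing a bike split by it to guess at a run split. Treat it as a party trick, not a prediction.

```python
def to_seconds(split):
    """Convert an 'H:MM:SS' split string to seconds."""
    hours, minutes, seconds = (int(part) for part in split.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_split(total_seconds):
    """Convert seconds back to an 'H:MM:SS' string, rounded to the second."""
    total = round(total_seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


def bike_run_ratio(bike_split, run_split):
    """Bike split divided by run split for one athlete."""
    return to_seconds(bike_split) / to_seconds(run_split)


# Hypothetical athletes and splits (made up, not real Kona results).
athletes = [
    ("Athlete A", "4:30:00", "2:48:45"),
    ("Athlete B", "4:20:00", "2:50:00"),
    ("Athlete C", "4:40:00", "2:48:00"),
]

ratios = [bike_run_ratio(bike, run) for _, bike, run in athletes]
mean_ratio = sum(ratios) / len(ratios)
print(f"average bike/run ratio: {mean_ratio:.3f}")

# Back-of-the-envelope guess at a run split from a bike split at T2.
AVERAGE_RATIO = 1.598  # the overall average quoted in the article
projected_run = to_split(to_seconds("4:30:00") / AVERAGE_RATIO)
print(f"projected run after a 4:30:00 bike: {projected_run}")
```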
All this ought to make the live broadcast of this year’s Ironman World Championship a little more interesting around T2 (keep a calculator near the dip bowl). What makes it even more interesting is that the women’s race is its own entity with unique behavior. The most remarkable element of this instalment is what’s missing: the swim. The swim times are so tightly bunched among the top ten men that they are negligible in determining the race’s final outcome (at least from a statistical viewpoint; we don’t ignore how swallowing a mouthful of salt water can ruin someone’s day). The top women become a little more spread out in the water, and therefore we have to look at how that influences the overall finish.
We’ll do that in our next instalment.
Follow Jim Gourley on Twitter. Buy his books below.