Gladiator 5k Race Times (secs)
Two Saturdays ago, a Gladiator 5k took place in Cary, NC. It was a foot race with a few mandatory obstacles like high walls, nets, tunnels, and a mud pit. I had good fun as a participant. Though I scraped up my knees and I was unimpressed by the puddle of water designated as a “mud pit,” the thrill of leapfrogging 10-foot wooden walls outdid my complaints. Individual results were posted soon after the race in a big table. Though I’m not one to pore over tables, I did notice that one of the women who beat me was 42 (congrats!). I sent my mom an email to inspire her with that information. But surely I was giving her false hope? That question led me to unleash my stats software (JMP) in search of the associations between age, sex, and finishing time. I started with age:
Race Time By Age
The red line is a linear fit (least squares regression) with a slope of 3.16 seconds per year of age. Warning: since that line only explains 0.4% of the variation in race times (RSquare = .004), it really sucks at describing the data. The line would have sucked even more had I not removed three clear outliers—the slowest racers by far—who were in their 20s and 30s. So a linear model was mostly useless here; age and performance had essentially no association across competitors as a whole. If that doesn’t surprise you a bit, it should. After all, look at the ages of the competitors:
Age Distribution (n = 771)
As you can see, there was plenty of dispersion from the mean of 32.3 years (Standard Deviation = 8.4). We’re not just talking about a few fast 40-something and 50-something year-olds: Half of the 771 competitors were between 31 and 59 years of age. Somehow they fared about as well as the younger half in the 14 to 30 range, although the very fastest and very slowest racers were young. My guess is that the older group had a more athletic background than average, compensating for the mild slowing effect of aging. In fact I now suspect that in most amateur races the older competitors will have the same average speed as their younger counterparts due to self-selection effects.
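For readers without JMP, the linear fit and RSquare from the age plot can be reproduced by hand. What follows is a minimal sketch in plain Python, not the JMP procedure itself. The `linear_fit` helper and the sample ages and times are made up for illustration and are not the race data.

```python
from statistics import mean

def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for an ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 is the share of the variation in y that the line accounts for.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared

# Hypothetical ages (years) and finishing times (seconds), NOT the race results.
ages = [22, 25, 31, 38, 42, 47, 53]
times = [2100, 2650, 1900, 2400, 2050, 2700, 2200]

slope, intercept, r2 = linear_fit(ages, times)
print(f"slope = {slope:.2f} sec/year, R^2 = {r2:.3f}")
```

An R^2 near zero, like the .004 reported above, means the slope tells you almost nothing about any individual racer's time.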
Does a similar story about the compensating effect of self-selection apply to the women racers? No: the differences between the sexes were very noticeable. Here are the histograms of the female and male times, where the heights of the bars again reflect frequencies.
Female Times (in sec, N = 372, 1 outlier excluded)
Male Times (in sec, N = 395, 3 outliers excluded)
Above the bars, the median and mean are marked with a line and a diamond, respectively; between the sexes, the medians differed by about 5 minutes and the means by about 4 minutes. I’d attribute most of that difference to physiology. The distribution for female times was roughly symmetrical whereas the male distribution was right-skewed. There are many ways to interpret this, but my tentative guess is that relatively slow men were more interested in the event than relatively slow women. This guess fits with the marketing style of the event, which emphasized toughness in a way that I think appealed more to guys. Just to drive home the idea that sex predicted performance more systematically than age, compare these two graphs (on the same scale):
Race Time By Sex
Race Time By Age Block
The red boxes enclose the middle 50% of the race times and the red middle line marks the median time. The boxes match much more closely on the right (between age groups) than on the left (between sexes). Of course, matching up box-and-whisker plots isn’t a precise business, so I sought a final verdict with the “difference value” (d), equal to the difference in group means divided by the standard deviation including both groups. For sex, d is 0.56, which Lise Eliot calls “medium”-sized. For age group, it is 0.03, which is tiny. To give some perspective, the d value for height difference between sexes is about 2.6. For scores on standardized science tests, or on evaluations of verbal fluency, d is about 0.35. Since I’m wading into mildly scandalous territory, I’ll throw in the usual caveat: differences within sexes are always much larger than differences between sexes. Only trivial exceptions come to mind (e.g., number of breasts or testicles). An additional, more specific caveat is that males and females at the race almost surely weren’t representative of the American or global population. If for some twisted reason all adults in the U.S. were forced to compete in a Gladiator 5k, I bet females could really give males a run for their money in average time because a higher percentage of males are overweight.
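The “difference value” is easy to compute yourself. The sketch below takes one common reading of “the standard deviation including both groups,” namely the pooled standard deviation used in Cohen's d. That reading is an assumption, and so are the function name `difference_value` and the sample times, which are invented rather than taken from the race.

```python
from math import sqrt
from statistics import mean, variance

def difference_value(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation.

    Assumption: "standard deviation including both groups" is read as the
    pooled SD (Cohen's d); the original analysis may have used a variant.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical finishing times in seconds, NOT the race results.
group_one = [2400, 2550, 2300, 2700, 2450]
group_two = [2150, 2300, 2050, 2600, 2250]

print(f"d = {difference_value(group_one, group_two):.2f}")
```

The sign of d only reflects which group is listed first; the size is what gets labeled tiny, medium, or large.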
The fun in graphing and analyzing data is that previously obscure patterns reveal themselves in fine detail. Even if this Gladiator data isn’t especially intriguing for those of you who weren’t there at the race, I hope I’ve reminded you of the power of statistics to organize information. Next time you find yourself fishing morsels out of a table, especially if that information has a personal connection to you, try graphing it!