In the introduction to my ratings, I used the example of Krista McCarville to show how the Bradley-Terry system can properly evaluate a team that played a limited schedule against mostly inferior competition. It was a heartwarming tale but the fact is that I am heavily incentivized to show the success stories of my ratings. However, any responsible creator of a model, or any kind of tool, really, should be asking ‘what can go wrong?’ when developing a new piece of technology.
One example of something potentially going wrong in the current ratings is the second-ranked team on the women’s side, Tabitha Peterson. The last time we saw Peterson’s team was at the women’s world curling championships in Prince George, B.C. Well, we didn’t actually see them because like every other sporting event in North America, the event was cancelled after Rudy Gobert’s Covid-19 diagnosis, two days before it was supposed to begin.
If this blog had been around then, it would have previewed the event using the most recent ratings, and Peterson’s team would have had the second-highest chance to win gold. This is noteworthy on two levels. First, Tabitha Peterson’s team didn’t exist until Nina Roth left the former Team Roth for maternity leave in October. (Technically, Team Peterson was still Team Roth at the end of the season, too, but we will get to that momentarily.) And second, it’s been a while since American curling has been relevant on the world scene.
The last American women’s medal at the world championships was in 2006, and an American women’s team has never medaled at the Olympics. The only gold medal in 47 combined worlds and Olympics was won in 2003 when Debbie McCormick’s squad stunned Colleen Jones in the finals in Winnipeg. So just the thought that an American team would be a threat for a world title would have been news.
In fact, I suspect that while many were aware of Peterson’s late-season success, few would have had any combination besides Anna Hasselborg and Kerri Einarson as their top two choices for gold. And most would have had Muirhead, Fujisawa, and possibly Kovaleva ahead of Peterson as well. Part of the issue is that Team Peterson was really Team Roth, and Team Roth was barely in the top ten of the WCT points standings.
If forced to bet a large chunk of my personal fortune, I would have had to confess that I, too, did not believe Peterson had the second-best chance to win the worlds. But here’s the thing: After Roth’s departure, Team Peterson really did play like one of the best teams in the world. Over 33 games, the team lost just six times and most impressively, it went 6-2 against the rest of the top-ten. Here are those results:
1 Hasselborg W
1 Hasselborg L
3 Einarson W
5 Fleury W
6 Tirinzoni W
7 Kovaleva W
9 Jones W
10 Fujisawa L
As in any sport, beating the best teams is difficult, but it seems particularly so in curling. Curling is top-heavy, but even the top of curling is top-heavy. Hasselborg and Einarson went a combined 29-15 against their fellow top-ten teams. Just four other teams managed a winning record against the top-ten (minimum three games) and three of those were only one game above .500. The fourth was Peterson.
Eight games isn’t a large sample, but because it’s hard to fake it against the best curlers in the world, it’s useful in this case. Rachel Homan went 11-10 against the top-ten. Tracy Fleury went 12-13. Jennifer Jones went 10-20. Going 6-2 tells us something.
Chelsea Carey, the reigning Canadian champ, went 6-17 against the top-ten. Peterson could have lost her next 14 games against the top-ten and still had a better record than Carey, who was ranked five spots better than the old Team Roth in the final DoubleTakeout.com ratings. Just based on that information, we have a compelling case that Peterson was a serious medal threat at worlds.
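The arithmetic behind that comparison is easy to check. Here is a minimal sketch that uses only the two records quoted above; the helper name `win_pct` is made up for illustration.

```python
def win_pct(wins, losses):
    """Winning percentage for a win-loss record."""
    return wins / (wins + losses)

peterson = (6, 2)   # Peterson vs. the top-ten
carey = (6, 17)     # Carey vs. the top-ten

# Count how many straight losses Peterson could absorb while
# still keeping a strictly better record than Carey.
extra_losses = 0
while win_pct(peterson[0], peterson[1] + extra_losses + 1) > win_pct(*carey):
    extra_losses += 1

print(extra_losses)                        # losses Peterson could absorb
print(win_pct(6, 2 + extra_losses))        # Peterson's record after them
print(win_pct(*carey))                     # Carey's record
```

At 6-16, Peterson would still sit a hair above Carey's 6-17; one more loss would only tie it.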
Still, I have a nagging feeling that this might be too good to be true. It’s hard to believe that after exchanging one player, Team Peterson was so much better than 21st-ranked Team Roth. If this were the case, we might have expected Nina Roth to be exiled to the funspiel scene this season. But once curling resumes in the U.S., Roth will be back throwing third on Team Peterson. That tells us that Nina Roth brings something valuable to the team.
So we are left to spend idle moments pondering whether Peterson’s performance was legit or unsustainable. Consider a spectrum with fluke on one end and real on the other. Peterson’s performance exists somewhere in that realm. Regardless of where it is, I think it’s fun for the game. Sports are most compelling when there is mystery.
If Peterson’s performance was heavily flukey then she fooled us for 33 games against very good competition. Maybe you, too, can play over your head for an extended period of time! That’s motivation for everyone. Good teams can play great over a few months. Maybe the sport isn’t so top-heavy after all.
On the other hand, if Peterson was truly the best team in North America on March 12, that’s pretty fun, too. With just a change or two to your team’s approach, a good team can become great. Teams can surprise us. Who doesn’t like surprises?
The next update to the ratings will be to link fragmented teams together and I suspect when that is finished, the combined Peterson/Roth entry will be towards the bottom of the top-ten. Based on all available information, that seems like the most realistic guess for Team Peterson. They may not have been one of the favorites to win a medal at worlds, but they were damn close.
Regardless, this is a good lesson in sample sizes. Even a half-season of information leaves us with an incomplete picture of a team. This justifies using more than one season to give the ratings more stability. That’s why we could have more confidence in the rating for Team McCarville than the rating for Team Peterson. Even though McCarville had also only played a partial season, we had a previous season of her games to anchor her rating a little better.
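One rough way to see how much a small sample leaves open is to put an interval around a win-loss record. The sketch below uses a Wilson score interval, which is a generic statistics tool and not part of the ratings themselves. It assumes every game is an independent coin flip with the same win probability, which ignores opponent strength, so treat the numbers as illustration only. The function name `wilson_interval` is mine.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win probability."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half, center + half

# Peterson's 6-2 record against the top-ten.
low8, high8 = wilson_interval(6, 8)
# Peterson's overall 27-6 record over 33 games.
low33, high33 = wilson_interval(27, 33)

print(f"6-2:  {low8:.2f} to {high8:.2f}")
print(f"27-6: {low33:.2f} to {high33:.2f}")
```

The eight-game interval runs from roughly 0.41 to 0.93, about twice as wide as the 33-game one. More games narrow the range, which is the case for leaning on more than one season.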
Like any system, Bradley-Terry works best with more data. But since we’re only looking at game outcomes, we don’t just need more data, but more wins and more losses. Teams that don’t lose much (or at all) over a stretch force the system to guess a little more at a team’s true ability. And in fact, this issue comes up often for new teams early in the season. Win an early spiel against weak competition and the system doesn’t know if you’re the best team in the world or just barely better than your opponents.
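To see why an unbeaten stretch leaves the system guessing, here is a minimal sketch of the Bradley-Terry likelihood for a single team. It is not the code behind these ratings. The opponent ratings are held fixed, the numbers are invented, and the log-scale rating units are an assumption for illustration.

```python
import math

def log_likelihood(rating, results):
    """Bradley-Terry log-likelihood of one team's results.

    rating:  the team's strength on a log scale (hypothetical units).
    results: list of (opponent_rating, won) pairs; opponent ratings are fixed.
    """
    total = 0.0
    for opp, won in results:
        p_win = 1.0 / (1.0 + math.exp(opp - rating))
        total += math.log(p_win if won else 1.0 - p_win)
    return total

# Made-up example: five games against opponents rated 0.0.
undefeated = [(0.0, True)] * 5
four_and_one = [(0.0, True)] * 4 + [(0.0, False)]

candidates = [0.0, 1.0, 2.0, 4.0, 8.0]
ll_undefeated = [log_likelihood(r, undefeated) for r in candidates]
ll_mixed = [log_likelihood(r, four_and_one) for r in candidates]

# With no losses the likelihood keeps rising as the rating grows,
# so there is no finite best estimate.
print(ll_undefeated)
# A single loss gives the likelihood a peak at a finite rating.
best_mixed = candidates[ll_mixed.index(max(ll_mixed))]
print(best_mixed)
```

For the unbeaten team, every higher candidate rating fits the results at least a little better, so the data alone cannot separate "slightly better than the field" from "best in the world." One loss is enough to pull the best-fitting candidate back to a finite value.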
For these teams, we need a way to make a better guess at their rating. We need a way to rate them before they play any games. With Team Peterson, we can cheat and guess that they would play like Team Roth when they first started. But in other situations we are not so lucky. The next step is to implement that in the ratings. We’ll talk about how to do that soon.