Knowing one’s exact win probability at any stage of a curling game is not possible. However, it can be estimated using the past results from similar cases. If you’re up 2 with 6 ends to go with hammer, the past can inform us about your future.
In the history of curling there have been tens of thousands of such situations in competitive events. But since the five-rock rule went into effect for the 2018-19 season, there have been 1,109 cases, with the hammer team winning 85.8% of the time.
That’s a useful approximation, but it’s not that useful. We should at least separate men’s and women’s. (For brevity, I will look at the men.) On the men’s side, there have been 639 cases, with the hammer team winning 86.5% of the time. But that still includes everything from the lowest levels of juniors to the world championships.
The value of having last rock affects the chance of winning, and that value increases as teams get better. As a result, a lead is more valuable in games between elite-level curlers.
We can look at games based on skill level, though. That was one of the motivations for creating the doubletakeout.com team rating system. The problem is this situation has occurred just 41 times in games between top-25 teams. Amazingly, the hammer team has won all 41 times. Based on this, in a game involving elite men’s teams, if you are down 2 without hammer entering the third end of an eight-end game, you might as well concede.
Except I don’t think anyone would do that even if they knew the recent history. That 41-game winning streak is a fluke. The easiest way to show this is by looking at the same situation one end later. In that situation, the hammer team has gone 36-5 (86.1%). Hopefully we can agree that being up 2 with hammer is not truly a better situation with six ends left than five ends left.
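One way to see how little a 41-0 record pins down is to put an interval around it. The sketch below uses the Wilson score interval, a standard textbook formula that is not part of the method described in this post; the function name and the 95% level (z = 1.96) are choices made for illustration.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z = 1.96 is roughly 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1 + z**2 / games
    center = (p_hat + z**2 / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / games + z**2 / (4 * games**2)) / denom
    )
    return center - half_width, center + half_width


# The 41-0 record of hammer teams up 2 with six ends left (top-25 men's games).
low, high = wilson_interval(41, 41)
print(f"plausible range: {low:.1%} to {high:.1%}")
```

For a perfect 41-0 record the lower end of that range comes out near 91%, so the sample alone cannot distinguish a sure thing from a situation the trailing team wins about one time in eleven.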
If you want to create win probabilities for elite curling, you have to contend with such inconsistencies if you just use the raw data from the past. Fortunately, there’s a way around this to produce more realistic numbers.
The best way to illustrate this method is to look at the distribution of scores in the third end when a team was up 2 with hammer. (Negative values indicate steals.)
score cases
-3 1
-2 0
-1 2
0 4
1 10
2 12
3 7
4 5
If you want more proof that a concession would be silly, in one of the 41 cases, the winning team gave up a steal of 3 and trailed going into the fourth end. We know a team is an underdog in that situation. Even in the two cases where the winning team gave up a steal of one, they were in a much more vulnerable situation heading into the fourth end, leading by just one. And after the four cases of blanks, we’ve established that teams have gone 36-5 in that situation. So there are clearly possibilities for victory by the trailing team even if they haven’t been observed recently.
It’s pretty clear the true win probability isn’t 100% in this situation, but then how does one come up with a better estimate?
Well, we don’t assume that the team giving up a steal of 3 wins 100% of the time, which is the implied assumption if one uses raw historical data. That team is now down one with hammer, so we assume they win as often as all teams in that situation have won, regardless of how they got there. The same principle is applied to all other scenarios.
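The principle in the paragraph above can be written as a recursion. This is a sketch in notation introduced here for illustration (the symbols W, p_s, m and n do not come from the original analysis), and it assumes the usual hammer rules: the team that scores gives up last rock, and a blank or a steal leaves it where it was.

```latex
% W(m, n): chance that the team holding hammer wins when it leads by m
%          (m < 0 if it trails) with n ends left.
% p_s:     chance that the hammer team's score in the next end is s
%          (s < 0 is a steal); in practice p_s can depend on the situation.
\[
W(m, n) = \sum_{s \le 0} p_s \, W(m + s,\; n - 1)
        + \sum_{s > 0} p_s \, \bigl[\, 1 - W\bigl(-(m + s),\; n - 1\bigr) \bigr]
\]
% With no ends left, W(m, 0) = 1 for m > 0 and W(m, 0) = 0 for m < 0;
% a tie goes to an extra end.
```

In the steal-of-3 example, m = 2 and s = -3, so the hammer team's chance becomes W(-1, n - 1): down one with hammer, valued the same way no matter how the game got there.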
Once we work through all of the possibilities (by simulating thousands of games for each case), we get a win probability of 95.1% for the case of a team being up 2 with hammer with six ends left. So definitely don’t concede if you’re behind in that case. A 4.9% chance isn’t much but it’s well above the concession threshold unless you have a plane to catch. The same situation with five ends left actually has a probability of 96.3%, so that 86.1% figure from the raw data was more of a fluke than the 100% we got with six ends left.
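The simulation step can be sketched in a few lines of Python. This is a toy version, not the code behind the numbers above: it assumes the single score distribution from the third-end table applies to every end and every situation, whereas the real analysis uses history specific to each situation, so it will not reproduce the 95.1% figure. The function names, the seed and the default of 20,000 games are illustrative choices.

```python
import random

# Third-end score counts from the table above, seen from the hammer team
# (negative values are steals by the team without hammer).
SCORE_COUNTS = {-3: 1, -2: 0, -1: 2, 0: 4, 1: 10, 2: 12, 3: 7, 4: 5}


def simulate_game(margin, ends_left, rng, counts=SCORE_COUNTS):
    """Play out one game; True if the team that starts with hammer wins.

    `margin` is that team's lead (negative if it trails).
    ASSUMPTION: the same score distribution is reused for every end.
    """
    scores = list(counts)
    weights = list(counts.values())
    has_hammer = True  # does the tracked team currently hold last rock?
    # Play the scheduled ends, then extra ends for as long as the game is tied.
    while ends_left > 0 or margin == 0:
        score = rng.choices(scores, weights=weights)[0]
        margin += score if has_hammer else -score
        if score > 0:
            has_hammer = not has_hammer  # scoring with hammer gives it up
        # A blank (0) or a steal (< 0) leaves the hammer where it was.
        ends_left -= 1
    return margin > 0


def estimate_win_prob(margin, ends_left, n_games=20_000, seed=0):
    """Monte Carlo estimate of the starting hammer team's win probability."""
    rng = random.Random(seed)
    wins = sum(simulate_game(margin, ends_left, rng) for _ in range(n_games))
    return wins / n_games


if __name__ == "__main__":
    print(f"up 2 with hammer, six ends left: {estimate_win_prob(2, 6):.3f}")
```

Each simulated game walks from one situation to the next using only the current margin, ends remaining and hammer, which is the "regardless of how they got there" assumption in action. Swapping in a separate score distribution for each situation is what would move this toy toward the published tables.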
I apply that method to every combination of margin/ends/hammer to produce the tables shown here. It’s called a Markov chain Monte Carlo method (I don’t recommend following that link), but “true” win probability is better for marketing. I also split each gender into three flavors: elite (top 25), super serious teams that can hang with the elite teams (ranked 26-100) and serious teams who have full-time jobs and can’t really hang with elite teams (ranked 101 and worse).
Using this method, there are few logical inconsistencies in the results. It gives a pretty realistic picture of a team’s chance of winning in various situations based on its skill level. One thing the MCMC approach also helps with is minimizing the effect of the better team usually starting with hammer. It doesn’t completely eliminate the issue, but the assumption in these probabilities is that teams of equal strength are playing, and the MCMC gives win probabilities closer to that ideal.
I’m going to wrap it up here. Now we have solid team ratings, sharper win probabilities, and a mountain of shot data. It’s a three-pronged arsenal for better curling analysis going forward.