Friday, May 3, 2013

Analysis of a strange Dynamic NTRP DQ in USTA League

A challenge with any rating system is how to rate new players/teams.  Do they get a default starting rating and if so, what is the default?  Or is there no initial rating and it gets generated from the first few matches/games?  In either case, should iterations be performed where the rating at some point in time is used as the starting rating?

In the case of the algorithm the USTA uses to calculate dynamic ratings as part of the NTRP, all indications are that new players (those who self-rate to join a league) do not have a rating and get their initial rating from their first few matches. I've heard that at year-end rating time, a second pass through the calculations may be done using each player's end-of-year dynamic rating as the starting rating, which does make sense.
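The USTA does not publish its formula, so the following is only a minimal sketch of the "no rating until the first few matches" idea. The cutoff `MIN_MATCHES = 3` and the plain average are assumptions for illustration (three matches lines up with the third match producing a first full dynamic rating, as discussed later in this post); the real calculation likely weights matches differently.

```python
from statistics import mean
from typing import Optional

# Hypothetical: match ratings needed before a self-rated player gets a
# dynamic rating. Not a published USTA value.
MIN_MATCHES = 3


def initial_dynamic_rating(match_ratings: list[float]) -> Optional[float]:
    """Return None until MIN_MATCHES match ratings exist, then the plain
    average of the first MIN_MATCHES of them (an assumed rule)."""
    if len(match_ratings) < MIN_MATCHES:
        return None
    return mean(match_ratings[:MIN_MATCHES])


print(initial_dynamic_rating([4.5, 4.4]))       # None: not enough matches yet
print(initial_dynamic_rating([4.0, 4.5, 5.0]))  # 4.5
```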

This is all fine and good, but in a league where the purpose of the rating is to promote competitive play, how can the league keep ringers or sandbaggers from self-rating too low so they can dominate? This is where the NTRP system has provisions for disqualifying a player at a given level if their play demonstrates they self-rated too low. Specifically, it is a 3-strikes system: if a player's rating exceeds a particular threshold 3 times, they get DQ'd at the level in question.
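The 3-strikes rule itself is simple bookkeeping. Here is a minimal sketch, assuming a strike is any rating strictly above the threshold. The default of 4.65 is only a guess for a 4.5 league (the USTA does not publish the exact value), and `dq_date` is a hypothetical helper name.

```python
from typing import Optional

# Assumed values: the exact USTA threshold is not public.
THRESHOLD = 4.65
STRIKES_TO_DQ = 3


def dq_date(ratings: list[tuple[str, float]],
            threshold: float = THRESHOLD,
            strikes_to_dq: int = STRIKES_TO_DQ) -> Optional[str]:
    """Walk (date, rating) pairs in order and return the date of the strike
    that triggers the DQ, or None if the player never reaches it."""
    strikes = 0
    for date, rating in ratings:
        if rating > threshold:  # the rating "exceeds" the threshold
            strikes += 1
            if strikes == strikes_to_dq:
                return date
    return None


# Made-up ratings: strikes fall on the 2nd, 3rd and 5th entries.
print(dq_date([("a", 4.50), ("b", 4.66), ("c", 4.70), ("d", 4.60), ("e", 4.80)]))
```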

By and large, the system works pretty well. The USTA sets the thresholds so that "natural improvement" is allowed, particularly at lower levels, without a disqualification occurring. This does mean, though, that some players who are likely playing below their level may eke by without getting DQ'd, especially if they engage in the less-than-honorable practice of manipulating the system by keeping matches close or throwing matches to keep their rating low.

But occasionally, even with the thresholds allowing for improvement, a player gets DQ'd where on the surface, they shouldn't be.  One such situation was brought to my attention this week and I thought I'd share my analysis of why it happened.

The player in question self-rated as a 4.5 and proceeded to play in an 18 and over 4.5 league. He played 6 matches, all in doubles, going 2-4 with the following scores (listed from the winning team's perspective, with a deciding match tiebreak shown as 1-0):

  • 1/13 - Lost 2-6, 7-6, 1-0
  • 2/2 - Lost 6-0, 6-4
  • 3/10 - Won 6-2, 6-4
  • 3/14 - Lost 3-6, 6-1, 1-0
  • 4/12 - Won 7-5, 6-3
  • 4/28 - Lost 7-6, 6-0
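Because the NTRP works from games rather than wins, it helps to tally the games in each match. A short sketch, assuming the scores above are written from the winning team's side and that a 1-0 match tiebreak counts as a single game, which is how the game margins discussed in this post add up:

```python
# Results as listed above. Set scores are from the winning team's point of
# view, and a deciding match tiebreak is recorded as a 1-0 "set".
MATCHES = [
    ("1/13", False, [(2, 6), (7, 6), (1, 0)]),
    ("2/2", False, [(6, 0), (6, 4)]),
    ("3/10", True, [(6, 2), (6, 4)]),
    ("3/14", False, [(3, 6), (6, 1), (1, 0)]),
    ("4/12", True, [(7, 5), (6, 3)]),
    ("4/28", False, [(7, 6), (6, 0)]),
]


def games(won: bool, sets: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (games for our player's team, games for the opponents),
    counting a match tiebreak as one game."""
    winner_games = sum(w for w, _ in sets)
    loser_games = sum(l for _, l in sets)
    return (winner_games, loser_games) if won else (loser_games, winner_games)


for date, won, sets in MATCHES:
    ours, theirs = games(won, sets)
    print(f"{date}: {'W' if won else 'L'} {ours}-{theirs} ({ours - theirs:+d})")
```

Running it gives margins of +2, -8, +6, -3, +5 and -7, so he won more games than his opponents in 3 of the first 5 matches.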

It appears the DQ occurred after the 4/12 match (his "D" rating date is 4/19), so somehow, in going 2-3, he generated 3 strikes. How could this happen? He certainly doesn't appear to be dominating his league; in fact, he has lost more matches than he has won, and he isn't playing up (the usual way to get DQ'd or bumped up).

The key is that the NTRP system doesn't look at wins and losses; instead, it looks at games won and lost and the current ratings of all players involved in the match. So the 2-3 record (2-4 overall) doesn't strictly matter, but in only 3 of the first 5 matches, the ones that appear to have generated the 3 strikes, did he even win more games than his opponents. Does simply winning more games result in a strike?

Not necessarily.  We still need to dig deeper, specifically at the ratings of his partner and opponents.  To aid in that, here is his performance chart.

The first match, despite being a loss, generated a pretty good match rating, over 4.5. There are two reasons for this. First, thanks to the lopsided first set, his team won 2 more games than the opponents (12 to 10, counting the match tiebreak as one game). Second, and more importantly, he played with a 4.0 partner (I had the partner's rating at 3.8) against two 4.5s (4.21 and 4.01). The NTRP computer calculates that for him to play with a weaker partner and still win more games than stronger opponents, his rating had to be just over 4.5, as shown in the chart.
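The USTA's formula is not public, but the logic of "weaker partner plus more games than stronger opponents implies a high rating" can be illustrated with a toy model. Everything here is an assumption, not the NTRP algorithm: a doubles team rates as the average of its two players, and the team's edge over its opponents grows linearly with its share of games above 50%, scaled by a made-up constant `GAP_PER_SHARE`.

```python
# Toy model only. Every constant and rule here is a hypothetical assumption.
GAP_PER_SHARE = 1.0  # made up: rating gap per unit of game share above 50%


def implied_rating(partner: float, opp1: float, opp2: float,
                   games_won: int, games_lost: int,
                   gap_per_share: float = GAP_PER_SHARE) -> float:
    """Back out the rating a player 'must' have had for this result, assuming
    (1) a doubles team rates as the average of its two players and
    (2) the team's edge is proportional to its game share above 50%."""
    share = games_won / (games_won + games_lost)
    opp_team = (opp1 + opp2) / 2
    team = opp_team + (share - 0.5) * gap_per_share
    # Solve team = (player + partner) / 2 for the player.
    return 2 * team - partner


# First match: 3.8 partner vs. 4.21 and 4.01, winning 12 games to 10.
print(round(implied_rating(3.8, 4.21, 4.01, 12, 10), 2))  # 4.51
```

With this arbitrary constant the first match happens to land just over 4.5, but that reflects the chosen constant, not the real formula. The part worth noticing is the last step: because the implied rating is twice the team rating minus the partner's, every 0.1 drop in the partner's rating raises the player's implied rating by 0.1 for the same score, and any edge the team shows over its opponents is credited to the player twice over.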

In the second match, he loses pretty badly, but he is still playing with the same 4.0 partner, and this time the opponents are stronger 4.5s (4.43 and 4.23). Despite the 8-game deficit, because of his weaker partner they were expected to lose, so he still generates a pretty good match rating, just under 4.5.

Now, it is doubtful that either of these matches generated a strike.  My guess is that the threshold for a 4.5 is somewhere around 4.65 so both of these were ok.

The third match was the big one. He and the same 4.0 partner win 6-2, 6-4 against two 4.5s (4.08 and 4.40), and the computer says that for this to happen, he would have had to have a rating of 5.0! This match also generates his first full dynamic rating, 4.66, which is likely a strike.

The fourth and fifth matches are more of the same, although in the fourth he plays with a different partner. However, that partner is again a 4.0 (3.97), the opponents are again two strong 4.5s, and the match is reasonably close. The fifth match is back with the first 4.0 partner and is a win. These two matches keep his rating right around that 4.65 mark and were likely the 2nd and 3rd strikes.

So what have we learned? The key takeaway is that if you are self-rated and play with a weaker partner, particularly one who is playing up, it is easy to get a relatively high rating, especially if the opponents are stronger and the results are competitive. Had this player had a stronger partner and the same results, he would not have been DQ'd.

Was this DQ just? I don't know, but it can certainly be explained as I've done above, and if he isn't a 5.0, he seems to be a pretty strong 4.5, since he is able to carry weaker partners to pretty good results against good to very good 4.5 opponents.

Interested in seeing your own performance chart and estimated rating?  See this for an example and contact me to learn more.

Update: I received word that the 3/10, 3/24, and 4/12 matches were indeed the strikes, so my estimates seem to be pretty accurate.