Sunday, May 19, 2013

USTA Dynamic NTRP Accuracy Part 2

I wrote yesterday on the accuracy of my Estimated Dynamic NTRP Ratings using matches played in the first round of the Northern California Sectionals.  The ratings went 8-2 predicting winners, so let's move on to round 2 and see how they fared.

In one team match, there were the following results:

  • A 4.55 beat a 4.67 6-3, 6-0 - Big miss on this prediction, but see below
  • Court 2 singles was a default
  • A 4.40/4.69 beat a 4.30/4.37 6-2, 6-4 - Pretty much as predicted
  • A 4.63/4.52 beat a 4.49/4.51 6-3, 6-2 - A little more convincing than expected but close
  • A 4.61/4.43 beat a 4.04/4.24 6-3, 6-2 - As expected
So here the ratings went 3-1, again not bad.  Note that in the first match, I'm told the losing player had an elbow injury in the first set and played the second set with his off hand, which would explain a lot.  Based on this, I'd be inclined to throw this match out and consider the record 3-0.


The other team match had:
  • A 5.05 beat a 4.77 6-2, 6-2 - Predicted correctly
  • A 4.66 beat a 4.51 6-4, 2-6, 1-0 - Also correct, albeit the loser won more games
  • A 4.61/4.37 beat a 4.81/4.15 6-3, 4-6, 1-0 - Close as expected but correct
  • A 4.49/3.99 beat a 4.36/4.33 7-5, 6-2 - An upset, but the 3.99 is underrated
  • A 4.52/4.47 beat a 4.61/4.57 4-6, 6-4, 1-0 - Another upset, but close
The ratings didn't fare quite as well here going 3-2.


So, with the injury match thrown out, a total of 6-2 for this round, or 14-4 overall.  Again, not too bad.  There will always be some variance in play and results based on how players match up, so predictions like this will never be perfect, but 78% correct certainly isn't bad.
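The arithmetic behind that percentage can be checked with a short sketch. The records simply restate the tallies given in these two posts; the function and variable names are only for illustration.

```python
# (correct, missed) records taken from the two posts.
round_1 = (8, 2)
round_2_first_match = (3, 0)   # injury-affected court thrown out
round_2_second_match = (3, 2)

def combine(*records):
    """Sum (correct, missed) pairs into one overall record."""
    correct = sum(r[0] for r in records)
    missed = sum(r[1] for r in records)
    return correct, missed

def accuracy(record):
    """Fraction of predictions that were correct."""
    correct, missed = record
    return correct / (correct + missed)

overall = combine(round_1, round_2_first_match, round_2_second_match)
print(overall, f"{accuracy(overall):.0%}")

# Same tally, but counting the injury-affected court as a miss.
with_injury = combine(round_1, (3, 1), round_2_second_match)
print(with_injury, f"{accuracy(with_injury):.0%}")
```

Counting the injury-affected court as a miss instead gives 14-5, or about 74%.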

Saturday, May 18, 2013

How accurate is the USTA's Dynamic NTRP at predicting tennis matches?

There is an interesting debate on the Tennis Warehouse Talk Tennis forum about whether or not the NTRP system can predict matches.  I weighed in early to say the overall system wasn't designed to do predictions.  I said this because the goal of the system is to promote competitive play, and the USTA doesn't publish the detailed ratings that would facilitate predictions.  The truth is that the algorithm itself in essence does predict a match result and makes adjustments to each player's rating based on how much variance there is from the prediction.  Note that the system uses game differential, not wins and losses, so it doesn't actually predict who will win or lose a match but who will win more games.

Now, doing this prediction does require having the actual dynamic rating, and not just the level the players are playing at, so my Estimated Dynamic NTRP is an excellent way to see how well the "predictions" fare.  For our sample, I'm going to use some matches played at the Northern California Sectionals going on this weekend.  I've done some analysis for one of the teams playing there, so I have already been looking at the matches in some detail.

In one team match there were the following results:

  • A 4.77 beat a 4.40 6-2, 6-0 - Pretty much on target and expected
  • A 4.51 beat a 4.45 6-0, 4-6, 1-0 - Ratings predict a close win and it happened
  • A 4.57/4.37 beat a 4.67/4.37 7-6, 6-4 - A close match predicted, but went to the lower rated team
  • A 4.47/4.52 beat a 4.30/4.04 6-1, 6-1 - Pretty much expected
  • A 4.52/4.64 beat a 4.36/4.40 6-3, 7-6 - Also pretty much expected.
So my estimated ratings went 4-1 and the miss wasn't radically off.

Another team match had the following results:
  • A 4.81 beat a 4.09 6-3, 6-3 - Expected a worse beating, but still got it right
  • A 4.66 beat a 4.55 7-6, 2-6, 1-0 - Got the win right, though not quite in the way that would have been predicted
  • A 4.40/4.69 beat a 5.05/4.15 6-3, 1-6, 1-0 - The lower rated team won so a miss but the higher rated team did win more games so the algorithm was right in a way.
  • A 4.61/4.43 beat a 4.49/3.99 6-3, 6-7, 1-0 - As predicted
  • A 4.64/4.61 beat a 4.51/4.52 3-6, 6-2, 1-0 - As predicted
Again, my ratings went 4-1, the miss actually being "right" on game differential though.  Also, matches with that big a gap between partners are a wildcard: if the more evenly matched team can find the weaker player, the other team's advantage is nullified.

Overall, 8-2 in predicting matches.  I'll take that.
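As a rough check on that tally, here is a minimal sketch that picks the favorite in each of the ten matches above and counts how often the favorite won. The ratings are copied from the two lists. Treating a doubles team's strength as the plain average of the two partners' ratings is an assumption made for this illustration, not a description of how the USTA combines them.

```python
# Each entry: (winner's ratings, loser's ratings), copied from the lists above.
# Singles sides have one rating, doubles sides have two.
matches = [
    ((4.77,), (4.40,)),
    ((4.51,), (4.45,)),
    ((4.57, 4.37), (4.67, 4.37)),
    ((4.47, 4.52), (4.30, 4.04)),
    ((4.52, 4.64), (4.36, 4.40)),
    ((4.81,), (4.09,)),
    ((4.66,), (4.55,)),
    ((4.40, 4.69), (5.05, 4.15)),
    ((4.61, 4.43), (4.49, 3.99)),
    ((4.64, 4.61), (4.51, 4.52)),
]

def side_rating(ratings):
    """Assumed side strength: the plain average of the players' ratings."""
    return sum(ratings) / len(ratings)

def predicted_correctly(winner, loser):
    """True when the side with the higher average rating actually won."""
    return side_rating(winner) > side_rating(loser)

correct = sum(predicted_correctly(w, l) for w, l in matches)
print(f"{correct}-{len(matches) - correct}")
```

Under that averaging assumption the sketch reproduces the 8-2 record, with the two misses being the doubles courts called out above.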

Look for more on the other matches later.

Friday, May 17, 2013

Estimated Dynamic NTRP Ratings for Southwest Washington Posted

My Estimated Dynamic NTRP ratings for Southwest Washington have now been posted.  This list joins the Northwest Washington list and both are for matches played thru April 1st of this year.

If you are interested in an updated rating thru a more recent date and/or a full Estimated Dynamic NTRP report, see this example and contact me.

Tuesday, May 14, 2013

It is USTA League playoff time, Estimated Dynamic NTRP reports are in demand

It is the middle of Spring and in most areas of the country, the USTA Adult 18 and over and 40 and over leagues are in full swing and/or nearing the end of the regular season.  Some sections have even started having their local/district playoffs and preparing for sectionals.

This has generated a renewed interest in my Estimated Dynamic NTRP reports and I've had requests for Captains and Team reports as well as other custom reports.

I worked with one captain who was getting ready for the District playoffs and wanted to get an idea of where his team stood (including how many strikes one self-rated player might have) as well as the opponents they were going to face.  With reports in hand, they were able to anticipate the opposition line-ups and get the match-ups they wanted.  They won Districts and are now off to Sectionals, where they are already equipped with the same information.  After Districts they were kind enough to send me this note:

"Your ratings really helped me with lineups for local playoffs. I don’t think we would have won without them."

I've also had other teams with a large number of self-rated players get team reports to get an idea on what if any strikes the players might have.

If you are just interested in what your rating is and if you are on course to get bumped up/down, or if you are on a team headed to playoffs and want a full team report or reports on future opponents, contact me and we'll get you hooked up.  For several sections, I can turn these around pretty quickly but should be able to do them for most any section/district.

Friday, May 3, 2013

Analysis of a strange Dynamic NTRP DQ in USTA League

A challenge with any rating system is how to rate new players/teams.  Do they get a default starting rating and if so, what is the default?  Or is there no initial rating and it gets generated from the first few matches/games?  In either case, should iterations be performed where the rating at some point in time is used as the starting rating?

In the case of the algorithm the USTA uses for calculating dynamic ratings as part of the NTRP, all indications are that new players, those that self-rate to join a league, do not have a rating and get their initial rating from their first few matches.  I've heard that come year-end rating time, a second pass through the calculations may be done that uses the player's end of year dynamic rating and this probably does make sense.

This is all fine and good, but in a league where the purpose of the rating is to promote competitive play, how can the league keep ringers or sand-baggers from self-rating too low so as to dominate their play?  This is where the NTRP system has provisions for disqualifying a player at a given level if their play demonstrates they self-rated too low.  Specifically, it is a 3 strikes system where if their rating exceeds a particular threshold 3 times, they get DQ'd at the level in question.
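The three-strikes idea can be sketched in a few lines. This is only an illustration of the rule as described above: the ratings and the threshold in the example are hypothetical, and whether the USTA applies any further conditions to a strike is not something I know.

```python
def find_dq_match(dynamic_ratings, threshold, strikes_needed=3):
    """Return the 1-based number of the match that triggers a DQ, or None.

    A strike is counted each time the dynamic rating exceeds the threshold.
    """
    strikes = 0
    for match_number, rating in enumerate(dynamic_ratings, start=1):
        if rating > threshold:
            strikes += 1
            if strikes == strikes_needed:
                return match_number
    return None

# Hypothetical ratings and threshold, for illustration only.
example_ratings = [4.40, 4.71, 4.55, 4.68, 4.70]
print(find_dq_match(example_ratings, threshold=4.65))
```

In this made-up example the second, fourth, and fifth ratings are over the threshold, so the fifth match is the one that triggers the DQ.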

By and large, the system works pretty well.  The USTA sets the thresholds such that "natural improvement" is allowed, particularly at lower levels, without a disqualification occurring.  This means, though, that some players who likely are playing below level may eke by and not get DQ'd, especially if they resort to the less than honorable practice of manipulating the system by keeping matches close or throwing matches to keep their rating low.

But occasionally, even with the thresholds allowing for improvement, a player gets DQ'd where on the surface, they shouldn't be.  One such situation was brought to my attention this week and I thought I'd share my analysis of why it happened.

The player in question self-rated as a 4.5 and proceeded to play in an 18 and over 4.5 league.  He played 6 matches going 2-4 with the following scores, all in doubles:

  • 1/13 - Lost 2-6, 7-6, 1-0
  • 2/2 - Lost 6-0, 6-4
  • 3/10 - Won 6-2, 6-4
  • 3/14 - Lost 3-6, 6-1, 1-0
  • 4/12 - Won 7-5, 6-3
  • 4/28 - Lost 7-6, 6-0

It appears the DQ occurred after the 4/12 match (his "D" rating date is 4/19) so somehow, in going 2-3, he generated 3 strikes.  How could this happen?  He certainly doesn't appear to be dominating his league (he has lost more than he has won, in fact), and he isn't playing up (the usual way to get DQ'd or bumped up).

The key is that the NTRP system doesn't look at wins and losses, instead it looks at games won/lost and the current ratings of all players involved in the match.  Ok, so the 2-3 (2-4) record doesn't strictly matter, but in only 3 of the first 5 matches that appear to have generated the 3 strikes did he even win more games than the opponents.  Does simply winning more games result in a strike?
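The game counts behind that observation can be tallied with a short sketch. The scores are copied from the list above and are written winner-first. Counting a 1-0 match tiebreak as a single game is an assumption made for this illustration.

```python
# (date, result for the player, set scores written winner-first), from the list above.
matches = [
    ("1/13", "L", [(2, 6), (7, 6), (1, 0)]),
    ("2/2",  "L", [(6, 0), (6, 4)]),
    ("3/10", "W", [(6, 2), (6, 4)]),
    ("3/14", "L", [(3, 6), (6, 1), (1, 0)]),
    ("4/12", "W", [(7, 5), (6, 3)]),
    ("4/28", "L", [(7, 6), (6, 0)]),
]

def game_differential(result, sets):
    """Games won minus games lost from the player's point of view.

    Scores are winner-first, so the sign is flipped when the player lost.
    Assumption: a 1-0 match tiebreak counts as a single game.
    """
    winner_games = sum(w for w, _ in sets)
    loser_games = sum(l for _, l in sets)
    diff = winner_games - loser_games
    return diff if result == "W" else -diff

diffs = [(date, game_differential(result, sets)) for date, result, sets in matches]
for date, diff in diffs:
    print(f"{date}: {diff:+d}")

won_more = sum(1 for _, diff in diffs[:5] if diff > 0)
print(f"Won more games in {won_more} of the first 5 matches")
```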

Not necessarily.  We still need to dig deeper, specifically at the ratings of his partner and opponents.  To aid in that, here is his performance chart.


The first match, despite being a loss, generated a pretty good match rating, over 4.5.  This is for two reasons.  First, his team won 2 more games than the opponents due to the first set.  Second, and more importantly, he played with a 4.0 partner (I had his rating at 3.8) against two 4.5s (4.21 and 4.01).  The NTRP computer calculates that, in order to play with a weaker partner and win more games than stronger opponents, his rating had to have been the just over 4.5 you see in the chart.

In the second match, he loses pretty badly, but he is still playing with the same 4.0 partner and this time the opponents are stronger 4.5s (4.43 and 4.23).  Despite the 8 game deficit, because of his weak partner, they were supposed to lose and he still generates a pretty good match rating, just under 4.5.

Now, it is doubtful that either of these matches generated a strike.  My guess is that the threshold for a 4.5 is somewhere around 4.65 so both of these were ok.

The third match was the big one.  He and the same 4.0 partner win 2 & 4 against two 4.5s (4.08 and 4.40) and the computer says for that to happen, he would have had to have a rating of 5.0!  This match also generates his first full dynamic rating of 4.66 which is likely a strike.

The fourth and fifth matches are more of the same, although in the fourth he plays with a different partner.  However it is again with a 4.0 (3.97) and against two strong 4.5s and is reasonably close.  The fifth match is back with the first 4.0 partner and is a win.  These two matches keep his rating right around that 4.65 number and were likely the 2nd and 3rd strikes.
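To make the reasoning in the first three matches concrete, here is a rough sketch of the kind of calculation involved. It is not the USTA's formula, which isn't published, and it isn't exactly how my estimates are produced either. It assumes a team's strength is the sum of the two ratings and that each game of differential is worth a fixed number of rating points. The 0.05 value is a hypothetical constant picked only because it roughly reproduces the match ratings described above, and the partner's rating is held at 3.8 for all three matches for simplicity.

```python
# HYPOTHETICAL constant: rating points credited per game of differential.
# Chosen only because it roughly reproduces the match ratings described above.
POINTS_PER_GAME = 0.05

def sketch_match_rating(partner, opponents, game_diff):
    """Rating the player would need for the result to be 'as expected'.

    Assumes team strength is the sum of the two ratings and that the expected
    game differential grows linearly with the gap between the teams.
    """
    required_team_total = sum(opponents) + POINTS_PER_GAME * game_diff
    return required_team_total - partner

# Partner and opponent ratings are the estimates quoted in the text;
# game differentials are from the player's point of view.
first = sketch_match_rating(3.80, (4.21, 4.01), +2)
second = sketch_match_rating(3.80, (4.43, 4.23), -8)
third = sketch_match_rating(3.80, (4.08, 4.40), +6)
print(f"{first:.2f} {second:.2f} {third:.2f}")
```

Under these assumptions the three matches come out near 4.52, 4.46, and 4.98, in line with the "just over 4.5", "just under 4.5", and "5.0" described above.  Treat the formula and the constant as illustration only.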

So what have we learned?  The key thing is if you are self-rated and play with a weaker partner, particularly one that is playing up, it is going to be easy to get a relatively high rating, especially if the opponents are stronger and results are competitive.  Had this player had a partner that was stronger and had the same results, he would not have been DQ'd.

Was this DQ just?  I don't know, but it can certainly be explained as I've done above, and if he isn't a 5.0, he seems to be a pretty strong 4.5, as he is able to carry weaker partners to pretty good results against good to very good 4.5 opponents.

Interested in seeing your own performance chart and estimated rating?  See this for an example and contact me to learn more.

Update: I received word that the 3/10, 3/24, and 4/12 matches were indeed the strikes, so my estimates seem to be pretty accurate.

Friday, April 26, 2013

Feedback on Estimated Dynamic NTRP Reports and Lists

It is great to get feedback from report customers and other folks that browse the lists I post.  I thought I'd share a few things that might be of interest to others.

First, I posted a complete list of ratings for players in Northwest Washington through April 1st, but I received questions about why some players weren't listed.  In most cases, it is because the player either hasn't played any matches since November 1, 2012, or because the player is self-rated and hasn't played enough matches yet to generate a dynamic rating.  The trick in this latter case is that if a self-rated player plays against another self-rated player, no match rating is generated and so it doesn't contribute to generating their first dynamic rating.

Second, my Estimated Dynamic NTRP reports are useful in many ways, from giving you a good idea what your current rating is, to seeing how your rating has changed and how each match affected your rating.  But for one individual I did a report for this week, the report pointed out a data entry error that was significantly affecting their rating.

Take a look at the chart below.



That first match looks way out of whack. While every player has good days and bad days, and match-ups against specific players or styles of play are better/worse for a given player, the highs and lows you typically see for a player aren't more than 0.5 apart. But this first match is a good 1.5 points below many of the other match results.

It turns out this first match wasn't played by the player I was generating the report for; his name had gotten entered by mistake. But my report clearly pointed this out and he is now working to get the match entered correctly in TennisLink.
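The same kind of check can be sketched in code: flag any match rating that sits far from the player's typical level. The match ratings below are hypothetical, and using 0.5 as a cutoff from the median is an assumption borrowed loosely from the typical spread mentioned above, not a rule from my reports.

```python
from statistics import median

def flag_outliers(match_ratings, max_gap=0.5):
    """Return indexes of match ratings more than max_gap from the median."""
    center = median(match_ratings)
    return [i for i, r in enumerate(match_ratings) if abs(r - center) > max_gap]

# Hypothetical match ratings: the first sits about 1.5 below the rest.
example = [2.05, 3.55, 3.40, 3.62, 3.48, 3.58]
print(flag_outliers(example))
```

Here only the first entry is flagged, which is the cue to go back and check how that match was entered.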

Keep the feedback and questions coming. It is appreciated.

Thursday, April 25, 2013

What matches are strikes in USTA leagues?

I just had an inquiry from a player that had been DQ'd for generating 3 strikes.  She actually knew she'd been playing above level and had joined a higher level team already, but she had me use my Estimated Dynamic NTRP ratings to guess which matches her strikes were.

Turns out, I was very close.  I identified a first match that was likely a strike, two others that definitely were, and a fourth that would be the third strike if the first one wasn't.

I was wrong on the first, but right on the next three.  This probably just means that my guess at the strike threshold for her level is off (the USTA seems to give a big margin for improvement at lower levels before generating strikes), so I'm pleased my estimated ratings are so close to being accurate.

If you think you might have strikes and want to know, an Estimated Dynamic NTRP Report is a great way to find out.  Contact me if you are interested.