StackRating

An Elo-based rating system for Stack Overflow

Reputation vs Rating

Reputation is a measure of how much the community has appreciated a user’s contributions (which strongly correlates with how much time the user has spent on the site). The reputation system’s primary purpose is to assess a user’s trustworthiness and to automatically grant moderation privileges. For details, see “What does reputation really mean and do you pay attention to anyone’s but your own?”.

StackRating, on the other hand, measures a user’s “skill”, which in this context means a user’s ability to provide answers that the community appreciates.

How the rating is computed

Each user starts with a rating of 1500. When a user answers a question, his/her rating gets a positive update if the answer receives more upvotes than the other answers to the same question, and a negative update if it receives fewer. The magnitude of the update depends on the other answerers’ ratings, which determine whether the outcome was expected.

Example: You and Jon Skeet each post an answer to the same question. Jon’s answer gets 20 upvotes; yours gets 5. Since Jon beat you, he gets a small positive update, and you get a small negative update. The updates are small because the result was expected given your prior ratings. If, on the other hand, you beat Jon, you get a large positive update, and Jon gets a large negative update.

The algorithm is called the Elo rating system, after Arpad Elo, who invented it to rate chess players.
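For reference, the textbook two-player Elo update can be sketched as follows. This is a minimal sketch of classic Elo, not StackRating’s code: it assumes the conventional logistic expected score with a 400-point scale (this page does not state the scale StackRating uses), and the function names and example ratings are hypothetical.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected result (between 0 and 1) of a player against an opponent.

    Assumption: the conventional Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating: float, opponent_rating: float, score: float, k: float) -> float:
    """Rating change for a single game.

    score is the actual result (1 = win, 0.5 = draw, 0 = loss);
    k is the maximum possible change for one game.
    """
    return k * (score - expected_score(rating, opponent_rating))


if __name__ == "__main__":
    you, jon = 1500.0, 2200.0  # hypothetical ratings
    k = 4.0
    # Expected outcome (Jon wins): both updates are small.
    print(elo_update(jon, you, 1.0, k), elo_update(you, jon, 0.0, k))
    # Upset (you win): both updates are large, close to k.
    print(elo_update(you, jon, 1.0, k), elo_update(jon, you, 0.0, k))
```

The gain for an unexpected win approaches K, while the gain for an expected win approaches zero, which matches the behavior described in the Jon Skeet example.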

Generalization to N players

The original formula is defined for two players only. Since any number of users can answer the same question, the algorithm has been generalized as follows: if four users answer the same question, the algorithm treats this as six separate games in which each player is compared with every other player. This gives three rating updates per player, and the final rating update is the average of these three. In general, N answers yield N(N − 1)/2 games and N − 1 updates per player.
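The pairwise generalization can be sketched like this: every pair of answerers plays one game, and each user’s final update is the mean of his/her N − 1 game updates. This is a sketch under stated assumptions: it uses the conventional 400-point Elo curve, a single fixed K, and, in the usage example, plain win/draw/loss scoring from raw vote counts rather than StackRating’s finer-grained scoring function. The names `multiplayer_updates` and `win_draw_loss` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List


def expected_score(rating: float, opponent_rating: float) -> float:
    """Conventional Elo expected score (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def multiplayer_updates(
    ratings: Dict[str, float],
    score: Callable[[str, str], float],
    k: float,
) -> Dict[str, float]:
    """Average pairwise Elo updates for all answerers of one question.

    ratings maps each user to a current rating; score(a, b) returns a's
    result against b (1 = win, 0.5 = draw, 0 = loss). Users with no
    opponent (a lone answerer) get no update and are omitted.
    """
    deltas: Dict[str, List[float]] = {user: [] for user in ratings}
    for a, b in combinations(ratings, 2):  # N(N - 1)/2 games
        s_a = score(a, b)
        deltas[a].append(k * (s_a - expected_score(ratings[a], ratings[b])))
        deltas[b].append(k * ((1.0 - s_a) - expected_score(ratings[b], ratings[a])))
    # Each user played N - 1 games; the final update is their mean.
    return {user: sum(d) / len(d) for user, d in deltas.items() if d}


if __name__ == "__main__":
    votes = {"alice": 12, "bob": 5, "carol": 5, "dave": 0}  # hypothetical
    ratings = {user: 1500.0 for user in votes}

    def win_draw_loss(a: str, b: str) -> float:
        if votes[a] > votes[b]:
            return 1.0
        return 0.5 if votes[a] == votes[b] else 0.0

    # Six games; prints {'alice': 2.0, 'bob': 0.0, 'carol': 0.0, 'dave': -2.0}
    print(multiplayer_updates(ratings, win_draw_loss, k=4.0))
```

Averaging (rather than summing) keeps a single question from moving a rating by more than K, no matter how many answers it attracts.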

Scoring

Elo’s rating system awards 1, ½ and 0 points for a win, a draw and a loss, respectively. When applying the formulas to answers on Stack Overflow, we can leverage the actual votes and get finer granularity. The function that StackRating uses gives you

This ensures that the user with the top answer always gets a positive rating update, and that a close runner-up doesn’t get a large negative update.

A downvote counts as -1 upvote and an accept counts as +1 upvote.
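The vote adjustment just described amounts to the following small helper. The name `effective_votes` is hypothetical; the sketch covers only the downvote and accept rule and not the scoring function built on top of it.

```python
def effective_votes(upvotes: int, downvotes: int, accepted: bool) -> int:
    """Vote count used when comparing two answers.

    Each downvote counts as -1 upvote, and an accepted answer
    gets one extra upvote.
    """
    return upvotes - downvotes + (1 if accepted else 0)


if __name__ == "__main__":
    # 10 up, 2 down, accepted: 10 - 2 + 1
    print(effective_votes(10, 2, True))
```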

Convergence

The key property of the Elo rating system is that ratings converge: once you’ve answered enough questions, your rating will reflect your actual ability, and you can expect it to be fairly stable. The graph below illustrates this nicely:


The rating of cnicutar

How fast the rating converges depends on the K-value, which represents the maximum update for a single game. A high value gives volatile ratings that converge quickly, and a low value gives stable ratings that converge slowly. It’s common practice to let this value depend on how many games a player has played. StackRating uses the following function:

$$
K = \begin{cases}
8 & \text{if you have posted fewer than 100 answers} \\
1 & \text{if your opponent has posted fewer than 100 answers} \\
4 & \text{otherwise}
\end{cases}
$$
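The piecewise K-value can be written as a small function. This sketch assumes the cases are checked in the order listed, so a newcomer gets K = 8 even when the opponent is also new; the page does not say how that overlap is resolved. The name `k_factor` is hypothetical.

```python
def k_factor(my_answer_count: int, opponent_answer_count: int) -> int:
    """K-value for one pairwise game, from one player's point of view.

    Assumption: the cases are evaluated top to bottom.
    """
    if my_answer_count < 100:
        return 8  # you are new: large updates so your rating converges fast
    if opponent_answer_count < 100:
        return 1  # opponent is new: their rating is unreliable, so move yours little
    return 4  # both established


if __name__ == "__main__":
    # The same game uses different K-values for the two sides.
    print(k_factor(20, 5000), k_factor(5000, 20))
```

Note that K is asymmetric within a game: when a newcomer meets an established user, the newcomer’s rating moves quickly while the established user’s rating barely moves.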

Further adjustments

To mitigate the Fastest Gun in the West Problem (answers posted first tend to collect more votes simply by being first), upvotes are normalized by the age of the answer.

New answers posted to old questions (“Here’s the new Java 8 way of doing this…”) might get an unfair advantage. To avoid this, only answers posted within 3 months of the question are taken into consideration.
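The answer-age cutoff can be sketched as a simple filter. This sketch assumes “3 months” means 90 days, since the page gives no exact cutoff. The function `answers_in_window` and the `(answer_id, created)` tuple layout are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Assumption: "3 months" approximated as 90 days.
ANSWER_WINDOW = timedelta(days=90)


def answers_in_window(
    question_created: datetime,
    answers: Iterable[Tuple[str, datetime]],
) -> List[str]:
    """Return the ids of answers posted within the window after the question."""
    return [
        answer_id
        for answer_id, created in answers
        if created - question_created <= ANSWER_WINDOW
    ]


if __name__ == "__main__":
    asked = datetime(2014, 1, 1)
    answers = [("early", datetime(2014, 1, 2)), ("late", datetime(2014, 6, 1))]
    print(answers_in_window(asked, answers))  # only "early" is kept
```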

How to interpret the rating

A user’s rating increases if he/she provides an answer to a question that gets more upvotes than the other answers to the same question. In other words, the rating reflects the user’s ability to provide answers that “end up on top”. The obvious follow-up question is: How well does a user’s ability to provide highly upvoted answers reflect his/her actual proficiency?

In my experience with Stack Overflow, the best answers typically end up on top. (While it is not hard to find questions where the objectively best answer is in second or third place, these are the exceptions, and the Elo rating system is stable enough not to wreak havoc in such cases.) This reduces the question to: Does the ability to answer programming questions reflect programming proficiency? Personally, I’d say yes, absolutely. The ability to describe a technical topic correlates with how well you understand it, and even if you’re really good at inventing algorithms and fine-tuning assembly code, what good is that if you can’t explain your work to a fellow programmer? Communication (which Stack Overflow happens to be all about) is key!

Finally, I’ve looked through numerous users with high and low ratings, and my (completely anecdotal) observation is that the rating reflects quality and proficiency really well.

Live Monitoring

The Stack Overflow site is monitored continuously through the StackExchange API (thanks, Sanjiv, for the Java API!). It currently scans all questions from the past 90 days at least once every 24 hours.

Links to relevant StackExchange posts

About the webpage

Original idea and code by me, aioobe. Lots of valuable feedback from dacwe.

If you want to write a Greasemonkey script that embeds ratings on Stack Overflow, or create a badge or something similar, you can use stackrating.com/rating/<userid>.

Source code available on GitHub: https://github.com/aioobe/stackrating.