Update:
The server has been running smoothly for a week, and I'm finishing the TrueSkill implementation for the leaderboard.
What is TrueSkill? One of the biggest features of my project!
TrueSkill
---------
Why choose TrueSkill over Elo?
The Elo system has many flaws. The first is that it cannot handle anything other than head-to-head games.
Team Games
------------
With Elo, team games must be two-sided, such as 2vs2 or 3vs3; it can't rate 2vs2vs2. Each team is also treated as a single player, meaning that the leaderboard shows a rating for each pair of players, for every pair that ever exists.
TrueSkill can handle any matchup. A team's skill is the weighted sum of the skills of its players, and the team result is correctly propagated back to the players in the team. It can also easily handle FFA, 2vs2vs2 and more.
Draws
-------
Elo can't handle draws correctly. For Elo, a draw is half a win and half a loss, and that's it.
TrueSkill takes draws very seriously.
Each map has a draw percentage based on all the games played on that map.
TrueSkill considers a draw a meaningful outcome: you were matched with an equally skilled opponent.
Let's consider two players with the same skill.
On a "normal" map where draws are unlikely, a drawn game leads to no change in skill, but the system learns the players better (their skill estimates become more certain, see below).
That result also increases the draw probability on that map.
Now take a map like "Winter Duel", where 80% of games end in a draw.
As a draw is the expected result, the skill won't move, and the system learns much less about the players, so that game means little compared to the "normal" map case.
That result also increases the draw probability on Winter Duel.
Now let's say that player 1 wins.
On the "normal" map, the winner gains some points. Let's say he gains +4 points, and the loser gets -4.
That result also decreases the draw probability on that map.
Now, on the Winter Duel map: as a draw was highly expected, the fact that player 1 won means that he is actually WAY better than his opponent. On that map, instead of +4, he gains +6, and the loser gets -6.
That result also decreases the draw probability on Winter Duel.
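To make the draw behaviour above concrete, here is a minimal Python sketch of the standard two-player TrueSkill update with a draw margin. It is not the project's code: the function name `rate_1v1`, the default `beta` of 25/6 and the 25-point skill scale are assumptions for illustration, and the dynamics factor (tau) is left out.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution: pdf, cdf, inv_cdf


def rate_1v1(winner, loser, draw_probability, drawn=False, beta=25 / 6):
    """Return updated (mean, deviation) pairs for a two-player game.

    winner and loser are (mean, deviation) tuples. When drawn is True the
    order of the two players does not matter. Hypothetical helper.
    """
    (mu_w, sd_w), (mu_l, sd_l) = winner, loser
    c = sqrt(2 * beta**2 + sd_w**2 + sd_l**2)
    # Draw margin derived from the map's draw probability, scaled by c.
    eps = STD.inv_cdf((draw_probability + 1) / 2) * sqrt(2) * beta / c
    t = (mu_w - mu_l) / c
    if drawn:
        area = STD.cdf(eps - t) - STD.cdf(-eps - t)
        v = (STD.pdf(-eps - t) - STD.pdf(eps - t)) / area
        w = v * v + ((eps - t) * STD.pdf(eps - t)
                     + (eps + t) * STD.pdf(-eps - t)) / area
    else:
        v = STD.pdf(t - eps) / STD.cdf(t - eps)
        w = v * (v + t - eps)
    # v moves the means, w shrinks the deviations.
    new_w = (mu_w + sd_w**2 / c * v, sd_w * sqrt(1 - sd_w**2 / c**2 * w))
    new_l = (mu_l - sd_l**2 / c * v, sd_l * sqrt(1 - sd_l**2 / c**2 * w))
    return new_w, new_l


a = b = (25.0, 25 / 3)  # two players with the same skill
for name, p_draw in (("normal map", 0.10), ("Winter Duel", 0.80)):
    print(name, "win :", rate_1v1(a, b, p_draw))
    print(name, "draw:", rate_1v1(a, b, p_draw, drawn=True))
```

In this sketch a draw between equal players leaves both means untouched on either map, the deviation shrinks less on the high-draw map, and a win on the high-draw map moves the means further than a win on the normal map.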
Inflation
----------
The Elo system tends to inflate ratings over time.
Because it only compares two players' ratings to determine a new rating, a better player who plays often gains more and more points over time.
Think of the GPGNet ladder. At the beginning, the top 10 players were around 1900.
At the "end", they were around 2500.
Does that mean that their skill increased? Maybe. But not that much.
The ratings increased because, like all good top-tier players, they play often. And as they are good, they win games and gain points, so their rating increases over time.
To reduce inflation, the Elo system has a "K-factor", which limits the maximum number of points a player can gain from a single game. GPGNet had a default K-factor of 30, if I remember correctly.
The chess leaderboard varies the K-factor with the rating of the player (a 2400+ player has a K-factor of 16, where a noob has 32).
That's arbitrary and not accurate, and it only artificially reduces the inflation problem. It's still there.
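For reference, the usual Elo update can be sketched in a few lines of Python. The function names are made up for this example; the formula is the classic one with a 400-point scale, and the K-factor is simply a multiplier on the difference between the actual and the expected score.

```python
def elo_expected(rating, opponent):
    """Expected score (between 0 and 1) of a player against an opponent."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))


def elo_update(rating, opponent, score, k=32):
    """New rating after one game. score: 1 = win, 0.5 = draw, 0 = loss."""
    return rating + k * (score - elo_expected(rating, opponent))


print(elo_update(1500, 1500, 1.0, k=32))  # win between equals, K = 32
print(elo_update(1500, 1500, 1.0, k=16))  # same win, K = 16
print(elo_update(1500, 1500, 0.5, k=32))  # draw between equals
```

A player can never gain more than K points from one game, and a draw between equal players changes nothing at all, which is the "half-win, half-loss" treatment described earlier.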
TrueSkill is less sensitive to inflation. When you start a game, TrueSkill computes the possible, and probable, outcome of the game: it estimates your chances of winning.
Let's say it predicts that you will win that game.
If you really win the game, as it was the expected outcome, you gain points depending on the "chances of ranking" factor (itself depending on the difference in skill between the players, and the outcome probability).
If you lose the game, which is an unexpected result, you lose more points.
On paper, it sounds a lot like Elo, but the algorithms behind it are more advanced, and once you reach your real rating, unless you play really badly or improve a lot, you will stay at that rank.
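As a rough illustration of that prediction step, the Python sketch below computes a 1v1 win probability from two (mean, deviation) pairs and the change in mean for an expected win versus an upset. It ignores draws and the dynamics factor, and the names `win_probability` and `mean_change`, the `beta` default and the sample ratings are assumptions for this example only.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def win_probability(player, opponent, beta=25 / 6):
    """Chance that player beats opponent, ignoring draws."""
    (mu_p, sd_p), (mu_o, sd_o) = player, opponent
    c = sqrt(2 * beta**2 + sd_p**2 + sd_o**2)
    return STD.cdf((mu_p - mu_o) / c)


def mean_change(player, opponent, player_won, beta=25 / 6):
    """Change in the player's mean after a decisive game (no draw margin)."""
    (mu_p, sd_p), (mu_o, sd_o) = player, opponent
    c = sqrt(2 * beta**2 + sd_p**2 + sd_o**2)
    t = (mu_p - mu_o) / c
    if not player_won:
        t = -t  # look at the game from the winner's side
    v = STD.pdf(t) / STD.cdf(t)  # grows as the result gets more surprising
    change = sd_p**2 / c * v
    return change if player_won else -change


favourite, underdog = (30.0, 3.0), (25.0, 3.0)
print(win_probability(favourite, underdog))
print(mean_change(favourite, underdog, player_won=True))   # small gain
print(mean_change(favourite, underdog, player_won=False))  # larger loss
```

The favourite gains little for the expected win and loses more for the upset, which is the asymmetry described above.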

That graph represents TrueSkill ratings for football teams. As you can see, team 1 is the best, and their rank stays stable.
Team 5 had a bad start (they lost their first games), and was badly rated. Over time, TrueSkill managed to correct that and find a stable skill.
Team 2 is the most interesting case: it's a new team, and a really good one. At the start, it was rated way under its real skill.
But you can notice how fast the system was able to find its real place!
That's another advantage of TrueSkill: it rates well and fast!
TrueSkill advantages
----------------------
TrueSkill can rate ANY game.
That's why ANY game, custom or ranked, will contribute to your skill rating.
TrueSkill can lower the impact of a result: an FFA is less meaningful than a 1v1, so the outcome of an FFA will contribute less.
Of course, there is a separate 1v1 leaderboard, but any game rates you.
That means that EVERYONE will have a rating.
That means that TrueSkill can show you the ratings of the players when you join ANY game.
That way, you can know whether the game you are about to play will be balanced for you or not.
Better!
As explained before, TrueSkill computes the likely outcome of a game before it is played.
But it can also compute a match quality factor: how well the game is balanced.
That means that for a team game, TrueSkill can give you the best team combination in order to balance the teams!
Of course, it doesn't take into account the fact that some players play better in a team, or when teamed with a particular player, or play awfully on that map or in that particular spot. But hopefully, as every game contributes to your rating, these abnormalities will smooth out and the game quality factor can be trusted.
Some examples:
(mean is your skill and standardDeviation is how well the system knows you: the lower it is, the closer your real skill is to the mean)
Code:
2 vs 2 :
player 1 : mean=20.0000, standardDeviation=8.0000
player 2 : mean=25.0000, standardDeviation=6.0000
player 3 : mean=27.0000, standardDeviation=8.0000
player 4 : mean=40.0000, standardDeviation=5.0000
configuration : 2 teams
the best composition for teams is
team 1
player 1(mean=20.0000, standardDeviation=8.0000)
player 4(mean=40.0000, standardDeviation=5.0000)
team 2
player 3(mean=27.0000, standardDeviation=8.0000)
player 2(mean=25.0000, standardDeviation=6.0000)
Game Quality : 45.7996125513%
3 vs 3 :
player 1 : mean=20.0000, standardDeviation=8.0000
player 2 : mean=25.0000, standardDeviation=6.0000
player 3 : mean=27.0000, standardDeviation=8.0000
player 4 : mean=40.0000, standardDeviation=5.0000
player 5 : mean=32.0000, standardDeviation=2.0000
player 6 : mean=24.0000, standardDeviation=4.0000
configuration : 2 teams
the best composition for teams is
team 1
player 4(mean=40.0000, standardDeviation=5.0000)
player 1(mean=20.0000, standardDeviation=8.0000)
player 6(mean=24.0000, standardDeviation=4.0000)
team 2
player 2(mean=25.0000, standardDeviation=6.0000)
player 3(mean=27.0000, standardDeviation=8.0000)
player 5(mean=32.0000, standardDeviation=2.0000)
Game Quality : 57.6735412498%
in 2 vs 2 vs 2 :
(same players as 3v3)
configuration : 3 teams
the best composition for teams is
team 1
player 2(mean=25.0000, standardDeviation=6.0000)
player 3(mean=27.0000, standardDeviation=8.0000)
team 2
player 4(mean=40.0000, standardDeviation=5.0000)
player 1(mean=20.0000, standardDeviation=8.0000)
team 3
player 6(mean=24.0000, standardDeviation=4.0000)
player 5(mean=32.0000, standardDeviation=2.0000)
Game Quality : 30.2684790963%
That information would be available at any time in the Forged Alliance lobby, and in the list of games!
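The two-team balancing in the listings above can be sketched in Python as a brute-force search over every equal split, scored with the two-team match quality formula from the Moser write-up. The names `match_quality` and `best_two_teams` are made up for this example, and `beta = 25/6` is an assumption; with that value the formula appears to give the same quality figures as the 2 vs 2 and 3 vs 3 listings (about 45.8% and 57.7%). The 2 vs 2 vs 2 case needs the general multi-team formula and is not covered here.

```python
from itertools import combinations
from math import exp, sqrt

BETA = 25 / 6  # assumed performance spread, not taken from the post


def match_quality(team_a, team_b, beta=BETA):
    """Two-team match quality in [0, 1] for lists of (mean, deviation)."""
    players = team_a + team_b
    n_beta_sq = len(players) * beta**2
    total_var = n_beta_sq + sum(sd**2 for _, sd in players)
    mean_gap = sum(m for m, _ in team_a) - sum(m for m, _ in team_b)
    return sqrt(n_beta_sq / total_var) * exp(-mean_gap**2 / (2 * total_var))


def best_two_teams(players, beta=BETA):
    """Try every equal split of players (name -> (mean, deviation))."""
    names = sorted(players)
    first, rest = names[0], names[1:]
    best = None
    # Pinning the first player to team A skips mirrored duplicates.
    for mates in combinations(rest, len(names) // 2 - 1):
        team_a = (first,) + mates
        team_b = tuple(n for n in names if n not in team_a)
        quality = match_quality([players[n] for n in team_a],
                                [players[n] for n in team_b], beta)
        if best is None or quality > best[0]:
            best = (quality, team_a, team_b)
    return best


players = {
    "player 1": (20.0, 8.0),
    "player 2": (25.0, 6.0),
    "player 3": (27.0, 8.0),
    "player 4": (40.0, 5.0),
}
quality, team_a, team_b = best_two_teams(players)
print(team_a, "vs", team_b, f"quality {quality:.2%}")
```

Brute force is fine for lobby-sized games: a 4 vs 4 has only 35 distinct splits to score.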
Now for the core of TrueSkill. You can stop reading here if you are not interested in the deep details.

In TrueSkill, skill is split into two variables. One is your estimated skill.
The other is the uncertainty about your skill.
For example, when you start, we assume that you are an average player, meaning a skill of 1500 (this is an arbitrary value matching GPGNet values).
But as the system doesn't know you, you have an uncertainty value of 500.
That means, more or less, that your real skill is between 1000 and 2000.
That's a big range, and that's why a skill with an uncertainty value of 500 means nothing to the matchmaking system. (So you can be matched with almost anyone until the system learns about you.)
The more you play, the less uncertain the value of your skill will be. If you are a perfect robot playing other perfect robots with various skills, your uncertainty value will decrease toward 0, meaning that your estimated skill gets closer and closer to your real skill (meaning that 100% of the games you play have the outcome predicted before the start of these games).
But that's for your skill, not your rank. To rank you, TrueSkill uses a conservative estimate of your skill: your estimated skill minus a multiple of your uncertainty. That multiplier is 3.
For a skill of 1500 and an uncertainty of 500, that means that your rank will be 1500 - 3*500 = 0.
TrueSkill always ranks you with a value that is likely lower than your true skill, and your true skill is unlikely to be worse than it: you can think of your rank as your skill on a really bad day.
That's not really fair, but that way we can compare players efficiently: given two ranks, 1600 and 1650, we can be almost sure that the 1650 player is more likely to be slightly better than the one at 1600.
The TrueSkill ladder will be more "compact" in its numbers, but a small difference in these numbers has a bigger impact than in Elo.
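The rank arithmetic above is simple enough to sketch in Python. The function name and the sample ladder are invented for the example; only the "skill minus 3 times uncertainty" rule comes from the text.

```python
def conservative_rank(skill, uncertainty, multiplier=3):
    """Displayed rank: estimated skill minus 3 times the uncertainty."""
    return skill - multiplier * uncertainty


# Hypothetical ladder entries: name -> (skill, uncertainty)
ladder = {
    "newcomer": (1500, 500),  # default values for a new player
    "veteran": (1400, 50),    # lower skill estimate, but well known
}
ranked = sorted(ladder, key=lambda n: conservative_rank(*ladder[n]),
                reverse=True)
for name in ranked:
    print(name, conservative_rank(*ladder[name]))
```

Here the veteran is listed above the newcomer even though his estimated skill is lower, because the system is much more certain about him.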
More details here :
http://research.microsoft.com/en-us/projects/trueskill/
Even more details here :
http://www.moserware.com/2010/03/comput ... skill.html
My implementation is the Moser one. It will be open-sourced separately to the FA Forever project.