Index:
1. Introduction (Link to Part 1)
2. Data Analysis
2.1 Classification: Infrastructure Development (Link to Part 2)
2.2 Classification: Cards Played (Link to Part 3)
2.3 Separating players with TrueSkill (Current Article)
3. Analysis for Boardgamers
3.1 Infrastructure Development (Link to Part 4)
3.2 Cards Played (Link to Part 5)
2. Data Analysis
2.3 Separating players with TrueSkill
TrueSkill is a rating system designed to estimate the relative skill of players in games with non-deterministic outcomes. We use the default model of the Python implementation, which starts each player at a skill of 25 with a standard deviation of 8.33. Since we continue to separate "good" and "bad" performances in a game at 90% of the winner's score, we update TrueSkill by the same standard: every player with a "good" performance in a game "defeats" every player with a "bad" performance.
We go through the results of 30k games to update every player's TrueSkill.
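The update rule above can be sketched in Python roughly as follows. This is a minimal sketch, not the exact code behind this article. It assumes the open-source `trueskill` package, represents each game as a hypothetical dict mapping player name to final score, and treats players within the same group as tied, which is only one possible reading of the rule. The helper names `performance_ranks` and `update_ratings` are made up for illustration.

```python
from collections import defaultdict


def performance_ranks(scores, threshold=0.9):
    """Map each player to rank 0 ("good") or 1 ("bad").

    A performance is "good" if the score reaches `threshold` times
    the winner's score. `scores` is a dict of player -> final score.
    """
    cutoff = threshold * max(scores.values())
    return {p: 0 if s >= cutoff else 1 for p, s in scores.items()}


def update_ratings(games, rate=None, new_rating=None):
    """Run through all games once and return player -> rating.

    `rate` and `new_rating` default to the `trueskill` package
    (assumed installed); they are parameters only so that the
    bookkeeping can be tried without it.
    """
    if rate is None or new_rating is None:
        import trueskill  # third-party: pip install trueskill
        rate = rate or trueskill.rate
        new_rating = new_rating or trueskill.Rating  # mu=25, sigma=25/3

    ratings = defaultdict(new_rating)
    for scores in games:
        ranks = performance_ranks(scores)
        if len(set(ranks.values())) < 2:
            # Everyone is "good": nobody defeated anybody, so this
            # sketch skips the game (an assumption).
            continue
        players = list(scores)
        # Each player is a one-person team; a lower rank beats a higher
        # one, and equal ranks count as a draw (also an assumption).
        groups = [(ratings[p],) for p in players]
        new_groups = rate(groups, ranks=[ranks[p] for p in players])
        for p, (r,) in zip(players, new_groups):
            ratings[p] = r
    return dict(ratings)
```

Calling `update_ratings(games)` with the games in chronological order would give each player's final rating. Whether tied players should be rated as a draw or left untouched relative to each other is a design choice that changes the numbers slightly.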
The final distribution of TrueSkill looks like a good bell curve with a few sharp spikes of noise. Meanwhile, it is also fun to see how many games each player has played.
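See how this almost looks like an x*y = const. curve? If x is the number of games played and y is the number of players with that count, then x*y is the total number of seats those players fill. A constant product therefore means that, on average, every game has a roughly equal share of experienced and inexperienced players.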
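A quick way to check this reading of the curve is to add up, for each experience level, how many game seats are filled by players at that level. The sketch below assumes the chart plots the number of games played (x) against the number of players with that count (y). The function name and the toy numbers are illustrative and not taken from the data set.

```python
from collections import Counter


def seats_by_experience(games_played):
    """Total seats filled by players at each experience level.

    `games_played` holds one entry per player: how many games that
    player has played. With x = games played and y = number of such
    players, the seats filled at level x are simply x * y.
    """
    players_per_level = Counter(games_played)
    return {x: x * y for x, y in sorted(players_per_level.items())}


# Toy population following y = 12 / x: 12 players with 1 game, 6 with 2, ...
toy = [1] * 12 + [2] * 6 + [3] * 4 + [4] * 3
print(seats_by_experience(toy))  # {1: 12, 2: 12, 3: 12, 4: 12}
```

In the toy population every experience level fills the same number of seats, which is what a constant x*y implies.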
With the TrueSkill information, we can look into a few more interesting statistics. Cross-referencing these statistics gives us more strategic insight. Note that I am using the final TrueSkill rather than the TrueSkill as it evolves along the way. The implicit assumption is that players who have played many games stay at the same skill level for the majority of those games. The final state is thus roughly the equilibrium state, and the entire history of games I included is also in the equilibrium state.
First of all, since TtA has hidden information and multi-player interaction, "good moves" do not always win the game. Instead of classifying with respect to the outcome of a game, we can classify with respect to the skill of a player. This answers the question "what do better players do?" Perhaps better players have already seen through the luck-dependent noise and know how to perform well consistently.
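The difference between the two classification targets can be made concrete with a small sketch. The record layout, the player names, and the numbers are hypothetical, and using 25 (the default starting skill) as the cut between stronger and weaker players is an assumption made here for illustration.

```python
def outcome_label(score, winner_score, threshold=0.9):
    """1 if this performance counts as "good" in its game, else 0."""
    return int(score >= threshold * winner_score)


def skill_label(final_mu, baseline=25.0):
    """1 if the player's final TrueSkill is above the baseline, else 0.

    The baseline of 25 is an assumption, not a value from the article.
    """
    return int(final_mu > baseline)


# Hypothetical records: (player, score, winner_score, final TrueSkill mean)
records = [
    ("alice", 185, 200, 31.2),  # stronger player, good result
    ("bob", 120, 200, 29.5),    # stronger player, bad result
    ("carol", 200, 200, 22.0),  # weaker player, good result
]
labels = {name: (outcome_label(score, top), skill_label(mu))
          for name, score, top, mu in records}
print(labels)  # {'alice': (1, 1), 'bob': (0, 1), 'carol': (1, 0)}
```

The same features from a game can then be trained against either label: the first asks "did this work out?", the second asks "is this what a stronger player does?".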
For example, in this chart we compare the prediction of the outcome with the prediction of player skill, both based on infrastructure development. We can see that at the beginning, stronger players are already doing something measurably different from weaker players, even though the outcome of the game is still quite unclear. Then, for a long time, until almost the end of the game, we cannot see any further difference between stronger and weaker players. All we know is that as the game progresses, it becomes clearer and clearer who will win.
Also, as discussed earlier, classifying with respect to the outcome can sometimes be biased by the post-selection effect. A card can appear to be good simply because it is the icing on the cake, rather than the true key to victory. With the cross-referencing method above and an appropriate choice of card subsets, we can try to remove the post-selection effect. We can also recognize important lessons that even good players have not yet seen.
We can also calculate the average TrueSkill of the players in a game. If it is larger than 25, we call it a "good game"; otherwise we call it a "bad game". We can then repeat the previous analysis on the "good games" only to see if there is any difference. This can answer the question "how do you win when playing with good players?" and show whether that is very different from the situation with typical players.
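The "good game" filter can be sketched in a few lines of Python. The data layout (a list of player-name lists plus a dict of final TrueSkill means) and the function names are assumptions for illustration; only the threshold of 25 comes from the text above.

```python
from statistics import mean


def is_good_game(player_skills, baseline=25.0):
    """True if the players' average TrueSkill is above the baseline."""
    return mean(player_skills) > baseline


def good_games(games, final_skill, baseline=25.0):
    """Keep only the games whose average player skill beats the baseline.

    `games` is a list of player-name lists; `final_skill` maps a player
    name to that player's final TrueSkill mean. Both are hypothetical
    data layouts.
    """
    return [g for g in games
            if is_good_game([final_skill[p] for p in g], baseline)]
```

Running the earlier classification on `good_games(...)` only, instead of on all games, gives the comparison described above.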