Thursday, February 1, 2018

A Data-Driven Strategy Guide for Through the Ages, Part 2

Index:

1. Introduction (Link to Part 1)
2. Data Analysis
    2.1 Classification: Infrastructure Development (Current Article)
    2.2 Classification: Cards Played (Link to Part 3)
    2.3 Separating Players with TrueSkill (Link to Part 6)
3. Analysis for Boardgamers
    3.1 Infrastructure Development (Link to Part 4)
    3.2 Cards Played (Link to Part 5)
    3.3 Mistakes made by Good Players (Link to Part 7)

2. Data Analysis

2.1 Classification based on Infrastructure Development

The first idea is a simple two-type classification---"good" or "bad".  We can then learn from the good behaviors and avoid the bad ones.  We will say a "good" outcome is >90% of the winner's score, and a "bad" outcome is anything below that. This threshold is motivated by the statistics of the data.
If we remove one winner from every game, the scores of the remaining players follow the above distribution.  We can see that the median of non-winners is about 80% of the winner's score, which should not count as "good".  Choosing 90% means that about 53% of the results are "good" (the actual winner included), and 47% are "bad".  This is a pretty comfortable ratio for a classifier without special tuning.  Also note that the game allows resigning; that accounts for the small bump around 0 in the above Figure.  Because resigned players did not finish the game, their data are incomplete, so all resigned players are removed from our consideration.  From 10k+ games, we get 30k+ results to classify.
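As a concrete illustration, this labeling rule can be sketched as follows. The function name and data layout are hypothetical, not the author's actual code; the 90% threshold and the removal of resigned players follow the text:

```python
def label_results(scores, resigned, threshold=0.9):
    """Label each finished player's result: 1 for "good", 0 for "bad".

    scores   -- final scores of one game's players
    resigned -- parallel booleans; resigned players have incomplete
                data and are dropped entirely
    """
    winner = max(scores)
    labels = []
    for score, quit_early in zip(scores, resigned):
        if quit_early:
            continue  # resigned: remove from consideration
        labels.append(1 if score >= threshold * winner else 0)
    return labels

# Hypothetical 4-player game: 112 is within 90% of the winner's 120,
# 95 is not, and the resigned player is dropped.
print(label_results([120, 112, 95, 60], [False, False, False, True]))
# → [1, 1, 0]
```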

Next, we look at the development of infrastructure.
This example shows the amount of food generated each round by each player in a 4-player game (note that grey resigned at round 15).  The same information is available for the 5 other aspects of the infrastructure.  This game lasted 17 rounds, which means the data dimension is 17*6 = 102.  We will not use the values in the above figure directly.  Since the final result is a relative quantity with respect to other players, it only makes sense for the input to be a relative quantity as well.  We normalize this vector by the mean of the game, indicating whether a player is doing relatively better or worse than the other players in the same game at the same round.
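A minimal sketch of this normalization, assuming the per-game data is stored as a (players, rounds, aspects) NumPy array (this layout is my assumption, not the author's code):

```python
import numpy as np

def normalize_by_game_mean(game):
    """Divide each value by the mean over the players of the same game
    at the same round and aspect, so values above 1 mean "better than
    the table average" and values below 1 mean "worse"."""
    mean = game.mean(axis=0, keepdims=True)       # shape (1, rounds, aspects)
    return game / np.where(mean == 0, 1.0, mean)  # guard against all-zero rounds

# Two players, one round, one aspect: 2 vs 4 food, table mean is 3.
print(normalize_by_game_mean(np.array([[[2.0]], [[4.0]]])))
```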

We are not ready to throw this into a machine-learning classifier yet.  The number of rounds varies from game to game, but a typical classifier wants all data points to have the same dimensionality.
We have two ways to circumvent this problem.

We first consider the entire game duration, but rescale it into 11 portions, independent of how many rounds there are. This gives us a 66-dimensional "infrastructure development" vector per player per game, and we can classify the final result accordingly. We use the support vector machine classifier (svm.SVC) from scikit-learn.  It is trained on a random subset of N points and validated on a disjoint subset of the same size. For 3k<N<10k, the validation performance stays around 73%, and the linear kernel performs as well as the nonlinear ones. (Exception: the sigmoid kernel performs no better than random guessing. I have not figured out why.)
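A sketch of this first method, assuming each game is a (rounds, 6) array of normalized per-round values. Resampling by linear interpolation is my guess at how the rescaling into 11 portions could be done; the original implementation may differ, and the toy data below is purely synthetic:

```python
import numpy as np
from sklearn import svm

def rescale_to_portions(series, n_portions=11):
    """Resample a (rounds, aspects) series onto n_portions equally
    spaced points spanning the whole game, then flatten into one
    feature vector of length aspects * n_portions."""
    rounds, aspects = series.shape
    old_x = np.linspace(0.0, 1.0, rounds)
    new_x = np.linspace(0.0, 1.0, n_portions)
    return np.concatenate(
        [np.interp(new_x, old_x, series[:, a]) for a in range(aspects)]
    )

# Toy stand-in data: 200 "games" of varying length, 6 aspects each.
rng = np.random.default_rng(0)
X = np.array([rescale_to_portions(rng.random((rng.integers(12, 20), 6)))
              for _ in range(200)])
y = rng.integers(0, 2, size=200)

clf = svm.SVC(kernel="linear")
clf.fit(X[:100], y[:100])               # train on a random subset...
accuracy = clf.score(X[100:], y[100:])  # ...validate on a disjoint one
```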

73% may not sound impressive, but it is already useful for our purpose. Unlike Go, TtA is not fully deterministic: there are hidden and random elements. In fact, our choice of 6 aspects does not even cover all the deterministic information. The null model that always predicts "good" would have an accuracy of 53%, with a standard deviation of less than 1% at N=3k. Thus the classifier is performing quite well and has already learned some strategic lessons.  We train with the linear kernel 10 times and take the average of its coefficients.
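The averaging step can be sketched like this (the helper name and the subset-drawing details are illustrative assumptions):

```python
import numpy as np
from sklearn import svm

def averaged_linear_coefficients(X, y, n_runs=10, n_train=3000, seed=0):
    """Fit a linear-kernel SVM on a fresh random subset n_runs times
    and average coef_, smoothing out subset-to-subset noise."""
    rng = np.random.default_rng(seed)
    n_train = min(n_train, len(X))
    coefs = []
    for _ in range(n_runs):
        idx = rng.choice(len(X), size=n_train, replace=False)
        clf = svm.SVC(kernel="linear").fit(X[idx], y[idx])
        coefs.append(clf.coef_.ravel())  # one weight per feature dimension
    return np.mean(coefs, axis=0)
```

Each entry of the averaged vector then tells us how strongly "being better than your opponents at that aspect, during that portion of the game" pushes the prediction toward "good".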
These coefficients tell us "being better than your opponent at what aspect, during which time of the game", is more likely to help you win the game. We will look closer at these results and analyze them in the actual game context in Section 3.1.

We can see in the above Figure that the coefficients tend to be larger in later portions of the game. That is expected, but also a bit problematic for our purpose. TtA simulates economic development.  Small investments early in the game can snowball into huge benefits later.  In the last few rounds, players are typically cashing in those benefits.  Monitoring those "cashing in" moves is the most accurate way to predict the outcome. However, that is not exactly what we want to learn here.  We want to know the subtle effects of early investments.

It is not safe to just look at the earlier portions of the above Figure. When we play this game, we make early decisions without knowing the later developments, yet the above classifier is already contaminated by information from the future. This creates a post-selection effect, such that the coefficients on earlier portions can be misleading. We will provide an obvious example in the next Section. For now, if we want to learn from the earlier portions, it is best to ask our classifier to ignore future information.

This brings us to our second method.  We only use the information from the first X rounds, with X up to 11. Despite the variability in game length, the first 11 rounds almost always fall in the early-to-mid stages of the game.  We take these (6*X)-dimensional vectors and feed them into the classifier.  After the same training and validation process, we get the performance as a function of X.
We can see that already after the 1st round, the performance is better than blind guessing, and it increases monotonically.  With all 11 rounds of data, the classifier is correct 66% of the time, not too far from the 73% achieved with the full duration.  This implies that early developments already have small but measurable effects on the result.
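A sketch of this second method, again assuming each game is a (rounds, 6) array. The text does not say how games shorter than X rounds are handled, so skipping them here is my assumption:

```python
import numpy as np
from sklearn import svm

def performance_vs_rounds(games, labels, max_rounds=11):
    """Validation accuracy of a linear SVM that sees only the first
    x rounds, for x = 1..max_rounds; later rounds are withheld so
    the classifier cannot peek at the future."""
    scores = []
    for x in range(1, max_rounds + 1):
        keep = [i for i, g in enumerate(games) if len(g) >= x]
        X = np.array([games[i][:x].ravel() for i in keep])
        y = np.array([labels[i] for i in keep])
        half = len(X) // 2
        clf = svm.SVC(kernel="linear").fit(X[:half], y[:half])
        scores.append(clf.score(X[half:], y[half:]))
    return scores
```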
For example, the above Figure shows the coefficients from round 0 to round 4. One can see clear differences between the aspects.  Military actions, food, and culture are unimportant, or even harmful.  Civil actions, science, and resources are generally better. We will look closer at these results and analyze them in the actual game context in Section 3.1.

The development of infrastructure does provide strategic lessons.  It tells us when and which aspects are more important to victory. However, such lessons might be a bit vague.  In other words, the Intermediate Status chosen here might be a bit far from individual moves.  In the next Section, I will consider a different choice.
