Thursday, February 1, 2018

A Data-Driven Strategy Guide for Through the Ages, Part 1


1. Introduction (Current Article)
2. Data Analysis
    2.1 Classification: Infrastructure Development (Link to Part 2)
    2.2 Classification: Cards Played (Link to Part 3)
    2.3 Separating players with TrueSkill (Link to Part 6)
3. Analysis for Boardgamers
    3.1 Infrastructure Development (Link to Part 4)
    3.2 Cards Played (Link to Part 5)
    3.3 Mistakes made by Good Players (Link to Part 7)

1. Introduction

1.1 About this game:

Through the Ages (TtA) is a very popular board game first published in 2006.  Its most recent revision (2015) is ranked in the top 3 in the world.  For people who are familiar with boardgames, this page should tell you everything you want to know about it.

For everyone else, here is a small diagram that introduces the concept of general Euro/Economy/Strategy games, and some specific features of this game.
Basically, every player has access to some resources.  Throughout the game, they decide how to invest those resources.  One option is direct conversion into points, since the player with the most points at the end wins the game.  On the other hand, it is often wiser to invest resources in various kinds of infrastructure.  These are the things that continuously help you generate points and resources.  Choosing when and what to invest in is the key to improving your efficiency, and the key to victory.

TtA is a Civilization Simulation game.  You manage resources like food, ore, and knowledge; you invest them to develop technologies that improve farms, mines, and various other aspects of your country. In the end, the country with the most inspiring cultural legacy is the winner.

Typically, a player has to make hundreds of decisions during a game.  The consequence of one decision often remains unclear until ten (or more) decisions later.  Thus, strategy manifests mostly as human intuition and high-level (vague) reasoning.  This is exactly the type of problem for which modern data science might be useful.

1.2 A Data-Driven Strategy?

After the tremendous success of AlphaGo, the world knows that AI can play deep strategy games.  It turns out that how an AI plays a game is very similar to how a person does.  We both follow the middle flow chart in the following diagram.

For example, in TtA, you can make a single move to build a farm.  When you do that, it usually comes with a chain of reasoning: "This will produce food for me, which enables me to increase population in a future move. Then I can use that population as miners/soldiers/... etc."  Such reasoning probably stops here, because you don't know exactly how that extra miner/soldier helps you win the game.  Therefore, you cannot exactly calculate the actual effect of this farm, nor how it compares to your other choices.  Experience and intuition take over here, giving you a rough feeling of how "good" a farm is and allowing you to move on and evaluate your other options.

The place where explicit derivation stops and intuition takes over is marked as Intermediate Status in the above diagram.  For human beings, the derivation from individual moves to the intermediate status is usually called "tactics".  Analysis from the intermediate status to the final result is often called "strategy".  An AI basically uses two algorithms to perform these two functions.  For example, a Monte-Carlo tree search can go through individual moves and see their outcomes; a Neural Network can learn from millions of examples of the intermediate status and tell you which ones are closer to victory.
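The split between tactics and strategy can be sketched in a few lines of Python. Everything in the block below is hypothetical: the `Status` fields, the three moves, and the weights inside `evaluate` are invented for illustration and stand in for a real rules engine and a trained model. It only looks one move ahead, which is far simpler than a tree search.

```python
from dataclasses import dataclass, replace
from math import sqrt
from typing import Callable, Dict


@dataclass(frozen=True)
class Status:
    """Hypothetical intermediate status: counts of a few buildings."""
    farms: int = 0
    mines: int = 0
    labs: int = 0


# "Tactics": each move maps the current status to a new status.
# Real TtA rules are far richer; these three moves are placeholders.
MOVES: Dict[str, Callable[[Status], Status]] = {
    "build farm": lambda s: replace(s, farms=s.farms + 1),
    "build mine": lambda s: replace(s, mines=s.mines + 1),
    "build lab": lambda s: replace(s, labs=s.labs + 1),
}


def evaluate(status: Status) -> float:
    """"Strategy": score an intermediate status.

    The weights and the square root (diminishing returns) are made up.
    A model learned from game data would replace this function.
    """
    return (0.3 * sqrt(status.farms)
            + 0.3 * sqrt(status.mines)
            + 0.4 * sqrt(status.labs))


def best_move(status: Status,
              evaluator: Callable[[Status], float] = evaluate) -> str:
    """Pick the move whose resulting status scores highest."""
    return max(MOVES, key=lambda name: evaluator(MOVES[name](status)))
```

Calling `best_move(Status(farms=2, mines=2, labs=1))` returns the name of the move that leads to the highest-scoring status under the placeholder evaluator. Swapping in a different `evaluator` changes the recommendation without touching the move logic.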

I am not taking the right path all the way as a human, nor am I taking the left path all the way as an AI.  I wish to take a diagonal path that goes from the top right to the bottom left.  Therefore, a Data-Driven Strategy Guide should tell us which intermediate statuses are more likely to lead to a win, and a human player will be in charge of finding the best tactics to reach such an intermediate status.

There is a very practical reason why I am doing this. In order for an algorithm to derive from individual moves, one must hard-code all the rules. Go is a somewhat biased example in that its rules are extremely simple.  The rules of Go can be written in 10 sentences, while the rules for games like TtA are often booklets of 10+ pages.  The "rules" of a real-life problem may not even be fully captured by any finite number of words.  In these kinds of situations, human minds are still superior in creativity and thinking outside the box.  Thus in general, I find it more natural for a human to come up with "what can be done", and then consult an AI to understand the final consequences of each option.  In other words, AI can give you answers, but we are in charge of asking the right questions.

1.3 Data Source

Boardgaming-online has a very nice implementation of Through the Ages (TtA).  It also keeps journals of all past games.  A journal is a fairly detailed record of a game. Although it is insufficient to reconstruct the entire game, it should contain enough information to offer a glimpse into the strategy.

Using the Python packages Requests and BeautifulSoup, I scraped the content of game journals into pandas DataFrames. There are more than 100k stored games, and I have managed to scrape 10k+ games so far.  Hopefully these will be enough to provide some insights.
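As a rough illustration of turning journal HTML into tabular records, here is a minimal sketch. It is not the scraper described above: it swaps Requests and BeautifulSoup for the standard library's `html.parser` so that it runs offline without extra dependencies, and the `SAMPLE_HTML` markup, the column meanings, and the names `JournalTableParser` and `parse_journal` are all assumptions. The real journal layout on the site is not described in this article.

```python
from html.parser import HTMLParser

# Hypothetical journal markup, invented for this example.
SAMPLE_HTML = """
<table id="journal">
  <tr><td>1</td><td>Alice</td><td>Builds Farm</td></tr>
  <tr><td>1</td><td>Bob</td><td>Takes Pyramids</td></tr>
  <tr><td>2</td><td>Alice</td><td>Increases population</td></tr>
</table>
"""


class JournalTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read, or None
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_journal(html):
    """Return one dict per journal row (assumed columns: round, player, action)."""
    parser = JournalTableParser()
    parser.feed(html)
    parser.close()
    return [
        {"round": int(rnd), "player": player, "action": action}
        for rnd, player, action in parser.rows
    ]
```

The list of dicts returned by `parse_journal(SAMPLE_HTML)` has the shape that `pandas.DataFrame` accepts directly, which is one way to end up with the DataFrames mentioned above.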

A complete journal has about 5-10 pages and represents 2-4 data points (depending on the number of players).  The first challenge is to use my knowledge of the game and the interface to parse the journals, in order to obtain simple information that can be fed into machine learning algorithms.  This is a tedious process involving not only standard selections on pandas DataFrames, but also quite a few customized parsing routines. I will not bother the readers with details here.  Let us jump ahead to some "clean" data I extracted from the journals.
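To make "one journal, 2-4 data points" concrete, here is a minimal sketch of how a parsed game could be split into one labeled row per player. The record layout (`game_id`, `scores`, `features`), the feature names, and the `won` label are assumptions made for this example, not the actual schema of the cleaned data.

```python
# Hypothetical parsed game; all values are invented for illustration.
SAMPLE_GAME = {
    "game_id": 1,
    "scores": {"Alice": 182, "Bob": 154, "Carol": 167},
    "features": {
        "Alice": {"farms": 3, "mines": 4},
        "Bob": {"farms": 2, "mines": 3},
        "Carol": {"farms": 4, "mines": 3},
    },
}


def game_to_data_points(game):
    """Turn one game record into one labeled row per player.

    The label `won` is True for the player with the most points.
    If several players tie for the top score, all of them are marked.
    """
    top_score = max(game["scores"].values())
    rows = []
    for player, score in game["scores"].items():
        row = {
            "game_id": game["game_id"],
            "player": player,
            "n_players": len(game["scores"]),
            "won": score == top_score,
        }
        row.update(game["features"][player])  # flatten per-player features
        rows.append(row)
    return rows
```

A three-player game such as `SAMPLE_GAME` yields three rows, each of which can serve as one example for a classifier.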

1.4 Outline:

In Section 2, we will explore a few different choices of Intermediate Status, and see how Machine Learning can estimate the final result from them. Naively speaking, we want the Intermediate Status to be close enough to the final result, such that the Strategic Evaluation is accurate. On the other hand, we want it to be close enough to individual moves, such that Tactical Derivation is not too difficult.  Go is again a biased example in that there is a clear choice for the Intermediate Status: a 19x19 matrix with 3 possible values at each entry. In TtA, given the form of data we have, even choosing the form of the Intermediate Status is a major challenge.

I will first set up a classification problem and explain how to obtain the training/validation sets from the raw data. I will also explain how the Machine Learning algorithms can help us formulate strategy. I will apply this classification problem in Sections 2.1 and 2.2. These are the main sections that Data Scientists might be interested in.

In Section 3, we will analyze the results of Section 2 in the actual context of the game.  This is the main section that boardgamers might be interested in.

First Major Update (02/12/2018)

(Link to Section 2.3)    (Link to Section 3.3)

In Section 2.3, I will introduce another algorithm: TrueSkill. TtA is not an entirely deterministic game. TrueSkill takes that into account and allows us to ask more relevant questions. In addition to classification based on individual game results, we can instead classify behavior based on players' TrueSkill.
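For readers unfamiliar with TrueSkill, here is a minimal sketch of its rating update in the simplest setting: two players, no draws. This is only the special case. TtA games with 3-4 players need the general algorithm, and the exact setup used in Section 2.3 is not assumed here. The constants are the commonly used TrueSkill defaults, and the names `Rating` and `update_two_player` are made up for this example.

```python
from math import sqrt
from statistics import NormalDist
from typing import NamedTuple, Tuple

STD_NORMAL = NormalDist()  # standard normal: mean 0, stdev 1

# Commonly used TrueSkill defaults (an assumption for this sketch).
MU0 = 25.0            # prior mean skill
SIGMA0 = MU0 / 3      # prior uncertainty about the skill
BETA = SIGMA0 / 2     # per-game performance noise (the luck factor)
TAU = SIGMA0 / 100    # small uncertainty added before each game


class Rating(NamedTuple):
    mu: float = MU0
    sigma: float = SIGMA0


def update_two_player(winner: Rating, loser: Rating,
                      beta: float = BETA,
                      tau: float = TAU) -> Tuple[Rating, Rating]:
    """Two-player, no-draw TrueSkill update. Returns (winner, loser)."""
    var_w = winner.sigma ** 2 + tau ** 2
    var_l = loser.sigma ** 2 + tau ** 2
    # Total variance of the performance difference between the players.
    c = sqrt(2 * beta ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c
    # v is large when the result was surprising, small when expected.
    # (For extreme upsets cdf(t) can underflow to 0; not handled here.)
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)
    w = v * (v + t)
    new_winner = Rating(winner.mu + var_w / c * v,
                        sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_loser = Rating(loser.mu - var_l / c * v,
                       sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_winner, new_loser
```

Calling `update_two_player(Rating(), Rating())` moves the winner's mean up and the loser's mean down by the same amount and shrinks both uncertainties. Because `BETA` models randomness in a single game, one lucky win shifts the ratings only partially, which is why a skill estimate built from many games can be a more stable label than the result of any one game.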

In Section 3.3, we look into all cards during Age A and Age I. By cross-referencing the outcomes of individual games with players' skill, we discover a few interesting mistakes made by stronger players.


  1. This is excellent! Can you share the data? So we don't have to scrape this if we want to look at the data ourselves.

  2. You can find the processed data here:

  3. Is it possible to share not only the processed data, but also the raw data?

    1. Raw data is here: