Thursday, August 13, 2020

Malik the Grey

Story:

As an orphan on the streets of Burnt Hill, Malik once dreamed of becoming the leader of the local gang.  With his street smarts and natural charisma, it did not seem an unreasonable goal.  By age 16, he was already the "leader" of a group of troublesome-looking teenagers.  When a group of strange outsiders arrived with a truck of unconcealed valuables, Malik was eager to make it the target of their next heist.  Words of caution from senior gangsters only fuelled their enthusiasm.  And of course, they did not know what they were up against until it was too late.

Malik led the squad that created distractions, while his trusted friend, Rita, led the other squad to gather the prize.  When they slipped past the three sentinels as planned, none of them could imagine the brutal trap waiting at the end.  Stoic knights appeared from nowhere when Rita's squad reached for the seemingly unguarded treasure, and cut them down without mercy.  Malik was forced to watch while surrounded by mercenaries with cruelly playful intentions.

Fate is a strange thing.  Driven by his friends' deaths and a strange sense of loyalty, Malik tried to throw a Hail Mary: a clearly self-sacrificing act that offered a slim chance for the rest of his squad to escape.  It had no effect other than adding to the entertainment of the seasoned mercenaries.  However, all of their lives were spared at the command of the expedition leader.  This mysterious woman appeared with an obvious commanding aura, and the mercenaries immediately obeyed.  Malik and his squad were captured but not harmed.

She told him it was due to his courageous act.  She apologized for his friends' deaths, blaming her late intervention, and gave them proper burials as far as the circumstances allowed.  For a while, Malik believed her.  The five teenagers accepted the offer to be the expedition's local guides and errand boys.  In the following month, Malik learned to operate strange machines, and was overwhelmed by information from the outside world.  His dream job had changed from leader of the local gang to adventurer.  The others felt exactly the same way.  When the expedition ended, they watched the departing caravan with aspiration.  Armed with this new determination, and significant payment for their work, their adventure party would have been ready for the road in a few months.

Unfortunately, in the end, Malik took the road alone.  While they were busy preparing, his friends started to get sick in various ways, and died one by one.  When his hair started to turn grey and his fever rose, Malik thought it was his turn, too.  He also realized that it was not a coincidence.  The dots started to connect.  Why the operation of these amazing machines was entrusted to local boys.  Why people avoided them during the operation.  They were spared that day not because of his courageous act.  It was because expendable labor was needed.

The fever did not take his life.  This 16-year-old was left with a full head of grey hair, way ahead of his time, and would live on with his tormenting memories.  Malik took the road and became an adventurer.  What gleamed in his eyes was not aspiration.


Description:

At the beginning of the adventure, Malik is less than 20 years old, although he is quite used to pretending that he is around 40.  He does not hide his origin from Burnt Hill, so his night vision and knowledge of alien technology are known to companions and maybe employers.  While on the road, he will use typical rogue skills to perform recon jobs.  Although he is also good at planning and leading operations, he is reluctant to do so due to past trauma.


Character Plan:

Rogue at the 1st level, with the intention to take Arcane Trickster at 3rd.  

Pretty resistant against harmful effects from proximity to Orichalum.  That's the reason he did not die.

Wednesday, June 13, 2018

RPG Material: The True Way of Gruumsh

The TWG is an emerging cult in the Many Arrow Kingdom.

Its followers are usually orcs and half-orcs. Many of them are Lawful Good Clerics and Paladins.

These orcs and half-orcs are very different from our common impressions. They strongly believe that Orcs should, and are able to, live peacefully together with other surface races, except for the treacherous elves.  These well-disciplined acolytes will go out of their way to help other surface races in order to build a new reputation for Orcs.

Instead of the widely spread tale that the Elven patron, Corellon Larethian, took Gruumsh's eye in combat, the cultists advocate a different story. Gruumsh the True Seer took his own eye, so he could be forever vigilant against Elven treacheries. The greatest treachery of all is the Elven guise of culture, magic, and elegance. Elves were the first to tap into the power of the weave. They exploited it for thousands of years, which sustained their extravagant society. However, the dire consequences of a weakened weave, and the chaos from the constant fight over its control, are suffered by all races across Faerun.

From these Orcs, any non-elf race can expect nothing less than a common paladin. Stern, kind, and very helpful. Their attitude toward elves varies a lot, though. For example:

1. Wild elves, especially wild elf druids, are viewed with the highest esteem. They recognize the mistake of their ancestors and are trying their best to seek remedy and restore balance. TWG acolytes will even consider these Elves more devout in the cause of Elven repentance.

2. Elf commoners should not be held guilty for the sins of their ancestors. However, they should at least be aware of it and admit the truth.

3. Half-Elves are the tragic consequence of other races falling prey to Elven treachery. They should be treated with pity and sympathy.

4. Drow are as treacherous as their golden brothers, but at least they admit it.

5. Elven nobles continue to exploit the legacy of their ancient treachery. They do not even deserve a quick death by the honorable Orcish blades. Their secrets should be exposed and they should be publicly shamed for the rest of their lives.

Special Perk:
A TWG acolyte rolls with advantage against any Elven illusion and deception, although they tend to disbelieve Elves even when the Elves are telling the truth. It is unclear whether this effect is magical, or simply a consequence of their constant paranoia over everything Elven.


Rumors:
A few scholars in Candlekeep have noticed this emerging cult and are astonished by the blatant contradiction between TWG and the typical, savage icon of Gruumsh. A popular theory is that the cultists are actually worshiping a deceiver. It is well known that any deity of the Trickery domain, such as Mask, has the ability to disguise itself as another deity and steal the divine power gathered from worshipers in the process. It could also be done by a powerful outsider. For example, one theory favors a Spectator, for its Lawful Neutral alignment and its icon of a single eye, which mimics the holy symbol of Gruumsh. None of these theories can be directly confirmed though.

Thursday, March 15, 2018

Removing Skill Bias from Gaming Statistics

While I was analyzing the data for a board game, a very interesting and important issue came up. In all kinds of gaming data, it's always about two things.

1. What moves did a player make during the game?
2. Did that player win?

With modern progress in machine learning, there are many different ways to look at the data. However, no matter how we look at it, anything we can directly get from this type of data is a conditional probability.  Namely,

The probability to win given a certain move.

The inconvenient truth is that this conditional probability has no definite connection to the intrinsic value of the move. Colloquially, the reason is very simple. Whenever we see a move with a high conditional winning probability, we never know which of the following two cases is true:

(1) This move really helps you win.
(2) Good players prefer to make this move, and good players win more often.

Unless all players who show up in the data are exactly equally skilled, the conditional winning probability always comes from a combination of these two effects. In order to get the intrinsic value of the move, we need to isolate effect (1). Thus, we will refer to effect (2) as the "skill bias" and attempt to remove it. Also, the conditional probability is easily computable from the data. We would hope that the removal of skill bias is not much more complicated.
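
As a minimal sketch of how the conditional winning probability is read off the data, assume each record is a (player, made_move, won) tuple. The record layout, the function name, and the toy data are illustrative assumptions, not the format of the original data set.

```python
def conditional_win_rate(records):
    """Estimate P(win | move) from (player, made_move, won) records."""
    wins_with_move = sum(1 for _, made_move, won in records if made_move and won)
    games_with_move = sum(1 for _, made_move, _ in records if made_move)
    if games_with_move == 0:
        raise ValueError("the move never appears in the data")
    return wins_with_move / games_with_move


# Hypothetical toy data: (player, made_move, won)
records = [
    ("A", True, True),
    ("A", True, False),
    ("B", True, True),
    ("B", False, False),
    ("C", False, True),
]
p1 = conditional_win_rate(records)  # 2 wins out of 3 games with the move
```

Note that nothing in this number tells us whether players A and B are simply stronger than player C, which is exactly the skill bias described above.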

To my surprise, when I asked this question on stat.stackexchange, no one pointed me to any really relevant existing literature. I also asked a few Math/Econ researchers, and none of them was particularly aware of this topic. So I decided to solve this problem myself with a simple mathematical model.

If you are interested in the exact math behind the answer, you can read my paper here.

Long story short, to my pleasant surprise, I got a somewhat simple answer.

Intrinsic Value of a Move = (P1 - P2)/(1-8d^2)

P1 is just the conditional probability to win given this move.

P2 is more subtle. It is the expected chance for someone from Group 1 to defeat someone from Group 2. Group 1 and 2 can be thought of as random subsets with the following selection rule:
There are 100 games in the data. Player A appears in 17 of them, made this move in 8 games, and did not make this move in the other 9 games. Player A then has an 8% chance to be selected from Group 1, and a 9% chance to be selected from Group 2.
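
The selection rule can be sketched as follows. This takes the example's arithmetic at face value (a player's count divided by the total number of games); the exact normalization used in the paper may differ, and the function and variable names are hypothetical.

```python
from collections import Counter


def selection_chances(records, total_games):
    """Per-player chance of being drawn from Group 1 (made the move)
    and from Group 2 (did not), computed as count / total_games."""
    made = Counter(player for player, made_move in records if made_move)
    skipped = Counter(player for player, made_move in records if not made_move)
    players = set(made) | set(skipped)
    return {
        p: (made[p] / total_games, skipped[p] / total_games)
        for p in players
    }


# Player A from the example: 8 games with the move, 9 without, 100 games total.
records = [("A", True)] * 8 + [("A", False)] * 9
chances = selection_chances(records, total_games=100)
group1_chance, group2_chance = chances["A"]
```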

Readers can probably appreciate that P2 tries to calculate the contribution from effect (2), which allows us to remove it. Astute readers will also notice that such removal results in an under-estimation. That is because part of the advantage of being a good player is that they make this move more often. Removing that altogether removes part of the intrinsic value of the move. That is why, after the subtraction, we divide the answer by a number that is slightly smaller than 1. d is a small number that I won't explain here. Please read my paper if you are interested. The paper also includes instructions on how to calculate P2 and d in general.
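
Once P1, P2, and d are known, the formula itself is a one-liner. The sketch below only evaluates the expression quoted above; how to obtain P2 and d is left to the paper, and the input numbers are made up for illustration. The guard on the denominator is my own assumption, based on the statement that d is small.

```python
def intrinsic_value(p1, p2, d):
    """Evaluate (P1 - P2) / (1 - 8 d^2), the formula quoted in the text."""
    denominator = 1.0 - 8.0 * d ** 2
    if denominator <= 0:
        # Assumed guard: the text says d is small, so this should not happen.
        raise ValueError("d is too large for the correction to make sense")
    return (p1 - p2) / denominator


# Made-up inputs: P1 = 60%, P2 = 55%, d = 0.05
value = intrinsic_value(0.60, 0.55, 0.05)
```

With these inputs the denominator is 0.98, so the correction increases the raw difference P1 - P2 only slightly.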

The End of Group Thinking?

From their definitions, it is not guaranteed that P1 > P2. In other words, it is possible for a bad move (Intrinsic Value < 0) to have a high conditional winning probability (>50% in 2er games), simply because good players prefer to make this move (because they mistake it for a good one). Without a proper removal of skill bias, it is really difficult to recognize this situation. Good players will continue to make those bad moves that they consider good, and the conditional winning probabilities of those moves will be high. It will look convincingly as if those moves were good.
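
A toy calculation makes this concrete. The sketch below is not the model from the paper: it assumes a made-up population of two skill groups in a 2er game, each with a base win probability and a preference for the move, and a move that lowers the win probability by a fixed amount. All numbers and names are hypothetical.

```python
def win_rate_given_move(groups, move_effect):
    """Exact P(win | move) for a population of skill groups.

    groups: list of (share, base_win_prob, prob_of_making_move).
    move_effect: amount added to the win probability when the move is made.
    """
    made = sum(share * p_move for share, _, p_move in groups)
    won = sum(
        share * p_move * (base + move_effect)
        for share, base, p_move in groups
    )
    return won / made


groups = [
    (0.5, 0.7, 0.8),  # good players: win 70%, make the move 80% of the time
    (0.5, 0.3, 0.2),  # weak players: win 30%, make the move 20% of the time
]
# The move itself costs 5 percentage points of win probability.
p_win_given_move = win_rate_given_move(groups, move_effect=-0.05)
```

Here the conditional winning probability comes out at about 57%, above the 50% baseline of a 2er game, even though the move hurts everyone who makes it.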

This self-fulfilling prophecy of good moves is one of the main reasons that Group Thinking occurs.  It is fortunate that by removing skill bias, we have a way to directly combat group thinking.

In fact, Group Thinking is also nurtured by our habit of "learning from the experts". We listen to advice from good players, and watch and mimic their moves. Therefore, we inherit their biases. One amazing property of the Intrinsic Value is that it does not care about the average skill of players in the data! You can compute it with a group of experts, or a group of mediocre players. You will get the same answer. Such an objective approach can help us get rid of existing biases.



Monday, February 12, 2018

A Data-Driven Strategy Guide for Through the Ages, Part 0

Through the Ages (TtA) is a multi-player, deep strategy game with partially hidden information and variable background. All these properties make it hard to learn strategic lessons from statistical analysis.

I scraped data from 30k+ games on boardgaming-online.  I use a Support Vector Machine to predict the result of a game with up to 70% accuracy, and the skill of a player with up to 60% accuracy (out-of-sample performance).
Together with these predictions, we can learn important strategic lessons about which aspects are more important during different stages of a game.
We can also learn which specific cards are helpful or harmful to your strategy.
Read more here.

Also, I found a way to remove player skill bias and strategic misconceptions from the data. You can read about that here.

A Data-Driven Strategy Guide for Through the Ages, Part 6

Index:

1. Introduction  (Link to Part 1)
2. Data Analysis
    2.1 Classification: Infrastructure Development (Link to Part 2)
    2.2 Classification: Cards Played (Link to Part 3)
    2.3 Separating players with TrueSkill (Current Article)
3. Analysis for Boardgamers
    3.1 Infrastructure Development (Link to Part 4)
    3.2 Cards Played (Link to Part 5)
    3.3 Mistakes made by Good Players (Link to Part 7)

2. Data Analysis

2.3 Separating players with TrueSkill

TrueSkill is a system invented to calculate the relative skills of players in games with non-deterministic outcomes. We use the default model in python, which starts a player with skill=25 and a standard deviation of 8.33. Since we continue to separate "good" and "bad" performance in a game at 90% of the winner's score, we update the TrueSkill by the same standard. Every player in a game with a "good" performance "defeats" everyone with a "bad" performance.
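
The "good defeats bad" rule can be sketched as below. The snippet only builds the list of (winner, loser) pairs for one game; feeding them into a rating library is left out. The function name and the toy scores are hypothetical, and treating a score exactly at 90% as "good" is my assumption.

```python
from itertools import product


def good_bad_pairs(scores, threshold=0.9):
    """Split players by score relative to the winner's score and list every
    (good, bad) pair, i.e. each "defeat" used for the rating update."""
    top = max(scores.values())
    cutoff = threshold * top
    good = sorted(p for p, s in scores.items() if s >= cutoff)
    bad = sorted(p for p, s in scores.items() if s < cutoff)
    return list(product(good, bad))


# Hypothetical final scores of a 3er game; the cutoff is 0.9 * 200 = 180.
scores = {"ann": 200, "bob": 185, "cat": 150}
pairs = good_bad_pairs(scores)
```

In this example ann and bob both count as "good", so each of them "defeats" cat, while no result is recorded between ann and bob.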

We go through the results of 30k games to update every player's TrueSkill.
The final distribution of TrueSkill looks like a good Bell Curve with a few sharp spikes of noise. In the meantime, it is also fun to see how many games each player has played.
See how this almost looks like an x*y=const. curve? That means on average, every game has a roughly equal share of experienced and inexperienced players.

With the TrueSkill information, we can look into a few more interesting statistics. Cross-referencing these different statistics allows us to get more strategic insights. Note that I am using the final TrueSkill instead of the developing TrueSkill along the way. The implicit assumption here is that those who played many games stayed at the same skill level for the majority of the games they played. Thus the final state is roughly the equilibrium state, and the entire history of games I included is also in the equilibrium state.

First of all, since TtA has hidden information and multi-player interaction, "good moves" do not always win the game. Instead of classifying with respect to the outcome of a game, we can classify with respect to the skill of a player. This can answer the question of "what do better players do?" Maybe better players already see through the luck-dependent noise and know how to consistently perform well.
For example, in this chart we compare the prediction of outcome with the prediction of player skill, based on the infrastructure development. We can see that in the beginning, stronger players are already doing something measurably different from weaker players, even though the outcome of the game is still pretty unclear. Then, for a long time, almost to the end of the game, we actually cannot see further differences between stronger and weaker players. All we know is that as the game progresses, it becomes clearer and clearer who will win.

Also, as discussed earlier, classifying with respect to the outcome can sometimes be biased by the post-selection effect. A card can appear to be good simply because it is the frosting on the cake, instead of being the true key to victory. With the above cross-referencing method and an appropriate choice of card subsets, we can try to remove the post-selection effect. We can also recognize important lessons that even good players have not seen yet.

We can also calculate the average TrueSkill for players in a game. If it is larger than 25, we say it is a "good game". Otherwise we say it is a "bad game". We can then perform the previous analysis on the "good games" to see if there is any difference.  This can tell us "how to win if you are playing with good players?" We can see whether it is very different from the situation with typical players.
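
The "good game" split is simple enough to sketch directly. The game IDs and skill values below are made up; the cutoff of 25 is the default starting skill mentioned above.

```python
from statistics import mean

DEFAULT_SKILL = 25.0  # starting TrueSkill, used as the cutoff in the text


def is_good_game(player_skills, cutoff=DEFAULT_SKILL):
    """A game is "good" if the average TrueSkill of its players exceeds the cutoff."""
    return mean(player_skills) > cutoff


# Hypothetical games mapped to the final TrueSkill of their players.
games = {
    "g1": [27.1, 24.0, 26.5],
    "g2": [22.3, 25.9],
}
good_games = [gid for gid, skills in games.items() if is_good_game(skills)]
```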


A Data-Driven Strategy Guide for Through the Ages, Part 7

Index:

1. Introduction  (Link to Part 1)
2. Data Analysis
    2.1 Classification: Infrastructure Development (Link to Part 2)
    2.2 Classification: Cards Played (Link to Part 3)
    2.3 Separating players with TrueSkill (Link to Part 6)
3. Analysis for Boardgamers
    3.1 Infrastructure Development (Link to Part 4)
    3.2 Cards Played (Link to Part 5)
    3.3 Mistakes made by Good Players (Current Article)

3. Analysis for Boardgamers

Disclaimers: 
(1) We can only learn correlations from data.  Whether these correlations actually imply causation is up to our interpretation.
(2) The data comes from 30k+ recent games at boardgaming-online . 

3.3 Mistakes made by Good Players 

As explained in Section 2.3, we calculated the TrueSkill of each player. This allows us to use joint statistics to answer quite a few questions which were unclear when looking at the game result alone. We have also gotten enough data that we can afford to separate 2er games from 3ers and 4ers. The zero-sum nature and the relative quantity of hidden information might make 2er games quite different.

We will be showing charts similar to those in Section 3.2, with a few improvements. First of all, the title of the chart will tell you whether it is for predicting the result of the game, or predicting the player's skill. The single number (less than 1) in the title is replaced by a percentage. It still tells you how good the prediction is. I have also already subtracted the performance of a trivial guess, so it is easier to see how good such a prediction really is.
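
The subtraction can be sketched as follows. I am assuming here that the "trivial guess" means always predicting the most common label; the original analysis may define the baseline differently (for example as a flat 50%). The labels and predictions are made up.

```python
from collections import Counter


def accuracy_above_trivial(y_true, y_pred):
    """Accuracy of the predictions minus the accuracy of always guessing
    the most common label, expressed in percentage points."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    trivial = Counter(y_true).most_common(1)[0][1] / len(y_true)
    return 100.0 * (accuracy - trivial)


# Hypothetical labels (1 = "good" performance) and classifier predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
score = accuracy_above_trivial(y_true, y_pred)
```

Here 8 of 10 predictions are correct and the trivial guess gets 6 of 10, so the chart title would show roughly 20%. A value consistent with 0% means the cards carry no usable information.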

Let us first look at all the cards during Age A and Age I for 2ers.
We should first note that the first chart here has a bad performance, consistent with 0% (trivial guess). That means the usage of Age A and Age I cards fails to predict the final outcome of the game. That is not very surprising. 2ers are probably decided by big military and/or culture swings which come much later in the game. However, we can still predict players' skill quite well. We know which cards tend to be played by stronger players (blue bars), and which tend to be played by weaker players (red bars).  We cannot directly know whether these choices help you win. All we know is that those choices appear to be related to player skill.

Next we do the same thing for 3ers and 4ers.
Now this is interesting. The prediction for the game outcome is no longer consistent with the trivial guess. It becomes very informative to compare these 2 charts.  The prediction for player skill is better. This is somewhat expected. Since it is quite early in the game, it will be difficult to predict the final outcome. However, good players tend to follow certain strategies, which might already be distinguishable in their early choices.

I can see 4 striking differences here.

Pyramids vs Library of Alexandria.
Stronger players slightly prefer Pyramids over Library, more than weaker players do. However, Library performs significantly better for winning the game. I cannot see any alternative interpretation here and must conclude that this is a mistake made by strong players. Note that strong players must have performed better in many other places to make up the difference. However, there is very little doubt that at a 1-to-1 comparison, Library is better. Start to use it more!

I don't think that in the long run, Pyramids' ability is weaker. I looked at the statistics of how early people build Age A Wonders. Pyramids and Library tend to be built as early as possible, which means delaying the 2nd Philosophy. 1 CA is better than 1 Science later in the game. However, this early in the game, Science might be slightly better, or at least equal. Getting into any Age I technology 1 turn earlier can have a compounding effect that snowballs your economy. Combining this with the other benefits of Library makes it somewhat more powerful than Pyramids built at the same time.

Thus, maybe we should stop rushing to complete Pyramids. I know, it is a tempting package deal to get that CA early, delay population growth and get 1 extra food. But maybe it's not worth delaying the extra Science production. 

Age I Wonders.
Stronger players do not build Age I Wonders more often than weaker players. However, except for the Great Wall, all other Age I Wonders seem to win a lot. The interpretation here requires a few more steps. Recall that if we use Age III cards to predict the game result, Wonders perform exceptionally well. That does not mean those cards win the game. They simply "indicate" that the player has enough CA, science and resources to complete a big Wonder. Leveraging those into other developments probably wins the game, too.

We did not have to consider this "post selection" effect for Age A Wonders, because everybody starts at roughly the same condition. For Age I, we should ask whether there could be significant differences in infrastructure, which determines a player's ability to build a Wonder.

That does not seem to be the case to me. Even if someone upgrades Iron early, it will take 3 rounds before an upgraded mine generates a net resource gain. During the prime time to build Age I Wonders, a few timely Yellow Cards probably give you more resource advantage, and that is not much. Thus, it seems like these Wonders are really contributing to victory, instead of just "indicating" someone's existing advantage. Furthermore, if that were the case, Great Wall should have been an equally good "indicator", but it is not.

Thus, I again conclude that Taj Mahal, University, and Basilica are under-valued by good players. Investing your early resources and CA in them seems to be a good deal, compared to other things you might have done. This is probably because all other ways to use resources require population and science. You are likely short of either or both during the Age I-II transition.

Code of Law vs Warfare.
Good players value them almost equally. However, CoL appears to perform significantly better. I again do not see an alternative explanation here. They have the same number of cards, and the science costs differ by only 1. Thus again, good players seem to over-value Warfare. They are probably a bit paranoid and trying too hard to prevent early aggression. I won't say that's wrong. If you are indeed the better player in a game, then the most likely way for you to lose is probably a devastating early aggression. It is a good circumstantial strategy to be slightly paranoid about that and go with the mediocre Warfare. As long as a stronger player does not suffer from an early aggression, she can count on later moves to cover the lost ground.

Knights vs Swordsmen.
Good players value them almost equally, but Swordsmen perform better than Knights. This, even to me, is a surprise. Knights open up more Age I tactics, but Swordsmen are easier to discover and build. At Age II, such difference is almost gone. I would have expected them to perform equally well. My only explanation is that on average, people do spend 15% more CA to take Knights from the card row, and that is bad enough to undermine Knights' record.


Thursday, February 1, 2018

A Data-Driven Strategy Guide for Through the Ages, Part 5

Index:

1. Introduction (Link to Part 1)
2. Data Analysis
    2.1 Classification: Infrastructure Development (Link to Part 2)
    2.2 Classification: Cards Played (Link to Part 3)
    2.3 Separating players with TrueSkill (Link to Part 6)
3. Analysis for Boardgamers
    3.1 Infrastructure Development (Link to Part 4)
    3.2 Cards Played (Current Article)
    3.3 Mistakes made by Good Players (Link to Part 7)

3. Analysis for Boardgamers

Disclaimers: 
(1) We can only learn correlations from data.  Whether these correlations actually imply causation is up to our interpretation.
(2) The data comes from 10k+ recent games at boardgaming-online . I did not separate the "level" of the games.  Thus, it represents the behavior of all players, not just good players.

3.2 Cards Played

Again, the actual data analysis procedure is described in Section 2.2. Here is the summary for boardgamers.

The most important information I will use here is whether a card is played or not. That includes elected Leaders, discovered Technologies, FINISHED Wonders, and new governments. The data does not include when such a card was played or taken, or how it was played (revolution or peaceful). It does not include yellow cards either.

Just like in the previous section, you will see a number (weight) associated with each card in several charts. Comparing those weights within the same chart tells us how important those cards are, relative to each other.  In addition, the title of each chart is a number; a larger number means a larger impact if we follow what the chart teaches us. This title number will not be higher than 0.7. If it is smaller than 0.54, the chart probably has no impact.

Lesson One: Leaders, Leaders, Leaders!

This chart includes all Age A and Age I cards. We can see that all leaders have relatively higher weights than other cards available at the same time. Also, Age I leaders are significantly better than Age A ones. So, it is ok to miss out on a Wonder, or a Technology, but we should always plan to have a leader and maximize its benefit.

This should not be too surprising. The (opportunity) costs for these cards are not identical. Leaders have (strictly) the least costs. You only spend CA to take them, and you often don't even spend CA to play them. Everything else requires you to spend CA plus science/resource/population.

Lesson Two: Early CA is key.
Among the Age A leaders, Hammurabi clearly stands out. This echoes our observation in the previous section that early extra CA is important.  In fact, the only two technology cards which are almost as good as an early leader are Code of Laws (CoL) and Monarchy. They both give you extra CA. Warfare costs 1 less Science than CoL, but has an obviously lower weight (consistent with 0). This does not mean that 1 MA is not important, but it certainly is not as good as 1 CA with a similar cost. Pyramids being lower than Library is a bit surprising. I initially suspected the reason was that even average players tend to overspend their CAs for Pyramids, but that does not seem to be the case.
Here is a chart for all Age A, I and II Wonders, regarding how often they are taken, and on average how many CA a player spends to get them. We can see that Pyramids is taken 20% more often than Library, but people only spend slightly more CA for it on average. Thus, maybe Library is really better.
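
The two quantities in this chart can be computed with a sketch like the one below. It assumes a log of (card, CA spent) entries, one per time a card is taken; the log format, function name, and numbers are hypothetical and not the actual scraped data.

```python
from collections import defaultdict


def card_take_stats(takes):
    """takes: iterable of (card, ca_spent) entries.
    Returns {card: (times_taken, average_ca_spent)}."""
    totals = defaultdict(lambda: [0, 0])
    for card, ca_spent in takes:
        totals[card][0] += 1
        totals[card][1] += ca_spent
    return {card: (n, total / n) for card, (n, total) in totals.items()}


# Made-up log entries for illustration only.
takes = [
    ("Pyramids", 2), ("Pyramids", 3), ("Pyramids", 1),
    ("Library of Alexandria", 2), ("Library of Alexandria", 1),
]
stats = card_take_stats(takes)
```

Comparing the two entries of each tuple across cards gives the "how often taken" and "average CA spent" bars side by side.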

Lesson Three: St. Peter's Basilica is the best happiness solution.
By the end of Age I, almost everyone will need at least 2 happy faces or some extra yellow tokens. While the 1st happy face can come from the free temple, we need to work on the 2nd one. The possible solutions are Theology, Theocracy, Bread & Circuses, Hanging Gardens, Great Wall, St. Peter's Basilica, Homer, and (Columbus + Vast Territory).  Only the last 3 options have significantly positive weights in the above chart. The opportunity cost for Columbus or Homer is another leader of the same age, which has a comparable weight. Thus, the only solution that stands out is Basilica.

In fact, we can compare across all sources of Happy Face here.
We can see that this chart has a low impact (title number close to 54%). However, the two Age I Urban solutions for happiness are just terrible. Basilica not only stands out among contemporary cards, it even looks better than later-game cards which produce more culture.  If we cross-reference to the previous chart, we can see that Basilica is taken with 2 CA, which is the minimum for your 2nd Wonder. Maybe it deserves to be taken with more CA.

Lesson Four: Be opportunistic.
This chart is for all Age I and II techs for farms, mines, urban buildings and military units. First we note that the impact is quite low (55%).  We should probably avoid Theology and Bread & Circuses, but almost everything else is fine. Let us again cross-reference this to how often they are taken, and how many CA are spent on them.

Iron is taken twice as often as Coal, but it does have 1 more card. So that difference is not huge. The same story goes for Alchemy vs. Scientific Method, and Cannon vs. others.  People get more CA to spend later, thus it is not surprising that Age II techs cost more CA on average. Among Age I techs, people do take Iron, Knights and Alchemy with more CA.

I guess the conclusion here is that almost anything will work. Nothing is a must-have, but most of them are not bad either. You just get what you can, and adjust with yellow cards accordingly. The military situation is similar.  Any unit can shine with the right tactics, and the chance to match with the right tactics does not seem to vary significantly.

Lesson Five: Leaders again, but Strategy catches up.
This chart shows all cards in Age II. Leaders are still good, with Napoleon on top as expected. Strategy stands out among other cards and is almost as good as Bach. The 2 CA plus military power might decide whether you are the predator or the prey. The special technologies have roughly the 2nd lowest cost on average. You only spend science, and usually an affordable amount. In fact, let us look at all those things that require science only.
Lesson Six: "Pure" techs are good.
The above chart compares all Governments and Special Techs. They all cost you only science.  First we note that they almost all have positive weights. That is because if you spend the science on Urban Building or Military Units instead, you also need to spend population and resources to get the actual benefit. You don't always break even after those extra costs. For these things, right after spending your science, you start to get benefits.

Lesson Seven: Constitutional Monarchy sucks!!??
This incorrect impression was due to a bug in the code that confuses Monarchy with Constitutional Monarchy. After fixing that bug, CM performs slightly better than Republic.

Lesson Eight: Many ways to seal your victory, but too late for your misery.
Finally, Age III is no longer dominated by leaders. With sufficient CA and resources, Wonders seem to be the most certain way to seal your victory. But you can also leverage your science or population advantage into Democracy or Air Force. It is also not too late to boost your CA with Civil Services. Note that everything here has a positive weight. This is the post selection effect that "if you can afford these, you are probably winning anyway". Thus, it is probably only worth looking at weights significantly different from average here. 

In particular, we see that Gandhi is terrible compared to other leaders. We probably only take him as the last and usually futile defence against strong aggressors. Professional Sports also appears to be terrible, which suggests that we should have solved potential happiness problem earlier.