Two weeks ago, I wrote about search in a search tree. The idea was that a game (like Magic) can be broken up into a series of positions, and a good way to determine the best strategy is to evaluate all of these positions and find a path that gets you to a winning state, even when your opponent makes their best move as well. I left a few holes in that article, including how one turns the evaluation function into rules about what to do and what other shortcuts one can find in searching over this very large tree. So let’s finish that.
I briefly mentioned minimax at the end of the last article, and that’s where I’m going to start. Minimax is a strategy for evaluating these trees that should be fairly intuitive. How do players usually make decisions? Based on which outcomes are mostly likely to allow you to win. When you can look at the entire depth of a tree from your current position (as in, you can see every possible way that you win or lose), then you’re trying to reach one of those winning states. To have that happen, you probably need to make a choice on that second to last position to win. But you can also assume there that your opponent is going to make a choice one position before that that tries the most to keep you from winning. What this means is that you can guess at how a game is going to go, and how good a position is by propagating the alternating best and worst positions for you.
Let’s make that more concrete. You are at 2 life, have 2 Ashenmoor Gougers and a summoning sicknessed mogg fanatic in play before your combat step and no cards in hand. Your opponent is tapped out with a foul imp untapped, 1 foul imp tapped, no cards in hand, and is at 5 life*. What do you do?
It should be fairly obvious that you swing with both gougers and then pop the fanatic for the kill. So let’s see how you came to that.
Your current position is listed above and you’re in your declare attackers step. Your choices are attack with 0, 1, or 2 gougers, each of which represents a different branch from your current position. If you attack with 0 gougers, nothing happens. The next major choice (I cheat a lot since magic has tons of intermediary positions for phases changes and such) is your opponent’s attack step. We assume your opponent is rational and makes his best move, which is to kill you. Technically, he could’ve done nothing during his attack step, but we won’t count on that. So you lose. We can propagate losing up a level to say that not attacking results in you losing.
The same applies for attacking with 1 gouger. You can do that, so the next position to evaluate is how your opponent blocks. He can either block with his imp, or not. Since he’s rational, he sees the lookahead 1 move to where you pop your fanatic and determines that the result of not blocking is losing, and blocking is winning. So he should choose to block, so this position for you can be generalized as a loss.
But if you attack with both, all moves for him result in losing. Thus, that can be propagated to say that your current position is a winning on because the best outcome out of all the possible next positions is a win.
So that’s minimax in its complete form. As I mentioned in the previous post, though, it’s often unfeasible to enumerate the entire tree to the end of the game. Instead, one might look maybe 5 moves ahead and consider all of those possibilities. Since these don’t have definite win/loss states, we can instead use the evaluation function to give them numerical values. We can using the same propagation method to assign the values for positions 4 moves ahead by considering the values of the 5-ahead states that it’s linked to. We can continue to work backwards until we’ve figured out the values for the next move, then pick the best one.
Even so, this is slow. So here’s where alpha-beta pruning comes in. Alpha-beta pruning applies a heuristic to minimax to avoid looking at entire trees. The basic idea here is that we don’t need to consider entire subtrees (all of the positions following a given position) if we know that that subtree can be skipped. From any position, you’re basically doing a lookahead 2 positions to see if you’re in a better spot after your opponent makes their choice. Since we can find maximums and minimums along the way, certain subtrees don’t matter as they’re already less optimal. That was ambiguous; let’s take a trivial example.
Let’s say your opponent has a prodigal pyromancer in play but with summoning sickness. It’s your turn, you have 1 life, and you know your opponent has a goblin piker on the top of their library because you excommunicated it with no cards in hand. You have a lightning bolt and goblin piker in hand with only enough mana for one of the spells, with your opponent at 6 life. From this position, you have 4 choices: bolt the pyromancer, bolt your opponent, make your opponent discard a card, or do nothing. So we can consider these each as a different path down the search tree.
The first is to bolt the pyromancer. If you do this, your opponent’s board is clean. Next, it’s your opponent’s move. They’re only options are to either play or not play their piker. If they play it, your evaluation function puts it at -10, while if they don’t play it, you’re at -5. Since your opponent is trying to minimize your evaluation function, they will play, so the action of bolting the pyromancer has a net value of -10.
The next is to bolt your opponent. If you do this, your opponent is now at 3. On their turn, they can poke with the pyromancer or not, play the piker or not. If they poke you, you die, resulting in an evaluation of -1000. Here’s where alpha-beta pruning comes in. At this point, you know that the action of bolting the pyromancer results in a worst case of -10. Bolting your opponent results in at least a worst case of -1000. There’s a chance that other actions your opponent will make will be worse (probably not), but that doesn’t matter because if you bolt your opponent, one of your opponent’s action is worse than the worst case of another action. So you don’t need to worry about considering the cases where they play the piker and so forth. This is the pruning action of not considering this sub-tree.
This example is a little trivial, but it’s hopefully very clear. Alpha-beta pruning can be used at multiple levels. For example, the position we discussed above might be 3 levels deep in another tree rooted at what you’re actual position is, and the result of that will propagate into a larger situation.
Hopefully, this all seems intuitive. Indeed, we as humans actually prune a lot more deeply than this; in a game, it would never have crossed my mind to not bolt the pyromancer, but those are other heuristics. Maybe that’ll come up in a future post. In the mean time, you can google around for alpha-beta pruning and get more rigorous explanations of how it works and why it’s actually called alpha-beta pruning.
*You’ll note I cheated majorly and made sure that no one has any tricks because minimax, in its most naive form, assumes perfect information. Maybe I’ll deal with this later
Hello Kevin,
I was browsing the internet when I stumbled upon this website, and I just want to say, I think what you are doing is very cool. If they taught a class about game theory and playing which used Magic as a backdrop at my University, I know I would be fervently refreshing my browser to make sure I got in on the day of course registration. I’m not sure which school you’re teaching this class at, but I look forward to reading this site in the future and seeing what ideas you’ve created. If there’s anything I can do to help you collect ideas or information for your course, let me know and I’d be happy to help provide you with the tools to do so. I know a lot of players and do both a podcast and article each week (as you know,) and I would love to see this class taught to its full potential so that it can continue to be taught in the future.
Keep up the great work – this is all very interesting!
Gavin Verhey