Just this morning, I figured I should pick up the starcitygames feed to make sure that I was seeing the more important articles. That was a great decision, because I already have an article to write about.
In my last post, I talked about how a Magic game can be formalized into a series of positions, each with its own set of values representing the entirety of the game. We can take all these positions and put them into a search tree, where each position is linked to the other positions its possible actions lead to. With this, we can now talk a little more formally about how searching through that tree works (if you’re trying to follow along in the original dailymtg article, we’re still in the first section, “Basic Concept”).
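To make that concrete, here’s a minimal sketch (in Python) of positions as nodes linked by actions. Everything here is a made-up stand-in; a real engine would track far more state than one `state` blob:

```python
# A minimal sketch of "positions linked by actions." The fields are
# illustrative stand-ins for the full game state, not a real implementation.

class Position:
    def __init__(self, state):
        self.state = state          # everything defining the game right now
        self.transitions = []       # (action, resulting Position) pairs

    def add_transition(self, action, next_position):
        """Record that taking `action` here leads to `next_position`."""
        self.transitions.append((action, next_position))
```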
So the search algorithm that a computer would use on a tree like this is to find the best path of transitions between positions (there’s some ambiguity in defining “best,” but I’ll address that later) leading from the current position to a winning position. All that really means is that the computer’s algorithm for winning is to determine, at each step, the move that gives it the best chance of winning from where it is now. That doesn’t sound nearly as nerdy.
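In code form, that one-sentence algorithm is almost trivially short; the hard part is hiding inside the hypothetical `chance_of_winning` helper, and figuring out how to compute it is what the rest of this post is about:

```python
# The whole algorithm in one expression: from the current position, pick the
# action whose resulting position gives the best chance of winning.
# `chance_of_winning` is a placeholder we still have to define.

def best_action(position):
    return max(position.transitions,
               key=lambda t: chance_of_winning(t[1]))[0]
```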
Because that’s exactly what we as humans do, and Gavin Verhey just posted 2 articles* on starcitygames discussing that exact topic. Basically, he says that Magic is a series of “decision points”: at each one, you have new choices to make, and you can make the right choice by looking ahead down each road and seeing what options lie there. This shouldn’t sound surprising; it’s both intuitive and very similar to what Patrick Buckland had to say in the dailymtg article.
I guess the real trick here is determining which choice is the best and doing that relatively quickly. There are obviously tradeoffs between these two goals, but I’ll see if I can walk through the intuition for how this is done step by step.
So let’s temporarily assume that we have infinite time. You’re, say, trying to figure out whether you should chump block a Shambling Remains with a Mogg Fanatic now, or wait and hope that you don’t die before you can use the Mogg Fanatic more effectively, at a point when its 1 damage is enough to finish off the Shambling Remains. Since we have infinite time, we can just see exactly where both of these decisions would lead. Even though the trees below both choices are still huge, they’re finite, so it can be done in infinite time (maybe think 5 monkeys with typewriters and infinite time writing the entire works of Shakespeare?). This way, you can determine which choice is better by, say, taking the ratio of wins to losses in each sub-tree and picking the more favorable one. So if blocking results in 1 million wins to 500,000 losses and not blocking results in 2 million wins to 1.9 million losses, let the Fanatic die now.
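Here’s roughly what that infinite-time computation looks like as code. The hypothetical piece is `winner`, which says who won if the game is over at a position and returns `None` otherwise:

```python
# Exhaustively count wins and losses in the sub-tree below a position. Only
# feasible with "infinite time," but well-defined, since the tree is finite.

def tally(position):
    """Return (wins, losses) over every complete game below this position."""
    result = winner(position)               # "me", "opponent", or None
    if result is not None:
        return (1, 0) if result == "me" else (0, 1)
    wins = losses = 0
    for _action, next_position in position.transitions:
        w, l = tally(next_position)
        wins, losses = wins + w, losses + l
    return wins, losses

# Compare the two decisions by their win/loss ratios, e.g.
# tally(block_position) -> (1_000_000, 500_000)
# tally(wait_position)  -> (2_000_000, 1_900_000)
```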
Your first instinct here is probably that just taking the ratio isn’t a good way to determine which path is better: certain paths are more likely to happen (have a greater probability) than others, and maybe not all of those outcomes are equally valuable. That’s true. In that case, we can weight each outcome by the probability that it would occur. But in the end, it seems like our best method is to somehow quantify (numerically, since this is going to be a computer algorithm) the value of the various paths and pick the option whose sub-tree has the better summed-up value.
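Weighting by probability turns the simple count into an expected value. A sketch, where `probability` (how likely a given transition is to actually happen) and `value` (a numeric score for a finished game, say +1 for a win and -1 for a loss) are assumptions layered on the earlier helpers:

```python
# Probability-weighted value of a sub-tree: each outcome counts in proportion
# to how likely it is, instead of every leaf counting equally.

def expected_value(position):
    result = winner(position)
    if result is not None:
        return value(result)                 # e.g. +1 for a win, -1 for a loss
    return sum(probability(action) * expected_value(next_pos)
               for action, next_pos in position.transitions)
```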
Unfortunately, we don’t have infinite time. Because our finite time lets us examine far fewer positions than the tree contains, we can only look a few decisions and positions ahead (perhaps even as few as 1). In this case, we take shortcuts, or heuristics. One of these is to evaluate the “goodness” of any position and compare those values. We can do that using an evaluation function, which is just a fancy term for taking a bunch of important factors, weighting them, and combining them into an end value. For example, our 2 choices may result in something like:
2 * 1 (for having the Mogg Fanatic) + .1 * 16 (for being at 16 life) + .1 * 1 (for having just dealt 1 damage to your opponent with the Fanatic’s ability) + …other stuff = 5

2 * 0 (no Mogg Fanatic) + .1 * 20 (for being at 20 life) + .1 * 0 (for not having popped the Fanatic) + …other stuff = something less than 5
(Ignore the 2 most obvious faults here: that popping the Fanatic is itself another decision, and that life totals don’t really have a linear value. The sketch below shows the same weighted-sum idea in code.)
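In code, that back-of-the-envelope math is just a weighted sum of features. The feature extractors (`fanatics_in_play` and friends) and the weights are the same made-up ones from the arithmetic above:

```python
# The example evaluation function written out: weigh each factor we care
# about and add them up. Features and weights are purely illustrative.

def evaluate(position):
    score = 0.0
    score += 2.0 * fanatics_in_play(position)         # having the Mogg Fanatic
    score += 0.1 * my_life_total(position)            # being at 16 (or 20) life
    score += 0.1 * damage_dealt_by_fanatic(position)  # popped the Fanatic for 1
    # ... plus a weighted term for every other factor that matters
    return score
```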
So it seems like in the position immediately resulting from blocking or not blocking, it’s more valuable to block. This, however, is obviously short-sighted. Maybe your opponent can immediately do something else (like, say, play a Figure of Destiny that you no longer have a Mogg Fanatic to kill before it pumps). So what you really want to do is look as far as you can into the future and evaluate that position, assuming that your opponent made intelligent decisions all along the way. What actually ends up happening is that you alternate between looking for states where your evaluation function is maximized (for when you made the best choice) and states where it is minimized (for when your opponent made a choice and tried to screw you). This is a principle in game theory called “minimax,” where you try to maximize your own evaluation function against an adversary who is trying to minimize it.
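That alternating max/min lookahead is short to write down. A standard depth-limited minimax sketch, reusing the hypothetical `evaluate` and `winner` helpers from above:

```python
# Depth-limited minimax: on my turns, assume I pick the child that maximizes
# the evaluation; on my opponent's turns, assume they pick the one that
# minimizes it. At the depth limit (or a finished game), fall back to evaluate().

def minimax(position, depth, my_turn):
    if depth == 0 or winner(position) is not None or not position.transitions:
        return evaluate(position)
    scores = [minimax(next_pos, depth - 1, not my_turn)
              for _action, next_pos in position.transitions]
    return max(scores) if my_turn else min(scores)
```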
These are getting really long. Anyways, I’ll continue this in another post, where hopefully I’ll finally get to alpha-beta pruning and more exciting, AI-y sorts of things.
* I should make a couple of comments about these articles. 1) I honestly thought they were kind of dumb, but clearly I appreciated them enough to link to them. In some sense, he’s doing exactly what I’m doing: taking a more complex idea and explaining it simply. So maybe I should like them more? 2) “Decision-point theory,” I’m pretty sure, is a term he came up with. I’ve never heard it before, and it doesn’t really mean anything on its own; it could be filed under game theory or AI, but it definitely isn’t legit as its own field. 3) The best point I thought he made was that pros often play complex decks that require a lot of decisions, and good playskill often shows in choosing options that leave more options open down the road. This actually relates to another part of search algorithms called “constraint-satisfaction problems”: there, “forward checking” is something of a way of doing lookahead, and the “least-constraining value” is a heuristic for making choices. But that’s all another post, if it comes up again.