UCT and Artificial Intelligence on Jocly
Jocly is a big and complex project that aims to provide abstract strategy games with a number of features common to all those games.
As for any project, our development resources are limited and we often reach the point where we ask “what do we do next in the project?”. Sometimes we answer with “let’s add a new game” to diversify our game range, or “add this feature”, as we recently did with the addition of WebGL and WebRTC, or it might be “improve this” because we estimate that, given the overall Jocly quality, that particular feature has become a weakness.
Currently, we have identified two main points where we are below Jocly standards:
the overall User Interface (not the games but the application itself)
the Artificial Intelligence when playing against the computer
We are working actively on the User Interface and we will come back to you on this in the near future.
Regarding the AI, the main criticisms are “it’s too slow on my device” or “it does not play well enough”, so we did our best to address those issues.
Introduction to Artificial Intelligence in games
You can see a game-solving problem like this: given a current board state, you consider all the moves you can possibly play. This gives you a root with a number of branches; at the end of each branch you get another board state as a leaf.
Figure 1 - note that traditionally, the tree is represented upside-down with the root at the top and the leaves at the bottom
However, from each leaf board state (unless it’s terminal, i.e. the game is over), the opponent can play moves that lead to other board states, and so on.
Obviously, even if the total game tree is not infinite, it is huge and there is no way we can explore it all (except at the very end of the game). So the exploration must reach a limit, like a maximum depth to be visited.
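To make the tree concrete, here is a minimal sketch of a board-state interface; the names are invented for this article, not taken from the Jocly code:

```typescript
// Hypothetical interface for illustration only; not Jocly's actual API.
interface Board {
  moves(): Board[];       // the board states reachable in one move
  isTerminal(): boolean;  // true when the game is over
}
```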
When we are at a leaf (because we don’t want to, or cannot, explore the states below), we can evaluate the board state. For instance, in checkers, you can count the remaining pieces and do:
evaluation = <white pieces count> - <black pieces count>
(Of course, you can be smarter by including many other criteria in your evaluation)
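As a sketch, such a piece-count evaluation could look like this (the `pieceColors` helper and board type are invented for the example):

```typescript
// Sketch of the piece-count evaluation for checkers.
interface CheckersBoard {
  pieceColors(): ("white" | "black")[];  // color of each remaining piece
}

function evaluate(board: CheckersBoard): number {
  let score = 0;
  for (const color of board.pieceColors()) {
    score += color === "white" ? 1 : -1;  // +1 per white piece, -1 per black
  }
  return score;  // positive favors White, negative favors Black
}
```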
So the higher the evaluation, the better for White; the lower, the better for Black.
For a given board state S, White will look at the child nodes and prefer the move that leads to the highest evaluation, so the evaluation of S is the maximum of its children’s evaluations. If it’s Black to play, the evaluation is the minimum of its children’s.
So each node in the tree can be evaluated by the maximum child node evaluation, the minimum child node evaluation or a static evaluation if we don’t want to explore the children.
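A minimal sketch of this rule, reusing the hypothetical `Board` interface from above (the static `evaluate` is assumed to exist):

```typescript
// Depth-limited Minimax; Board is the hypothetical interface sketched earlier.
interface Board { moves(): Board[]; isTerminal(): boolean; }
declare function evaluate(board: Board): number;  // static evaluation

function minimax(board: Board, depth: number, whiteToPlay: boolean): number {
  const children = board.moves();
  if (depth === 0 || board.isTerminal() || children.length === 0) {
    return evaluate(board);  // static evaluation at the leaves
  }
  const scores = children.map(b => minimax(b, depth - 1, !whiteToPlay));
  // White takes the maximum child evaluation, Black the minimum.
  return whiteToPlay ? Math.max(...scores) : Math.min(...scores);
}
```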
This method of choosing the (presumably) best computer move is known as Minimax and dates back to the 1920s.
More on Minimax.
Since the very first days of Jocly, Minimax has been used with an improvement called Alpha-Beta.
Given that we explore the tree in the following order:
You may notice that after exploring node 11, there is no need to explore node 12.
Indeed, whatever the evaluation of node 12: if it’s greater than 4, it won’t be propagated to the parent node (because it’s a ‘min’ node), and if it’s less than 4, it won’t propagate to the grandparent node (because we already have a 5 and the grandparent is a ‘max’ node).
As a result, there is an entire branch we don’t need to explore because we know its outcome won’t affect the final decision.
Alpha-Beta may not look very spectacular in this example, but when exploring large trees it has drastic effects and can easily remove 80% of the entire tree from the search.
More information about Alpha-Beta pruning.
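For illustration, here is how the pruning might look in code, under the same assumed interface (a textbook fail-hard version, not Jocly’s exact implementation):

```typescript
// Minimax with Alpha-Beta pruning; same hypothetical Board interface as above.
interface Board { moves(): Board[]; isTerminal(): boolean; }
declare function evaluate(board: Board): number;

function alphaBeta(board: Board, depth: number,
                   alpha: number, beta: number, whiteToPlay: boolean): number {
  const children = board.moves();
  if (depth === 0 || board.isTerminal() || children.length === 0) {
    return evaluate(board);
  }
  for (const child of children) {
    const score = alphaBeta(child, depth - 1, alpha, beta, !whiteToPlay);
    if (whiteToPlay) alpha = Math.max(alpha, score);
    else beta = Math.min(beta, score);
    if (beta <= alpha) break;  // this branch cannot affect the final decision
  }
  return whiteToPlay ? alpha : beta;
}

// Initial call, e.g.: alphaBeta(root, 5, -Infinity, Infinity, true)
```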
The bad thing about Alpha-Beta is that you don’t really know when to stop the tree exploration. You can say: “I go down all branches (except the ones that have been alpha-beta pruned) to depth 5”. In Jocly, we also assign to each node a potential number of child nodes to be explored: if a node has 4 children, each child inherits ¼ of its parent’s potential, and when the potential goes below 1, we stop the exploration. This method allows exploring deeper in situations with forced moves, for instance.
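A sketch of this potential mechanism, following the description above (again on the invented interface, not the actual Jocly code):

```typescript
// Each child inherits an equal share of its parent's potential; exploration
// stops when the potential drops below 1. A forced move (single child) keeps
// the full potential, so forced lines are searched deeper.
interface Board { moves(): Board[]; isTerminal(): boolean; }
declare function evaluate(board: Board): number;

function search(board: Board, potential: number, whiteToPlay: boolean): number {
  const children = board.moves();
  if (potential < 1 || board.isTerminal() || children.length === 0) {
    return evaluate(board);
  }
  const share = potential / children.length;  // 4 children -> 1/4 each
  const scores = children.map(b => search(b, share, !whiteToPlay));
  return whiteToPlay ? Math.max(...scores) : Math.min(...scores);
}
```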
However, the main problem is that we waste time exploring branches below moves that are unlikely to be played because they are simply bad moves (but we don’t know that yet).
Jocly now implements the UCT AI algorithm as an alternative to alpha-beta. It produced spectacular results with the newly released Margo game and we plan to use this AI on all other Jocly games.
UCT is fundamentally different from Alpha-Beta in that, instead of exploring the tree in depth, it proceeds by iterations in order to build the tree progressively, adding one or more branches at each iteration.
The benefits are highly valuable:
we can give the computer a limited duration to decide on a move: when the timer expires, the move is picked
and/or, on a CPU-limited device, we can give a maximum number of iterations
and/or, on a memory-limited device, we can give a maximum number of nodes to be created
And the best of all is that, from the tests we ran, the quality of the picked move is much better than with the regular Alpha-Beta algorithm for equivalent computing times.
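For illustration, the three budgets could drive the search loop roughly like this (all names invented, not Jocly’s API):

```typescript
// One UCT iteration descends the tree, expands it and back-propagates values;
// the loop stops as soon as any of the three budgets is exhausted.
interface Move {}
interface SearchTree {
  iterate(): void;     // grows the tree by one or more branches
  nodeCount: number;   // nodes created so far
  bestMove(): Move;    // best move found so far
}

function decideMove(tree: SearchTree, maxMillis: number,
                    maxIterations: number, maxNodes: number): Move {
  const deadline = Date.now() + maxMillis;
  for (let i = 0; i < maxIterations; i++) {
    if (Date.now() >= deadline || tree.nodeCount >= maxNodes) break;
    tree.iterate();
  }
  return tree.bestMove();
}
```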
UCT works by growing the tree with successive iterations:
Obviously, the branch that is expanded depends on the apparent quality of the corresponding move. This allows exploring deeper the branches where the game is likely to go in the next moves.
If we only ever pick the best node to expand, we will completely miss situations where a sacrifice leads to a better board state. A sacrifice move has a low apparent quality, so we won’t explore the branches below it that would have shown a win of the game.
Fortunately, UCT has a mechanism to balance this behavior. It’s called UCB (for Upper Confidence Bound).
When walking down the tree, we pick the child node that has the best UCB. The UCB value involves the number of visits made to this node, in such a way that when a node is not frequently visited, its UCB increases; if at some point its UCB becomes bigger than its siblings’, it will be picked next time.
For information, the UCB formula is:
UCB = V + C × sqrt( ln(parent visits) / (node visits) )
Where V is the intrinsic value of the node (obtained from the static evaluation for a leaf node, or the value that has been minimaxed up from the child nodes), and C is a constant that has to be tuned manually for a given game type.
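In code, the selection step could look like this sketch (the node structure is invented; note that in a minimax setting the value term would be negated on the opponent’s nodes):

```typescript
// UCB-based child selection, following the formula above.
interface UctNode {
  value: number;       // V: intrinsic value of the node
  visits: number;      // how many times this node has been visited
  children: UctNode[];
}

function ucb(node: UctNode, parentVisits: number, C: number): number {
  if (node.visits === 0) return Infinity;  // unvisited children go first
  return node.value + C * Math.sqrt(Math.log(parentVisits) / node.visits);
}

// Walking down the tree: pick the child with the best UCB.
function selectChild(parent: UctNode, C: number): UctNode {
  let best = parent.children[0];
  for (const child of parent.children) {
    if (ucb(child, parent.visits, C) > ucb(best, parent.visits, C)) best = child;
  }
  return best;
}
```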
When a leaf node has been reached, we decide whether to evaluate the node or to expand it (i.e. create its child nodes). The node evaluation is then propagated back to the upper nodes using the minimax method (since we now run minimax from bottom to top, it’s very fast).
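A sketch of that back-propagation, with the minimax rule re-applied at each ancestor (the node structure is invented for the example):

```typescript
// After a leaf has been evaluated (or expanded and its new children
// evaluated), walk back up to the root, updating visits and re-running
// the minimax rule at each level.
interface UctNode {
  value: number;
  visits: number;
  whiteToPlay: boolean;      // whose turn it is at this node
  parent: UctNode | null;
  children: UctNode[];
}

function backPropagate(node: UctNode | null): void {
  while (node !== null) {
    node.visits += 1;
    if (node.children.length > 0) {
      const values = node.children.map(c => c.value);
      node.value = node.whiteToPlay ? Math.max(...values) : Math.min(...values);
    }
    node = node.parent;
  }
}
```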
In regular UCT, once you have reached a leaf node, you may run a ‘playout’ from the current board state in order to improve the node’s static evaluation. A playout means finishing the game by playing random or heuristic-oriented moves to the end, or at least for a number of moves. Once you have run enough playouts on a node, you may decide at the next iteration that it’s worth expanding the node. We have the code in place for running playouts, but our experiments on Margo showed that the CPU time spent in playouts was better used expanding the tree, so we configured the game levels to run no playouts. Maybe it will be worth running playouts on other games where generating the possible moves is less expensive than on Margo.
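For completeness, a random playout is simple to sketch (again on the invented `Board` interface):

```typescript
// Finish the game (or play up to maxMoves) with random moves, then evaluate.
interface Board { moves(): Board[]; isTerminal(): boolean; }
declare function evaluate(board: Board): number;

function playout(board: Board, maxMoves: number): number {
  for (let i = 0; i < maxMoves && !board.isTerminal(); i++) {
    const moves = board.moves();
    if (moves.length === 0) break;
    board = moves[Math.floor(Math.random() * moves.length)];
  }
  return evaluate(board);  // result refines the leaf's static evaluation
}
```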
For more information about UCT, check this site (note it’s written by Cameron Browne, the inventor of the Margo game).
A little bit of genetics
Once a game is implemented, you generally end up with a number of parameters to be tuned in the evaluation function. For instance, in checkers, you can evaluate your pieces like this:
evaluation = <W soldiers> + P x <W kings> - <B soldiers> - P x <B kings>
You can easily count the number of soldiers and kings, but what is the value of P? Of course, you can try to guess something that looks reasonable, but when you have a number of parameters to set up, and the optimal value of each depends on the values of the others, the tuning quickly becomes a nightmare. For instance, in Margo, we use 11 different constants to generate the evaluation of a board.
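In code, P simply becomes one entry in a parameter set the tuner can change (the counts are invented for the example):

```typescript
// The checkers evaluation above, with the king weight P as a tunable parameter.
interface Counts {
  wSoldiers: number; wKings: number;
  bSoldiers: number; bKings: number;
}

function evaluate(c: Counts, P: number): number {
  return c.wSoldiers + P * c.wKings - c.bSoldiers - P * c.bKings;
}
```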
Plus, when using the UCT AI, there are a number of other parameters that influence the computer play, starting with the C parameter of the UCB formula.
To address this issue, which we have had since the beginning of Jocly, we developed a tool using a genetic algorithm to find the most suitable parameter set.
A genetic algorithm mimics real-life evolution rules: only the fittest survive and reproduce.
The idea is to have individuals with a set of chromosomes, each representing a parameter to be tuned (which can be an integer, float, boolean or discrete value). You start with a population of individuals whose chromosome values have been picked randomly. At each iteration, you play a big tournament between all the individuals. You keep the ones that perform best and kill the others. You randomly breed the remaining population by combining their chromosomes and allowing some mutations (a chromosome value may be altered). After a number of iterations, you can pick the best performer with reasonable confidence that it represents a close-to-best set of game parameters.
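A compact sketch of that loop, with the expensive tournament abstracted behind a `fitness` function (everything here is invented for illustration, not our actual tool):

```typescript
type Chromosomes = number[];  // one entry per parameter to be tuned

// Plays a large number of games to score an individual: this is the slow part.
declare function fitness(individual: Chromosomes): number;

// Combine two parents gene by gene, picking each gene from either one.
function crossover(a: Chromosomes, b: Chromosomes): Chromosomes {
  return a.map((gene, i) => (Math.random() < 0.5 ? gene : b[i]));
}

// Occasionally alter a gene value.
function mutate(c: Chromosomes, rate: number): Chromosomes {
  return c.map(g => (Math.random() < rate ? g * (0.5 + Math.random()) : g));
}

function evolve(population: Chromosomes[], generations: number): Chromosomes {
  for (let gen = 0; gen < generations; gen++) {
    // Tournament: rank by fitness, keep the best half, kill the rest.
    const ranked = population
      .map(c => ({ c, score: fitness(c) }))
      .sort((x, y) => y.score - x.score)
      .map(r => r.c);
    const survivors = ranked.slice(0, Math.floor(population.length / 2));
    // Breed the survivors back up to full size, allowing mutations.
    const next = survivors.slice();
    while (next.length < population.length) {
      const a = survivors[Math.floor(Math.random() * survivors.length)];
      const b = survivors[Math.floor(Math.random() * survivors.length)];
      next.push(mutate(crossover(a, b), 0.05));
    }
    population = next;
  }
  return population[0];  // best performer of the last tournament
}
```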
This simple algorithm works remarkably well. The bad point is the time it takes to run: measuring the fitness of each individual requires playing a large number of games, and this takes quite a long time. It is not rare that we have to run the genetic selector for over 24 hours to get a satisfying result.