2.1 Opponent Modeling
When working with multiple intelligent agents, a common approach is to create machines that
attempt to model their associates in some way. Various methods have been developed to
address this opponent modeling problem. Perhaps the most direct form of this is in
the area of policy reconstruction. These agents attempt to reconstruct the other agent’s decision
making processes through one of several means. This has been done through Fictitious Play [5],
wherein average frequencies of moves are used to approximate the other agent's behavior. This
can be modified by considering the other agents' actions in relation to the agent's own actions, as in [6].
Another approach to modeling others is through case-based modeling, wherein the agent compares
its current situation with other cases and uses these to determine how to act [7]. Machine-learning
techniques have also been used to model others, including deterministic finite automata [8], decision
trees [9], and neural networks [10].
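To make the frequency-based idea behind fictitious play concrete, the following Python sketch (illustrative only, not any of the cited implementations) tracks the empirical frequency of the counterpart's past moves and best-responds to that mixture; the two-action game and its payoff values are hypothetical placeholders.

```python
import random
from collections import Counter

# Hypothetical two-action game used purely for illustration.
ACTIONS = ["cooperate", "defect"]
PAYOFF = {("cooperate", "cooperate"): 3, ("cooperate", "defect"): 0,
          ("defect", "cooperate"): 5, ("defect", "defect"): 1}

class FictitiousPlayer:
    def __init__(self):
        self.counts = Counter()           # observed counterpart moves

    def observe(self, opponent_action):
        self.counts[opponent_action] += 1

    def act(self):
        total = sum(self.counts.values())
        if total == 0:                    # no history yet: play uniformly at random
            return random.choice(ACTIONS)
        # empirical distribution over the counterpart's actions
        freq = {a: self.counts[a] / total for a in ACTIONS}
        # best response to that empirical mixture
        expected = {mine: sum(freq[their] * PAYOFF[(mine, their)]
                              for their in ACTIONS)
                    for mine in ACTIONS}
        return max(expected, key=expected.get)
```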
While making models to fit a specific opponent may ultimately lead to better results, it
is not always practical to do so. In situations where a machine must quickly adapt to a new
partner, one option is to fit that partner to previously determined types. Here again, methods including
deterministic finite automata [11], decision trees [9], and neural networks [12, 13] can be used
to represent types which can be fit to the strategy used by the agent's counterpart. By including
multiple strategies, AVA is in some ways similar, though its approach differs from these methods.
Just as one would design an intelligent agent to model those it interacts with, so too would it
be reasonable to believe that the agents it will interact with may also model it. The understanding
that other intelligent beings have their own models to describe the world is referred to as Theory of
Mind [14], which is something people develop at a young age to understand how others behave [15].
Through Theory of Mind, an agent may learn to understand the intentions of others [16]. Similar
attempts to leverage this ability have included agents which recursively model what each being
thinks the other believes until reaching some predetermined depth at which the agent assumes the
other agent will act rationally [17]. As AVA is designed to predict humans, it would be a potential
candidate for guiding an AI agent’s own Theory of Mind.
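To illustrate the general shape of such depth-limited recursive reasoning (a sketch under our own assumptions, not the specific model of [17]), the snippet below predicts the other agent's choice by recursing one level shallower at each step and assumes a rational response to a default action at the base level; the payoff table and that base-level assumption are hypothetical.

```python
# Depth-limited recursive Theory-of-Mind sketch over a hypothetical symmetric game.
ACTIONS = ["cooperate", "defect"]
PAYOFF = {("cooperate", "cooperate"): 3, ("cooperate", "defect"): 0,
          ("defect", "cooperate"): 5, ("defect", "defect"): 1}

def best_response(predicted_other):
    # pick the action with the highest payoff against the predicted action
    return max(ACTIONS, key=lambda mine: PAYOFF[(mine, predicted_other)])

def level_k_action(depth, default="cooperate"):
    if depth == 0:
        # base of the recursion: assume a rational response to a default action
        return best_response(default)
    # predict the other agent's choice at one level shallower, then respond to it
    predicted_other = level_k_action(depth - 1, default)
    return best_response(predicted_other)

print(level_k_action(2))  # reason two levels deep about the other agent
```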
When developing a machine meant to model people, there are a few considerations that must be
addressed which are not typically seen with other machines. For instance, many agents assume that others
only use one model. While this works well for simple agents and may be able to approximate more
complex ones, it will not necessarily cover human opponents as effectively. If a person sees that their
strategy of interaction is not meeting their expectations, they will almost certainly adopt a new
strategy, which is an element AVA attempts to cover by learning from multiple players. This does
add a layer of complexity, but a few approaches have been attempted previously to account for this.
For instance, [18] created an algorithm that starts by varying models until convergence is reached. [19]
have constructed agents which are allowed to change between static models periodically. Meanwhile
others have created algorithms to map interaction histories to models [20, 21], while still others
have adjusted models by primarily weighting recent interactions [22].
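As a rough illustration of what such recency weighting might look like (a sketch under our own assumptions, not the specific algorithm of [22]), the following Python snippet scores a set of candidate partner models by an exponentially discounted record of their predictive accuracy and acts according to the best-scoring one; the decay rate and the two candidate models are hypothetical.

```python
DECAY = 0.9   # hypothetical discount: older evidence fades, recent evidence dominates

class ModelSelector:
    def __init__(self, models):
        # models: dict mapping a name to a callable(history) -> predicted action
        self.models = models
        self.scores = {name: 0.0 for name in models}

    def update(self, history, observed_action):
        for name, model in self.models.items():
            hit = 1.0 if model(history) == observed_action else 0.0
            # exponential moving average of predictive accuracy
            self.scores[name] = DECAY * self.scores[name] + (1 - DECAY) * hit

    def best_model(self):
        return max(self.scores, key=self.scores.get)

# usage with two hypothetical candidate partner types
selector = ModelSelector({
    "always_cooperate": lambda hist: "cooperate",
    "repeats_last_move": lambda hist: hist[-1] if hist else "cooperate",
})
history = []
for observed in ["cooperate", "defect", "defect"]:
    selector.update(history, observed)
    history.append(observed)
print(selector.best_model())
```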
When humans interact, not only do they sometimes change strategies, they also sometimes
act irrationally. Often, attempts to create intelligent systems are focused on achieving some
optimal value, and behaving in any manner that does not support that goal is rendered impossible.
People do not always act in their best interest, however, allowing emotions and fallacies to cause
them to act in ways a designer may see as suboptimal. Still, there has been some work done to
attempt to model this behavior, and for an agent to interact with people it may be useful to capture
some essence of this. In their work, [23] use a satisficing algorithm for an agent to learn how to
behave in the prisoner’s dilemma, learning to adjust expectations based on the opponent’s actions.
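As one illustration of such an aspiration-based rule (a generic sketch, not the exact algorithm of [23]), the snippet below has the agent repeat its action when the received payoff meets its current aspiration, switch otherwise, and slowly move the aspiration toward the payoffs it actually receives; the action set, learning rate, and initial aspiration are hypothetical.

```python
import random

ACTIONS = ["cooperate", "defect"]
LEARNING_RATE = 0.1          # hypothetical aspiration-update rate

class SatisficingAgent:
    def __init__(self, aspiration=3.0):
        self.aspiration = aspiration
        self.action = random.choice(ACTIONS)

    def act(self):
        return self.action

    def learn(self, payoff):
        if payoff < self.aspiration:
            # dissatisfied: try the other action next round
            self.action = "defect" if self.action == "cooperate" else "cooperate"
        # move the aspiration toward recently experienced payoffs
        self.aspiration = ((1 - LEARNING_RATE) * self.aspiration
                           + LEARNING_RATE * payoff)
```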
In a variety of other works (e.g., [24]), agents are constructed to attempt to mimic human behavior