cs188 lecture4
Deterministic Games
Many possible formalizations, one is:
- States: S (start at \(s_0\))
- Players: P={1,2,...,N} (usually take turns)
- Actions: A (may depend on player / state)
- Transition Function: \(S\times A \rightarrow S\)
- Terminal Test: \(S \rightarrow \{t,f\}\)
- Terminal Utilities:\(S\times P \rightarrow R\)
Solution for a player is a policy: \(S \rightarrow A\)
Zero-Sum Games
Zero-Sum Games:
- Agents have opposite utilities (values on outcomes)
- Let's think of a single value that one maximizes and the other minimizes
- Adversarial, pure competition
General Games:
- Agents have independent utilities (values on outcomes)
- Cooperation, indifference, competition, and more are all possible
- More later on non-zero-sum games
Adversarial Search
Value of a state
Value of a state: The best achievable outcome (utility) from that state.
Non-Terminal States: \[ V(s) = max_{s' \in children(s)}V(s') \] Terminal States: \[ V(s) = known \]
Minimax Values
States Under Agent's Control: \[ V(s) = max_{s' \in successors(s)}V(s') \]
States Under Opponent's Control: \[ V(s') = min_{s \in successor(s')}V(s) \]
The problem solved means we can compute the value of the root state.
Adversarial Search (Minimax)
- Deterministic, zero-sum games:
- One player maximizes result
- The other minimizes result
- Minimax search:
- A state-space search tree
- Players alternate turns
- Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
Minimax Efficiency
- How efficient is minimax?
- Time: \(O(b^m)\)
- Space:\(O(bm)\)
Alpha-Beta Pruning
- General configuration (MIN version)
- We’re computing the MIN-VALUE at some node n
- We’re looping over n’s children
- n’s estimate of the childrens’ min is dropping
- Who cares about n’s value? MAX
- Let a be the best value that MAX can get at any choice point along the current path from the root
- If n becomes worse than a, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)
- MAX version is symmetric
Alpha-Beta Pruning Properties
- This pruning has no effect on minimax value computed for the root!
- Values of intermediate nodes might be wrong
- Important: children of the root may have the wrong value
- So the most naive version won't let you do action selection
- Good child ordering improves effectiveness of pruning
- With "perfect ordering":
- Time complexity drops to \(O(b^{\frac{m}{2}})\)
- Doubles solvable depth
- Full search of, e.g. chess, is still hopeless...
- This is a simple example of metareasoning (computing about what to compute)
Resource Limits
- Problem: In realistic games, cannot search to leaves!
- Solution: Depth-limited search
- Instead, search only to a limited depth in the tree
- Replace terminal utilities with an evaluation function for non-terminal positions
- Guarantee of optimal play is gone
- More plies makes a BIG difference
- Use iterative deepening for an anytime algorithm
Evaluation Functions
- Evaluation functions score non-terminals in depth-limited search
- Ideal function: returns the actual minimax value of the position
- In practice: typically weighted linear sum of features:
\[ Eval(s) = w_1f_1(s)+w_2f_2(s)+...+w_nf_n(s) \]
本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!