CS188 Lecture 4: Adversarial Search

Deterministic Games

Many possible formalizations, one is:

  • States: S (start at \(s_0\))
  • Players: P={1,2,...,N} (usually take turns)
  • Actions: A (may depend on player / state)
  • Transition Function: \(S\times A \rightarrow S\)
  • Terminal Test: \(S \rightarrow \{t,f\}\)
  • Terminal Utilities: \(S \times P \rightarrow \mathbb{R}\)

Solution for a player is a policy: \(S \rightarrow A\)
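
A minimal sketch of this formalization as a Python interface (the class and method names here are my own illustrative assumptions, not part of the lecture):

```python
from typing import Hashable, List

State = Hashable   # opaque game state
Action = Hashable  # opaque move
Player = int       # index of a player in P = {1, ..., N}

class DeterministicGame:
    """Illustrative interface mirroring the formalization above."""

    def initial_state(self) -> State: ...                 # s0
    def players(self) -> List[Player]: ...                # P
    def player_to_move(self, s: State) -> Player: ...     # whose turn it is
    def actions(self, s: State) -> List[Action]: ...      # A (may depend on state)
    def result(self, s: State, a: Action) -> State: ...   # transition S x A -> S
    def is_terminal(self, s: State) -> bool: ...          # terminal test S -> {t, f}
    def utility(self, s: State, p: Player) -> float: ...  # terminal utility S x P -> R
```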

Zero-Sum Games

Zero-Sum Games:

  • Agents have opposite utilities (values on outcomes)
  • Let's think of a single value that one maximizes and the other minimizes
  • Adversarial, pure competition

General Games:

  • Agents have independent utilities (values on outcomes)
  • Cooperation, indifference, competition, and more are all possible
  • More later on non-zero-sum games

Value of a state

Value of a state: The best achievable outcome (utility) from that state.

Non-Terminal States: \[ V(s) = \max_{s' \in \text{children}(s)} V(s') \]
Terminal States: \[ V(s) = \text{known} \]

Minimax Values

  • States Under Agent's Control: \[ V(s) = \max_{s' \in \text{successors}(s)} V(s') \]

  • States Under Opponent's Control: \[ V(s') = \min_{s \in \text{successors}(s')} V(s) \]

Solving the game in this way means we can compute the value of the root state.
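
Putting the terminal and non-terminal cases together, the minimax value is a single recursion over the game tree:

\[
V(s) =
\begin{cases}
\text{terminal utility of } s & \text{if } s \text{ is terminal} \\
\max_{s' \in \text{successors}(s)} V(s') & \text{if it is the agent's (MAX's) turn at } s \\
\min_{s' \in \text{successors}(s)} V(s') & \text{if it is the opponent's (MIN's) turn at } s
\end{cases}
\]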

Adversarial Search (Minimax)

  • Deterministic, zero-sum games:
    • One player maximizes result
    • The other minimizes result
  • Minimax search:
    • A state-space search tree
    • Players alternate turns
    • Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
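
A minimal recursive implementation of this search, assuming the illustrative `DeterministicGame` interface sketched earlier and a two-player zero-sum game where player 0 is MAX:

```python
def minimax_value(game, state, max_player=0):
    """Minimax value of `state` from MAX's point of view."""
    if game.is_terminal(state):
        return game.utility(state, max_player)
    values = [minimax_value(game, game.result(state, a), max_player)
              for a in game.actions(state)]
    if game.player_to_move(state) == max_player:
        return max(values)   # MAX node: best achievable value
    return min(values)       # MIN node: the adversary minimizes MAX's value

def minimax_decision(game, state, max_player=0):
    """Action at the root with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a), max_player))
```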

Minimax Efficiency

  • How efficient is minimax?
    • Time: \(O(b^m)\), where \(b\) is the branching factor and \(m\) is the maximum depth of the tree
    • Space: \(O(bm)\)
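
For intuition, using the usual rough chess figures (my numbers, not this slide's): with branching factor \(b \approx 35\) and game length \(m \approx 100\) plies,

\[ b^m \approx 35^{100} \approx 10^{154} \text{ nodes,} \]

so exhaustive minimax is completely infeasible, even though the DFS-like \(O(bm)\) space requirement is negligible.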

Alpha-Beta Pruning

  • General configuration (MIN version)
    • We’re computing the MIN-VALUE at some node n
    • We’re looping over n’s children
    • n’s estimate of the children’s min is dropping
    • Who cares about n’s value? MAX
    • Let \(\alpha\) be the best value that MAX can get at any choice point along the current path from the root
    • If n becomes worse than \(\alpha\), MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)
  • MAX version is symmetric
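
A sketch of the minimax recursion from above with alpha-beta pruning added (same assumed game interface; \(\alpha\) is the best value MAX can already guarantee along the current path, \(\beta\) the analogous bound for MIN):

```python
def alphabeta_value(game, state, alpha=float("-inf"), beta=float("inf"), max_player=0):
    """Minimax value of `state` with alpha-beta pruning."""
    if game.is_terminal(state):
        return game.utility(state, max_player)
    if game.player_to_move(state) == max_player:
        value = float("-inf")
        for a in game.actions(state):
            value = max(value, alphabeta_value(game, game.result(state, a),
                                               alpha, beta, max_player))
            if value >= beta:      # the MIN ancestor will never let play reach here
                return value       # prune the remaining children
            alpha = max(alpha, value)
        return value
    else:
        value = float("inf")
        for a in game.actions(state):
            value = min(value, alphabeta_value(game, game.result(state, a),
                                               alpha, beta, max_player))
            if value <= alpha:     # the MAX ancestor already has something better
                return value       # prune the remaining children
            beta = min(beta, value)
        return value
```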

Alpha-Beta Pruning Properties

  • This pruning has no effect on minimax value computed for the root!
  • Values of intermediate nodes might be wrong
    • Important: children of the root may have the wrong value
    • So the most naive version won't let you do action selection
  • Good child ordering improves effectiveness of pruning
  • With "perfect ordering":
    • Time complexity drops to \(O(b^{\frac{m}{2}})\)
    • Doubles solvable depth
    • Full search of, e.g. chess, is still hopeless...
  • This is a simple example of metareasoning (computing about what to compute)
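
One common way to get good child ordering in practice (a sketch, assuming a cheap static evaluation function like the one discussed below) is to sort successors by their estimated value before recursing, so the likely-best moves are searched first and cutoffs happen sooner:

```python
def ordered_actions(game, state, eval_fn, maximizing):
    """Order moves by a cheap static evaluation so alpha-beta prunes earlier."""
    return sorted(game.actions(state),
                  key=lambda a: eval_fn(game.result(state, a)),
                  reverse=maximizing)  # best-looking moves first for MAX, worst for MIN
```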

Resource Limits

  • Problem: In realistic games, cannot search to leaves!
  • Solution: Depth-limited search
    • Instead, search only to a limited depth in the tree
    • Replace terminal utilities with an evaluation function for non-terminal positions
  • Guarantee of optimal play is gone
  • More plies makes a BIG difference
  • Use iterative deepening for an anytime algorithm
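
A sketch of the depth-limited version (the `eval_fn` parameter is an assumed evaluation function that scores non-terminal states for MAX; `depth` counts remaining plies):

```python
def depth_limited_value(game, state, depth, eval_fn, max_player=0):
    """Minimax with a depth cutoff: non-terminals at depth 0 get a heuristic score."""
    if game.is_terminal(state):
        return game.utility(state, max_player)
    if depth == 0:
        return eval_fn(state)      # heuristic estimate, not a true minimax value
    values = [depth_limited_value(game, game.result(state, a), depth - 1,
                                  eval_fn, max_player)
              for a in game.actions(state)]
    if game.player_to_move(state) == max_player:
        return max(values)
    return min(values)
```

Iterative deepening then just calls this with depth 1, 2, 3, ... until time runs out, keeping the best move found so far.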

Evaluation Functions

  • Evaluation functions score non-terminals in depth-limited search
  • Ideal function: returns the actual minimax value of the position
  • In practice: typically a weighted linear sum of features:

\[ \mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s) \]
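
A hedged example of such a weighted linear evaluation, with chess-like features and weights invented purely for illustration (the `state.count` accessor is assumed, not part of the lecture):

```python
def eval_material(state):
    """Eval(s) = w1*f1(s) + ... + wn*fn(s): a simple material-difference evaluation."""
    weights = {"pawn": 1.0, "knight": 3.0, "bishop": 3.0, "rook": 5.0, "queen": 9.0}
    # f_i(s): how many more pieces of each kind MAX has than MIN (assumed accessor)
    features = {piece: state.count(piece, "MAX") - state.count(piece, "MIN")
                for piece in weights}
    return sum(weights[p] * features[p] for p in weights)
```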

