CS188 Lecture 5

  • Why wouldn’t we know what the result of an action will be?

    • Explicit randomness: rolling dice
    • Unpredictable opponents: the ghosts respond randomly
    • Actions can fail: when moving a robot, wheels might slip
  • Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

  • Expectimax search: compute the average score under optimal play

    • Max nodes as in minimax search
    • Chance nodes are like min nodes but the outcome is uncertain
    • Calculate their expected utilities

Expectimax Pseudocode

The value function is:

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

The max-value function used above is:

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

The exp-value function used above is:

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v
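
Putting the three functions together, here is a minimal runnable Python sketch. The GameNode class and its fields are assumptions made for illustration, not code from the lecture.

MAX, EXP = "max", "exp"

class GameNode:
    # kind is MAX, EXP, or None for a terminal; probs holds the
    # outcome probabilities of an EXP node's children.
    def __init__(self, kind=None, utility=None, children=None, probs=None):
        self.kind = kind
        self.utility = utility
        self.children = children or []
        self.probs = probs

def value(node):
    if not node.children:        # terminal state
        return node.utility
    if node.kind == MAX:
        return max_value(node)
    return exp_value(node)       # chance node

def max_value(node):
    return max(value(c) for c in node.children)

def exp_value(node):
    # Expected utility: probability-weighted average over successors.
    return sum(p * value(c) for p, c in zip(node.probs, node.children))

# A max node choosing between two lotteries:
left = GameNode(EXP, probs=[0.5, 0.5],
                children=[GameNode(utility=8), GameNode(utility=24)])
right = GameNode(EXP, probs=[0.9, 0.1],
                 children=[GameNode(utility=-12), GameNode(utility=100)])
print(value(GameNode(MAX, children=[left, right])))  # 16.0: left averages 16, right -0.8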

Expectimax search cannot be pruned in general: a chance node's value is a weighted average over all of its children, so any single unexplored child can still change the result. (Pruning only becomes possible if we know bounds on the leaf utilities.)

Other Game Types

Mixed Layer Types

  • Expectiminimax
    • Environment is an extra "random agent" player that moves after each min/max agent
    • Each node computes the appropriate combination of its children (see the sketch below)
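
A compact sketch of the expectiminimax dispatch, using an assumed tuple encoding (not from the lecture): internal nodes are ("max", children), ("min", children), or ("exp", children, probs); terminals are plain numbers.

def expectiminimax(node):
    if isinstance(node, (int, float)):                 # terminal utility
        return node
    kind = node[0]
    vals = [expectiminimax(c) for c in node[1]]
    if kind == "max":
        return max(vals)
    if kind == "min":
        return min(vals)
    return sum(p * v for p, v in zip(node[2], vals))   # chance layer

# Max moves, then min, then the environment flips a (possibly biased) coin:
tree = ("max", [
    ("min", [("exp", [3, 9], [0.5, 0.5]),              # expected value 6.0
             ("exp", [0, 20], [0.9, 0.1])]),           # expected value 2.0
])
print(expectiminimax(tree))                            # min picks 2.0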

Multi-Agent Utilities

What if the game is not zero-sum, or has multiple players?

  • Generalization of minimax:
    • Terminals have utility tuples
    • Node values are also utility tuples
    • Each player maximizes its own component
    • Can dynamically give rise to cooperation and competition (see the sketch after this list)
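
A minimal illustration of the tuple-valued backup, with an assumed encoding (internal nodes are [player, children]; terminals are utility tuples, one component per player):

def multi_value(node):
    if isinstance(node, tuple):       # terminal: a utility tuple
        return node
    player, children = node          # internal node: [player, children]
    return max((multi_value(c) for c in children),
               key=lambda utils: utils[player])

tree = [0, [
    [1, [(1, 6), (0, 7)]],    # player 1 prefers (0, 7): 7 > 6 in component 1
    [1, [(5, 1), (6, 2)]],    # player 1 prefers (6, 2): 2 > 1 in component 1
]]
print(multi_value(tree))      # player 0 compares 0 vs 6 in component 0 -> (6, 2)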

Maximum Expected Utility

Why choose average utilities instead of minimax?

Minimax gives a lot of weight to bad outcomes that occur only with very small probability: it plans for the worst case even when the worst case is extremely unlikely

  • Principle of maximum expected utility:
    • A rational agent should choose the action that maximizes its expected utility, given its knowledge of the world
  • But this raises some questions:
    • Where do utilities come from?
    • How do we know such utilities even exist?
    • How do we know that averaging even makes sense?
    • What if our behavior (preferences) can't be described by utilities?

Utilities

  • Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences

  • Where do utilities come from?

    • Utilities summarize the agent's goals
    • Theorem: any “rational” preferences can be summarized as a utility function
  • We hard-wire utilities and let behaviors emerge

    • Why don't we let agents pick utilities?
    • Why don't we prescribe behaviors?

Preference

  • An agent must have preferences among:

    • Prizes: \(A, B\), etc.

    • Lotteries: situations with uncertain prizes: \[ L = [p,A;\;\;(1-p),B] \]

  • Notation:

    • Preference: \(A \succ B\)
    • Indifference: \(A \sim B\)

Rational Preferences

  • We want some constraints on preferences before we call them rational, such as the axiom of transitivity: \[ (A \succ B) \,\wedge\, (B \succ C) \;\Rightarrow\; (A \succ C) \]

Intransitive preferences lead to irrational behavior, and no consistent utility function can describe them: an agent with the cycle \(A \succ B \succ C \succ A\) will pay a small amount for each "upgrade" around the cycle and can be pumped for money forever


Theorem (Ramsey, 1931; von Neumann & Morgenstern, 1944): rational preferences imply behavior describable as maximization of expected utility

  • Given any preferences satisfying these constraints, there exists a real-valued function U such that:

\[ U(A) \geq U(B) \;\Leftrightarrow\; A \succeq B \]
\[ U([p_1,S_1;\,\dots;\,p_n,S_n]) = \sum_{i} p_i\,U(S_i) \]
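
For example, with illustrative utilities \(U(A) = 10\) and \(U(B) = 4\) (numbers invented here, not from the lecture), the lottery \(L = [0.5,A;\;0.5,B]\) is valued as

\[ U(L) = 0.5\,U(A) + 0.5\,U(B) = 0.5 \cdot 10 + 0.5 \cdot 4 = 7, \]

so a rational agent is indifferent between \(L\) and any prize of utility 7.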

  • Maximum expected utility (MEU) principle:
    • Choose the action that maximizes expected utility
    • Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
    • E.g., a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner
