CS188 Lecture 5
Expectimax Search
Why wouldn’t we know what the result of an action will be?
- Explicit randomness: rolling dice
- Unpredictable opponents: the ghosts respond randomly
- Actions can fail: when moving a robot, wheels might slip
Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
Expectimax search: compute the average score under optimal play
- Max nodes as in minimax search
- Chance nodes are like min nodes but the outcome is uncertain
- Calculate their expected utilities
Expectimax Pseudocode
The value function is:

```
def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)
```

The max-value function mentioned before is:

```
def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
```

The exp-value mentioned before is:

```
def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v
```
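The expectimax recursion can be sketched as runnable Python. The tree encoding here is illustrative (not from the lecture): a node is a bare number (terminal utility), `("max", children)`, or `("exp", [(p, child), ...])` for a chance node with outcome probabilities.

```python
# Minimal expectimax sketch (node encoding is assumed, not from the lecture).
# A node is: a number (terminal utility), ("max", [children...]),
# or ("exp", [(p, child), ...]) for a chance node.

def value(node):
    if isinstance(node, (int, float)):   # terminal state: return its utility
        return node
    kind, children = node
    if kind == "max":
        return max_value(children)
    if kind == "exp":
        return exp_value(children)
    raise ValueError(f"unknown node type: {kind}")

def max_value(children):
    # max node: best value among successors
    return max(value(c) for c in children)

def exp_value(weighted_children):
    # chance node: expected utility, sum of p * value over outcomes
    return sum(p * value(c) for p, c in weighted_children)

# Max chooses between a sure 3 and a fair coin flip over 0 and 10:
tree = ("max", [3, ("exp", [(0.5, 0.0), (0.5, 10.0)])])
# value(tree) = max(3, 0.5*0 + 0.5*10) = 5.0
```

Note that the chance node's value (5.0) beats the sure payoff (3), so an expectimax agent gambles; a minimax agent would assume the worst coin flip and take the sure 3.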
Expectimax search cannot use pruning in general: without bounds on leaf values, any unexplored child of a chance node could still change the node's expected value, so no child can be safely skipped
Other Game Types
Mixed Layer Types
- Expectiminimax
- Environment is an extra "random agent" player that moves after each min/max agent
- Each node computes the appropriate combinations of its children
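The expectiminimax combination rule can be sketched as follows; the node encoding is illustrative (not from the lecture), with terminals as numbers and `("max", ...)`, `("min", ...)`, and `("chance", ...)` layers:

```python
# Expectiminimax sketch (illustrative encoding, not from the lecture):
# a node is a number (terminal utility), ("max", children), ("min", children),
# or ("chance", [(p, child), ...]) for the environment's random moves.

def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)      # our agent
    if kind == "min":
        return min(expectiminimax(c) for c in children)      # adversary
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)  # environment
    raise ValueError(f"unknown node type: {kind}")

# Environment moves between max and min layers:
tree = ("max", [
    ("chance", [(0.5, ("min", [2, 8])), (0.5, ("min", [4, 6]))]),
    3,
])
# min layers give 2 and 4; chance gives 0.5*2 + 0.5*4 = 3.0; max(3.0, 3) = 3.0
```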
Multi-Agent Utilities
When the game is not zero-sum, or has multiple players:
- Generalization of minimax:
- Terminals have utility tuples
- Node values are also utility tuples
- Each player maximizes its own component
- Can give rise to cooperation and competition dynamically
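The generalization above can be sketched in Python; the encoding is illustrative (not from the lecture), with terminals as utility tuples (one component per player) and internal nodes tagged with the index of the player who moves:

```python
# Multi-agent utility sketch (illustrative encoding, not from the lecture):
# a terminal is a tuple of numbers (one utility per player);
# an internal node is (player_index, [children...]), and the moving
# player picks the child value maximizing their own tuple component.

def multi_value(node):
    if isinstance(node, tuple) and all(isinstance(x, (int, float)) for x in node):
        return node  # terminal: utility tuple
    player, children = node
    # the moving player maximizes their own component of the child values
    return max((multi_value(c) for c in children), key=lambda u: u[player])

# Player 0 moves first, then player 1:
tree = (0, [
    (1, [(5, 1), (2, 9)]),   # player 1 will pick (2, 9)
    (1, [(4, 3), (3, 4)]),   # player 1 will pick (3, 4)
])
# Player 0 compares (2, 9) and (3, 4) on component 0 and picks (3, 4)
```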
Maximum Expected Utility
Why choose average utilities instead of minimax?
Minimax considers many bad outcomes that occur only with quite small probability, so it can be overly pessimistic
- Principle of maximum expected utility:
- A rational agent should choose the action that maximizes its expected utility, given its knowledge of the world
- But this raises some questions:
- Where do utilities come from?
- How do we know such utilities even exist?
- How do we know that averaging even makes sense?
- What if our behavior (preferences) can't be described by utilities?
Utilities
Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences
Where do utilities come from?
- Utilities summarize the agent's goals
- Theorem: any “rational” preferences can be summarized as a utility function
We hard-wire utilities and let behaviors emerge
- Why don't we let agents pick utilities?
- Why don't we prescribe behaviors?
Preference
An agent must have preferences among:
Prizes: \(A, B\), etc.
Lotteries: situations with uncertain prizes: \[ L = [p,A;\;\;(1-p),B] \]
Notation:
- Preference: \(A \succ B\)
- Indifference: \(A \sim B\)
Rational Preferences
- We want some constraints on preferences before we call them rational: \[ Axiom\;of\;Transitivity: (A \succ B) \,\wedge\,(B \succ C) \,\Rightarrow\, (A \succ C) \]
Intransitive preferences lead to irrational behavior (bad utilities): an agent with cyclic preferences can be exploited as a "money pump", paying to trade around the cycle forever
Theorem: Rational preferences imply behavior describable as maximization of expected utility
- Given any preferences satisfying these constraints, there exists a real-valued function U such that:
\[ U(A) \geq U(B) \Leftrightarrow A \succeq B \\ U([p_1,S_1;...;p_n,S_n]) = \sum_{i}p_iU(S_i) \]
- Maximum expected utility(MEU) principle:
- Choose the action that maximizes expected utility
- Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
- E.g., a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner
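As a worked instance of the lottery formula \(U([p_1,S_1;\ldots;p_n,S_n]) = \sum_i p_i U(S_i)\), with illustrative numbers:

```python
# Expected utility of a lottery [p1, S1; ...; pn, Sn] (numbers are illustrative)
def expected_utility(lottery, U):
    # lottery: list of (probability, outcome) pairs; U: utility function on outcomes
    return sum(p * U(s) for p, s in lottery)

U = {"A": 10.0, "B": 4.0}.get      # hypothetical utilities for prizes A and B
L = [(0.25, "A"), (0.75, "B")]     # the lottery [0.25, A; 0.75, B]
# U(L) = 0.25 * 10 + 0.75 * 4 = 5.5
```

So an MEU agent offered this lottery versus a sure prize with utility 6 would take the sure prize.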