CS188 Lecture 5

  • Why wouldn’t we know what the result of an action will be?

    • Explicit randomness: rolling dice
    • Unpredictable opponents: the ghosts respond randomly
    • Actions can fail: when moving a robot, wheels might slip
  • Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

  • Expectimax search: compute the average score under optimal play

    • Max nodes as in minimax search
    • Chance nodes are like min nodes but the outcome is uncertain
    • Calculate their expected utilities

Expectimax Pseudocode

The value function is:

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

The max-value function used above is:

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

The exp-value function used above is:

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v
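
Putting the three functions together, here is a minimal runnable Python sketch. The GameNode class and its fields are assumptions made for illustration, not code from the lecture.

MAX, EXP = "max", "exp"

class GameNode:
    # kind is MAX, EXP, or None for a terminal; probs holds the
    # outcome probabilities of an EXP node's children.
    def __init__(self, kind=None, utility=None, children=None, probs=None):
        self.kind = kind
        self.utility = utility
        self.children = children or []
        self.probs = probs

def value(node):
    if not node.children:        # terminal state
        return node.utility
    if node.kind == MAX:
        return max_value(node)
    return exp_value(node)       # chance node

def max_value(node):
    return max(value(c) for c in node.children)

def exp_value(node):
    # Expected utility: probability-weighted average over successors.
    return sum(p * value(c) for p, c in zip(node.probs, node.children))

# A max node choosing between two lotteries:
left = GameNode(EXP, probs=[0.5, 0.5],
                children=[GameNode(utility=8), GameNode(utility=24)])
right = GameNode(EXP, probs=[0.9, 0.1],
                 children=[GameNode(utility=-12), GameNode(utility=100)])
print(value(GameNode(MAX, children=[left, right])))  # 16.0: left averages 16, right -0.8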

Expectimax search cannot be pruned in general: a chance node's value is a weighted average over all of its children, so any single unexplored child can still change the result. (Pruning only becomes possible if we know bounds on the leaf utilities.)

Other Game Types

Mixed Layer Types

  • Expectiminimax
    • Environment is an extra "random agent" player that moves after each min/max agent
    • Each node computes the appropriate combination of its children (see the sketch below)
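
A compact sketch of the expectiminimax dispatch, using an assumed tuple encoding (not from the lecture): internal nodes are ("max", children), ("min", children), or ("exp", children, probs); terminals are plain numbers.

def expectiminimax(node):
    if isinstance(node, (int, float)):                 # terminal utility
        return node
    kind = node[0]
    vals = [expectiminimax(c) for c in node[1]]
    if kind == "max":
        return max(vals)
    if kind == "min":
        return min(vals)
    return sum(p * v for p, v in zip(node[2], vals))   # chance layer

# Max moves, then min, then the environment flips a (possibly biased) coin:
tree = ("max", [
    ("min", [("exp", [3, 9], [0.5, 0.5]),              # expected value 6.0
             ("exp", [0, 20], [0.9, 0.1])]),           # expected value 2.0
])
print(expectiminimax(tree))                            # min picks 2.0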

Multi-Agent Utilities

What if the game is not zero-sum, or has multiple players?

  • Generalization of minimax:
    • Terminals have utility tuples
    • Node values are also utility tuples
    • Each player maximizes its own component
    • Can dynamically give rise to cooperation and competition (see the sketch after this list)
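
A minimal illustration of the tuple-valued backup, with an assumed encoding (internal nodes are [player, children]; terminals are utility tuples, one component per player):

def multi_value(node):
    if isinstance(node, tuple):       # terminal: a utility tuple
        return node
    player, children = node          # internal node: [player, children]
    return max((multi_value(c) for c in children),
               key=lambda utils: utils[player])

tree = [0, [
    [1, [(1, 6), (0, 7)]],    # player 1 prefers (0, 7): 7 > 6 in component 1
    [1, [(5, 1), (6, 2)]],    # player 1 prefers (6, 2): 2 > 1 in component 1
]]
print(multi_value(tree))      # player 0 compares 0 vs 6 in component 0 -> (6, 2)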

Maximum Expected Utility

Why choose average utilities instead of minimax?

Minimax gives a lot of weight to bad outcomes that occur only with very small probability: it plans for the worst case even when the worst case is extremely unlikely

  • Principle of maximum expected utility:
    • A rational agent should choose the action that maximizes its expected utility, given its knowledge of the world
  • But this raises some questions:
    • Where do utilities come from?
    • How do we know such utilities even exist?
    • How do we know that averaging even makes sense?
    • What if our behavior (preferences) can't be described by utilities?

Utilities

  • Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences

  • Where do utilities come from?

    • Utilities summarize the agent's goals
    • Theorem: any “rational” preferences can be summarized as a utility function
  • We hard-wire utilities and let behaviors emerge

    • Why don't we let agents pick utilities?
    • Why don't we prescribe behaviors?

Preference

  • An agent must have preferences among:

    • Prizes: \(A, B\), etc.

    • Lotteries: situations with uncertain prizes: \[ L = [p,A;\;\;(1-p),B] \]

  • Notation:

    • Preference: \(A \succ B\)
    • Indifference: \(A \sim B\)

Rational Preferences

  • We want some constraints on preferences before we call them rational, such as the axiom of transitivity: \[ (A \succ B) \,\wedge\, (B \succ C) \;\Rightarrow\; (A \succ C) \]

Intransitive preferences lead to irrational behavior, and no consistent utility function can describe them: an agent with the cycle \(A \succ B \succ C \succ A\) will pay a small amount for each "upgrade" around the cycle and can be pumped for money forever


Theorem (Ramsey, 1931; von Neumann & Morgenstern, 1944): rational preferences imply behavior describable as maximization of expected utility

  • Given any preferences satisfying these constraints, there exists a real-valued function U such that:

\[ U(A) \geq U(B) \;\Leftrightarrow\; A \succeq B \]
\[ U([p_1,S_1;\,\dots;\,p_n,S_n]) = \sum_{i} p_i\,U(S_i) \]
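
For example, with illustrative utilities \(U(A) = 10\) and \(U(B) = 4\) (numbers invented here, not from the lecture), the lottery \(L = [0.5,A;\;0.5,B]\) is valued as

\[ U(L) = 0.5\,U(A) + 0.5\,U(B) = 0.5 \cdot 10 + 0.5 \cdot 4 = 7, \]

so a rational agent is indifferent between \(L\) and any prize of utility 7.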

  • Maximum expected utility (MEU) principle:
    • Choose the action that maximizes expected utility
    • Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
    • E.g., a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner
