Acting humanly: the Turing Test.
Thinking humanly: cognitive science, neuroscience.
Thinking rationally: laws of thought, Aristotle, logic.
Acting rationally: doing the right thing, maximizing goal achievement.
An agent perceives and acts; it is a function from percept
histories to actions.
Given an environment and a task, we seek the agent with the best performance.
Perfect rationality is unachievable (percept histories exceed memory limits).
Agents include humans, thermostats, softbots, etc.
The agent program runs on the physical architecture to produce f, which maps
percept histories to actions.
A rational agent chooses whichever action maximizes the expected value of the
performance measure given the percept sequence to date.
Rational does not mean omniscient, clairvoyant, or successful.
Rationality requires exploration, learning, and autonomy.
PEAS: performance measure, environment, actuators, sensors.
Observability: partially, fully.
Deterministic: partially, fully.
Episodic: yes, no.
Static: fully, semi, no.
Discrete: yes, no.
Single-agent: yes, no.
Reflex: a set of condition-action rules that look at the
current state and choose an action.
Reflex with state: an internal model of the world plus condition-action rules.
Goal-based: test possible future states against a goal state.
Utility-based: a measure of how good a state is.
Learning: an internal critic measures performance on generated problems.
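As a rough illustration of the first two agent designs, the sketch below uses a hypothetical two-square vacuum world. The percept format (location, status), the rule table, and the function names are assumptions made for this example only.

```python
# Hypothetical two-square vacuum world: a percept is (location, status).
RULES = {
    ("A", "Dirty"): "Suck",
    ("B", "Dirty"): "Suck",
    ("A", "Clean"): "Right",
    ("B", "Clean"): "Left",
}

def reflex_agent(percept):
    """Simple reflex agent: the action depends on the current percept only."""
    return RULES[percept]

def make_stateful_agent():
    """Reflex agent with state: remembers which squares it has seen clean."""
    known_clean = set()  # internal model of the world

    def agent(percept):
        location, status = percept
        if status == "Dirty":
            known_clean.discard(location)
            return "Suck"
        known_clean.add(location)
        if known_clean == {"A", "B"}:
            return "NoOp"  # the model says everything is clean, so stop moving
        return "Right" if location == "A" else "Left"

    return agent
```

The stateless agent keeps shuttling between clean squares forever; the stateful one can stop because its internal model records what it has already seen.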
Offline: "eyes closed" search in a state space.
Deterministic, fully observable environment: single-state problem.
Non-observable environment: conformant problem.
Non-deterministic and/or partially observable environment: contingency problem.
Unknown state space: online exploration.
Single-state problem: initial state, successor function, goal test, path cost.
A solution is a sequence of actions. The state space is an abstraction of the
real world.
Tree search: nodes contain state, parent node, path cost, depth counter.
Search strategies are evaluated on completeness, time/space complexity, and
optimality.
Complexity is measured in terms of branching factor, depth of least-cost
solution, and maximum depth of the search tree.
Uninformed: breadth-first, uniform-cost, depth-first, depth-limited,
iterative deepening.
Breadth-first: FIFO queue. Complete (for finite branching factor), time/space
exponential in depth, optimal if step cost = 1.
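A minimal sketch of breadth-first search with a FIFO queue follows. The graph GRAPH and the function name are hypothetical, and the sketch also remembers generated states (a graph-search check) so that it terminates on graphs with loops.

```python
from collections import deque

def breadth_first_search(start, goal, successors):
    """Return a path with the fewest steps from start to goal, or None."""
    frontier = deque([[start]])   # FIFO queue of paths
    generated = {start}           # states already generated
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        for nxt in successors(state):
            if nxt not in generated:
                generated.add(nxt)
                frontier.append(path + [nxt])
    return None

# Hypothetical example graph.
GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": ["G"], "G": []}
path = breadth_first_search("S", "G", lambda s: GRAPH[s])
```

Here path is the two-step route S, B, G rather than the three-step route through A and C, because shallower nodes are always expanded first.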
Uniform-cost: queue ordered by path cost. Complete (if every step cost is at
least some positive constant), time/space exponential in optimal path cost,
optimal.
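Uniform-cost search can be sketched with a priority queue keyed on path cost. The weighted graph and names below are assumptions for illustration; the direct edge from S to G is deliberately expensive.

```python
import heapq

def uniform_cost_search(start, goal, successors):
    """Return (cost, path) for a cheapest path, or None if unreachable."""
    frontier = [(0, start, [start])]   # priority queue ordered by path cost
    best = {start: 0}                  # cheapest known cost to each state
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        if cost > best.get(state, float("inf")):
            continue                   # stale queue entry
        for nxt, step in successors(state):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None

# Hypothetical weighted graph: state -> list of (successor, step cost).
GRAPH = {"S": [("A", 1), ("G", 10)], "A": [("B", 1)], "B": [("G", 1)], "G": []}
result = uniform_cost_search("S", "G", lambda s: GRAPH[s])
```

The goal test is applied when a node is removed from the queue, not when it is generated; that is why the cost-3 path through A and B is returned instead of the cost-10 direct edge.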
Depth-first: LIFO queue. Not complete (infinite depth, loops), time
exponential in maximum depth, space linear in maximum depth, not optimal.
Depth-limited: depth-first with a depth limit.
Iterative deepening: gradually increase the depth limit of depth-limited search.
Complete, time exponential in depth of solution, space linear in depth,
optimal if step cost is 1.
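The following sketch shows depth-limited search and iterative deepening on a hypothetical infinite binary tree over the integers (node n has children 2n and 2n+1), where plain depth-first search would never return. Function names and the max_depth cap are assumptions.

```python
def depth_limited_search(state, goal, successors, limit):
    """Depth-first search that does not look below the given depth limit."""
    if state == goal:
        return [state]
    if limit == 0:
        return None
    for nxt in successors(state):
        result = depth_limited_search(nxt, goal, successors, limit - 1)
        if result is not None:
            return [state] + result
    return None

def iterative_deepening_search(start, goal, successors, max_depth=50):
    """Run depth-limited search with limits 0, 1, 2, ... until a path is found."""
    for limit in range(max_depth + 1):
        result = depth_limited_search(start, goal, successors, limit)
        if result is not None:
            return result
    return None

# Hypothetical infinite tree: node n has children 2n and 2n + 1.
path = iterative_deepening_search(1, 6, lambda n: [2 * n, 2 * n + 1])
```

Only the current path is held in memory at any time, which is where the linear space bound comes from; the repeated shallow work does not change the exponential order of the running time.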
Graph search: checks states already generated; can be exponentially better
than tree search.
Best-first: expands nodes in order of desirability; greedy search and A* are
special cases.
Heuristic function: evaluates states; h(n) is the estimated cost from node n to
the goal.
Greedy search: expands the node that appears to be closest to the goal. Not
complete (loops), time/space exponential in depth of tree, not optimal.
A*: avoid expanding paths that are already expensive. Evaluation is path cost
so far plus heuristic estimate, f(n) = g(n) + h(n).
An admissible heuristic never overestimates the actual cost to the goal. A* is
complete, time exponential in heuristic error, space exponential in
depth of solution, and optimal if the heuristic is admissible.
A* expands all nodes with f(n) < solution cost, some nodes with f(n) =
solution cost, no nodes with f(n) > solution cost.
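A compact sketch of A* on a hypothetical 4x4 grid with a wall is given below. The grid, wall positions, and function names are assumptions for illustration. The heuristic is Manhattan distance, which never overestimates here because every move costs 1.

```python
import heapq

def a_star(start, goal, successors, h):
    """Return (cost, path) using f(n) = g(n) + h(n), or None if unreachable."""
    frontier = [(h(start), 0, start, [start])]   # entries are (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if g > best_g.get(state, float("inf")):
            continue                              # stale queue entry
        for nxt, step in successors(state):
            new_g = g + step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None

# Hypothetical grid world: 4x4 cells, a wall in column 1, unit step costs.
SIZE = 4
WALLS = {(1, 0), (1, 1), (1, 2)}
GOAL = (3, 0)

def successors(state):
    x, y = state
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < SIZE and 0 <= ny < SIZE and (nx, ny) not in WALLS:
            yield (nx, ny), 1

def manhattan(state):
    return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])

cost, path = a_star((0, 0), GOAL, successors, manhattan)
```

The straight-line distance to the goal is 3, but the wall forces a detour through the top row, so the returned cost is 9.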
A heuristic that dominates another (both admissible) is always better for search.
Heuristics can be derived from exact solutions to a relaxed problem.
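To make dominance and relaxed problems concrete, the sketch below uses the 8-puzzle, which the notes do not mention; it is chosen here only as a familiar example. Counting misplaced tiles is the exact cost of a relaxed puzzle in which a tile may jump anywhere, and Manhattan distance is the exact cost when a tile may slide onto any adjacent square.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 3x3 board read row by row; 0 is the blank

def misplaced_tiles(state):
    """Relaxation: a tile can move anywhere in one step."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def manhattan(state):
    """Relaxation: a tile can slide to any adjacent square, occupied or not."""
    total = 0
    for i, t in enumerate(state):
        if t == 0:
            continue
        g = GOAL.index(t)
        total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total

# A state four blank moves away from the goal.
state = (1, 2, 3, 4, 8, 5, 7, 6, 0)
h1, h2 = misplaced_tiles(state), manhattan(state)
```

For this state h1 is 3 and h2 is 4. Manhattan distance is at least as large as the misplaced-tile count on every state, so it dominates while remaining admissible.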
Iterative improvement: the path is irrelevant; the goal state itself is the
solution.
Use complete states for search.
Good for online as well as offline search.
Hill-climbing: like climbing Everest in thick fog with amnesia.
Problems: local maxima, shoulders, saddles.
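Random-restart hill-climbing overcomes local maxima and is trivially complete:
with enough restarts it eventually starts from a state that leads to the
global maximum.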
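Hill-climbing and random restarts can be sketched on a hypothetical one-dimensional landscape with one local and one global maximum. The values in LANDSCAPE and all function names are assumptions for illustration.

```python
import random

LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 12, 10, 7]   # hypothetical objective values

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]

def hill_climb(i):
    """Move to the best neighbor until no neighbor is strictly better."""
    while True:
        best = max(neighbors(i), key=lambda j: LANDSCAPE[j])
        if LANDSCAPE[best] <= LANDSCAPE[i]:
            return i
        i = best

def random_restart(restarts, seed=0):
    """Run hill_climb from random start states and keep the best result."""
    rng = random.Random(seed)
    results = [hill_climb(rng.randrange(len(LANDSCAPE))) for _ in range(restarts)]
    return max(results, key=lambda j: LANDSCAPE[j])
```

Starting at index 0 the climb stops at index 2 (value 5), a local maximum; starting at index 4 it reaches index 7 (value 12), the global maximum. Restarts help only because some start states fall in the right basin.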
Simulated annealing: allows some bad moves in order to escape local maxima.
Gradually decreasing the temperature to zero gradually decreases the number
and size of bad moves.
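A minimal sketch of simulated annealing on a hypothetical one-dimensional landscape follows. The linear cooling schedule, the starting temperature t0, and the step count are arbitrary assumptions, not recommended settings.

```python
import math
import random

LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 12, 10, 7]   # hypothetical objective values

def simulated_annealing(start, steps=2000, t0=5.0, seed=0):
    """Accept every uphill move; accept a downhill move with probability exp(delta / T)."""
    rng = random.Random(seed)
    current = start
    for step in range(steps):
        temperature = t0 * (1 - step / steps)   # cools towards zero, never reaches it
        candidates = [j for j in (current - 1, current + 1) if 0 <= j < len(LANDSCAPE)]
        nxt = rng.choice(candidates)
        delta = LANDSCAPE[nxt] - LANDSCAPE[current]
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            current = nxt
    return current
```

Early on, with a high temperature, large downhill moves are accepted fairly often; as the temperature falls the acceptance probability for the same move shrinks towards zero and the search behaves like hill-climbing.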
Local beam search: keep the k best states; the stochastic variant chooses k
successors randomly, biased towards good ones.
Genetic algorithms: a stochastic beam search in which states mate using
crossover (representation-dependent) and undergo random mutation. Only the
fittest states are used in the next generation.
In a constraint satisfaction problem (CSP), a state is defined by a set of
variables with assignments made from a domain of values.
Goal test is a set of constraints defining allowed assignments of values to
variables.
Binary CSP: each constraint involves at most two variables.
Domains can be finite (map-coloring) or infinite (integer equations).
May need a constraint language for complex constraints.
Continuous-variable problems with linear constraints can be solved by linear
programming.
Search formulation: start with the empty assignment; at each step choose the
next unassigned variable and assign it a value from its domain. The goal is a
complete assignment. Can use depth-first search since the solution depth is
known; this is usually a backtracking search, which backtracks when a domain
is empty.
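The search formulation can be sketched as plain backtracking on a map-coloring problem. The Australia map below is a common textbook instance assumed here for illustration; the fixed variable order and names are likewise assumptions, and no ordering heuristics are applied.

```python
# Adjacency of regions for an assumed map-coloring instance.
NEIGHBORS = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLORS = ["red", "green", "blue"]

def consistent(var, value, assignment):
    """A value is legal if no already-assigned neighbor has the same color."""
    return all(assignment.get(n) != value for n in NEIGHBORS[var])

def backtrack(assignment):
    """Depth-first search over partial assignments, one variable per level."""
    if len(assignment) == len(NEIGHBORS):
        return assignment                      # complete assignment: goal
    var = next(v for v in NEIGHBORS if v not in assignment)
    for value in COLORS:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]                # undo and try the next value
    return None                                # no legal value: backtrack

solution = backtrack({})
```

Every solution sits at depth 7, one level per variable, which is why depth-first search is a natural fit.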
Improvements:
Minimum Remaining Values: choose the variable with the smallest domain size
(after eliminating illegal values).
Degree heuristic: choose the variable with the most constraints on remaining
variables.
Least Constraining Value: choose the value that rules out the fewest values in
the remaining variables.
Forward checking: eliminate illegal values in unassigned variables.
Constraint propagation: repeatedly enforce local constraints.
Arc consistency: make each arc consistent; if a variable loses a value,
recheck its neighbors.
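Arc consistency can be sketched in the style of the AC-3 algorithm, named here as one well-known way to do it. The three-region example, the single shared constraint predicate `allowed`, and the function names are assumptions made to keep the sketch short.

```python
from collections import deque

def revise(domains, x, y, allowed):
    """Remove values of x that have no supporting value in the domain of y."""
    removed = False
    for vx in list(domains[x]):
        if not any(allowed(vx, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed

def ac3(domains, neighbors, allowed):
    """Make every arc consistent; return False if some domain becomes empty."""
    queue = deque((x, y) for x in neighbors for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y, allowed):
            if not domains[x]:
                return False
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x))   # x lost a value: recheck its neighbors
    return True

# Hypothetical example: three mutually adjacent regions, WA already fixed to red.
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
domains = {"WA": {"red"}, "NT": {"red", "green"}, "SA": {"red", "green", "blue"}}
ok = ac3(domains, neighbors, lambda a, b: a != b)
```

Propagation alone solves this small instance: NT loses red, then SA loses red and green, leaving one value per variable without any search.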
Some problems can be reduced to a tree-structured CSP, solvable in time linear
in the number of variables and quadratic in the domain size.
Local search can use complete states with some constraints not met; operators
reassign values.
Min-conflicts heuristic picks values that violate fewest constraints.
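The min-conflicts heuristic can be sketched on n-queens, used here as an assumed example problem. Each row holds one queen, so a state is a complete assignment of a column to every row. The step limit and seed are arbitrary, and the function returns None if the limit is reached.

```python
import random

def conflicts(cols, row, col):
    """Number of other queens attacking a queen placed at (row, col)."""
    return sum(
        1 for r, c in enumerate(cols)
        if r != row and (c == col or abs(c - col) == abs(r - row))
    )

def min_conflicts(n, max_steps=10000, seed=0):
    rng = random.Random(seed)
    cols = [rng.randrange(n) for _ in range(n)]   # complete but possibly illegal state
    for _ in range(max_steps):
        conflicted = [r for r in range(n) if conflicts(cols, r, cols[r]) > 0]
        if not conflicted:
            return cols                           # all constraints satisfied
        row = rng.choice(conflicted)              # pick a conflicted variable
        counts = [conflicts(cols, row, c) for c in range(n)]
        best = min(counts)
        # Reassign it to a value violating the fewest constraints (ties at random).
        cols[row] = rng.choice([c for c in range(n) if counts[c] == best])
    return None

solution = min_conflicts(8)
```

Unlike backtracking, nothing is ever undone; each step simply reassigns one variable in a full assignment.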
Games: deterministic versus chance; perfect information versus
imperfect.
Most games are 2-player, zero-sum, with players alternating turns.
Perfect play for deterministic, perfect-information games is
minimax: choose the move to the position with the highest minimax value.
Minimax is complete in finite spaces, optimal against an optimal opponent, time
exponential in tree depth, space linear in depth if using depth-first search.
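A minimal sketch of minimax on a hypothetical two-ply game tree follows. The tree is represented as nested lists with numeric leaves, an assumption made for brevity.

```python
def minimax(node, maximizing=True):
    """Depth-first minimax value of a tree given as nested lists of leaf values."""
    if isinstance(node, (int, float)):
        return node                      # terminal state: return its utility
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical tree: MAX moves at the root, MIN replies, leaves are utilities.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = minimax(TREE)
best_move = max(range(len(TREE)), key=lambda i: minimax(TREE[i], False))
```

MIN would answer the three root moves with 3, 2 and 2 respectively, so the root value is 3 and the first move (index 0) is best.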
Branching factor can be large (about 35 in chess).
Alpha-beta pruning, with perfect move ordering, doubles the searchable depth;
pruning does not change the minimax value.
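Alpha-beta pruning can be sketched on a small nested-list game tree. The tree and the `visited` list, which records the leaves actually evaluated, are assumptions added for illustration.

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True, visited=None):
    """Minimax value with alpha-beta pruning; leaves seen are appended to visited."""
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # MIN above will never allow this branch
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break                        # MAX above already has something better
    return value

# Hypothetical tree: MAX moves at the root, MIN replies, leaves are utilities.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
value = alphabeta(TREE, visited=seen)
```

The value is still 3, but only 7 of the 9 leaves are evaluated: once the second MIN node sees the leaf 2, it cannot beat the 3 already available to MAX, so 4 and 6 are skipped.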
Evaluation function: can be a complex function of state variables; typically a
linear weighted sum of feature values.
Use a cutoff depth or evaluation function test to stop search in very deep
trees.
In deterministic games, behavior is unaffected by a monotonic transformation
of the evaluation function.
Chance nodes have branches determined by a chance element
(dice, coin toss).
Expectiminimax gives perfect play.
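Expectiminimax can be sketched by adding a chance case to minimax. The tuple-based tree encoding ("max", "min", "chance", "leaf") and the example probabilities are assumptions for illustration.

```python
def expectiminimax(node):
    """Value of a game tree containing max, min, and chance nodes."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(child) for child in payload)
    if kind == "min":
        return min(expectiminimax(child) for child in payload)
    # Chance node: payload is a list of (probability, child) pairs.
    return sum(p * expectiminimax(child) for p, child in payload)

def leaf(v):
    return ("leaf", v)

# Hypothetical game: MAX picks a move, a fair coin is tossed, then MIN replies.
move_a = ("chance", [(0.5, ("min", [leaf(2), leaf(4)])),
                     (0.5, ("min", [leaf(6), leaf(8)]))])
move_b = ("chance", [(0.5, ("min", [leaf(0), leaf(10)])),
                     (0.5, ("min", [leaf(5), leaf(9)]))])
value = expectiminimax(("max", [move_a, move_b]))
```

Move A is worth 0.5 * 2 + 0.5 * 6 = 4.0 and move B is worth 0.5 * 0 + 0.5 * 5 = 2.5, so the root value is 4.0.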
Alpha-beta pruning is much less effective with chance nodes.
Imperfect information, e.g. card games: the card deal is like one big dice
roll at the start of the game.
The intuition that the value of an action is the average of its values over
all possible deals is wrong.
Instead, with partial observability, the value of an action depends on the
information state or belief state the agent is in; search a tree of
information states.