L2 Audio of [Aug 25, 2010] (Video of the lecture (4gb))
Trends in AI; start of planning; different kinds of planning; atomic account of classical planning and its limitations; propositional account--STRIPS representation; Progression.
L3 Audio of [Aug 30, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
STRIPS representation-->ADL representation; conditional and quantified
effects and compiling them into canonical
representation. Issues in handling multi-valued fluents
(state variables); Progression; Regression; blind-search
tradeoffs.
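A runnable sketch of the progression and regression operators discussed in this lecture, over sets of propositions; the blocks-world-style action and all names are invented for illustration:

```python
# Minimal STRIPS progression and regression; states/goals are frozensets.

def progress(state, action):
    """Apply a STRIPS action to a state (a frozenset of propositions)."""
    pre, add, delete = action
    if not pre <= state:
        return None  # action inapplicable: preconditions unmet
    return frozenset((state - delete) | add)

def regress(goal, action):
    """Regress a goal set through a STRIPS action."""
    pre, add, delete = action
    if goal & delete:
        return None  # action deletes part of the goal: not regressable
    return frozenset((goal - add) | pre)

# Hypothetical action: stack block A (from the table) onto B.
stack_A_on_B = (frozenset({"clear(A)", "clear(B)", "ontable(A)"}),  # preconditions
                frozenset({"on(A,B)"}),                             # add list
                frozenset({"ontable(A)", "clear(B)"}))              # delete list

s0 = frozenset({"clear(A)", "clear(B)", "ontable(A)", "ontable(B)"})
s1 = progress(s0, stack_A_on_B)
g  = regress(frozenset({"on(A,B)"}), stack_A_on_B)
```

Regressing the singleton goal through the action leaves exactly the action's preconditions, which is the "blind-search tradeoff" intuition: regression reasons over goal sets rather than complete states.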
L4 Audio of [Sep 1st, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Different ways of proving the correctness of plans. Causal proof and
plan-space (partial order) planning. Discussion of the
partial order planning algorithm. Observations on flaw
selection heuristics etc. Discussion on handling
conditional effects in regression and plan-space
planning. Discussion on handling lifted (partially
specified) actions in regression and partial order planning.
L5 Audio of [Sep 8th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Reachability analysis and planning graph heuristics. Understanding
planning graph as an optimistic projection of
reachability. h_level, h_sum, h_max and h_{relaxed-plan}
heuristics. Relaxed plan extraction (and how it becomes
hard with mutual exclusions).
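A small sketch of the level-cost propagation behind these heuristics on a delete-relaxed problem. Level costs are propagated with max over preconditions (h_max style), and h_sum here simply sums the goal costs--a simple additive variant; the tiny domain is made up:

```python
# Propagate proposition costs over a delete-relaxed problem to a fixpoint.
# Actions are tuples (name, preconditions, add-list, cost).

def level_costs(init, actions):
    cost = {p: 0 for p in init}
    changed = True
    while changed:
        changed = False
        for _, pre, adds, c in actions:
            if all(p in cost for p in pre):
                new = c + max((cost[p] for p in pre), default=0)
                for q in adds:
                    if cost.get(q, float("inf")) > new:
                        cost[q] = new
                        changed = True
    return cost

def h_max(cost, goal):
    return max(cost.get(p, float("inf")) for p in goal)

def h_sum(cost, goal):
    return sum(cost.get(p, float("inf")) for p in goal)

# Hypothetical actions: p enables q and s; q enables r.
acts = [("a1", ["p"], ["q"], 1), ("a2", ["q"], ["r"], 1), ("a3", ["p"], ["s"], 1)]
cost = level_costs({"p"}, acts)
```

h_sum over the goal {r, s} counts both achievement chains (and so over-counts shared structure), while h_max keeps only the most expensive one--the optimism/inadmissibility tradeoff discussed in the lecture.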
L6 Audio of [Sep 13th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Heuristics vs. search strategies; PG heuristics for progression
vs. regression; progression vs. regression--can the
balance have something to do with ergodicity of the
benchmark domains? Backchaining as a meta-idea with
multiple realizations.
L7 Audio of [Sep 15th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Negative interactions and the idea of capturing them with
level-specific n-ary mutexes; static mutex identification rules and
which of them are minimally required; mutex propagation
rules for binary mutexes. Mutexes, memos, and the Graphplan
completeness theorem. Converting plan extraction from
planning graph into a SAT problem.
L8 Audio of [Sep 20th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Majority of the class on compiling bounded length planning into SAT
and CSP. Planning graph compilation first, followed by the
more general view of encodings coming from lines of proof
of correctness. State-based vs. causal-proof
encodings. Planning graph encoding as explanatory
frame-axiom encoding with mutex propagation. Use of
negative interactions in planning graph heuristics (and
the adjusted sum heuristic); using planning graph
heuristics in partial-order
planning.
L9 Audio of [Sep 22nd, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Majority of the class on landmark heuristics (using Richter/Karpas
ICAPS 2010 tutorial), with digressions into causal-graph
heuristic (used in Fast Downward), and cost-propagation on
planning graphs (used in cost-based landmark
analysis). Final 10 minutes are devoted to motivating the
atomic model for stochastic worlds and general reward
structures.
L11 Audio of [Sep 29th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Computing the value of a policy. Optimal policy construction for
finite-horizon MDPs. Relations between finite-horizon MDPs
and bounded length planning. Brief discussion of
indefinite horizon problems--and their need for sink
(absorbing) states. Infinite horizon problems--and how
discount factor affects the convergence rate. The idea of
infinite horizon MDP value iteration as just a "repeat
until" version of the finite horizon MDP value iteration
(where the until condition checks that the max-norm
difference between two iterations is less than epsilon).
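The "repeat until" view described above can be sketched directly; the 2-state MDP used to exercise it is a made-up example:

```python
# Infinite-horizon discounted value iteration, stopping when the
# max-norm difference between successive sweeps drops below epsilon.

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        newV = {}
        for s in states:
            newV[s] = max(R(s, a) + gamma * sum(p * V[t] for t, p in P(s, a))
                          for a in actions)
        if max(abs(newV[s] - V[s]) for s in states) < eps:  # max-norm test
            return newV
        V = newV

# Hypothetical MDP: from s0, 'go' reaches the rewarding state s1 w.p. 0.8;
# every other (state, action) pair is a self-loop.
P = lambda s, a: [("s1", 0.8), ("s0", 0.2)] if (s, a) == ("s0", "go") else [(s, 1.0)]
R = lambda s, a: 1.0 if s == "s1" else 0.0
V = value_iteration(["s0", "s1"], ["go", "stay"], P, R)
```

With gamma = 0.9 the absorbing state s1 earns 1/(1 - 0.9) = 10, illustrating how the discount factor controls both the value magnitudes and the convergence rate.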
L12 Audio of [Oct 4th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Infinite horizon MDP value iteration; understanding the Bellman update as
a contraction operator, greedy policy for a given value
function; policy iteration algorithm; improvements for
value and policy iteration. Other MDP models--with focus
on stochastic shortest path and probability maximization
models--the Bellman equations for those models.
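A sketch of policy iteration--alternating policy evaluation with greedy improvement w.r.t. the current value function; the 2-state MDP below is invented for illustration:

```python
# Policy iteration: evaluate the current policy by iterated sweeps,
# then act greedily w.r.t. the resulting value function; stop when
# the greedy policy no longer changes.

def policy_iteration(states, actions, P, R, gamma=0.9):
    pi = {s: actions[0] for s in states}
    while True:
        # policy evaluation: iterate V under the fixed policy pi
        V = {s: 0.0 for s in states}
        for _ in range(1000):
            V = {s: R(s, pi[s]) + gamma * sum(p * V[t] for t, p in P(s, pi[s]))
                 for s in states}
        # policy improvement: greedy policy for the current value function
        new_pi = {s: max(actions,
                         key=lambda a: R(s, a) +
                         gamma * sum(p * V[t] for t, p in P(s, a)))
                  for s in states}
        if new_pi == pi:
            return pi, V
        pi = new_pi

# Hypothetical MDP: from s0, 'go' reaches the rewarding state s1 w.p. 0.8.
P = lambda s, a: [("s1", 0.8), ("s0", 0.2)] if (s, a) == ("s0", "go") else [(s, 1.0)]
R = lambda s, a: 1.0 if s == "s1" else 0.0
pi, V = policy_iteration(["s0", "s1"], ["stay", "go"], P, R)
```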
L13 Audio of [Oct 4th, 2010] (Video of the lecture is here)
Relating MDPs to directional search such as A*; use of lower-bound
heuristics; prioritized sweeping
for stochastic shortest path (SSP) based on real-time dynamic
programming; comparing RTDP trial to Bellman update;
properties of RTDP--and improving convergence of RTDP;
heuristics for SSP using determinizations--all outcome
vs. most-likely outcome determinization.
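An RTDP trial, for comparison against a full-sweep Bellman update, can be sketched as below: back up only the states visited along a greedily simulated trajectory. The single-action SSP used to drive it is made up:

```python
import random

def rtdp_trial(s0, goal, actions, P, C, V, rng):
    """Run one RTDP trial; V maps states to cost-to-go estimates (mutated)."""
    s = s0
    while s != goal:
        # Bellman backup at the current state only (not a full sweep)
        q = {a: C(s, a) + sum(p * V.get(t, 0.0) for t, p in P(s, a))
             for a in actions}
        best = min(q, key=q.get)
        V[s] = q[best]
        # sample a successor of the greedy action to continue the trial
        ts, ps = zip(*P(s, best))
        s = rng.choices(ts, weights=ps)[0]
    return V

# Hypothetical SSP: 'try' reaches the goal g w.p. 0.5, else stays at s0;
# every action costs 1, so the true cost-to-go from s0 is 2.
P = lambda s, a: [("g", 0.5), ("s0", 0.5)]
C = lambda s, a: 1.0
V = {"s0": 0.0, "g": 0.0}
rng = random.Random(0)
for _ in range(200):
    rtdp_trial("s0", "g", ["try"], P, C, V, rng)
```

Each backup halves the error at s0, so repeated trials converge on the true cost-to-go--the convergence behavior discussed in the lecture.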
L14 Audio of [Oct 11th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
IPPC; Factored action representations for stochastic planning; 2-TBNs
and Probabilistic STRIPS operators (PSOs); PPDDL standard;
IPPC domains and the fact that only one was non-ergodic;
IPPC evaluation metric; two broad approaches--off-line
vs. online. The FF-replan story. Connecting FF-replan
success to online anticipatory scheduling and the birth of
FF-Hop.
L15 Audio of [Oct 13th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Discussion of FF-Hop; issues in sampling futures; list of
approximations made by FF-Hop; understanding the relation
between V_hs, V* and V_ffra; computational improvements to
FF-Hop including (1) handling low-likelihood futures; (2)
reusing plans; (3) using dynamic sampling width and time
horizon. General view of determinizations as a type of
relaxation of a stochastic planning problem. Seeing negative
interactions as an orthogonal relaxation, and realizing
the spectrum of approaches. Understanding the advantage of
factored approaches in reusing the computational effort
spent in computing a plan.
L16 Audio of [Oct 18th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Handling state uncertainty--the road map; atomic model with and
without stochastic uncertainty; belief states; applying
actions to belief states (and why action preconditions can
get in the way); why stochastic uncertainty--which is
additional knowledge--seems to
increase the difficulty of the problem by exploding the
belief space; two ways of reducing state uncertainty--with
causal actions and with observational ones (and realizing
that in general we can have actions that have both causal
and observational effects); observation model; how observations
partition the state space--and how the number of
partitions corresponds to the degree of observability (the
notion of idf of the observation). State estimation and
planning problems in belief-space. For planning, the
difference between conformant and contingent planning; the
difference between full vs. limited contingency planning.
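The belief-state operations above can be sketched with beliefs as plain sets of states; the numeric states and parity sensor are invented for illustration:

```python
# A belief state is a set of possible states. A (possibly non-deterministic)
# causal action maps each state forward; an observation partitions the belief.

def progress_belief(belief, applicable, result):
    """Apply an action to a belief state; fail if any state violates preconditions."""
    if not all(applicable(s) for s in belief):
        return None  # the precondition may fail in some possible world
    return frozenset(t for s in belief for t in result(s))

def observe(belief, obs_of):
    """Partition a belief state by the observation each state would emit."""
    parts = {}
    for s in belief:
        parts.setdefault(obs_of(s), set()).add(s)
    return {o: frozenset(p) for o, p in parts.items()}

b0 = frozenset({1, 2, 3, 4})
b1 = progress_belief(b0, lambda s: True, lambda s: {s, s + 1})  # non-deterministic
parts = observe(b0, lambda s: s % 2)  # hypothetical sensor reporting parity
```

The number of cells in `parts` is the number of observation classes--the degree of observability discussed above (one cell means no information; singleton cells mean full observability).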
L17 Audio of [Oct 18th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Discussion of factored approaches for belief-space planning;
discussion on BDDs and BDD-based planning; progression
and regression for conformant planning; sensing
actions; progression in the presence of sensing actions;
heuristics for conformant planning--all-states
determinization; labelled uncertainty graphs.
L18 Audio of [Oct 25th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Part 1: Discussion of heuristics for belief-space planning; the idea
of state interactions (in addition to action
interactions). The merged, unioned, and LUG planning
graphs. The notion of cross-world mutexes and how that
leads to CGP (conformant graphplan); a little on
heuristics for sensing actions.
L18 Contd: Part 2: POMDPs start. The
model. The non-Markovian nature of decisions based on
observations and the need for observation history. Two
ways of compactly representing the observation history--as
belief-space and as a policy represented by a finite-state
controller. The depressing complexity results on POMDPs.
L19 Audio of [Oct 27th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
POMDP discussion continued. Formally showing that POMDP is an MDP in
the belief space. Discussion of the value iteration for
finite horizon POMDP. Ideas for improving the complexity
of value iteration.
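The "POMDP is an MDP in belief space" view rests on the belief update b'(t) ∝ O(o|t) · Σ_s T(t|s,a) b(s), with the normalizer being the belief-MDP transition probability Pr(o|b,a). A sketch, with tiger-style illustrative numbers (not from the lecture):

```python
# Bayes-filter belief update for a POMDP.

def belief_update(b, a, o, T, O):
    """b: dict state->prob; T(t, s, a)=Pr(t|s,a); O(o, t)=Pr(o|t)."""
    states = list(b)
    unnorm = {t: O(o, t) * sum(T(t, s, a) * b[s] for s in states)
              for t in states}
    z = sum(unnorm.values())  # Pr(o | b, a): the belief-MDP transition prob.
    return {t: v / z for t, v in unnorm.items()}, z

# Hypothetical 2-state example: 'listen' leaves the state unchanged,
# and hearing correctly identifies the state 85% of the time.
T = lambda t, s, a: 1.0 if t == s else 0.0
O = lambda o, t: 0.85 if o == t else 0.15
b0 = {"left": 0.5, "right": 0.5}
b1, pz = belief_update(b0, "listen", "left", T, O)
```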
L20 Audio of [Nov 1st, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Approximating POMDP value function (with FOMDP one as the upper bound
and NOMDP one as the lower bound). Online approaches for
POMDP. (Comparing POMDP online search to non-deterministic
belief space search with observations).
L21 Audio of [Nov 3rd, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Reinforcement learning--the problem, the dimensions of RL
algorithms. Passive RL with Monte Carlo. Passive RL with
ADP. Generalization in RL. Model correctness/completeness considerations
and notions of robustness.
L22 Audio of [Nov 8th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Active learning, exploration policies; GLIE policies; model-free
learning--Temporal difference learning; Q-learning; SARSA
and on- vs. off- policy learning.
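The Q-learning vs. SARSA distinction comes down to one term in the update: Q-learning backs up the greedy successor action (off-policy), SARSA the action actually taken (on-policy). A sketch on a made-up two-state example:

```python
# Single-step Q-learning and SARSA updates on a tabular Q.

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    best = max(Q[(s2, a2)] for a2 in actions)           # greedy backup (off-policy)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.9):
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])  # action taken (on-policy)

actions = ["L", "R"]
Q = {(s, a): 0.0 for s in ["s0", "s1"] for a in actions}
Q[("s1", "R")] = 1.0   # pretend s1's R action is already known to be good
q_learning_update(Q, "s0", "R", 0.0, "s1", actions)  # backs up max_a Q(s1,a) = 1.0
sarsa_update(Q, "s0", "L", 0.0, "s1", "L")           # backs up Q(s1,L) = 0.0
```

After these two updates the Q-learning entry has moved toward the greedy value while the SARSA entry has not--exactly the off- vs. on-policy gap.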
L23 Audio of [Nov 10th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Monte Carlo vs. temporal difference learning; and the idea of
TD(lambda). Generalization in RL; basics of feature
functions and least-squares by gradient descent. Policy
gradient search and the need for mixed policies.
Evaluating the policy gradient empirically.
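TD(lambda) with a linear feature approximation V(s) = w · phi(s) can be sketched in a few lines; the one-feature transition below is a made-up example:

```python
# One TD(lambda) step with accumulating eligibility traces over
# a linear value function V(s) = dot(w, phi(s)).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def td_lambda_step(w, e, phi_s, phi_s2, r, alpha=0.1, gamma=1.0, lam=0.8):
    delta = r + gamma * dot(w, phi_s2) - dot(w, phi_s)  # TD error
    for i in range(len(w)):
        e[i] = gamma * lam * e[i] + phi_s[i]            # accumulate the trace
        w[i] += alpha * delta * e[i]                    # gradient-style update
    return w, e

# One transition from a state with feature [1.0] to a terminal state
# (feature [0.0]) with reward 1.
w, e = td_lambda_step([0.0], [0.0], [1.0], [0.0], 1.0)
```

Setting lam = 0 recovers one-step TD(0); lam = 1 recovers the Monte Carlo end of the spectrum compared above.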
L23 Contd: Part 2: Statistical learning start. The idea of Bayes
learning as using data to update hypothesis prior to
hypothesis posterior. The point that there is no
difference between learning and inference in Bayes
learning. The fact that Bayes learning doesn't 'choose'
winners among hypotheses, but lets them all weigh in on the
prediction task. Comparing Bayesian learning to the medical
diagnosis task.
L24 Audio of [Nov 15th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Foundations of statistical learning. Density learning--generative and
discriminative cases. Bayesian learning--and its
computational challenges; MAP and MLE learning approaches
as a way of making Bayesian learning tractable. Religious
wars between Bayesians and frequentists, and how they both
agree on the MLE case--albeit for different
reasons. Agenda for the rest of the discussion.
L25 (No) Audio of [Nov 17th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Dimensions of variations in statistical learning; BN learning with
complete data and MLE; the process of MLE learning;
multi-parameter BN learning; continuous parameter
learning; some discussion of how it becomes useful in
Bayes learning.
L26 Audio of [Nov 22nd, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Bayesian learning. Conjugate priors and their role (and comparison to
Kalman filtering). Bayes learning on Bayes nets; the parameter
independence assumption. The scenario of learning
generative models for relational data, and lead-up to
EM. How EM works and why EM works (connection to step-less
hill-climbing).
L27 Audio of [Nov 24th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
EM variants; EM examples with soft k-means and hidden-variable Bayes
network learning. Structure learning issues. BIC score and
its use in hill-climbing for structure
learning. The Dim(G) measure and its dependence on both
the number of edges and the parameterization. Relational Bayes networks: motivation, the idea
of schema-level dependency specification. The advantages
of the schema-level specification. Issues in doing
inference on relational networks--full compilation to
ground Bayes networks vs. partial (as-needed) compilation
vs. lifting. Issues in learning (and
how learning is helped by the reduction of the number of
parameters).
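The soft k-means EM example can be sketched as below (E-step: responsibilities under fixed-width Gaussians; M-step: responsibility-weighted means). The 1-D data, the two initial means, and the stiffness parameter are all invented for illustration:

```python
import math

def soft_kmeans(xs, mus, iters=50, beta=4.0):
    """EM for soft k-means in 1-D; beta is the (fixed) Gaussian stiffness."""
    for _ in range(iters):
        # E-step: responsibility of each cluster for each point
        resp = []
        for x in xs:
            ws = [math.exp(-beta * (x - mu) ** 2) for mu in mus]
            z = sum(ws)
            resp.append([w / z for w in ws])
        # M-step: means move to responsibility-weighted averages
        mus = [sum(r[k] * x for r, x in zip(resp, xs)) / sum(r[k] for r in resp)
               for k in range(len(mus))]
    return mus

# Two well-separated clusters around 0.1 and 3.9.
mus = soft_kmeans([0.0, 0.2, 3.8, 4.0], [0.5, 3.0])
```

Because every point retains some responsibility for every cluster, each iteration provably does not decrease the likelihood--the "step-less hill-climbing" view of why EM works.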
L28 Audio of [Nov 29th, 2010] (Video of the lecture: part 1 (4gb), part 2 (1gb))
Probabilistic temporal processes--why the state estimation is more
interesting here than in the nondeterministic case.
Markovian and stationarity assumptions. How stationarity
leads to dynamic Bayes networks--which are templated
networks. Types of inference in DBNs--filtering;
smoothing; prediction; most-likely explanation. Inference
algorithms for DBNs--exact based on recursive computation
and approximate based on modifying likelihood weighting to
particle filtering.
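One particle-filter step--propagate through the transition model, weight by the observation likelihood (as in likelihood weighting), then resample--can be sketched as below; the 2-state weather DBN is illustrative:

```python
import random

def particle_filter_step(particles, transition, obs_weight, rng):
    moved = [transition(p, rng) for p in particles]   # project through the DBN
    weights = [obs_weight(p) for p in moved]          # weight by the evidence
    # resample in proportion to weight, keeping the particle count fixed
    return rng.choices(moved, weights=weights, k=len(particles))

rng = random.Random(1)
# Hypothetical model: the state persists w.p. 0.9; the observation
# at each step strongly favors 'rain'.
transition = lambda s, r: s if r.random() < 0.9 else ("sun" if s == "rain" else "rain")
obs_weight = lambda s: 0.9 if s == "rain" else 0.1
parts = ["sun"] * 1000
for _ in range(5):
    parts = particle_filter_step(parts, transition, obs_weight, rng)
frac_rain = parts.count("rain") / len(parts)
```

After a few steps the particle population concentrates on the state the evidence favors, approximating the exact filtered distribution.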
Schedule
11/29:
12/1: Final project presentations
12/6: Final project presentations
12/15: Final Exam (scheduled) 9:50--11:40 in BY 510