Homework 3 is worth 82 points in total. The point distribution for each question is as follows.
- Part I (4 questions): 15 (3 + 3 + 6 + 3)
- Part II (3 questions): 13 (4 + 6 + 3)
- Part III: 16 (4 for the look-ahead tree, 6 for calculating belief state, 6 for FOMDP approximation)
- Part IV:
+ Question 1: 5 for (a), 2 for (b).
+ Question 2: 8
+ Question 3: 3 x 3 for (a), 3 + 3 + 1 for (b), 6 + 1 for (c)
The statistics on students' points:
- Max: 78
- Min: 35
- Mean: 59.08
- Stdev: 12.51
- Median: 60
Here are some of my notes:
Part I:
- The size of the belief space is the number of sets of world states (each world state being a complete truth assignment to the propositions), not the number of nodes in your transition diagram.
- Some wrong observations (corrections follow the ==>):
+ the initial belief state has 2 states: one with P being true and the other with Q being true ==> since proposition R has not been mentioned, it can take any value. So we actually have 4 possible world states: I1 = P & ~Q & R, I2 = P & ~Q & ~R, I3 = ~P & Q & R and I4 = ~P & Q & ~R.
+ R, as a condition of A4, has no effect in the initial belief state ==> among the 4 possible world states, two of them have R = true, so A4 does take effect in those.
- To find the result of applying A1 in the initial belief state, the easiest way is to break it into two parts A'_1: P => Q; A''_1: P => R, apply each to the four possible initial world states {I1, ..., I4}: f({I1}, A'_1) = P & Q & R, f({I2}, A'_1) = P & Q & ~R, f({I3}, A'_1) = I3, f({I4}, A'_1) = I4, and take the union to get the resulting belief state.
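The belief-state update above can be sketched in code. This is my own encoding, not the official solution: a world state is a frozenset of the propositions that are true in it, and the two conditional effects of A1 are applied together to each world state.

```python
# Sketch: applying an action with conditional effects to a belief state.
# A world state is a frozenset of true propositions among {P, Q, R};
# a belief state is a set of world states.

def apply_action(belief, effects):
    """effects: list of (condition, effect) pairs; conditions are
    evaluated on the original state, effects added together."""
    new_belief = set()
    for state in belief:
        adds = {eff for cond, eff in effects if cond in state}
        new_belief.add(frozenset(state | adds))
    return new_belief

# Initial belief: P & ~Q (R unknown) or ~P & Q (R unknown) => 4 world states.
I1 = frozenset({"P", "R"})
I2 = frozenset({"P"})
I3 = frozenset({"Q", "R"})
I4 = frozenset({"Q"})
b0 = {I1, I2, I3, I4}

# A1 broken into A'_1: P => Q and A''_1: P => R.
b1 = apply_action(b0, [("P", "Q"), ("P", "R")])
```

Note that I1 and I2 both map to P & Q & R, so the resulting belief state has three world states.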
Part II:
- With sensing actions available for every proposition, this problem is fully observable. The size of the belief space is equal to the size of the world-state space.
- Sensing actions do change the belief state, even though, unlike causative actions, they do not change the world. So for Question 2, you need to include sensing actions as well.
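A quick illustration (again my own encoding, not the official solution) of why a sensing action changes the belief state: sensing proposition P and observing "true" keeps only the world states in which P holds.

```python
# Sketch: a sensing action filters a belief state by the observation,
# without changing any individual world state.

def sense(belief, prop, observed_true):
    """belief: set of frozensets of true propositions. Keep only the
    world states consistent with the observation."""
    return {s for s in belief if (prop in s) == observed_true}

b = {frozenset({"P", "R"}), frozenset({"Q"})}
b_after = sense(b, "P", True)   # only the P-worlds survive
```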
Part III:
- The tree should have depth 2: starting from the initial belief, apply the actions "stay" and "go" to get new belief states; for each of these, two observations are possible, each resulting in one new belief state. If you had this tree correct, you got 4 points. The other 6 points are for computing the belief states. Most students did this part correctly.
- Many, however, did not do the FOMDP approximation correctly. To solve this problem, first we assume that the world is fully observable; given that we then have the underlying MDP specification of the domain (states, transition function, rewards at states), we can compute the state values V*(0) and V*(1) using, for example, value iteration. (Note: as long as you knew that you had to find state values from the underlying MDP, you got full points for this step.)
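For reference, a minimal value-iteration sketch. The transition probabilities and rewards below are placeholders (not the homework's actual numbers); only the procedure matters.

```python
# Value iteration on a small MDP: V(s) = R(s) + gamma * max_a E[V(s')].

def value_iteration(states, actions, T, R, gamma=0.95, eps=1e-6):
    """T[s][a] is a dict {s_next: prob}; R[s] is the reward at state s."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: R[s] + gamma * max(
                sum(p * V[s2] for s2, p in T[s][a].items()) for a in actions
            )
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new
        V = V_new

# Placeholder two-state domain with actions "stay" and "go".
states, actions = [0, 1], ["stay", "go"]
T = {
    0: {"stay": {0: 0.9, 1: 0.1}, "go": {0: 0.1, 1: 0.9}},
    1: {"stay": {1: 0.9, 0: 0.1}, "go": {1: 0.1, 0: 0.9}},
}
R = {0: 0.0, 1: 1.0}
V = value_iteration(states, actions, T, R)   # V[0] ~ V*(0), V[1] ~ V*(1)
```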
Next is the back-up step: at the belief state reached from b0 after applying action "stay" and observing "0", denoted here by <b0, stay, 0>, we compute the upper bound on its value. Since we can be at state 0 with value V*(0) or at state 1 with value V*(1), and we know the distribution of this belief state, the expected value can be computed. This is the upper bound U(<b0, stay, 0>) we need. Do the same to get U(<b0, stay, 1>), U(<b0, go, 0>) and U(<b0, go, 1>).
Next we need to back up these values to the belief states <b0, stay> and <b0, go> (their meanings should be clear). For <b0, stay>, we first need to compute the probabilities of observing 0 and 1 in <b0, stay>: Pr(o=0 | <b0, stay>) and Pr(o=1 | <b0, stay>). The upper bound on the value of <b0, stay> can then be computed as: U(<b0, stay>) = R(b0) + [ Pr(o=0 | <b0, stay>) x U(<b0, stay, 0>) + Pr(o=1 | <b0, stay>) x U(<b0, stay, 1>) ] (you can use a discounted version here), where R(b0) is the expected reward at b0 (since the reward does not depend on the action applied, this term ultimately does not matter for comparing actions). Do the same to get U(<b0, go>).
Finally, the upper bound for b0 is the maximum of U(<b0, stay>) and U(<b0, go>). Note: some students mistakenly took the "max" over observations and the "expected value" over actions; it should be the other way around.
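The whole back-up can be sketched as follows, assuming V*(s), the posterior beliefs b(s | a, o), and the observation probabilities Pr(o | b0, a) are already known. All numbers below are placeholders, not the homework's actual values.

```python
# Sketch of the back-up step: expectation over observations, max over actions.

def upper_bound(actions, observations, posterior, pr_obs, V, R_b0, gamma=1.0):
    """posterior[(a, o)]: dict {s: prob} for belief <b0, a, o>;
    pr_obs[(a, o)] = Pr(o | b0, a); V[s] = V*(s); R_b0 = expected reward at b0."""
    U = {}
    for a in actions:
        # Expectation over observations of U(<b0, a, o>) = E_s[V*(s)].
        exp_val = sum(
            pr_obs[(a, o)] * sum(p * V[s] for s, p in posterior[(a, o)].items())
            for o in observations
        )
        U[a] = R_b0 + gamma * exp_val      # U(<b0, a>)
    return max(U.values()), U              # U(b0) = max over actions

# Placeholder numbers for a two-state domain with actions "stay"/"go".
V = {0: 2.0, 1: 10.0}
posterior = {
    ("stay", "0"): {0: 0.8, 1: 0.2}, ("stay", "1"): {0: 0.2, 1: 0.8},
    ("go", "0"): {0: 1.0},           ("go", "1"): {1: 1.0},
}
pr_obs = {("stay", "0"): 0.5, ("stay", "1"): 0.5,
          ("go", "0"): 0.3,   ("go", "1"): 0.7}
Ub0, U = upper_bound(["stay", "go"], ["0", "1"], posterior, pr_obs, V, R_b0=1.0)
```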
Part IV:
- Q1: In the scheduling part, some students modeled the problem with dates as variables, with sets of jobs as their domains. If you model it this way, it is harder to write the constraints correctly. The easier way is to have a variable for each job, with dates as its domain.
Some students even modeled both jobs and dates as variables, with domains containing other variables---this is very confusing...
For this problem, there is a disjunctive constraint involving 3 variables, so it is not a binary CSP (though it is true that you can convert it into a binary one).
- Q2: You should show the value assignments for the variables following the order given. The correct solution is that when you get to A3, there is no valid assignment, and the backtracking must go back to H, and then up to A4.
- Q3: For (a), there should be constraints between (x1, x2), (x2, x8) and (x6, x8). Note that the constraint graph is asked for the whole problem, i.e., with all 8 variables. For (b), note that in the min-induced-width procedure, after you pick a variable with minimal degree, you need to connect its parents to each other before selecting the next one. Since there are ties to break in this part, there are many correct orderings. Given an order d = <x1, ..., xn>, the induced graph should have xn on top, and again the parents of a variable (at a higher level) must be connected to each other. The induced width is the maximum width over this graph. Some students did not understand this notion.
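The min-induced-width procedure can be sketched as below. The graph used in the usage example is a placeholder chain, not the homework's 8-variable graph; tie-breaking here is by variable name, so other correct orderings exist.

```python
# Sketch: min-induced-width ordering. Repeatedly pick a minimum-degree
# variable, connect its remaining neighbors (its parents in the ordering)
# to each other, and remove it; the first-picked variable goes last.

def min_induced_width_order(graph):
    """graph: {var: set of neighbor vars}. Returns (ordering, induced width)."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}
    picked, width = [], 0
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))   # min degree, ties by name
        width = max(width, len(g[v]))              # width = degree at removal
        nbrs = g.pop(v)
        for a in nbrs:
            g[a].discard(v)
            g[a].update(nbrs - {a})                # connect v's parents
        picked.append(v)
    return picked[::-1], width

# Placeholder chain x1 - x2 - x3 - x4 (induced width 1).
graph = {"x1": {"x2"}, "x2": {"x1", "x3"},
         "x3": {"x2", "x4"}, "x4": {"x3"}}
order, w = min_induced_width_order(graph)
```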
For (c): you need to start from xn, then revise the domains of the variables that have constraints with it---not the domain of xn itself. Let (x_n, x_k) be a constraint; then we remove those values of x_k that do not "match" *any* value of x_n.
Then do the same for x_{n-1} down to x1. Note that this order of revising domains is opposite to the order of picking variables during the search (and because of the way you remove values of x_k, any value you pick for x_k will have a matching value for x_n---thus the search is backtrack-free). Many students did this part for only 4 variables; I still graded it if the Min-Induced-Width and DAC steps were done correctly (but without giving full points, since with only these 4 variables it is hard to tell whether your understanding of directed arc consistency is correct).
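The directed-arc-consistency pass above can be sketched as follows. The variable names, domains, and constraint in the usage example are placeholders of my own, not the homework's.

```python
# Sketch: directed arc consistency (DAC). Process variables from x_n down
# to x_1; for each constraint between the current variable x_i and an
# earlier variable x_k, prune values of x_k with no match in x_i's domain.

def revise(dom, xk, xi, ok):
    """Remove values of xk that match no value in xi's domain; ok(vk, vi)."""
    dom[xk] = {vk for vk in dom[xk] if any(ok(vk, vi) for vi in dom[xi])}

def dac(order, domains, constraints):
    """order: [x1, ..., xn]; domains: {var: set of values};
    constraints: list of (xa, xb, pred) with pred(value_of_xa, value_of_xb)."""
    dom = {v: set(vals) for v, vals in domains.items()}
    pos = {v: i for i, v in enumerate(order)}
    for xi in reversed(order):                     # x_n down to x_1
        for (a, b, pred) in constraints:
            if xi == a and pos[b] < pos[a]:        # prune the earlier var b
                revise(dom, b, xi, lambda vb, vi, p=pred: p(vi, vb))
            elif xi == b and pos[a] < pos[b]:      # prune the earlier var a
                revise(dom, a, xi, lambda va, vi, p=pred: p(va, vi))
    return dom

# Placeholder example: x1 < x2, searching in order x1, x2.
order = ["x1", "x2"]
domains = {"x1": {1, 2, 3}, "x2": {2, 3}}
constraints = [("x1", "x2", lambda a, b: a < b)]
dom = dac(order, domains, constraints)   # value 3 is pruned from x1
```

After the pass, any value chosen for x1 still has a matching value for x2, which is the backtrack-free property mentioned above.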
Please let me know if you have any questions.
Tuan