Homework Grading Rubric
- To: undisclosed-recipients:;
- Subject: Homework Grading Rubric
- From: William Cushing <email@example.com>
- Date: Fri, 17 Feb 2012 00:55:24 -0700
The maximum possible points are 76.
The maximum homework scores are 65 and 64. After that are a handful of mid-50s. The median score is something like 43.
The mean is 39-ish; excluding the lowest scores it is 41-ish. The vast majority of the class scored in the mid 30s to mid 40s.
(The minimums are several 0s, of course, and among actual assignments are several sub-20 scores such as 6 and 12, as I recall.)
So a mid-40s score is something like a B or B+. Low 50s and higher is definitely an A; I think there are about 7 such scores.
As I'm sure you can tell, I began the grading in `reviewer' mode, i.e., as though reviewing a submission to an international conference or journal on artificial intelligence.
This was a mistake, fortunately halted before total catastrophe ensued. Please
don't take my comments personally; indeed, feel free to ignore them entirely. The grading is consistent and fair, at least as well as can be expected
from a human being. In particular, the drastic change in the level of commentary between the first and second halves means little besides a drastic change
in the level of commentary --- the grading itself was just as `harsh'.
It is worth noting that for any given question, at least one student (and not the same students each time) earned a perfect, or above-perfect, score.
Also, the entire class average would likely go up by 10 or even 15 points had all the assignments been in fact complete.
The following is a breakdown of how the final judgments were converted into points. This varies a little from question to question,
but the general idea is that correctness is worth a point and justification is worth a point, so that each question, no matter
how large or small, was worth 2 --- except question 1, whose 8 sub-questions are worth only 1 each.
All rounding is rounding up; wherever scoring over the maximum was possible, sub-scores are truncated to the maximum possible.
Question 1: 8 possible points.
0/3 of a point for no example,
1/3 of a point for a flawed example,
2/3 of a point for a dubious example,
3/3 of a point for a fine example.
For any agent whatsoever in any domain, say X, a fine example consists of:
X with faulty/good sensors, with/without multiple agents, and with faulty/good effectors.
Other idioms are more appropriate for certain domains. For driving, for example, one might say:
Driving (not) in a blizzard, (not) on a closed course, and (not) drunk.
Or for poker: with(out) extra-sensory perception, with(out) the opponents' families as hostages, and drunk/sober. (Or dealer/not-dealer.)
Question 2: 6 possible points.
1/4 of a point for an incorrect answer with faulty explanation.
2/4 of a point for a correct answer with faulty explanation, or, for an incorrect answer with a good explanation.
3/4 of a point for a correct answer with a dubious explanation, and in some cases perhaps even for an incorrect answer (with fantastic explanation).
4/4 of a point for a correct answer with correct explanation.
The correct answers are that accessibility and determinism depend on the agent design, respectively due to sensor and motor/effector quality, while staticness is an environmental
parameter usually independent of agent design. (But there are exceptions.)
Questions 3 through 6: 22 possible points.
These are summed and rounded in one group. There are three possible marks in three possible positions:
A leading +/- indicates that the answer provided was deemed technically correct/incorrect (X for not committing to a single clear answer).
A second +/- indicates that the explanation provided was deemed adequate/inadequate (X for giving no explanation at all, or, in some cases, an explanation worse than nothing at all).
Each of those is worth a point.
Optionally a trailing +/- is a positive or negative half-point, modifying the score for the explanation.
All combinations are possible. In particular one could give an interesting explanation for a wrong answer and receive as much as 1.5 out of 2 points.
It is also possible to score 2.5 out of 2 points, which is the point of summing all of the questions together before rounding and truncating to the maximum score.
As the score "- - -" would be worse than no answer, these are first altered to "- - X", removing the negative modifier.
Most of the time the trailing modifiers cancel out or nearly so.
In theory one could have scored 2.5 points on questions 3 and 4, 10 points on 5.1 and 5.2, and another 2.5 points on question 6; I may have neglected to truncate such totals to the maximum
possible on the entire set: 2+2+8+8+2 = 22. If you have a sub-score above 22 on that part, you might want to avoid bringing it to my attention ;).
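For concreteness, here is a minimal Python sketch of how I read the marks-to-points scheme for questions 3 through 6. The mark-string encoding (`'++'`, `'-++'`, etc.) and the helper names are my own illustration of the rubric, not the actual grading code.

```python
import math

def question_score(marks):
    """Score one question's mark string: position 0 is the answer mark,
    position 1 the explanation mark, and an optional position 2 is the
    half-point modifier. '+' earns the point; '-' or 'X' does not.
    Per the rubric, '---' is first softened to '--X'."""
    if marks == "---":
        marks = "--X"
    score = 0.0
    if marks[0] == "+":
        score += 1.0          # answer deemed technically correct
    if marks[1] == "+":
        score += 1.0          # explanation deemed adequate
    if len(marks) == 3:       # trailing modifier on the explanation
        score += {"+": 0.5, "-": -0.5}.get(marks[2], 0.0)
    return score

def group_score(all_marks, maximum=22):
    """Sum the per-question scores for the group, round up,
    then truncate to the group maximum."""
    return min(math.ceil(sum(question_score(m) for m in all_marks)), maximum)
```

So, for example, `question_score("-++")` gives the 1.5 mentioned above for an interesting explanation of a wrong answer, and `question_score("+++")` gives 2.5.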
Question 7: 8 points total.
7.1 used the scale above. The correct answer to 7.1 is of course DFS, because of its vastly
superior space-complexity and no loss of solution quality under these conditions.
The other 3 parts were marked using +/- letter grades with the following meanings, as I recall:
A+ = 2.5
A- to A = 2
B+ = 1.5
C+ to B = 1
D to C = 0.5
X = 0.
The order these are listed in is 7.2.1, 7.2.3, and 7.2.2. A+ on 7.2 was for drawing the tree to depth 3 proper (and getting everything else right).
The correct answers to 7.2.1 and 7.2.3 are virtually the same; a barren sub-tree is effectively a special case of a highly non-uniform goal distribution at the leaves. The grading was
looking for non-uniformity as the reason to prefer iterative broadening over DFS in 7.2.1, and for an appropriate connection to barrenness being made in
the answer to 7.2.3. Many answers were extremely difficult to parse or understand; students who used specific examples benefited greatly.
It was also a good observation that if the left-most goal sits halfway through the leaves, in such a way that it is found at the cutoff b'=b/2, then IBDFS has a huge (astronomical for, say, b=100)
advantage. Note the solution guide suggests that the 3.1 etc. nodes are good for IBDFS --- this is incorrect; b'=b is the worst case for IBDFS. For b=3 and d=3 there are actually
only 4 possible leftmost goals for which IBDFS would perform better than DFS, and so, `on average', it likely would not perform better --- except of course that we have little idea what
`on average' means, beyond that it *isn't* a straight average over the leaves of the tree treating each as having equal likelihood of being the leftmost goal.
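The b=3, d=3 claim is easy to check by simulation. This sketch is my own, under assumed conventions: a complete b-ary tree with goals only at the leaves, the leftmost goal at a given base-b leaf index, and iterative broadening restarting DFS with cutoffs b' = 1, 2, ..., b.

```python
def dfs_count(b, d, goal, cutoff=None):
    """Preorder DFS on a complete b-ary tree of depth d, exploring only the
    first `cutoff` children of each node, stopping at the leaf whose base-b
    index is `goal`. Returns (nodes visited, goal found?)."""
    cutoff = b if cutoff is None else cutoff
    count = 0
    def visit(depth, index):
        nonlocal count
        count += 1
        if depth == d:
            return index == goal
        # any() short-circuits, so the search stops once the goal is found
        return any(visit(depth + 1, index * b + c) for c in range(cutoff))
    found = visit(0, 0)
    return count, found

def ibdfs_count(b, d, goal):
    """Total nodes visited by iterative broadening: restricted DFS with
    cutoff b' = 1, 2, ..., b, stopping once the goal is found."""
    total = 0
    for cutoff in range(1, b + 1):
        n, found = dfs_count(b, d, goal, cutoff)
        total += n
        if found:
            break
    return total

# For b=3, d=3: which leftmost-goal positions favor IBDFS over plain DFS?
b, d = 3, 3
winners = [g for g in range(b ** d)
           if ibdfs_count(b, d, g) < dfs_count(b, d, g)[0]]
print(winners)  # -> [9, 10, 12, 13]: exactly 4 positions
```

All four winners need cutoff b'=2 and sit far enough right that skipping the left siblings' full subtrees pays for the wasted b'=1 pass; any goal requiring b'=b costs strictly more than plain DFS.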
Questions 8-12: 32 possible points.
2 points per question mark, plus 2 more for "Is the manhattan distance heuristic admissible.", which I think was missing its question mark.
+ means full credit, i.e., 2.
- means partial credit, i.e., 1.
x means no credit.
These were all pretty straightforward.
Further notes on what was considered correct/incorrect:
Question 8 (2 points): There were quite a few similar strategies for attempting to prove question 8 using simpler arguments than the standard one ("consider any other path at the time A* returns").
Virtually no one got points for this question.
Question 3 (2 points): In order of difficulty: partial accessibility, complete inaccessibility, full accessibility. With partial accessibility one has the option of designing the agent to pre-plan for contingencies, or to re-plan when
the unexpected happens. With no sensors, neither is possible, so neither need be implemented; not implementing is of course easier than implementing.
Question 4 (2 points): Detail is not always helpful. Up to a point a better model can help an agent improve its performance --- but a model of the entire universe at nanometer resolution is too far. There are two problems: the computational burden
of too much detail, and also that the more precise your model, the more likely it is to be inaccurate, perhaps gravely so. (In learning theory this is called *overfitting*, and deliberate simplification of a model to avoid it is called regularization. In planning
one would say abstraction rather than regularization.)
Question 5.2: There was a lot of confusion on this question. One point in particular is that the solution guide uses a very strict notion of what it means
to achieve a goal -- ruling out stochastic achievement almost as a matter of definition. It also isn't very clear that A2 isn't meant to be doubly handicapped.
Question 6 (2 points): IDDFS is a complete algorithm and so cannot possibly be performing infinitely more work than DFS --- the conclusion doesn't make sense.
The actual ratio is the sum of the integers from 1 to D (which is (D+1) choose 2, i.e., D(D+1)/2) divided by just D, which is (D+1)/2. (Here D=d+1, the number of vertices in the single path that is the entire search tree.)
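The (D+1)/2 ratio is quick to verify; this is just the triangular-number arithmetic, not anything from the assignment itself.

```python
def iddfs_visits(D):
    """On a degenerate search tree that is a single path of D vertices, the
    IDDFS iteration with depth limit k visits exactly k vertices, so the
    total work is 1 + 2 + ... + D."""
    return sum(range(1, D + 1))

D = 11
assert iddfs_visits(D) == D * (D + 1) // 2   # the (D+1 choose 2) total
assert iddfs_visits(D) / D == (D + 1) / 2    # ratio to DFS's D visits
```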
EC (4 points): The ratio (b+1)/(b-1) stems from an average-case analysis, which involves much more annoying algebra than the far easier worst-case analysis.
I believe everyone who made a serious attempt got at least 1 point; elegant pseudo-proofs got 2.