[Thinking Cap] on learning.. (May be the last or penultimate chance to don that cap)

To: "Rao Kambhampati" <rao@asu.edu>

Subject: [Thinking Cap] on learning.. (May be the last or penultimate chance to don that cap)

From: "Subbarao Kambhampati" <rao@asu.edu>

Date: Mon, 26 Nov 2007 17:20:45 -0700

Sender: subbarao2z2@gmail.com

Qn 0. [George Costanza qn] Consider two learners that are trying to solve the same classification problem with two classes (+ and -). L1 seems to be averaging about 50% accuracy on the test cases, while L2 seems to be averaging 25% accuracy. Which learner is better? Why is this called the George Costanza question? ;-)
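If you want to poke at Qn 0 with a few lines of code, here is a minimal sketch. The labels and the two prediction lists are made up for illustration (they are not from any real learner); the only point is to see what happens to each learner's accuracy when you do the opposite of what it says.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def invert(predictions):
    """Do the opposite of whatever the learner says."""
    return ['-' if p == '+' else '+' for p in predictions]


# Hypothetical test set of 8 cases, 4 of each class.
labels = ['+', '-', '+', '-', '+', '-', '+', '-']

# Hypothetical L1: always answers '+', so it is right on half the cases.
l1_predictions = ['+'] * 8

# Hypothetical L2: right on only the first 2 of the 8 cases.
l2_predictions = ['+', '-', '-', '+', '-', '+', '-', '+']

for name, preds in [('L1', l1_predictions), ('L2', l2_predictions)]:
    print(name, 'as is:', accuracy(preds, labels),
          '| inverted:', accuracy(invert(preds), labels))
```

With two classes, accuracy a and inverted accuracy 1 - a always add up to 1, so it is worth asking which of the two learners is really the uninformative one.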

Qn 1. Consider a scenario where the training set examples have been labeled by a slightly drunk teacher--and thus they sometimes have wrong labels (e.g. positive examples are wrongly labeled negative etc.). Of course, for the learning to be doable, the percentage of these mislabeled instances should be quite small. We have two learners, L1 and L2. L1 seems to be 100% correct on the *training* examples. L2 seems to be 90% correct on the training examples. Which learner is likely to do better on test cases?
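Here is a toy experiment for Qn 1, offered as a sketch under made-up assumptions rather than as the answer: the data is one-dimensional, the true rule is "x >= 0.5 is +", and the drunk teacher flips exactly one training label in ten. The two learners (a 1-nearest-neighbor memorizer and a single-threshold rule) are stand-ins I picked for illustration; the question itself does not say what L1 and L2 are.

```python
def true_label(x):
    # Assumed ground-truth rule for this toy problem.
    return '+' if x >= 0.5 else '-'


def flip(label):
    return '-' if label == '+' else '+'


# Training set: 100 evenly spaced points; every 10th label is flipped.
train_x = [i / 100 for i in range(100)]
train_y = [flip(true_label(x)) if i % 10 == 3 else true_label(x)
           for i, x in enumerate(train_x)]

# Test set: nearby but different points, with correct labels.
test_x = [(j + 0.4) / 100 for j in range(100)]
test_y = [true_label(x) for x in test_x]


def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


def memorizer(x):
    # Stand-in for L1: return the label of the nearest training point,
    # so it reproduces every training label, wrong ones included.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_threshold(xs, ys):
    # Stand-in for L2: pick the single cut-off that best fits the
    # (noisy) training labels.
    def train_accuracy(t):
        return accuracy(lambda x: '+' if x >= t else '-', xs, ys)
    return max(xs, key=train_accuracy)


best_t = fit_threshold(train_x, train_y)


def threshold_learner(x):
    return '+' if x >= best_t else '-'


for name, learner in [('memorizer', memorizer),
                      ('threshold', threshold_learner)]:
    print(name,
          'train:', accuracy(learner, train_x, train_y),
          'test:', accuracy(learner, test_x, test_y))
```

On this particular toy data the memorizer scores 100% on the training examples and 90% on the test cases, while the threshold rule scores 90% on training and 100% on test. Whether that pattern carries over to other data and other learners is exactly what the question asks you to think about.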

Qn 2. Compression involves using the pattern in the data to reduce the storage requirements of the data. One way of doing this would be to find the rule underlying the data, and keep the rule and throw the data out. Viewed this way, compression and learning seem one and the same. After all, learning too seems to take the training examples, find a hypothesis ("pattern"/"rule") consistent with the examples, and use that hypothesis instead of the training examples. What, if any, differences do you see between Compression and Learning?

Qn 3. We said that most human learning happens in the context of prior knowledge. Can we view prior knowledge as a form of bias? In particular, can you say that our prior knowledge helps us focus on certain hypotheses over others in explaining the data?

That is all for now.

Rao