
Thinking Cap: A Post-Easter Resurrection..

Considering that this is the last quiet weekend before the beginning of the end of the semester, I could sense a collective yearning
for one last thinking cap. So here goes...

1. We talked about classification learning in class over the last couple of days. One important issue in classification learning is access to training data that is "labeled", i.e., training examples that are pre-classified. Often we have a lot of training data, but only part of it is pre-classified.
Consider, for example, spam mail. It is easy to get access to a lot of mail, but only some of it may be known for sure to be spam vs. non-spam. It would be great if learning algorithms could use not just pre-labeled data but also unlabeled data. Is there a technique you can think of that can do this? (Hint: think a bit further back than decision trees.)

(Learning scenarios where we get by with some labeled and some unlabeled data are called "semi-supervised learning" tasks.)
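One candidate technique, offered only as a hedged sketch and not necessarily the one the hint is pointing at, is expectation-maximization (EM) wrapped around a naive Bayes classifier. The missing labels of the unlabeled mails are treated as hidden variables. The E-step fills them in with soft (probabilistic) labels from the current model, and the M-step retrains on the labeled mails plus the soft-labeled ones. The sketch assumes a Bernoulli naive Bayes model over sets of words with add-one smoothing. The function names (`train_nb`, `posterior`, `semi_supervised_em`) and the tiny spam/ham data set are hypothetical, made up purely for illustration.

```python
import math


def train_nb(docs, weights, vocab, classes, alpha=1.0):
    """M-step: fit Bernoulli naive Bayes from (possibly soft) class weights.

    docs    -- list of word sets
    weights -- one dict per doc mapping class -> weight (1/0 for labeled docs,
               posterior probabilities for unlabeled ones)
    alpha   -- additive (Laplace) smoothing constant
    """
    total = sum(sum(w.values()) for w in weights)
    priors, cond = {}, {}
    for c in classes:
        wc = sum(w[c] for w in weights)  # expected number of docs in class c
        priors[c] = (wc + alpha) / (total + alpha * len(classes))
        cond[c] = {}
        for word in vocab:
            # expected number of class-c docs containing this word
            count = sum(w[c] for d, w in zip(docs, weights) if word in d)
            cond[c][word] = (count + alpha) / (wc + 2 * alpha)
    return priors, cond


def posterior(doc, priors, cond, vocab, classes):
    """E-step for one doc: P(class | doc) under the current model."""
    logp = {}
    for c in classes:
        s = math.log(priors[c])
        for word in vocab:
            p = cond[c][word]
            # Bernoulli model: absent words contribute too
            s += math.log(p if word in doc else 1.0 - p)
        logp[c] = s
    m = max(logp.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(logp[c] - m) / z for c in classes}


def semi_supervised_em(labeled, unlabeled, classes, iterations=10):
    """labeled: list of (word_set, class); unlabeled: list of word_sets."""
    vocab = set().union(*(d for d, _ in labeled), *unlabeled)
    labeled_docs = [d for d, _ in labeled]
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    # Initial model from the labeled mails only
    priors, cond = train_nb(labeled_docs, hard, vocab, classes)
    for _ in range(iterations):
        # E-step: soft labels for the unlabeled mails
        soft = [posterior(d, priors, cond, vocab, classes) for d in unlabeled]
        # M-step: refit on labeled (fixed labels) + unlabeled (soft labels)
        priors, cond = train_nb(labeled_docs + list(unlabeled), hard + soft,
                                vocab, classes)
    return priors, cond, vocab


# Hypothetical toy data: two labeled mails, four unlabeled ones
classes = ["spam", "ham"]
labeled = [({"free", "money"}, "spam"), ({"meeting", "agenda"}, "ham")]
unlabeled = [{"free", "offer"}, {"money", "offer", "free"},
             {"agenda", "notes"}, {"meeting", "notes"}]
priors, cond, vocab = semi_supervised_em(labeled, unlabeled, classes)
print(posterior({"offer"}, priors, cond, vocab, classes))
```

In this toy data the word "offer" never appears in a labeled mail, so the model built from the two labeled mails alone (`iterations=0`) is 50/50 on a mail containing only "offer". After EM, "offer" leans toward spam because the unlabeled mails show it co-occurring with "free" and "money".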

Okay. One is enough for now, I think..