Homework Assignments

  • Homework 4 (Assigned: Apr 8, 2013; Due: Apr 18, 2013)
    1. Question 1: K-means on documents
      The question asks you to use bag similarity (also called Jaccard
      similarity) instead of vector similarity. The similarity of two bags
      is defined as the ratio of the cardinality of their intersection to
      the cardinality of their union. Note that the intersection of two bags
      (containing multiple instances of items, say e1 and e2) is a bag that
      contains as many e1 as the minimum count of e1 in the two bags, and as
      many e2 as the minimum count of e2 in the two bags. Likewise, the union
      contains the maximum count of each item. For example, if
        B1 = 2 e1, 5 e2
        B2 = 4 e1, 2 e2
      then
        B1 .intersection. B2 = 2 e1, 2 e2
        B1 .union. B2        = 4 e1, 5 e2
      The cardinality of a bag is the number of items in it, counting
      repeated instances, so the bag similarity of B1 and B2 is
      (2 + 2) / (4 + 5) = 4/9.

    2. Question 2: Perform hierarchical agglomerative clustering on the data in Question 1, using the single-link measure for inter-cluster distance.

    3. Question 3: Text classification using a naive Bayes classifier (NBC).

    4. Question 4: Collaborative Filtering

    5. Solutions
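    For Question 1, the bag (Jaccard) similarity can be sketched in a few
    lines with Python's collections.Counter, whose & operator keeps the
    minimum count of each item and whose | operator keeps the maximum,
    matching the bag intersection and union defined above. The function
    name bag_similarity and the choice to return 0 for two empty bags are
    assumptions made for illustration.

```python
from collections import Counter


def bag_similarity(b1: Counter, b2: Counter) -> float:
    """Jaccard similarity of two bags: |B1 intersection B2| / |B1 union B2|."""
    inter = b1 & b2  # minimum count of each item (bag intersection)
    union = b1 | b2  # maximum count of each item (bag union)
    union_size = sum(union.values())  # cardinality counts repeated instances
    if union_size == 0:
        return 0.0  # assumption: two empty bags are treated as dissimilar
    return sum(inter.values()) / union_size


# The example bags from Question 1.
B1 = Counter({"e1": 2, "e2": 5})
B2 = Counter({"e1": 4, "e2": 2})
print(bag_similarity(B1, B2))  # → 0.4444444444444444 (= 4/9)
```

    In a K-means run, this similarity would replace the cosine (vector)
    similarity used to assign each document to its nearest cluster.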
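    For Question 2, a minimal single-link clustering sketch, assuming the
    distance between two documents is 1 minus their bag similarity (the
    assignment does not fix this conversion). Under that assumption, the
    single-link rule (distance of the closest pair) is the same as taking
    the largest pairwise similarity between two clusters. The function
    names and the four-document data set are hypothetical, not the data
    from Question 1.

```python
from collections import Counter
from itertools import combinations


def bag_similarity(b1, b2):
    """Jaccard similarity of two bags (min counts over max counts)."""
    inter = sum((b1 & b2).values())
    union = sum((b1 | b2).values())
    return inter / union if union else 0.0


def single_link_hac(docs):
    """Return the merge history as a list of (cluster_a, cluster_b, similarity)."""
    # Start with every document in its own cluster; a cluster is a tuple of doc indices.
    clusters = [(i,) for i in range(len(docs))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single link: clusters are as similar as their most similar pair of members.
            sim = max(bag_similarity(docs[i], docs[j]) for i in a for j in b)
            if best is None or sim > best[2]:
                best = (a, b, sim)
        a, b, sim = best
        # Merge the two closest clusters and record the step.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, sim))
    return merges


# Hypothetical documents represented as bags of terms.
docs = [
    Counter({"e1": 2, "e2": 5}),
    Counter({"e1": 4, "e2": 2}),
    Counter({"e1": 1, "e2": 5}),
    Counter({"e3": 3}),
]
for a, b, sim in single_link_hac(docs):
    print(f"merge {a} + {b} at similarity {sim:.3f}")
```

    The merge history is the dendrogram read bottom-up. This brute-force
    version recomputes all pairwise similarities at every step, which is
    fine for a handful of documents.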
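    For Question 3, a minimal sketch of a naive Bayes text classifier,
    assuming the multinomial (word-count) model with add-one (Laplace)
    smoothing, and summing log probabilities to avoid floating-point
    underflow. The assignment does not specify these choices; the function
    names train_nbc and classify and the toy training set are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nbc(docs, labels):
    """docs: list of token lists; labels: the class of each document."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def classify(tokens, priors, word_counts, vocab):
    """Return the class maximising log P(c) + sum of log P(w | c)."""
    best_class, best_score = None, -math.inf
    v = len(vocab)
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in tokens:
            if w not in vocab:
                continue  # assumption: words never seen in training are ignored
            # Add-one smoothing keeps an unseen (word, class) pair from zeroing the product.
            score += math.log((word_counts[c][w] + 1) / (total + v))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical training data.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["vote", "election", "party"],
    ["party", "vote", "senate"],
]
labels = ["sports", "sports", "politics", "politics"]
model = train_nbc(docs, labels)
print(classify(["goal", "team", "vote"], *model))
```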
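    For Question 4, one common variant of collaborative filtering is
    user-based prediction weighted by Pearson correlation; the sketch below
    assumes that variant, since the assignment does not name one.
    Correlations are computed over the items both users have rated, and a
    prediction is the target user's mean rating plus a correlation-weighted
    average of the other users' mean-centred ratings for the item. All
    names and ratings are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pearson(ratings, u, v):
    """Pearson correlation of users u and v over the items both have rated."""
    common = [i for i in ratings[u] if i in ratings[v]]
    if len(common) < 2:
        return 0.0  # assumption: too little overlap means no usable correlation
    mu = mean([ratings[u][i] for i in common])
    mv = mean([ratings[v][i] for i in common])
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((ratings[v][i] - mv) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, u, item):
    """Predict u's rating of item from correlated users' mean-centred ratings."""
    mu = mean(list(ratings[u].values()))
    num = den = 0.0
    for v in ratings:
        if v == u or item not in ratings[v]:
            continue
        w = pearson(ratings, u, v)
        mv = mean(list(ratings[v].values()))
        num += w * (ratings[v][item] - mv)  # deviation from v's own average
        den += abs(w)
    return mu + num / den if den else mu  # fall back to u's mean


# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 2},
}
print(predict(ratings, "alice", "m4"))
```

    Negative correlations count too: a user whose tastes run opposite to
    the target user pushes the prediction the other way. Predictions are
    not clipped to the rating scale in this sketch.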

    Subbarao Kambhampati
    Last modified: Wed Dec 7 11:55:46 MST 2011