The question asks you to use bag similarity (the bag form of Jaccard similarity) instead of vector similarity. The measure is defined as the ratio of the cardinality of the intersection of two bags to the cardinality of their union. The thing to note here is that the intersection of two bags (each containing multiple instances of items, say e1 and e2) is a bag that contains as many e1 as the minimum count of e1 across the two bags, and as many e2 as the minimum count of e2 across the two bags. The union, as used in the example below, takes the maximum count of each item instead. So:

B1 = 2 e1, 5 e2
B2 = 4 e1, 2 e2
B1 intersection B2 = 2 e1, 2 e2
B1 union B2 = 4 e1, 5 e2

The cardinality of a bag is of course the total number of items in that bag, counting repeats. Here the intersection has cardinality 2 + 2 = 4 and the union has cardinality 4 + 5 = 9, so the similarity is 4/9.
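As a rough check on the example above, here is a minimal Python sketch. It assumes each bag is represented as a collections.Counter, whose & and | operators take the per-item minimum and maximum counts respectively, which matches the intersection and union used in this answer. The function name bag_similarity and the choice to return 0 for two empty bags are my own assumptions, not part of the question.

```python
from collections import Counter
from fractions import Fraction


def bag_similarity(b1: Counter, b2: Counter) -> Fraction:
    """Ratio of bag-intersection size to bag-union size (hypothetical helper)."""
    intersection = b1 & b2  # per-item minimum of the two counts
    union = b1 | b2         # per-item maximum of the two counts

    # Cardinality of a bag: total number of items, counting repeats.
    intersection_size = sum(intersection.values())
    union_size = sum(union.values())

    if union_size == 0:
        # Assumption: treat two empty bags as having similarity 0.
        return Fraction(0)
    return Fraction(intersection_size, union_size)


# The bags from the example: B1 = 2 e1, 5 e2 and B2 = 4 e1, 2 e2.
B1 = Counter({"e1": 2, "e2": 5})
B2 = Counter({"e1": 4, "e2": 2})

print(bag_similarity(B1, B2))  # 4/9
```

Fraction is used only so the result prints as an exact ratio; converting with float() gives the decimal value if that is what the question expects.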