Here are a few questions on complexity and search engines--mostly to check what got through. You are required to
1. My friend was asked by his boss to come up with an efficient algorithm for a time-tabling problem for the company.
My friend was unable to come up with anything that runs in less than exponential time. He wanted to convince his boss that
no one else could have done any better. He went and told the boss this: "Look, this time-tabling problem can be easily (i.e., in polynomial time)
converted into a boolean satisfiability problem. Now, it is well known that boolean satisfiability is an NP-Complete problem (i.e., it is one of 'em
monster problems in class NP). So it is no bloody surprise that I am unable to come up with anything other than an exponential algorithm".
What do you think of my friends' reasoning?
2. Explain why the arguments based on reductions to/from NP-complete problems is needed? Can't we just prove that a problem can't have anything
other than exponential (or polynomial) solution? Do you know of any famous problems that were thought to be exponential for a long time until someone found a polynomial algorithm?
3. One of the CSE faculty members used to have a bunch of magnetic words stuck to his (then metallic) doors. All the students passing by will try to arrange the magnetic words to make interesting english messages. If we assume that each student was making it a point to use *all* the words in his/her message, then what would the bag-of-words similarity measures say about the pair-wise difference among those messages? Considering that I told you that search engines use these as the default similarity metrics, what does it make you think about the intelligence of search engines?
4. Suppose I have a document D1: "Rao is a happy chap" I make another document D2 by copying all the text from D1 and pasting it 2 times into D2 (so D2 is
"Rao is a happy chap Rao is a happy chap"). What will the bag similarity metric say about the similarity between D1 and D2? What will the cosine-theta (or vector) similarity metric
say about the similarity between D1 and D2? Which do you think is better for this case?