
[Thinking cap] on anchor text/page importance etc. (with a carrot about homework 2 deadline extension)



Some of you have asked for an extension of the homework 2 due date. Since the set of requesters includes a non-zero number of the "rain-or-shine" brigade (i.e., those who actually show up to class regularly), I am willing to extend the deadline to next Tuesday (2nd March).

Here, however, is the catch. Since NAACP says a mind is a terrible thing to waste, I want you to keep yours busy by thinking about (and *commenting on*) the following thinking-cap questions:

1. We said that the relative importance of the anchor text characterizations of a page P depends on how many other pages are pointing to P with that characterization. How should the "number of pages" be taken into account? Should the "type of pages" somehow matter (i.e., does it matter *which* page is saying those things about page P)? If so, how do you propose it should be taken into consideration?


2. We had a slide about the page importance desiderata. Comment on
  2.1. To what extent is each of those desiderata actually subsumed by link-based analysis?
  2.2. In the old days, we used to put links to various pages because that was the easiest way to get back to them when we needed to. Now, with search engines getting more and more effective, there is not as much of a reason to link to other pages. How does this affect the utility of link-based analysis techniques for finding page importance?
  2.3. Can you give some examples of how current-day search engines actually handle notions of importance that are not strictly subsumed by link-based analysis?


3. [The "I finally had an orgasm, but my doctor said it was the wrong kind" question]: At the start of the IR discussion, we said that what we are trying to compute is the "relevance" of a document d, given a user U and query Q. We then decided to approximate the relevance by a similarity computation between the document and the query (and spent the intervening weeks getting deeper and deeper into how best to compute this similarity). Now that we have decided to throw in the notion of page importance, do you think this should still be seen as a part of relevance (just a more accurate computation of relevance), or is it some other, orthogonal dimension? (Extra credit: Why is the orgasm quote related to this question?)

4. [The "The woods will be silent indeed, if no birds sang except those that sing the best" flame war]: Suppose you search Google to find the exact quote and source of the bird quote by Henry van Dyke. You get frustratingly many minor variations of the quote (including one by yours truly, which attributes it to Henry David Thoreau!). It looks as if letting the unwashed masses put up web pages is leading to all sorts of inaccurate information. Don't you think it would be simpler to go back to the peace and quiet of the age of poll taxes and control web page creation? (I know this is beginning to look suspiciously like an SAT essay prompt... you can also focus on whether life will be more or less interesting for CS folks if society were to go to this model.)

cheers
Rao