
More on how the pagerank paper got rejected from SIGIR (and some philosophy on conferences)--comments?



[You can read the following either as a discussion of conference reviews etc., or as a discussion of importance estimation in
social networks using link-analysis.]

I mentioned in passing yesterday that the original pagerank paper was rejected from SIGIR (the 1998 conference). It never did
get published as a separate paper; all citations to the idea go instead to their WWW 1998 paper.

This brings up the issue of how we are supposed to judge the importance of a conference (and researcher). Here are two methods:

1. Consider the importance of a conference in terms of its *acceptance ratio*--i.e., what (exceedingly small) fraction of the papers submitted to the conference
are actually accepted for publication. Researchers can then be judged in terms of how many hard-to-get-into conferences they got their papers into.
The attractiveness of this approach is, of course, its immediacy--as soon as the conference acceptance decisions are made, we have the acceptance ratios--which can be
used to judge the conferences as well as the researchers. The disadvantage is that there is really no connection between the long-term impact of a paper/conference and the acceptance ratio. The pagerank paper example above shows that just because SIGIR is very selective doesn't mean it gets it right (actually there is anecdotal evidence that the total sum of citations to all SIGIR papers is smaller than the citations to the pagerank paper it rejected ;-).
Similarly, a high acceptance ratio doesn't necessarily mean that the selection process is easy. Consider that if the Presidential race were to be seen as a conference, it has a whopping 50% acceptance ratio (I hope the Ralph Nader fans will forgive me for saying this). In the end, acceptance ratios mostly tell us how much of a fad the area currently is.


2. Consider the importance of a conference in terms of the citations to the papers appearing in that conference. This is link-analysis--except generalized to a "cluster" of nodes (corresponding to the papers of that conference) rather than to individual nodes. The importance of a researcher is then measured in terms of the aggregate citations to their papers
(an example measure is the H-index http://en.wikipedia.org/wiki/Hirsch_number ). This statistic is *not* immediately available--since citations are a "lagging" measure. [The day after
Lincoln's Gettysburg address, the newspapers around the country were split almost 50-50 in their judgement of the importance of that address--with many papers calling it a
lackluster speech by a third-rate politician.] The advantage is that citations do tell us about the long-term importance/half-life of a paper (and a conference's ability to select such papers). I tell my students that if they ever find themselves writing a paper for a conference A and find that they are mostly citing papers from conferences B and C, they should reconsider their decision to send the paper to A--or admit that they are second-class ;-).
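
[Aside: since the H-index comes up above, here is a minimal Python sketch of how it can be computed from a researcher's per-paper citation counts. The citation numbers in the example are made up purely for illustration.]

def h_index(citations):
    # Largest h such that the researcher has at least h papers with
    # at least h citations each (Hirsch's definition).
    sorted_counts = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(sorted_counts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Made-up citation counts: four papers with at least four citations each
print(h_index([10, 8, 5, 4, 3, 0]))   # -> 4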

Of course, even this measure can be further improved. There may be conferences on Scientology and/or Creationism whose papers are very well cited--by other papers appearing in those same conferences--while the world at large completely ignores them. To account for this, you need to separate citations coming from within the conference/domain from those coming across domains [notice the connection to the idea of giving separate weights to intra-domain and transverse links that we mentioned in yesterday's class].
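
[Aside: one way to operationalize this separation--just a sketch under the assumption that we have citation edges labeled with the citing and cited venues, and not the actual method used by any ranking site--is to weight intra-venue citations less than citations arriving from outside. The weights and venue names below are arbitrary and purely illustrative.]

from collections import defaultdict

def venue_scores(citation_edges, intra_weight=0.2, cross_weight=1.0):
    # Score each venue from (citing_venue, cited_venue) edges, giving
    # intra-venue citations a smaller (illustrative) weight than
    # citations arriving from other venues.
    scores = defaultdict(float)
    for citing, cited in citation_edges:
        if citing == cited:
            scores[cited] += intra_weight   # the conference citing itself
        else:
            scores[cited] += cross_weight   # the world at large noticed
    return dict(scores)

# Made-up example: VenueX only cites itself; VenueY also gets outside citations
edges = [("VenueX", "VenueX")] * 10 + [("WWW", "VenueY"), ("SIGIR", "VenueY"), ("VenueY", "VenueY")]
print(venue_scores(edges))   # {'VenueX': 2.0, 'VenueY': 2.2}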

Not surprisingly, this is why serious ranking sites go by citations rather than acceptance ratios (although authors tend to put the acceptance ratios in their CVs!). See

http://citeseer.ist.psu.edu/impact.html  for a ranking of the impact of about 1200 CS publication venues (and check where your conference occurs).

See http://libra.msra.cn for a more refined ranking--where each conference/researcher is ranked with respect to each area (taking in-domain vs. inter-domain citations into account). Zaiqing Nie--who is the architect of the Libra system--will be here on October 24th to give a talk partly about how this system is built.

===========
1 and 2 above also have an impact on the way conferences are run (and researchers do research). Coming to conferences--it is easy enough to optimize for lower acceptance ratios. But if what we want to optimize for is longer-term expected impact, then how do the conference--and its reviewers--do this? Notice that by rejecting just the pagerank paper, SIGIR lost about half of its cumulative citations. The essay by Ken Church at http://rakaposhi.eas.asu.edu/church-precision-reviews.pdf provides a few thoughtful suggestions.
One important point the paper makes is that while precision is easy to judge, recall is much harder (you can tell whether there are relevant results in the top-10 pages returned by a search engine, but you can't tell whether it left out any really important pages from the top-10). [Another related article is http://rakaposhi.eas.asu.edu/patterson-conferences.pdf ]

cheers
Rao