CSE494pgRank
Class LinkGen

java.lang.Object
  |
  +--CSE494pgRank.LinkGen

public class LinkGen
extends java.lang.Object

Generates the link matrix from the crawled files. The class considers links only among the files/URLs present in the repository; any URL that was not crawled and stored is treated as absent. LinkGen first maps all the documents into a hashtable. It then iterates over each document and extracts the URLs that the document points to. Each extracted URL is looked up in the hashtable, and URLs that are not present are discarded. If document A links to document B, an entry A->B is added to the link table. This is done for every document, and the resulting link matrix is stored in a file.
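The mapping-and-filtering steps described above can be sketched as follows. This is a minimal illustration, not LinkGen's actual code: the class `LinkMatrixSketch`, the method `buildLinkMatrix`, and the assumption that each document's outgoing URLs have already been extracted into a `Map` are all hypothetical. A `HashMap` plays the role of the hashtable of crawled documents, and links to URLs outside the repository are dropped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of building a link matrix from already-extracted links.
class LinkMatrixSketch {
    /**
     * Returns a matrix where entry [a][b] is true when document a links to document b.
     * Links whose target URL is not among the crawled URLs are discarded.
     */
    static boolean[][] buildLinkMatrix(List<String> crawledUrls, Map<String, List<String>> outLinks) {
        // Step 1: map every crawled URL to a row/column index (the hashtable of documents).
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < crawledUrls.size(); i++) {
            index.put(crawledUrls.get(i), i);
        }
        int n = crawledUrls.size();
        boolean[][] matrix = new boolean[n][n];
        // Step 2: for each document, keep only the links whose targets were crawled.
        for (int a = 0; a < n; a++) {
            List<String> targets = outLinks.getOrDefault(crawledUrls.get(a), List.of());
            for (String target : targets) {
                Integer b = index.get(target);
                if (b != null) {
                    matrix[a][b] = true; // record the link A -> B
                }
            }
        }
        return matrix;
    }
}
```

A boolean matrix keeps the sketch short; for a large crawl, an adjacency list (one list of target indices per document) would use far less memory.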


Constructor Summary
LinkGen(java.lang.String repository)
          Constructor that accepts the name of the directory where the crawled web pages are stored.
 
Method Summary
 void linker()
          Generates the link matrix from the files stored in the repository.
static void main(java.lang.String[] args)
          Should be called as: java LinkGen Crawled_Files_Directory (without the trailing '/').
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LinkGen

public LinkGen(java.lang.String repository)
Constructor that accepts the name of the directory where the crawled web pages are stored.

Method Detail

linker

public void linker()
Generates the link matrix from the files stored in the repository.
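Before links can be filtered against the repository, the URLs each stored page points to must be pulled out of its HTML. The sketch below shows one way to do that; it is an assumption, not LinkGen's documented behavior. The class `LinkExtractorSketch` and its regular expression are hypothetical, and a regex is only a rough stand-in for a real HTML parser. Relative links are resolved against the page's own URL with `java.net.URI.resolve` so they can be compared with the absolute URLs of crawled documents.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of extracting outgoing links from one stored page.
class LinkExtractorSketch {
    // Assumed pattern: matches href="..." or href='...'; not a full HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URLs that the page located at baseUrl points to. */
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative links such as "b.html" against the page's own URL.
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip malformed URLs instead of aborting the whole page.
            }
        }
        return links;
    }
}
```

For example, on a page at `http://site.edu/dir/a.html`, the link `href="b.html"` resolves to `http://site.edu/dir/b.html`.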


main

public static void main(java.lang.String[] args)
Should be called as: java LinkGen Crawled_Files_Directory (without the trailing '/'). Calls the methods that generate and store the link matrix.
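The on-disk format of the stored link matrix is not documented here, so the following is only a sketch under an assumed format: one line per document, with space-separated `1`/`0` values where column b of row a is `1` when document a links to document b. The class `LinkMatrixWriterSketch` and this file layout are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of storing a link matrix in a plain-text file.
class LinkMatrixWriterSketch {
    /** Writes one line per document; a '1' in column b means this document links to document b. */
    static void write(boolean[][] matrix, Path out) throws IOException {
        List<String> lines = new ArrayList<>();
        for (boolean[] row : matrix) {
            StringBuilder sb = new StringBuilder();
            for (int b = 0; b < row.length; b++) {
                if (b > 0) sb.append(' ');
                sb.append(row[b] ? '1' : '0');
            }
            lines.add(sb.toString());
        }
        // Files.write encodes the lines as UTF-8 and creates or truncates the file.
        Files.write(out, lines);
    }
}
```

A dense text format like this is easy to inspect and to load for a PageRank computation, but its size grows with the square of the number of documents; listing only the A->B pairs would scale better for large repositories.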