CSE494
Class Webcrawl
java.lang.Object
|
+--CSE494.Webcrawl
public class Webcrawl
extends java.lang.Object
implements java.lang.Runnable
Crawls the web and stores the first 1000 URLs encountered. The files opened are saved into the directory from which the crawler was invoked.
Usage: java Webcrawl (http://SiteURL).
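The two storage rules above (keep only the first 1000 distinct URLs, and write files into the invocation directory) can be sketched as follows. This is a hypothetical illustration, not the Webcrawl implementation: the class name FirstUrlsStore, the methods record and localFileFor, and the file-naming scheme are all assumptions. It relies on the standard "user.dir" system property, which holds the JVM's working directory at startup.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch: remembers the first MAX_URLS distinct URLs in the order
 * they are encountered and maps each one to a file in the working directory.
 */
final class FirstUrlsStore {
    static final int MAX_URLS = 1000; // the "first 1000 URLs" cap from the description

    // LinkedHashSet keeps encounter order and silently ignores duplicates.
    private final Set<String> urls = new LinkedHashSet<>();

    /** Stores url if the cap has not been reached; returns true only if it was newly stored. */
    boolean record(String url) {
        return urls.size() < MAX_URLS && urls.add(url);
    }

    int size() {
        return urls.size();
    }

    /** Maps a URL to a file inside the directory the JVM was started from. */
    static Path localFileFor(String url) {
        // Assumed naming scheme: replace characters that are unsafe in file names with '_'.
        String name = url.replaceAll("[^A-Za-z0-9._-]", "_") + ".html";
        return Paths.get(System.getProperty("user.dir")).resolve(name);
    }
}
```

With this naming scheme, "http://example.com/a" becomes "http___example.com_a.html". The scheme is deliberately naive: distinct URLs that differ only in replaced characters would collide on the same file name.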
Method Summary
int crawler(int newdepth)
The crawler, which recursively extracts links and follows them.
static void main(java.lang.String[] args)
Invokes the crawler from the command line.
void run()
void setStatus(java.lang.String status)
Prints a status message.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SEARCH
public static final java.lang.String SEARCH
STOP
public static final java.lang.String STOP
DISALLOW
public static final java.lang.String DISALLOW
SEARCH_LIMIT
public static final int SEARCH_LIMIT
URL_OPENED
public static final int URL_OPENED
Limit on the number of URLs opened.
Webcrawl
public Webcrawl()
run
public void run()
Specified by:
run in interface java.lang.Runnable
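Because Webcrawl implements java.lang.Runnable, an instance can be handed to a java.lang.Thread, which calls run() on a separate thread. This page does not say how Webcrawl itself uses run(), so the sketch below is only an assumption-labeled illustration of the Runnable/Thread pattern: the class RunnableCrawlSketch and its placeholder loop are hypothetical stand-ins for real crawling work.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: a crawler-like task implementing Runnable so that it
 * can be passed to a Thread. The loop in run() stands in for real crawling.
 */
final class RunnableCrawlSketch implements Runnable {
    final AtomicInteger pagesVisited = new AtomicInteger();

    @Override
    public void run() {
        // Placeholder work: a real crawler would fetch and parse pages here.
        for (int i = 0; i < 5; i++) {
            pagesVisited.incrementAndGet();
        }
    }

    /** Runs the task on its own thread and waits for it to finish. */
    static int runOnThread() {
        RunnableCrawlSketch task = new RunnableCrawlSketch();
        Thread worker = new Thread(task, "crawler");
        worker.start();
        try {
            worker.join(); // after join() returns, the worker's updates are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt; the count may be partial
        }
        return task.pagesVisited.get();
    }
}
```

Running the task on its own thread leaves the calling thread (for example, one driving a user interface) free while the crawl proceeds.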
crawler
public int crawler(int newdepth)
The crawler, which recursively extracts links and follows them.
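The recursive link-following described for crawler(int newdepth) can be sketched as a depth-limited depth-first traversal with a visited set and a URL cap. This is a minimal sketch under stated assumptions, not the actual crawler: the class RecursiveCrawlSketch, its methods, and the in-memory map that replaces HTTP fetching are hypothetical, and the regular expression is a simplified href matcher, not a full HTML parser. How the real method uses newdepth and what its int return value means are not documented here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a recursive, depth-limited link crawler.
 * Pages come from an in-memory map (URL -> HTML) instead of the network.
 */
final class RecursiveCrawlSketch {
    // Simplified: matches href="..." or href='...' inside an <a ...> tag.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private final Map<String, String> pages;   // stand-in for fetching over HTTP
    private final int limit;                    // maximum number of URLs to visit
    private final Set<String> visited = new LinkedHashSet<>();

    RecursiveCrawlSketch(Map<String, String> pages, int limit) {
        this.pages = pages;
        this.limit = limit;
    }

    /** Returns every href value found in the HTML, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Visits url, then recurses into its links until depth or the URL limit runs out. */
    void crawl(String url, int depth) {
        // Stop when too deep, when the cap is reached, or when url was already visited.
        if (depth < 0 || visited.size() >= limit || !visited.add(url)) {
            return;
        }
        String html = pages.get(url);
        if (html == null) {
            return; // page not available: counted as visited, but has no outgoing links
        }
        for (String link : extractLinks(html)) {
            crawl(link, depth - 1);
        }
    }

    Set<String> visited() {
        return visited;
    }
}
```

The visited set prevents infinite recursion on link cycles (a page linking back to its parent), while the depth counter and the cap bound the total work.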
setStatus
public void setStatus(java.lang.String status)
Prints the given status message; mimics System.out.println.
main
public static void main(java.lang.String[] args)
Invokes the crawler from the command line. To invoke, type: Webcrawl http://
The HTML files are stored in the directory from which it is invoked.
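A command-line entry point like the one described usually validates its single URL argument before crawling. The sketch below is a hypothetical illustration, not Webcrawl.main: the class CrawlerMainSketch and the method parseStartUrl are assumptions, the acceptance rule (exactly one absolute http or https URL with a host) is an assumed policy, and the crawl itself is omitted. It uses only java.net.URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical sketch of command-line handling for a crawler started with a
 * single start URL. Only argument checking is shown; crawling is omitted.
 */
final class CrawlerMainSketch {
    /** Returns the start URL if args holds exactly one absolute http(s) URL with a host, else null. */
    static URI parseStartUrl(String[] args) {
        if (args.length != 1) {
            return null;
        }
        try {
            URI uri = new URI(args[0]);
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return (web && uri.getHost() != null) ? uri : null;
        } catch (URISyntaxException e) {
            return null; // not a syntactically valid URI (for example, contains spaces)
        }
    }

    public static void main(String[] args) {
        URI start = parseStartUrl(args);
        if (start == null) {
            System.err.println("Usage: java Webcrawl http://SiteURL");
            System.exit(1);
        }
        System.out.println("Starting crawl at " + start);
        // Downloaded pages would be written under System.getProperty("user.dir"),
        // i.e. the directory the command was run from.
    }
}
```

Rejecting bad input up front means a typo such as a missing scheme produces a usage message instead of a failure partway through the crawl.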