CSE 494 - Information Retrieval - Project Part 1 (Due Feb 23rd, 2010)

Project Description

This is part A of the project for CSE 494/598. You are provided with a system that can extract web pages and index them. Using this system, you will experiment with various ranking algorithms.


Example Queries

Download and set up

The code provided for this project is written in Java and packaged as an Eclipse project. You can download the latest version of Eclipse here. [Use the second link, the one that says Eclipse IDE for Java Developers (92 MB).] If you choose not to use Eclipse, add the lucene.jar file in the lib folder to your classpath; you can then compile the code with any Java compiler of your choice.
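If you compile outside Eclipse, a quick way to confirm that lucene.jar really is on the classpath is to try loading a Lucene class by name. The sketch below is not part of the provided code: the class name ClasspathCheck is hypothetical, and it assumes the bundled lucene.jar contains the standard class org.apache.lucene.index.IndexReader.

```java
class ClasspathCheck {
    // Returns true if the named class can be located on the current classpath.
    // The class is looked up without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the bundled lucene.jar contains this standard Lucene class.
        String probe = "org.apache.lucene.index.IndexReader";
        if (isOnClasspath(probe)) {
            System.out.println("Lucene found on the classpath.");
        } else {
            System.out.println("Lucene not found; add lib/lucene.jar to the classpath.");
        }
    }
}
```

Run it with the same classpath you intend to use for the project. If it reports that Lucene is missing, the project sources will most likely fail to compile with the same settings.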

Knowledge of Java is a prerequisite for this course. If you need to brush up on your Java skills, read The Java Tutorial. A detailed, class-by-class reference for the Java language and libraries can be found here. You can also simply search for the class name in your favorite internet search engine.

Getting Started:

  1. Download the file cse494-v1.zip to your desktop.
  2. Extract the contents of the zip file to a folder of your choice.
  3. Start Eclipse. Click on File > New > Java Project.
  4. In the dialog box that appears, click Create project from existing source. Click Browse... and point to the folder into which you extracted the files.
  5. Click Finish. A new project should now appear in your Package Explorer panel.
  6. Expand the src folder in that tree, and double click on SearchFiles.java inside edu.asu.cse494.
  7. Click on Run > Run As > Java Application.
  8. You will now be able to type queries in the "Console" area at the bottom of the screen.
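For orientation, the console interaction in the last step can be pictured as a simple read-query loop. The sketch below is not the provided SearchFiles.java and does not use Lucene; the class QueryConsole and the method nextQuery are hypothetical names, and the stop-on-empty-line convention is an assumption made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

class QueryConsole {
    // Reads the next line and returns it trimmed.
    // Returns null at end of input or on an empty line (assumed quit signal).
    static String nextQuery(BufferedReader in) {
        try {
            String line = in.readLine();
            if (line == null) {
                return null;
            }
            String query = line.trim();
            return query.isEmpty() ? null : query;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Type one query per line; an empty line quits.");
        String query;
        while ((query = nextQuery(in)) != null) {
            // A real search program would run the query against the index here.
            System.out.println("Query received: " + query);
        }
    }
}
```

When a program like this is launched from Eclipse, its prompt and your typed input both appear in the Console view.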

Description and documentation

Additional Files: