CSE 494 - Information Retrieval - Project Part 1 (Due Date Here)

Project Description

This is Part A of the CSE 494/598 project. You are given a system that can extract web pages and index them, and you will use it to experiment with various ranking algorithms.
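
To give a sense of what a ranking algorithm computes over an index, here is a minimal sketch of one common choice: vector-space ranking with TF-IDF weights and cosine similarity. It is a hypothetical, self-contained illustration. The class TfIdfRanker, its methods, and the tokenizer (lowercase, split on non-alphanumerics, no stemming) are assumptions and do not come from the code in cse494-v1.zip. It also is not necessarily the exact weighting the assignment asks for.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical TF-IDF / cosine-similarity ranker over a tiny in-memory inverted index. */
public class TfIdfRanker {
    // term -> (docId -> raw term frequency)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    /** Lowercases and splits on any run of characters that are not letters or digits. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Adds a document; its id is its position in insertion order. */
    public int addDocument(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    /** idf(t) = ln(N / df(t)); terms missing from the index get weight 0. */
    private double idf(String term) {
        Map<Integer, Integer> p = postings.get(term);
        return (p == null) ? 0.0 : Math.log((double) docs.size() / p.size());
    }

    /** Returns ids of documents with a positive score, best match first. */
    public List<Integer> rank(String query) {
        // Query vector: raw query-term frequency times idf
        Map<String, Double> q = new HashMap<>();
        for (String term : tokenize(query)) q.merge(term, 1.0, Double::sum);
        q.replaceAll((term, tf) -> tf * idf(term));

        double[] dot = new double[docs.size()];
        double[] docNormSq = new double[docs.size()];
        // One pass over the index: document norms use all terms, dot products only query terms
        for (Map.Entry<String, Map<Integer, Integer>> e : postings.entrySet()) {
            double w = idf(e.getKey());
            Double qw = q.get(e.getKey());
            for (Map.Entry<Integer, Integer> d : e.getValue().entrySet()) {
                double dw = d.getValue() * w;
                docNormSq[d.getKey()] += dw * dw;
                if (qw != null) dot[d.getKey()] += dw * qw;
            }
        }
        double qNorm = Math.sqrt(q.values().stream().mapToDouble(v -> v * v).sum());
        double[] score = new double[docs.size()];
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            double denom = Math.sqrt(docNormSq[i]) * qNorm;
            score[i] = (denom == 0) ? 0 : dot[i] / denom;
            if (score[i] > 0) ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> score[i]).reversed());
        return ids;
    }

    public static void main(String[] args) {
        TfIdfRanker r = new TfIdfRanker();
        r.addDocument("Arizona State University computer science");
        r.addDocument("Information retrieval ranks web pages");
        r.addDocument("Ranking web pages with link analysis");
        System.out.println(r.rank("web ranking")); // prints [2, 1]
    }
}
```

Document 2 ranks first because it matches both query terms, including the rarer (higher-idf) term "ranking". Document 0 matches nothing and is left out. Swapping in a different term weighting, or adding link-based scores, only changes how the score array is filled.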

Plan to finish the coding for this project at least two days before the deadline so that you have enough time for the analysis. Most of the points in this assignment are awarded for the analysis.

SUBMIT:

Example Queries

Download and set up

The code provided for this project is written in Java and packaged as an Eclipse project. You can download the latest version of Eclipse from the Eclipse download page. [Use the third link, labeled Eclipse IDE for Java Developers (122 MB).] If you choose not to use Eclipse, add the lucene.jar file from the lib folder to your classpath. You can then compile the code with any Java compiler.
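
If you work outside Eclipse, a quick way to confirm that Lucene is actually on your classpath is a tiny probe program like the one below. This is a hypothetical helper, not part of the provided code. It assumes that the bundled lucene.jar contains org.apache.lucene.index.IndexReader, and that you run it from the project folder with something like `java -cp lib/lucene.jar:. ClasspathCheck` (use `;` instead of `:` as the separator on Windows).

```java
/**
 * Hypothetical classpath probe. It assumes lucene.jar provides
 * org.apache.lucene.index.IndexReader, so being able to load that class
 * means Lucene is visible.
 */
public class ClasspathCheck {
    /** Returns true if the named class can be found by this class's loader (without initializing it). */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String probe = "org.apache.lucene.index.IndexReader";
        if (isOnClasspath(probe)) {
            System.out.println("Lucene found: " + probe);
        } else {
            System.out.println("Lucene not found; add lib/lucene.jar to the classpath");
        }
    }
}
```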

Knowledge of Java is a prerequisite for this course. If you need to brush up on your Java skills, read The Java Tutorial. For a detailed, class-by-class reference to the Java language and libraries, see the Java API documentation, or simply search for the class name in your favorite search engine.

Getting Started:

  1. Download the file cse494-v1.zip to your desktop.
  2. Extract the contents of the zip file to a folder of your choice.
  3. Start Eclipse. Click on File > New > Java Project.
  4. In the dialog box that appears, enter the path to the folder you extracted the files into in the Location text box.
  5. Click Finish. A new project should now appear in your Package Explorer panel.
  6. Expand the src folder in that tree, and double-click SearchFiles.java inside the edu.asu.cse494 package.
  7. Click on Run > Run As > Java Application.
  8. You will now be able to type queries in the "Console" area at the bottom of the screen.
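
The console interaction in the last step follows the usual read-one-line-per-query pattern. The sketch below is a hypothetical illustration of that loop, not the code in SearchFiles.java. The class QueryConsole, the method runQueryLoop, and the convention that a blank line ends the session are all assumptions. The provided program may read and stop differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.function.Consumer;

/** Hypothetical console query loop; NOT the implementation in SearchFiles.java. */
public class QueryConsole {
    /**
     * Reads one query per line, passes each trimmed query to the handler, and
     * stops at end of input or at a blank line. Returns the number of queries handled.
     */
    public static int runQueryLoop(Reader source, Consumer<String> handler) throws IOException {
        BufferedReader in = new BufferedReader(source);
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) break; // assumed convention: a blank line ends the session
            handler.accept(line);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Enter queries, one per line (blank line to quit):");
        // A real search program would run the query against the index here
        runQueryLoop(new InputStreamReader(System.in), q -> System.out.println("Searching for: " + q));
    }
}
```

Passing the search step in as a Consumer keeps the input handling separate from the ranking code, so the same loop can drive whichever ranking algorithm you are experimenting with.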

Description and documentation

Additional Files: