CSE 494/598 Information Retrieval, Mining and Integration on the Internet

Instructor: Subbarao Kambhampati

Next offering: Fall 2011 (T/Th 4:30--5:45pm, BYENG 270 (**NEW BIGGER ROOM**))

Notice: If you are trying to get into either the 494 or the 598 session of this class for Fall 2011, please show up on the first day. I will not be able to give any individual overrides at this time. My experience, however, is that there is a lot of churn over the summer, and everyone who wants a seat typically gets one when the dust settles.

This course is geared towards exposing students to some of the core technologies for controlling and using the content on the Internet. The following are some of the questions we will consider:

  1. How do search engines work? Why are some better than others?
  2. Can we think of the web as a big database/knowledge base and support efficient database-style query processing?
  3. Can we find useful pearls and patterns in the mass of accessible data on the Internet?

This course will be a breadth-oriented introduction to the issues involved in answering these questions.

Prerequisites: CSE 310 is required. Other courses that will help include CSE 471 (AI), CSE 412 (Databases), and CSE 450 (Algorithms). I am hoping that students have had at least one of these 400-level courses already, but I won't insist on them. Students planning to register for this course are encouraged to talk to the instructor (via email at rao wholivesat asu dot edu).

Grading: The grading will be based on class participation, exams and projects.

Textbooks: There is no prescribed textbook. We will read papers (see the reading list).

Overview: The best overview is the list of topics and lecture notes from the previous offering (shown below).

Additional pointers:

Lecture Notes & Audio & Video (Spring 2010)

  1. Introduction [Jan 19, 2010]

  2. Course Overview + Big themes [Jan 21, 2010]

  3. Information Retrieval

  4. Indexing and Tolerant Dictionaries

  5. Correlation analysis and Latent Semantic Indexing (An annotated version of the MATLAB session playing with SVD is here.)

  6. Doing IR on Web: Anchor Text; Page Importance Measures

  7. Social networks and their applications on the Web
    Topics below this line are not included in the midterm.

  8. Crawling & Hardware Issues; Clustering of search results.

  9. Text Classification

  10. Recommendation Systems

  11. Specifying and Exploiting Structure

  12. Information Extraction
  13. Information Integration
  14. Sayonara

  15. What students seem to remember


  1. 5/11: Final 9:50 - 11:40 AM
Last modified: Mon Aug 15 09:03:52 MST 2011