CSE 494/598 Information Retrieval, Mining and Integration on the Internet

Instructor: Subbarao Kambhampati

Next offering: Spring 2010 (T/Th 10:30--11:45AM; BYAC 150)

Note for Undergrads: If you are unable to get in because of capacity restrictions, show up for the first class anyway. There may well be seats by the end of the first week.

This course is geared towards exposing students to some of the core technologies for controlling and using the content on the Internet. The following are some of the questions we will consider:
  1. How do search engines work? Why are some better than others?
  2. Can we think of the web as a big database/knowledge base and support efficient database-style query processing?
  3. Can we find useful pearls and patterns in the mass of accessible data on the Internet?

This course will be a breadth-oriented introduction to the issues involved in answering these questions.

Prerequisites: CSE 310 required. Other courses that will help include CSE 471 (AI), CSE 412 (Databases), and CSE 450 (Algorithms). I am hoping that students have had at least one of these 400-level courses already, but won't insist on them. Students planning to register for this course are encouraged to talk to the instructor (via email at rao wholivesat asu dot edu).

Grading: The grading will be based on class participation, exams and projects.

Textbooks: There is no prescribed textbook. We will read papers (see the reading list).

Overview: The best overview is the list of topics and lecture notes from the previous offering (shown below).

Additional pointers:

Lecture Notes & Audio from Fall 2008

Notes from Fall 2008; T/Th 1:30--2:45pm, BY 270

Instructor: Subbarao Kambhampati (Office Hours: T/Th 2:45--3:45PM BY 560)

(TA: Garrett Wolf. Office Hours: Wed: 3:30--4:30pm BY557AD)

  1. Introduction [Aug 26, 2008]

  2. Information Retrieval start [Aug 28, 2008]

  3. Correlation Analysis, dimensionality reduction and latent semantic indexing

  4. Indexing and Tolerant Dictionaries
  5. Doing IR on Web
  6. Social Networks

  7. Link Analysis to Predict Web-page Importance

  8. Clustering (of search results)

  9. Text Classification

  10. Recommendation Systems

  11. Structure on the web

  12. Information Extraction

  13. Information Integration


  14. Interactive review (or what students seem to remember from the course)

classes coming up


Last modified: Mon Jan 11 15:11:24 MST 2010