CSE 494/598 Information Retrieval, Mining and Integration on the Internet

This course is geared towards exposing students to some of the core technologies for controlling and using the content on the Internet. The following are some of the questions we will consider:

  1. How do search engines work? Why are some better than others?
  2. Can we think of the web as a big database/knowledge base and support efficient database-style query processing?
  3. Can we find useful pearls and patterns in the mass of accessible data on the Internet?

This course will be a breadth-oriented introduction to the issues involved in answering these questions.

Prerequisites: CSE 310 is required. Other courses that will help include CSE 471 (AI), CSE 412 (Databases), and CSE 450 (Algorithms). I hope that students have already taken at least one of these 400-level courses, but I won't insist on it. Students planning to register for this course are encouraged to talk to the instructor (via email at rao wholivesat asu dot edu).

Grading: The grading will be based on class participation, exams and projects.

Textbooks: There is no prescribed textbook. We will read papers (see the reading list).

Overview: The best overview is the list of topics and lecture notes from the previous offering (shown below).

Additional pointers:

Lecture Notes from Fall 2005 (slides in ppt; lectures in .wav)

  1. Introduction (8/22)
      Audio of the lecture [Aug 22, 2005]

  2. Text retrieval; vectorspace ranking
    1. Audio of the lecture [Aug 24, 2005]
    2. Audio of the lecture [Aug 29, 2005]
    3. Audio of the lecture [Aug 31, 2005]

  3. Indexing/Retrieval issues

  4. Correlation analysis & Latent Semantic Indexing

  5. Search engine technology

  6. Anatomy of Google, etc.

  7. Clustering

  8. Text Classification

  9. Filtering/Personalization

  10. Web & Databases: Why do we even care?

  11. XML and handling semi-structured data

  12. Semantic web and its standards (RDF/RDF-S/OWL...)

  13. Information Extraction

  14. Data/Information Integration/aggregation

  15. Query Processing in Data Integration: Gathering and Using Source Statistics

  16. Bridging Information Retrieval and Databases

  17. Social Networks

  18. Interactive Review + a (corny) ending (Here are the notes by the TA of the student review comments)

11/21, 11/23: DB & IR; Collection Selection; Web services
11/28, 11/30: Social Network Analysis (Kevin Bacon game; Erdős number; trust propagation, etc.)
12/5: Interactive Review
DB+IR --> Imprecise queries (1 week)
Collection Selection (1 class)
web services (1 class)
Social Network Analysis (1-2 classes)

Subbarao Kambhampati
Last modified: Wed Jan 24 15:02:49 MST 2007