CSE 494/598 Information Retrieval, Mining and Integration on the Internet

Instructor: Subbarao Kambhampati (Office Hours: T/Th 2:45--3:45PM BY 560)

Check out the notes from the latest offering (now *with* videos!)

Note for Undergrads: I am particularly interested in seeing more undergraduate students register for this course.

Note for Grads: As of [Aug 20, 2008], additional seats have been added to the 598 section.

This course is geared towards exposing students to some of the core technologies for controlling and using the content on the Internet. The following are some of the questions we will consider:
  1. How do search engines work? Why are some better than others?
  2. Can we think of the web as a big database/knowledge base and support efficient database-style query processing?
  3. Can we find useful pearls and patterns in the mass of accessible data on the Internet?

This course will be a breadth-oriented introduction to the issues involved in answering these questions.

Prerequisites: CSE 310 is required. Other courses that will help include CSE 471 (AI), CSE 412 (Databases), and CSE 450 (Algorithms). I am hoping that students have had at least one of these 400-level courses already, but won't insist on it. Students planning to register for this course are encouraged to talk to the instructor (via email at rao wholivesat asu dot edu).

Grading: The grading will be based on class participation, exams and projects.

Textbooks: There is no prescribed textbook. We will read papers (see the reading list).

Overview: The best overview is the list of topics and lecture notes from the previous offering (shown below).


Lecture Notes & Audio

Spring 2007; T/Th 3:15--4:30PM BYAC 190

Instructor: Subbarao Kambhampati (Office Hours: T/Th 4:30--5:30PM BY 560)

(TA: Bhaumik Chokshi. Office Hours: Wed 10--11AM BY 557BB)

CEAS student evaluations from Spring 2007.

  1. Introduction [Jan 16, 2007]

  2. Text retrieval; vector-space ranking

  3. Correlation analysis & Latent Semantic Indexing

  4. Indexing; Crawling; Exploiting tags in web pages

  5. Social Network Analysis

  6. Link Analysis in Web Search (A/H; PageRank)

  7. Clustering

  8. Text Classification

  9. Filtering/Recommender Systems
    1. Optional slides on Map/Reduce
    2. Optional slides on Min-hash/LSH
    3. Optional slides on probabilistic latent semantic indexing

  10. Why do we even care about databases in the context of the web?

  11. XML and handling semi-structured data + Semantic Web standards

  12. Information Extraction

  13. Information/Data Integration

  14. Query Processing in Data Integration
  15. End

  16. Mini WWW-2007 Conference presentations [May 8, 2007]. (And here is the audio of the presentations).

Topics to come:
[May 1, 2007]: Query processing issues; DB/IR
[May 8, 2007]: Mini WWW-2007

Subbarao Kambhampati
Last modified: Thu Apr 22 18:52:15 MST 2010