CSE 494/598 Information Retrieval, Mining and Integration on
the Internet
Next offering: Spring 2010 (T/Th 10:30--11:45AM; BYAC 150)
Note for Undergrads: If you are unable to
get in because of capacity restrictions, show up for the
first class anyway. There may well be seats by the end of
the first week.
This course is geared towards exposing students to some of the core
technologies for controlling and using the content on the
Internet. The following are some of the questions we will
consider:
- How do search engines work? Why are some
better than others?
- Can we think of the web as a big database/knowledge base and support efficient
database-style query processing?
- Can we find useful pearls and patterns in the mass of
accessible data on the Internet?
This course will be a breadth-oriented introduction to the issues
involved in answering these questions.
Prerequisites: CSE 310 required. Other courses that will help include
CSE 471 (AI), CSE 412 (Databases), and CSE 450 (Algorithms). I
am hoping that students have had at least one of these 400-level
courses already, but won't insist on it. Students planning
to register for this course are encouraged to talk to the
instructor (via email at rao wholivesat asu dot edu).
Grading: The grading will be based on class participation, exams and
projects.
Textbooks: There is no prescribed textbook. We will read papers (see
the reading list).
Overview: The best overview is the list of topics and lecture notes
from the previous offering (shown below).
Additional pointers:
Lecture Notes & Audio from Fall 2008
Notes from Fall 2008; T/Th 1:30--2:45pm, BY 270
Instructor: Subbarao Kambhampati (Office Hours: T/Th 2:45--3:45PM BY 560)
(TA: Garrett Wolf. Office Hours: Wed: 3:30--4:30pm BY557AD)
- Introduction [Aug 26, 2008]
- Information Retrieval start [Aug 28, 2008]
- Correlation Analysis, dimensionality reduction and latent semantic indexing
- Indexing and Tolerant Dictionaries
- Doing IR on Web
- Social Networks
- Link Analysis to Predict Web-page Importance
- Clustering (of search results)
- Text Classification
- Recommendation Systems
- Structure on the web
- Information Extraction
- Information Integration
- Audio of the lecture on [Dec 2, 2008] Information
Integration. Alternative views of II. Three
architectures for II--surfacing, mediator and
warehousing. Dimensions affecting II architectures,
including source autonomy, horizontal vs. vertical
integration, and level of up-front effort. Main phases
of mediator architectures--source selection, source data
access (wrappers), source/data alignment (schema mapping),
query processing.
- Audio of the lecture on [Dec 4, 2008] Query processing in
Information Integration. GAV/LAV models for connecting
mediator and source schemas (and their relative
tradeoffs). Query optimization issues in information
integration (data aggregation). Computing and using
source statistics. Use of source coverage and overlap
statistics in source selection.
- Audio of the lecture on [Dec 9, 2008] Query processing in
Information Integration continued. More on
coverage/overlap statistics; discussion on local closed
world assumptions as a qualitative analog of coverage
statistics; latency statistics with respect to binding
patterns; trust ranking of sources. Handling
imprecision/incompleteness in databases. Bridging the
DB/IR divide. Computing value similarities with
super-tuples. Review of Structure on the Web. Review of
the semester and Sayonara with a corny ending.
- Interactive review (or what students seem to remember from the course)
Classes coming up
- 12/9: Final class
- 12/16: Final exam (12:10--2:00pm)
Todo
- Search contextualization..
- Measures for comparing ranked lists..
Last modified: Mon Jan 11 15:11:24 MST 2010