CSE 494 Readings
The main papers are the un-indented entries. The indented,
numbered papers are optional readings.
General (non-technical) readings
Manning et al. book on information retrieval. Chapters from the book provide good textbook material for IR topics.
A textbook (in development) on mining massive datasets. It has
good coverage of link analysis, map-reduce, advertising
on the web, etc.
Text Retrieval (a draft chapter from Wei
Meng, SUNY Binghamton. Used with Dr. Meng's permission).
Special readings for latent semantic indexing: Chapter on
LSI in the Manning et al. book
Search Engine Technology (a draft chapter from Wei
Meng, SUNY Binghamton. Used with Dr. Meng's
permission). (Primary Reference**)
(Distributed) Indexing
Crawling
Cluster computing
- Google Linux cluster (read the first three pages to get an idea of
how Google processes a query on a parallel Linux cluster
with tons of replication)
Social Networks
Overall Search Engine
Clustering
Required readings: Chapters 16 and 17 from the IR book draft.
Text Classification & Collaborative and Content-based Filtering
Database refresher readings
XML as a Semi-structured Language
Semantic Web
Information Extraction
Data Integration
Logic-based techniques in data integration. Alon Levy
On the need for Schema Mapping
Query Optimization/Processing in Data Integration
Collection Selection (Text data aggregation)
Combining Database and Information Retrieval
All of the following are short papers...
Web Services
Background Refreshers
- Some basics of linear algebra (vectors, matrices, eigenvalues).
- Matrices, Vector Spaces, and Information Retrieval. Berry et al. (a math view of linear algebra in IR)
- Here are refreshers on database background.
Subbarao Kambhampati
Last modified: Tue Oct 4 19:55:29 MST 2011