ET-I3 Colloquium
Approximate Query Processing
Dr. Gautam Das
Microsoft Research Labs
Friday, May 14th, 2004, 11:30 AM, BY 660
Abstract
In recent years, advances in data collection and management technologies
have led to a proliferation of very large databases. However, effective
data analysis, such as data mining and decision support, on such
multi-gigabyte repositories has proven difficult to achieve. This is
primarily because most analysis queries, by their nature, require
aggregation or summarization of large portions of the data. Processing
even a single analysis query involves accessing enormous amounts of data,
leading to prohibitively expensive running times.
While keeping query response times short is very important in such
applications, exactness in query results is frequently less
important. In many cases, "ballpark estimates" are adequate to
provide the desired insights about the data. The acceptability of inexact
query answers coupled with the necessity for fast query response times
has led researchers to investigate Approximate Query Processing
techniques that sacrifice accuracy to improve running time, typically
through some form of lossy data compression.
In this talk, I will cover some of the approximate query processing
techniques that we have developed in recent years at Microsoft Research,
especially those based on pre-computed random samples.
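As a rough illustration of the general idea behind sample-based approximate answers (not a description of the specific techniques presented in the talk), the Python sketch below pre-computes a uniform random sample of a table once, then answers a SUM ... WHERE query from the sample alone by scaling the sample total by N/n, where N is the table size and n the sample size. It also reports a standard error using the textbook variance of an estimated total under simple random sampling without replacement. The function names, the row-as-dict layout, and the use of a uniform sample are all assumptions made for illustration.

```python
import math
import random


def build_uniform_sample(rows, sample_size, seed=0):
    """Pre-compute a uniform random sample (without replacement) of the table.

    Hypothetical helper: real systems build and maintain samples offline.
    """
    rng = random.Random(seed)
    return rng.sample(rows, sample_size)


def estimate_sum(sample, population_size, value, predicate=lambda r: True):
    """Estimate SUM(value(r)) WHERE predicate(r) over the full table.

    Each sampled row stands in for population_size / n rows, so the sample
    total is scaled up by that factor. Returns (estimate, standard_error).
    """
    n = len(sample)
    # Rows failing the predicate contribute 0 to the total.
    contributions = [value(r) if predicate(r) else 0.0 for r in sample]
    estimate = (population_size / n) * sum(contributions)

    # Sample variance of the per-row contributions.
    mean = sum(contributions) / n
    var = sum((c - mean) ** 2 for c in contributions) / (n - 1) if n > 1 else 0.0
    # Finite-population correction: error shrinks to 0 as n approaches N.
    fpc = (population_size - n) / population_size
    std_err = population_size * math.sqrt(fpc * var / n)
    return estimate, std_err


if __name__ == "__main__":
    # Hypothetical table: 100,000 rows with an "amount" and a "region".
    rows = [{"amount": i % 100, "region": "east" if i % 2 == 0 else "west"}
            for i in range(100_000)]
    sample = build_uniform_sample(rows, 2_000)
    est, se = estimate_sum(sample, len(rows), lambda r: r["amount"],
                           lambda r: r["region"] == "east")
    print(f"approximate SUM: {est:.0f} +/- {1.96 * se:.0f} (95% CI)")
```

The query touches 2,000 sampled rows instead of 100,000, which is the accuracy-for-speed trade described in the abstract. Passing `value=lambda r: 1` turns the same function into an approximate COUNT. A uniform sample is the simplest choice; it can give poor accuracy for highly selective predicates or skewed data, which is one motivation for more refined sampling schemes.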
Speaker Bio: Dr. Gautam Das is a researcher in the Data Management,
Exploration and Mining Group at Microsoft Research. Before joining
Microsoft, he held positions at Tandem (now HP) and the University of
Memphis. Dr. Das received a B.Tech. in computer science from IIT Kanpur,
India, and a Ph.D. in computer science from the University of Wisconsin,
Madison.
Dr. Das's research interests are in search, retrieval, exploration, and
mining of relational information stores. He is especially interested in
investigating the application of cross-disciplinary techniques
(information retrieval, machine learning, statistics, algorithms, and
data structures) to the specific data exploration problems that
relational databases pose.