
ET-I3 Colloquium: Dr. Gautam Das from Microsoft Research on Approximate Query Processing

ET-I3 Colloquium
 
 
Approximate Query Processing
 
Dr. Gautam Das
Microsoft Research Labs
 
 
 
Friday, May 14th, 2004  11:30 AM  BY 660
 
 Abstract

In recent years, advances in data collection and management technologies have led to a proliferation of very large databases. However, effective data analysis such as data mining and decision support on such multi-gigabyte repositories has proven difficult to achieve. This is primarily because most analysis queries, by their nature, require aggregation or summarization of large portions of the data. Processing even a single analysis query involves accessing enormous amounts of data, leading to prohibitively expensive running times.

While keeping query response times short is very important in such applications, exactness in query results is frequently less important. In many cases, "ballpark estimates" are adequate to provide the desired insights about the data. The acceptability of inexact query answers coupled with the necessity for fast query response times has led researchers to investigate Approximate Query Processing techniques that sacrifice accuracy to improve running time, typically through some sort of lossy data compression.

In this talk I will cover some of the approximate query processing techniques that we have developed in recent years at Microsoft Research, especially techniques based on pre-computed random samples.
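
As a rough illustration of the sample-based idea mentioned above (and not the specific techniques covered in the talk), the sketch below estimates a SUM aggregate from a pre-computed uniform random sample by scaling the sample sum by the inverse of the sampling fraction. The function names, the synthetic table, and the 1% sampling rate are all assumptions made for this example.

```python
import random


def build_sample(rows, fraction, seed=0):
    """Pre-compute a uniform random sample (without replacement) of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def approx_sum(sample, total_rows):
    """Estimate SUM over the full table by scaling up the sample sum."""
    scale = total_rows / len(sample)
    return scale * sum(sample)


# Hypothetical "table": a single numeric column with 100,000 synthetic values.
data_rng = random.Random(42)
table = [data_rng.uniform(0, 100) for _ in range(100_000)]

sample = build_sample(table, fraction=0.01)  # built once, ahead of query time
estimate = approx_sum(sample, len(table))    # reads only the 1,000 sampled rows
exact = sum(table)                           # reads all 100,000 rows
relative_error = abs(estimate - exact) / exact
print(f"exact={exact:.0f} estimate={estimate:.0f} rel_err={relative_error:.4f}")
```

The query over the sample reads 1% of the rows and returns a ballpark answer; the trade-off is that the estimate carries sampling error, which grows when the aggregated values are highly skewed or the query selects only a small part of the table.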
 

Speaker Bio: Dr. Gautam Das is a researcher in the Data Management, Exploration and Mining Group at Microsoft Research. Prior to joining Microsoft, he held positions at Tandem (now HP) and the University of Memphis. Dr. Das graduated with a B.Tech in computer science from IIT Kanpur, India, and with a Ph.D. in computer science from the University of Wisconsin, Madison.

Dr. Das's research interests are in search, retrieval, exploration, and mining of relational information stores. He is especially interested in investigating the application of cross-disciplinary techniques (information retrieval, machine learning, statistics, algorithms, and data structures) to the specific data exploration problems that relational databases pose.