
Fw: My Ph.D. defense, Friday 12th - BY420 2:00PM



--- Begin Message ---
Ph.D. Dissertation Defense
Mining and Using Coverage and Overlap Statistics for Data Integration

By
Zaiqing Nie

Friday, March 12, 2004, 2:00 PM
Brickyard 420

Committee
Dr. Subbarao Kambhampati, Chair
Dr. K. Selçuk Candan
Dr. Huan Liu
Dr. Louiqa Raschid
Dr. Susan D. Urban


Abstract

Query processing in the context of integrating autonomous data sources on the
Internet has received significant attention of late. Traditional query
processing assumes that each relation is stored in the same primary database.
In data integration scenarios, by contrast, a relation is effectively stored
across multiple, potentially overlapping sources. Query
processing in data integration requires coverage and overlap statistics of
these autonomous sources to generate optimal query plans. However, it is
impractical to assume the autonomous sources will export statistics to the
mediator, and users may have differing objectives in terms of what coverage
they want and how much execution cost they are willing to bear for achieving
the desired coverage.

In response, this dissertation introduces a novel query processing engine,
which automatically gathers coverage and overlap statistics about the data
sources and then uses the statistics to support multi-objective query
optimization. Specifically, the dissertation presents StatMiner, a statistics
mining approach which automatically generates attribute value hierarchies,
effectively discovers frequently accessed query classes, and learns coverage
and overlap statistics only with respect to these classes. Next, the
dissertation introduces Multi-R, a multi-objective query optimizer which
supports joint optimization of coverage and cost of query plans using coverage
and overlap statistics. The efficiency of StatMiner and the effectiveness of
the learned statistics are demonstrated in the context of BibFinder, a
publicly available bibliography mediator developed as a testbed for this
work. The empirical evaluation of Multi-R also shows that the generated query
plans are significantly better than those of existing approaches, both in
terms of planning cost and in terms of plan execution cost.
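
As a rough illustration of the terms used above, the sketch below computes
coverage and overlap for a few sources and then picks sources greedily by a
weighted trade-off between newly covered answers and access cost. It is not
StatMiner or Multi-R: the function names, the per-source cost model, the
`weight` parameter, and the greedy selection rule are all assumptions made
for this example only.

```python
from typing import Dict, List, Set


def coverage(source: Set[str], answers: Set[str]) -> float:
    """Fraction of all known answers that a single source returns."""
    return len(source & answers) / len(answers) if answers else 0.0


def overlap(a: Set[str], b: Set[str], answers: Set[str]) -> float:
    """Fraction of all known answers that both sources return."""
    return len(a & b & answers) / len(answers) if answers else 0.0


def greedy_plan(
    sources: Dict[str, Set[str]],
    costs: Dict[str, float],
    weight: float,
    k: int,
) -> List[str]:
    """Pick up to k sources, one at a time, by a hypothetical utility.

    utility = weight * residual_coverage - (1 - weight) * normalized_cost
    where residual coverage counts only answers not yet covered by the
    sources already in the plan. weight=1.0 ignores cost entirely.
    """
    if not sources:
        return []
    answers: Set[str] = set().union(*sources.values())
    if not answers:
        return []
    max_cost = max(costs.values()) or 1.0  # avoid dividing by zero
    covered: Set[str] = set()
    remaining = set(sources)
    plan: List[str] = []

    while remaining and len(plan) < k:
        def utility(name: str) -> float:
            residual = len(sources[name] - covered) / len(answers)
            return weight * residual - (1 - weight) * costs[name] / max_cost

        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=utility)
        plan.append(best)
        covered |= sources[best]
        remaining.remove(best)
    return plan
```

With `weight` near 1 the sketch favors high-coverage sources regardless of
cost; lowering it shifts the plan toward cheaper sources whose answers
overlap less with those already chosen.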

Open to the public!


------------------------------
Zaiqing Nie
CSE Dept.
Arizona State University


--- End Message ---