
my PhD dissertation proposal defense



Hi, folks,

I plan to defend my PhD dissertation proposal this Friday. Below is the
abstract of my proposal (including the time and room). I hope to see you there.

Nie
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

          Flexible Query Processing in Data Integration
                          Zaiqing Nie

               GWC487, 10 AM, Friday, March 22, 2002
                         PhD Committee
                 Dr. Subbarao Kambhampati (Chair)
                      Dr. K. Selcuk Candan
                         Dr. Huan Liu
                      Dr. Louiqa Raschid
                      Dr. Susan D. Urban

                            Abstract

Query optimization in the context of integrating heterogeneous data sources on
the Internet has received significant attention of late. Traditional query
optimization assumes that each relation is stored in a single primary
database. In data integration scenarios, by contrast, a relation is
effectively stored across multiple, potentially overlapping sources, and in
particular: 1) it is impractical to assume that autonomous sources will export
statistics to the mediator; 2) the sources may have a variety of access
limitations; 3) the latency of querying these sources can be very high;
4) most of these sources can handle many concurrent queries; and 5) users may
have multiple objectives in terms of what makes a source a good one.
Consequently, query optimization in data integration requires the ability to
gather source statistics and to jointly consider the coverage offered by the
various sources and the parallelism among them when selecting optimal plans.

In response, we propose a novel query optimization framework in which we adapt
data mining techniques to automatically gather relevant statistics about the
data sources, and use the gathered statistics to support multi-objective query
optimization. Specifically, we propose to use threshold-based variants of
association rule mining techniques to discover frequently asked query classes,
and to learn statistics only with respect to these classes. This keeps the
number of needed statistics low enough that the storage and learning costs
remain manageable. With the gathered statistics, we intend to develop
optimization techniques that jointly optimize coverage, execution cost, and
cost to first tuple. We search in the space of parallel plans that support
multiple source calls for each subgoal conjunct, and the refinement of partial
plans takes into account the potential parallelism between source calls. We
have implemented a prototype of our approach; the empirical evaluation
demonstrates that the plans generated by our approach are significantly
better than those of existing approaches, both in terms of planning cost and
in terms of plan execution cost. The empirical evaluation also shows the
efficiency of the learning algorithms and the effectiveness of the learned
statistics.
                                                                               
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
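The abstract does not spell out the mining algorithm, so the following is only
a minimal Python sketch of the general idea: a threshold-based, Apriori-style
search for frequent query classes, followed by learning coverage statistics for
those classes alone. It is not the proposal's method. It assumes, purely for
illustration, that each logged query is a set of (attribute, value) bindings,
that a query class is any subset of bindings, that a class's support is the
fraction of logged queries falling in it, and that a source's coverage for a
class is the share of the class's distinct answer tuples that the source
returned. The function names and the toy log are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_query_classes(queries, min_support):
    """Apriori-style, level-wise search for frequent query classes.

    Assumption for illustration: each query is a frozenset of
    (attribute, value) bindings, a query class is any set of bindings, and a
    query falls in a class when it contains all of the class's bindings.
    Returns {class: support} for classes with support >= min_support.
    """
    n = len(queries)
    frequent = {}
    # Level 1 candidates: every single binding seen in the log.
    candidates = {frozenset([b]) for q in queries for b in q}
    k = 1
    while candidates:
        counts = Counter(c for q in queries for c in candidates if c <= q)
        level = {c: counts[c] / n for c in candidates if counts[c] / n >= min_support}
        frequent.update(level)
        # Join frequent k-classes into (k+1)-candidates, pruning any candidate
        # that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def learn_coverage(log, classes):
    """Estimate per-source coverage only for the given (frequent) classes.

    Assumption for illustration: log entries are (bindings, answers), where
    answers maps a source name to the set of tuple ids it returned. Coverage of
    source S for class C is the number of tuples S returned divided by the
    number of distinct tuples, both summed over the logged queries in C.
    Overlap statistics between sources are not learned here.
    """
    stats = {}
    for c in classes:
        total = 0
        per_source = Counter()
        for bindings, answers in log:
            if c <= bindings:
                total += len(set().union(*answers.values()))
                for source, tuples in answers.items():
                    per_source[source] += len(tuples)
        stats[c] = {s: m / total for s, m in per_source.items()} if total else {}
    return stats


# Hypothetical toy query log: bindings plus the tuple ids each source returned.
log = [
    (frozenset({("conf", "SIGMOD"), ("year", "2001")}), {"S1": {1, 2}, "S2": {2, 3}}),
    (frozenset({("conf", "SIGMOD"), ("year", "2002")}), {"S1": {4}, "S2": {4, 5}}),
    (frozenset({("conf", "VLDB"), ("year", "2001")}), {"S1": {6}}),
]
freq = frequent_query_classes([bindings for bindings, _ in log], min_support=0.5)
coverage = learn_coverage(log, freq)
for cls, cov in coverage.items():
    print(sorted(cls), cov)
```

With min_support=0.5, only the single-binding classes conf=SIGMOD and
year=2001 pass the threshold (each covers two of the three logged queries),
so coverage is learned for just those two classes rather than for every
possible combination of bindings. That restriction is what keeps the number
of stored statistics small.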