
Reminder on Today's AIDB seminar [Friday, 3-4pm, ERC 593]




Folks:

This is a reminder that the first of the AIDB seminars for this spring
will be today--3pm in ERC 593 (as per Chris Mayer's mail last
week). Zaiqing Nie will talk about using data mining techniques to
gather source coverage and source overlap statistics for data
integration scenarios.


Here is the abstract.

Rao
[Feb 15, 2002]


MINING SOURCE COVERAGE STATISTICS FOR DATA INTEGRATION


Recent work in data integration has shown the importance of
statistical information about the coverage and overlap of sources for
efficient query processing. Despite this recognition, there are no
effective approaches for learning the needed statistics. The key
challenge in learning such statistics is keeping the number of needed
statistics low enough that the storage and learning costs stay
manageable. Naive approaches can become infeasible very quickly. In
this paper we present a set of connected techniques that estimate the
coverage and overlap statistics while keeping the needed statistics
tightly under control. Our approach uses a hierarchical classification
of the queries, and threshold-based variants of familiar data mining
techniques, to dynamically decide the level of resolution at which to
learn the statistics. We describe the details of our method and
present experimental results demonstrating the efficiency of the
learning algorithms and the effectiveness of the learned statistics.