Reminder on Today's AIDB seminar [Friday, 3-4pm, ERC 593]
Folks:
This is a reminder that the first of the AIDB seminars for this spring
will be today at 3pm in ERC 593 (as per Chris Mayer's mail last
week). Zaiqing Nie will talk about using data mining techniques to
gather source coverage and source overlap statistics for data
integration scenarios.
Here is the abstract.
Rao
[Feb 15, 2002]
MINING SOURCE COVERAGE STATISTICS FOR DATA INTEGRATION
Recent work in data integration has shown the importance of
statistical information about the coverage and overlap of sources for
efficient query processing. Despite this recognition, there are no
effective approaches for learning the needed statistics. The key
challenge in learning such statistics is keeping the number of needed
statistics low enough that the storage and learning costs remain
manageable. Naive approaches can become infeasible very quickly. In
this paper we present a set of connected techniques that estimate the
coverage and overlap statistics while keeping the needed statistics
tightly under control. Our approach uses a hierarchical classification
of the queries, and threshold-based variants of familiar data mining
techniques to dynamically decide the level of resolution at which to
learn the statistics. We describe the details of our method, and
present experimental results demonstrating the efficiency of the
learning algorithms and the effectiveness of the learned statistics.