Clarification re: open and closed world assumptions on DB
There was a question today about what the problem is with negation and the
open vs. closed world assumptions (and I couldn't give a compelling
example). Here is some wisdom.
---------
"What's with negation and OCW/CCW"
A database can make two guarantees about its data:
(1) that all tuples in it are "correct" and
(2) that its tables are complete (i.e., there are no tuples
that belong to that relation that it is not storing).
If we know the database is complete, we can make the closed world assumption
(literally, there are no more tuples belonging to that relation than are
stored in the database).
Normally, a query processor can guarantee that any SQL query without
negation that it runs on a database will return "correct" results modulo
the correctness of the database, and complete results modulo the
completeness of the database.
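To make this guarantee concrete, here is a minimal Python sketch. It assumes a hypothetical Movie(director, title, year) relation stored as a set of tuples; the schema, the sample data, and the name directors_in are illustrative only. Removing a tuple from a correct database can only shrink the answer to a negation-free query, so whatever comes back is still correct (possibly incomplete).

```python
# Illustrative Movie(director, title, year) tuples; the schema is an assumption.
COMPLETE_DB = {
    ("Hitchcock", "Psycho", 1960),
    ("Hitchcock", "The Birds", 1963),
    ("Wilder", "The Apartment", 1960),
    ("Kurosawa", "Yojimbo", 1961),
}
# A correct but incomplete copy: every stored tuple is true, one is missing.
INCOMPLETE_DB = COMPLETE_DB - {("Hitchcock", "Psycho", 1960)}


def directors_in(movies, year):
    """Negation-free query: directors with at least one stored movie in `year`."""
    return {director for (director, _title, y) in movies if y == year}


complete_answer = directors_in(COMPLETE_DB, 1960)   # {"Hitchcock", "Wilder"}
partial_answer = directors_in(INCOMPLETE_DB, 1960)  # {"Wilder"}

# Monotonicity: fewer stored tuples can only shrink the answer, so every
# returned director is still correct (sound); some may simply be missing.
assert partial_answer <= complete_answer
```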
However, when the query has negation, the correctness of the answer
becomes tied to the completeness of the database. To see this, consider
the query "give me all the directors who *did not* make any movie in 1960".
Suppose the database is correct but incomplete, and so fails to store a
movie that was made in 1960. Because of that incompleteness alone, you
might return a director who actually did make a movie in 1960 as an answer
to the query.
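The director example can be sketched the same way. This is a hedged illustration using the same hypothetical Movie tuples; directors_not_in is an illustrative name, not part of any real system. Dropping Psycho (1960) from an otherwise correct database is enough to make Hitchcock show up as a wrong answer.

```python
# Same illustrative Movie(director, title, year) data as a hypothetical example.
COMPLETE_DB = {
    ("Hitchcock", "Psycho", 1960),
    ("Hitchcock", "The Birds", 1963),
    ("Wilder", "The Apartment", 1960),
    ("Kurosawa", "Yojimbo", 1961),
}
# Correct but incomplete: Psycho (1960) was never stored.
INCOMPLETE_DB = COMPLETE_DB - {("Hitchcock", "Psycho", 1960)}


def directors_not_in(movies, year):
    """Query with negation: stored directors with *no* stored movie in `year`."""
    all_directors = {d for (d, _title, _y) in movies}
    made_movie_that_year = {d for (d, _title, y) in movies if y == year}
    return all_directors - made_movie_that_year


true_answer = directors_not_in(COMPLETE_DB, 1960)  # {"Kurosawa"}
db_answer = directors_not_in(INCOMPLETE_DB, 1960)  # {"Hitchcock", "Kurosawa"}

# Hitchcock is returned only because Psycho was missing, so the answer on
# the incomplete database is no longer a subset of the true answer.
wrong = db_answer - true_answer                    # {"Hitchcock"}
```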
-->if the data source is complete ("closed") and you make the wrong
assumption that it is incomplete ("open"), then you cannot guarantee that
an answer which is actually complete is complete (and thus become
inefficient, as you will keep accessing other sources in the hope of
getting more results).
-->if the data source is incomplete ("open") and you make the wrong
assumption that it is complete ("closed"), then you lose *soundness* and
thus can return wrong tuples!
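The two cases can be compared side by side in a small sketch. It assumes the same hypothetical data and an illustrative function answer_not_in. For the open world case it returns only the answers that are certain; for a pure negation query with no other completeness information, that is the empty set, since any director might have an unstored 1960 movie. This is an extreme but sound simplification.

```python
# Illustrative Movie(director, title, year) data (hypothetical example).
COMPLETE_DB = {
    ("Hitchcock", "Psycho", 1960),
    ("Hitchcock", "The Birds", 1963),
    ("Wilder", "The Apartment", 1960),
    ("Kurosawa", "Yojimbo", 1961),
}
INCOMPLETE_DB = COMPLETE_DB - {("Hitchcock", "Psycho", 1960)}


def answer_not_in(movies, year, assume_closed):
    """Answer 'directors with no movie in `year`' under a chosen assumption.

    Closed world: anything not stored is false, so take the set difference.
    Open world: any director may have an unstored movie in `year`, so with
    no other completeness information no director is certainly an answer.
    """
    if not assume_closed:
        return set()
    all_directors = {d for (d, _title, _y) in movies}
    in_year = {d for (d, _title, y) in movies if y == year}
    return all_directors - in_year


# The complete source really is complete, so the closed reading is the truth.
truth = answer_not_in(COMPLETE_DB, 1960, assume_closed=True)    # {"Kurosawa"}

# Case 1: the source is complete, but we wrongly assume it is open.
case1 = answer_not_in(COMPLETE_DB, 1960, assume_closed=False)   # set()
# Case 2: the source is incomplete, but we wrongly assume it is closed.
case2 = answer_not_in(INCOMPLETE_DB, 1960, assume_closed=True)  # {"Hitchcock", "Kurosawa"}

assert case1 <= truth      # sound but incomplete: we would keep looking
assert not case2 <= truth  # unsound: Hitchcock is a wrong tuple
```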
So, the moral of the story is this. If you are not sure whether the data
source is "complete" or not, you are better off erring on the side of
"incomplete" (i.e., making the open world assumption). This way, you can at
least avoid giving *wrong* answers (at the cost of completeness). A better
idea is to try to characterize islands of completeness in the databases
(which is what local closed world assumptions do).
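A rough sketch of the local closed world idea follows. It assumes completeness is recorded per year in a hypothetical COMPLETE_YEARS set; real LCW formalisms are richer than this, and all names here are illustrative. The query is answered under the closed world assumption only when its year is declared complete, and under the open world assumption otherwise.

```python
# Illustrative Movie(director, title, year) data (hypothetical example).
COMPLETE_DB = {
    ("Hitchcock", "Psycho", 1960),
    ("Hitchcock", "The Birds", 1963),
    ("Wilder", "The Apartment", 1960),
    ("Kurosawa", "Yojimbo", 1961),
}
# Correct but incomplete copy: The Birds (1963) is missing, 1960 is intact.
DB = COMPLETE_DB - {("Hitchcock", "The Birds", 1963)}

# Hypothetical LCW statement: the Movie relation is complete for these years.
COMPLETE_YEARS = {1960}


def closed_world_not_in(movies, year):
    """Stored directors with no stored movie in `year` (closed world reading)."""
    all_directors = {d for (d, _title, _y) in movies}
    return all_directors - {d for (d, _title, y) in movies if y == year}


def answer_not_in_lcw(movies, year, complete_years):
    """Use the closed world only where completeness is declared; else stay open."""
    if year in complete_years:
        return closed_world_not_in(movies, year)
    return set()  # open world: no director is certainly an answer


print(answer_not_in_lcw(DB, 1960, COMPLETE_YEARS))  # {'Kurosawa'}
print(answer_not_in_lcw(DB, 1963, COMPLETE_YEARS))  # set()
print(closed_world_not_in(DB, 1963))  # includes 'Hitchcock', a wrong answer
```

Here the database is missing The Birds (1963) but holds every 1960 movie, so the 1960 answer is sound, while the 1963 question falls back to the open world and returns nothing rather than wrongly returning Hitchcock.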
-------
Hope this helps.
Rao