
Clarification re: open and closed world assumptions on DB



There was a question today about why negation is a problem under the open vs. closed world assumptions (and I couldn't
give a compelling example at the time). Here is some wisdom.


---------
"What's with negation and OWA/CWA"

A database can make two guarantees about its data:
(1) that all tuples in it are "correct" and
(2) that its tables are complete (i.e., there are no tuples that belong to a relation that it is not storing).


If we know the database is complete, we can make the closed world assumption (literally, there are no more tuples belonging to a relation than are stored in the database).

Normally, a query processor can guarantee that any SQL queries without negation it processes on a database will return "correct" results modulo the correctness of the database, and complete results modulo the completeness of the database.
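
As a rough illustration of the negation-free case, here is a minimal Python sketch. The tuples, the names D1..D3 and M1..M4, and the helper directors_in are all made up for this example; WORLD stands for the "true" relation and STORED for a correct but incomplete database.

```python
# Hypothetical (director, movie, year) tuples; all names are made up.
WORLD = {
    ("D1", "M1", 1960),
    ("D2", "M2", 1960),
    ("D3", "M3", 1960),
    ("D3", "M4", 1962),
}
# The stored database is correct (a subset of WORLD) but incomplete.
STORED = WORLD - {("D3", "M3", 1960)}


def directors_in(db, year):
    """Directors with at least one movie in the given year (no negation)."""
    return {director for director, _movie, y in db if y == year}


answer = directors_in(STORED, 1960)
truth = directors_in(WORLD, 1960)

print(sorted(answer))   # ['D1', 'D2']
print(answer <= truth)  # True: every returned director is a correct answer
```

The answer misses D3 (it is incomplete), but nothing in it is wrong.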

However, when the query has negation, the correctness of the answer becomes tied to the completeness of the database. To see this, consider the query "give me all the directors who *did not* make any movie in 1960". Suppose
the database is correct but incomplete, and it is missing a movie that was made in 1960. Purely because of that incompleteness, you might return the director of that movie as an answer to the query, which is wrong.
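
The director query above can be sketched in Python as follows. Again, the data and the helper names (directors_in, directors_not_in) are hypothetical, chosen only to make the failure visible; the negation is evaluated by treating the stored tuples as if they were everything.

```python
# Hypothetical (director, movie, year) tuples; all names are made up.
WORLD = {
    ("D1", "M1", 1960),
    ("D2", "M2", 1960),
    ("D3", "M3", 1960),
    ("D3", "M4", 1962),
}
# Correct but incomplete: the 1960 movie by D3 was never stored.
STORED = WORLD - {("D3", "M3", 1960)}


def directors_in(db, year):
    """Directors with at least one movie in the given year."""
    return {director for director, _movie, y in db if y == year}


def directors_not_in(db, year):
    """Directors known to db with no movie in the given year.

    The set difference silently assumes db is complete (closed world).
    """
    all_directors = {director for director, _movie, _y in db}
    return all_directors - directors_in(db, year)


answer = directors_not_in(STORED, 1960)
truth = directors_not_in(WORLD, 1960)

print(sorted(answer))  # ['D3']
print(sorted(truth))   # []
```

Every stored tuple is correct, yet D3 is returned even though D3 did make a movie in 1960. The answer is unsound, not merely incomplete.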



-->if the data source is complete ("closed") and you wrongly assume that it is incomplete ("open"), then you fail to recognize that an answer that is actually complete is complete (and thus become inefficient, as you will keep accessing other sources in the hope of getting more results).


-->if the data source is incomplete ("open") and you make the wrong assumption that it is complete ("closed"), then you lose *soundness* and thus can return wrong tuples!

So, the moral of the story is this. If you are not sure whether the data source is "complete" or not, you are better off erring on the side of
"incomplete" (i.e., make the open world assumption). This way, you can at least avoid giving *wrong* answers (but will lose completeness). A better idea is to try to characterize islands of completeness in the database (which is what local closed world assumptions do).
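
One way to picture "islands of completeness" is the sketch below. It is only an illustration under assumptions: the set LCW of (director, year) pairs for which the source is declared complete, the helper made_no_movie, and all the data are hypothetical, and real local closed world statements are usually richer than a set of pairs.

```python
# Hypothetical (director, movie, year) tuples; all names are made up.
STORED = {
    ("D1", "M1", 1960),
    ("D2", "M2", 1960),
    ("D3", "M4", 1962),
}
# Assumed completeness declarations: for these (director, year) pairs the
# source is taken to store every movie that exists.
LCW = {("D1", 1962), ("D2", 1962)}


def made_no_movie(db, lcw, director, year):
    """True / False / None (unknown) for 'director made no movie in year'."""
    if any(d == director and y == year for d, _movie, y in db):
        return False   # a stored tuple contradicts the negation
    if (director, year) in lcw:
        return True    # absence is conclusive inside a complete island
    return None        # open world: absence proves nothing


print(made_no_movie(STORED, LCW, "D1", 1962))  # True
print(made_no_movie(STORED, LCW, "D3", 1962))  # False
print(made_no_movie(STORED, LCW, "D3", 1960))  # None
```

Negated answers are asserted only where completeness is known; everywhere else the query processor says "unknown" instead of risking a wrong tuple.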


-------

Hope this helps.

Rao