
Clarification re: open and closed world assumptions on DB

To: cse494-f05@parichaalak.eas.asu.edu
Subject: Clarification re: open and closed world assumptions on DB
From: Subbarao Kambhampati <rao@asu.edu>
Date: Wed, 09 Nov 2005 18:37:38 -0700

There was a question today about what the problem is with negation and the open vs. closed world assumption (and I couldn't give a compelling example). Here is some wisdom.

---------
"What's with negation and OCW/CCW"

A database can make two guarantees about its data: (1) that all tuples in it are "correct" and (2) that its tables are complete (i.e., there are no tuples that belong to a relation that it is not storing).

If we know the database is complete, we can make the closed world assumption (literally, there are no more tuples belonging to that relation than are stored in the database).

Normally, a query processor can guarantee that any SQL query without negation that it processes on a database will return results that are "correct" modulo the correctness of the database, and complete modulo the completeness of the database.

However, when the query has negation, there is a cross connection between the correctness of the answer and the completeness of the database. To see this, consider the query "give me all the directors who *did not* make any movie in 1960". Suppose the database is correct but incomplete, and is missing a movie that was made in 1960. Just because of that incompleteness, you might wrongly return that movie's director as an answer to the query.
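Here is a small sketch of that director example in Python, using sets of tuples in place of a real database. The director names, movie titles, and the two helper functions are all made up for illustration; the only point is to contrast a query without negation against one with negation on a correct-but-incomplete copy of the data.

```python
# Hypothetical (director, movie, year) tuples; every name here is invented.
complete_db = {
    ("Director A", "Movie 1", 1960),
    ("Director B", "Movie 2", 1961),
    ("Director C", "Movie 3", 1960),
    ("Director C", "Movie 4", 1962),
}

# Correct but incomplete: every stored tuple is true, but one true tuple is missing.
incomplete_db = complete_db - {("Director C", "Movie 3", 1960)}


def directors_with_movie_in(db, year):
    """Query without negation: directors who made a movie in `year`."""
    return {director for director, _, y in db if y == year}


def directors_without_movie_in(db, year):
    """Query with negation: directors with no stored movie in `year`."""
    all_directors = {director for director, _, _ in db}
    return all_directors - directors_with_movie_in(db, year)


# Without negation, the incomplete database only loses answers; it never invents one.
positive_truth = directors_with_movie_in(complete_db, 1960)
positive_seen = directors_with_movie_in(incomplete_db, 1960)
print(positive_seen <= positive_truth)  # True: still sound, just incomplete

# With negation, the missing tuple produces an answer that is simply wrong.
negative_truth = directors_without_movie_in(complete_db, 1960)
negative_seen = directors_without_movie_in(incomplete_db, 1960)
print(negative_seen - negative_truth)  # {'Director C'}: a wrong answer
```

Director C did make a movie in 1960, but since that tuple is not stored, the negated query reports Director C anyway.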

-->if the data source is complete ("closed") and you make the wrong assumption that it is incomplete ("open"), then you fail to guarantee that an actually complete answer is complete (and thus become inefficient, as you will continue accessing other sources in the hope of getting more results).

-->if the data source is incomplete ("open") and you make the wrong assumption that it is complete ("closed"), then you lose *soundness* and thus can return wrong tuples!

So, the moral of the story is this. If you are not sure whether the data source is "complete" or not, you are better off erring on the side of "incomplete" (i.e., make the open world assumption). This way, you can at least avoid giving *wrong* answers (but will lose completeness). A better idea is to try to characterize islands of completeness in the databases (which is what local closed world assumptions do).
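To make the "islands of completeness" idea a bit more concrete, here is a rough Python sketch. It assumes, purely for illustration, that a source can declare the set of years for which its movie table is known to be complete; real local closed world statements are usually richer than this. The data, the `complete_years` parameter, and the function name are all hypothetical.

```python
# Hypothetical (director, movie, year) tuples; the 1960 movie by Director C is missing.
movies = {
    ("Director A", "Movie 1", 1960),
    ("Director B", "Movie 2", 1961),
    ("Director C", "Movie 4", 1962),
}


def certain_directors_without_movie_in(db, year, complete_years):
    """Return only directors we can safely say made no movie in `year`.

    `complete_years` is an assumed, simplified local closed world statement:
    the years for which the source is known to store every movie.
    """
    if year not in complete_years:
        # Open world for this year: absence of a tuple proves nothing,
        # so return no answers rather than risk wrong ones.
        return set()
    all_directors = {director for director, _, _ in db}
    active = {director for director, _, y in db if y == year}
    return all_directors - active


# The source only vouches for 1961, so the 1960 query stays silent (sound, incomplete).
print(certain_directors_without_movie_in(movies, 1960, complete_years={1961}))  # set()

# Inside the island of completeness, the negated query can be answered.
print(sorted(certain_directors_without_movie_in(movies, 1961, complete_years={1961})))
# ['Director A', 'Director C']
```

The answer for 1960 is empty rather than wrong, which is the open world trade-off described above; only the year covered by the completeness statement gets a real answer.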

-------

Hope this helps.

Rao
