Example: confidence

Translating SQL into the Relational Algebra

Translating SQL into the Relational AlgebraJan Van den BusscheStijn VansummerenRequired backgroundBefore reading these notes, please ensure that you are familiar with (1) therelational data model as defined in Section of Database ManagementSystems: The Complete Book (second edition) (hereafter abbreviated as TCB ); (2) the set-based Relational Algebra as defined in section ofTCB; its bag-based variant and extension as defined in sections and TCB; and (3) the SQL query language as defined in chapter 6 of these notes we will use the following example databaseschema about movies, as introduced in TCB Figure The attributesof the primary key are underlined.

Translating an arbitrary SQL query into a logical query plan (i.e., a rela-tional algebra expression) is a complex task. In these course notes we try to explain the most important elements of this translation by making the following simplifying assumptions: Since the latest version of SQL is a very large and complex language

Tags:

  Into, Tional, Real, Relational, Algebra, Translating, Relational algebra, Rela tional algebra

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Advertisement

Transcription of Translating SQL into the Relational Algebra

1 Translating SQL into the Relational AlgebraJan Van den BusscheStijn VansummerenRequired backgroundBefore reading these notes, please ensure that you are familiar with (1) therelational data model as defined in Section of Database ManagementSystems: The Complete Book (second edition) (hereafter abbreviated as TCB ); (2) the set-based Relational Algebra as defined in section ofTCB; its bag-based variant and extension as defined in sections and TCB; and (3) the SQL query language as defined in chapter 6 of these notes we will use the following example databaseschema about movies, as introduced in TCB Figure The attributesof the primary key are underlined.

2 Movie(title:string, year:int, length:int, genre:string, studioN-ame:string, producerC#:int) MovieStar(name:string, address:string, gender:char, birthdate:date) StarsIn(movieTitle:string, movieYear:string, starName:string) MovieExec(name:string, address:string, CERT#:int, netWorth:int) Studio(name:string, address:string, presC#:int)1 IntroductionTranslating an arbitrary SQL query into a logical query plan ( , a rela- tional Algebra expression) is a complex task. In these course notes we tryto explain the most important elements of this translation by making thefollowing simplifying assumptions: Since the latest version of SQL is a very large and complex language(including features like recursion, stored procedures, user defined func-tions.)

3 We will focus here on the so-called SQL-92 subset of thelanguage, which can be considered as the traditional heart of SQL1(comprising only the traditional select-from-where queries, aggrega-tion, etc). In addition, we will consider aset-basedsemantics of SQL. Recallthat in a set-based semantics, no duplicates occur in input relationsor query results. The real semantics of SQL, in contrast, a bag-based semantics duplicatesdooccur in relations and queryresults. Although the presence or absence of duplicates can often beignored (as we do in these notes), theycanlead to different results forqueries involving aggregation ( , when we want to sum the queryresults).

4 In practice, therefore, the translation of SQL into a logicalquery plan is even more involved than described here. It is neverthelessfounded on the same will use expressions in theextendedrelational Algebra (see in the book) interpreted oversetsas logical query exclude ambiguities, we will assume without loss of generalityin what follows that all occurrences of relation symbols in a SQL statementare assigned a distinct name through the alias mechanism of SQL. In otherwords: we hence assume implicitly that SQL statements in which a relationsymbol occurs multiple times, like, for example,SELECT * FROM R, Ris rewritten into a SQL statement of the formSELECT * FROM R, R R2in which each occurrence is given a distinct (alias) name.

5 Such rewritingis straightforward to do algorithmically, and can hence be done as a pre-processing stepbeforeexecution of the algorithm described Select-from-where statements without subqueriesConsider a generalSELECT-FROM-WHERE statement of the formSELECTS elect-listFROMR1, .. ,R2T2, ..WHEREW here-conditionWhen the statement does not use subqueries in its where-condition, we caneasily translate it into the Relational Algebra as follows: Select-list Where-condition(R1 T2(R2) ).Note that:21. An aliasR2T2in theFROM-clause corresponds to a renaming T2(R2).2. It is possible that there is noWHERE clause.

6 In that case, it is ofcourse unnecessary to include the selection in the Relational If we omit the projection ( ) we obtain the translation of the followingspecial case:SELECT *FROMR1, .. ,R2T2, ..WHEREW here-conditionExample the movieTitleFROM StarsIn, MovieStarWHERE starName = name AND birthdate = 1960 Its translation is as follows: movieTitle starName=name birthdate=1960(StarsIn MovieStar).3 Normalizing Where-subqueries into Exists andNot Exists formIn Section 2 we have considered the special case of Translating select-from-where statements in which subqueries do not occur.

7 In general, however,we also have to be able to translate statements in which such subqueries dooccur. Subqueries can occur in theWHERE clause through the operators=,<,>,<=,>=,<>; through the quantifiersANY, orALL; or through the operatorsEXISTSandINand their negationsNOT EXISTSandNOT IN. We can easilyrewrite all of these cases using onlyEXISTSandNOT EXISTS, however, asillustrated SQL-statementSELECT movieTitle FROM StarsInWHERE starName IN (SELECT nameFROM MovieStarWHERE birthdate = 1960)can be rewritten equivalently as3 SELECT movieTitle FROM StarsInWHERE EXISTS (SELECT nameFROM MovieStarWHERE birthdate = 1960 AND name = starName)Example SQL-statementSELECT name FROM MovieExecWHERE netWorth >= ALL (SELECT MovieExec E)can be rewritten equivalently asSELECT name FROM MovieExecWHERE NOT EXISTS(SELECT MovieExec EWHERE netWorth < )Example relationsR(A,B) andS(C).

8 ThenSELECT C FROM SWHERE C IN (SELECT SUM(B) FROM RGROUP BY A)can be rewritten asSELECT C FROM SWHERE EXISTS (SELECT SUM(B) FROM RGROUP BY AHAVING SUM(B) = C)Without loss of generality we will hence assume in what follows that allsubqueries in theWHERE conditions are of the formEXISTSorNOT Context relationsTo translate a query with subqueries into the Relational Algebra , it seems alogical strategy to work by recursion: first translate the subqueries and thencombine the translated results into a translation for the entire SQL state-ment. If the subqueries contain subqueries themselves, we again translatethe latter first continuing recursively until we reach a level that does notcontain subqueries that do not contain subqueries themselves, we could thinkthat we can simply apply the method from Section 2.

9 There is one compli-cation, however: the subquery can refer to attributes of relations appearingin theFROM list of one of the outer lying queries. This is known following query contains a subquery that refers to thestarNameattribute of the outer movieTitleFROM StarsInWHERE EXISTS (SELECT nameFROM MovieStarWHERE birthdate = 1960 AND name = starName)We call the outer relations from which a correlated subquery uses certainattributescontext relationsfor the subquery. Note that a subquery can havemultiple context relations. We call the attributes of the context relationstheparametersof the subquery.

10 Note that not all of the parameters mustactually occur in the Example 5,StarsInis hence a context relation for the sub-query. (In this example, it is also the only context relation.) The correspond-ing parameters are all attributes ofStarsIn, ,movieTitle,movieYear, select-from-where subqueriesTo translate aSELECT FROM WHERE statement that is used as a subquery we must make the fol-lowing modifications to the method from Section 2: We must add all context relations to the cartesian product of therelations in theFROM list; We must add all parameters as attributes to the projection.


Related search queries