Relational Databases Cheatsheet

1. Relation vs Table

INFORMAL TERMSFORMAL TERMS
TableRelation
ColumnAttribute
Data typeDomain
RowTuple
Table definitionRelation schema

2. Keys

  • A subset of the attributes of a relation schema R is a superkey if it uniquely identifies any tuple in r(R).
  • A superkey K is called a candidate key if no proper subset of K is a superkey. That is, if you take any of the attributes out of K , then it is not enough to uniquely identify tuples.
  • The primary key is chosen from the candidate keys and the primary key is one of the candidate keys.

3. Constrains

  • Domain constraints: every value in a tuple must be from the domain of its attribute.
  • Key constraints: a bunch of keys (superkey, candidate key and primary key).
  • Entity integrity constraints: no primary key value can be NULL.
  • Referential integrity constraints: a foreign key in one table must have a matching value in the primary key of another table.

4. SQL

4. Database Design – Four Phases

  1. Requirements Collection and Analysis
  2. Conceptual Design
    Entity-Relationship Model
  3. Logical Design
    From Entity-Relationship Model to Relation Schemas
  4. Physical Design

5. Entity-Relationship (ER) Model

5. Functional Dependency

Let R be a relation schema.

  • A FD on R is an expression X Y with attribute sets X,Y R.
  • A relation r(R) satisfies X Y on R if, for any two tuples
    t1,t2 ∈ r(R), whenever the tuples t1 and t2 coincide on values of X, they also coincide on values of Y .

t1[X] = t2[X]

t1[Y] = t2[Y]

6. Implied Functional Dependencies

7. Minimal Cover

8. Finding Keys

9. Normalisation

9.1 BCNF

A relation schema R is in BCNF if whenever a non-trivial FD X A holds in R, then X is a superkey. When a relation schema is in BCNF, all data redundancy based on functional dependency are removed.

Algorithm for a BCNF-decomposition
Input: a relation schema R′ and a set Σ of FDs on R′.
Output: a set S of relation schemas in BCNF, each having a set of FDs

  • Start with S = {R′};
  • Do the following for each R ∈ S iteratively until no changes on S:
    • Find a (non-trivial) FD X Y on R that violates BCNF, if any;
    • Replace R in S by two relation schemas XY and (R Y ) and project the FDs to these two relation schemas.

9.1.2 3NF

A relation schema R is in 3NF if whenever a non-trivial FD X → A holds in R, then X is a superkey or A is a prime attribute.

Algorithm for a dependency-preserving and lossless 3NF-decomposition

Input: a relation schema R and a set Σ of FDs on R.
Output: a set S of relation schemas in 3NF, each having a set of FDs

  • Compute a minimal cover Σ′ for Σ and start with S = φ
  • Group FDs in Σ′ by their left-hand-side attribue sets
  • For each distinct left-hand-side Xi of FDs in Σ′ that includes Xi A1,Xi A2,…,Xi Ak:
    • Add Ri =Xi ∪{A1}∪{A2}···∪{Ak} to S
  • Remove all redundant ones from S (i.e., remove Ri if Ri Rj )
  • If S does not contain a superkey of R, add a key of R as R0 into S.
  • Project the FDs in Σ′ onto each relation schema in S

10. Relational Algebra

11. Database Security

Confidentiality

– E.g. enforced by access control mechanisms Integrity

– E.g. enforced by access control mechanisms and integrity constraints specified on schemas

Availability

– E.g. enforced by recovery and concurrency control mechanisms

Some further services

Encryption: to protect data when being transmitted across systems and when being stored on secondary storage

Query authentication: to ensure a query result is correct by using signature mechanisms and data structures

12. Access Control Mechanisms

  1. Discretionary access control (DAC)
    • Grant privileges
    • Revoke privileges
    • Delegate privileges
  2. Mandatory access control (MAC)
    • It is based the Bell-LaPadula model (originally developed for U.S. Department of Defense multilevel security policy).
      • Subjects (e.g. users) are assigned security clearances;
      • Objects (e.g. rows, tables, views) are assigned security classes.
        • TS S C U (TS: Top secret, S: Secret, C: Confidential, U: Unclassified)
      • Two MAC rules are enforced by the model:
        • Subject X can read object Y only if clearance(X ) ≥ class(Y ). ,→ Read down
        • Subject X can write object Y only if clearance(X ) ≤ class(Y ). ,→ Write up
  3. Role-based access control (RBAC)

13. Transactions

A transaction is a sequence of database operations grouped together for execution as a logic unit in a DBMS.

14. Locking

Two main types of locks:

  • Read lock for reading an object by a transaction
  • Write lock for writing an object by a transaction

Lock compatibility:

Lock typeread-lockwrite-lock
read-lockYesNo
write-lockNoNo

14.1 Two-Phase Locking (2PL) Protocol

Locks are handled in two phases:

  • Expanding: locks are acquired and no locks are released.
  • Shrinking: locks are released and no locks are acquired.

Cons:

  • 2PL may be subject to deadlocks, i.e., the mutual blocking of two or more transactions.
  • 2PL can radically limit interleaving among transactions in some cases …

Pros:

2PL makes interleaving safe, i.e., guarantee the serializability property for transactions.

  • Serializability means that a resulting database state is equal to a database state of running transactions serially.
  • Serializability is the major correctness criterion for concurrent transactions.

15. NoSQL

Main Categories of NoSQL Solutions

  1. Key-value data stores
    – Dynamo: Amazon’s Highly Available Key-value Store, DeCandia & et al, SOSP 2007.
  2. Column-oriented data stores
    – Bigtable: a distributed storage system for structured data, Chang & et al, OSDI 2006.
  3. Document-oriented data stores
    – MongoDB: http://mongodb.org, including source code and documentation.
  4. Graph databases
    – Neo4j: http://www.neo4j.org, including source code and documentation.

Other non-relational databases (not usually considered as NoSQL databases) include native XML databases, object databases, etc.

Leave a Comment

Your email address will not be published. Required fields are marked *

Index
Scroll to Top