DBMS

DBMS is Database Management Systems is subject related to storage and retrieval of data from physical media. Read the DBMS tutorial topics and try to do the assignments.

About DBMS Tutorial

In this tutorial, you will learn all the concepts of DBMS and do assignment problems and exercises that come with the DBMS tutorial. The solution of each assignment is provided with the assignments. However, we encourage you to try to solve the problems on your own and then compare the results.

This tutorial is for beginners as well as advanced DBMS learners. There is no prerequisite to learn from this DBMS tutorial, but you must install a DBMS software to perform the SQL labs to get hands-on experiences.

DBMS Tutorial Topics

The Three Schema Architechture of Relational database

The Relational Database Management Systems is the most popular database system. The overall structure of the Database System is given here. The relational database is at three level or three schemas.

Three Tier Architecture
Figure 1 – Three Tier Architecture

Internal level with internal schema

In this level, an internal schema is defined and it is more related to the physical storage and access path of the database,

Conceptual level with conceptual schema

This level describes the structure of the database and describes what entities, data types, constraints and user operations defined on the database. You can also call it in memory DBMS.

Data Definition Language is used for defining this schema. Here are some examples of DDL commands.

  • ALTER
  • DROP
  • CREATE

External or View level with number of  External Schema or Views

At this level, you will get only part of the database as users and the rest of the information is hidden. This is implemented using the representational data model which is easily understood by the users.

Database Terminology

There are some important terminologies about a database that you should know. Data in a database at a particular instance is called the Database State or Snapshot. Whenever, we insert, delete and update any data, we change the database state.

The description of database and constraints are stored in a file called Metadata. You can also call it data about data. For example, if a database field “Name” store names of people, then Metadata will describe the data type for this field as “Char”, means this field will only contain strings of characters and so on.

An Access path is a structure, that will make access to data or search for data more efficient. An Index is a file that will provide direct and fast access to data record using some keywords or search terms. The Index is similar to an index of a book which contains some keywords for reference.

Bibliography

Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 27-Jan-2010. Database System Concepts. McGraw-Hill Education.

Ramakrishnan, Johannes Gehrke, and Raghu. 1996. Database Management Systems. McGraw Hill Education; Third edition (1 July 2014).

How to Add a Constraint and Remove a Constraint From a Relation?

Constraints are like rules in relation so that you cannot violate them. There are plenty of constraint types that we use. In this article, we will add a check constraint and delete it from a relation.

The general process to add or remove a constraint from a Relation is as follows.

STEP 1:

In this step, you create EMPLOYEE relation and put a check constraint on Gender.

CREATE TABLE EMPLOYEE (EMPNO NUMBER(5) PRIMARY KEY, ENAME VARCHAR2(20), GENDER CHAR(2), SALARY NUMBER(7,2));
Create Table EMPLOYEES - DESC EMPLOYEES
Figure 1 – Create Table EMPLOYEES – DESC EMPLOYEES

STEP 2:

Here we add the constraint to the table .

ALTER TABLE EMPLOYEE ADD CONSTRAINT CHK CHECK (GENDER IN( 'M','F'));
Figure 2 - Add Constraint EMPLOYEES table
Figure 2 – Add Constraint EMPLOYEES table

STEP 3:

Now, you are going to delete the constraint, but before you drop the constraint, let us show you, how to get the constraint name first.

It is a necessary step if you are not the DBA who created the constraint on the relationship and you cannot delete the constraint unless you have appropriate permission.

SELECT  CONSTRAINT_NAME FROM USER_CONSTRAINTS WHERE TABLE_NAME = 'you table name';

In the above command, you only need to change the “Your Table Name’ with a table name that exists. In this example, EMPLOYEES.

Constraint Name
Figure 3 – Constraint Name

STEP4:

Now that I have the name of the constraint , We can delete the constraint using following command.

ALTER TABLE EMPLOYEE DROP CONSTRAINT CHK;
Figure 4 - Drop Constraint
Figure 4 – Drop Constraint

Summary

Following the above four steps can help you to identify and remove the constraints set on your database tables. Note that these commands are performed on Oracle 10g, therefore, you must be careful while running the same on other database systems. The command and procedure may be different.

Understanding GROUP BY and HAVING Clause

We use SELECT-FROM-WHERE kind of statement to run queries that get us specific rows from a relation. In the select statement, WHERE is the condition that returns specific rows from a relation.

What if we want to group the information? The GROUP BY and HAVING clause helps you to group the resultant rows by a specific column. GROUP BY and HAVING clause is used with aggregate functions like Count, Max, Min, Sum, etc;

This is easy to understand with an example.

Step 1:

Create a relation for students – Student ( rollno, name, age);

CREATE TABLE STUDENT (ROLLNO NUMBER(5) PRIMARY KEY,NAME VARCHAR2(30), AGE NUMBER(3), GRADE CHAR(2));
Figure 1: Create Table STUDENT
Figure 1: Create Table STUDENT

STEP 2:

Enter details of at least 10 students in to the STUDENT relation.

INSERT INTO STUDENT VALUES(50001,'DHANUSH',39,'A'); 
INSERT INTO STUDENT VALUES(50002,'RAJNI',29,'D'); 
INSERT INTO STUDENT VALUES(50003,'VIKRAM',32,'S'); 
INSERT INTO STUDENT VALUES(50004,'VADIVEL',25,'D'); 
INSERT INTO STUDENT VALUES(50005,'PRASAD',37,'S'); 
INSERT INTO STUDENT VALUES(50006,'SHANKAR',21,'A'); 
INSERT INTO STUDENT VALUES(50007,'RAM',33,'A'); 
INSERT INTO STUDENT VALUES(50008,'KARTICK',22,'C'); 
INSERT INTO STUDENT VALUES(50009,'SHAKTI',35,'B'); 
INSERT INTO STUDENT VALUES(50010,'SIMBU',20,'B');

To check the entered value in the table, you can run following query against the STUDENT relation.

SELECT * FROM STUDENT.
Figure 2: Instance of relation - STUDENT
Figure 2: Instance of relation – STUDENT

 STEP 3:

You can run two simple queries to understand the difference between a query without group by and a query with GROUP BY clause. First, you want a student with minimum age from the STUDENT relation using an aggregate function MIN().

SELECT MIN(S.age) FROM STUDENT S;

You can add WHERE clause, but we want query without any conditions. It will produce the following results.

Figure 3: Student with Minimum Age
Figure 3: Student with Minimum Age

You can see that there is only one person in an entire student relation whose minimum age is returned.Suppose you want to write a query to get the minimum age of student by student Grade. It means this statement “What is the minimum age of students who got A’s ?”  or “What is the minimum age of students who got B’s?” and so on.

Let’s run the following query that will get you the minimum age grouped by student GRADE.

SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE;
Figure 4: Student over minimum age grouped by Grade
Figure 4: Student over minimum age grouped by Grade

What is the Difference ?

In the first query, we treated the entire STUDENT relation as one group and that’s why we only have a single value for the whole relation.In the second query, we grouped STUDENT relation by student grade and for each of these grade got minimum age.

The HAVING clause

The HAVING clause is qualification for the Group By clause. It means you are putting more conditions on the resulting rows from GROUP BY clause.

SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE;

You know that it will return a list of minimum age by student grade, but suppose you want to see only minimum age that are greater than or equal to 30.

SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE HAVING MIN(S.age) >= 30;

The result is only one row whose minimum age is greater than or equal to 30 but grouped by student grade.

Figure 5 - Students Group by Age Having Age less than or equal to 30
Figure 5 – Students Group by Age Having Age less than or equal to 30

Group by cannot have duplicate values and the general format of the query is given below.

SELECT [DISTINCT] select list 
FROM from-list 
WHERE qualification 
GROUP BY group-list 
HAVING group-qualification

Source: Database Management Systems – Raghu Ramakrishan

You want to understand the GROUP BY and HAVING clause, then create similar examples and run them against the relations.

References

Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 27-Jan-2010. Database System Concepts. McGraw-Hill Education.

Ramakrishnan, Johannes Gehrke, and Raghu. 1996. Database Management Systems. McGraw Hill Education; Third edition (1 July 2014).

JOIN Concepts in Relational Database Management System

JOIN put condition (selections and projections) on a CROSS-PRODUCT from two or more tables and the result is a smaller relation than the CROSS-PRODUCT

In DBMS, JOIN is an important operation in relational algebra and it extracts useful information from joining two or more relations. In this lab, you will create two tables TEACHER and STUDENT. A STUDENT studies under one or more TEACHER for different subjects, so student and teacher have a One-to-Many relationship.

Step 1: Create Table TEACHER and STUDENT.

Before we dive into the JOIN concept let’s create relations – TEACHER and STUDENT. You have to use the following SQL commands to do that. In the relations, TEACHER, and STUDENT, the TID and SID are primary keys respectively.

CREATE TABLE T (TID NUMBER(5) PRIMARY KEY , CNAME VARCHAR2(30));
Figure 1: Create Teacher Relation
Figure 1: Create Teacher Relation
CREATE TABLE STUDENT ( SID NUMBER(3) PRIMARY KEY, SNAME VARCHAR2(30), TNAME VARCHAR2(25), GRADE CHAR);
Figure 2: Create Relation STUDENT
Figure 2: Create Relation STUDENT

Step 2: Insert values into both the tables as follows

The next step is to insert values into both relations. First, you need to enter values for the TEACHER relation using following command,

INSERT INTO TEACHER VALUES(10001,'RAMESH'); 
INSERT INTO TEACHER VALUES(10002,'KIRAN'); 
INSERT INTO TEACHER VALUES(10003,'JOHN'); 
INSERT INTO TEACHER VALUES(10004,'PETER'); 
INSERT INTO TEACHER VALUES(10005,'FRODO');

An instance of TEACHER relation is given below.

Figure 3: Instance of Relation TEACHER
Figure 3: Instance of Relation TEACHER

Insert values into the STUDENT table as follows.

INSERT INTO STUDENT VALUES(501,'ANIL KUMAR' ,'RAMESH','S'); 
INSERT INTO STUDENT VALUES(502,'RAJESH KAPOOR','KIRAN','D'); 
INSERT INTO STUDENT VALUES(503,'SUBBARAJ','JOHN','A'); 
INSERT INTO STUDENT VALUES(504,'NAGESH ','PETER','C'); 
INSERT INTO STUDENT VALUES(505,'RAM PRASAD','FRODO','A'); 
INSERT INTO STUDENT VALUES(506, 'JASSE'); 
INSERT INTO STUDENT VALUES(507,'MADHU');

An instance of relation STUDENT is shown below, you must get similar results. For each student there is a teacher associated who taught a course.

The course relation is not required at this moment because these two relations are sufficient to understand the JOIN concepts in DBMS.

Figure 4: An Instance of STUDENT relation
Figure 4: An Instance of STUDENT relation

Cross-Product (Cartesian Product)

A Cross-Product is a product of two or more relations and is denoted by T ⨯ S. Suppose the relation Teacher has 5 tuples and the relation Student has 4 tuples, then the Cross-Product will have 5 x 4 = 20 tuples. Cross-Product is a JOIN without any conditions.

The Cross-Product operation results in a very large relation with multiple tuples. The JOIN operation accepts some conditions and applies them to the Cross-Product and result is a relational instance you want.

Depending on the condition applied you get different types of JOIN.

Note: A Cross-Product is also called NATURAL JOIN.

Equi-Join

SELECT * FROM TEACHER T, STUDENT D WHERE T.TNAME = D.TNAME;

This will return all the rows that are common in both teacher and student table.
i.e. t.tname = d.tname.

Figure 5: Equi-Join on relation TEACHER and STUDENT
Figure 5: Equi-Join on relation TEACHER and STUDENT

Left-Outer Join

In this join type, you get all rows that are common to both tables, (EQUI-JOIN) + remaining row from the left side table.

for example

SELECT T.TID,T.TNAME,S.SNAME, S.GRADE FROM TEACHER LEFT OUTER JOIN STUDENT ON T.TNAME = S.TNAME;

In the above command the TEACHER is left table and student is right table.

Figure 6: Left-Outer JOIN
Figure 6: Left-Outer JOIN

Right-Outer Join

In this type of JOIN, you get all row that are common to both table (EQUI-JOIN) + REMAINING ROWS from right side table.

SELECT S.SID, S.SNAME, T.TNAME FROM TEACHER T RIGHT OUTER JOIN STUDENT S ON T.TNAME = S.TNAME;

The student table is right hand side table, and you get most of your column from right hand side table. It means that the query returns rows that are in both table + rows in student table only.

Figure 7: Right - Outer JOIN
Figure 7: Right – Outer JOIN

References

Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 27-Jan-2010. Database System Concepts. McGraw-Hill Education.

Ramakrishnan, Johannes Gehrke and Raghu. 1996. Database Management Systems. McGraw Hill Education; Third edition (1 July 2014).

Wikipedia. n.d. Data definition language. Accessed March 14, 2018. https://en.wikipedia.org/wiki/Data_definition_language.

Converting an E-R Model into Relational Model in DBMS

In this lesson you will learn to convert er model into relational model. First step in designing a database is to create an entity-relationship model. Then the entity-relationship model is converted into a relational model.

The relational model is nothing but a group of tables or relations that create for your database.

Read: DBMS Basics
Read: Basic DML commands
Read: Basic DDL commands

This article you will learn to convert an entity-relationship model into a relational model using an example. You will understand the process of modeling the real-world problem into an entity-relationship diagram and then convert that entity-relationship diagram into a relational model.

Problem Definition

In this example problem, you will create a database for an organization with many departments. Each department has employees and employees have dependents. To create a database for the company, read the description and identify all the entities from the description of the company.

  • The company organized into departments and departments have employees working in it.
  • Attributes of Department are dno, dname. Attributes of Employee include eno, name, dob,gender, doj, designation, basic_pay, panno, skillsSkills are multi-valued attribute.
  • The Department has a manager managing it. There are also supervisors in Department who supervises a set of employees.
  • Each Department enrolls a number of projects. Attributes of Project are pcode, pname. A project is enrolled by a department. An employee can work on any number of projects on a given day. The Date of employee work in-time and out-time has to keep track.
  • The Company also maintains details of the dependents of each employee. Attributes of dependent include dname, dob, gender and relationship with the employee.

Model an entity-relationship diagram for the above scenario.

Solution – Converting ER model into Relational model

From the real-world description of the organization, we were able to identify the following entities. These entities will become the basis for an entity-relationship diagram or model.

Entity-Relationship Mode - er model into relational model
Figure 1 – Entity-Relationship Mode

Convert to Relational Model

The next step in the database design is to convert the ER Model into the Relational Model. In the Relational Model, we will define the schema for relations and their relationships. The attributes from the entity-relationship diagram will become fields for a relationship and one of them is a primary field or primary key. It is usually underlined in the entity-relationship diagram.

An entity in a relational model is a relation. For example, the entity Dependent is a relation in the relational model with all the attributes as fields – eno, dname, dob, gender, and relationship.

Here is the Relational Model for above diagram of the company database. This the result after converting ER model into relational model.

Relational Model of Database - er model into relational model
Figure 2 – Relational Model of Database

References

Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 27-Jan-2010. Database System Concepts. McGraw-Hill Education.

Ramakrishnan, Johannes Gehrke, and Raghu. 1996. Database Management Systems. McGraw Hill Education; Third edition (1 July 2014).

Understanding Basic SQL Queries in DBMS

Data Manipulation Language ( DML ) is the language used to maintain the database in a DMBS. The SQL (Structured Query Language) is the most popular language to manipulate databases.

DML Commands

The DML commands perform following tasks on a database.

  • INSERT
  • SELECT
  • DELETE
  • UPDATE

The SQL being a DML can do above task without any problem. Other than inserting, deleting or querying database, the SQL has advanced commands to change the schema of relations or create  new relations.

It is used for maintaining the database and they are called Data Definition Language (DDL) commands. You can create table, delete a table and do other operations to maintain the structure of your database.

INSERT command

The INSERT command helps insert data into the relation. For example, you can insert values in the ‘MANAGER’ relation as follows.

Figure 1: Inserting Data Into Manager Relation
Figure 1: Inserting Data into Manager Relation

The schema for Manager relation is given in the previous post about DDL commands.

SELECT command

The SELECT command helps query the database and retrieve the tuples from one or more relations.

There are two types of query – Selection(σ)  and Projection(π).

Selection(σ)

This unary operator selects all the tuple in a relation that meet specific condition using normal conditional operators and logical operator such as AND, OR and NOT.

For example

\sigma_{age > 30 ( Manager )}, will return all the tuples that have an age greater than 30.

Figure 2: Select Statement On Manager Relation
Figure 2: Select Statement On Manager Relation

Projection(π)

This unary operator selects columns, but you can add selection with those retrieved columns.

For example

\pi_{(age,salary(Manager)} will return two column – age and salary.

Figure 3: PROJECTION on MANAGER relation
Figure 3: PROJECTION on MANAGER relation

Rename operator(ρ)

\rho_{(Ename,Sal \rightarrow Name,Pay(Manager)}

The rename operator will rename the existing field of a relation to a different specified name. Alter is a DDL command that modifies the relation name.

Figure 4: RENAME operator on MANAGER relation
Figure 4: RENAME operator on MANAGER relation

UPDATE command

The UPDATE command modifies the value for one or more tuple in the relation. In the following example, we have an updated salary of a manager whose salary is less than 20000.

Figure 5: UPDATE command on MANAGER relation
Figure 5: UPDATE command on MANAGER relation

The result of the update is as follows.

Figure 6: Result of UPDATE statement on MANAGER relation
Figure 6: Result of UPDATE statement on MANAGER relation

DELETE command

Figure 7: DELETE command on MANAGER relation
Figure 7: DELETE command on MANAGER relation

The DELETE will delete the tuple from the relation. Here is an example of DELETE.

The following figure shows the relation Manager after the delete operation. Note that one entry is already deleted successfully.

Figure 8: Results of DELETE operation on MANAGER relation
Figure 8: Results of DELETE operation on MANAGER relation

References

Ramakrishnan, Johannes Gehrke, and Raghu. 1996. Database Management Systems. McGraw Hill Education; Third edition (1 July 2014).

Wikipedia. n.d. Data definition language. Accessed March 14, 2018. https://en.wikipedia.org/wiki/Data_definition_language.

Basic Data Definition Language(DDL) Commands

The Data Definition Language( DDL ) commands are used for creating new or modify existing schema of a relation. You can also set constraints such as key constraints, foreign key constraints, etc on a relation.

CREATE TABLE Command

This command creates a new relation or table. You must specify the fields and data type for each field while using this command.

Figure 1: Create Table MANAGER
Figure 1 – Create Table MANAGER

ALTER TABLE Command

This is most important of DDL commands because it allows us to modify the table anytime.
Here are some examples of ALTER TABLE command

Adding a primary key for the table. A primary key uniquely identifies a tuple in a relation.

Figure 2 - Alter Table MANAGER
Figure 2 – Alter Table MANAGER

Adding a foreign key for the table. A foreign key in a relation refer to primary key of another relation and cannot be null called the Referential Integrity.

Figure 3: Alter Table MANAGER for Foreign Key - Error
Figure 3: Alter Table MANAGER for Foreign Key – Error

In the above example, we receive an error because the attribute ‘DEPTNO’ is not set as primary key in relation to ‘EX_DEPT’.

After adding ‘DEPTNO’ as primary key we are able to set the foreign key for ‘MANAGER’ relation.

Figure 4: Alter Table MANAGER for Foreign Key - Success
Figure 4: Alter Table MANAGER for Foreign Key – Success

Adding a new column in a Table. In the following example, we are adding ‘PCODE‘ in the ‘MANAGER’ table.

Figure 5: Alter Table MANAGER for new column
Figure 5: Alter Table MANAGER for new column

Removing a column from a relation or table. In the following example, we are removing the ‘PCODE’ column from the ‘MANAGER’ table.

Figure 6: Removing a Column from the Table MANAGER
Figure 6: Removing a Column from the Table MANAGER

DESC <Table Name>

The ‘DESC’ command helps view the schema of the relation.

Figure 7: DESC <table_name> display the table schema
Figure 7: DESC <table_name> display the table schema

DROP TABLE <table name>

This command will drop the table permanently.

Figure 8: DROP < table_name> will delete the Table

References

Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 27-Jan-2010. Database System Concepts. McGraw-Hill Education.

Ramakrishnan, Johannes Gehrke, and Raghu. 1996. Database Management Systems. McGraw Hill Education; Third edition (1 July 2014).

Wikipedia. n.d. Data definition language. Accessed March 14, 2018. https://en.wikipedia.org/wiki/Data_definition_language.

Normalization in DBMS

Designing a database structure without proper planning would cause duplicate data and problem updating the database. This will result in an inconsistent database. The process of normalization is to make an efficient database design that allow faster and efficient access while maintaining the accuracy.

What is Normalization?

Normalization is a database design technique that cuts down data redundancy and eliminates undesirable characteristics like Insertion, Update, and Deletion Anomalies. Normalization rules divide larger tables into smaller tables and attach them using relationships. Normalization in SQL’s objective is to get rid of redundant (repetitive) data and ensure data is stored logically.

Normalization is the process of reducing redundancy from a relation or set of relations. Redundancy in relation may cause insertion, deletion, and update irregularities. So, it helps to minimize the extra in relations Normal forms are used to eliminate or cut down excess in database tables.

Normalization is used for mainly two purposes,

•       Eliminating redundant(useless) data.

•       Verifying data dependencies make sense that is data is logically stored.

What is a KEY in SQL?

KEY in SQL is a value used to recognize records in a table uniquely. An SQL KEY is a single column or addition of multiple columns used to uniquely spot rows or tuples in the table. SQL Key is used to point out duplicate information, and it also helps establish a relationship between multiple tables in the database.

Note: Columns in a table that are NOT used to identify a record distinctive are called non-key columns.

What is a Primary Key?

A Primary Key is the minimal set of attributes of a table that has the task of uniquely identifying the rows, or we can say the tuples of the given particular table.

A primary key of a relation is one of the possible candidate keys which the database designer thinks it’s primary. It may be selected for favorable, performance, and many other reasons. The choice of the possible primary key from the candidate keys depends upon the following conditions. Rules for defining the Primary key are:-

  • Two rows can’t have the same primary key value
  • It must be for every row to have a primary key value.
  • The primary key field cannot be false.
  • The value in a primary key column can never be modified or updated if any foreign key refers to that primary key.

Why do we need Primary Key?

Suppose for any identification we need a different feature that makes it special(different) from others. Everyone has a unique specialty that makes us judge and verifies them based on that quality one has in them.

 Similarly, for differentiating a table from others and for identifying them we need a special key control that can be used to validate the records of that table and maintain uniqueness, consistency, and completeness.

Also, for joining any two tables in the relative database systems, a Primary key and another table’s Primary key are known as a Foreign Key, which plays an important role.

So, we can use this Primary Key while creating a table or change it in the database. If a Primary Key is already present in the table, then the server or system will not allow inserting a row with the same key. It helps to improve the security of database records.

What is the Foreign Key?

FOREIGN KEY is a column that creates a relationship between two tables. The purpose of Foreign keys is to maintain data constancy and allow navigation between two different instances of an organization. It acts as a double verification between two tables as it references the primary key of another table.

Foreign Key references the primary key of another Table! It helps connect your Tables

• A foreign key can have a different name from its primary key

• It ensures rows in one table have relative rows in another, Unlike the Primary key, they do not have to be special from others. Most often they don’t have.

• Foreign keys can be invalid even though primary keys cannot be empty.

Example of Foreign Key

Consider two tables Student and Department having their particular attributes as shown in the below table structure: –

In the tables, one attribute, you can see, is common, that is Std_Id, but it has different key constraints for both tables. In the Student table, the field Std_Id is a primary key because it is uniquely identifying all other fields of the Student table.

 On the other hand, Std_Id is a foreign key attribute for the Department table because it is acting as a primary key attribute for the Student table. It means that both the Student and Department table are linked with one another because of the Std_Id attribute.

Figure 1 - Foreign Key
Figure 1 – Foreign Key

Difference Between Primary key & Foreign key

Following are the main difference between the primary key and foreign key:

Primary Key      

  • Helps you to uniquely identify a record in the table.
  • Primary Key never accepts clear values. 
  • The primary key is a clustered index and data in the DBMS table are physically organized in the sequence of the clustered index.       
  • You can have the single Primary key in a table.

Foreign Key

  • It is a field in the table that is the primary key of another table.
  • A foreign key may accept multiple empty values.
  • A foreign key cannot automatically create an index, grouped or non-grouped. However, you can manually create an index on the foreign key.
  • You can have multiple foreign keys on a table.

What is the Composite key?

COMPOSITE KEY is a combination of two or more columns that uniquely identify rows in a table. The combination of columns ensures uniqueness, though individually uniqueness is not guaranteed. Hence, they are combined to uniquely identify records in a table.

The difference between the compound and the composite key is that any part of the compound key can be a foreign key, but the composite key may or maybe not be a part of the foreign key.

What is a Candidate’s Key?

CANDIDATE KEY in SQL is a set of attributes that uniquely identify substrings in a table. Candidate Key is a super key with no repeated attributes. The Primary key should be selected from the candidate keys. Every table must have at least a single candidate key. A table can have multiple candidate keys but only a single primary key.

Figure 2 - Candidate Key
Figure 2 – Candidate Key

Advantages of Normalization:

  • Normalization helps to cut down the data redundancy.
  • Greater overall database organization.
  • Data consistency within the database.
  • Much more workable database design.
  • Applies the concept of relational integrity.

Disadvantages of Normalization:

  • You cannot start building the database before knowing what the user needs.
  • The performance degrades when normalizing the relations to higher normal forms, that is 4NF, 5NF.
  • It is very time-consuming and difficult to normalize relations of a peak degree.
  • Careless decomposition may lead to a bad database design, leading to serious problems.

Database Normal Forms

Here is a list of Normal Forms in SQL:

  • 1NF (First Normal Form)
  • 2NF (Second Normal Form)
  • 3NF (Third Normal Form)
  • BCNF (Boyce-Codd Normal Form)
  • 4NF (Fourth Normal Form)
  • 5NF (Fifth Normal Form)
  • 6NF (Sixth Normal Form)
Figure 3 - Database Normal Forms
Figure 3 – Database Normal Forms

1st Normal Form (1NF)

It is Step 1 in the Normalization procedure and is considered the most basic requirement for getting started with the data tables in the database. If a table in a database is not capable of forming a 1NF, then the database design is considered to be poor. 1NF proposes a scalable table design that can be extended easily to make the data retrieval process much simpler.

For a table in its 1NF,

  1. Data Atomicity is maintained. The data present in each attribute of a table cannot contain more than one value.
  2. For each set of related data, a table is created and the data set value in each is identified by a primary key applicable to the data set.
Figure 4 - 1 NF
Figure 4 – 1 NF

The multiple values from EMP_PHONE made atomic based on the EMP_ID to satisfy the 1NF rules.

2nd Normal Form (2NF)

The basic prerequisite for 2NF is:

  1. The table should be in its 1NF,
  2. The table should not have any partial dependencies

Partial Dependency: It is a type of functional dependency that occurs when non-prime attributes are partially dependent on part of Candidate keys.

Example:

Figure 5 - 2nd NF
Figure 5 – Employee Table before 2nd NF
  • If manager details are to be fetched for an employee, multiple results are returned when searched with EMP_ID, to fetch one result, EMP_ID and PROJECT_ID together are considered as the Candidate Keys.
  • Here the manager depends on PROJECT_ID and not on EMP_ID, this creates a partial dependency.
  • There are multiple ways to get rid of this partial dependency and cut off the table to its 2nd normal form, one similar method is adding the Manager information to the project table as shown below.
Figure 6 - After 2nd NF
Figure 6 – After 2nd NF

3rd Normal Form (3NF)

3NF ensures referential integrity, cut down the duplication of data, cuts down data anomalies, and makes the data model more informative. The basic prerequisite for 3NF is,

1.      The table should be in its 2NF, and

2.      The table should not have any transitive dependencies

Transitive dependency: It occurs due to a resulting relationship within the attributes when a non-prime attribute has a functional dependency on a prime attribute.

Example:

Figure 7 - Employee Table before 3rd NF
Figure 7 – Employee Table before 3rd NF
  • In Table the PERFORMANCE_SCORE depends on both the employee and the project that he is associated with, but in the last column, HIKE depends on PERFORMANCE_SCORE.
  • The hike changes with performance and here the attribute performance score is not a primary key. This forms a transitive dependency
  • To get rid of the transitive dependency created, and to satisfy the 3NF, the table is broken as illustrated below:-
Figure 8 - After 3NF
Figure 8 – After 3NF

Boyce-Codd Normal Form BCNF

BCNF deals with the anomalies that 3 NF fails to address. For a table to be in BCNF, it should satisfy two conditions:

1.      The table should be in its 3 NF form

2.      For any dependency, A à B, A should be a super key i.e. A cannot be a non-prime attribute when B is a prime attribute.

Example:

Figure 9 - Before BCNF
Figure 9 – Before BCNF

The EMP_ID and PROJECT_ID together can fetch the Department details associated with the employee. i.e.

  • EMP_ID + PROJECT_ID → DEPARTMENT
  • (Candidate keys)
  • DEPARTMENT which is not a super key is dependent only on PROJECT_ID but not dependent on EMP_ID
  • Department → Project_ID
  • (Non-prime attribute) (prime attribute)

To make this table satisfy BCNF, we need to break the table as shown below:-

Figure 10 - After BCNF
Figure 10 – After BCNF

Here the department table is created such that each department id is unique to the department and project-related to it.

It is very important to ensure that the data stored in the database is meaningful and the chances of anomalies are minimal to zero. Normalization helps in reducing data redundancy and helps make the data more meaningful.

Normalization follows the principle of ‘Divide and Rule’ wherein the tables are divided until a point where the data present in them makes actual sense. It is also important to note that normalization does not fully get rid of the data redundancy rather its goal is to minimize the data redundancy and the problems associated with it.

Summary

So, you have learnt the normalization process for designing an efficient database. In most of the cases, the 3rd normal form is enough to create an excellent structure and we never goes to a higher normal form.

It is because of the normal forms a proper functional dependency is maintained.

Locks in Database Management

Previously, you learned about the concurrency mechanism in database. You know that the concurrency is maintained in a serialized manner, giving the impression that there is indeed some concurrency. The locking technique has to do a lot with this achievement of efficient access to database.

What is a Lock?

A lock is a variable associated with a data item that describes the status of the item concerning possible operations that can be applied to it. Accordingly, there is one lock for each data item in the database. Locks are used as a means of synchronizing the access by concurrent transactions to the database item.

Locking protocols are used in database management systems as a means of concurrency control. Many transactions may request a lock on a data item simultaneously. So, we require a mechanism to manage the locking requests made by transactions. Such a  process is called Lock Manager.

It relies on the process of message passing where transactions and lock managers exchange messages to handle the locking and unlocking of data items.

Concurrency control protocols can be broadly divided into two categories −

  • Lock based protocols
  • Timestamp based protocol

Lock-based Protocols

Database systems equipped with lock-based protocols use a method by which any transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two varieties:

  • Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
  • Shared/exclusive − This type of locking method differentiates the locks based on their uses. If a lock is obtained on a data item to perform a write operation, it is an exclusive lock.

Allowing more than one transaction to write on the same data item would lead the database into an incompatible state. Read locks are divided because no data value is being changed.

Types of Locks

Several types of locks are used in concurrency control. To implement locking concepts gradually, we need to talk about binary locks, which are simple but restrictive and so are not used in practice.

 After it shared/exclusive locks, which provide more general locking capabilities and are used in practical database locking schemes.

What are Binary Locks?

A binary lock can have two states or values locked and unlocked.

A distinct lock is related to each database item A. If the value of the lock on A is 1, item A cannot gain access by a database operation that requests the item.

If the lock value on A is 0 then the item can be accessed when requested. We refer to the current value of the lock associated with item A as LOCK (A).

 There are two operations, lock item and unlock item are used with binary locking A transaction requests access to an item A by first issuing a locked item (A) operation. If LOCK (A) = 1, the transaction is forced to wait. If LOCK (A) = 0 it is set to 1 (the transaction locks the item) and the transaction is allowed to access item A. 

When the transaction is through using the item, it issues an unlock item (A) operation, which assigns LOCK (A) to 0 (unlocks the item) so that A may be accessed by remaining transactions. Hence binary lock imposes mutual exclusion1 on the data item.

Figure 1 - Binary Lock
Figure 1 – Binary Lock

Rules of Binary Locks

Incase the simple binary locking scheme described here is used, every transaction must obey the following rules:

  • A transaction must supply the operation lock_item (A) before any read_item (A) or write, item operations are performed in T.
  • A transaction T must supply the operation unlock_item (A) after all read_item (A) and write_item (A) operations are completed in T.
  • A transaction Twill does not supply a lock _item (A) operation if it already holds the lock on Item A.
  • A transaction will not issue an unlock _item (A) operation unless it already holds the lock on item A.
  • The lock manager module of the DBMS can implement these rules. Between the Lock_item (A) and unlock_item (A) operations in transaction T, is said to hold the lock on item A. 

At most one transaction can hold the lock on a particular item. Thus no two transactions can access the same item together.

Merits of Binary Locks

  • They are simple to execute since they are effectively mutually exclusive and establish isolation perfectly.
  • Binary Locks request less from the system since the system must only keep a record of the locked items. The system is the lock manager subgroup which is a feature of all DBMSs today.

Disadvantages of Binary Locks

As discussed earlier, the binary locking scheme is too restrictive for database items, because at most one transaction can hold a lock on a given item. So, a binary locking system cannot be used for practical purposes.

One of the methods to ensure isolation of property in the transactions is to require data items to be accessed in a mutually exclusive manner. That means, that while one transaction is accessing a data item, no other transaction can make changes to that data item.

So, the most common method used to implement requirements is to allow a transaction to access a data item only if it is currently holding a lock on that item.

Thus, the lock on the operation is required to ensure the isolation of the transactions.

How Do Shared Locks Function?

  • Shared locks survive when two transactions are granted read access.
  • One transaction gets the shared lock on data and when the second transaction requests the same data it is also given a shared lock.
  • Both transactions are in a read-only mode, updating the data is not allowed until the shared lock is released. There is no difference with the shared lock because nothing is being updated.
  • Shared locks last if they need to last; it depends on the level of the transaction that holds the lock.
  • Shared locks keep up read integrity. They check whether a record is not in process of being updated during a read-only request.
  • Shared locks can also be used to stop any kind of updates of record.
  • It is represented by Lock-S which is a read-only lock.
  • S-lock is appealed using Lock-S instruction.
Figure 2 - Shared Lock
Figure 2 – Shared Lock

Example:

Consider a case where initially A=100 and two transactions are reading A. If one of the transactions wants to update A, in that case, the other transaction would be reading the wrong value.

However, the Shared lock prevents it from updating until it has finished reading!

Exclusive Locks

We should allow several transactions to access the same item A if they all access A’ for reading purposes only. However, if a transaction is to write an item A, it must have total access to A

. For this purpose, a different type of lock called a multiple-mode lock is used. In this scheme, there are exclusive or read/write locks are used.

Example:

Consider a transaction(T2) that requires updating the data item value A. The following steps take place when lock protocol is applied to this transaction.

Figure 3 - Exclusive Lock
Figure 3 – Exclusive Lock
  • T2 will acquire an exclusive lock on the data item A
  • Read the current value of data item A
  • Change the data item as required. In the example illustrated, a value of 50 is subtracted from the data item A
  • Write the updated value of the data item
  • Once the transaction is finished, the data item will be unlocked.

Differences between Shared Lock and Exclusive Lock

Shared Lock:

  • Lock mode is a read-only operation.                                   
  • Shared lock can be placed on objects that do not have an exclusive lock already placed on them.
  • Prevents others from updating the data.                         
  • Provided when the transaction wants to read an item that does not have an exclusive lock.     
  • Any number of transactions can hold a shared lock on an item.    
  • S-lock is appealed using lock-S instruction.    

Exclusive lock:

  • Lock mode is read as well as write operation.
  • Exclusive lock can only be placed on objects that do not have any other kind of lock.
  • Prevents others from reading or updating the data.
  • Provided when transaction wants to update unlocked item.
  • Exclusive lock can be observed by only one transaction.
  • X-lock is sought using lock-X instruction.

Locking operations

There are three locking operations called read_lock(A), write_lock(A) and unlock(A) represented as lock-S(A), lock-X(A), unlock(A) (Here, S indicates shared lock, X indicates exclusive lock) can be performed on a data item. appealed

A lock related to an item A, LOCK (A), now has three possible states: “read-locked”, “write-locked,” or “unlocked.” A read-locked item is also called a share-locked item because various transactions are allowed to read the item, whereas a write-locked item is caused exclusive-locked. Hence, a single transaction exclusively holds the lock on the item.

Compatibility of Locks

Suppose that there are A and B two different locking modes. If a transaction T1 requests a lock of mode on item Q on which transaction T2 currently holds a lock of mode B.

 If the transaction can be granted the lock, despite the presence of the mode Block, then we say mode A is compatible with mode B. Such a function is shown in one matrix as shown below:

SX
STrueFalse
XFalseFalse
Compatibility Graph

The graphs show that if two transactions only read the same data object they do not conflict, but if one transaction writes a data object and another either reads or writes the same data object, then they dispute with each other.

A transaction requests a shared lock on data item Q by executing the lock-S(Q) instruction. Similarly, an exclusive lock is appealed through the lock- X(Q) instruction. A data item Q can be free via the unlock(Q) instruction.

To access a data item, transaction T1 must first lock that item. If the data item is already locked by another transaction in an opposite mode, the concurrency control manager will not allow the lock until all opposed locks held by other transactions have been released. Thus, T1 is made to wait until all opposed locks held by other transactions have been released.

There are four types of lock protocols available are: –

Simplistic Lock Protocol

Simplistic lock-based protocols allow transactions to obtain a lock on every object before a ‘write’ operation is performed. Transactions may unlock the data item after completing the ‘write’ operation.

Pre-claiming Lock Protocol

Pre-claiming protocols estimate their operations and create a list of data items on which they need locks. Before starting execution, the transaction requests the system for all the locks it needs beforehand.

If all the locks are allowed, the transaction executes and releases all the locks when all its operations are over. If all the locks are not allowed, the transaction rolls back and waits until all the locks are granted.

Figure 5 - Pre-Claiming protocols
Figure 5 – Pre-Claiming protocols

Two Phase Locking 2PL

This locking protocol divides the implementation phase of a transaction into three parts. In the first part, when the transaction starts working, it seeks permission for the locks it requires.

The second part is where the transaction obtains all the locks. As soon as the transaction releases its first lock, the third phase starts. In this phase, the transaction cannot order any new locks; it only releases the acquired locks.

Figure 6 - Two Phase Locking
Figure 6 – Two Phase Locking

Two-phase locking has two phases, one is increasing, where all the locks are being acquired by the transaction; and the second phase is decreasing, where the locks held by the transaction are being released.

To claim an exclusive (write) lock, a transaction must first obtain a shared (read) lock and then upgrade it to an exclusive lock.

Strict Two-Phase Locking

The first phase of Strict-2PL is the same as 2PL. After obtaining all the locks in the first phase, the transaction continues to execute normally.

 But in contrast to 2PL, Strict-2PL does not release a lock after using it. Strict-2PL holds all the locks until the commit point and releases all the locks at a time.

Figure 7- Strict Two Phase Locking
Figure 7- Strict Two Phase Locking

Strict-2PL does not have to cascade termination as 2PL does.

Timestamp-based Protocols

The most used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.

Lock-based protocols manipulate the order between the conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.

Every transaction has a timestamp related to it, and the ordering is determined by the age of the transaction. A transaction created at 0002 clock time would be older than all other transactions that come after it. For example, any transaction ‘y’ entering the system at 0004 is two seconds younger and the priority would be given to the older one.

In addition, every data item is given the latest read and write-timestamp. This lets the system know when the last ‘read and write’ operation was performed on the data item.

Timestamp Ordering Protocol

The timestamp-ordering protocol checks serializability among transactions in their conflicting read and writes operations. This is the responsibility of the protocol system that the various pair of tasks should be executed according to the timestamp values of the transactions.

Let’s assume there are two transactions T1 and T2. Suppose the transaction T1 has entered the system at 007 times and transaction T2 has entered the system at 009 times. T1 has the higher priority, so it executes first as it is entered the system first.

The priority of the older transaction is higher that’s why it executes first. To decide the timestamp of the transaction, this protocol uses system time or logical counter.

  • The timestamp of transaction Ti is represented as TN(Ti).
  • Read time-stamp of data-item X is represented by R-timestamp(X).
  • Write time-stamp of data-item X is shown by W-timestamp(X).

Timestamp ordering protocol works as follows −

  • If a transaction Ti issues a read(X) operation −
  • If TN(Ti) < W-timestamp(X)
    • Operation rejected.
  • If TN(Ti) >= W-timestamp(X)
    • Operation executed.
  • All data-item timestamps are updated.
  • If a transaction Ti issues a write(X) operation −
  • If TN(Ti) < R-timestamp(X)
    • Operation rejected.
  • If TN(Ti) < W-timestamp(X)
    • Operation rejected and Ti rolled back.
  • Otherwise, the operation is executed.

Concurrency control is important in DBMS for handling the simultaneous execution of transactions among various databases.

 Lock Based Protocols being an essential member of the concurrency control technique enforces isolation among the transactions, preserves and maintains the reliability of the database, and resolves the disputes of read-write and write-read operations.

 In addition to Lock-based Protocols, concurrency control can also be achieved via methodologies such as Timestamp Protocol, Multiverse concurrency Protocol, and Validation Concurrency Protocols.

Advantages and Disadvantages of Timestamp Ordering protocol:

  • Timestamp Ordering protocol ensures serializability since the precedence graph is as follows:
Figure 8 - Timestamp Ordering
Figure 8 – Timestamp Ordering
  • TS protocol ensures freedom from deadlock which means no transaction ever be in waiting.
  • But the schedule may not be modified back and may not even be cascade-free.

Summary

You have learned about different types of lock and their efficiencies, drawbacks in this article. The idea of the efficient lock is to maintain faster access and keep the database consistent. This is done when the transaction is kept atomic and isolated.