Understanding GROUP BY and HAVING Clause

We use SELECT-FROM-WHERE kind of statement to run queries that get us specific rows from a relation. In the select statement, WHERE is the condition that returns specific rows from a relation.

What if we want to group the information? The GROUP BY and HAVING clause helps you to group the resultant rows by a specific column. GROUP BY and HAVING clause is used with aggregate functions like Count, Max, Min , Sum, etc;

This is easy to understand with an example.

STEP 1:

Create a relation for  

Student ( rollno, name, age);
CREATE TABLE STUDENT (ROLLNO NUMBER(5) PRIMARY KEY,NAME VARCHAR2(30), AGE NUMBER(3), GRADE CHAR(2));

 

Figure 1: Create Table STUDENT
Figure 1: Create Table STUDENT

STEP 2:

Enter details of at least 10 students in to the STUDENT relation.

INSERT INTO STUDENT VALUES(50001,'DHANUSH',39,'A');   INSERT INTO STUDENT VALUES(50002,'RAJNI',29,'D');   INSERT INTO STUDENT VALUES(50003,'VIKRAM',32,'S');   INSERT INTO STUDENT VALUES(50004,'VADIVEL',25,'D');   INSERT INTO STUDENT VALUES(50005,'PRASAD',37,'S');   INSERT INTO STUDENT VALUES(50006,'SHANKAR',21,'A');   INSERT INTO STUDENT VALUES(50007,'RAM',33,'A');   INSERT INTO STUDENT VALUES(50008,'KARTICK',22,'C');   INSERT INTO STUDENT VALUES(50009,'SHAKTI',35,'B');   INSERT INTO STUDENT VALUES(50010,'SIMBU',20,'B');

To check the entered value in the table, you can run following query against the STUDENT relation.

SELECT * FROM STUDENT.

 

Figure 2: Instance of relation - STUDENT
Figure 2: Instance of relation – STUDENT

 STEP 3:

You can run two simple queries to understand the difference between a query without group by and a query with GROUP BY clause. First you want student with minimum age from the STUDENT relation using an aggregate function MIN().

SELECT MIN(S.age) FROM STUDENT S;

You can add WHERE clause but we want query without any conditions.It will produce the following results.

Figure 3: Student with Minimum Age
Figure 3: Student with Minimum Age

You can see that there is only one person in entire student relation whose minimum age is returned.
Suppose you want to write a query to get minimum age of student by student Grade. It means this statement “What is the minimum age of students who got A’s ?”  or “What is the minimum age of students who got B’s?” and so on.

Let’s run the following query that will get you the minimum age grouped by student GRADE.

SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE;

 

Figure 4: Student over minimum age grouped by Grade
Figure 4: Student over minimum age grouped by Grade

What is the Difference ?

In the first query, we treated the entire STUDENT relation as one group and that’s why we only have single value for the whole relation.

In the second query, we grouped STUDENT relation by student grade and for each of these grade got minimum age.

The HAVING clause

The HAVING clause is qualification for the Group By clause. It means you are putting more conditions on the resulting rows from GROUP BY clause.

Advertisements


SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE;

You know that it will return a list of minimum age by student grade, but suppose you want to see only minimum age that are greater than or equal to 30.

SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE HAVING MIN(S.age) >= 30;

The result is only one row whose minimum age is greater than or equal to 30 but grouped by student grade.

Figure 5: Student grouped by Grade and Having minimum age > 30
Figure 5: Student grouped by Grade and Having minimum age > 30

Group by cannot have duplicate values and the general format of the query is given below.

SELECT [DISTINCT] select list   FROM from-list   WHERE qualification   GROUP BY group-list   HAVING group-qualification

Source: Database Management Systems – Raghu Ramakrishan

You want to understand the GROUP BY and HAVING clause, then create similar examples and run them against the relations.

 


Bibliography

Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 27-Jan-2010. Database System Concepts. McGraw-Hill Education.

Ramakrishnan, Johannes Gehrke and Raghu. 1996. Database Management Systems. McGraw Hill Education; Third edition (1 July 2014).

 

Advertisements