We use SELECT-FROM-WHERE kind of statement to run queries that get us specific rows from a relation. In the select statement, WHERE is the condition that returns specific rows from a relation.
What if we want to group the information? The GROUP BY and HAVING clause helps you to group the resultant rows by a specific column. GROUP BY and HAVING clause is used with aggregate functions like Count, Max, Min, Sum, etc;
This is easy to understand with an example.
Step 1:
Create a relation for students – Student ( rollno, name, age);
CREATE TABLE STUDENT (ROLLNO NUMBER(5) PRIMARY KEY,NAME VARCHAR2(30), AGE NUMBER(3), GRADE CHAR(2));
STEP 2:
Enter details of at least 10 students in to the STUDENT relation.
INSERT INTO STUDENT VALUES(50001,'DHANUSH',39,'A');
INSERT INTO STUDENT VALUES(50002,'RAJNI',29,'D');
INSERT INTO STUDENT VALUES(50003,'VIKRAM',32,'S');
INSERT INTO STUDENT VALUES(50004,'VADIVEL',25,'D');
INSERT INTO STUDENT VALUES(50005,'PRASAD',37,'S');
INSERT INTO STUDENT VALUES(50006,'SHANKAR',21,'A');
INSERT INTO STUDENT VALUES(50007,'RAM',33,'A');
INSERT INTO STUDENT VALUES(50008,'KARTICK',22,'C');
INSERT INTO STUDENT VALUES(50009,'SHAKTI',35,'B');
INSERT INTO STUDENT VALUES(50010,'SIMBU',20,'B');
To check the entered value in the table, you can run following query against the STUDENT relation.
SELECT * FROM STUDENT.
STEP 3:
You can run two simple queries to understand the difference between a query without group by and a query with GROUP BY clause. First, you want a student with minimum age from the STUDENT relation using an aggregate function MIN().
SELECT MIN(S.age) FROM STUDENT S;
You can add WHERE clause, but we want query without any conditions. It will produce the following results.
You can see that there is only one person in an entire student relation whose minimum age is returned.Suppose you want to write a query to get the minimum age of student by student Grade. It means this statement “What is the minimum age of students who got A’s ?” or “What is the minimum age of students who got B’s?” and so on.
Let’s run the following query that will get you the minimum age grouped by student GRADE.
SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE;
What is the Difference ?
In the first query, we treated the entire STUDENT relation as one group and that’s why we only have a single value for the whole relation.In the second query, we grouped STUDENT relation by student grade and for each of these grade got minimum age.
The HAVING clause
The HAVING clause is qualification for the Group By clause. It means you are putting more conditions on the resulting rows from GROUP BY clause.
SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE;
You know that it will return a list of minimum age by student grade, but suppose you want to see only minimum age that are greater than or equal to 30.
SELECT S.GRADE, MIN(S.age) FROM STUDENT S GROUP BY S.GRADE HAVING MIN(S.age) >= 30;
The result is only one row whose minimum age is greater than or equal to 30 but grouped by student grade.
Group by cannot have duplicate values and the general format of the query is given below.
SELECT [DISTINCT] select list
FROM from-list
WHERE qualification
GROUP BY group-list
HAVING group-qualification
Source: Database Management Systems – Raghu Ramakrishan
You want to understand the GROUP BY and HAVING clause, then create similar examples and run them against the relations.
References
Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 27-Jan-2010. Database System Concepts. McGraw-Hill Education.
Ramakrishnan, Johannes Gehrke, and Raghu. 1996. Database Management Systems. McGraw Hill Education; Third edition (1 July 2014).