Its so true in context of SQL too. A select query can have do a lot. If you talk to DB based programmer, he will be spending most of his time on getting proper select queries. Faster the query, better it is (obviously, it should provide you with right data).
The smallest possible select query is "SELECT 1;" while the lengthiest one can go any long. (I myself have seen 50+ line select query, which used 44 tables - and for obvious reason was never good).
Lets consider these two tables for the discussion.
|
|
|
|
|
|
students
|
|
parents
|
|
|
id
|
|
id
|
|
|
name
|
|
name
|
|
|
class
|
|
contact_number
|
|
|
date_of_joining
|
|
|
|
|
parent_id
|
|
|
|
|
|
|
|
|
In students table we have the details of the students. Parents table holds the details of the parents. A student can have parents details entered in the system. The parent_id in students table is mapped to parents table.
Lets try by writing how each part is used. Hope you still remember you first select query which used "select", "from" and "limit" clauses.
WHERE CLAUSE
Query 1: Get me the details of the student named Paul Matt.
mysql> select * from students where name = 'Paul Matt';
+----+-----------+-------+-----------+---------------------+
| id | name | class | parent_id | date_of_joining |
+----+-----------+-------+-----------+---------------------+
| 1 | Paul Matt | 7 | NULL | 2011-06-10 00:00:00 |
+----+-----------+-------+-----------+---------------------+
1 row in set (0.08 sec)
So here comes "WHERE" clause. A where clause is used to filter rows. name = 'Paul Matt' checks whether the name is "Paul Matt".
Query 2: Get me the details of the student named who joined after Jan 2008
mysql> select * from students where date_of_joining > '2008-01-01';
+----+-----------+-------+-----------+---------------------+
| id | name | class | parent_id | date_of_joining |
+----+-----------+-------+-----------+---------------------+
| 1 | Paul Matt | 7 | NULL | 2011-06-10 00:00:00 |
| 2 | Liz Frank | 8 | NULL | 2008-06-12 00:00:00 |
+----+-----------+-------+-----------+---------------------+
2 rows in set (0.00 sec)
Query 3: Get me the details of the student named who joined after Jan 2008 and is not in 8th grade
mysql> select * from students where date_of_joining > '2008-01-01' and class != '8';
+----+-----------+-------+-----------+---------------------+
| id | name | class | parent_id | date_of_joining |
+----+-----------+-------+-----------+---------------------+
| 2 | Liz Frank | 8 | NULL | 2008-06-12 00:00:00 |
+----+-----------+-------+-----------+---------------------+
1 rows in set (0.00 sec)
Query 2 and 3 are self explanatory explanatory. Please go through them as each has special operators
Some more where clauses:
All students in 7th, 8th or 9th grade - where class in ("7", "8", "9") // same as (class = "7" or class ="8" or class="9")
All students whose parent details are not provided - where parent_id is null // "is null" a special operator to check null values
All students whose name start with Mark - where name like 'Mark%'
LIKE and NOT LIKE is used for matching the patterns. % matches 0 or more characters and _(underscore) matches exactly one character
Note: You need to make sure you escape any special characters like quotes, slashes, etc wherever needed.
Try yourself :
P1: All students' name who has parent details are entered and is in grade 8 and name ends with "Eddie", roll number is even and joined in 2009.
Now lets have more fun.
DISTINCT
Distinct is a keyword used to list unique result set.
Query 4: Get all different name among students
mysql> select distinct name
from students
+-----------+
| name |
+-----------+
| Paul Matt |
| Liz Frank |
+-----------+
2 rows in set (0.00 sec)
In this case, if there a second another Liz Frank, her name wont be listed.
Note: distinct has to be specified before specifying any column. and it applies to all fields. If you have id field, as id is unique, might not do the desired
JOINS
Join clause is what makes most select queries complex. What is join? To be simple, its combining a table other tables to fetch all the details.
Query 5: Get me name of all students and their parents.
mysql> select parents.name, students.name
from students
join parents on parents.id =students.parent_id;
+--------+-----------+
| name | name |
+--------+-----------+
| Smith | Paul Matt |
| Muller | Liz Frank |
+--------+-----------+
2 rows in set (0.00 sec)
The join clause combines students and parents tables using ON clause. ON clause is similar to WHERE clause except that defines relation between tables. If ON clause is not mentioned, it will join all students records against all parents records.
Have you found any issue here? If the parents details is not present, the student details wont be listed. How to fix that.
mysql> select parents.name, students.name
from students
left join parents on parents.id =students.parent_id limit 2;
+--------+-----------+
| name | name |
+--------+-----------+
| Smith | Paul Matt |
| null | Francy Linc|
+--------+-----------+
2 rows in set (0.00 sec)
Now we have added a LEFT infront of join. What does it mean? Yes, you got it right. All records in the left table(table mentioned first) will be listed even if it doesnt have a matching record in the right table based on the ON clause.
This query can re-written using a RIGHT JOIN also. Its so self-explanatory
mysql> select parents.name, students.name
from parents
right join students on parents.id =students.parent_id limit 2;
Note: If LEFT or RIGHT is mentioned, JOIN is considered as INNER JOIN
Its always advisable to use a LEFT or INNER JOIN based on what you need(MySQL converts RIGHT joins to LEFT join by re-structuring the query)
As we have reached this long, I will introduce the concept of ALIAS.
mysql> select p.name as p_name, s.name as s_name
from students as s
left join parents as p on p.id =s.parent_id limit 2;
+--------+-----------+
| p_name | s_name |
+--------+-----------+
| Smith | Paul Matt |
| null | Francy Linc|
+--------+-----------+
2 rows in set (0.00 sec)
If you check the query above you will see the difference. After table names, we have used an "AS <alias_name>". So instead of saying table names everywhere, you can use the alias name. As you proceed further, you will know how useful its going to be.
Now lets do a tricky one.
Query 6: Get me all siblings in the school.
The assumption we have to make is, for siblings parent_id will be same.
mysql> select s1.name, s2.name
from students as s1
inner join students as s2
on s1.parent_id = s2.parent_id and s1.id < s2.id
where s1.parent_id is not null;
+-----------+------+
| name | name |
+-----------+------+
| Paul Matt | Zony |
+-----------+------+
1 row in set (0.00 sec)
What have we done here? We are joining the students table with itself such that two students have the same parent (s1.parent_id = s2.parent_id). s1.id < s2.id is used to make sure only once a row sibling is picked, otherwise you might get 3 rows as Paul Matt-Zony, Paul Matt-Paul Matt and Zony-Zony. And finally where clause make sure any student without parent details will be considered for listing( otherwise all students whose parent details are not entered will be siblings!!)
ORDER BY
It defines the order in which the data has to be listed. For example, you would like to get the student listing sorted by recently joined, and if two students joined on same day, first name should come first
mysql> select s.name, s.date_of_joining
from students as s
order by s.date_of_joining desc, s.name ascending
Ordering can be either ascending(asc) or descending(desc). If ordering is not mentioned, its considered as ascending. (Ordering can be performed on any column from the table, any computed field in selected fields and any computed expression. You will see more of it in future)
LIMIT
MySQL has a non-sql standard for LIMIT clause. Limit is often followed by 1 or 2 numbers both being non-negative. Examples are:
LIMIT 10 - fetches the first 10 matching records (same as LIMIT 0,10)
LIMIT 20, 10 - fetches the 10 matching records starting from offset 21
GROUP BY... HAVING
All clauses so far pretty straight forward. Group by and having clauses are not understood by most people. GROUP BY is used as it says, groups multiple rows. Lets crack them with examples:
Query 7: Get me number of children each parent have
mysql> select p.name, count(*)
from parents as p
inner join students as s on s.parent_id = p.id
group by p.id;
+--------+----------+
| name | count(*) |
+--------+----------+
| Smith | 2 |
| Muller | 1 |
+--------+----------+
2 rows in set (0.05 sec)
COUNT(*) is called an aggregate function(it is applies after grouping the together rows.) in MySQL and returns the count of matching rows. Here, we are grouping on p.id(parent_id) and joining the students table on parent_id fields. And we want to get the number of matching rows for the parent.
Some of the other commonly used aggregate functions are - SUM, AVG, MIN, MAX.
For example, you want to get the when his child joined school, then MIN(date_of_joining) can be used.
You might think of what will happen if I dont use a GROUP BY clause while using aggregate function. It will search across matching rows considering it as single group.
mysql> select count(*)
from parents as p
inner join students as s on s.parent_id = p.id
Will provide you number of students whose parent details are available.
Now we have the trickier part. HAVING - its very similar to WHERE, except that it is applied after aggregation.
Query 8: Get me number of children each parent have if they have atleast two and atleast one joining before 2008
mysql> select p.name, count(*)
from parents as p
inner join students as s on s.parent_id = p.id
group by p.id
having count(*)>=2 and min(date_of_joining)<'2008-01-01'
;
+--------+----------+
| name | count(*) |
+--------+----------+
| Smith | 2 |
| Steven | 3 |
+--------+----------+
2 rows in set (0.05 sec)
min(date_of_joining)<'2008-01-01' make sure there is atleast one student joined before 2008.
More to try for yourself:
P2:
Get me the parents who have both kids of both gender(assume you have a gender column in students table)
Tip: count(distinct gender)
P3:
Get me the count of students who have parent's details and who does not.
IMPORTANT NOTE
The execution of query always happens as
1. FROM AND JOIN clauses
2. WHERE clause
3. GROUP BY
4. HAVING
5. ORDER BY
6. LIMIT
If you have an order by and a group by, ORDER BY is applied only after grouping.
mysql> select s.name, parent_id
from students as s
group by parent_id
order by s.id
So in the above query, you might not get the name of the first id , but it just picks first records it finds while grouping and sort the results on id.
Answers
P1:
mysql> select name from students where parent_id is not null and class = '8' and name like '%Eddie' and id%2=0 and date_of_joining >= '2009-01-01 00:00:00' and date_of_joining < '2010-01-01 00:00:00';
If you are interested explore between operator , " <field> between <value1> and <value2> "