Saturday, February 23, 2013

More On Select Queries

We have gone thru Select queries fundamentals earlier. You can revisit the same again here. Its time to dig deeper into more what you can do with Select.

Lets try to refresh with a query.
Query 1: To get the name of all parents who has two or more girls in the school.
select p.name 
from parents as p 
inner join students as s on s.id = parent_id 
where gender = 'F'
group by p.id having count(*)>2;

Lets now see select query inside query - sub-queries

Query 2: Find the list of students who has no parent details in the system
There are two approaches - The common approach is using sub-query with not in clause


select s.name
from students as s 
where s.parent_id not in ( select id from parents)
Query is like first identifying the all rows from students table and checking with list of all the parent-ids to identify the required records. But if you analyse more, the sub-query is ran for each matching row. And it can be done better with join

select s.name
from students as s 
left join parents as p on p.id = s.parent_id
where p.id is null
For those who the query is confusing, ignore the where clause first.It will list all the students and gets parent ids, if they have on (we are using left join). Now we add the where clause, its saying to ignore if parent id is mapped. Now the time take for the query will be lesser. But there can be cases where the sub-queries can  be useful.

Just for the sake of knowledge, I will show you another query - but its heavy and I would say, never ever use it.
Query 3 : List all students with number of students from the family currently studying in the school

select s.name,
(select count(*) from students as s1 where s.id = s1.id) as students_from_family
from students as s 
Here again the select sub-query is ran for each matching row.This could again be better executed using a JOIN query.
select s.name, count(s1.id)
from students as s 
left join students as s1 on s1.parent_id = s.parent_id
group by s.id

Now lets look into another type of query - derived-query. It uses a select query inside FROM or JOIN clause.
Query 4: To get the count of parents and students where parent_id is greater than 4

select s.name, count(s.id), p.name
from students as s 
left join (select id, name from parents as p where p.id <4) as p 
on p.id = s.parent_id
We can again get the result without join but its good to keep all these weapon with you. If you do an explain, you will notice there is a Temporary table creation involved here(row 2)

+----+-------------+------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | PRIMARY     | s          | ALL   | NULL          | NULL    | NULL    | NULL |    5 |             |
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL    | NULL    | NULL |    2 |             |
|  2 | DERIVED     | p          | range | PRIMARY       | PRIMARY | 4       | NULL |    2 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+-------------+


I would rather suggest to create your own Temporary Table, execute the query by joining your tables and finally dropping the Temporary Table.

 create temporary table temp_parents(
 id int(11) not null,
 name varchar(30),
 index(id)
 )
 select id, name
 from parents 
where id > 4;



select s.name, count(s.id), p.name
from students as s 
left join temp_parents as p 
on p.id = s.parent_id;

delete temporary table temp_parents;

The advantage of creating your own temp table is, we can define your own indexes and optimize the query. Also as your join query is ran on temp table, the original table huge table wont be locked during your query.

Note: Temp tables exists only with in session - ie same table name but with different content can exists across 2 session and will expire soon after the session is closed
In the above create table query, we have combined both CREATE and INSERT into single query. See more options of  CREATE TABLE here

These Temp table is approach is useful especially when you parent table is huge (with millions of records) and your join clause needs to match only few thousands  rows from the huge table.