Thursday, January 31, 2013

Common Functions and Expressions

Lets now peep into common function in MySQL and probably when they are uses.

Comparison Operators

Operator
Operation
Usage
Comments
=
Equal
a=b
true if equal, else false
!= or <>
Not equal
a!=b or a<>b
true if not equal
<
Less than


<=
Less than or equal


> 
Greater than


>=
Greater than or equal


IN
equal to any one in list
a in (1,2,3)
true if a=1 or a=2 or a=3
LIKE
string matching
matches the pattern
strings starting with C followed by a character, then d followed by any number of characters
BETWEEN … AND
searches in a range
a between 5 and 10


More if interested at : http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html

Logical Operations

Operator
Operation
Usage
AND , &&
Logical AND two expressions
1=2 AND 5>6
OR , ||
Logical OR

NOT, !
Negative check
NOT a and b
!c
XOR
Logical XOR

Control Functions

if - if(expr_to_check, true_expr, false_expr)
eg: 
if (5 > 3, "yes", "no")  => returns "yes"

ifnull - ifnull(expr_to_check, expr_if_first_expr_empty)
eg:
if(5, 10) => returns 5
if(5 and NULL, "was null") => returns "was null" (5 AND NULL evaluates to null)

case .. when
This is similar to switch..case in programming languages
eg:
case 1+3
when 4-2 then "4-2"
when 6-2 then "6-2"
when 4 then "4"
else "else"
end  
 => return "6-2" (it wont go to case 4 as, case 6-2 already matched)

Common Functions
concat 
eg: concat( "hello", " ", "world") => returns "hello world"

trim, rtrim, ltrim - Trim white spaces in a string
eg: trim(" a ") 

date_format 
eg: date_format(now(), '%Y-%m-%d') => returns "2013-01-31"

adddate / subdate 
eg: adddate('2012-01-20', interval 5 day) => return '2012-01-25 00:00:00' 
interval is a keyword followed by no:of units and unit to operate

cast/convert - for type casting
eg: cast("124as" as unsigned) / convert("124as", unsigned) => returns 124
You might have noticed by now various default MySQL type casting happening here.


Some Tips
  • If you are having some indexes, to use the advantage of it, do not use it with function.  For example convert(id as char)= "10" will not make use of the index on id
  • While using like, if search pattern starts with %, the index could not  be used.
  • concat(col1, col2) returns null if either one of them is null
  • MySQL does internal type casting, but could be touch slower(as indexes might not be utilized) if types does not match 
By now we have seen just the basics of running select queries, modifying data and some basic DB schema design. Its time to move to phase 2 where we will get deeper into everything we have seen so far. 

Keep reading :)

Normalization

Normalization

Now comes a lot of rules and constraints, we call it Normalization. It is again applied at various level and are numbered as below

1NF
The first normal form asks the attributes to be atomic - means a column should contain only one piece of information. We should not save the say employee ID card number and employee number in single column separated by some separator. The schema we have is already in 1NF
2NF
The 2NF is a rule on primary key. No column should be dependent on subset of primary key. This will be maintained in most cased with the help of auto-increment id acting as primary key

3NF
The 3NF says, all the values in a table should be fully dependent on the primary key only. Or in other words,  a column in a table should not be obtained based another column(s) in the same table. So in our case employee_id/card_number in  swipe details and attendance register are mutually dependent. If you know one, you can get the other from employee entity. So we will have to remove employee id or card number. Employee id being the key identifier for employee table, I would prefer to save employee id over card number(also card number is string which occupies more space and searching is slower compared to searching numbers). 

When I started career, I always tried to do normalization. But now I understand, these are guidelines and not rules. More you learn the database and your application, you will find an excuse not to follow these. If you are interested, read more on the same.

Wednesday, January 30, 2013

Designing Your First Table

Till now, I have been using the schema someone before you have created. Its time to do a simple DB design for yourself. Lets suppose we have to make a simple DB design for an employee attendance tracking application. So application need to save the details of employee and daily attendance timing. 
While DB designing anything, the following steps are usually
  • Understand the system
  • Identify the key entities and what all information need to be saved
  • Define the first schema
    • Try with other possible structures if required
    • Re-define based on expected usage and behaviour of application
  • Add indexes based on expected usage

Understanding the System

We can assume that the attendance is tracked using swiping machines which are located at the entrance and the exit. An employee can swipe in and swipe out any number of times on a given day. A users first swipe-in and last swipe-out will be used to make sure he was present. He can be present(8+hours), half day(4-8hours) or leave based on his availability at office on a given day.


Identifying the key entities

So here most important entity is employee. Each employee has name, email, gender, and ID card number (Lets bother only about these many information as of now). Now we have swipe in and out. There are two approaches here - 1. considering swiping in/out as entity or 2. attendance itself as an entity.  Lets us consider both cases, we are saving all swipe ins and outs plus direct attendance information. If swipe-in and swipe-out will have a time associated with it. Attendance will have first swipe-in and last-swipe out along with derived information of hours at office and his presence type. 

Defining First Schema

Lets define for each entity the attributes possible and their data types

Employees
id - Int - Auto-increment id generated in the system(internally we will refer any employee using id)
name - string - user name is string with max size as say 100 characters
ID card-number - string - Each employee has a unique card number, but once the employee exits the company the same card number can be allocated to a different employee
gender - enum - an employee can be male or female(we can assume be default employees are female)
email - string - optional and can have max of 50 characters

Swipe Details
id - Int - Auto-increment id generated in the system
time - timestamp, we can gracefully assume that time at which the entry is made is same as swipe time. (Default  value has to be now())
swipe_action - enum - swipe in or swipe out
card_number - string - card number mentioned in the card
employee_id - employee id for whom the card belongs (captured from employee table)

Attendance Register
id - Int - Auto-increment id generated in the system
swipe_in_time - timestamp, we can gracefully assume that time at which the entry is made is same as swipe-in time. (Default  value has to be now())
swipe_out_time - timestamp - time of last swipe-out
card_number - string - card number mentioned in the card
employee_id - employee id for whom the card belongs (captured from employee table)
time_at_office - time - time for which he was present
attendance_type - enum  - can be present, half day or leave

The need to maintain attendance register alone and/or swipe details alone depends on you need. Attendance register cannot give all swipe information, but if you record all swipes, you may look at as junk data. If you look other way round, if you want to store all swipes, attendance_register is redundant data; again based on your application, it might be good to have a register for quick report generation. 

Trade-Offs In Data Type & Column
  • Integer, String and float number selection are quite obvious if you are familiar with some programming language and I dont want to spend much time on the same.
  • Enum Vs TinyInt - You can define an Enum which can also be achieved using integers. If you use integers, it lack readability and also all applications using that column should have a mapping for what each number stands for. But if you use number, its easy to add a new value(To allow a new value to an existing ENUM field, you have to alter table, which might be time consuming if its a big table). In this case itself, suppose an employee came to office on a weekend and left in 3 hours, he will marked as leave and it was decided to introduce a new status saying "holiday" so that. 
  • Datetime Vs Timestamp - As said in earlier post, timestamp is internally saved as a number and is always better than a string base datetime.
  • Text Vs Blob - Both fields saves long text. For now you can take it like this - searchable text field refers to text and in other cases blob(blobs usually used for binary data chunk)
  • Employee Id Vs Card number in attendance register - The question is strictly based on you load. If you have such a huge employees list, on each swipe, it might not be possible to get the employee id from employees table(as it might result in too many hits on you employees table also). If you are expecting huge reports on attendance, you can think of some fetching and updating employee id once in a day to make reports faster. So its all based on your application you have to make the decision.
Some common fields I recommend
As someone said, "only change is  permanent", so is you application. It always good to leave space for future when u design today. 
  • id - Usually for tables we can go for unsigned int. If you are expecting huge data, bigint would be good.(1billion records in DB not that often, so I would say int)
  • Audit fields - In any table, I would always recommend these 4 fields - created_on, created_by, last_modified_on, last_modified_by(wherever possible). That is quick reference log, even if you have audit logs, these are good to have
  • status - In previous post on delete, I said never to DELETE, but to use a status field which is usually by default 1(active). I would recommend it to be tinyint rather than boolean - again for purpose. 
  • additional_info - This column is not available in all my DBs, but I would use it whenever there is a space of some additional info coming. Assume in our employee table we need to add blood group today and then comes request for hobbies next day, then home town and so on. If I am sure, these are just some readonly data and no search will be done on these details(as indexing wont work in this case), I would json_encode and dump the data into that column. This is violation of 1NF and  recommended only if you senior approves this :)
Note: Naming convention for MySQL tables and columns is usually words separated by underscore(my_table, column_name_1) but camel case is also good

Selection Of Indexes

Which all columns should be indexed. In normal case, I would say any column you are expecting a search, add an index. By searching, I include the columns used for join.
I would make any column(s) expected to be unique in the system, be indexed as unique and reduce the pain on application for checking if it already exists or not. Here I would recommend employee card number to be unique in employees table. Also adding a column of type date to attendance_register and making it unique along with employee_id would be good(so one record per employee per day)

Note: 
Indexing on columns like status might not actually serve the purpose. If the search yields more than 30% of data, mysql prefers full table scan to using the index usage
If there are two indexes on column_1 and column_2 and if the search is on both 2 columns, mysql will use one of these or both indexes based on necessity

Now lets see what we have come up










employees

swipe_details

attendance_register


id

id

id


name

swiped_on

swiped_in


card_number

swipe_action

swipe_out


gender

card_number

card_number


email

employee_id

employee_id


status



date


created_on



time_at_office


created_by



attendance_type


modified_on






modified_by






additional_info












Query

So lets now write our first create table query:

create table employees(
id int(11) unsigned not null auto_increment,
name varchar(100) not null,
card_number varchar(50) not null,
gender enum('Male', 'Female') default 'Female',
email varchar(100),
status tinyint(3) unsigned  not null default 1,
created_on timestamp default current_timestamp,
created_by int not null default -1,
modified_on timestamp null,
modified_by int null,
additional_info text,
primary key (id),
index indx_name(name),
unique index indx_card_number(card_number)
);

For the sake of explanation, I would give a brief explanation

Generic Syntax for column specification:
<column_name> <data_type[(<size allowed>)]> [[NOT ]NULL] [DEFAULT <value>]
column_name can be alphanumberic starting with non digit(if you give column name inside ``, you can infact give anything as column name- keywords, quotes, slashes, so be careful
datatype and size defines how much memory it takes. For int fields, size is for only display purpose, doesnot effect actual memory
null - means the column can have null values
default - this defines the default value

Generic Syntax for index specification

<index_type> [index_name](<col1>[, col2[,col3..]])


Now try to create other table for yourself. Feel free to make changes based on the application you can think of.

Note: I have purposefully removed concept of normalization from here feeling its too much for first table and I have got into habit of violating it :)

Sunday, January 27, 2013

Writing Data to Tables

So far we have been reading data from existing database. Its time to start how the data came into the tables.

INSERT

Insert, as the name indicates adds new records to a table. The most common and simple syntax is
INSERT INTO <table_names>
[(col1[,col2 [,col3..]])]
VALUES
(value1[,value2 [,value3..]])
[,(value1[,value2 [,value3..]])]
;

Lets looks a real example. 
mysql> insert into students
 (name, class, date_of_joining) values 
("Prior", "8", now());
Query OK, 1 row affected (0.06 sec)

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
|                5 |
+------------------+
1 row in set (0.05 sec)

mysql> select * from students where id = 5;
OR
mysql> select * from students order by id desc limit 1;
+----+-------+-------+-----------+---------------------+
| id | name  | class | parent_id | date_of_joining     |
+----+-------+-------+-----------+---------------------+
|  5 | Prior | 8     |      NULL | 2013-01-27 21:48:20 |
+----+-------+-------+-----------+---------------------+
1 row in set (0.00 sec)

So lets analyse it. First line says, insert into students so we are selecting the table. Now we specify the  columns to which we are going to update the data. Then after the VALUES keyword, we specify all the column values. You can insert multiple rows also using a single insert query.

Note : You might have noticed now() being used for date_of_joining. It returns the current timestamp.

In the query we have not passed the id, and the MySQL has auto generated the value for id based on the current max id in the table. So to retrieve the that id, we use the special function last_insert_id(). 
Note: The function is not related to a table but to a connection, ie irrespective of the table and database, it returns the last id it has created.

Third query is quite obvious, verifying the data.

UPDATE

Now, Prior got promoted to 9th grade. So basically we have to the class for him.

mysql> update students 
set class="9", date_of_joining=now() 
where id = 5;
Query OK, 1 row affected (0.09 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from students where id = 5;
+----+-------+-------+-----------+---------------------+
| id | name  | class | parent_id | date_of_joining     |
+----+-------+-------+-----------+---------------------+
|  5 | Prior | 9     |      NULL | 2013-01-27 21:48:20 |
+----+-------+-------+-----------+---------------------+
1 row in set (0.00 sec)

The UPDATE query is similar to SELECT query as it has table selection(can select multiple table using index) and has a where clause. An update cannot have GROUP by clause.

So here again I am breaking the query to 3 parts. UPDATE students, select the table to apply the update statement. Then comes the SET keyword, followed by columns to be updated. A comma separated list of "columns=value" can be given. Followed by optional WHERE clause.

You can use an update on multiple tables as well.
mysql> update students as s 
left join parents as p on p.name =  "Smith"
set class="9", date_of_joining=now(), s.parent_id = p.id 
where s.id = 5;
Query OK, 1 row affected (0.09 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Hope you can understand this query. We are updating the parent details of Prior with Smith.

Before running an update query, its always good to run a select query to identify the updating rows. Its simple to do. 
Step 1: Replace "update" with "select * from"
Step 2: Remove text from "set" till where.

So the above query will look like
mysql> select * from students as s 
left join parents as p on p.name =  "Smith"
where s.id = 5;
Query OK, 1 row affected (0.09 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Its even better, if you do an explain also before running the select query also.

DELETE

On any given day, I would say no deletion of data from your database. If you dont need the data, add a column status/active/enabled(or whatever u like) and set the value accordingly, effectively using only an update query.

For sake of information, deletion can be done as:

Prior has completed the course and leaving school now. So, we would like to remove his information. 

mysql> delete from students where id = 5;
Query OK, 1 row affected (0.01 sec)

mysql> select *  from students where id = 5;
Empty set (0.00 sec)

Its exactly same as SELECT, except that it cannot use GROUP BY.

We can also delete from only specific tables when you are using join.

mysql> delete s.* 
from students as s 
INNER join parents as p on s.parent_id = p.id 
where s.id = 5;

This will select records from both students and parents but will delete only from s.*(students table).By default all tables will be cleared.


Right now we have seen only the basic INSERT, UPDATE and DELETE queries. More complex things are yet to come and we will explore it later.

Indexes

What is an Index?

Consider a dictionary in which words are not arranged in alphabetic order and you want to search for a word. There is no easy way to get the find the word. Same is true about database. Most common method of index is B-Tree(with leaf nodes having the data and intermediate nodes for navigation, read more if you are interested, but not relevant for proceeding with MySQL). Any index basically makes searching of data fast. So if you want to search for an id = 5, it searches through the B-tree(or whatever indexing data-structure is) and gives result without scanning whole table to identify which record has id = 5.

Types of Indexes

There are 4 major types of indexes in MySQL. 
Primary  
Primary index or primary key is value that can uniquely identify a row in a table. In a table there can be only one primary key. Usually id is the field used as the primary key for the table. Most databases has an option of making the primary key auto_increment field. So you dont have to bother about what is the next value of the id while adding a new record, MySQL will take care of it by itself. As a primary key need to uniquely identify a row, it can never have a NULL value. Its always good to have a primary key in a table, preferably as id(but its not a rule, but a guideline)
Note: Only primary key field can have auto-increment option.

Index
A generic index is used when you are expecting a lot of search is expected on a column. Assume the column student_name. We need to search on student name quite often but it cannot be a primary key( two students might have same name). So we add an index to the column and make search on the column fast.

Unique 
Unique index is similar to primary key, except that it can accept NULL values. For example, if a students table has an examination roll number column, no two student have same roll number but if he has not registered for an exam yet, he wont have a roll number. So any column which is having a NOT NULL and UNIQUE index can be considered to a primary key.

Full Text
Its similar to generic index but is usually applies on text/string columns to enable search across whole string.

Foreign Key
Lets think as if we have not seen that index for now. Its used to define relationship between two tables. For example, the parent_id in students table need to be a value defined in parents table, otherwise it has no meaning. We will leave the details for time being

Data Storing

In MySQL, the data is stored in the hard-disk in the same order as your primary key. For any other index, it will created a another B-tree with leaf notes having pointers to actual physical memory at the leaf notes.

Multi-Column indexes

You can always have index across multiple columns, ie Suppose you have a student has a roll number in his particular class. So the roll number is unique in each class. There can be roll number 1 in each class, but there can not be same roll number in same class. So we can create a unique index on class and roll number.

Digging one step ahead. The order in which the columns are specified in multip;e-column index is very important. Let us take a same example. There is a separate attendance register for each class and student details are listed based on roll numbers. Now you want to find the student details of student in Grade 7 with roll number 18. Pick the 7th grade register and search quickly to roll number 17. 
Now you want to find the details of all students in in 8th grade, with 70% attendance. You can define your search register of class 8 here.
Now you want to find the first name in all the classes. Now your have to search through all classes(in database terms, a full table scan). If you had the index as roll number and class(with roll number one of all classes in a register), you could have done a quicker search. Hope you got the point.

View Indexes on a table

There are two common ways of viewing the indexes on a table

SHOW INDEX
mysql> show indexes from students;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+-------------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| students |          0 | PRIMARY  |            1 | id          | A         |        4 |     NULL | NULL   |      | BTREE      |         |               |
| students |          1 | name     |            1 | name        | A         |        4 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.04 sec)

Table - Table on which the index is applies
Non_Unique - whether the index is unique or not
Column_name - name of the column on which an index is applied, If a multi-column index is defined, it each column will be separately listed
Index_type - the type of index being applied.
Cardinality - Its the count of range of values the column is having. Higher the cardinality, better the search is

SHOW CREATE TABLE

mysql> show create table students \G
*************************** 1. row ***************************
       Table: students
Create Table: CREATE TABLE `students` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `class` varchar(10) DEFAULT NULL,
  `parent_id` int(10) DEFAULT NULL,
  `date_of_joining` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

The command is used to view how the table was created. It has the details of each column at the beginning. But you can focus on the bold text. It shows the different indexes on the table. So the format of index is 
<index_type> <index_name>(columns_indexed_as_csv)

EXPLAIN

MySQL provides you with an explain command to make sure wont choke the DB server that fast. "EXPLAIN" prefixed to a select query will show you the plan. 

mysql> explain select * from students where id < 3;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | students | range | PRIMARY       | PRIMARY | 4       | NULL |    2 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.04 sec)

Here the columns you need to be careful about are:
Possible_keys - it hows the list of keys that could make the search fast. As index on name doesnot help us when we are searching on id, its not a possible keys.
Key- The key which is being used from the set of possible values.
Rows - Number of rows expected to match the criteria(its an estimate, need not be accurate)
Remaining columns we will leave for the time being. 

Suppose, if its not using an index or if you think its using a wrong index or if rows matching is too large, the query can be expected to be time consuming and need to be avoided.

USE/FORCE INDEX

Be careful using these clauses.

So suppose, when you did the above query, you observed MySQL is using name index instead of primary key and resulting in more rows than expected. MySQL provides an option to guide and force your indexing logic.

You can use this
mysql> explain select * from students use index(name) where id < 3;
So a USE INDEX after table name helps MySQL, to determine the better keys

mysql> explain select * from students FORCE index(name) where id < 3;

FORCE INDEX after table name mandates to use the particular index. In this particular case, we forced a wrong index(index named name) and results in full table scan. So be extremely careful before using it. If you are not sure about index, let MySQL pick the index for you.

Saturday, January 26, 2013

SELECT Query


The word select according to dictionary is "Carefully choose as being the best or most suitable"Its so true in context of SQL too. A select query can have do a lot. If you talk to DB based programmer, he will be spending most of his time on getting proper select queries. Faster the query, better it is (obviously, it should provide you with right data).

The smallest possible select query is "SELECT 1;" while the lengthiest one can go any long. (I myself have seen 50+ line select query, which used 44 tables - and for obvious reason was never good).

Syntax

At first level a select syntax can be given as

SELECT <field_selections>
FROM <associated tables>
WHERE <conditions>
GROUP BY <grouping fields> HAVING <conditions>
ORDER BY <ordering fields>
LIMIT <limits>
Note: This order of clause is important

Lets consider these two tables for the discussion.






students

parents


id

id


name

name


class

contact_number


date_of_joining




parent_id








 In students table we have the details of the students. Parents table holds the details of the parents. A student can have parents details entered in the system. The parent_id in students table is mapped to parents table.

Lets try by writing how each part is used. Hope you still remember you first select query which used "select", "from" and "limit" clauses.

WHERE CLAUSE

Query 1: Get me the details of the student named  Paul Matt.
mysql> select * from students where name = 'Paul Matt';
+----+-----------+-------+-----------+---------------------+
| id | name      | class | parent_id | date_of_joining     |
+----+-----------+-------+-----------+---------------------+
|  1 | Paul Matt | 7     |      NULL | 2011-06-10 00:00:00 |
+----+-----------+-------+-----------+---------------------+
1 row in set (0.08 sec)

So here comes "WHERE" clause. A where clause is used to filter rows. name = 'Paul Matt' checks whether the name is "Paul Matt".

Query 2: Get me the details of the student named who joined after Jan 2008
mysql> select * from students where date_of_joining > '2008-01-01';
+----+-----------+-------+-----------+---------------------+
| id | name      | class | parent_id | date_of_joining     |
+----+-----------+-------+-----------+---------------------+
|  1 | Paul Matt | 7     |      NULL | 2011-06-10 00:00:00 |
|  2 | Liz Frank | 8     |      NULL | 2008-06-12 00:00:00 |
+----+-----------+-------+-----------+---------------------+
2 rows in set (0.00 sec)

Query 3: Get me the details of the student named who joined after Jan 2008 and is not in 8th grade
mysql> select * from students where date_of_joining > '2008-01-01' and class != '8';
+----+-----------+-------+-----------+---------------------+
| id | name      | class | parent_id | date_of_joining     |
+----+-----------+-------+-----------+---------------------+
|  2 | Liz Frank | 8     |      NULL | 2008-06-12 00:00:00 |
+----+-----------+-------+-----------+---------------------+
1 rows in set (0.00 sec)

Query 2 and 3 are self explanatory explanatory. Please go through them as each has special operators

Some more where clauses: 
All students in 7th, 8th or 9th grade - where class in ("7", "8", "9")  // same as (class = "7" or class ="8" or class="9")
All students whose parent details are not provided - where parent_id is null  // "is null" a special operator to check null values
All students whose name start with Mark - where name like 'Mark%' 

LIKE and NOT LIKE is used for matching the patterns. % matches 0 or more characters and _(underscore) matches exactly one character

Note: You need to make sure you escape any special characters like quotes, slashes, etc wherever needed.

Try yourself :
P1: All students' name who has parent details are entered and is in grade 8 and name ends with "Eddie", roll number is even and joined in 2009.

Now lets have more fun.

DISTINCT

Distinct is a keyword used to list unique result set. 
Query 4: Get all different name among students
mysql> select distinct name 
from students 

+-----------+

| name      |
+-----------+
| Paul Matt |
| Liz Frank |
+-----------+
2 rows in set (0.00 sec)
In this case, if there a second another Liz Frank, her name wont be listed. 
Note: distinct has to be specified before specifying any column. and it applies to all fields. If you have id field, as id is unique, might not do the desired

JOINS

Join clause is what makes most select queries complex. What is join? To be simple, its combining a table other tables to fetch all the details.

Query 5: Get me name of all students and their parents.
mysql> select parents.name, students.name 
from students 

join parents on parents.id =students.parent_id;

+--------+-----------+
| name   | name      |
+--------+-----------+
| Smith  | Paul Matt |
| Muller | Liz Frank |
+--------+-----------+
2 rows in set (0.00 sec)
The join clause combines students and parents tables using ON clause. ON clause is similar to WHERE clause except that defines relation between tables. If ON clause is not mentioned, it will join all students records against all parents records. 
Have you found any issue here? If the parents details is not present, the student details wont be listed. How to fix that.
mysql> select parents.name, students.name 
from students 

left join parents on parents.id =students.parent_id limit 2;

+--------+-----------+
| name   | name      |
+--------+-----------+
| Smith  | Paul Matt |
| null | Francy Linc|
+--------+-----------+
2 rows in set (0.00 sec)
Now we have added a LEFT infront of join. What does it mean? Yes, you got it right. All records in the left table(table mentioned first) will be listed even if it doesnt have a matching record in the right table based on the ON clause.

This query can re-written using a RIGHT JOIN also. Its so self-explanatory
mysql> select parents.name, students.name 
from parents 
right join students on parents.id =students.parent_id limit 2;
Note: If LEFT or RIGHT is mentioned, JOIN is considered as INNER JOIN
Its always advisable to use a LEFT or INNER JOIN based on what you need(MySQL converts RIGHT joins to LEFT join by re-structuring the query)

As we have reached this long, I will introduce the concept of ALIAS. 

mysql> select p.name as p_names.name as s_name
from students as s 

left join parents as p on p.id =s.parent_id limit 2;

+--------+-----------+
| p_name | s_name    |
+--------+-----------+
| Smith  | Paul Matt |
| null | Francy Linc|
+--------+-----------+
2 rows in set (0.00 sec)
If you check the query above you will see the difference. After table names, we have used an "AS <alias_name>". So instead of saying table names everywhere, you can use the alias name. As you proceed further, you will know how useful its going to be.

Now lets do a tricky one.
Query 6: Get me all siblings in the school.
The assumption we have to make is, for siblings parent_id will be same.
mysql> select s1.name, s2.name
from students as s1 
inner join students as s2 
on s1.parent_id = s2.parent_id and s1.id < s2.id 
where s1.parent_id is not null;
+-----------+------+
| name      | name |
+-----------+------+
| Paul Matt | Zony |
+-----------+------+
1 row in set (0.00 sec)

What have we done here? We are joining the students table with itself  such that two students have the same parent (s1.parent_id = s2.parent_id). s1.id < s2.id is used to make sure only once a row sibling is picked, otherwise you might get 3 rows as Paul Matt-Zony, Paul Matt-Paul Matt and Zony-Zony. And finally where clause make sure any student without parent details will be considered for listing( otherwise all students whose parent details are not entered will be siblings!!)

ORDER BY

It defines the order in which the data has to be listed. For example, you would like to get the student listing sorted by recently joined, and if two students joined on same day, first name should come first
mysql> select s.name, s.date_of_joining
from students as s
order by s.date_of_joining desc, s.name ascending 
Ordering can be either ascending(asc) or descending(desc). If ordering is not mentioned, its considered as ascending. (Ordering can be performed on any column from the table, any computed field in selected fields and any computed expression. You will see more of it in future)

LIMIT

MySQL has a non-sql standard for LIMIT clause. Limit is often followed by 1 or 2 numbers both being non-negative. Examples are:
LIMIT 10 - fetches the first 10 matching records (same as LIMIT 0,10)
LIMIT 20, 10 - fetches the 10 matching records starting from offset 21

GROUP BY... HAVING

All clauses so far pretty straight forward. Group by and having clauses are not understood by most people. GROUP BY is used as it says, groups multiple rows. Lets crack them with examples:

Query 7: Get me number of children each parent have
mysql> select p.name, count(*) 
from parents as p 
inner join students as s on s.parent_id = p.id 
group by p.id;
+--------+----------+
| name   | count(*) |
+--------+----------+
| Smith  |        2 |
| Muller |        1 |
+--------+----------+
2 rows in set (0.05 sec)

COUNT(*) is called an aggregate function(it is applies after grouping the together rows.) in MySQL and returns the count of matching rows. Here, we are grouping on p.id(parent_id) and joining the students table on parent_id fields. And we want to get the number of matching rows for the parent. 
Some of the other commonly used aggregate functions are - SUM, AVG, MIN, MAX.
For example, you want to get the when his child joined school, then MIN(date_of_joining) can be used.

You might think of what will happen if I dont use a GROUP BY clause while using aggregate function. It will search across matching rows considering it as single group.
mysql> select count(*) 
from parents as p 
inner join students as s on s.parent_id = p.id 
Will provide you number of students whose parent details are available.

Now we have the trickier part. HAVING - its very similar to WHERE, except that it is applied after aggregation.
Query 8: Get me number of children  each parent have if they have atleast two and atleast one joining before 2008
mysql> select p.name, count(*) 
from parents as p 
inner join students as s on s.parent_id = p.id 
group by p.id
having count(*)>=2 and min(date_of_joining)<'2008-01-01'
;
+--------+----------+
| name   | count(*) |
+--------+----------+
| Smith  |        2 |
| Steven |        3 |
+--------+----------+
2 rows in set (0.05 sec)
min(date_of_joining)<'2008-01-01'  make sure there is atleast one student joined before 2008.


More to try for yourself:
P2:
Get me the parents who have both kids of both gender(assume you have a gender column in students table)
Tip: count(distinct gender) 

P3:
Get me the count of students who have parent's details and who does not.


IMPORTANT NOTE

The execution of query always happens as 
1. FROM AND JOIN clauses
2. WHERE clause
3. GROUP BY 
4. HAVING
5. ORDER BY
6. LIMIT

If you have an order by and a group by, ORDER BY is applied only after grouping. 
mysql> select s.name, parent_id 
from students as s 
group by parent_id
order by s.id
So in the above query, you might not get the name of the first id , but it just picks first records it finds while grouping and sort the results on id.

Answers
P1:
mysql> select name from students where parent_id is not null and class = '8' and name like '%Eddie' and id%2=0 and date_of_joining >= '2009-01-01 00:00:00' and date_of_joining < '2010-01-01 00:00:00';
If you are interested explore between operator , " <field> between <value1> and <value2> "