MySQL for Newbies: 2013

Thursday, April 11, 2013

Php Script to load data to HIVE from MySQL tables

I used the below code to load the data from MySQL to hive with no transformation what so ever.

<?
$startTime = microtime(true);
$host = "localhost";
$user = "root";
$password = "";
$backupFolder = '/tmp/hivebackup';
$hiveRestoreFolder = '/tmp/hivebackup';

$mysqli = new mysqli($host, $user, $password);

/* check connection */
if ($mysqli->connect_errno) {
printf("Connect failed: %s\n", $mysqli->connect_error);
exit();
}

//tables to backup
$tablesArr = array(
"db.tbl1",
"db2.tbl2",
);

$fp = fopen("$backupFolder/restore.sql", "w");
$hiveql = "\ncreate database if not exists db1; ";
$hiveql.= "\ncreate database if not exists db2;";
fwrite($fp, $hiveql);

$tablesCompleted = 0;
//foreach table take backup
foreach($tablesArr as $table)
{
$tableStartTime = microtime(true);
print "\nProcessing $table";
@unlink("$backupFolder/$table.log");

//get the table desc
if ($result = $mysqli->query("DESC $table", MYSQLI_USE_RESULT))
{
$sql = "SELECT ";
$hiveql = "\n\nDROP table if exists $table;";
$hiveql .= "\nCREATE TABLE $table(";

while($col = $result->fetch_array())
{
$sql .= ' replace(replace(replace( `'.$col["Field"].'`, "\t", ""), "\n", ""),"\r", "") ,'; $hiveql .= "\n\t`". $col["Field"]. "` " .getHiveType($col["Type"]).",";
}
$sql = substr($sql,0, -1);
$hiveql = substr($hiveql,0, -1);

$sql .= " INTO outfile '$backupFolder/$table.log' from $table \n";
$hiveql .= "\n)\n".'ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LINES TERMINATED BY "\n"; ';
$hiveql .= "\nLOAD DATA INPATH '$hiveRestoreFolder/$table.log' INTO TABLE $table;";

$mysqli->query($sql);
fwrite($fp, $hiveql);

$tablesCompleted++;
}
print "\nBackup taken for $table in ". number_format(microtime(true) - $tableStartTime, 2). " sec";
}
fclose($fp);
print "\n$tablesCompleted tables are backed up in ".number_format(microtime(true) - $startTime, 2). " sec";

print "\n\t\t===== ===== ===== ===== ===== Done ===== ===== ===== ===== =====\n";

function getHiveType($mysqlType)
{
$mysqlType = explode('(', $mysqlType);
$mysqlType = $mysqlType[0];

switch(strtoupper($mysqlType))
{
case 'TINYINT' : return 'TINYINT';
case 'SMALLINT' : return 'SMALLINT';
case 'INT' : return 'INT';
case 'BIGINT' : return 'BIGINT';
case 'FLOAT' : return 'FLOAT';
case 'DECIMAL' : return 'FLOAT';
case 'DOUBLE' : return 'DOUBLE';
case 'BOOLEAN' : return 'BOOLEAN';
case 'TIMESTAMP': return 'TIMESTAMP';
case 'DATETIME' : return 'TIMESTAMP';
default : return 'STRING';

}
}
?>

## Now execute the php file.
$ php myfile.php

# restore from the file you have created
$ hive -f /tmp/hivebackup/restore.sql

Note: I needed only selective tables for my purpose, to load whole DB can find all table or tables based on some condition, SHOW TABLES FROM DB1 like 'xx%' can be used

Saturday, February 23, 2013

More On Select Queries

We have gone thru Select queries fundamentals earlier. You can revisit the same again here. Its time to dig deeper into more what you can do with Select.

Lets try to refresh with a query.

Query 1: To get the name of all parents who has two or more girls in the school.

select p.name
from parents as p
inner join students as s on s.id = parent_id
where gender = 'F'
group by p.id having count(*)>2;

Lets now see select query inside query - sub-queries

Query 2: Find the list of students who has no parent details in the system
There are two approaches - The common approach is using sub-query with not in clause

select s.name
from students as s
where s.parent_id not in ( select id from parents)
Query is like first identifying the all rows from students table and checking with list of all the parent-ids to identify the required records. But if you analyse more, the sub-query is ran for each matching row. And it can be done better with join

select s.name
from students as s
left join parents as p on p.id = s.parent_id
where p.id is null
For those who the query is confusing, ignore the where clause first.It will list all the students and gets parent ids, if they have on (we are using left join). Now we add the where clause, its saying to ignore if parent id is mapped. Now the time take for the query will be lesser. But there can be cases where the sub-queries can be useful.

Just for the sake of knowledge, I will show you another query - but its heavy and I would say, never ever use it.
Query 3 : List all students with number of students from the family currently studying in the school

select s.name,
(select count(*) from students as s1 where s.id = s1.id) as students_from_family
from students as s
Here again the select sub-query is ran for each matching row.This could again be better executed using a JOIN query.
select s.name, count(s1.id)
from students as s
left join students as s1 on s1.parent_id = s.parent_id
group by s.id

Now lets look into another type of query - derived-query. It uses a select query inside FROM or JOIN clause.
Query 4: To get the count of parents and students where parent_id is greater than 4

select s.name, count(s.id), p.name
from students as s
left join (select id, name from parents as p where p.id <4) as p
on p.id = s.parent_id
We can again get the result without join but its good to keep all these weapon with you. If you do an explain, you will notice there is a Temporary table creation involved here(row 2)

+----+-------------+------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | PRIMARY | s | ALL | NULL | NULL | NULL | NULL | 5 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | |
| 2 | DERIVED | p | range | PRIMARY | PRIMARY | 4 | NULL | 2 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+-------------+

I would rather suggest to create your own Temporary Table, execute the query by joining your tables and finally dropping the Temporary Table.

create temporary table temp_parents(
id int(11) not null,
name varchar(30),
index(id)
)
select id, name
from parents
where id > 4;

select s.name, count(s.id), p.name
from students as s
left join temp_parents as p
on p.id = s.parent_id;

delete temporary table temp_parents;

The advantage of creating your own temp table is, we can define your own indexes and optimize the query. Also as your join query is ran on temp table, the original table huge table wont be locked during your query.

Note: Temp tables exists only with in session - ie same table name but with different content can exists across 2 session and will expire soon after the session is closed
In the above create table query, we have combined both CREATE and INSERT into single query. See more options of CREATE TABLE here

These Temp table is approach is useful especially when you parent table is huge (with millions of records) and your join clause needs to match only few thousands rows from the huge table.

Thursday, January 31, 2013

Common Functions and Expressions

Lets now peep into common function in MySQL and probably when they are uses.

Comparison Operators

Operator	Operation	Usage	Comments
=	Equal	a=b	true if equal, else false
!= or <>	Not equal	a!=b or a<>b	true if not equal
<	Less than
<=	Less than or equal
>	Greater than
>=	Greater than or equal
IN	equal to any one in list	a in (1,2,3)	true if a=1 or a=2 or a=3
LIKE	string matching	matches the pattern	strings starting with C followed by a character, then d followed by any number of characters
BETWEEN … AND	searches in a range	a between 5 and 10

More if interested at : http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html

Logical Operations

Operator	Operation	Usage
AND , &&	Logical AND two expressions	1=2 AND 5>6
OR , \|\|	Logical OR
NOT, !	Negative check	NOT a and b !c
XOR	Logical XOR

Control Functions

if - if(expr_to_check, true_expr, false_expr)

eg:

if (5 > 3, "yes", "no") => returns "yes"

ifnull - ifnull(expr_to_check, expr_if_first_expr_empty)

eg:

if(5, 10) => returns 5

if(5 and NULL, "was null") => returns "was null" (5 AND NULL evaluates to null)

case .. when

This is similar to switch..case in programming languages

eg:

case 1+3

when 4-2 then "4-2"

when 6-2 then "6-2"

when 4 then "4"

else "else"

end

=> return "6-2" (it wont go to case 4 as, case 6-2 already matched)

Common Functions

concat

eg: concat( "hello", " ", "world") => returns "hello world"

trim, rtrim, ltrim - Trim white spaces in a string

eg: trim(" a ")

date_format

eg: date_format(now(), '%Y-%m-%d') => returns "2013-01-31"

adddate / subdate

eg: adddate('2012-01-20', interval 5 day) => return '2012-01-25 00:00:00'

interval is a keyword followed by no:of units and unit to operate

cast/convert - for type casting

eg: cast("124as" as unsigned) / convert("124as", unsigned) => returns 124

You might have noticed by now various default MySQL type casting happening here.

Some Tips

If you are having some indexes, to use the advantage of it, do not use it with function. For example convert(id as char)= "10" will not make use of the index on id
While using like, if search pattern starts with %, the index could not be used.
concat(col1, col2) returns null if either one of them is null
MySQL does internal type casting, but could be touch slower(as indexes might not be utilized) if types does not match

By now we have seen just the basics of running select queries, modifying data and some basic DB schema design. Its time to move to phase 2 where we will get deeper into everything we have seen so far.

Keep reading :)

Normalization

Now comes a lot of rules and constraints, we call it Normalization. It is again applied at various level and are numbered as below

1NF

The first normal form asks the attributes to be atomic - means a column should contain only one piece of information. We should not save the say employee ID card number and employee number in single column separated by some separator. The schema we have is already in 1NF

2NF

The 2NF is a rule on primary key. No column should be dependent on subset of primary key. This will be maintained in most cased with the help of auto-increment id acting as primary key

3NF

The 3NF says, all the values in a table should be fully dependent on the primary key only. Or in other words, a column in a table should not be obtained based another column(s) in the same table. So in our case employee_id/card_number in swipe details and attendance register are mutually dependent. If you know one, you can get the other from employee entity. So we will have to remove employee id or card number. Employee id being the key identifier for employee table, I would prefer to save employee id over card number(also card number is string which occupies more space and searching is slower compared to searching numbers).

When I started career, I always tried to do normalization. But now I understand, these are guidelines and not rules. More you learn the database and your application, you will find an excuse not to follow these. If you are interested, read more on the same.

Wednesday, January 30, 2013

Designing Your First Table

Till now, I have been using the schema someone before you have created. Its time to do a simple DB design for yourself. Lets suppose we have to make a simple DB design for an employee attendance tracking application. So application need to save the details of employee and daily attendance timing.

While DB designing anything, the following steps are usually

Understand the system
Identify the key entities and what all information need to be saved
Define the first schema

Try with other possible structures if required
Re-define based on expected usage and behaviour of application

Add indexes based on expected usage

Understanding the System

We can assume that the attendance is tracked using swiping machines which are located at the entrance and the exit. An employee can swipe in and swipe out any number of times on a given day. A users first swipe-in and last swipe-out will be used to make sure he was present. He can be present(8+hours), half day(4-8hours) or leave based on his availability at office on a given day.

Identifying the key entities

So here most important entity is employee. Each employee has name, email, gender, and ID card number (Lets bother only about these many information as of now). Now we have swipe in and out. There are two approaches here - 1. considering swiping in/out as entity or 2. attendance itself as an entity. Lets us consider both cases, we are saving all swipe ins and outs plus direct attendance information. If swipe-in and swipe-out will have a time associated with it. Attendance will have first swipe-in and last-swipe out along with derived information of hours at office and his presence type.

Defining First Schema

Lets define for each entity the attributes possible and their data types

Employees

id - Int - Auto-increment id generated in the system(internally we will refer any employee using id)

name - string - user name is string with max size as say 100 characters

ID card-number - string - Each employee has a unique card number, but once the employee exits the company the same card number can be allocated to a different employee

gender - enum - an employee can be male or female(we can assume be default employees are female)

email - string - optional and can have max of 50 characters

Swipe Details

id - Int - Auto-increment id generated in the system

time - timestamp, we can gracefully assume that time at which the entry is made is same as swipe time. (Default value has to be now())

swipe_action - enum - swipe in or swipe out

card_number - string - card number mentioned in the card

employee_id - employee id for whom the card belongs (captured from employee table)

Attendance Register

id - Int - Auto-increment id generated in the system

swipe_in_time - timestamp, we can gracefully assume that time at which the entry is made is same as swipe-in time. (Default value has to be now())

swipe_out_time - timestamp - time of last swipe-out

card_number - string - card number mentioned in the card

employee_id - employee id for whom the card belongs (captured from employee table)

time_at_office - time - time for which he was present

attendance_type - enum - can be present, half day or leave

The need to maintain attendance register alone and/or swipe details alone depends on you need. Attendance register cannot give all swipe information, but if you record all swipes, you may look at as junk data. If you look other way round, if you want to store all swipes, attendance_register is redundant data; again based on your application, it might be good to have a register for quick report generation.

Trade-Offs In Data Type & Column

Integer, String and float number selection are quite obvious if you are familiar with some programming language and I dont want to spend much time on the same.
Enum Vs TinyInt - You can define an Enum which can also be achieved using integers. If you use integers, it lack readability and also all applications using that column should have a mapping for what each number stands for. But if you use number, its easy to add a new value(To allow a new value to an existing ENUM field, you have to alter table, which might be time consuming if its a big table). In this case itself, suppose an employee came to office on a weekend and left in 3 hours, he will marked as leave and it was decided to introduce a new status saying "holiday" so that.
Datetime Vs Timestamp - As said in earlier post, timestamp is internally saved as a number and is always better than a string base datetime.
Text Vs Blob - Both fields saves long text. For now you can take it like this - searchable text field refers to text and in other cases blob(blobs usually used for binary data chunk)
Employee Id Vs Card number in attendance register - The question is strictly based on you load. If you have such a huge employees list, on each swipe, it might not be possible to get the employee id from employees table(as it might result in too many hits on you employees table also). If you are expecting huge reports on attendance, you can think of some fetching and updating employee id once in a day to make reports faster. So its all based on your application you have to make the decision.

Some common fields I recommend

As someone said, "only change is permanent", so is you application. It always good to leave space for future when u design today.

id - Usually for tables we can go for unsigned int. If you are expecting huge data, bigint would be good.(1billion records in DB not that often, so I would say int)
Audit fields - In any table, I would always recommend these 4 fields - created_on, created_by, last_modified_on, last_modified_by(wherever possible). That is quick reference log, even if you have audit logs, these are good to have
status - In previous post on delete, I said never to DELETE, but to use a status field which is usually by default 1(active). I would recommend it to be tinyint rather than boolean - again for purpose.
additional_info - This column is not available in all my DBs, but I would use it whenever there is a space of some additional info coming. Assume in our employee table we need to add blood group today and then comes request for hobbies next day, then home town and so on. If I am sure, these are just some readonly data and no search will be done on these details(as indexing wont work in this case), I would json_encode and dump the data into that column. This is violation of 1NF and recommended only if you senior approves this :)

Note: Naming convention for MySQL tables and columns is usually words separated by underscore(my_table, column_name_1) but camel case is also good

Selection Of Indexes

Which all columns should be indexed. In normal case, I would say any column you are expecting a search, add an index. By searching, I include the columns used for join.

I would make any column(s) expected to be unique in the system, be indexed as unique and reduce the pain on application for checking if it already exists or not. Here I would recommend employee card number to be unique in employees table. Also adding a column of type date to attendance_register and making it unique along with employee_id would be good(so one record per employee per day)

Note:

Indexing on columns like status might not actually serve the purpose. If the search yields more than 30% of data, mysql prefers full table scan to using the index usage

If there are two indexes on column_1 and column_2 and if the search is on both 2 columns, mysql will use one of these or both indexes based on necessity

Now lets see what we have come up


employees	swipe_details	attendance_register
id	id	id
name	swiped_on	swiped_in
card_number	swipe_action	swipe_out
gender	card_number	card_number
email	employee_id	employee_id
status		date
created_on		time_at_office
created_by		attendance_type
modified_on
modified_by
additional_info

Query

So lets now write our first create table query:

create table employees(

id int(11) unsigned not null auto_increment,

name varchar(100) not null,

card_number varchar(50) not null,

gender enum('Male', 'Female') default 'Female',

email varchar(100),

status tinyint(3) unsigned not null default 1,

created_on timestamp default current_timestamp,

created_by int not null default -1,

modified_on timestamp null,

modified_by int null,

additional_info text,

primary key (id),

index indx_name(name),

unique index indx_card_number(card_number)

);

For the sake of explanation, I would give a brief explanation

Generic Syntax for column specification:

<column_name> <data_type[(<size allowed>)]> [[NOT ]NULL] [DEFAULT <value>]

column_name can be alphanumberic starting with non digit(if you give column name inside ``, you can infact give anything as column name- keywords, quotes, slashes, so be careful

datatype and size defines how much memory it takes. For int fields, size is for only display purpose, doesnot effect actual memory

null - means the column can have null values

default - this defines the default value

Generic Syntax for index specification

<index_type> [index_name](<col1>[, col2[,col3..]])

Now try to create other table for yourself. Feel free to make changes based on the application you can think of.

Note: I have purposefully removed concept of normalization from here feeling its too much for first table and I have got into habit of violating it :)

Sunday, January 27, 2013

Writing Data to Tables

So far we have been reading data from existing database. Its time to start how the data came into the tables.

INSERT

Insert, as the name indicates adds new records to a table. The most common and simple syntax is

INSERT INTO <table_names>

[(col1[,col2 [,col3..]])]

VALUES

(value1[,value2 [,value3..]])

[,(value1[,value2 [,value3..]])]

;

Lets looks a real example.

mysql> insert into students

(name, class, date_of_joining) values

("Prior", "8", now());

Query OK, 1 row affected (0.06 sec)

mysql> select last_insert_id();

+------------------+

| last_insert_id() |

+------------------+

| 5 |

+------------------+

1 row in set (0.05 sec)

mysql> select * from students where id = 5;

mysql> select * from students order by id desc limit 1;

+----+-------+-------+-----------+---------------------+

+----+-------+-------+-----------+---------------------+

| 5 | Prior | 8 | NULL | 2013-01-27 21:48:20 |

+----+-------+-------+-----------+---------------------+

1 row in set (0.00 sec)

So lets analyse it. First line says, insert into students so we are selecting the table. Now we specify the columns to which we are going to update the data. Then after the VALUES keyword, we specify all the column values. You can insert multiple rows also using a single insert query.

Note : You might have noticed now() being used for date_of_joining. It returns the current timestamp.

In the query we have not passed the id, and the MySQL has auto generated the value for id based on the current max id in the table. So to retrieve the that id, we use the special function last_insert_id().

Note: The function is not related to a table but to a connection, ie irrespective of the table and database, it returns the last id it has created.

Third query is quite obvious, verifying the data.

UPDATE

Now, Prior got promoted to 9th grade. So basically we have to the class for him.

mysql> update students

set class="9", date_of_joining=now()

where id = 5;

Query OK, 1 row affected (0.09 sec)

Rows matched: 1 Changed: 1 Warnings: 0

mysql> select * from students where id = 5;

+----+-------+-------+-----------+---------------------+

+----+-------+-------+-----------+---------------------+

| 5 | Prior | 9 | NULL | 2013-01-27 21:48:20 |

+----+-------+-------+-----------+---------------------+

1 row in set (0.00 sec)

The UPDATE query is similar to SELECT query as it has table selection(can select multiple table using index) and has a where clause. An update cannot have GROUP by clause.

So here again I am breaking the query to 3 parts. UPDATE students, select the table to apply the update statement. Then comes the SET keyword, followed by columns to be updated. A comma separated list of "columns=value" can be given. Followed by optional WHERE clause.

You can use an update on multiple tables as well.

mysql> update students as s

left join parents as p on p.name = "Smith"

set class="9", date_of_joining=now(), s.parent_id = p.id

where s.id = 5;

Query OK, 1 row affected (0.09 sec)

Rows matched: 1 Changed: 1 Warnings: 0

Hope you can understand this query. We are updating the parent details of Prior with Smith.

Before running an update query, its always good to run a select query to identify the updating rows. Its simple to do.

Step 1: Replace "update" with "select * from"

Step 2: Remove text from "set" till where.

So the above query will look like

mysql> select * from students as s

left join parents as p on p.name = "Smith"

where s.id = 5;

Query OK, 1 row affected (0.09 sec)

Rows matched: 1 Changed: 1 Warnings: 0

Its even better, if you do an explain also before running the select query also.

DELETE

On any given day, I would say no deletion of data from your database. If you dont need the data, add a column status/active/enabled(or whatever u like) and set the value accordingly, effectively using only an update query.

For sake of information, deletion can be done as:

Prior has completed the course and leaving school now. So, we would like to remove his information.

mysql> delete from students where id = 5;

Query OK, 1 row affected (0.01 sec)

mysql> select * from students where id = 5;

Empty set (0.00 sec)

Its exactly same as SELECT, except that it cannot use GROUP BY.

We can also delete from only specific tables when you are using join.

mysql> delete s.*

from students as s

INNER join parents as p on s.parent_id = p.id

where s.id = 5;

This will select records from both students and parents but will delete only from s.*(students table).By default all tables will be cleared.

Right now we have seen only the basic INSERT, UPDATE and DELETE queries. More complex things are yet to come and we will explore it later.

Indexes

What is an Index?

Consider a dictionary in which words are not arranged in alphabetic order and you want to search for a word. There is no easy way to get the find the word. Same is true about database. Most common method of index is B-Tree(with leaf nodes having the data and intermediate nodes for navigation, read more if you are interested, but not relevant for proceeding with MySQL). Any index basically makes searching of data fast. So if you want to search for an id = 5, it searches through the B-tree(or whatever indexing data-structure is) and gives result without scanning whole table to identify which record has id = 5.

Types of Indexes

There are 4 major types of indexes in MySQL.

Primary

Primary index or primary key is value that can uniquely identify a row in a table. In a table there can be only one primary key. Usually id is the field used as the primary key for the table. Most databases has an option of making the primary key auto_increment field. So you dont have to bother about what is the next value of the id while adding a new record, MySQL will take care of it by itself. As a primary key need to uniquely identify a row, it can never have a NULL value. Its always good to have a primary key in a table, preferably as id(but its not a rule, but a guideline)

Note: Only primary key field can have auto-increment option.

Index

A generic index is used when you are expecting a lot of search is expected on a column. Assume the column student_name. We need to search on student name quite often but it cannot be a primary key( two students might have same name). So we add an index to the column and make search on the column fast.

Unique

Unique index is similar to primary key, except that it can accept NULL values. For example, if a students table has an examination roll number column, no two student have same roll number but if he has not registered for an exam yet, he wont have a roll number. So any column which is having a NOT NULL and UNIQUE index can be considered to a primary key.

Full Text

Its similar to generic index but is usually applies on text/string columns to enable search across whole string.

Foreign Key

Lets think as if we have not seen that index for now. Its used to define relationship between two tables. For example, the parent_id in students table need to be a value defined in parents table, otherwise it has no meaning. We will leave the details for time being

Data Storing

In MySQL, the data is stored in the hard-disk in the same order as your primary key. For any other index, it will created a another B-tree with leaf notes having pointers to actual physical memory at the leaf notes.

Multi-Column indexes

You can always have index across multiple columns, ie Suppose you have a student has a roll number in his particular class. So the roll number is unique in each class. There can be roll number 1 in each class, but there can not be same roll number in same class. So we can create a unique index on class and roll number.

Digging one step ahead. The order in which the columns are specified in multip;e-column index is very important. Let us take a same example. There is a separate attendance register for each class and student details are listed based on roll numbers. Now you want to find the student details of student in Grade 7 with roll number 18. Pick the 7th grade register and search quickly to roll number 17.

Now you want to find the details of all students in in 8th grade, with 70% attendance. You can define your search register of class 8 here.

Now you want to find the first name in all the classes. Now your have to search through all classes(in database terms, a full table scan). If you had the index as roll number and class(with roll number one of all classes in a register), you could have done a quicker search. Hope you got the point.

View Indexes on a table

There are two common ways of viewing the indexes on a table

SHOW INDEX

mysql> show indexes from students;

+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+-------------+

+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| students | 0 | PRIMARY | 1 | id | A | 4 | NULL | NULL | | BTREE | | |

+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

2 rows in set (0.04 sec)

Table - Table on which the index is applies

Non_Unique - whether the index is unique or not

Column_name - name of the column on which an index is applied, If a multi-column index is defined, it each column will be separately listed

Index_type - the type of index being applied.

Cardinality - Its the count of range of values the column is having. Higher the cardinality, better the search is

SHOW CREATE TABLE

mysql> show create table students \G

*************************** 1. row ***************************

Table: students

Create Table: CREATE TABLE `students` (

`id` int(10) NOT NULL AUTO_INCREMENT,

`name` varchar(50) NOT NULL,

`class` varchar(10) DEFAULT NULL,

`parent_id` int(10) DEFAULT NULL,

`date_of_joining` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,

PRIMARY KEY (`id`),

KEY `name` (`name`)

) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

The command is used to view how the table was created. It has the details of each column at the beginning. But you can focus on the bold text. It shows the different indexes on the table. So the format of index is

<index_type> <index_name>(columns_indexed_as_csv)

EXPLAIN

MySQL provides you with an explain command to make sure wont choke the DB server that fast. "EXPLAIN" prefixed to a select query will show you the plan.

Here the columns you need to be careful about are:

Possible_keys - it hows the list of keys that could make the search fast. As index on name doesnot help us when we are searching on id, its not a possible keys.

Key- The key which is being used from the set of possible values.

Rows - Number of rows expected to match the criteria(its an estimate, need not be accurate)

Remaining columns we will leave for the time being.

Suppose, if its not using an index or if you think its using a wrong index or if rows matching is too large, the query can be expected to be time consuming and need to be avoided.

USE/FORCE INDEX

Be careful using these clauses.

So suppose, when you did the above query, you observed MySQL is using name index instead of primary key and resulting in more rows than expected. MySQL provides an option to guide and force your indexing logic.

You can use this

mysql> explain select * from students use index(name) where id < 3;

So a USE INDEX after table name helps MySQL, to determine the better keys

mysql> explain select * from students FORCE index(name) where id < 3;

FORCE INDEX after table name mandates to use the particular index. In this particular case, we forced a wrong index(index named name) and results in full table scan. So be extremely careful before using it. If you are not sure about index, let MySQL pick the index for you.