Sunday, January 27, 2013

Indexes

What is an Index?

Consider a dictionary in which words are not arranged in alphabetic order and you want to search for a word. There is no easy way to get the find the word. Same is true about database. Most common method of index is B-Tree(with leaf nodes having the data and intermediate nodes for navigation, read more if you are interested, but not relevant for proceeding with MySQL). Any index basically makes searching of data fast. So if you want to search for an id = 5, it searches through the B-tree(or whatever indexing data-structure is) and gives result without scanning whole table to identify which record has id = 5.

Types of Indexes

There are 4 major types of indexes in MySQL. 
Primary  
Primary index or primary key is value that can uniquely identify a row in a table. In a table there can be only one primary key. Usually id is the field used as the primary key for the table. Most databases has an option of making the primary key auto_increment field. So you dont have to bother about what is the next value of the id while adding a new record, MySQL will take care of it by itself. As a primary key need to uniquely identify a row, it can never have a NULL value. Its always good to have a primary key in a table, preferably as id(but its not a rule, but a guideline)
Note: Only primary key field can have auto-increment option.

Index
A generic index is used when you are expecting a lot of search is expected on a column. Assume the column student_name. We need to search on student name quite often but it cannot be a primary key( two students might have same name). So we add an index to the column and make search on the column fast.

Unique 
Unique index is similar to primary key, except that it can accept NULL values. For example, if a students table has an examination roll number column, no two student have same roll number but if he has not registered for an exam yet, he wont have a roll number. So any column which is having a NOT NULL and UNIQUE index can be considered to a primary key.

Full Text
Its similar to generic index but is usually applies on text/string columns to enable search across whole string.

Foreign Key
Lets think as if we have not seen that index for now. Its used to define relationship between two tables. For example, the parent_id in students table need to be a value defined in parents table, otherwise it has no meaning. We will leave the details for time being

Data Storing

In MySQL, the data is stored in the hard-disk in the same order as your primary key. For any other index, it will created a another B-tree with leaf notes having pointers to actual physical memory at the leaf notes.

Multi-Column indexes

You can always have index across multiple columns, ie Suppose you have a student has a roll number in his particular class. So the roll number is unique in each class. There can be roll number 1 in each class, but there can not be same roll number in same class. So we can create a unique index on class and roll number.

Digging one step ahead. The order in which the columns are specified in multip;e-column index is very important. Let us take a same example. There is a separate attendance register for each class and student details are listed based on roll numbers. Now you want to find the student details of student in Grade 7 with roll number 18. Pick the 7th grade register and search quickly to roll number 17. 
Now you want to find the details of all students in in 8th grade, with 70% attendance. You can define your search register of class 8 here.
Now you want to find the first name in all the classes. Now your have to search through all classes(in database terms, a full table scan). If you had the index as roll number and class(with roll number one of all classes in a register), you could have done a quicker search. Hope you got the point.

View Indexes on a table

There are two common ways of viewing the indexes on a table

SHOW INDEX
mysql> show indexes from students;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+-------------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| students |          0 | PRIMARY  |            1 | id          | A         |        4 |     NULL | NULL   |      | BTREE      |         |               |
| students |          1 | name     |            1 | name        | A         |        4 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.04 sec)

Table - Table on which the index is applies
Non_Unique - whether the index is unique or not
Column_name - name of the column on which an index is applied, If a multi-column index is defined, it each column will be separately listed
Index_type - the type of index being applied.
Cardinality - Its the count of range of values the column is having. Higher the cardinality, better the search is

SHOW CREATE TABLE

mysql> show create table students \G
*************************** 1. row ***************************
       Table: students
Create Table: CREATE TABLE `students` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `class` varchar(10) DEFAULT NULL,
  `parent_id` int(10) DEFAULT NULL,
  `date_of_joining` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

The command is used to view how the table was created. It has the details of each column at the beginning. But you can focus on the bold text. It shows the different indexes on the table. So the format of index is 
<index_type> <index_name>(columns_indexed_as_csv)

EXPLAIN

MySQL provides you with an explain command to make sure wont choke the DB server that fast. "EXPLAIN" prefixed to a select query will show you the plan. 

mysql> explain select * from students where id < 3;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | students | range | PRIMARY       | PRIMARY | 4       | NULL |    2 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.04 sec)

Here the columns you need to be careful about are:
Possible_keys - it hows the list of keys that could make the search fast. As index on name doesnot help us when we are searching on id, its not a possible keys.
Key- The key which is being used from the set of possible values.
Rows - Number of rows expected to match the criteria(its an estimate, need not be accurate)
Remaining columns we will leave for the time being. 

Suppose, if its not using an index or if you think its using a wrong index or if rows matching is too large, the query can be expected to be time consuming and need to be avoided.

USE/FORCE INDEX

Be careful using these clauses.

So suppose, when you did the above query, you observed MySQL is using name index instead of primary key and resulting in more rows than expected. MySQL provides an option to guide and force your indexing logic.

You can use this
mysql> explain select * from students use index(name) where id < 3;
So a USE INDEX after table name helps MySQL, to determine the better keys

mysql> explain select * from students FORCE index(name) where id < 3;

FORCE INDEX after table name mandates to use the particular index. In this particular case, we forced a wrong index(index named name) and results in full table scan. So be extremely careful before using it. If you are not sure about index, let MySQL pick the index for you.

No comments:

Post a Comment