In software development, whatever the language, the relevant fields are indexed in advance. Anyone learning Python well also needs a solid understanding of indexes, so let's have a look!

1、 Index introduction, methods, and types

① Introduction

An index helps users quickly find the content they need. In MySQL an index is also called a "key"; it is a data structure that the storage engine uses to find records quickly. Indexes can greatly improve query efficiency. Especially when the amount of data is very large and a query involves multiple tables, using indexes can often speed up queries by a factor of thousands.

Summary:

The purpose of an index is to improve query efficiency, just like the table of contents we use to look something up in a book: first find the chapter, then the section within the chapter, then the page number. Similar examples include looking up a word in a dictionary or a place on a map.

The essence:

An index filters out the final desired result by continually narrowing the range of data to examine, and it turns random events into sequential ones. In other words, with an indexing mechanism we can always locate data in the same way.

② Index method: MySQL's B+Tree index

Index values are stored in a tree-shaped data structure according to a certain algorithm.

Background concept: the B-Tree data structure

Structure: each B-Tree node is a two-element array [key, data], and every node can store data. key is the index key; data is the rest of the record apart from the key.

Search principle

Starting at the root node, do a binary search within the node. If the key is found, return its data; otherwise recursively search the node pointed to by the pointer for the matching interval, until the key is found or a null pointer is reached, which means the key does not exist.

Disadvantages

Inserting or deleting records breaks the B-Tree properties, so each insert or delete requires splitting, merging, or transferring nodes to restore them, which causes frequent I/O.

A range search may need to return to upper-level nodes and traverse them repeatedly, which makes the I/O cumbersome.
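The search principle above can be sketched in a few lines of Python. This is only an in-memory illustration under simplifying assumptions, not MySQL's on-disk implementation: the names `BTreeNode` and `btree_search` are made up for this example, each node keeps a sorted list of keys, and the loop replaces the recursion described in the text.

```python
from bisect import bisect_left

class BTreeNode:
    """Hypothetical B-Tree node: sorted keys, one data item per key, child pointers."""
    def __init__(self, keys, data, children=None):
        self.keys = keys                  # sorted index keys
        self.data = data                  # data[i] belongs to keys[i]
        self.children = children or []    # len(keys) + 1 children, or [] for a leaf

def btree_search(node, key):
    """Return the data stored under key, or None if the key is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)   # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return node.data[i]           # found: any node may hold data
        if not node.children:             # leaf reached without a match
            return None
        node = node.children[i]           # descend into the matching interval
    return None

# A tiny two-level tree: root keys 20 and 40 split the key space into three intervals.
root = BTreeNode(
    keys=[20, 40],
    data=["row20", "row40"],
    children=[
        BTreeNode([5, 10], ["row5", "row10"]),
        BTreeNode([25, 30], ["row25", "row30"]),
        BTreeNode([45, 50], ["row45", "row50"]),
    ],
)

print(btree_search(root, 25))
print(btree_search(root, 26))
```

Looking up 25 descends from the root into the middle child; looking up 26 follows the same path and ends at a leaf without a match.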

Background concept: B+Tree (a B-Tree variant)

Structure: compared with the B-Tree, the B+Tree differs as follows: non-leaf nodes do not store data, only index keys; only leaf nodes store data.


B+Tree in MySQL

Structure: MySQL optimizes the classic B+Tree by adding sequential access pointers. Each leaf node gets a pointer to the adjacent leaf node, forming a B+Tree with sequential access pointers. This improves range access performance: to query all records with key from 18 to 49, once 18 is found, all matching data nodes can be reached in one pass by following the nodes and pointers in order, which greatly improves the efficiency of range queries (there is no need to return to the parent node and search again, so fewer I/O operations are needed).
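The 18-to-49 range query can be illustrated with a chain of linked leaves. This is a minimal sketch, assuming the leaves are already built and linked; `Leaf`, `link`, and `range_scan` are hypothetical names, and the descent from the root that would normally locate the starting leaf is left out.

```python
class Leaf:
    """Hypothetical B+Tree leaf: sorted keys with their rows, plus a next pointer."""
    def __init__(self, keys, rows):
        self.keys = keys
        self.rows = rows
        self.next = None   # sequential access pointer to the adjacent leaf

def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(start_leaf, low, high):
    """Yield rows with low <= key <= high by walking the leaf chain."""
    leaf = start_leaf
    while leaf is not None:
        for key, row in zip(leaf.keys, leaf.rows):
            if key > high:
                return             # past the upper bound: stop scanning
            if key >= low:
                yield row
        leaf = leaf.next           # follow the pointer; no trip back to the parent

first = link([
    Leaf([10, 15], ["r10", "r15"]),
    Leaf([18, 30], ["r18", "r30"]),
    Leaf([35, 49], ["r35", "r49"]),
    Leaf([50, 65], ["r50", "r65"]),
])

# A real tree would descend from the root to the leaf holding 18;
# starting from the first leaf keeps the sketch short.
result = list(range_scan(first, 18, 49))
print(result)
```

The scan never revisits an upper-level node: once it is on the leaf level it only moves sideways, which is the saving the paragraph above describes.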

Why MySQL chooses B+Tree for indexes

An index is itself large and cannot be held entirely in memory, so indexes are usually stored on disk as index files. An index lookup therefore incurs disk I/O, which costs several orders of magnitude more than a memory access, so the index structure should minimize the number of disk I/O accesses during a search in order to improve efficiency.

MyISAM and InnoDB both use the B+Tree index structure, but their underlying index storage differs: MyISAM uses nonclustered indexes, while InnoDB uses a clustered index.



③ Index method: HASH index

A hash is a set of key-value pairs (key => value). Several keys may map to the same value, but one key cannot map to several values. Creating a hash index on one or more columns computes a hash value from the values of those columns, and that hash value maps to one or more rows. A hash index locates the data in a single step, without descending level by level as a tree index does, so point lookups are very efficient.

④ HASH compared with B+TREE

hash index: fast for single-row lookups, slow for range queries

btree index: a B+ tree; the more levels it has, the more data it holds; both range queries and random lookups are fast (the default index type in InnoDB)
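The trade-off between the two index methods can be seen with plain Python containers. This is a rough analogy rather than how MySQL stores either index: a `dict` stands in for the hash index, a sorted list searched with `bisect` stands in for the ordered B+ tree, and the sample rows and function names are invented for the example.

```python
from bisect import bisect_left, bisect_right

rows = {1: "alex", 2: "bob", 3: "carol", 4: "alex", 5: "dave"}  # id -> name

# Hash-style index: name -> list of ids (one hash value may map to several rows).
hash_index = {}
for row_id, name in rows.items():
    hash_index.setdefault(name, []).append(row_id)

# Ordered index (stand-in for a B+ tree): (name, id) pairs kept sorted.
sorted_index = sorted((name, row_id) for row_id, name in rows.items())

def hash_lookup(name):
    """Point lookup: a single hash probe."""
    return hash_index.get(name, [])

def hash_range(low, high):
    """Range lookup on a hash index: order is lost, so every key is checked."""
    return sorted(i for name, ids in hash_index.items()
                  if low <= name <= high for i in ids)

def sorted_range(low, high):
    """Range lookup on an ordered index: binary-search the bounds, read a slice."""
    lo = bisect_left(sorted_index, (low,))
    hi = bisect_right(sorted_index, (high, float("inf")))
    return sorted(row_id for _, row_id in sorted_index[lo:hi])

print(hash_lookup("alex"))
print(sorted_range("alex", "bob"))
```

Both range functions return the same ids, but `hash_range` has to visit every key while `sorted_range` touches only the matching slice.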

Different storage engines support different index types:

InnoDB: supports transactions and row-level locking; supports B-Tree indexes and, since MySQL 5.6, Full-text indexes; hash indexes are used only internally (the adaptive hash index) and cannot be created explicitly;

MyISAM: does not support transactions; supports table-level locking; supports B-Tree and Full-text indexes; does not support Hash indexes;

Memory: does not support transactions; supports table-level locking; supports B-Tree and Hash indexes; does not support Full-text indexes;

NDB: supports transactions and row-level locking; supports Hash indexes; does not support B-Tree or Full-text indexes;

Archive: does not support transactions; supports table-level locking; does not support B-Tree, Hash, or Full-text indexes;

⑤ Index types

Normal index: speeds up queries

Create table + index

Create a table and make the name column a normal index

create table tb1(
  id int not null auto_increment primary key,
  name varchar(100) not null,
  index idx_name(name)
);

Create index

Add a normal index to an existing table

create index idx_name on tb1(name);

Delete index

drop index idx_name on tb1;

View indexes

show index from tb1;

Columns of the show index output

·Table The name of the table.

·Non_unique 0 if the index cannot contain duplicates, 1 if it can.

·Key_name The name of the index.

·Seq_in_index The column's sequence number within the index, starting from 1.

·Column_name The column name.

·Collation How the column is sorted in the index. In MySQL this is 'A' (ascending) or NULL (not sorted).

·Cardinality An estimate of the number of unique values in the index.

·Sub_part The number of indexed characters if the column is only partially indexed; NULL if the entire column is indexed.

·Packed Indicates how the key is packed. NULL if it is not packed.

·Null YES if the column may contain NULL values; otherwise NO or an empty string, depending on the MySQL version.

·Index_type The index method used (BTREE, FULLTEXT, HASH, RTREE).

·Comment Various remarks.

Unique index: speeds up queries + unique constraint (may contain NULL)

Create table + unique index

create table tb2(
  id int not null auto_increment primary key,
  name varchar(50) not null,
  age int not null,
  unique index idx_age (age)
);

Create a unique index

create unique index idx_age on tb2(age);

Primary key: speeds up queries + unique constraint (cannot contain NULL); a table can have at most one primary key index

Create table + primary key

Method 1:

create table tb3(
  id int not null auto_increment primary key,
  name varchar(50) not null,
  age int default 0
);

Method 2:

create table tb3(
  id int not null auto_increment,
  name varchar(50) not null,
  age int default 0,
  primary key(id)
);

Create primary key

alter table tb3 add primary key(id);

Delete primary key

Method 1:

alter table tb3 drop primary key;

Method 2:

If the current primary key is auto-incrementing, it cannot be dropped directly. Remove the auto-increment attribute first, then drop the key:

alter table tb3 modify id int, drop primary key;

Composite index: an index built by combining n columns

Create table + composite index

create table tb4(
  id int not null,
  name varchar(50) not null,
  age int not null,
  index idx_name_age (name,age)
);

Create a composite index

create index idx_name_age on tb4(name,age);

Index application scenarios

Suppose you are building a membership card system for a shopping mall. The system has a member table with the following columns:

Member number INT, member name VARCHAR(10), member ID card number VARCHAR(18), member phone VARCHAR(10),

member address VARCHAR(50), member notes TEXT

The member number is the primary key, so use PRIMARY.

If the member name needs an index, use a normal INDEX.

If the member ID card number needs an index, choose UNIQUE (unique, no duplicates).

2、 Clustered index and secondary index

① Clustered index: InnoDB tables are index-organized tables, that is, the table's data is stored in a B+ tree ordered by the primary key, and the leaf nodes hold the complete rows. Each table can have only one clustered index.

When you define a primary key, the InnoDB storage engine uses it as the clustered index.

If you do not define a primary key, InnoDB picks the first unique index whose columns are all NOT NULL and uses it as the clustered index.

If the table has neither a primary key nor a suitable unique index, InnoDB generates a hidden 6-byte row ID and builds the clustered index on it.

Note: because the actual data pages can be sorted by only one B+ tree, each table can have only one clustered index. A clustered index is very good for sorting and range lookups on the primary key.

Example: suppose a library receives a new batch of books that need to be shelved. There is a rule for where books go: magazines on shelf 101, literature on shelf 102, science and engineering on shelf 103, and so on. These rules determine where each book is placed, and finding the right shelf amounts to finding all the books of that category. In this example the clustered index is the book's category.

② Secondary index (also called a nonclustered index): a leaf node does not contain all of a row's data. Besides the key value, the leaf node contains a bookmark, and the corresponding row data is found through that bookmark.

A lookup through a secondary index therefore takes two steps:

Find the location of the record. Then use that location to fetch the record you are looking for.
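The two-step lookup can be mimicked with two dictionaries. This is a conceptual sketch only: it assumes the bookmark is the row's primary key value, and the names `clustered`, `secondary_name`, `find_by_name`, and `find_by_id` are invented for the illustration.

```python
# Clustered index: primary key -> full row (the leaf holds the whole row).
clustered = {
    1: {"id": 1, "name": "alex", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 41},
}

# Secondary index on name: key value -> bookmark (assumed here to be the primary key).
secondary_name = {"alex": 1, "bob": 2, "carol": 3}

def find_by_name(name):
    """Two-step lookup through the secondary index."""
    pk = secondary_name.get(name)   # step 1: find the record's location
    if pk is None:
        return None
    return clustered[pk]            # step 2: fetch the full row via the bookmark

def find_by_id(pk):
    """One-step lookup through the clustered index."""
    return clustered.get(pk)

print(find_by_name("bob"))
print(find_by_id(3))
```

A query that needs only the indexed column could stop after step 1, which is why the second step is the part worth avoiding when possible.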

③ Differences between the two and when to use each

Similarities and differences:

Similarity: both clustered and secondary indexes are B+ trees internally, that is, they are height-balanced and all data is held in the leaf nodes.

Difference: a clustered index leaf node stores the entire row, while a secondary index leaf node stores only the indexed columns.

When to use a clustered or a nonclustered index

| Action | Clustered index | Nonclustered index |
|---|---|---|
| Columns are often grouped and sorted | Should | Should |
| Return data within a range | Should | Should not |
| One or very few distinct values | Should not | Should not |
| Frequently updated columns | Should not | Should |
| Foreign key columns | Should | Should |
| Primary key columns | Should | Should |
| Index columns modified frequently | Should not | Should |

3、 Testing indexes

① Create data

-- 1. Create table

create table userinfo (
  id int NOT NULL,
  name varchar(20) default null, -- assumed definition: the INSERT below needs a name column, but it is missing from this listing
  age int,
  sex char(1) not null,
  email varchar(64) default null
) ENGINE=MYISAM;

Note: the MyISAM storage engine has no transactions, so inserts are extremely fast. To load the test data quickly, we insert it first and then change the storage engine to InnoDB.

② Create a stored procedure to insert the data

-- 2. Create stored procedure

delimiter $$
CREATE PROCEDURE insert_user_info(IN num INT)
BEGIN
    DECLARE val INT DEFAULT 0;
    DECLARE n INT DEFAULT 1;
    -- Loop to insert the data
    WHILE n <= num DO
        set val = rand()*50;
        INSERT INTO userinfo(id,name,age,sex,email) values(n,concat('alex',val),rand()*50,if(val%2=0,'F','M'),concat('alex',n,''));
        set n=n+1;
    end while;
END $$
delimiter ;


③ Call the stored procedure to insert 5 million rows

call insert_user_info(5000000);

④ Change the engine to InnoDB (this step can be skipped)

ALTER TABLE userinfo ENGINE=INNODB;


⑤ Test the index

· Query speed without an index

SELECT * FROM userinfo WHERE id = 4567890;
Note: without an index, MySQL has no idea where the record with id 4567890 is, so it can only scan the table from beginning to end. However many disk blocks the table occupies, that many I/O operations are needed, so the query is very slow.

· When the table already holds a large amount of data, building an index on a column is slow

CREATE INDEX idx_id on userinfo(id);

· Once the index exists, queries that filter on this column become much faster

select * from userinfo where id = 4567890;

⑥ Notes

MySQL first searches the index using the B+ tree search principle and quickly finds the row with id 4567890. The I/O is greatly reduced, so the query is much faster.

In MySQL's data directory you can find the table's files and see that the table takes up more disk space after the index is added.

Queries that filter on columns without an index are still slow.
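The before-and-after experiment can be tried at small scale without a MySQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so the plan text is SQLite's and not MySQL's EXPLAIN output; the table is a cut-down `userinfo` with 10,000 generated rows, and the helper `plan` is a name made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table userinfo (id int not null, name text, age int)")
conn.executemany(
    "insert into userinfo (id, name, age) values (?, ?, ?)",
    [(n, f"alex{n % 50}", n % 50) for n in range(1, 10001)],
)

def plan(sql):
    """Return SQLite's query plan for sql as one string."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column is the readable detail

query = "select * from userinfo where id = 4567"

plan_before = plan(query)                  # no index yet: the whole table is scanned
conn.execute("create index idx_id on userinfo(id)")
plan_after = plan(query)                   # index available: the index is searched

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the table and the second reports a search through `idx_id`, which mirrors the full-scan versus index-lookup contrast described above.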

4、 Using indexes correctly

① Adding an index to a table can make queries dramatically faster, but only if the query uses the index correctly. Used the wrong way, an index does nothing, even though it exists.

·1. Range queries (>, >=, <, <=, !=, between…and)

#1. =  equality
select count(*) from userinfo where id = 1000; -- uses the index; very efficient
#2. > >= < <= between...and  range queries
select count(*) from userinfo where id <100; -- uses the index; the smaller the range, the more efficient
select count(*) from userinfo where id >100; -- uses the index; the larger the range, the less efficient
select count(*) from userinfo where id between 10 and 500000; -- uses the index; the larger the range, the less efficient

#3. != not equal

select count(*) from userinfo where id != 1000; -- the range is large, so the index is inefficient

·2. like '%xx%'

# Add an index on the name column
create index idx_name on userinfo(name);
select count(*) from userinfo where name like '%xxxx%'; -- wildcard on both sides: the index is inefficient
select count(*) from userinfo where name like '%xxxx';  -- leading wildcard (matches by suffix): the index is inefficient
# Exception: when like matches by prefix, the index is used effectively
select * from userinfo where name like 'xxxx%';

·3. or

select count(*) from userinfo where id = 12334 or email ='xxxx'; -- email is not indexed, so this query does a full table scan
# Exception: or defeats the index only when one of its conditions is on an unindexed column; the following uses indexes
select count(*) from userinfo where id = 12334 or name = 'alex3'; -- id and name are both indexed, so the or condition also uses indexes

·4. Using functions

select count(*) from userinfo where reverse(name) = '5xela'; -- name is indexed, but applying a function to it disables the index
# Exception: a function may be applied to the value being compared, so the query can be rewritten as
select count(*) from userinfo where name = reverse('5xela');

·5. Type mismatch

# If the column is a string type, the value in the condition must be quoted; otherwise the index cannot be used
select count(*) from userinfo where name = 454;
# Matching type
select count(*) from userinfo where name = '454';

·6. order by

# When sorting by an indexed column, the selected columns must also be index columns; otherwise the index is not used
select email from userinfo ORDER BY name DESC; -- does not use the index
select name from userinfo ORDER BY name DESC;  -- uses the index
# Special case: sorting by the primary key is still very fast:
select id from userinfo order by id desc;
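The function case from item 4 can be observed with a query planner. The sketch below again uses Python's `sqlite3` as a stand-in, so the plan wording and optimizer are SQLite's and MySQL may report things differently; because SQLite has no `reverse` function, `upper` and `lower` are substituted, and the helper `plan` is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table userinfo (id int not null, name text)")
conn.executemany("insert into userinfo (id, name) values (?, ?)",
                 [(n, f"alex{n}") for n in range(1, 1001)])
conn.execute("create index idx_name on userinfo(name)")

def plan(sql):
    """Return SQLite's query plan for sql as one string."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Function wrapped around the indexed column: every entry must be examined.
plan_function_on_column = plan(
    "select count(*) from userinfo where upper(name) = 'ALEX5'")

# Function applied to the compared value instead: the index can be searched.
plan_function_on_value = plan(
    "select count(*) from userinfo where name = lower('ALEX5')")

print(plan_function_on_column)
print(plan_function_on_value)
```

The first plan is a scan and the second is a search on `idx_name`, matching the rewrite suggested in item 4: keep the indexed column bare and move the function to the other side of the comparison.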

5、 Composite index

① Composite index: an index built on a combination of several columns of a table

② Benefits of a composite index. In short, there are two main reasons:

"One for three". Building a composite index on (a,b,c) is effectively equivalent to building three indexes: (a), (a,b), and (a,b,c). Every additional index increases the cost of writes and the disk space used, which is a significant expense for tables with a lot of data.

The more columns in the index, the fewer rows are left after filtering through it. Take a table with 10 million rows and the following SQL: select * from table where a = 1 and b = 2 and c = 3. Suppose each condition keeps 10% of the rows. With only a single-column index, the index narrows the data to 10 million * 10% = 1 million rows, and MySQL must then go back to the table to find the rows matching b = 2 and c = 3 among those 1 million, then sort, then paginate. With a composite index, the index narrows the data to 10 million * 10% * 10% * 10% = 10,000 rows before sorting and paginating. Which is more efficient is obvious at a glance.

Leftmost matching principle: index columns are used from left to right. If a column in the middle is not used, the part of the index before the break works and the part after the break does not.

select * from mytable where a=3 and b=5 and c=4;

# a, b, and c all appear in the where clause, and all three index columns are used

select * from mytable where c=4 and b=6 and a=3;

# This statement is listed to show that MySQL is not that naive: the optimizer reorders the where conditions before running the query, so the effect is the same as the previous statement

select * from mytable where a=3 and c=7;

# a uses the index; b is missing, so c gets no index benefit

select * from mytable where a=3 and b>7 and c=3;

# a is used and b is used, but c is not: b is a range condition, which is also a break point, although b itself still uses the index

select * from mytable where b=3 and c=4;

# a is not used, so b and c get no index benefit

select * from mytable where a>4 and b=7 and c=9;

# a is used; b and c are not

select * from mytable where a=3 order by b;

# a uses the index, and b also benefits from the index when sorting the result

select * from mytable where a=3 order by c;

# a uses the index, but c gets no sorting benefit because b in the middle is skipped

select * from mytable where b=3 order by a;

# b does not use the index, and sorting by a gets no index benefit either
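The where-clause cases above follow one mechanical rule, which can be written as a small function. This is a simplified model of the leftmost matching principle, not the MySQL optimizer: `usable_prefix` is a hypothetical helper that only looks at which columns have equality or range conditions and ignores sorting, statistics, and cost.

```python
def usable_prefix(index_cols, equals, ranges=()):
    """Return the index columns a where clause can use for the lookup.

    index_cols: columns of the composite index, in order, e.g. ("a", "b", "c")
    equals:     columns compared with =
    ranges:     columns compared with >, <, between, and so on
    """
    used = []
    for col in index_cols:
        if col in equals:
            used.append(col)      # equality: keep matching further columns
        elif col in ranges:
            used.append(col)      # range: usable, but it is the break point
            break
        else:
            break                 # column missing: everything after it is unusable
    return used

idx = ("a", "b", "c")
print(usable_prefix(idx, {"a", "b", "c"}))          # a=3 and b=5 and c=4
print(usable_prefix(idx, {"a", "c"}))               # a=3 and c=7
print(usable_prefix(idx, {"a", "c"}, {"b"}))        # a=3 and b>7 and c=3
print(usable_prefix(idx, {"b", "c"}))               # b=3 and c=4
print(usable_prefix(idx, {"b", "c"}, {"a"}))        # a>4 and b=7 and c=9
```

The order of conditions in the where clause does not matter to this function, just as it does not matter to MySQL: only the order of columns in the index definition decides where the break falls.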

6、 Points to note

· Avoid select *

· Other databases use count(1) or count(column) instead of count(*), but MySQL optimizes count(*), so its efficiency is essentially the same as the other two.

· Prefer char to varchar when creating tables

· Put fixed-length columns first in the table's column order

· Use a composite index instead of several single-column indexes (when queries often filter on several conditions)

· Use joins (JOIN) instead of subqueries (Sub-Queries)

· Do not join more than 4 tables

· Give priority to joins that significantly reduce the result set.

· Keep condition types consistent when joining tables

· Columns with few distinct values are not suitable for indexing; for example, gender

7、 Query plan

① Syntax

explain + query SQL: displays information about how the SQL is executed; use this information to optimize the SQL

② Execution plan: asks MySQL to estimate how it will execute the statement (generally accurate)

type: the access type in the query plan. It takes several values, listed here from best to worst

Performance: null > system/const > eq_ref > ref > ref_or_null > index_merge > range > index > all

Slow:

explain select * from userinfo where email='alex';
type: ALL (full table scan)
Special case: select * from userinfo limit 1; is also a full table scan, but limit lets it stop after the first row, so it returns quickly

Fast:

explain select * from userinfo where name='alex';
type: ref (uses the index)
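When reviewing many EXPLAIN results, the ordering above can be encoded so that poor access types are flagged automatically. This is a small sketch under stated assumptions: the ranking copies the list given earlier, while the helper name `needs_attention` and its default cutoff at `range` are arbitrary choices for this example and not a MySQL rule.

```python
# Access types ranked from best (0) to worst, following the ordering listed above.
RANK = {
    "null": 0, "system": 1, "const": 1, "eq_ref": 2, "ref": 3,
    "ref_or_null": 4, "index_merge": 5, "range": 6, "index": 7, "all": 8,
}

def needs_attention(access_type, worst_ok="range"):
    """Return True when access_type ranks worse than worst_ok.

    The cutoff is an assumption for illustration; pick one that suits your workload.
    """
    return RANK[access_type.lower()] > RANK[worst_ok]

for t in ("ALL", "ref", "index", "range"):
    print(t, needs_attention(t))
```

With the default cutoff, the `ALL` result for the email query would be flagged and the `ref` result for the name query would pass.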

③ EXPLAIN parameters in detail: http://www…com/wangfengming/articles/8275448.html

8、 Slow query log

① Concept

MySQL writes the SQL statements that hurt database performance to a log file; analyzing these statements and improving them raises database performance.

② Slow query log parameters:

long_query_time: the slow query threshold. SQL that runs longer than this value is recorded in the slow query log. The default is 10s

slow_query_log: whether the slow query log is enabled

log_slow_queries: whether the slow query log is enabled (this parameter has been replaced by slow_query_log and is kept for compatibility)

slow_query_log_file: the location of the slow log file. It may be left empty, in which case the system uses the default file host_name-slow.log

log_queries_not_using_indexes: if set to ON, all queries that do not use an index are recorded.

③ View MySQL slow log information

# Query the slow log configuration:

show variables like '%query%';

# Modify the configuration

set global slow_query_log = on;

④ View the setting for logging queries that do not use an index:

Show the parameter

show variables like '%log_queries_not_using_indexes';

Turn it on

set global log_queries_not_using_indexes = on;

⑤ View where the slow log is written

# Show the slow log output destination

show variables like '%log_output%';

# Write the slow log to both a file and a table

set global log_output='FILE,TABLE';

⑥ Test the slow query log

# Queries that run longer than long_query_time (10 seconds by default) are recorded in the slow query log

select sleep(3) FROM user;

# View the log entries in the table

select * from mysql.slow_log;
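The idea behind `long_query_time` can be shown with a toy logger. This sketch only models the concept of a threshold; it is not how the MySQL server implements its slow query log. The class `SlowQueryLog`, its `run` method, and the fake clock used to make the example deterministic are all invented for the illustration.

```python
import time

class SlowQueryLog:
    """Toy model: record statements whose run time exceeds long_query_time seconds."""
    def __init__(self, long_query_time=10.0, clock=time.perf_counter):
        self.long_query_time = long_query_time
        self.clock = clock
        self.entries = []            # list of (sql, elapsed seconds)

    def run(self, sql, execute):
        """Execute sql through the given callable and log it if it was slow."""
        start = self.clock()
        result = execute(sql)
        elapsed = self.clock() - start
        if elapsed > self.long_query_time:
            self.entries.append((sql, elapsed))
        return result

# A fake clock stands in for real time so that nothing actually sleeps.
ticks = iter([0.0, 3.0, 3.0, 15.0])
log = SlowQueryLog(long_query_time=10.0, clock=lambda: next(ticks))

log.run("select 1", lambda sql: "ok")            # takes 3 s: below the threshold
log.run("select sleep(12)", lambda sql: "ok")    # takes 12 s: recorded

print(log.entries)
```

Only the second statement is recorded, because only it exceeds the 10-second threshold; lowering `long_query_time` would capture the first one as well.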

9、 Paging optimization for large data sets

① Optimization scheme 1

Simple and crude: do not let users view data that far back

② Optimization scheme 2

When querying the next page, have the client pass the id of the last row on the previous page as a parameter, for example select * from tb1 where id>3000000 limit 10;

Another way, for page 100 with 10 rows per page: select * from tb1 where id>100*10 limit 10; (this assumes the ids are consecutive with no gaps)

③ Optimization scheme 3: deferred join

Let's analyze why the statement select * from tb1 limit 3000000,10; is slow. The problem lies in the *: besides the primary key id, the table certainly has other columns.

Because of select *, as MySQL walks along the id primary key it has to go back to the row to fetch the data at every step. If you change the statement to

select id from tb1 limit 3000000,10; you will find the time is roughly cut in half. Then take those 10 ids and fetch the rows for them separately.

The statement becomes:

select tb1.* from tb1 inner join ( select id from tb1 limit 3000000,10 ) as tmp on tmp.id = tb1.id;
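Schemes 2 and 3 can be checked against plain offset paging on a small table. The sketch below uses Python's `sqlite3` with 1,000 generated rows, so it demonstrates only that the three queries return the same page, not the speed difference seen on a multi-million-row MySQL table; the table contents and the `order by` clauses are additions made to keep the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tb1 (id integer primary key, name text)")
conn.executemany("insert into tb1 (id, name) values (?, ?)",
                 [(n, f"user{n}") for n in range(1, 1001)])

# Baseline: offset paging skips 900 rows before returning 10.
offset_page = conn.execute(
    "select * from tb1 order by id limit 10 offset 900").fetchall()

# Scheme 2: pass the last id of the previous page and seek past it.
last_id = 900
keyset_page = conn.execute(
    "select * from tb1 where id > ? order by id limit 10", (last_id,)).fetchall()

# Scheme 3: deferred join; page over ids only, then fetch the full rows.
deferred_page = conn.execute(
    """select tb1.* from tb1
       inner join (select id from tb1 order by id limit 10 offset 900) as tmp
       on tmp.id = tb1.id
       order by tb1.id""").fetchall()

print(keyset_page[0], keyset_page[-1])
```

Scheme 2 needs the client to remember the last id it saw, which is why it works for "next page" navigation but not for jumping to an arbitrary page; scheme 3 keeps the offset but pays for it on the narrow id column only.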

④ Of these three methods, consider the first one first, then the second; the third is the last resort.