Optimizing MySQL performance – queries

MySQL performance optimizations can be pided into two parts. One is query optimization and the other is MySQL server configuration. To get optimum results, you have to work on both of these parts. Without proper configuration, queries will not provide consistent performance; on the other hand, without proper queries and a database structure, queries may take much longer to produce results.

In this recipe, we will learn how to evaluate query performance, set indexes, and identify the optimum database structure for our data.

Getting ready

You will need access to an admin account on the MySQL server.

You will need a large dataset to test queries. Various tools are available to generate test data. I will be using test data available at https://github.com/datacharmer/test_db.

How to do it…

Follow these steps to optimize MySQL performance:

  1. The first and most basic thing is to identify key columns and add indexes to them:
    mysql> alter table salaries add index (salary);
    
  2. Enable the slow query log to identify long-running queries. Enter the following commands from the MySQL console:
    mysql> set global log_slow_queries = 1;
    mysql> set global slow_query_log_file = ‘/var/log/mysql/slow.log’;
    
  3. Once you identify the slow and repeated query, execute that query on the database and record query timings. The following is a sample query:
    mysql> select count(*) from salaries where salary between 30000 and 65000 and from_date > ‘1986-01-01’;
    
  4. Next, use explain to view the query execution plan:
    mysql> explain select count(*) from salaries where salary between 30000 and 65000 and from_date > ‘1986-01-01’;
    
  5. Add required indexes, if any, and recheck the query execution plan. Your new index should be listed under possible_keys and key columns of explain output:
    mysql> alter table `salaries` add index ( `from_date` ) ;
    
  6. If you found that MySQL is not using a proper index or using another index than expected then you can explicitly specify the index to be used or ignored:
    mysql> select * from salaries use index (salaries) where salary between 30000 and 65000 and from_date > ‘1986-01-01’;
    mysql> select * from salaries where salary between 30000 and 65000 and from_date > ‘1986-01-01’ ignore index (from_date);
    

    Now execute the query again and check query timings for any improvements.

  7. Analyze your data and modify the table structure. The following query will show the minimum and maximum length of data in each column. Add a small amount of buffer space to the reported maximum length and reduce additional space allocation if any:
    mysql> select * from `employees` procedure analyse();
    

    The following is the partial output for the analyse() procedure:

  8. Check the database engines you are using. The two major engines available in MySQL are MyISAM and InnoDB:
    mysql> show create table employees;
    

How it works…

MySQL uses SQL to accept commands for data processing. The query contains the operation, such as select, insert, and update; the target that is a table name; and conditions to match the data. The following is an example query:

select * from employee where id = 1001;

In the preceding query, select * is the operation asking MySQL to select all data for a row. The target is the employee table, and id = 1001 is a condition part.

Once a query is received, MySQL generates query execution plan for it. This step contains various steps such as parsing, preprocessing, and optimization. In parsing and pre-processing, the query is checked for any syntactical errors and the proper order of SQL grammar. The given query can be executed in multiple ways. Query optimizer selects the best possible path for query execution. Finally, the query is executed and the execution plan is stored in the query cache for later use.

The query execution plan can be retrieved from MySQL with the help of the explain query and explain extended. Explain executes the query until the generation of the query execution plan and then returns the execution plan as a result. The execution plan contains table names used in this query, key fields used to search data, the number of rows needed to be scanned, and temporary tables and file sorting used, if any. The query execution plan shows possible keys that can be used for query execution and then shows the actual key column used. Key is a column with an index on it, which can be a primary index, unique index, or non-unique index. You can check the MySQL documentation for more details on query execution plans and explain output.

If a specific column in a table is being used repeatedly, you should consider adding a proper index to that column. Indexes group similar data together, which reduces the look up time and total number of rows to be scanned. Also keep in mind that indexes use large amounts of memory, so be selective while adding indexes.

Secondly, if you have a proper index set on a required column and the query optimization plan does not recognize or use the index, you can force MySQL to use a specific index with the USE INDEX index_name statement. To ignore a specific index, use the statement IGNORE INDEX index_name.

You may get a small improvement with table maintenance commands. Optimize table is useful when a large part of the table is modified or deleted. It reorganizes table index data on physical storage and improves I/O performance. Flush table is used to reload the internal cache. Check table and Analyze table check for table errors and data distribution respectively. The improvements with these commands may not be significant for smaller tables. Reducing the extra space allocated to each column is also a good idea for reducing total physical storage used. Reduced storage will optimize I/O performance as well as cache utilization.

You should also check the storage engines used by specific tables. The two major storage engines used in MySQL are MyISAM and InnoDB. InnnoDB provides full transactional support and uses row-level locking, whereas MyISAM does not have transaction support and uses table-level locking. MyISAM is a good choice for faster reads where you have a large amount of data with limited writes on the table. MySQL does support the addition of external storage engines in the form of plugins. One popular open source storage engine is XtraDB by Percona systems.

There’s more…

If your tables are really large, you should consider partitioning them. Partitioning tables distributes related data across multiple files on disk. Partitioning on frequently used keys can give you a quick boost. MySQL supports various different types of partitioning such as hash partitions, range partitions, list partitions, key partitions, and also sub-partitions.

You can specify hash partitioning with table creation as follows:

create table employees (
    id int not null,
    fname varchar(30),
    lname varchar(30),
    store_id int
) partition by hash(store_id) partitions 4;

Alternatively, you can also partition an existing table with the following query:

mysql> alter table employees partition by hash(store_id) partitions 4;
Sharding MySQL

You can also shard your database. Sharding is a form of horizontal partitioning where you store part of the table data across multiple instances of a table. The table instance can exist on the same server under separate databases or across different servers. Each table instance contains parts of the total data, thus improving queries that need to access limited data. Sharding enables you to scale a database horizontally across multiple servers.

The best implementation strategy for sharding is to try to avoid it for as long as possible. Sharding requires additional maintenance efforts on the operations side and the use of proxy software to hide sharding from an application, or to make your application itself sharding aware. Sharding also adds limitations on queries that require access to the entire table. You will need to create cross-server joins or process data in the application layer.

See also