One little known fact about MySQL Indexing, however very important for successfull MySQL Performance Optimization is understanding when exactly MySQL is going to use index and how it is going to do them.So if you have table people with KEY(age,zip) and you will run query something like
SELECT name FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347) do you think it will use index effectively ? In theory it could – it could look at each of the ages from the range and look at all zip codes supplied. In practice – it will not:mysql> explain SELECT name FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | people | range | age | age | 4 | NULL | 90556 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)
As you see instead only first index keypart is used (key_len is 4) and zip part where clause is applied after rows are retrived. Notice Using Where. There are even more bad news. Full rows will need to be read to check if zip is in the list, while it could be done only by reading data from the index. MySQL can ether read index only for all rows, in this case you will see “Using Index” in EXPLAIN output or it will read row data for all rows – it can’t read Index and perform row read only if it needs to be done at this point.
So MySQL Will not use indexes in all cases when it is technically possible. For multiple key part indexes MySQL will only be able to use multiple keyparts if first keyparts matched with “=”. Here is example:mysql> explain SELECT name FROM people WHERE age=18 AND zip IN (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | people | range | age | age | 4 | NULL | 3 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
Note number of rows has decreased from 90556 to 3, whle “key_len” remains the same. This however looks like a bug in the MySQL 5.0.18 I’m using for this demo. It should have had increased to 8.Lets see how query times differ in these cases:mysql> SELECT sql_no_cache name FROM people WHERE age=19 AND zip IN (12345,12346, 12347);
+----------------------------------+
| name |
+----------------------------------+
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
1 row in set (0.06 sec)
mysql> SELECT SQL_NO_CACHE name FROM people WHERE age BETWEEN 18 AND 22 AND zip IN (12345,12346, 12347);
+----------------------------------+
| name |
+----------------------------------+
| ed4481336eb9adca222fd404fa15658e |
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
2 rows in set (1 min 56.09 sec)
As you see difference is tremendous. And it is not what you would intuitively expect – why range which covers 5 rows is hundreds of times slower than single row ? If MySQL Optimizer would handle this case right it would not be but in this case we only can give a hand to MySQL Optimizer and change the query so it can handle it well…. use UNION:mysql> SELECT name FROM people WHERE age=18 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECT name FROM people WHERE age=19 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECT name FROM people WHERE age=20 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECT name FROM people WHERE age=21 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECT name FROM people WHERE age=22 AND zip IN (12345,12346, 12347);
+----------------------------------+
| name |
+----------------------------------+
| ed4481336eb9adca222fd404fa15658e |
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
2 rows in set (0.09 sec)
Ethen though this query looks much more complicated MySQL is able to execute it much faster, delivering us expected performance.You can also use this approach when first key column is not in where clause at all if it has just few values. For example if we would have gender instead of age with just two possible values it would be faster to run such query with union. I bet it would even be so with age even if it would take some 100 queries in the union to do so.This strategy is best applied if no others work well. Ie if there are range on both keyparts and none of them is selective enough by itself. For example if we would like to only lookup people within single zip I would advice to use index in (zip,age) instead of using this workaround.And… yes this example is a bit artificial. You would probably use date (or at least year) or birth instead of age, and put zip as first column in the index as it is more selective but it is good enough for illustrative purposes 
SELECT name FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347) do you think it will use index effectively ? In theory it could – it could look at each of the ages from the range and look at all zip codes supplied. In practice – it will not:
MySQL: Followup on UNION for query optimization, Query profiling
First I should mention double IN also works same way so you do not have to use the union. So changing query to:
mysql> SELECT sql_no_cache name FROM people WHERE age in(18,19,20) AND zip IN (12345,12346, 12347); +----------------------------------+ | name | +----------------------------------+ | ed4481336eb9adca222fd404fa15658e | | 888ba838661aff00bbbce114a2a22423 | +----------------------------------+ 2 rows in set (0.00 sec) mysql> explain SELECT sql_no_cache name FROM people WHERE age in(18,19,20) AND zip IN (12345,12346, 12347); +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | people | range | age | age | 4 | NULL | 9 | Using where | +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+ 1 row in set (0.00 sec)
So as you see there are really different types of ranges in MySQL. IN range allows to optimize lookups on the second key part, while BETWEEN and other ranges do not. Using same access type in EXPLAIN makes it very confusing.
I also was wrong about bug in key length in 5.0 explain. Actually I used tinyint for age and mediumint for zip which makes 4 right answer for using full key.
Be careful however with these nested IN clauses. MySQL has to internally build all possible combinations for row retrieval which ma become very slow if IN lists are large. Take 3 IN lists 1000 values each, on appropriate 3 keyparts and you may finish your lunch before query completes even if table has just couple of rows.
Let me however show how you can profile queries to see what exactly happens during query execution – very helpful for MySQL Performance optimization:
mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> SELECT sql_no_cache name FROM people WHERE age BETWEEN 18 and 20 AND zip IN (12345,12346, 12347); +----------------------------------+ | name | +----------------------------------+ | ed4481336eb9adca222fd404fa15658e | | 888ba838661aff00bbbce114a2a22423 | +----------------------------------+ 2 rows in set (0.39 sec) mysql> show status like "Handler%"; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Handler_commit | 0 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_prepare | 0 | | Handler_read_first | 0 | | Handler_read_key | 1 | | Handler_read_next | 42250 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 0 | | Handler_rollback | 0 | | Handler_savepoint | 0 | | Handler_savepoint_rollback | 0 | | Handler_update | 0 | | Handler_write | 14 | +----------------------------+-------+ 15 rows in set (0.00 sec)
So you can do FLUSH STATUS to reset counters run the query (assiming your system does not do anything) and run SHOW STATUS to see how counters have changed. It was quite inconvenient you could only do it on idle box so as in MySQL 5.0 you do not have to any more. SHOW STATUS now will show per session counter increments and to get global counters SHOW GLOBAL STATUS needs to be used.
Let us look at this handler statistic – we can see Handler_read_key=1 – this means one index range scan was initiated. Handler_read_next=42250 means 42250 rows were analyzed during this scan. Basically MySQL started scanning Index with age>=18 and continue scanning as soon as it met something larger than 20.
Now let’s see what UNION can handle what IN can’t:
Lets say we want to show people in appropriate age group sorting by time when they were last online. If age is fixed this works great and it is efficient, however if we have multiple ages to deal with ether as BETWEEN range or as IN filesort appears and query becomes very slow:
mysql> explain select * from people where age=18 order by last_online desc limit 10; +----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+ | 1 | SIMPLE | people | ref | age | age | 1 | const | 12543 | Using where | +----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+ 1 row in set (0.00 sec) mysql> explain select * from people where age in(18,19,20) order by last_online desc limit 10; +----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+ | 1 | SIMPLE | people | range | age | age | 1 | NULL | 37915 | Using where; Using filesort | +----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+ 1 row in set (0.00 sec)
We can however use UNION to avoid filesort of full table:
mysql> explain (select * from people where age=18 order by last_online desc limit 10) UNION ALL (select * from people where age=19 order by last_online desc limit 10) UNION ALL (select * from people where age=20 order by last_online desc limit 10) ORDER BY last_online desc limit 10; +----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+ | 1 | PRIMARY | people | ref | age | age | 1 | const | 12543 | Using where | | 2 | UNION | people | ref | age | age | 1 | const | 12741 | Using where | | 3 | UNION | people | ref | age | age | 1 | const | 12631 | Using where | |NULL | UNION RESULT || ALL | NULL | NULL | NULL | NULL | NULL | Using filesort | +----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+ 4 rows in set (0.01 sec)
In this case there is also filesort but it applied only to very small table which is result of union, so it is rather fast. from
0 Comments:
Post a Comment