前言
数据库优化,sql 优化深刻而复杂,关系到数据库的原理。
我在这里收集并分类整理下遇到的常见或者不常见的案例,借以学习理解执行计划与 Mysql/Oracle 数据库。
长期更新。
23/3/15 更新回表频繁
23/12/30 更新
24/1/31 更新 null
24/2/19 先 join 再筛选或者排序
初始化
表结构
mysql> desc ele_file;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| ID | bigint | NO | PRI | NULL | auto_increment |
| ELE_ID | bigint | YES | | NULL | |
| NAME | varchar(255) | YES | | NULL | |
| TYPE | varchar(32) | YES | MUL | NULL | |
| PATH | varchar(255) | YES | | NULL | |
| PAGE_NUM | tinyint | YES | MUL | NULL | |
| COMPLETED_FLAG | tinyint | YES | | 0 | |
| AVAILABLE_FLAG | tinyint | YES | | NULL | |
| CREATED_TIME | datetime | YES | | NULL | |
| UPDATED_TIME | datetime | YES | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)
mysql> desc element;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| ID | bigint | NO | PRI | NULL | auto_increment |
| TYPE | varchar(32) | NO | MUL | NULL | |
| FILE_DIR | varchar(255) | YES | | NULL | |
| FILE_SIZE | bigint | YES | | 0 | |
| SHARED_FLAG | tinyint | YES | | 0 | |
| PUBLISHED_FLAG | tinyint | YES | | 0 | |
| UPLOADER | varchar(128) | YES | | NULL | |
| AVAILABLE_FLAG | tinyint | YES | | 1 | |
| COMPLETED_FLAG | tinyint | YES | | 0 | |
| CREATED_TIME | datetime | YES | MUL | NULL | |
| UPDATED_TIME | datetime | YES | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
11 rows in set (0.00 sec)
索引
mysql> show index from ele_file;
+----------+------------+------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| ele_file | 0 | PRIMARY | 1 | ID | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
| ele_file | 1 | type_index | 1 | TYPE | A | 0 | NULL | NULL | YES | BTREE | | | YES | NULL |
| ele_file | 1 | type_three | 1 | PAGE_NUM | A | 0 | NULL | NULL | YES | BTREE | | | YES | NULL |
| ele_file | 1 | type_three | 2 | COMPLETED_FLAG | A | 0 | NULL | NULL | YES | BTREE | | | YES | NULL |
| ele_file | 1 | type_three | 3 | AVAILABLE_FLAG | A | 0 | NULL | NULL | YES | BTREE | | | YES | NULL |
+----------+------------+------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
5 rows in set (0.00 sec)
mysql> show index from element;
+---------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+---------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| element | 0 | PRIMARY | 1 | ID | A | 543812 | NULL | NULL | | BTREE | | | YES | NULL |
| element | 1 | ele_type | 1 | TYPE | A | 9 | NULL | NULL | | BTREE | | | YES | NULL |
| element | 1 | ele_time | 1 | CREATED_TIME | A | 814 | NULL | NULL | YES | BTREE | | | YES | NULL |
| element | 1 | ele_time | 2 | FILE_SIZE | A | 543817 | NULL | NULL | YES | BTREE | | | YES | NULL |
| element | 1 | ele_time | 3 | AVAILABLE_FLAG | A | 543817 | NULL | NULL | YES | BTREE | | | YES | NULL |
+---------+------------+----------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
5 rows in set (0.01 sec)
分类及案例
没走索引
函数转换或隐式函数转换
mysql> explain select * from ele_file where type =1;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | ele_file | NULL | ALL | type_index | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 3 warnings (0.00 sec)
mysql> explain select * from ele_file where type ='1';
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | ele_file | NULL | ref | type_index | type_index | 131 | const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
TYPE 的类型是 varchar(32),这样 select 经过了隐式函数转换,走不了索引只能全表扫描。
可以 show warnings;
,也可以看到 Mysql 给的提示。
未符合最左匹配
运用索引字段必须要根据字段在索引中的顺序,从左到右,中间用了范围条件会阻止后续的索引的使用。
mysql> explain select * from ele_file where completed_flag=1 and available_flag =1;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | ele_file | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from ele_file where page_num=10 and completed_flag=1 and available_flag =1;
+----+-------------+----------+------------+------+---------------+------------+---------+-------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------------+---------+-------------------+------+----------+-------+
| 1 | SIMPLE | ele_file | NULL | ref | type_three | type_three | 6 | const,const,const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+------------+---------+-------------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from ele_file where page_num=10 and available_flag =1;
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | ele_file | NULL | ref | type_three | type_three | 2 | const | 1 | 100.00 | Using index condition |
+----+-------------+----------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
调整 sql,或者重建符合 sql 的索引。
从第三个例子也可以看到,只有紧临最左的列才能走索引,中间隔了一列,后面的列就不能走索引了,key_len 由 6 减少到了 2。
优化可以考虑重写 sql,或者重新组织索引。
索引分离度差
当数据库认为走索引比不走索引更慢的时候,优化器就会让 sql 全表扫描。
在错误的字段建立了索引。简单的例子:某张表对状态(status)字段作了索引,查询时不走索引,与之相对的,对 时间(created_time)作了索引,查询时走了索引。数据库作这样的决定原因很明显了,状态筛选出的数据太多了,还不如表扫,而时间是区分度很高,筛选出的数据很少,正适合用索引。
良好的标准:
- fairly selective (<4% of ORDER per distinct value)
- high cardinality (<10 rows in table per distinct value)
数据库优化为表扫
select in 后数据过多
当 in 后的数据条数过多(超过 200 条),或者 in 后数据占全表数比例较大,优化器会不走索引而是会全表扫描。如果要尝试优化,可以强行让查询走索引,用 force index 或者拆分成一条一条的查询最后 union all。
mysql> (select * from element where type ='1' order by file_size limit 10) union all (select * from element where type ='3' order by file_size limit 10) order by file_size limit 10;
10 rows in set (0.22 sec)
mysql> select * from element where type in('1','3') order by file_size limit 10;
10 rows in set (0.37 sec)
可以看到 用 union all 速度快一些。或者 force index 也可以实现强行让 in 语句走索引的效果。
mysql> explain (select * from element where type ='1' order by file_size limit 10) union all (select * from element where type ='3' order by file_size limit 10) order by file_size limit 10;
+----+--------------+------------+------------+------+---------------+----------+---------+-------+--------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+------------+------------+------+---------------+----------+---------+-------+--------+----------+---------------------------------+
| 1 | PRIMARY | element | NULL | ref | ele_type | ele_type | 130 | const | 103564 | 100.00 | Using filesort |
| 2 | UNION | element | NULL | ref | ele_type | ele_type | 130 | const | 105350 | 100.00 | Using filesort |
| NULL | UNION RESULT | <union1,2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary; Using filesort |
+----+--------------+------------+------------+------+---------------+----------+---------+-------+--------+----------+---------------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql> explain select * from element where type in('1','3') order by file_size limit 10;
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
| 1 | SIMPLE | element | NULL | ALL | ele_type | NULL | NULL | NULL | 543817 | 38.42 | Using where; Using filesort |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)
in 的语句进行了全表扫描,而 union all 是分别走了两次索引,type 还是 ref 的,很快。
这个优化最好还是在真是的数据上自己测试一下,是否为优化。
下面是个反例,这时候全表扫描又比走索引要快了,因为筛选出的数据占全表的数量太多了。
mysql> select * from element where type in('1','3','5') order by file_size limit 10;
10 rows in set (0.29 sec)
mysql> (select * from element where type ='1' order by file_size limit 10) union all (select * from element where type ='3' order by file_size limit 10) union all (select * from element where type ='5' order by file_size limit 10) order by file_size limit 10;
10 rows in set (0.33 sec)
null 值
null 意味着它什么都不是,我们不能认为 null 等于空字符串或者任何东西,它们是完全不同的。
null 值的判定要使用 is null,= null 按照 null 的实际含义是无法理解的,任何情况下不应该使用 = null。null = null 在大部分数据库里都是 false。
查询条件中的 null 是否会走索引,这一点不同数据库也有不同。详细可以分别看执行计划。
- mysql 会走索引
- mysql 将 null 视为独立的可辨别的值,可以走索引
- innodb 中,还允许在唯一索引的字段中出现多个null值
- 任何字段 = null 的值为 null,即
select * from a where b=null
和select * from a where null
等价,也就是没有任何行会被选出。
- oracle 不会在索引中存储 null 值,因此 is null 不会走索引,is not null 是走索引的
- = null 同样不会有任何列被筛选出
- 还有奇技淫巧运用函数索引可以将 null 强行纳入索引,神奇!但这种 null 需要索引的情况下,比起运用这些技巧,为什么不直接为列设置一个默认值呢?null 值映射为一个新值,仍然会占据值域的一个其他值,这个值本身就应该被设置为默认值。Oracle is null不走索引问题
用旧表创建新表最好重建索引与表数据统计信息
当存在数据更新,最好重建索引,并补全相关的表统计数据,这些信息会影响 sql 优化器的判断,进而影响是否走索引。这可能会存在于一些复杂的存储过程中。大部分这样来源的问题,只能遇到了再详细进行分析。
这个例子来源于我们有一个特殊的服务,会每天更新表的数据,是通过 CREATE TABLE AS SELECT (CTAS),重命名来替换旧表的,新表的统计数据没有随之更新,导致一个查询不走索引,表扫,查询时间到了秒级。
回表频繁
mysql> explain select * from ele_file where page_num=10 and completed_flag=1 limit 10000,10;
+----+-------------+----------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | ele_file | NULL | ref | type_three | type_three | 4 | const,const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select t1.* from ele_file t1,(select id from ele_file where page_num=10 and completed_flag=1 limit 10000,10)t2 where t1.id=t2.id;
+----+-------------+------------+------------+------+---------------+-------------+---------+-----------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+-----------------+------+----------+-------------+
| 1 | PRIMARY | t1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | partridge.t1.ID | 2 | 100.00 | Using index |
| 2 | DERIVED | ele_file | NULL | ref | type_three | type_three | 4 | const,const | 1 | 100.00 | Using index |
+----+-------------+------------+------------+------+---------------+-------------+---------+-----------------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
如果直接查询,虽然也走了索引,但又回表查询出了 10000 条数据,取了最后 10 条。
我们希望 limit 这个步骤在索引处就完成,不需要回表后再 limit。因此,这也要求我们的索引全覆盖我们的查询条件及 limit 条件。
多次重复的子查询(DEPENDENT SUBQUERY)
大表,sql 中有子查询且和外部查询有关,每一行都会进行一次子查询,导致速度非常慢。这种 sql 会有极大程度的负面影响,会进行多次表扫。凭借直觉思路写出来的 where 子查询可能会出现这种情况。
优化方式就是处理子查询为 join,对 join 的结果进行筛选。
先 join 再筛选或者排序
可以先筛选后再排序,select * from (select * from a where ...) as o left join b
尽量让筛选和排序发生在 join 前,不过大部分优化器都会帮我们优化这一步。
总结
总之多尝试,多查询,并一定要以生产数据实际的查询表现为准,用实际数据测试。
数据库的原理的逻辑与内容太多了,难以只通过使用时的表现来把握。
reference
Do I have to drop and rebuild an index when I am doing a CTAS with rename
文档信息
- 本文作者:nyaaar
- 本文链接:https://nyaaarlathotep.github.io/2023/02/19/%E6%85%A2-sql-%E6%A1%88%E4%BE%8B%E6%80%BB%E7%BB%93/
- 版权声明:自由转载-非商用-非衍生-保持署名(创意共享3.0许可证)