mysql笔记 - SELECT 语句

453 查看

mysql笔记 - SELECT 语句

 SELECT
     [ALL | DISTINCT | DISTINCTROW ] *
       [HIGH_PRIORITY]
       [STRAIGHT_JOIN]
       [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
       [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
     select_expr [, select_expr ...]
     [FROM table_references
     [WHERE where_condition]
     [GROUP BY {col_name | expr | position}
       [ASC | DESC], ... [WITH ROLLUP]]
     [HAVING where_condition]
     [ORDER BY {col_name | expr | position}
       [ASC | DESC], ...]
     [LIMIT {[offset,] row_count | row_count OFFSET offset}]
     [PROCEDURE procedure_name(argument_list)]
     [INTO OUTFILE 'file_name'
         [CHARACTER SET charset_name]
         export_options
       | INTO DUMPFILE 'file_name'
       | INTO var_name [, var_name]]
     [FOR UPDATE | LOCK IN SHARE MODE]]

 select * from employees; -- 取出所有数据
 desc employees; -- 查看表信息

通常会要求开发人员不要使用select * 即使是select所有字段推荐写上所有的列名,虽然很麻烦但是规范上推荐这样做

一个重要的原因是alter table会对表结构进行修改的很多操作,alter table后select * from取出的字段可能比一开始涉及的要多,可能会遇到各种各样的问题。另外select alter table可以加列,可以加在某个字段中间的位置,那么程序那边取得的数据并不是一开始想要的数据,容易出错。所以建议select的时候建议把所有的列都带上

LIMIT 限制取出来的数据量

不加LIMIT限制会一次取出所有的数据,数据量大的时候可能是灾难性的事情,通常都会加一个LIMIT,图形化工具一般默认会加上一个LIMIT

     select * from employees limit 0,3; -- 从第0条开始取出3条数据
     select * from employees limit 1,3; -- 从第1条开始取出3条数据

LIMIT常用于分页,但是LIMIT offset数字越大性能越差,为什么呢?

 select * from employees limit 30; -- 取30条,扫30条,查询快(0.00 sec)
 select * from employees limit 1000000,30; -- 同样取30条,扫1000030行记录,查询变很慢(14.55 sec)

好的做法

 select emp_no,birth_date,first_name,last_name,gender from employees limit 10;
 +--------+------------+------------+-----------+--------+
 | emp_no | birth_date | first_name | last_name | gender |
 +--------+------------+------------+-----------+--------+
 |  10001 | 1953-09-02 | Georgi     | Facello   | M      |
 |  10002 | 1964-06-02 | Bezalel    | Simmel    | F      |
 |  10003 | 1959-12-03 | Parto      | Bamford   | M      |
 |  10004 | 1954-05-01 | Chirstian  | Koblick   | M      |
 |  10005 | 1955-01-21 | Kyoichi    | Maliniak  | M      |
 |  10006 | 1953-04-20 | Anneke     | Preusig   | F      |
 |  10007 | 1957-05-23 | Tzvetan    | Zielinski | F      |
 |  10008 | 1958-02-19 | Saniya     | Kalloufi  | M      |
 |  10009 | 1952-04-19 | Sumant     | Peac      | F      |
 |  10010 | 1963-06-01 | Duangkaew  | Piveteau  | F      |
 +--------+------------+------------+-----------+--------+
 
 select emp_no,birth_date,first_name,last_name,gender from employees where emp_no > 10010 limit 10;
 +--------+------------+------------+-------------+--------+
 | emp_no | birth_date | first_name | last_name   | gender |
 +--------+------------+------------+-------------+--------+
 |  10011 | 1953-11-07 | Mary       | Sluis       | F      |
 |  10012 | 1960-10-04 | Patricio   | Bridgland   | M      |
 |  10013 | 1963-06-07 | Eberhardt  | Terkki      | M      |
 |  10014 | 1956-02-12 | Berni      | Genin       | M      |
 |  10015 | 1959-08-19 | Guoxiang   | Nooteboom   | M      |
 |  10016 | 1961-05-02 | Kazuhito   | Cappelletti | M      |
 |  10017 | 1958-07-06 | Cristinel  | Bouloucos   | F      |
 |  10018 | 1954-06-19 | Kazuhide   | Peha        | F      |
 |  10019 | 1953-01-23 | Lillian    | Haddadi     | M      |
 |  10020 | 1952-12-24 | Mayuko     | Warwick     | M      |
 +--------+------------+------------+-------------+--------+

从之前取得数据的最大值开始取数据,这样的一个好处是不管你的值多大,因为是通过大于等于来定位的,所以只是取了10条数据不会再去扫这么多的数据

下面的语句表示从mysql中随机取出3条数据,select语句不带order by表示随机取3条数据

 select emp_no,first_name,last_name from employees limit 3;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  10001 | Georgi     | Facello   |
 |  10002 | Bezalel    | Simmel    |
 |  10003 | Parto      | Bamford   |
 +--------+------------+-----------+

数据量小的时候不容易看出来,数据量大的时候容易发现这个问题,或者alter table一下后再试

 alter table employees add index idx_name(first_name,last_name);
 select emp_no,first_name,last_name from employees limit 3;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  69256 | Aamer      | Anger     |
 | 486584 | Aamer      | Armand    |
 | 237165 | Aamer      | Azevdeo   |
 +--------+------------+-----------+

order by 表示根据哪个字段进行排序,可以按照物理数据顺序取出数据

 select emp_no,first_name,last_name from employees order by emp_no limit 3;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  10001 | Georgi     | Facello   |
 |  10002 | Bezalel    | Simmel    |
 |  10003 | Parto      | Bamford   |
 +--------+------------+-----------+

查询分区表

 select * from titles partition (p19) limit 1;
 +--------+--------------+------------+------------+
 | emp_no | title        | from_date  | to_date    |
 +--------+--------------+------------+------------+
 |  10052 | Senior Staff | 2002-01-31 | 9999-01-01 |
 +--------+--------------+------------+------------+

ORDER BY 对取出来的数据做排序

order by根据某个字段进行分页

 select emp_no,first_name,last_name from employees order by last_name limit 10;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  11761 | Bartek     | Aamodt    |
 |  15427 | Aluzio     | Aamodt    |
 |  18182 | Dekang     | Aamodt    |
 |  16572 | Matt       | Aamodt    |
 |  12791 | Mokhtar    | Aamodt    |
 |  12516 | Sreenivas  | Aamodt    |
 |  12982 | Sachem     | Aamodt    |
 |  17400 | Basim      | Aamodt    |
 |  19898 | Vidar      | Aamodt    |
 |  17885 | Takanari   | Aamodt    |
 +--------+------------+-----------+

WHERE 过滤数据

 select emp_no,first_name,last_name from employees where emp_no = 10001;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  10001 | Georgi     | Facello   |
 +--------+------------+-----------+
 
 select emp_no,first_name,last_name from employees where emp_no >= 10001 limit 10;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  10001 | Georgi     | Facello   |
 |  10002 | Bezalel    | Simmel    |
 |  10003 | Parto      | Bamford   |
 |  10004 | Chirstian  | Koblick   |
 |  10005 | Kyoichi    | Maliniak  |
 |  10006 | Anneke     | Preusig   |
 |  10007 | Tzvetan    | Zielinski |
 |  10008 | Saniya     | Kalloufi  |
 |  10009 | Sumant     | Peac      |
 |  10010 | Duangkaew  | Piveteau  |
 +--------+------------+-----------+
 
 select emp_no,first_name,last_name from employees where emp_no >= 20001 and first_name like 'Am%' limit 10;
 +--------+------------+---------------+
 | emp_no | first_name | last_name     |
 +--------+------------+---------------+
 | 100860 | Amabile    | Aamodt        |
 | 285669 | Amabile    | Akiyama       |
 | 276002 | Amabile    | Albarhamtoshy |
 | 454340 | Amabile    | Alencar       |
 |  86625 | Amabile    | Anger         |
 | 416143 | Amabile    | Antonisse     |
 | 491486 | Amabile    | Antonisse     |
 | 451988 | Amabile    | Apsitis       |
 | 409363 | Amabile    | Atchley       |
 | 208844 | Amabile    | Baar          |
 +--------+------------+---------------+
 
 select emp_no,first_name,last_name from employees where emp_no >= 20001 and first_name like '%Am%' limit 10;
 +--------+------------+--------------+
 | emp_no | first_name | last_name    |
 +--------+------------+--------------+
 |  20044 | Kiam       | Gist         |
 |  20062 | Uinam      | Heuser       |
 |  20114 | Ramalingam | Zyda         |
 |  20118 | Mohammed   | Schneeberger |
 |  20142 | Arumugam   | Emmart       |
 |  20159 | Isamu      | Valiente     |
 |  20167 | Stamatina  | Kobara       |
 |  20217 | Tamiya     | Ruemmler     |
 |  20265 | Amalendu   | Willoner     |
 |  20285 | Pramod     | Escriba      |
 +--------+------------+--------------+
 
 select emp_no,first_name,last_name from employees where emp_no >= 20001 or first_name like '%Am%' limit 10;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  69256 | Aamer      | Anger     |
 | 486584 | Aamer      | Armand    |
 | 237165 | Aamer      | Azevdeo   |
 | 413688 | Aamer      | Azuma     |
 | 281363 | Aamer      | Baak      |
 | 242368 | Aamer      | Baaleh    |
 | 206549 | Aamer      | Baar      |
 | 259089 | Aamer      | Baba      |
 |  60922 | Aamer      | Bahl      |
 | 283280 | Aamer      | Bahl      |
 +--------+------------+-----------+

where 1 = 1表示没有任何条件全部成立,因为不知道第一个where应该怎么写,如果提前把where 1 = 1写好,这样程序那边拼接后面的条件就比较方便

 select emp_no,first_name,last_name from employees where 1 = 1 limit 10;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  69256 | Aamer      | Anger     |
 | 486584 | Aamer      | Armand    |
 | 237165 | Aamer      | Azevdeo   |
 | 413688 | Aamer      | Azuma     |
 | 281363 | Aamer      | Baak      |
 | 242368 | Aamer      | Baaleh    |
 | 206549 | Aamer      | Baar      |
 | 259089 | Aamer      | Baba      |
 |  60922 | Aamer      | Bahl      |
 | 283280 | Aamer      | Bahl      |
 +--------+------------+-----------+

比如

 select emp_no,first_name,last_name from employees where 1 = 1 and emp_no = 20000;
 +--------+------------+-----------+
 | emp_no | first_name | last_name |
 +--------+------------+-----------+
 |  20000 | Jenwei     | Matzke    |
 +--------+------------+-----------+

GROUP BY

分组就是根据某个字段当中相同的值来进行分组,并且分组完之后是要加一个聚集(统计)函数

 use test;
 create table t (a int auto_increment primary key, b int, c int);
 insert into t select NULL,1,1;
 insert into t select NULL,1,7;
 insert into t select NULL,1,8;
 insert into t select NULL,2,8;
 insert into t select NULL,2,10;
 insert into t select NULL,3,8;
 select * from t;
 +---+------+------+
 | a | b    | c    |
 +---+------+------+
 | 1 |    1 |    1 |
 | 2 |    1 |    7 |
 | 3 |    1 |    8 |
 | 4 |    2 |    8 |
 | 5 |    2 |   10 |
 | 6 |    3 |    8 |
 +---+------+------+
 
 select b,sum(b),avg(b) from t group by b;
 +------+--------+--------+
 | b    | sum(b) | avg(b) |
 +------+--------+--------+
 |    1 |      3 | 1.0000 |
 |    2 |      4 | 2.0000 |
 |    3 |      3 | 3.0000 |
 +------+--------+--------+

HAVING

HAVING表示对group by中的聚合函数进行过滤

 select b,sum(b),avg(b) from t where avb(b)>2 group by b; -- 这样是会报错的
 ERROR 1305 (42000): FUNCTION test.avb does not exist

分组中需要对分组的条件进行过滤需要使用HAVING

 select b,sum(b),avg(b) from t group by b having avg(b) > 2;
 +------+--------+--------+
 | b    | sum(b) | avg(b) |
 +------+--------+--------+
 |    3 |      3 | 3.0000 |
 +------+--------+--------+

不是说使用了group by就代表不能使用where了,这样也是可以使用where进行过滤的,只不过where过滤的是非聚合的结果(对查询的记录进行过滤),而having是用来过滤聚合的结果

 select b,sum(b),avg(b) from t where b < 3 group by b;
 +------+--------+--------+
 | b    | sum(b) | avg(b) |
 +------+--------+--------+
 |    1 |      3 | 1.0000 |
 |    2 |      4 | 2.0000 |
 +------+--------+--------+

双重过滤

 select b,sum(b),avg(b) from t where b < 3 group by b having avg(b) < 2;
 +------+--------+--------+
 | b    | sum(b) | avg(b) |
 +------+--------+--------+
 |    1 |      3 | 1.0000 |
 +------+--------+--------+

所以where是一开始选数据的时候就开始过滤了,having是在聚集结果出来之后才开始过滤

having也可以过滤指定的一列而不是聚集函数,但是这样写没有什么意义,使用where可以获得更好的性能,所以having一般都跟聚集函数搭配使用

 select b,sum(b),avg(b) from t where b < 3 group by b having b < 2;
 +------+--------+--------+
 | b    | sum(b) | avg(b) |
 +------+--------+--------+
 |    1 |      3 | 1.0000 |
 +------+--------+--------+

select字段中没有选择的字段,having中使用会报错

 select b,sum(b),avg(b) from t where b < 3 group by b having a < 2;
 ERROR 1054 (42S22): Unknown column 'a' in 'having clause'

这样的语句没有意义,因为取出来的a字段数据不一定是146,mysql会随机取a字段的数据,而且这种写法在orcal中好像不支持

 select a,b,sum(b) from t group by b;
 +---+------+--------+
 | a | b    | sum(b) |
 +---+------+--------+
 | 1 |    1 |      3 |
 | 4 |    2 |      4 |
 | 6 |    3 |      3 |
 +---+------+--------+

JOIN

基本的多表关联查询

 SELECT
     concat(first_name,' ',last_name), dept_name -- concat连接了2个字段的字符串
 FROM
     employees e, -- e是别名,下面3个雷同
     dept_emp de,
     departments d
 WHERE
     e.emp_no = de.emp_no -- 关联条件
         AND de.dept_no = d.dept_no -- 再一次关联条件
 LIMIT 3;
 +----------------------------------+------------------+
 | concat(first_name,' ',last_name) | dept_name        |
 +----------------------------------+------------------+
 | Mary Sluis                       | Customer Service |
 | Huan Lortz                       | Customer Service |
 | Basil Tramer                     | Customer Service |
 +----------------------------------+------------------+

新建一些例子

 create table a (a int);
 create table b (a int);
 insert into a select 1;
 insert into a select 2;
 insert into a select 3;
 
 insert into b select 1;
 insert into b select 2;
 select * from a;select * from b;
 +------+
 | a    |
 +------+
 |    1 |
 |    2 |
 |    3 |
 +------+
 3 rows in set (0.00 sec)
 
 +------+
 | a    |
 +------+
 |    1 |
 |    2 |
 +------+
 2 rows in set (0.03 sec)

最简单的两表关联,并不是2张表关联就会做笛卡尔积,如果关联的时候没有写where关联条件就会产生笛卡尔积

 select * from a,b where a.a = b.a;
 +------+------+
 | a    | a    |
 +------+------+
 |    1 |    1 |
 |    2 |    2 |
 +------+------+
 
 select * from a,b; -- 这样会产生笛卡尔积
 +------+------+
 | a    | a    |
 +------+------+
 |    1 |    1 |
 |    1 |    2 |
 |    2 |    1 |
 |    2 |    2 |
 |    3 |    1 |
 |    3 |    2 |
 +------+------+

三张表只有1个过滤条件会出现图示效果,会产生笛卡尔积的效果,出现了4条记录

 create table c (a int);
 insert into c select 2;
 insert into c select 3;
 select * from a,b,c where a.a = b.a; -- 这样会产生笛卡尔积
 +------+------+------+
 | a    | a    | a    |
 +------+------+------+
 |    1 |    1 |    2 |
 |    2 |    2 |    2 |
 |    1 |    1 |    3 |
 |    2 |    2 |    3 |
 +------+------+------+

还可以这样写,关联条件可以非等值

 select * from a,b where a.a < b.a;
 +------+------+
 | a    | a    |
 +------+------+
 |    1 |    2 |
 +------+------+

修改一下字段名先。。。

 alter table a change a x int;
 alter table b change a y int;
 alter table c change a z int;

INNER JOIN

这3个写法没有区别

 select * from a inner join b on a.x = b.y;
 select * from a join b on a.x = b.y;
 select * from a,b where a.x = b.y;
 -- 都返回
 +------+------+
 | x    | y    |
 +------+------+
 |    1 |    1 |
 |    2 |    2 |
 +------+------+

以下语句的关联,通常来说会选择关联过滤度高的条件进行关联,这里b.y > 1的关联度会高一些。

 select * from a,b where a.x = b.y and b.y > 1;
 +------+------+
 | x    | y    |
 +------+------+
 |    2 |    2 |
 +------+------+

假设a表100w记录b表100w记录,其中的x,y一一对应,优化机通畅都会先从b>100 and b<200条件先进行过滤,这样就只需要100w100条记录进行join就可以了,如果先从a.x = b.y进行过滤,那就是100w100w的记录进行join,所以一般优化机是会把b>100 and b<200的过滤条件设置得更高

INNER JOIN的时候过滤条件即可写在on里面也可以写在where条件里面,对结果来说是没有区别的

 select * from a inner join b on (a.x = b.y and b.y > 1);
 select * from a inner join b on a.x = b.y where b.y > 1;
 都返回
 +------+------+
 | x    | y    |
 +------+------+
 |    2 |    2 |
 +------+------+

OUTER JOIN

outer join 分成left/right两种,outer可以关键词可以省略,left/right join代表左/右外连接

左连接中左边的表为保留表,保留表中所有的字段都是要出现的,如果关联条件存在的话就是一对一的情况,右表中关联条件不存在则关联出来的结果就是NULL值

 select * from a left join b on a.x = b.y;
 +------+------+
 | x    | y    |
 +------+------+
 |    1 |    1 |
 |    2 |    2 |
 |    3 | NULL |
 +------+------+

 select * from a right join b on a.x = b.y;
 +------+------+
 | x    | y    |
 +------+------+
 |    1 |    1 |
 |    2 |    2 |
 +------+------+

a表中存在的但是b表中不存在的数据,左连接实现

 select * from a left join b on a.x = b.y where b.y is NULL;
 +------+------+
 | x    | y    |
 +------+------+
 |    3 | NULL |
 +------+------+

a表中但是不在b表中,不使用left join 使用not in + 子查询,不过会存在一些性能上的问题

 select * from a where a.x not in (select y from b);
 +------+
 | x    |
 +------+
 |    3 |
 +------+

这样的查询b.y会全部返回NULL,因为a.x = b.y and b.y is NULL不成立,所以b.y全部用NULL填充

 select * from a left join b on a.x = b.y and b.y is NULL;
 +------+------+
 | x    | y    |
 +------+------+
 |    1 | NULL |
 |    2 | NULL |
 |    3 | NULL |
 +------+------+

过滤条件写在on后还是where后是有讲究的,推荐使用on来进行两张表之间的关联,where用来进行数据的过滤。

外连接中where不能写在on前面,内连接则可以,left join是用到比较多的一个方式

多次left join

 select * from a left join b on a.x = b.y left join c on a.x = c.z;
 +------+------+------+
 | x    | y    | z    |
 +------+------+------+
 |    2 |    2 |    2 |
 |    1 |    1 | NULL |
 |    3 | NULL |    3 |
 +------+------+------+

 select * from a left join b on a.x = b.y left join c on b.y = c.z order by a.x;
 +------+------+------+
 | x    | y    | z    |
 +------+------+------+
 |    1 |    1 | NULL |
 |    2 |    2 |    2 |
 |    3 | NULL | NULL |
 +------+------+------+

第二次join使用等值连接尝试

 select * from a left join b on a.x = b.y inner join c on b.y = c.z order by a.x;
 +------+------+------+
 | x    | y    | z    |
 +------+------+------+
 |    2 |    2 |    2 |
 +------+------+------+

一些例子

查询不是经理的员工

 SELECT 
     e.emp_no
 FROM
     employees e
         LEFT JOIN
     dept_manager dm ON e.emp_no = dm.emp_no
 WHERE
     dm.emp_no IS NULL
 LIMIT 3;

分页技巧

使用where而不是使用limit 10000,10 因为limit会扫10000前面的1w条数据

 select * from employees where emp_no > 10000 limit 10;

根据生日分页

 alter table employees add index idx_birth(birth_date,emp_no); -- 建立联合索引
 
 select * from employees where birth_date > '1952-02-02' order by birth_date limit 10;
 +--------+------------+------------+-----------+--------+------------+
 | emp_no | birth_date | first_name | last_name | gender | hire_date  |
 +--------+------------+------------+-----------+--------+------------+
 |  16093 | 1952-02-03 | Luise      | Tramer    | M      | 1992-02-28 |
 |  16447 | 1952-02-03 | Zhiguo     | Savasere  | F      | 1987-04-28 |
 |  23857 | 1952-02-03 | Kristen    | Frijda    | M      | 1986-05-31 |
 |  27259 | 1952-02-03 | Michaela   | Lipner    | M      | 1997-09-05 |
 |  32094 | 1952-02-03 | Mohd       | Buchter   | F      | 1986-03-16 |
 |  32641 | 1952-02-03 | Tua        | Tetzlaff  | F      | 1988-12-01 |
 |  41374 | 1952-02-03 | JiYoung    | Schurmann | M      | 1988-01-25 |
 |  43737 | 1952-02-03 | Debatosh   | Beerel    | F      | 1987-12-07 |
 |  56225 | 1952-02-03 | Miquel     | Rusmann   | F      | 1990-06-17 |
 |  59577 | 1952-02-03 | Adel       | Swiler    | F      | 1986-11-27 |
 +--------+------------+------------+-----------+--------+------------+
 -- 错误的结果,日期直接跳到了02-03
 
 select * from employees where (birth_date,emp_no) > ('1952-02-02', 10010) order by birth_date limit 10;
 +--------+------------+------------+-------------+--------+------------+
 | emp_no | birth_date | first_name | last_name   | gender | hire_date  |
 +--------+------------+------------+-------------+--------+------------+
 |  12282 | 1952-02-02 | Tadahiro   | Delgrange   | M      | 1997-01-09 |
 |  13944 | 1952-02-02 | Takahito   | Maierhofer  | M      | 1989-01-18 |
 |  22614 | 1952-02-02 | Dung       | Madeira     | M      | 1989-01-24 |
 |  29456 | 1952-02-02 | Barun      | Krohm       | F      | 1992-11-23 |
 |  33131 | 1952-02-02 | Reinhold   | Savasere    | M      | 1998-01-30 |
 |  40660 | 1952-02-02 | Piyush     | Erbe        | F      | 1988-04-04 |
 |  48910 | 1952-02-02 | Zhongwei   | DuBourdieux | M      | 1999-12-19 |
 |  51486 | 1952-02-02 | Jianwen    | Sigstam     | F      | 1989-07-20 |
 |  59884 | 1952-02-02 | Fan        | Przulj      | M      | 1991-09-25 |
 |  61382 | 1952-02-02 | Kristof    | Ranft       | M      | 1989-04-21 |
 +--------+------------+------------+-------------+--------+------------+
 -- 正确的结果

 select o_orderkey,o_orderstatus,o_totalprice from orders where(o_orderdate,o_orderkey) > ('1992-01-01',88199)  order by o_orderdate limit 10;
 select o_orderkey,o_orderstatus,o_totalprice from orders order by o_orderdate limit 10,10;
 -- 结果一致

当然分页推荐使用redis来做,用数据库实现其实意义不大