SinoDB数据库中不同写法的count()的区别

roamer · 2023 年7 月 20 日 03:53

首先要弄清楚 count() 的函数的语义。count() 是一个聚合函数，对于返回的结果集，一行行地进行判断，如果 count 函数的参数不是 NULL，累计值就加 1，否则不加。最后返回累计值。所以，count(*)、count(1)和count(主键) 都表示返回满足条件的结果集的总行数；而 count(字段），则表示返回满足条件的数据行里面字段不为 NULL 的总个数。

1.测试演示

以表fj_news为例，一共加载了15819行的数据，其中允许为null的字段title中存在3个null值。

create table fj_news 
(
    id serial not null ,
    title varchar(100) 
        default '',
    seo_title varchar(50) not null ,
    zone_id integer 
        default 0 not null ,
    primary key (id)
  ) ;

count(可空字段)

扫描全表，判断字段可空，扫描该字段所有值，判断字段是否为空，不为空则累加。

> select count(title) from fj_news;

         (count) 

           15816

1 row(s) retrieved.

查看其执行计划，可以看到对全表做了完整的顺序扫描：

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:29:45)
------
select count(title) from fj_news

Estimated Cost: 1093
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: SEQUENTIAL SCAN


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      15819     15819      00:00.00   1094    

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

count(非空字段)

扫描全表，判断字段不可空，按行累加。

> select count(seo_title) from fj_news;

         (count) 

           15819

1 row(s) retrieved.

查看其执行计划，也是顺序扫描全表：

select count(seo_title) from fj_news

Estimated Cost: 1093
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: SEQUENTIAL SCAN


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      15819     15819      00:00.00   1094    

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

count(id)

扫描全表，判断字段不可空，按行累加。

> select count(id) from fj_news;


         (count) 

           15819

1 row(s) retrieved.

查看其执行计划，执行的是索引键值扫描：

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:30:55)
------
select count(id) from fj_news

Estimated Cost: 579
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: INDEX PATH

    (1) Index Name: sinodb. 193_117
        Index Keys: id   (Key-Only)  (Serial, fragments: ALL)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      15819     15819      00:00.00   580     

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

count(1)

扫描全表，但不取值，默认每一行都是1，判断不可能是null，按值累加。

> select count(1) from fj_news;


         (count) 

           15819

1 row(s) retrieved.

查看其执行计划，顺序扫描整表：

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:31:24)
------
select count(1) from fj_news

Estimated Cost: 1093
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: SEQUENTIAL SCAN


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      15819     15819      00:00.00   1094    

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

count(*)

SinoDB在执行count(*)的时候，查看执行计划，可以发现优化器专门做了优化。如果不带where条件，将直接从内部系统表中返回值。有where条件的话会顺序扫描全表，但不取值，按行累加。

> select count(*) from fj_news;


      (count(*)) 

           15819

1 row(s) retrieved.

查看其执行计划，不带where条件的话，直接从内部系统表返回结果：

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:31:41)
------
select count(*) from fj_news

Estimated Cost: 1
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: INDEX PATH

    (1) Index Name: (count)
        Index Keys: (count)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         0          00:00.00

如果带where条件，则遵循条件限制：

> select count(*) from fj_news where title like '福州%';


      (count(*)) 

             954

1 row(s) retrieved.

>

查看其执行计划，顺序扫描全表：

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:32:01)
------
select count(*) from fj_news where title like '福州%'

Estimated Cost: 1093
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: SEQUENTIAL SCAN

        Filters: sinodb.fj_news.title LIKE '福州%' 


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      790       15819      00:00.00   1094    

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

2. 结论

通过对比以上各个查询计划中的Estimated Cost值，我们有以下结论：
count(*) 无限制条件 > count(主键) > count(1) ~= count(非空字段) ~= count(可空字段)