SinoDB数据库中不同写法的count()的区别

  首先要弄清楚 count() 的函数的语义。count() 是一个聚合函数,对于返回的结果集,一行行地进行判断,如果 count 函数的参数不是 NULL,累计值就加 1,否则不加。最后返回累计值。所以,count(*)、count(1)和count(主键) 都表示返回满足条件的结果集的总行数;而 count(字段),则表示返回满足条件的数据行里面字段不为 NULL 的总个数。

1.测试演示

  以表fj_news为例,一共加载了15819行的数据,其中允许为null的字段title中存在3个null值。

create table fj_news 
(
    id serial not null ,
    title varchar(100) 
        default '',
    seo_title varchar(50) not null ,
    zone_id integer 
        default 0 not null ,
    primary key (id)
  ) ;

count(可空字段)

扫描全表,判断字段可空,扫描该字段所有值,判断字段是否为空,不为空则累加。

> select count(title) from fj_news;

         (count) 

           15816

1 row(s) retrieved.

查看其执行计划,可以看到对全表做了完整的顺序扫描:

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:29:45)
------
select count(title) from fj_news

Estimated Cost: 1093
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: SEQUENTIAL SCAN


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      15819     15819      00:00.00   1094    

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

count(非空字段)

扫描全表,判断字段不可空,按行累加。

> select count(seo_title) from fj_news;

         (count) 

           15819

1 row(s) retrieved.

查看其执行计划,也是顺序扫描全表:

select count(seo_title) from fj_news

Estimated Cost: 1093
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: SEQUENTIAL SCAN


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      15819     15819      00:00.00   1094    

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

count(id)

扫描全表,判断字段不可空,按行累加。

> select count(id) from fj_news;


         (count) 

           15819

1 row(s) retrieved.

查看其执行计划,执行的是索引键值扫描:

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:30:55)
------
select count(id) from fj_news

Estimated Cost: 579
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: INDEX PATH

    (1) Index Name: sinodb. 193_117
        Index Keys: id   (Key-Only)  (Serial, fragments: ALL)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      15819     15819      00:00.00   580     

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

count(1)

扫描全表,但不取值,默认每一行都是1,判断不可能是null,按值累加。

> select count(1) from fj_news;


         (count) 

           15819

1 row(s) retrieved.

查看其执行计划,顺序扫描整表:

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:31:24)
------
select count(1) from fj_news

Estimated Cost: 1093
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: SEQUENTIAL SCAN


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      15819     15819      00:00.00   1094    

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

count(*)

SinoDB在执行count(*)的时候,查看执行计划,可以发现优化器专门做了优化。如果不带where条件,将直接从内部系统表中返回值。有where条件的话会顺序扫描全表,但不取值,按行累加。

> select count(*) from fj_news;


      (count(*)) 

           15819

1 row(s) retrieved.

查看其执行计划,不带where条件的话,直接从内部系统表返回结果:

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:31:41)
------
select count(*) from fj_news

Estimated Cost: 1
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: INDEX PATH

    (1) Index Name: (count)
        Index Keys: (count)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         0          00:00.00

如果带where条件,则遵循条件限制:

> select count(*) from fj_news where title like '福州%';


      (count(*)) 

             954

1 row(s) retrieved.

> 

查看其执行计划,顺序扫描全表:

QUERY: (OPTIMIZATION TIMESTAMP: 07-20-2023 14:32:01)
------
select count(*) from fj_news where title like '福州%'

Estimated Cost: 1093
Estimated # of Rows Returned: 1

  1) sinodb.fj_news: SEQUENTIAL SCAN

        Filters: sinodb.fj_news.title LIKE '福州%' 


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                fj_news

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     15819      790       15819      00:00.00   1094    

  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         15819      00:00.00

2. 结论

  通过对比以上各个查询计划中的Estimated Cost值,我们有以下结论:
  count(*) 无限制条件 > count(主键) > count(1) ~= count(非空字段) ~= count(可空字段)