模拟出现skip duplicate index scan的场景

Sodakold · 2023 年5 月 17 日 03:35

测试模拟出现Skip-duplicate-index scan的场景。

Skip-duplicate-index scan路径访问：
为索引扫描的其中一种，一般在这种情况下，系统会自动检测到索引中存在的重复值，并跳跃重复扫描这些索引行为，这样就可以大量减少因join带来多表关联的IO，提升了性能。

环境：SinoDB12.10 FC7 on Linux x64
数据库testdb: no logging mode, dbs_collate en_US.819
表a：
Create table t_staff
(
seqid bigint,
gender varchar(8,0),
deptno integer,
logtime datetime year to fraction(5),
keypoint decimal(16,0)
);
create index idx_staff_deptno on t_staff (deptno);

对表t_staff插入deptno值为100，101，102…109各5000条，总共50000条数据。分布情况：

select deptno,count() from t_staff group by deptno;
deptno (count())
100 5000
101 5000
102 5000
103 5000
104 5000
105 5000
106 5000
107 5000
108 5000
109 5000
10 row(s) retrieved.

表b:

create table t_dept as select * from t_staff where seqid>30000;
4135 row(s) retrieved into table.

从表a获取4000条左右记录插入到t_dept表。

创建列deptno的索引：
> create index idx_dept_deptno on t_dept (deptno);

b表数据分布情况：

select deptno,count() from t_dept group by deptno;
deptno (count())
100 382
101 381
102 453
103 437
104 406
105 431
106 432
107 392
108 423
109 398

10 row(s) retrieved.
可以看到：a表中deptno大量重复value，b表约为a表的十分之一左右，分布情况类似。

测试场景：skip-duplicate-index scan
a表t_staff为大表，有50000条记录
b表t_dept为小表，有5000条记录
两表的deptno列存在大量重复的值，都建有索引。

现构造一个a表和b表使用exists或in关联deptno统计数量的SQL语句。
SQL执行情况：

select /+explain/ count(*) from t_staff a where exists (select 1 from t_dept b where a.deptno=b.deptno) and a.deptno<105;

  (count(*)) 
     25000

–使用exists查看explain的结果文件Sqexplain.out:
QUERY: (OPTIMIZATION TIMESTAMP: 03-03-2017 09:53:03)
------
select /+explain/ count(*) from t_staff a where exists (select 1 from t_dept b where a.deptno=b.deptno) and a.deptno<105

DIRECTIVES FOLLOWED:
EXPLAIN
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 309966
Estimated # of Rows Returned: 1

sinodb.b: INDEX PATH (Skip Duplicate)
(1) Index Name: sinodb.idx_dept_deptno
Index Keys: deptno (Key-Only) (Serial, fragments: ALL)
Upper Index Filter: sinodb.b.deptno < 105
sinodb.a: INDEX PATH

(1) Index Name: sinodb.idx_staff_deptno
    Index Keys: deptno   (Key-Only)  (Serial, fragments: ALL)
    Lower Index Filter: sinodb.a.deptno = sinodb.b.deptno

NESTED LOOP JOIN

Query statistics:
---------------------

Table map :
-------------------
Internal name Table name
----------------------------
t1 b
t2 t_staff

type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t1 5 2059 2059 00:00.00 74

type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t2 25000 25008 25000 00:00.00 150

type rows_prod est_rows time est_cost
-------------------------------------------------
nljoin 25000 5149738 00:00.00 309966

type rows_prod est_rows rows_cons time
-------------------------------------------------
group 1 1 25000 00:00.00

以上显示，SQL使用exists关联子查询使用了skip-duplicate-index scan的访问路径。
分析：
驱动表b表中存在大量重复的deptno，系统为避免索引重复扫描这些键值，采用了跳跃式扫描从而避免了大量IO，结合key-only访问路径精准定位，可以大大提高执行效率。

 那么，大家还有什么其他场景可以模拟出skip duplicate index scan呢？