External table又叫SinoDB外部表,外部表采用多线程机制,支持多线程读取、写入数据文件以及多线程数据转换、插入操作。多线程机制只需要消耗相对较少的系统资源,但是能提供高速数据导入、导出,可以应用在数据采集、表重建、数据库迁移等业务场景中。
对于一个普通表,SinoDB负责其数据存储管理,有相应的DBSPACE、Chunk、extent、page 等,可以创建相应的索引。然而,对于SinoDB的外部表而言,其数据文件由操作系统管理,简单来讲外部表就是一个或者多个文件,SinoDB数据库引擎提供对文件的 SQL 接口访问。文件的格式支持固定格式的文本文件,也支持SinoDB的内部二进制格式。
1.External table创建语法
create external table table_name([column definition | sameas table_template])
using (
datafiles ('disk:/textfile'),
format 'delimited',
delimiter 'delimiter_string'
);
- table_name:要创建的外部表名称。
- table_template:可以指定一个模板,以该表的列定义创建一个外部表。
- datafiles:指定外部表使用的数据文件。
- format:指定文件采用的格式化方式。
- delimiter:指定分隔符。
2. External table使用演示
示例1:创建外部表,定义字段信息,采用“.”作为分隔**
[informix@vm84145 data]$ cat exe_data.unl
1.1.Bill.01\.06\.1983.
2.1.John.25\.12\.1985.
3.2.Mary.10\.10\.1987.
4.3.Kate.11\.11\.1989.
5.1.Will Smith.28\.02\.1981.
[informix@vm84145 data]$ export GL_DATE="%d.%m.%iY"
[informix@vm84145 data]$ cat exeternal.sql
-- export GL_DATE="%d.%m.%iY"
create external table t_ext_table(f_employeeid int, f_deptid int, f_employeename varchar(20), f_birthdate date)
using(
datafiles ('disk:/home/informix/data/exe_data.unl'),
format 'delimited',
delimiter '.'
);
select * from t_ext_table;
[informix@vm84145 data]$ dbaccess testdb exeternal.sql
Database selected.
Table created.
f_employeeid f_deptid f_employeename f_birthdate
1 1 Bill 01.06.1983
2 1 John 25.12.1985
3 2 Mary 10.10.1987
4 3 Kate 11.11.1989
5 1 Will Smith 28.02.1981
5 row(s) retrieved.
Database closed.
[informix@vm84145 data]$
示例2:采用sameas快速定义表字段信息,字段定义与已有表一致:
[informix@vm84145 daqi]$ ll
total 121196
-rw-rw-r-- 1 informix informix 209 Jul 10 10:23 fjta_news1_ext.sql
-rw-rw-r-- 1 informix informix 1756664 Jul 10 10:21 fjta_news1.unl1
-rw-rw-r-- 1 informix informix 811624 Jul 10 10:26 fjta_news1.unl2
[informix@vm84145 daqi]$ cat fjta_news1_ext.sql
create external table fjta_news1_ext sameas fjta_news1
using(
datafiles('disk:/home/informix/daqi/fjta_news1.unl1',
'disk:/home/informix/daqi/fjta_news1.unl2'),
format 'delimited',
delimiter '|'
);
[informix@vm84145 daqi]$ dbaccess testdb3 fjta_news1_ext.sql
Database selected.
Table created.
Database closed.
[informix@vm84145 daqi]$ echo "select count(1) from fjta_news1_ext;" |dbaccess testdb3;
Database selected.
(count)
15000
1 row(s) retrieved.
Database closed.
示例3:使用模式正则表达式定义多个文件。
fjta_news1%r(1…4).unl定义了4个文件,如:fjta_news11.unl、fjta_news12.unl、fjta_news13.unl、fjta_news14.unl:
create external table fjta_news1_ext sameas fjta_news1
using(
datafiles('disk:/home/informix/daqi/fjta_news1%r(1..4).unl'),
format 'delimited',
delimiter '|'
);
示例4:重建大数据量表
[informix@vm84145 daqi]$ cat new_fjta_news1_ext.sql
create external table new_fjta_news1_ext sameas fjta_news1
using(
datafiles('disk:/home/informix/daqi/new_fjta_news1%r(1..4).unl'),
format 'delimited',
delimiter '|'
);
[informix@vm84145 daqi]$ dbaccess testdb3 new_fjta_news1_ext.sql
Database selected.
Table created.
Database closed.
[informix@vm84145 daqi]$ dbaccess testdb3 -
Database selected.
> set pdqpriority 90;
PDQ Priority set.
> set isolation dirty read;
Isolation level set.
> set environment IFX_BATCHEDREAD_TABLE '1';
Environment set.
> insert into new_fjta_news1_ext select * from fjta_news1;
15819 row(s) inserted.
> set pdqpriority 0;
PDQ Priority set.
[informix@vm84145 daqi]$ ll new*
-rw-rw-rw- 1 informix informix 679391 Jul 10 11:12 new_fjta_news11.unl
-rw-rw-rw- 1 informix informix 725839 Jul 10 11:12 new_fjta_news12.unl
-rw-rw-rw- 1 informix informix 679529 Jul 10 11:12 new_fjta_news13.unl
-rw-rw-rw- 1 informix informix 630876 Jul 10 11:12 new_fjta_news14.unl
[informix@vm84145 daqi]$ cat raw_fjta_news1.sql
create raw table raw_fjta_news1
(
id serial ,
title varchar(100)
default '' ,
seo_title varchar(50) ,
seo_keyword varchar(50) ,
seo_description varchar(100) ,
zone_id integer
default 0 ,
cat_id integer
default 0 ,
source varchar(100)
default '',
tag varchar(200)
default '',
img_source varchar(100)
default '',
is_audit smallint
default 0 ,
is_recommend smallint
default 0 ,
is_top smallint
default 0 ,
add_time integer
default 0 ,
number_hits integer
default 0 ,
is_home smallint
default 0 ,
person_thumb varchar(255) ,
person_name varchar(100) ,
person_desc lvarchar(512)
) extent size 256 next size 64 lock mode row;
[informix@vm84145 daqi]$ dbaccess testdb3 raw_fjta_news1.sql
Database selected.
Table created.
Database closed.
[informix@vm84145 daqi]$ dbaccess testdb3 -
Database selected.
> set pdqpriority 90;
PDQ Priority set.
> insert into raw_fjta_news1 select * from new_fjta_news1_ext;
15819 row(s) inserted.
> drop table fjta_news1;
Table dropped.
> rename table raw_fjta_news1 to fjta_news1;
Table renamed.
> create index idx_fjta_news1 on fjta_news1(id);
Index created.
> alter table fjta_news1 type(standard);
Table altered.
> set pdqpriority 0;
PDQ Priority set.
3. External table使用注意点
- 以下设置可以提高外部表导入导出性能:
把大数据文件切分为多个相对较小的文件
开启PDQ,set pdqpriority=90;
导出数据时,启用light scan,此时需要设置足够大的虚拟内存段;
导入数据时采用raw table;
导入数据时DISABLE约束和索引;
-
ontape/onbar不能备份外部表,需要手工单独进行备份;
-
外部表不能进行update statistics;
-
在少量数据的情况下推荐直接使用unload/load,在大数据量表的情况下推荐使用外部表;