|
|
|
|
|
|
|
|
|
|
|
|
|
|
**Conclusion: the primary-key model is extremely sensitive to data skew and high-frequency update workloads**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Workarounds
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Given that the load jobs themselves cannot be optimized:
|
|
|
|
|
|
|
|
1. Add the `group_commit` parameter on the Stream Load client side
|
|
|
|
|
|
|
|
```text
|
|
|
|
|
|
|
|
# Add a "group_commit:async_mode" entry to the request header when loading
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
curl --location-trusted -u {user}:{passwd} -T data.csv -H "group_commit:async_mode" -H "column_separator:," http://{fe_host}:{http_port}/api/db/dt/_stream_load
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"TxnId": 7009,
|
|
|
|
|
|
|
|
"Label": "group_commit_c84d2099208436ab_96e33fda01eddba8",
|
|
|
|
|
|
|
|
"Comment": "",
|
|
|
|
|
|
|
|
"GroupCommit": true,
|
|
|
|
|
|
|
|
"Status": "Success",
|
|
|
|
|
|
|
|
"Message": "OK",
|
|
|
|
|
|
|
|
"NumberTotalRows": 2,
|
|
|
|
|
|
|
|
"NumberLoadedRows": 2,
|
|
|
|
|
|
|
|
"NumberFilteredRows": 0,
|
|
|
|
|
|
|
|
"NumberUnselectedRows": 0,
|
|
|
|
|
|
|
|
"LoadBytes": 19,
|
|
|
|
|
|
|
|
"LoadTimeMs": 35,
|
|
|
|
|
|
|
|
"StreamLoadPutTimeMs": 5,
|
|
|
|
|
|
|
|
"ReadDataTimeMs": 0,
|
|
|
|
|
|
|
|
"WriteDataTimeMs": 26
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# GroupCommit=true in the response means the load went through the group commit path
|
|
|
|
|
|
|
|
# The returned Label starts with group_commit; it is the label of the internal load job that actually consumes the data
|
|
|
|
|
|
|
|
```
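The same request can also be issued programmatically. Below is a minimal sketch (stdlib only) that just builds and validates the Stream Load headers before a request is sent; the three `group_commit` modes (`async_mode`, `sync_mode`, `off_mode`) follow the Doris documentation, and the `Expect` header mirrors what curl sends for Stream Load:

```python
# Sketch: assemble the HTTP headers for a Stream Load request with group commit.
# The three modes (async_mode / sync_mode / off_mode) are per the Doris docs;
# validating here catches typos before the request ever reaches the FE.
VALID_GROUP_COMMIT_MODES = {"async_mode", "sync_mode", "off_mode"}

def stream_load_headers(group_commit: str, column_separator: str = ",") -> dict:
    if group_commit not in VALID_GROUP_COMMIT_MODES:
        raise ValueError(f"unknown group_commit mode: {group_commit!r}")
    return {
        "group_commit": group_commit,
        "column_separator": column_separator,
        "Expect": "100-continue",  # Stream Load uses an HTTP redirect + chunked upload
    }

print(stream_load_headers("async_mode")["group_commit"])  # async_mode
```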
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
group_commit enables server-side batching: incoming writes are buffered, and once the accumulated size reaches the table's group_commit_data_bytes threshold, they are committed as a single transaction, saving transaction and compaction overhead.
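The effect can be pictured with a toy simulation (illustrative only, not Doris code): many small writes accumulate in a buffer, and one transaction commits the whole batch once the byte threshold is reached:

```python
# Toy model of server-side group commit batching: writes accumulate in a
# buffer and are committed as ONE transaction when the buffered bytes reach
# the threshold (analogous to the table's group_commit_data_bytes).
class GroupCommitBuffer:
    def __init__(self, data_bytes_threshold: int):
        self.threshold = data_bytes_threshold
        self.buffered = 0   # bytes currently waiting in the batch
        self.commits = 0    # transactions actually issued

    def write(self, nbytes: int) -> None:
        self.buffered += nbytes
        if self.buffered >= self.threshold:
            self.commits += 1   # one transaction for the whole batch
            self.buffered = 0

buf = GroupCommitBuffer(64 * 1024 * 1024)   # default threshold: 64 MB
for _ in range(1000):
    buf.write(1 * 1024 * 1024)              # 1000 loads of 1 MB each
print(buf.commits)  # 15 commits instead of 1000 separate transactions
```

Without group commit each of the 1000 loads would open its own transaction and produce its own rowset for compaction; with batching the transaction count collapses to the number of filled batches.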
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Group Commit commits 64 MB of data per batch by default; this can be adjusted via a table property:
|
|
|
|
|
|
|
|
```text
|
|
|
|
|
|
|
|
# Raise the commit data size to 128 MB
|
|
|
|
|
|
|
|
ALTER TABLE dt SET ("group_commit_data_bytes" = "134217728");
|
|
|
|
|
|
|
|
```
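The property value is specified in bytes, so the number above is simply 128 × 1024 × 1024; a quick sanity check:

```python
# group_commit_data_bytes is expressed in bytes: 128 MB = 134217728
print(128 * 1024 * 1024)  # 134217728
```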
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
*Note: if this value is set too high, data sits buffered in memory before each commit, driving memory usage up.*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If disk usage is still high even with group_commit enabled, you can try temporarily disabling automatic compaction for the table:
|
|
|
|
|
|
|
|
```text
|
|
|
|
|
|
|
|
ALTER TABLE dt SET ("disable_auto_compaction" = "true");
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
Once this is set, the table's tablets are no longer compacted, so rowset versions accumulate and consume a large amount of disk space; do not use it outside of emergencies.
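When the load spike has passed, compaction should be switched back on the same way (a config fragment in the style of the statements above, using the same example table `dt`):

```text
-- Re-enable automatic compaction once the emergency is over
ALTER TABLE dt SET ("disable_auto_compaction" = "false");
```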
|