FinBench on TuGraph optimization suggestions #1

Open
qishipengqsp opened this issue Nov 23, 2023 · 0 comments

Below are some more concrete optimization suggestions, for reference:

  1. Query rewriting
  2. Query Plan Cache implementation
  3. In-Query Call Procedure

Rewriting variable-length path queries

Implementation difficulty: relatively easy
Approach
For example, the variable-length query
MATCH p = (acc:Account {id:%d})-[e1:transfer *1..3]->(other:Account)<-[e2:signIn]-(medium)
WHERE isAsc(relationships(e1, 'timestamp'))=true AND head(relationships(e1, 'timestamp')) > %d AND last(relationships(e1, 'timestamp')) < %d
AND e2.timestamp > %d AND e2.timestamp < %d AND medium.isBlocked = true
RETURN DISTINCT other.id as otherId, length(p)-1 as accountDistance, medium.id as mediumId, medium.type as mediumType
ORDER BY accountDistance, otherId, mediumId;
keeps only paths whose edge timestamps increase along the path, and uses head and last to bound the time range (the first transfer must come after the start time and the last one before the end time).
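
The whole-path predicate can be modelled in a few lines. This is a minimal Python sketch, not TuGraph code: `is_asc`, `path_filter` and `per_edge_filter` are hypothetical helpers, and it assumes isAsc means strictly increasing. Because the timestamps increase, head > start and last < end already force every edge into the window (start, end), so checking each edge as it is added gives the same answer. That equivalence is what makes a hop-by-hop rewrite safe.

```python
# Hypothetical helpers modelling the WHERE clause of the query above.
# Assumption: isAsc(...) means strictly increasing timestamps.

def is_asc(values):
    """True if the list is strictly increasing."""
    return all(a < b for a, b in zip(values, values[1:]))

def path_filter(timestamps, start, end):
    """Whole-path check as written: isAsc, head > start, last < end.
    Assumes a non-empty list (the pattern *1..3 has at least one edge)."""
    return (is_asc(timestamps)
            and timestamps[0] > start
            and timestamps[-1] < end)

def per_edge_filter(timestamps, start, end):
    """Same check expressed hop by hop: each edge lies inside (start, end)
    and is later than the edge before it."""
    prev = None
    for ts in timestamps:
        if not (start < ts < end):
            return False
        if prev is not None and not (prev < ts):
            return False
        prev = ts
    return True
```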

The current query engine does not push down WHERE filter conditions or prune with them well, so consider rewriting the query into fixed-hop form to make the Cypher query more efficient. Example rewrite:
MATCH p = (a)-[e1]->(b) WHERE e1.timestamp < end_time AND e1.timestamp > start_time
WITH e1, b
MATCH (b)-[e2]->(c) WHERE e2.timestamp < end_time AND e2.timestamp > start_time AND e1.timestamp < e2.timestamp
...
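
The benefit of the rewrite is that a failing prefix is dropped before it is extended. The sketch below compares the two strategies on a made-up transfer graph; `EDGES`, `build_adj`, `naive` and `fixed_hop` are hypothetical, and only the transfer *1..3 part is modelled (not the signIn/medium part). `naive` enumerates every walk and filters at the end, as happens when the predicates are not pushed down; `fixed_hop` applies the per-edge predicates at each hop. Both return the same (endpoint, hop count) pairs, but `fixed_hop` inspects fewer edges.

```python
from collections import defaultdict

# Hypothetical toy transfer graph: (src, dst, timestamp).
EDGES = [
    (1, 2, 10), (1, 3, 50), (2, 3, 20), (2, 4, 5),
    (3, 4, 30), (3, 1, 25), (4, 5, 40), (4, 1, 15),
]

def build_adj(edges):
    adj = defaultdict(list)
    for src, dst, ts in edges:
        adj[src].append((dst, ts))
    return adj

def naive(adj, src, start, end, max_hops=3):
    """Enumerate every walk of 1..max_hops hops, then filter whole paths."""
    results, explored = set(), 0
    frontier = [(src, [])]  # (current node, timestamps so far)
    for hops in range(1, max_hops + 1):
        nxt = []
        for node, tss in frontier:
            for dst, ts in adj[node]:
                explored += 1
                path = tss + [ts]
                nxt.append((dst, path))  # kept even if it fails the filter
                if (all(a < b for a, b in zip(path, path[1:]))
                        and path[0] > start and path[-1] < end):
                    results.add((dst, hops))
        frontier = nxt
    return results, explored

def fixed_hop(adj, src, start, end, max_hops=3):
    """Expand hop by hop, applying per-edge predicates immediately,
    so prefixes that fail are never extended."""
    results, explored = set(), 0
    frontier = [(src, None)]  # (current node, timestamp of last edge)
    for hops in range(1, max_hops + 1):
        nxt = []
        for node, last_ts in frontier:
            for dst, ts in adj[node]:
                explored += 1
                if start < ts < end and (last_ts is None or last_ts < ts):
                    nxt.append((dst, ts))
                    results.add((dst, hops))
        frontier = nxt
    return results, explored
```

On this toy graph, starting from account 1 with the window (0, 45), both functions return the same four pairs, while `naive` inspects 14 edges and `fixed_hop` only 6.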

Tip: if needed, prefix a query with the EXPLAIN keyword to view its execution plan and check whether the plan leaves room for optimization.

In-Query Call Procedure

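Implementation difficulty: medium
Approach: write in-query calls that encapsulate common operations. An in-query call invokes a stored procedure from inside a Cypher query, which can make queries more efficient.
Related materials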

Note: there is a lot of freedom here. Avoid hard-coded, special-purpose stored procedures and keep them as general as possible.
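
As an illustration of "general rather than hard-coded", the sketch below models a procedure registry in plain Python. It does not use TuGraph's real procedure API; `procedure`, `call` and the name `algo.timeIncreasingExpand` are hypothetical. The point is the signature: edge label, direction, time window and hop limit are all parameters, so one procedure could serve several FinBench-style queries instead of exactly one.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical registry, loosely modelling CALL name(...) YIELD ... in a query.
PROCEDURES: Dict[str, Callable[..., List[dict]]] = {}

def procedure(name):
    """Register a function under a procedure name."""
    def wrap(fn):
        PROCEDURES[name] = fn
        return fn
    return wrap

@procedure("algo.timeIncreasingExpand")
def time_increasing_expand(edges, src, label, direction, start, end, max_hops):
    """Expand from src along edges of one label, keeping only paths whose
    timestamps increase and stay inside (start, end). Edges are
    (src, label, dst, timestamp) tuples."""
    adj = defaultdict(list)
    for s, lbl, d, ts in edges:
        if lbl != label:
            continue
        if direction == "out":
            adj[s].append((d, ts))
        elif direction == "in":
            adj[d].append((s, ts))
        else:
            raise ValueError(f"unknown direction: {direction}")
    rows, seen = [], set()
    frontier = [(src, None)]  # (current node, timestamp of last edge)
    for depth in range(1, max_hops + 1):
        nxt = []
        for node, last in frontier:
            for other, ts in adj[node]:
                if start < ts < end and (last is None or last < ts):
                    nxt.append((other, ts))
                    if (other, depth) not in seen:  # DISTINCT rows
                        seen.add((other, depth))
                        rows.append({"node": other, "depth": depth})
        frontier = nxt
    return rows

def call(name, *args):
    """Look up and invoke a registered procedure by name."""
    return PROCEDURES[name](*args)
```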

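Plan Cache implementation

Implementation difficulty: relatively high
Approach
The basic Cypher query pipeline in TuGraph-DB is: Cypher -> AST -> ExecutionPlan -> ExecutionEngine.
FinBench queries follow fairly fixed patterns, so the Cypher-to-ExecutionPlan step is largely repeated work. A cache at the AST or ExecutionPlan level would therefore reduce the time spent on parsing and plan construction.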

1. How to build a simple cache
Take a simple query as an example:
MATCH (n:Person{id:2}) RETURN n
The generated logical execution plan is:
[PlanA]
Produce Results
  Project [n]
    Node Index Seek [n, id, 2]
If exactly the same statement arrives again, this execution plan can be reused directly, so we can build an ExecutionPlanCache:
ExecutionPlanCache = ["MATCH (n:Person{id:2}) RETURN n", PlanA]
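
A minimal sketch of such an exact-text cache, written in Python rather than TuGraph's C++; `parse` and `build_plan` are hypothetical stand-ins for the parser and planner. The LRU bound (via `OrderedDict`) is an added assumption so the cache cannot grow without limit.

```python
from collections import OrderedDict

def parse(query):
    """Hypothetical stand-in for the Cypher parser."""
    return ("AST", query)

def build_plan(ast):
    """Hypothetical stand-in for the planner."""
    return ("PLAN", ast[1])

class ExecutionPlanCache:
    """LRU cache of execution plans keyed by the exact query text."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._plans = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_plan(self, query):
        plan = self._plans.get(query)
        if plan is not None:
            self._plans.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return plan
        self.misses += 1
        plan = build_plan(parse(query))     # full parse + plan on a miss
        self._plans[query] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict least recently used
        return plan
```

Because the key is the full text, MATCH (n:Person{id:2}) RETURN n and MATCH (n:Person{id:3}) RETURN n are two different entries, which is the limitation the next point addresses.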

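2. Handling input parameters
However, query inputs always vary, so the input values need to be taken out of the cache key. The cache entry could then look like this: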

ExecutionPlanCache = ["MATCH (n:Person{id:%id}) RETURN n", PlanA]
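
One way to obtain such a parameterized key is to replace literal values with placeholders before the lookup. The sketch below is a simplified, regex-based assumption, not TuGraph behaviour: it only rewrites literals that follow `:`, `=`, `<` or `>`, so structural numbers such as the hop bounds in *1..3 stay in the key, and it uses hypothetical placeholder names `$p0, $p1, ...`. A real implementation would more likely work on the AST, and the scheme is only correct if the cached plan does not depend on the literal values themselves.

```python
import re

# Literals right after ':', '=', '<' or '>' (property maps, simple comparisons).
# Assumption: other numbers, e.g. hop bounds in *1..3, are structural.
_LITERAL = re.compile(r"([:=<>]\s*)(-?\d+(?:\.\d+)?|'[^']*')")

def parameterize(query):
    """Return (template, params) with literals replaced by $p0, $p1, ..."""
    params = []

    def repl(m):
        params.append(m.group(2))  # raw literal text; binding is left to the executor
        return f"{m.group(1)}$p{len(params) - 1}"

    template = _LITERAL.sub(repl, query)
    return template, params

class ParameterizedPlanCache:
    """Plan cache keyed by the literal-free template."""

    def __init__(self, build_plan):
        self._build_plan = build_plan  # hypothetical planner callback
        self._plans = {}

    def get(self, query):
        template, params = parameterize(query)
        plan = self._plans.get(template)
        if plan is None:
            plan = self._build_plan(template)
            self._plans[template] = plan
        return plan, params
```

With this, MATCH (n:Person{id:2}) RETURN n and MATCH (n:Person{id:7}) RETURN n share the template MATCH (n:Person{id:$p0}) RETURN n and reuse one plan, with only the extracted parameters differing.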

3. Other: given the implementation difficulty, an AstCache is also an option.
