diff --git a/docs/en-US/source/5.developer-manual/2.running/3.high-availability-mode.md b/docs/en-US/source/5.developer-manual/2.running/3.high-availability-mode.md
index 0ec0f707ec..5d22c81b36 100644
--- a/docs/en-US/source/5.developer-manual/2.running/3.high-availability-mode.md
+++ b/docs/en-US/source/5.developer-manual/2.running/3.high-availability-mode.md
@@ -31,7 +31,7 @@ After installing TuGraph, you can use the `lgraph_server` command to start a hig
 ### 3.1.The initial data is consistent
 
 When the data in all servers is the same or there is no data at startup, the user can
-specify `--conf host1:port1,host2:port2` to start the server.
+specify `--ha_conf host1:port1,host2:port2` to start the server.
 In this way, all prepared TuGraph instances can be added to the initial backup group at one time.
 All servers in the backup group elect the `leader` according to the RAFT protocol, and other
 servers join the backup group with the role of `follower`.
@@ -39,7 +39,7 @@ servers join the backup group with the role of `follower`.
 An example command to start an initial backup group is as follows:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
 ```
 
 After the first server is started, it will elect itself as the 'leader' and organize a backup group with only itself.
@@ -47,22 +47,26 @@ After the first server is started, it will elect itself as the 'leader' and orga
 ### 3.2.Inconsistent initial data
 
 If there is already data in the first server (imported by the `lgraph_import` tool or transferred from a server in non-high-availability mode), and it has not been used in high-availability mode before, the user should use the bootstrap method to start. Start the server with data in bootstrap
-mode with the `ha_bootstrap_role` parameter as 1, and specify the machine as the `leader` through the `conf`
+mode with the `ha_bootstrap_role` parameter as 1, and specify the machine as the `leader` through the `ha_conf`
 parameter. In bootstrap mode, the server will copy its own data to the new server before adding the newly joined
 server to the backup group, so that the data in each server is consistent.
 
 An example command to start the server with data is as follows:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090 --ha_bootstrap_role 1
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 1
 ```
 
-Other servers without data need to specify the `ha_bootstrap_role` parameter as 2, and specify the `leader` through the `conf` parameter. The command example is as follows
+Other servers without data need to specify the `ha_bootstrap_role` parameter as 2 and specify the `leader` through the `ha_conf` parameter. The command example is as follows:
 
 ```bash
-**$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090 --ha_bootstrap_role 2
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 2
 ```
 
+**You need to pay attention to two points when using bootstrap to start an HA cluster:**
+1. You need to wait for the `leader` node to generate a snapshot and start successfully before adding the `follower` nodes; otherwise, a `follower` node may fail to join. When starting a `follower` node, you can set the `ha_node_join_group_s` parameter somewhat larger to allow multiple waits and timeout retries while joining the HA cluster (see the sketch after this list).
+2. An HA cluster can use bootstrap mode only the first time it is started; every later start must use the normal mode (see Section 3.1). In particular, multiple nodes of the same cluster must not be started in bootstrap mode, otherwise data inconsistency may occur.
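
Note 1 can be combined with the follower start command directly. Below is a minimal sketch of a slower-joining follower, using only the flags documented above; the value 300 (seconds) is an arbitrary illustration, not a recommended default:

```bash
# Give the follower a longer window of join retries while the
# bootstrap leader is still writing its first snapshot.
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true \
    --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 \
    --ha_bootstrap_role 2 --ha_node_join_group_s 300
```
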
+
 ## 4.Start witness node
 
 ### 4.1. Witness nodes are not allowed to become leader
@@ -72,7 +76,7 @@ The startup method of `witness` node is the same as that of ordinary nodes. You
 An example command to start the witness node server is as follows:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1
 ```
 
 Note: By default, the `witness` node is not allowed to become the `leader` node, which can improve the performance of the cluster, but will reduce the availability of the cluster when the `leader` node crashes.
@@ -84,7 +88,7 @@ You can specify the `ha_enable_witness_to_leader` parameter as `true`, so that t
 An example of the command to start the `witness` node server that is allowed to become the `leader` node is as follows:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1 --ha_enable_witness_to_leader 1
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1 --ha_enable_witness_to_leader 1
 ```
 
 Note: Although allowing `witness` nodes to become `leader` nodes can improve the availability of the cluster, it may affect data consistency in extreme cases. Therefore, it should generally be ensured that the number of `witness` nodes + 1 is less than half of the total number of cluster nodes.
@@ -92,11 +96,11 @@ Note: Although allowing `witness` nodes to become `leader` nodes can improve the
 ## 5.Scale out other servers
 
 After starting the initial backup group, if you want to scale out the backup group and add new servers to it,
-The `--conf HOST:PORT` option should be used, where `HOST` can be the IP address of any server already in this backup group,
+the `--ha_conf HOST:PORT` option should be used, where `HOST` can be the IP address of any server already in this backup group,
 and `PORT` is its RPC port. For example:
 
 ```bash
-./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090
+./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090
 ```
 
 This command will start a TuGraph server in high availability mode and try to add it to the backup group containing the server `172.22.224.15:9090`.
@@ -108,17 +112,17 @@ When a server goes offline via 'CTRL-C', it will notify the current 'leader' ser
 If a server is terminated or disconnected from other servers in the backup group, the server is considered a failed node and the leader server will remove it from the backup group after a specified time limit.
 
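The CTRL-C behavior described in section 6 is just SIGINT delivery, so the same graceful departure can be scripted for a server running in the background. A minimal sketch; the `pgrep` pattern is an assumption about how the process was launched:

```bash
# Send SIGINT (the signal CTRL-C delivers) so the node notifies the
# current leader and leaves the backup group cleanly before exiting.
kill -INT "$(pgrep -f lgraph_server | head -n 1)"
```
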
-If any server leaves the backup group and wishes to rejoin, it must start with the '--conf {HOST:PORT}' option, where 'HOST' is the IP address of a server in the current backup group.
+If any server leaves the backup group and wishes to rejoin, it must be started with the `--ha_conf HOST:PORT` option, where `HOST` is the IP address of a server in the current backup group.
 
 ## 7.Restarting the Server
 
 Restarting the entire backup group is not recommended as it disrupts service. All servers can be shut down if desired. But on reboot,
 it must be ensured that at least N/2+1 of the servers in the backup group at shutdown can start normally; otherwise the startup will fail. Moreover,
 regardless of whether `enable_bootstrap` is specified as true when initially starting the replication group, restarting the servers only requires
-Specify the `--conf host1:port1,host2:port2` parameter to restart all servers at once. The command example is as follows:
+specifying the `--ha_conf host1:port1,host2:port2` parameter to bring all servers back up at once. The command example is as follows:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
 ```
 
 ## 8.Deploying a highly available cluster with docker
@@ -144,7 +148,7 @@ docker run --net=host -itd -p -v {src_dir}:{dst_dir} --name tugraph_ha tugraph/t
 ### 8.3.Start service
 
 Use the following command to start the service on each server; because docker and the host share an IP, the service can be started directly on the host IP.
 
 ```shell
-$ lgraph_server -c lgraph.json --host 172.22.224.15 --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
+$ lgraph_server -c lgraph.json --host 172.22.224.15 --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
 ```
 
 ## 9.Server Status
diff --git a/docs/zh-CN/source/5.developer-manual/2.running/3.high-availability-mode.md b/docs/zh-CN/source/5.developer-manual/2.running/3.high-availability-mode.md
index 7140252dd7..f9abf18093 100644
--- a/docs/zh-CN/source/5.developer-manual/2.running/3.high-availability-mode.md
+++ b/docs/zh-CN/source/5.developer-manual/2.running/3.high-availability-mode.md
@@ -43,38 +43,39 @@ v3.6及以上版本支持此功能。
 ### 3.1.初始数据一致
 
-当启动时所有服务器中的数据相同或没有数据时,用户可以通过
-指定`--conf host1:port1,host2:port2`启动服务器。
-这种方式可以将准备好的所有TuGraph实例一次性加入初始备份组,
-由备份组中的所有服务器根据raft协议选举出`leader`,并将其他
-服务器以`follower`的角色加入备份组。
+当启动时所有服务器中的数据相同或没有数据时,用户可以通过指定`--ha_conf host1:port1,host2:port2`启动服务器。
+这种方式可以将准备好的所有TuGraph实例一次性加入初始备份组,由备份组中的所有服务器根据raft协议选举出`leader`,并将其他服务器以`follower`的角色加入备份组。
 
 启动初始备份组的命令示例如下所示:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
 ```
 
 ### 3.2.初始数据不一致
 
 如果第一台服务器中已有数据(以`lgraph_import`工具导入或从非高可用模式的服务器传输得到),
 并且之前并未在高可用模式下使用,则用户应使用bootstrap方式启动。
-以`ha_bootstrap_role`参数为1在bootstrap模式下启动有数据的服务器,并通过`conf`参数指定本机为`leader`。
+以`ha_bootstrap_role`参数为1在bootstrap模式下启动有数据的服务器,并通过`ha_conf`参数指定本机为`leader`。
 在bootstrap模式下,服务器在将新加入的服务器添加到备份组之前会将自己的
 数据复制到新服务器中,以使每个服务器中的数据保持一致。
 
 启动有数据服务器的命令示例如下所示:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090 --ha_bootstrap_role 1
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 1
 ```
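
The bootstrap caveats above apply here as well: the role-2 followers should only be launched once this leader has finished its snapshot and is serving. A minimal sketch of a readiness wait; the web port 7070 and probing the root path are assumptions, so adapt them to your `lgraph.json`:

```bash
# Poll the bootstrap leader's HTTP endpoint before starting any follower.
until curl -sf http://172.22.224.15:7070/ > /dev/null; do
    echo "leader not ready, retrying in 5s..."
    sleep 5
done
```
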
 
-其他无数据的服务器需要指定`ha_bootstrap_role`参数为2,并通过`conf`参数指定`leader`即可,命令示例如下所示
+其他无数据的服务器需要指定`ha_bootstrap_role`参数为2,并通过`ha_conf`参数指定`leader`即可,命令示例如下所示:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090 --ha_bootstrap_role 2
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 2
 ```
 
+**使用bootstrap启动HA集群时需要注意两点:**
+1. 需要等待`leader`节点生成snapshot并且成功启动之后再加入`follower`节点,否则`follower`节点可能加入失败。在启动`follower`节点时可以将`ha_node_join_group_s`参数配置得稍大,以在加入HA集群时多次等待和超时重试。
+2. HA集群只有在第一次启动时可以使用bootstrap模式,后续再启动时只能使用普通模式(见3.1节)启动,尤其不能让同一个集群的多个节点以bootstrap模式启动,否则可能产生数据不一致的情况。
+
 ## 4.启动witness节点
 
 ### 4.1.不允许witness节点成为leader
@@ -84,7 +85,7 @@
 启动`witness`节点服务器的命令示例如下所示:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1
 ```
 
 注:默认不允许`witness`节点成为`leader`节点,这可以提高集群的性能,但是在`leader`节点崩溃时会降低集群的可用性。
@@ -96,7 +97,7 @@
 启动允许成为`leader`节点的`witness`节点服务器的命令示例如下所示:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1 --ha_enable_witness_to_leader 1
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1 --ha_enable_witness_to_leader 1
 ```
 
 注:尽管允许`witness`节点成为`leader`节点可以提高集群的可用性,但是在极端情况下可能会影响数据的一致性。因此一般应保证`witness`节点数量+1少于集群节点总数量的一半。
@@ -104,11 +105,11 @@
 ## 5.横向扩展其他服务器
 
 启动初始备份组后,如果想对备份组进行横向扩展,要将新服务器添加到备份组,
-应使用`--conf HOST:PORT`选项,其中`HOST`可以是该备份组中已有的任何服务器的 IP 地址,
+应使用`--ha_conf HOST:PORT`选项,其中`HOST`可以是该备份组中已有的任何服务器的 IP 地址,
 而`PORT`是其 RPC 端口。例如:
 
 ```bash
-./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090
+./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090
 ```
 
 此命令将启动一台高可用模式的 TuGraph 服务器,并尝试将其添加到包含服务器`172.22.224.15:9090`的备份组中。
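
To make the scale-out concrete: a new machine (the `.18` address below is a hypothetical fourth host) joins by pointing `--ha_conf` at any existing member, which need not be the leader:

```bash
# Hypothetical new node 172.22.224.18 joining the existing backup group;
# --ha_conf may name any server already in the group, e.g. the .16 node.
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.16:9090
```
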
@@ -121,17 +122,17 @@
 如果服务器被终止或者与备份组中的其他服务器失去连接,则该服务器将被视为失败节点,`leader`服务器将在特定时限后将其从备份组中删除。
 
-如果任何服务器离开备份组并希望重新加入,则必须从`--conf HOST:PORT`选项开始,其中`HOST`是当前备份组中的某台服务器的 IP 地址。
+如果任何服务器离开备份组并希望重新加入,则必须使用`--ha_conf HOST:PORT`选项启动,其中`HOST`是当前备份组中的某台服务器的 IP 地址。
 
 ## 7.重启服务器
 
 不建议重新启动整个备份组,因为它会中断服务。如果需要,可以关闭所有服务器。但在重新启动时,
 必须保证关闭时的备份组中至少有N/2+1的服务器能正常启动,否则启动失败。
 并且,
 无论初始启动复制组时是否指定`enable_bootstrap`为true,重启服务器时都只需通过
-指定`--conf host1:port1,host2:port2`参数一次性重启所有服务器即可,命令示例如下所示:
+指定`--ha_conf host1:port1,host2:port2`参数一次性重启所有服务器即可,命令示例如下所示:
 
 ```bash
-$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
+$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
 ```
 
 ## 8.docker部署高可用集群
@@ -158,7 +159,7 @@ docker run --net=host -itd -p -v {src_dir}:{dst_dir} --name tugraph_ha tugraph/t
 ### 8.3.启动服务
 
 在每台服务器上使用如下命令启动服务,因为docker和宿主机共享IP,所以可以直接指定在宿主机IP上启动服务:
 
 ```shell
-$ lgraph_server -c lgraph.json --host 172.22.224.15 --rpc_port 9090 --enable_ha true --conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
+$ lgraph_server -c lgraph.json --host 172.22.224.15 --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
 ```
 
 ## 9.查看服务器状态
diff --git a/procedures/algo_cpp/mis_core.cpp b/procedures/algo_cpp/mis_core.cpp
index fd008c36a9..22248badb1 100644
--- a/procedures/algo_cpp/mis_core.cpp
+++ b/procedures/algo_cpp/mis_core.cpp
@@ -33,6 +33,10 @@ void MISCore(OlapBase<Empty> &graph, ParallelVector<bool> &mis, size_t &mis_size
     while (active_num != 0) {
         active_num = graph.ProcessVertexInRange<size_t>(
             [&](size_t dst) {
+                // dst was already selected into the independent set; skip it
+                if (mis[dst]) {
+                    return (size_t)0;
+                }
                 auto edges = graph.OutEdges(dst);
                 for (auto &edge : edges) {
                     size_t src = edge.neighbour;
diff --git a/procedures/algo_cpp/mis_procedure.cpp b/procedures/algo_cpp/mis_procedure.cpp
index 5286bd00b7..62815e7517 100644
--- a/procedures/algo_cpp/mis_procedure.cpp
+++ b/procedures/algo_cpp/mis_procedure.cpp
@@ -34,7 +34,7 @@ extern "C" bool Process(GraphDB& db, const std::string& request, std::string& re
         return false;
     }
 
-    size_t construct_param = SNAPSHOT_PARALLEL;
+    size_t construct_param = SNAPSHOT_PARALLEL | SNAPSHOT_IDMAPPING;  // also remap sparse vertex ids
     if (make_symmetric != 0) {
         construct_param = SNAPSHOT_PARALLEL | SNAPSHOT_UNDIRECTED;
     }
diff --git a/procedures/algo_cpp/mis_standalone.cpp b/procedures/algo_cpp/mis_standalone.cpp
index 0c993275dc..5d1f7d55d4 100644
--- a/procedures/algo_cpp/mis_standalone.cpp
+++ b/procedures/algo_cpp/mis_standalone.cpp
@@ -24,7 +24,7 @@ using json = nlohmann::json;
 class MyConfig : public ConfigBase<Empty> {
  public:
     std::string name = std::string("mis");
-    int make_symmetric = 0;
+    int make_symmetric = 1;
     void AddParameter(fma_common::Configuration& config) {
         ConfigBase::AddParameter(config);
         config.Add(make_symmetric, "make_symmetric", true)
@@ -53,6 +53,9 @@ int main(int argc, char** argv) {
     start_time = get_time();
     OlapOnDisk<Empty> graph;
     MyConfig config(argc, argv);
+    if (!config.id_mapping) {
+        printf("id_mapping is false, the results may contain vertices that do not exist\n");
+    }
     if (config.make_symmetric == 0) {
         graph.Load(config, INPUT_SYMMETRIC);
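
As a usage note on the changed defaults above, a quick smoke run of the standalone binary might look like the sketch below. The `--type` and `--input_dir` flag names follow TuGraph's usual standalone `ConfigBase` conventions, and `--id_mapping` mirrors the `config.id_mapping` field checked in `main`; verify all three against your build:

```bash
# make_symmetric now defaults to 1 (symmetrize a directed edge list);
# enabling id_mapping avoids reporting vertex ids absent from the input.
$ ./mis_standalone --type text --input_dir ./edge_list --id_mapping 1
```
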