Merge branch 'master' into fix_package
qishipengqsp authored Jan 17, 2024
2 parents 265775e + 98b31f2 commit 075bd3b
Showing 5 changed files with 45 additions and 34 deletions.

@@ -31,38 +31,42 @@
After installing TuGraph, you can use the `lgraph_server` command to start a high availability cluster.
### 3.1.Consistent initial data

When all servers have the same data or no data at startup, the user can start them by
specifying `--ha_conf host1:port1,host2:port2`.
In this way, all prepared TuGraph instances can be added to the initial backup group at once;
the servers in the backup group elect the `leader` according to the RAFT protocol, and the
remaining servers join the backup group in the role of `follower`.

An example command to start an initial backup group is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```

After the first server starts, it elects itself as the `leader` and forms a backup group containing only itself.

### 3.2.Inconsistent initial data
If the first server already contains data (imported by the `lgraph_import` tool or transferred from a server running in non-high-availability mode),
and it has not been used in high availability mode before, the user should start it with the bootstrap method. Start the server with data in bootstrap
mode with the `ha_bootstrap_role` parameter set to 1, and designate that machine as the `leader` through the `ha_conf`
parameter. In bootstrap mode, the server copies its own data to each newly joined server before adding it to the
backup group, so that the data in all servers is consistent.

An example command to start the server that already holds data is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 1
```

Other servers without data need to set the `ha_bootstrap_role` parameter to 2 and specify the `leader` through the `ha_conf` parameter. An example command is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 2
```

**You need to pay attention to two points when using bootstrap to start an HA cluster:**
1. You must wait for the `leader` node to generate a snapshot and start successfully before joining any `follower` nodes; otherwise a `follower` node may fail to join. When starting a `follower` node, the `ha_node_join_group_s` parameter can be set to a somewhat larger value to allow repeated waits and timeout retries while joining the HA cluster, as shown in the sketch after this list.
2. An HA cluster can only use bootstrap mode the first time it is started; afterwards it must be started in normal mode (see Section 3.1). In particular, multiple nodes of the same cluster must not be started in bootstrap mode, otherwise data may become inconsistent.
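
A minimal sketch of such a `follower` start is shown below; the 120-second join window is an illustrative value chosen for this example, not a documented default:

```bash
# Data-less follower joining in bootstrap mode with an enlarged join window,
# so it keeps waiting and retrying while the leader finishes its snapshot.
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true \
    --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 \
    --ha_bootstrap_role 2 --ha_node_join_group_s 120
```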

## 4.Start witness node

### 4.1. Witness nodes are not allowed to become leader
@@ -72,7 +76,7 @@
The startup method of the `witness` node is the same as that of ordinary nodes; you only need to set the `ha_is_witness` parameter to 1.
An example command to start the witness node server is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1
```

Note: By default, the `witness` node is not allowed to become the `leader` node, which improves the performance of the cluster but reduces its availability when the `leader` node crashes.
@@ -84,19 +88,19 @@
You can specify the `ha_enable_witness_to_leader` parameter as `true`, so that the `witness` node can temporarily become the `leader` node when the old `leader` crashes.
An example of the command to start the `witness` node server that is allowed to become the `leader` node is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1 --ha_enable_witness_to_leader 1
```

Note: Although allowing `witness` nodes to become `leader` nodes can improve the availability of the cluster, it may affect data consistency in extreme cases. Therefore, the number of `witness` nodes + 1 should generally be kept below half of the total number of cluster nodes; for example, a five-node cluster should contain at most one such `witness` node, since 1 + 1 = 2 < 5/2 = 2.5.

## 5.Scale out other servers

After starting the initial backup group, if you want to scale out the backup group and add a new server to it,
use the `--ha_conf HOST:PORT` option, where `HOST` is the IP address of any server already in the backup group
and `PORT` is its RPC port. For example:

```bash
./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090
```

This command will start a TuGraph server in high availability mode and try to add it to the backup group containing the server `172.22.224.15:9090`.
@@ -108,17 +112,17 @@
When a server goes offline via `CTRL-C`, it notifies the current `leader` server, and the `leader` removes it from the backup group.

If a server is terminated or disconnected from other servers in the backup group, the server is considered a failed node and the leader server will remove it from the backup group after a specified time limit.

If any server leaves the backup group and wishes to rejoin, it must be started with the `--ha_conf HOST:PORT` option, where `HOST` is the IP address of a server in the current backup group.
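
For example, a server rejoining through the existing member `172.22.224.15:9090` is started with the same form of command as the scale-out example above:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090
```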

## 7.Restarting the Server

Restarting the entire backup group is not recommended, as it disrupts service. All servers can be shut down if desired, but on restart
it must be ensured that at least N/2+1 of the servers that were in the backup group at shutdown can start normally (for a three-server
group, at least two), otherwise the startup will fail. Moreover, regardless of whether `enable_bootstrap` was specified as true when the
replication group was initially started, restarting the servers only requires
specifying the `--ha_conf host1:port1,host2:port2` parameter to restart all servers at once. An example command is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```

## 8.Deploying a high availability cluster with docker
@@ -144,7 +148,7 @@
docker run --net=host -itd -v {src_dir}:{dst_dir} --name tugraph_ha tugraph/t

### 8.3.Start service
Use the following command to start the service on each server. Because the docker container and the host share the same IP (the container runs with `--net=host`), the service can be started directly on the host IP:
```shell
$ lgraph_server -c lgraph.json --host 172.22.224.15 --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```
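
On each of the other machines, the same command is repeated with that machine's own IP in `--host`; for example, on the second server:

```shell
$ lgraph_server -c lgraph.json --host 172.22.224.16 --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```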

## 9.Server Status

@@ -43,38 +43,39 @@
This feature is supported in v3.6 and above.

### 3.1.Consistent initial data

When all servers have the same data or no data at startup, the user can start them by specifying `--ha_conf host1:port1,host2:port2`.
In this way, all prepared TuGraph instances can be added to the initial backup group at once; the servers in the backup group elect the `leader` according to the RAFT protocol, and the remaining servers join the backup group in the role of `follower`.

An example command to start an initial backup group is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```

### 3.2.Inconsistent initial data

If the first server already contains data (imported by the `lgraph_import` tool or transferred from a server running in non-high-availability mode),
and it has not been used in high availability mode before, the user should start it with the bootstrap method.
Start the server with data in bootstrap mode with the `ha_bootstrap_role` parameter set to 1, and designate that machine as the `leader` through the `ha_conf` parameter.
In bootstrap mode, the server copies its own data to each newly joined server before adding it to the backup group, so that the data in all servers stays consistent.

An example command to start the server that already holds data is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 1
```

Other servers without data need to set the `ha_bootstrap_role` parameter to 2 and specify the `leader` through the `ha_conf` parameter. An example command is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 2
```

**You need to pay attention to two points when using bootstrap to start an HA cluster:**
1. You must wait for the `leader` node to generate a snapshot and start successfully before joining any `follower` nodes; otherwise a `follower` node may fail to join. When starting a `follower` node, the `ha_node_join_group_s` parameter can be set to a somewhat larger value to allow repeated waits and timeout retries while joining the HA cluster.
2. An HA cluster can only use bootstrap mode the first time it is started; afterwards it must be started in normal mode (see Section 3.1). In particular, multiple nodes of the same cluster must not be started in bootstrap mode, otherwise data may become inconsistent.

## 4.Start witness node

### 4.1.Witness nodes are not allowed to become leader
@@ -84,7 +85,7 @@
An example command to start the witness node server is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1
```

Note: By default, the `witness` node is not allowed to become the `leader` node, which improves the performance of the cluster but reduces its availability when the `leader` node crashes.
@@ -96,19 +97,19 @@
An example command to start a `witness` node server that is allowed to become the `leader` node is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1 --ha_enable_witness_to_leader 1
```

Note: Although allowing `witness` nodes to become `leader` nodes can improve the availability of the cluster, it may affect data consistency in extreme cases. Therefore, the number of `witness` nodes + 1 should generally be kept below half of the total number of cluster nodes.

## 5.Scale out other servers

After starting the initial backup group, if you want to scale out the backup group and add a new server to it,
use the `--ha_conf HOST:PORT` option, where `HOST` is the IP address of any server already in the backup group
and `PORT` is its RPC port. For example:

```bash
./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090
```

This command will start a TuGraph server in high availability mode and try to add it to the backup group containing the server `172.22.224.15:9090`.
@@ -121,17 +122,17 @@

If a server is terminated or loses its connection to the other servers in the backup group, it is considered a failed node, and the `leader` server will remove it from the backup group after a specified time limit.

If any server leaves the backup group and wishes to rejoin, it must be started with the `--ha_conf HOST:PORT` option, where `HOST` is the IP address of a server in the current backup group.

## 7.Restarting the Server

Restarting the entire backup group is not recommended, as it disrupts service. All servers can be shut down if desired, but on restart
it must be ensured that at least N/2+1 of the servers that were in the backup group at shutdown can start normally (for a three-server
group, at least two), otherwise the startup will fail. Moreover, regardless of whether `enable_bootstrap` was specified as true when the
replication group was initially started, restarting the servers only requires
specifying the `--ha_conf host1:port1,host2:port2` parameter to restart all servers at once. An example command is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```

## 8.Deploying a high availability cluster with docker
@@ -158,7 +159,7 @@
docker run --net=host -itd -v {src_dir}:{dst_dir} --name tugraph_ha tugraph/t

### 8.3.Start service
Use the following command to start the service on each server. Because the docker container and the host share the same IP (the container runs with `--net=host`), the service can be started directly on the host IP:
```shell
$ lgraph_server -c lgraph.json --host 172.22.224.15 --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```

## 9.Server Status
3 changes: 3 additions & 0 deletions procedures/algo_cpp/mis_core.cpp
@@ -33,6 +33,9 @@ void MISCore(OlapBase<Empty> &graph, ParallelVector<bool> &mis, size_t &mis_size)
while (active_num != 0) {
active_num = graph.ProcessVertexInRange<size_t>(
[&](size_t dst) {
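// dst is already in the independent set; skip it without counting it as active.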
if (mis[dst]) {
return (size_t)0;
}
auto edges = graph.OutEdges(dst);
for (auto &edge : edges) {
size_t src = edge.neighbour;
2 changes: 1 addition & 1 deletion procedures/algo_cpp/mis_procedure.cpp
@@ -34,7 +34,7 @@ extern "C" bool Process(GraphDB& db, const std::string& request, std::string& response)
return false;
}

// SNAPSHOT_IDMAPPING remaps raw vertex IDs to a compact range, so the result
// only reports vertices that actually exist in the graph.
size_t construct_param = SNAPSHOT_PARALLEL | SNAPSHOT_IDMAPPING;
if (make_symmetric != 0) {
construct_param = SNAPSHOT_PARALLEL | SNAPSHOT_UNDIRECTED;
}
5 changes: 4 additions & 1 deletion procedures/algo_cpp/mis_standalone.cpp
@@ -24,7 +24,7 @@ using json = nlohmann::json;
class MyConfig : public ConfigBase<Empty> {
public:
std::string name = std::string("mis");
// MIS runs on undirected graphs, so symmetrize the input edges by default.
int make_symmetric = 1;
void AddParameter(fma_common::Configuration& config) {
ConfigBase<Empty>::AddParameter(config);
config.Add(make_symmetric, "make_symmetric", true)
@@ -53,6 +53,9 @@ int main(int argc, char** argv) {
start_time = get_time();
OlapOnDisk<Empty> graph;
MyConfig config(argc, argv);
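// Without ID mapping, raw vertex IDs may contain gaps, so the result can
// include vertices that do not exist in the input.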
if (!config.id_mapping) {
printf("id_mapping is false, the results may contain vertices that do not exist\n");
}

if (config.make_symmetric == 0) {
graph.Load(config, INPUT_SYMMETRIC);
