[dr-autosync] online recover time out after switching to backup cluster in sync_recover mode #6803

Closed
mayjiang0203 opened this issue Jul 13, 2023 · 8 comments

Comments

@mayjiang0203

Bug Report

What did you do?

What did you expect to see?

What did you see instead?

client logs:

[2023/07/13 03:02:58.889 +08:00] [INFO] [cluster.go:386] ["will run cmd"] [cmd:="tiup ctl:v6.5.3 pd -u http://pd3-peer.e2e-dr-auto-sync-5r-stability-tps-1818426-1-236:2379 unsafe remove-failed-stores --auto-detect"]
2023-07-13T03:02:58.894+0800        INFO        k8s/client.go:132        it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129

...
2023-07-13T03:15:00.398+0800	INFO	k8s/client.go:132	it should be noted that a long-running command will not be interrupted even the use case has ended. For more information, please refer to https://github.com/pingcap/test-infra/discussions/129
[2023/07/13 03:15:00.577 +08:00] [INFO] [cluster.go:909] [stdout]
[
  {
    "info": "Unsafe recovery enters collect report stage",
    "time": "2023-07-13 03:02:58.970",
    "details": [
      "auto detect mode with no specified failed stores"
    ]
  },
  {
    "info": "Unsafe recovery enters force leader stage",
    "time": "2023-07-13 03:03:09.859",
    "actions": {
      "store 1": [
        "force leader on regions: 110, 331, 359, 394, 401, 422, 443, 450, 464, 478, 485, 499, 520, 527, 534, 590, 639, 723, 744, 751, 835, 863, 877, 884, 905, 919, 954, 1017, 1038, 1059, 1073, 1087, 1122, 1164, 1206, 1213, 1248, 1283, 1290, 1318, 1409, 1423, 1465, 1521, 1556, 1570, 1577, 1619, 1647, 1661, 1682, 1696, 1752, 1766, 1808, 1927, 1934, 2025, 2046, 2123, 2130, 2144, 2158, 2165, 2172, 2221, 2242, 2347, 2354, 2438, 2466, 2613, 2627, 2641, 2690, 2725, 2774, 2802, 2928, 2991, 3117, 3362, 3397, 3740, 3789, 3796, 3859, 3950, 3978, 3985, 4104, 4223, 4363, 4517, 4580, 4692, 5000, 5007, 14071, 5154, 5168, 5196, 5224, 5231, 5252, 5259, 5287, 5308, 5329, 5350, 5357, 5364, 5371, 5805, 5812, 6022, 6414, 6456, 6533, 6554, 6785, 6848, 7114, 7135, 7142, 7149, 7156, 7170, 7212, 7219, 7226, 7233, 7240, 7254, 7303, 7324, 7352, 7359, 7373, 7387, 7394, 7401, 7415, 7422, 7443, 7499, 7513, 7520, 7527, 7548, 7562, 7576, 7590, 7597, 7632, 7639, 7919, 7926, 7933, 7947, 7954, 7968, 7982, 8010, 8031, 8052, 8073, 8080, 8087, 8094, 8108, 8129, 8136, 8157, 8178, 8206, 8213, 8227, 8248, 8255, 8269, 8353, 8388, 8514, 8591, 8619, 8626, 8633, 8640, 8654, 8675, 8717, 8738, 8745, 8752, 8759, 8773, 8787, 8794, 8822, 8843, 8857, 8864, 8871, 8878, 8885, 8934, 8962, 9004, 9011, 9039, 9046, 9095, 9130, 9137, 9158, 9165, 9179, 9186, 9207, 9235, 9249, 9263, 9298, 9312, 9319, 9326, 9347, 9424, 9487, 9543, 9564, 9578, 9599, 9669, 9746, 9802, 9879, 9907, 9921, 9956, 10005, 10012, 10026, 10089, 10257, 10271, 10278, 10348, 10383, 10390, 10439, 10467, 10474, 10495, 10509, 10516, 10544, 10565, 10586, 10614, 10621, 10663, 10726, 10733, 10810, 10845, 10908, 10922, 10971, 10992, 11013, 11139, 11160, 11209, 11223, 11244, 11258, 11307, 11314, 11356, 11412, 11426, 11461, 11489, 11587, 11622, 11636, 11657, 11664, 11685, 11811, 11853, 11867, 12000, 12021, 12028, 12056, 12140, 12182, 12203, 12224, 12252, 14043, 14078"
      ],
      "store 10": [
        "force leader on regions: 165, 387, 457, 513, 541, 555, 583, 611, 625, 632, 646, 660, 667, 681, 688, 709, 716, 758, 772, 779, 786, 814, 821, 842, 856, 870, 891, 898, 912, 926, 933, 940, 947, 961, 982, 1010, 1045, 1052, 1108, 1115, 1136, 1143, 1150, 1171, 1185, 1220, 1255, 1262, 1304, 1332, 1346, 1360, 1367, 1381, 1388, 1395, 1416, 1430, 1444, 1479, 1486, 1507, 1528, 1542, 1549, 1584, 1591, 1598, 1626, 1633, 1640, 1654, 1668, 1703, 1717, 1724, 1738, 1773, 1780, 1787, 1815, 1843, 1850, 1892, 1899, 1906, 1913, 1962, 1969, 1983, 2004, 2018, 2053, 2060, 2067, 2081, 2088, 2095, 2102, 2137, 2179, 2186, 2193, 2235, 2263, 2270, 2277, 2284, 2298, 2305, 2312, 2319, 2333, 2361, 2368, 2382, 2410, 2473, 2480, 2487, 2494, 2508, 2529, 2564, 2592, 2599, 2606, 2620, 2634, 2648, 2655, 2662, 2676, 2711, 2718, 2753, 2781, 2788, 2809, 2844, 2851, 2879, 2886, 2907, 2942, 2949, 2956, 2963, 2977, 2984, 2998, 3040, 3047, 3054, 3061, 3075, 3089, 3103, 3124, 3138, 3159, 3166, 3201, 3208, 3229, 3250, 3257, 3271, 3285, 3306, 3334, 3390, 3411, 3432, 3467, 3474, 3488, 3509, 3544, 3579, 3600, 3614, 3635, 3642, 3649, 3656, 3663, 3670, 3677, 3684, 3691, 3698, 3733, 3824, 3838, 3852, 3873, 3880, 3922, 3943, 3957, 3971, 3999, 4006, 4034, 4048, 4055, 4069, 4076, 4097, 4118, 4125, 4132, 4139, 4153, 4160, 4167, 4174, 4181, 4188, 4202, 4244, 4251, 4272, 4286, 4314, 4321, 4342, 4349, 4370, 4377, 4384, 4433, 4447, 4461, 4475, 4489, 4510, 4524, 4538, 4601, 4622, 4629, 4636, 4650, 4699, 4706, 4762, 4783, 4797, 4811, 4818, 4846, 4853, 4867, 4881, 4888, 4895, 4902, 4923, 4965, 4979, 5021, 5035, 5063, 5070, 5077, 5084, 5098, 5112, 14092, 5161, 5182, 5189, 5203, 5210, 5238, 5245, 5294, 5301, 5315, 5322, 5343, 5378, 5588, 5644, 5651, 5672, 5679, 5700, 5707, 5714, 5728, 5735, 5742, 5756, 5763, 5777, 5784, 5798, 5819, 5833, 5847, 5854, 5861, 5868, 5889, 5896, 5910, 5924, 5938, 5966, 5980, 5987, 5994, 6008, 6015, 6029, 6064, 6106, 6113, 6120, 6127, 6148, 6162, 6176, 6183, 6197, 6218, 6232, 6246, 6253, 6260, 
6274, 6281, 6288, 6295, 6302, 6309, 6323, 6330, 6337, 6365, 6407, 6428, 6442, 6477, 6484, 6491, 6512, 6519, 6575, 6582, 6589, 6596, 6617, 6624, 6631, 6638, 6652, 6666, 6673, 6680, 6694, 6701, 6708, 6722, 6729, 6736, 6743, 6820, 6827, 6855, 6862, 6876, 6883, 6946, 6953, 6967, 6974, 6981, 7002, 7023, 7030, 7205, 7296, 7310, 7317, 7331, 7345, 7366, 7380, 7436, 7450, 7485, 7492, 7506, 7583, 7604, 7618, 7653, 7660, 7688, 7744, 7758, 7765, 7870, 7996, 8003, 8024, 8066, 8122, 8143, 8164, 8220, 8241, 8262, 8325, 8367, 8416, 8423, 8430, 8444, 8472, 8493, 8500, 8528, 8563, 8570, 8577, 8605, 8647, 8689, 8731, 8766, 8801, 8815, 8836, 8899, 8906, 8913, 8927, 8976, 8990, 9074, 9102, 9200, 9256, 9270, 9277, 9305, 9333, 9361, 9368, 9410, 9431, 9445, 9452, 9459, 9494, 9522, 9536, 9557, 9585, 9592, 9627, 9634, 9641, 9655, 9676, 9690, 9732, 9739, 9760, 9767, 9781, 9788, 9795, 9809, 9816, 9823, 9837, 9844, 9865, 9893, 9900, 9942, 9949, 9991, 10019, 10040, 10054, 10061, 10131, 10138, 10145, 10173, 10201, 10208, 10222, 10229, 10236, 10243, 10285, 10299, 10327, 10334, 10341, 10376, 10418, 10432, 10446, 10453, 10460, 10488, 10530, 10537, 10551, 10593, 10600, 10635, 10642, 10649, 10656, 10677, 10698, 10740, 10789, 10831, 10852, 10859, 10873, 10894, 10901, 10915, 10929, 10978, 10999, 11006, 11020, 11041, 11062, 11083, 11090, 11097, 11104, 11111, 11118, 11132, 11195, 11202, 11251, 11265, 11272, 11286, 11300, 11321, 11335, 11349, 11384, 11391, 11405, 11419, 11454, 11503, 11510, 11517, 11524, 11531, 11538, 11545, 11552, 11559, 11566, 11608, 11629, 11650, 11692, 11727, 11734, 11755, 11762, 11776, 11825, 11846, 11860, 11888, 11902, 11923, 11937, 11951, 11972, 11979, 11993, 12007, 12035, 12070, 12077, 12084, 12105, 12112, 12119, 12133, 12161, 12168, 12175, 12196, 12210, 12245, 12280, 12287, 12308, 14029"
      ],
      "store 7": [
        "force leader on regions: 12331, 12338, 116, 122, 159, 338, 345, 352, 366, 373, 380, 408, 415, 429, 436, 471, 492, 506, 548, 562, 569, 576, 597, 604, 618, 653, 674, 695, 702, 730, 737, 765, 793, 800, 807, 828, 849, 968, 975, 989, 996, 1003, 1024, 1031, 1066, 1080, 1094, 1101, 1129, 1157, 1178, 1192, 1199, 1227, 1234, 1241, 1269, 1276, 1297, 1311, 1325, 1339, 1353, 1374, 1402, 1437, 1451, 1458, 1472, 1493, 1500, 1514, 1535, 1563, 1605, 1612, 1675, 1689, 1710, 1731, 1745, 1759, 1794, 1801, 1822, 1829, 1836, 1857, 1864, 1871, 1878, 1885, 1920, 1941, 1948, 1955, 1976, 1990, 1997, 2011, 2032, 2039, 2074, 2109, 2116, 2151, 2200, 2207, 2214, 2228, 2249, 2256, 2291, 2326, 2340, 2375, 2389, 2396, 2403, 2417, 2424, 2431, 2445, 2452, 2459, 2501, 2515, 2522, 2536, 2543, 2550, 2557, 2571, 2578, 2585, 2669, 2683, 2697, 2704, 2732, 2739, 2746, 2760, 2767, 2795, 2816, 2823, 2830, 2837, 2858, 2865, 2872, 2893, 2900, 2914, 2921, 2935, 2970, 3005, 3012, 3019, 3026, 3033, 3068, 3082, 3096, 3110, 3131, 3145, 3152, 3173, 3180, 3187, 3194, 3215, 3222, 3236, 3243, 3264, 3278, 3292, 3299, 3313, 3320, 3327, 3341, 3348, 3355, 3369, 3376, 3383, 3404, 3418, 3425, 3439, 3446, 3453, 3460, 3481, 3495, 3502, 3516, 3523, 3530, 3537, 3551, 3558, 3565, 3572, 3586, 3593, 3607, 3621, 3628, 3705, 3712, 3719, 3726, 3747, 3754, 3761, 3768, 3775, 3782, 3803, 3810, 3817, 3831, 3845, 3866, 3887, 3894, 3901, 3908, 3915, 3929, 3936, 3964, 3992, 4013, 4020, 4027, 4041, 4062, 4083, 4090, 4111, 4146, 4195, 4209, 4216, 4230, 4237, 4258, 4265, 4279, 4293, 4300, 4307, 4328, 4335, 4356, 4391, 4398, 4405, 4412, 4419, 4426, 4440, 4454, 4468, 4482, 4496, 4503, 4531, 4545, 4552, 4559, 4566, 4573, 4587, 4594, 4608, 4615, 4643, 4657, 4664, 4671, 4678, 4685, 4713, 4720, 4727, 4734, 4741, 4748, 4755, 4769, 4776, 4790, 4804, 4825, 4832, 4839, 4860, 4874, 4909, 4916, 4930, 4937, 4944, 4951, 4958, 4972, 4986, 4993, 5014, 5028, 5042, 5049, 5056, 5091, 5105, 5119, 5126, 5133, 5140, 5147, 5175, 5217, 5266, 5273, 5280, 
5336, 5385, 5392, 5399, 5406, 5413, 5420, 5427, 5434, 5441, 5448, 5455, 5462, 5469, 5476, 5483, 5490, 5497, 5504, 5511, 5518, 5525, 5532, 5539, 5546, 5553, 5560, 5567, 5574, 5581, 5595, 5602, 5609, 5616, 5623, 5630, 5637, 5658, 5665, 5686, 5693, 5721, 5749, 5770, 5791, 5826, 5840, 5875, 5882, 5903, 5917, 5931, 5945, 5952, 5959, 5973, 6001, 6036, 6043, 6050, 6057, 6071, 6078, 6085, 6092, 6099, 6134, 6141, 6155, 6169, 6190, 6204, 6211, 6225, 6239, 6267, 6316, 6344, 6351, 6358, 6372, 6379, 6386, 6393, 6400, 6421, 6435, 6449, 6463, 6470, 6498, 6505, 6526, 6540, 6547, 6561, 6568, 6603, 6610, 6645, 6659, 6687, 6715, 6750, 6757, 6764, 6771, 6778, 6792, 6799, 6806, 6813, 6834, 6841, 6869, 6890, 6897, 6904, 6911, 6918, 6925, 6932, 6939, 6960, 6988, 6995, 7009, 7016, 7037, 7044, 7051, 7058, 7065, 7072, 7079, 7086, 7093, 7100, 7107, 7121, 7128, 7163, 7177, 7184, 7191, 7198, 7247, 7261, 7268, 7275, 7282, 7289, 7338, 7408, 7429, 7457, 7464, 7471, 7478, 7534, 7541, 7555, 7569, 7611, 7625, 7646, 7667, 7674, 7681, 7695, 7702, 7709, 7716, 7723, 7730, 7737, 7751, 7772, 7779, 7786, 7793, 7800, 7807, 7814, 7821, 7828, 7835, 7842, 7849, 7856, 7863, 7877, 7884, 7891, 7898, 7905, 7912, 7940, 7961, 7975, 7989, 8017, 8038, 8045, 8059, 8101, 8115, 8150, 8171, 8185, 8192, 8199, 8234, 8276, 8283, 8290, 8297, 8304, 8311, 8318, 8332, 8339, 8346, 8360, 8374, 8381, 8395, 8402, 8409, 8437, 8451, 8458, 8465, 8479, 8486, 8507, 8521, 8535, 8542, 8549, 8556, 8584, 8598, 8612, 8661, 8668, 8682, 8696, 8703, 8710, 8724, 8780, 8808, 8829, 8850, 8892, 8920, 8941, 8948, 8955, 8969, 8983, 8997, 9018, 9025, 9032, 9053, 9060, 9067, 9081, 9088, 9109, 9116, 9123, 9144, 9151, 9172, 9193, 9214, 9221, 9228, 9242, 9284, 9291, 9340, 9354, 9375, 9382, 9389, 9396, 9403, 9417, 9438, 9466, 9473, 9480, 9501, 9508, 9515, 9529, 9550, 9571, 9606, 9613, 9620, 9648, 9662, 9683, 9697, 9704, 9711, 9718, 9725, 9753, 9774, 9830, 9851, 9858, 9872, 9886, 9914, 9928, 9935, 9963, 9970, 9977, 9984, 9998, 10033, 10047, 10068, 10075, 
10082, 10096, 10103, 10110, 10117, 10124, 10152, 10159, 10166, 10180, 10187, 10194, 10215, 10250, 10264, 10292, 10306, 10313, 10320, 10355, 10362, 10369, 10397, 10404, 10411, 10425, 10481, 10502, 10523, 10558, 10572, 10579, 10607, 10628, 10670, 10684, 10691, 10705, 10712, 10719, 10747, 10754, 10761, 10768, 10775, 10782, 10796, 10803, 10817, 10824, 10838, 10866, 10880, 10887, 10936, 10943, 10950, 10957, 10964, 10985, 11027, 11034, 11048, 11055, 11069, 11076, 11125, 11146, 11153, 11167, 11174, 11181, 11188, 11216, 11230, 11237, 11279, 11293, 11328, 11342, 11363, 11370, 11377, 11398, 11433, 11440, 11447, 11468, 11475, 11482, 11496, 11573, 11580, 11594, 11601, 11615, 11643, 11671, 11678, 11699, 11706, 11713, 11720, 11741, 11748, 11769, 11783, 11790, 11797, 11804, 11818, 11832, 11839, 11874, 11881, 11895, 11909, 11916, 11930, 11944, 11958, 11965, 11986, 12014, 12042, 12049, 12063, 12091, 12098, 12126, 12147, 12154, 12189, 12217, 12231, 12238, 12259, 12266, 12273, 12294, 12301, 14099, 14050, 14085, 14064, 36, 20, 12324, 5"
      ]
    }
  },
  {
    "info": "Unsafe recovery enters exit force leader stage",
    "time": "2023-07-13 03:13:00.245",
    "details": [
      "triggered by error: Exceeds timeout 2023-07-13 03:12:58.970101112 +0800 CST m=+749.477619802"
    ]
  },
  {
    "info": "Unsafe recovery failed: Exceeds timeout 2023-07-13 03:12:58.970101112 +0800 CST m=+749.477619802",
    "time": "2023-07-13 03:14:24.303",
    "details": [
      "affected meta regions: 12338",
      "affected table ids: 83, 84, 85, 88, 24, 108, 42, 89, 100, 106, 281474976710651, 281474976710654, 281474976710655, 40, 28, 82, 86, 87, 26",
      "Stores that have not dispatched plan: ",
      "Stores that have reported to PD: 10, 7",
      "Stores that have not reported to PD: 1"
    ]
  }
]
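
For triage, the store lists in the final `details` array can be pulled out with a few lines of Python. This is only a sketch: `parse_store_lists` is a hypothetical helper, not part of pd-ctl, and the input is a hand-copied subset of the `details` lines from the output above.

```python
import re

# Subset of the "details" lines from the final "Unsafe recovery failed" entry.
details = [
    "affected meta regions: 12338",
    "Stores that have not dispatched plan: ",
    "Stores that have reported to PD: 10, 7",
    "Stores that have not reported to PD: 1",
]


def parse_store_lists(lines):
    """Map each 'Stores that ...' label to the list of store IDs after the colon."""
    result = {}
    for line in lines:
        label, _, value = line.partition(":")
        if label.startswith("Stores that"):
            result[label] = [int(x) for x in re.findall(r"\d+", value)]
    return result


store_lists = parse_store_lists(details)
for label, ids in store_lists.items():
    print(f"{label}: {ids}")
```

On this input, store 1 is the only store listed as not having reported to PD when the recovery failed.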

pd3-peer logs

[2023/07/13 03:00:48.663 +08:00] [INFO] [grpc_service.go:572] ["put store ok"] [store="id:10 address:\"tikv6-peer:20160\" labels:<key:\"host\" value:\"host4\" > labels:<key:\"zone\" value:\"dc2-zone1\" > labels:<key:\"dc\" value:\"dc2\" > version:\"6.5.3\" peer_address:\"tikv6-peer:20160\" status_address:\"tikv6-peer:20180\" git_hash:\"0e30a057d842a136ac0f82d01e43c841fccf59f5\" start_timestamp:1689188448 deploy_path:\"/tiup/deploy/tikv-20160/bin\" "]
[2023/07/13 03:00:48.663 +08:00] [INFO] [util.go:77] ["load cluster version"] [cluster-version=6.5.3]
[2023/07/13 03:00:48.663 +08:00] [WARN] [util.go:79] ["PD version less than cluster version, please upgrade PD"] [PD-version=6.5.3-0703-all-hotfix] [cluster-version=6.5.3]
[2023/07/13 03:00:49.083 +08:00] [INFO] [grpc_service.go:572] ["put store ok"] [store="id:7 address:\"tikv4-peer:20160\" labels:<key:\"host\" value:\"host4\" > labels:<key:\"zone\" value:\"dc2-zone1\" > labels:<key:\"dc\" value:\"dc2\" > version:\"6.5.3\" peer_address:\"tikv4-peer:20160\" status_address:\"tikv4-peer:20180\" git_hash:\"0e30a057d842a136ac0f82d01e43c841fccf59f5\" start_timestamp:1689188449 deploy_path:\"/tiup/deploy/tikv-20160/bin\" "]
[2023/07/13 03:00:49.083 +08:00] [INFO] [util.go:77] ["load cluster version"] [cluster-version=6.5.3]
[2023/07/13 03:00:49.083 +08:00] [WARN] [util.go:79] ["PD version less than cluster version, please upgrade PD"] [PD-version=6.5.3-0703-all-hotfix] [cluster-version=6.5.3]
[2023/07/13 03:00:51.025 +08:00] [INFO] [grpc_service.go:572] ["put store ok"] [store="id:1 address:\"tikv5-peer:20160\" labels:<key:\"host\" value:\"host4\" > labels:<key:\"zone\" value:\"dc2-zone1\" > labels:<key:\"dc\" value:\"dc2\" > version:\"6.5.3\" peer_address:\"tikv5-peer:20160\" status_address:\"tikv5-peer:20180\" git_hash:\"0e30a057d842a136ac0f82d01e43c841fccf59f5\" start_timestamp:1689188451 deploy_path:\"/tiup/deploy/tikv-20160/bin\" "]

What version of PD are you using (pd-server -V)?

[2023/07/13 03:00:29.529 +08:00] [INFO] [util.go:41] ["Welcome to Placement Driver (PD)"]
[2023/07/13 03:00:29.529 +08:00] [INFO] [util.go:42] [PD] [release-version=v6.5.3-0703-all-hotfix]
[2023/07/13 03:00:29.529 +08:00] [INFO] [util.go:43] [PD] [edition=Community]
[2023/07/13 03:00:29.529 +08:00] [INFO] [util.go:44] [PD] [git-hash=8b26701b9874cfc8d155af3f83ba5bb3794c82e8]
[2023/07/13 03:00:29.529 +08:00] [INFO] [util.go:45] [PD] [git-branch=heads/refs/tags/v6.5.3-0703-all-hotfix]
[2023/07/13 03:00:29.529 +08:00] [INFO] [util.go:46] [PD] [utc-build-time="2023-07-03 02:38:04"]
[2023/07/13 03:00:29.529 +08:00] [WARN] [main.go:81] ["Config contains undefined item: min-resolved-ts-persistence-interval"]
[2023/07/13 03:00:29.529 +08:00] [INFO] [metricutil.go:83] ["disable Prometheus push client"]
mayjiang0203 added the type/bug label (The issue is confirmed as a bug.) on Jul 13, 2023
@mayjiang0203
Author

/assign @v01dstar
/severity critical

@ti-chi-bot
Contributor

ti-chi-bot bot commented Jul 13, 2023

@mayjiang0203: GitHub didn't allow me to assign the following users: v01dstar.

Note that only tikv members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @v01dstar
/severity critical

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kevin-xianliu

/assign @Connor1996

@Connor1996
Member

Connor1996 commented Jul 27, 2023

From the log, I found that a peer is applying a snapshot, so it temporarily won't send a vote response back to the force leader. Because applying the snapshot is very slow, the pre force leader step can't finish in time, and online recovery eventually times out.

The behavior is as expected.
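
The timestamps in the client log are consistent with this explanation. Below is a rough check, assuming the stage times and the `Exceeds timeout` deadline in the pd-ctl output share the same local time zone. `stage_durations` is a hypothetical helper written for this note, not a PD API.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Stage timestamps copied from the pd-ctl output in this issue.
stages = [
    ("collect report", "2023-07-13 03:02:58.970"),
    ("force leader", "2023-07-13 03:03:09.859"),
    ("exit force leader", "2023-07-13 03:13:00.245"),
    ("failed", "2023-07-13 03:14:24.303"),
]

# Deadline from the "Exceeds timeout" message. Go prints nanoseconds,
# so keep only the first six fractional digits, which is all %f accepts.
deadline = datetime.strptime("2023-07-13 03:12:58.970101112"[:26], FMT)

times = [(name, datetime.strptime(ts, FMT)) for name, ts in stages]


def stage_durations(entries):
    """Seconds spent in each stage, measured until the next stage begins."""
    return {
        name: (nxt - start).total_seconds()
        for (name, start), (_, nxt) in zip(entries, entries[1:])
    }


durations = stage_durations(times)
window = (deadline - times[0][1]).total_seconds()

for name, secs in durations.items():
    print(f"{name:>18}: {secs:8.3f} s")
print(f"deadline is {window:.3f} s after recovery started")
```

On these timestamps the force leader stage accounts for roughly 590 s of a window that closes about 600 s after the collect report stage began.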

@mayjiang0203
Author

/remove-severity critical

@mayjiang0203
Author

/severity major
We should debug why applying the snapshot is so slow.

@mayjiang0203
Author

/remove-severity major
/severity critical
It occurred again, and rerunning does not work around it.

@mayjiang0203
Author

Recreated in the tikv repo as tikv/tikv#15346.
