The error message for rook-ceph-mon-d is: debug 2024-09-29T05:56:05.330+0000 7f8b5543f700 1 mon.d@2(electing) e5 handle_auth_request failed to assign global_id #12 #329

Open
shuaigea opened this issue Sep 29, 2024 · 6 comments

Comments

@shuaigea

[Image 4]
[WRN] Health check update: 10 slow ops, oldest one blocked for 56 sec, mon.d has slow ops (SLOW_OPS)

9/29/24 4:31:51 PM [WRN] Health check update: 8 slow ops, oldest one blocked for 51 sec, mon.d has slow ops (SLOW_OPS)

9/29/24 4:31:51 PM [WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 46 sec, mon.d has slow ops

9/29/24 4:31:51 PM [WRN] Health detail: HEALTH_WARN 2 slow ops, oldest one blocked for 46 sec, mon.d has slow ops
bash-4.4$ ceph -s
  cluster:
    id:     93cb51f5-56d6-4045-87d6-6e37d861a83e
    health: HEALTH_WARN
            1/3 mons down, quorum a,c

  services:
    mon: 3 daemons, quorum a,c,d (age 0.186447s)
    mgr: b(active, since 9w), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 2d), 6 in (since 3d)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 49 pgs
    objects: 43.32k objects, 65 GiB
    usage:   204 GiB used, 11 TiB / 11 TiB avail
    pgs:     49 active+clean

  io:
    client:   136 KiB/s rd, 1005 KiB/s wr, 4 op/s rd, 64 op/s wr
[Image 1]
[Image 2]
[Image 3]
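
For anyone who wants to track how the SLOW_OPS warnings above change over time, here is a minimal Python sketch that pulls the op count, the blocked time, and the daemon name out of such lines. The function name `parse_slow_ops` and the regular expression are illustrative assumptions, not part of Ceph or Rook, and the pattern only covers the single-daemon wording seen in this issue.

```python
import re

# Matches the wording seen in the health lines quoted in this issue.
# Assumption: only the single-daemon form ("mon.d has slow ops") is handled.
SLOW_OPS_RE = re.compile(
    r"(?P<count>\d+) slow ops, oldest one blocked for (?P<secs>\d+) sec, "
    r"(?P<daemon>\S+) has slow ops"
)


def parse_slow_ops(line):
    """Return (count, blocked_seconds, daemon), or None if the line does not match."""
    match = SLOW_OPS_RE.search(line)
    if match is None:
        return None
    return int(match["count"]), int(match["secs"]), match["daemon"]


if __name__ == "__main__":
    sample = [
        "[WRN] Health check update: 10 slow ops, oldest one blocked for 56 sec, "
        "mon.d has slow ops (SLOW_OPS)",
        "9/29/24 4:31:51 PM [WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 46 sec, "
        "mon.d has slow ops",
    ]
    for entry in sample:
        print(parse_slow_ops(entry))
```

Feeding it the lines from the log makes it easier to see whether the backlog on a given mon is growing or draining.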

@subhamkrai
Collaborator

@shuaigea are you trying to restore the mon quorum and having some issues? It's not clear from the issue description

@shuaigea
Author

shuaigea commented Oct 8, 2024

@shuaigea are you trying to restore the mon quorum and having some issues? It's not clear from the issue description

Hello, the problem occurred when I added a new hard drive during expansion. The new hard drive has a different transfer speed from the original (old) hard drive of the same brand. The old hard drive has a read speed of 7000 MB/s, which is indeed fast, but the new one has a read speed of only 3500 MB/s, which is not satisfactory. However, a slow read alarm is still raised. After investigation, I found that the read speed of the new hard drive differs from that of the old one. Is there clear guidance that drives added during expansion should have the same transfer speed? Or is it necessary to use the same brand of hard drive, with the same read/write speed as the old ones, to avoid slow reads? If not, what range of difference in read speed is acceptable?

@subhamkrai
Collaborator

I'm not sure about this one. @travisn @BlaineEXE do you have any thoughts on the above?

@BlaineEXE
Member

I still don't understand what the problem is. My intuition from reading between the lines is that the disk in question is being used for mon and not osd, but I can't be sure.

@shuaigea
Author

shuaigea commented Oct 9, 2024

@BlaineEXE It is indeed a problem with the mon that is causing the slow ops now, but I am not sure whether the cause is that my newly added hard drive is not the same brand and does not have the same read and write performance as the original disk. My current solution is to kick out the newly added mon. 5 and restore it, but I still have doubts about future expansion. Should newly added storage match the performance of the original hard drives?

@BlaineEXE
Member

I still don't quite have a full enough understanding to help out here. I know you have added a new hard drive, but hard drives have multiple uses for Ceph, and I can't know exactly how the new drive is getting used.

If the new drive was used for an OSD, that shouldn't affect mons, so we will have to look into other causes and other cluster info.

If the new drive was used for either the mon PVC, or if it has dataDirHostPath on it, then it could affect the mon.
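
One way to tell the two cases apart is to look at the CephCluster resource itself. The sketch below assumes the resource has already been loaded into a Python dict (for example from the JSON or YAML form of the CR) and checks the two fields mentioned above, `spec.mon.volumeClaimTemplate` and `spec.dataDirHostPath`. The helper name `mon_storage_kind` and the sample values are hypothetical.

```python
def mon_storage_kind(cephcluster):
    """Classify where the mons keep their store, given a CephCluster CR as a dict.

    Returns "pvc" if a mon volumeClaimTemplate is set, "hostpath" if only
    dataDirHostPath is set, and "unknown" otherwise.
    """
    spec = cephcluster.get("spec") or {}
    mon = spec.get("mon") or {}
    if mon.get("volumeClaimTemplate"):
        return "pvc"
    if spec.get("dataDirHostPath"):
        return "hostpath"
    return "unknown"


if __name__ == "__main__":
    # Illustrative CR fragment; the path and mon count are example values only.
    example = {
        "spec": {
            "dataDirHostPath": "/var/lib/rook",
            "mon": {"count": 3},
        }
    }
    print(mon_storage_kind(example))
```

If this reports "hostpath", the next thing to check is which physical disk backs that directory on the node running the slow mon. If it reports "pvc", the disk behind the mon's PersistentVolume is the one that matters.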

The only guidance we have from the Ceph documentation on mon disks is this:

It is strongly suggested that (enterprise-class) SSDs are provisioned for, at a minimum, Ceph Monitor

The Ceph project doesn't state specific throughput requirements, so I can't say for sure whether the disk's throughput is an issue or not. It is possible that the new disk is simply a faulty (or partly faulty) unit from the factory.
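
To get a rough feel for whether a particular disk is misbehaving, one option is to time small synchronous writes on it and compare the result with a known-good disk. The sketch below is only an assumption-laden probe, not an official Ceph or Rook benchmark: `fsync_latencies` is a hypothetical helper, the 4 KiB block size and write count are arbitrary, and the directory you pass in should live on the disk under suspicion.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies(directory, writes=50, block_size=4096):
    """Time `writes` small write+fsync cycles in `directory`.

    Returns a list with the duration of each cycle in seconds.
    """
    block = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the data to stable storage before timing stops
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    # Example run against the system temp directory; point this at the
    # directory on the disk you want to compare instead.
    samples = fsync_latencies(tempfile.gettempdir(), writes=20)
    print(f"median fsync latency: {statistics.median(samples) * 1000:.2f} ms")
```

Running the same probe on the old and the new disk gives a like-for-like comparison. A large gap in sync-write latency would be a stronger hint of a faulty unit than the advertised sequential read speed.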
