shortCircuit/domain socket not working #18659

Open
mervynzhang opened this issue Jul 12, 2024 · 4 comments
Labels
type-bug This issue is about a bug

Comments

@mervynzhang

Alluxio Version:
What version of Alluxio are you using?
2.9.5

Describe the bug
After deploying Alluxio to a Kubernetes cluster, I can read and write through the proxy/S3 endpoint.

However, BytesReadDomainThroughput and BytesWrittenDomainThroughput are always 0, and /opt/domain is empty in the worker pod.

I tried volumeType persistentVolumeClaim and hostPath; both gave the same result.

Can you please help troubleshoot this issue?

shortCircuit:
  enabled: true
  policy: uuid
  volumeType: persistentVolumeClaim
  size: 1Gi
  pvcName: alluxio-worker-domain-socket
  accessModes:
  - ReadWriteOnce
  storageClass: standard
  hostPath: "/tmp/alluxio-domain/" # The hostPath directory to use

mervynzhang added the type-bug label Jul 12, 2024

@YichuanSun
Contributor

Where did you get the metrics BytesReadDomainThroughput and BytesWrittenDomainThroughput? Could you please provide more logs and your property settings? We cannot diagnose this from the current information.

@mervynzhang
Author

mervynzhang commented Jul 30, 2024

With version 2.9.3, I get more errors about this issue (below). Granting more permissions on the mount path fixed it for 2.9.3, but 2.9.5 still cannot create the UUID socket in /opt/domain, and it reports no error.

Caused by: java.io.IOException: Failed to bind to address /opt/domain/1e0d735d-f2ca-4115-b589-2e69edaffeec

Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Permission denied
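
For anyone hitting the same `Permission denied` on bind: one way to test the mount path independently of Alluxio is to try binding a throwaway Unix domain socket in that directory as the same user the worker runs as. The sketch below is only an illustration; the function name `can_bind_unix_socket` and the `probe-` file name are made up for this example, and it assumes Python is available in the container, which may not be the case for the Alluxio image.

```python
import os
import socket
import uuid


def can_bind_unix_socket(directory):
    """Try to bind a throwaway Unix domain socket inside `directory`.

    Returns (True, None) on success, or (False, message) carrying the OS error.
    The helper and the "probe-" name are hypothetical, not part of Alluxio.
    """
    path = os.path.join(directory, f"probe-{uuid.uuid4().hex[:8]}")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # bind() creates the socket file, so it needs write and search
        # permission on the directory, like the worker's own bind call.
        sock.bind(path)
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    else:
        return True, None
    finally:
        sock.close()
        # Remove the probe file so the directory is left as it was found.
        if os.path.exists(path):
            os.unlink(path)
```

Calling `can_bind_unix_socket("/opt/domain")` inside the worker pod should return `(False, "PermissionError: ...")` if the directory permissions are the problem. A successful bind only shows that the directory is usable; it says nothing about whether clients actually use the socket.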

@mervynzhang
Author

The Alluxio worker can now create the UUID socket in /opt/domain, but the Alluxio dashboard -> Metrics page still shows no domain socket reads or writes when I read data using Spark.

Is there a debug log for the Alluxio client in Spark?

2024-07-30 10:15:04,627 INFO  [main](AlluxioWorkerProcess.java:147) - Domain socket data server is enabled at /opt/domain/bf45a11c-2432-4020-bec5-37d066fb1829.
2024-07-30 10:15:04,634 INFO  [main](GrpcDataServer.java:119) - Alluxio worker gRPC server started, listening on /opt/domain/bf45a11c-2432-4020-bec5-37d066fb1829

@rorueda

rorueda commented Aug 12, 2024

I've had the same issue with version 2.9.5.

After debugging, I found that the domain socket only works when EPOLL is available, and a conflict between Netty dependencies is making it unavailable.

It seems it is already fixed by #18638. Hopefully it will be released soon.
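
The comment above does not say which Netty artifacts conflict. As a rough first check, one could group the Netty jar file names found on a classpath by version and see whether more than one version shows up. The sketch below does only that; the function name `netty_versions` and the sample jar names are hypothetical, and it cannot see Netty classes that are shaded into another jar.

```python
import re
from collections import defaultdict

# Matches names such as netty-common-4.1.100.Final.jar or
# netty-transport-native-epoll-4.1.100.Final-linux-x86_64.jar.
_NETTY_JAR = re.compile(
    r"^netty-(?P<artifact>[a-z0-9-]+?)-"
    r"(?P<version>\d+\.\d+\.\d+(?:\.[A-Za-z0-9]+)?)"
    r"(?:-(?P<classifier>[\w-]+))?\.jar$"
)


def netty_versions(jar_names):
    """Group Netty jar file names by their version string.

    Hypothetical helper for illustration; it only inspects file names.
    """
    by_version = defaultdict(list)
    for name in jar_names:
        match = _NETTY_JAR.match(name)
        if match is None:
            continue  # not a Netty jar
        # netty-tcnative follows its own 2.0.x version line, so skip it
        # to avoid reporting a false mismatch.
        if match.group("artifact").startswith("tcnative"):
            continue
        by_version[match.group("version")].append(name)
    return dict(by_version)


# Example with made-up jar names: two different Netty versions are present.
sample = [
    "netty-common-4.1.100.Final.jar",
    "netty-transport-native-epoll-4.1.87.Final-linux-x86_64.jar",
    "netty-tcnative-boringssl-static-2.0.61.Final.jar",
    "guava-31.1-jre.jar",
]
for version, jars in sorted(netty_versions(sample).items()):
    print(version, jars)
```

More than one key in the result suggests mixed Netty versions, which is worth a closer look but does not prove that EPOLL is unavailable. Netty itself exposes `io.netty.channel.epoll.Epoll.isAvailable()` and `Epoll.unavailabilityCause()` for a direct answer from inside the JVM.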
