Kubernetes Crash pod - Failed to resolve '_cluster._tcp.crate-discovery-my-cluster.default.svc.cluster.local.' #666

Open
Nyuuk opened this issue Oct 14, 2024 · 1 comment

Nyuuk commented Oct 14, 2024

I deployed the crate operator with Helm, following the documentation at https://cratedb.com/docs/guide/install/container/kubernetes/kubernetes-operator.html, and then applied dev-cluster.yaml.
The pod logs a failure to resolve the SRV record of the discovery service, which might be causing a crash.
Here is the error log from pod crate-data-hot-my-cluster-0:

[2024-10-14T09:06:57,426][ERROR][i.c.d.SrvUnicastHostsProvider] [data-hot-0] DNS lookup exception:
java.util.concurrent.ExecutionException: java.net.UnknownHostException: Failed to resolve '_cluster._tcp.crate-discovery-my-cluster.default.svc.cluster.local.' [SRV(33)]
        at io.netty.util.concurrent.DefaultPromise.get(DefaultPromise.java:374) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.crate.discovery.SrvUnicastHostsProvider.lookupRecords(SrvUnicastHostsProvider.java:165) ~[dns-discovery-5.8.1.jar:?]
        at io.crate.discovery.SrvUnicastHostsProvider.getSeedAddresses(SrvUnicastHostsProvider.java:140) ~[dns-discovery-5.8.1.jar:?]
        at org.elasticsearch.discovery.DiscoveryModule.lambda$new$4(DiscoveryModule.java:139) ~[crate-server-5.8.1.jar:?]
        at org.elasticsearch.discovery.SeedHostsResolver$1.doRun(SeedHostsResolver.java:191) ~[crate-server-5.8.1.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[crate-server-5.8.1.jar:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1570) [?:?]
Caused by: java.net.UnknownHostException: Failed to resolve '_cluster._tcp.crate-discovery-my-cluster.default.svc.cluster.local.' [SRV(33)]
        at io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:1151) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:1098) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.query(DnsResolveContext.java:457) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:1056) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.onResponse(DnsResolveContext.java:692) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.access$500(DnsResolveContext.java:69) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext$2.operationComplete(DnsResolveContext.java:515) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsQueryContext.trySuccess(DnsQueryContext.java:345) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsQueryContext.finishSuccess(DnsQueryContext.java:336) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsNameResolver$DnsResponseHandler.channelRead(DnsNameResolver.java:1384) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:97) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        ... 1 more
Caused by: io.netty.resolver.dns.DnsErrorCauseException: Query failed with NXDOMAIN
        at io.netty.resolver.dns.DnsResolveContext.onResponse(..)(Unknown Source) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
[2024-10-14T09:06:58,061][INFO ][o.e.c.s.MasterService    ] [data-hot-0] elected-as-master ([1] nodes joined)[{data-hot-0}{t6_1jL7rTrCgQV16g5FuvA}{14jrfutqQ961f4qC9AOSAQ}{10.244.4.28}{10.244.4.28:4300}{dm}{node_name=hot, http_address=10.244.4.28:4200} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 153, version: 177, reason: master node changed {previous [], current [{data-hot-0}{t6_1jL7rTrCgQV16g5FuvA}{14jrfutqQ961f4qC9AOSAQ}{10.244.4.28}{10.244.4.28:4300}{dm}{node_name=hot, http_address=10.244.4.28:4200}]}
[2024-10-14T09:07:00,398][INFO ][o.e.c.s.ClusterApplierService] [data-hot-0] master node changed {previous [], current [{data-hot-0}{t6_1jL7rTrCgQV16g5FuvA}{14jrfutqQ961f4qC9AOSAQ}{10.244.4.28}{10.244.4.28:4300}{dm}{node_name=hot, http_address=10.244.4.28:4200}]}, term: 153, version: 177, reason: Publication{term=153, version=177}

Contributor

WalBeh commented Oct 16, 2024

@Nyuuk thank you for sharing the logs with us and using crate-operator!

The DNS errors themselves, logged while the CrateDB node/pod is starting, are IMHO not a problem for CrateDB.

crate-operator creates a discovery service, which is a headless Kubernetes service. The necessary DNS records for this service only appear in your k8s DNS once the CrateDB pods are set to READY and their IPs have been added to the corresponding endpoints, so lookups made before that point fail.
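
To illustrate the timing, here is a minimal Python sketch. It is not operator or CrateDB code. `srv_record_name` builds the SRV name from the log above using the Kubernetes DNS naming scheme for named service ports (the port name `cluster` is read off the failing query), and `wait_for_srv` retries a lookup until a record shows up. The `resolve` callable and `fake_resolve` are hypothetical stand-ins for a real DNS client, and the returned target is only simulated from the address seen in the log.

```python
import time
from typing import Callable, List


def srv_record_name(service: str, namespace: str, port_name: str,
                    protocol: str = "tcp",
                    cluster_domain: str = "cluster.local") -> str:
    """Build the SRV name that Kubernetes DNS uses for a named service port."""
    return f"_{port_name}._{protocol}.{service}.{namespace}.svc.{cluster_domain}."


def wait_for_srv(name: str,
                 resolve: Callable[[str], List[str]],
                 attempts: int = 5,
                 delay_seconds: float = 0.0) -> List[str]:
    """Call resolve(name) until it returns targets or attempts run out.

    `resolve` is a hypothetical stand-in for a DNS client. It is expected
    to raise LookupError while the record does not exist (NXDOMAIN).
    """
    for attempt in range(attempts):
        try:
            targets = resolve(name)
            if targets:
                return targets
        except LookupError:
            pass  # record not published yet, e.g. no pod is READY
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return []


# Simulated resolver: NXDOMAIN for the first two lookups, then one target.
calls = {"count": 0}


def fake_resolve(name: str) -> List[str]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise LookupError("NXDOMAIN")
    return ["10.244.4.28:4300"]


name = srv_record_name("crate-discovery-my-cluster", "default", "cluster")
print(name)
print(wait_for_srv(name, fake_resolve))
```

In this simulation the first two lookups fail in the same way as the NXDOMAIN in the log, and the third succeeds once the (simulated) endpoint exists.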

I hope that makes sense and addresses your question. Otherwise let us know (and post more surrounding logs).
