Marathon on CENTOS 7 Fails to start #7136
Comments
I have the same issue. I deleted everything in /var/lib/mesos and /var/lib/zookeeper and I still have this shit. I reported almost a year ago that mesos-dns is not correctly reporting IP addresses, and it is still not fixed.
Which version are you using?
This happens with both 1.9.109-1.el7 and 1.9.136-1.el7.
@f1-outsourcing, you should not blindly delete files. Also, remember that this is all open source; we are happy to accept bug fixes from you. @seanfulton, sorry for replying so late. Do you still have this issue? If I understood correctly, Marathon fails to load the old state. Was this after an upgrade?
I am deleting files and removing configuration options to see if that changes anything. Whatever I change, the only combination I can get working is marathon-1.7.216-9e2a9b579 with mesos-1.10.0-2.0.1. This is what I posted to the marathon-framework mailing list:
All of a sudden I am having problems with the Marathon UI getting stuck at 'loading', and endpoints like http://m01.local:8081/v2/info are not responding (http://m01.local:8081/ping does return pong). I have now reduced the test cluster to one node running only mesos-master, ZooKeeper and Marathon, and I clean the /var/lib/zookeeper and /var/lib/mesos directories between tests. I have also removed many of the configuration options I had, such as SSL. The only version I can get running is marathon-1.7.216-9e2a9b579; marathon-1.8.222-86475ddac and marathon-1.10.17-c427ce965 both show the problem described above. Comparing the Marathon 1.7 and 1.8 logs, I noticed that quite a few log statements between 'All services up and running. (mesosphere.marathon.MarathonApp:main)' and 'akka://marathon/deadLetters' are missing from the 1.8 log. Has anyone seen something similar?
[@mesos-master]# rpm -qa | grep java
[@mesos-master]# uname -a
[@mesos-master]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
marathon 1.8 (unresponsive):
Jun 7 17:40:59 m01 marathon: [2020-06-07 17:40:59,696] INFO All services up and running. (mesosphere.marathon.MarathonApp:main)
Jun 7 17:41:13 m01 marathon: [2020-06-07 17:41:13,879] INFO Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#1746491390] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://marathon/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-7)
marathon 1.7 (ok):
Jun 7 17:37:02 m01 marathon: [2020-06-07 17:37:02,681] INFO All services up and running. (mesosphere.marathon.MarathonApp:main)
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,459] INFO Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#-463341905] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://marathon/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-8)
@f1-outsourcing, could you attach the complete logs of Marathon 1.8 from when you start it until you make a request to
Is this useful?
Is anyone still looking at this?
If your strategy at D2iQ/Mesosphere is to give 'shitty' support to Marathon because you want to push people into using DC/OS, you should consider that this approach has a flip side. I perceive this as:
Someone else reported the same issue [1] on your JIRA in March, with the workaround of downgrading to Marathon 1.7. He also did not get any attention for six months. Whether or not your software is open source, you should attend to issues like this more quickly, especially when people have to downgrade by so many versions.
Also not working. What about this message:
[info] [2020-07-18 13:14:42,675] INFO Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#-1125671270] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://marathon/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-8)
Debug log of a request to http://test2.local:7070/v2/info:
[info] [2020-07-18 13:26:09,845] DEBUG Current State: LaunchTokens:100 OffersWanted:false Matchers:0 OfferQueues:0 UnprocessedOffers:0 (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-9)
Trace log of 1.8 after deploying a task with 1.7:
[info] Loading project definition from /home/software/marathon2/project/project
[warn] Canceling execution...
[error] I0718 15:24:56.504509 1575186 sched.cpp:2166] Asked to stop the driver
[error] I0718 15:24:56.504631 1575172 sched.cpp:1204] Stopping framework 5262ced9-70e2-4c0d-9064-ab4173118409-0000
This is the v2/info request compared between 1.7.236 and 1.10.25:
1.7.236:
1.10.25:
In 1.10.25 the request ends with these lines,
whereas 1.7.236 continues like this:
I have a new install of Marathon/Mesos/ZooKeeper on CentOS 7, using the RPMs (1.9.109). Everything starts up OK, but when I go to the Marathon interface it spins on "Loading Applications ..." and never finishes. The only thing I can find in the logs is this, on the master:
INFO Found no roles suitable for revive repetition. (mesosphere.marathon.core.launchqueue.impl.ReviveOffersStreamLogic$ReviveRepeaterLogic:marathon-akka.actor.default-dispatcher-3)
Feb 08 10:35:25 nj-dcos01-cl01 marathon[581]: [2020-02-08 10:35:25,513] INFO Found no roles suitable for revive repetition. (mesosphere.marathon.core.launchqueue.impl.ReviveOffersStreamLogic$ReviveRepeaterLogic:marathon-akka.actor.default-dispatcher-2)
Feb 08 10:35:30 nj-dcos01-cl01 marathon[581]: [2020-02-08 10:35:30,513] INFO Found no roles suitable for revive repetition. (mesosphere.marathon.core.launchqueue.impl.ReviveOffersStreamLogic$ReviveRepeaterLogic:marathon-akka.actor.default-dispatcher-7)
Feb 08 10:35:32 nj-dcos01-cl01 marathon[581]: [2020-02-08 10:35:32,723] INFO Prompting Mesos for a heartbeat via explicit task reconciliation (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marathon-akka.actor.default-dispatcher-8)
Feb 08 10:35:32 nj-dcos01-cl01 marathon[581]: [2020-02-08 10:35:32,726] INFO Received fake heartbeat task-status update (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-172)
Firewall/iptables is off. I can't submit a test job to Marathon or create an app. Mesos seems to be working OK.
sean