Fixes and improvements to deployment #90
markgoddard
commented
Apr 17, 2024
- deployment: Add comments to inventories
- deployment: Always get CA fingerprint
- deployment: Assert that there is only one HAProxy server
This avoids an issue when adding hosts to a cluster where the host getting the fingerprint has already been bootstrapped, so does not query the CA's fingerprint.
Currently we are not deploying any failover mechanism such as keepalived, so limit to one HAProxy server.
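For illustration only, the single-HAProxy constraint described above could be sketched as the check below. This is plain Python, not the Ansible task added in this PR; the `haproxy` group name and the dict-shaped inventory are assumptions made for the example.

```python
def assert_single_haproxy(inventory, group="haproxy"):
    """Return the only host in the HAProxy group, or raise ValueError.

    `inventory` is a hypothetical mapping of group name -> list of host names.
    """
    hosts = inventory.get(group, [])
    if len(hosts) != 1:
        # Without a failover mechanism such as keepalived, more than one
        # HAProxy server cannot be used, so refuse anything but one host.
        raise ValueError(
            f"expected exactly one host in group '{group}', "
            f"found {len(hosts)}: {hosts}"
        )
    return hosts[0]
```

Failing early like this surfaces a misconfigured inventory before any host is touched.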
many thanks @markgoddard 🍺 I am currently testing this branch with this
but it's currently hanging at "Gathering facts":
I can send you the debug log but there are no critical issues there, just a lot of fluff, and the ssh connections to active2 and active3 work fine. Any clues? |
also the deps have not been updated, just for logging reasons:
🍺 |
@valeriupredoi I'd diff your new inventory against your old one. I expect you don't want to deploy minio. |
Do you have activeh in /etc/hosts on activeh? Perhaps previously you were referring to it as localhost? |
yessir! Here's my hosts file:
with actual numbers, not xxx - no, I was using exactly the same inventory as now, but with three machines listed under HAProxy; /etc/hosts has not changed either |
let me try rerun with my previous configuration, see if that goes through (with the error at the end), so we can isolate the issue |
right! So the thing now hangs with my old setup from yesterday as well 🤦♂️ Need to see what's happened in the meantime |
@markgoddard apols for the tardiness: this is the bit that's hanging:
-> the funny thing is that if I run that command myself, all is fine...am very confused 😖 |
figured it out thanks to @RosalynHatcher whom I owe a massive 🍺 - but am back to the Bootstrap CA issue now:
😖 |
note that the reductionist verification also fails, for
🍺 |
Your bootstrapping is failing with network errors:
Perhaps the port is not open in a firewall/security group? |
Also looks like you are not using the changes in this PR because the task to get the fingerprint is skipped on activeh |
I am - I am using this branch, bud. Port 9999 doesn't actually exist afaik, does it? |
yes it is indeed skipped - the problem is with active2 and active3 for Bootstrap CA - but even activeh fails at the end with reductionist unreachable - very possible it's how @bnlawrence has configured active2 and 3? |
this is the full partial section:
|
Perhaps you still have the code changes you made previously? That task should not be skipped if using this branch. |
ok, my mistake - it's the |
What do you mean port 9999 doesn't exist? TCP/UDP ports go up to 65,535 |
It's failing with certificate expired. Step CA uses short-lived certificates, so probably the renewal isn't working for some reason. |
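Because the certificates are short-lived, it can help to work out how long one has left. A minimal Python sketch, assuming the expiry time has already been read from the certificate in OpenSSL's text format (for example the value printed by `openssl x509 -noout -enddate`); the function name is made up for this example.

```python
import ssl
import time


def seconds_until_expiry(not_after, now=None):
    """Return seconds until the certificate's notAfter time.

    `not_after` uses OpenSSL's format, e.g. "Apr 18 09:30:00 2024 GMT".
    The result is negative if the certificate has already expired.
    """
    if now is None:
        now = time.time()
    # ssl.cert_time_to_seconds parses the GMT timestamp into epoch seconds.
    return ssl.cert_time_to_seconds(not_after) - now
```

This only does the arithmetic on a timestamp; it does not fetch or validate the certificate, and it says nothing about why a renewal might have failed.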
OK:
and same for active3, and regarding the current reductionist-rs:
where the inventory file is changed to have it for this deployment. At any rate, I just pulled the latest:
Have you pushed any changes? 🍺 |
my bad, I thought the biggest one was 9090 🤣 |
Port 9999 needs to be accessible on activeh. Are you able to curl it (using HTTPS)? |
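A quick way to tell a blocked port from other failures is a plain TCP connection attempt. The sketch below is an assumption-laden helper (its name and the default timeout are made up here); it checks reachability only, so it does not replace the HTTPS curl suggested above, which also exercises TLS.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("activeh", 9999)` run from active2 would show whether a firewall or security group is in the way.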
That's what they want you to believe |
you, sir, are a life-saver! Completely forgot to open port 9999 on activeh - now, massive progress, but it stumbled right at the end:
Should I regenerate the step |
actually, hang on, just opened 8080 too (was not open 🤦♂️ ) now it clearly says cert is expired:
|
OK that certificate is not expired:
Mark, any clues why the pinger would think it expired? |
sorry bud, last message for today I promise (just about to go home): the deployed Reductionist on activeh is using the activeh's backend IP |
That is the root CA certificate, not the server certificate(s). Those are generated by the step CLI in |
Perhaps we can talk about your exact network setup tomorrow, but there is a variable called |
both those two things - great clues, many thanks for taking the time with me, Mark, and my apologies for bombarding you with questions - I realize I am annoying, but, as you can see, am a total n00b at ansibles and its networking, and I want to get this done. I owe you a couple pints for sure! I reckon by tomorrow, given what you pointed me to, we'll get it to successfully deploy (and work) 🍻 |
…oxy host Try to catch issues with access to backends earlier.
I've pushed some changes that should help. There is a fix for the wait task that would be necessary if you modify |
Mark, you're a bloody wizard! I took the latest changes you made here, opened port 8081 on active2 and active3, and the thing ran with absolutely no hitch:
I'm actually gonna poke it about see what's what, but boy am I happy to see no fails and comms and certs not barking at me 😁 🍻 |
we got "Hello world!" from a remote client (me laptop) to
where 192... is
I am over the moon 😄 |
just ran an actual PyActiveStorage test and it runs very well (let's not concern ourselves with the timings just yet) - |
absolute legend @markgoddard 🍺 Very many thanks for this and your CH (continuous help) over the past couple of days. One itty bitty mention I'd put in here, so others don't struggle like I did: don't initialise the ssh connection from the main node (activeh in my case) with an eval of the ssh-agent, i.e. eval $(ssh-agent -s)
because that results in Ansible needing a password to be entered without explicitly asking for it, so it hangs instead. This is what my lovely colleague @RosalynHatcher sorted me out with, me barely speaking any ssh. Apart from that, I owe you a couple of pints, mate 🍺
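One way to make a silent hang like this visible is to probe each host with SSH in batch mode, which disables password and passphrase prompts so the connection fails quickly instead of waiting for input. The sketch below is not part of this PR; the function names and timeouts are assumptions, and it relies only on the standard OpenSSH options `BatchMode` and `ConnectTimeout`.

```python
import subprocess


def build_ssh_probe(host, connect_timeout=5):
    """Build an ssh command that fails rather than prompting for input."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # never ask for a password or passphrase
        "-o", f"ConnectTimeout={connect_timeout}",
        host,
        "true",  # trivial remote command; only the exit status matters
    ]


def ssh_works_non_interactively(host, connect_timeout=5):
    """Return True if ssh to `host` succeeds with no user interaction."""
    cmd = build_ssh_probe(host, connect_timeout)
    try:
        result = subprocess.run(
            cmd, capture_output=True, timeout=connect_timeout + 10
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0
```

Running `ssh_works_non_interactively("active2")` from the control host before the playbook gives a clear pass or fail instead of a stalled run.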
I've added a note about the SSH agent issue. Will merge once CI goes green again. Thanks for trying out my changes! |
🍻 |