Node script to copy projects from staging or prod #1816
Often fails to copy assets because `kubectl cp` or `kubectl exec tar` get cut off partway through, but that should go away once Kubernetes version 1.30 is released.
Unit Test Results: 362 tests, 362 ✅ passed in 13s ⏱️ (results for commit 1e3ced2).
In our discussion in #1542 (comment), we talked about modifying all the `userRef` values in the lexicon (entries, comments, and so on). The script doesn't do that yet. So far I haven't encountered errors caused by `userRef` values pointing to non-existent users, but I haven't tested that extensively. The remapping might turn out to be unnecessary, but more testing is needed to confirm that.
I'm wondering if it would make more sense to write a script like this in JS, which has a pretty high chance of running cross-platform. As it is, Chris has stopped working on Mac, so the majority of the team will be running this on Windows. Bash scripts are also difficult to maintain, since most of us don't write Bash unless we need to. Additionally, with JS we could just use kubectl to open a port to the db and use a Mongo connection directly, which would be much simpler to write than a line like this
@hahn-kev you bring up some good points. I didn't think of writing this in NodeJS. I had been planning on writing it in PHP and making something that ran server-side. Since I hadn't done it yet, I asked Robin to. We both agreed that avoiding PHP would be simpler, since the approach was primarily shelling out to mongo commands. Once we realized we could run the entire thing remotely without involving the server, it seemed natural to write it in Bash. I plan to make sure it runs on Windows. If we could port it to JS for free now, that would be cool, but as long as it works as advertised, I'm fine leaving this as Bash. If we run into cross-platform problems or maintainability friction because not everyone writes Bash, then I'd go with a three-strikes-and-we-port-it philosophy.
@rmunn I really wanted to test this, but my Windows machine doesn't have the kube contexts or a WireGuard tunnel set up :( I need my existing tunnel and contexts, which are on my Mac at home. So maybe tomorrow I will test on Windows.
This should ensure that the project assets eventually get copied over to the local Docker setup even under conditions where `kubectl exec` is flaky and fails every couple of minutes.
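The retry behaviour described in the commit message above could look roughly like this. It's a minimal sketch, not the script's actual code: the helper name `retryUntilSuccess`, the default of 10 tries, and the 2-second pause are all hypothetical choices.

```javascript
// Hypothetical helper (not from the script): retry a flaky step, such as
// a `kubectl exec` or `kubectl cp` call, until it succeeds.
// Any thrown error counts as a failed try.
async function retryUntilSuccess(attempt, { maxTries = 10, delayMs = 2000 } = {}) {
  let lastError;
  for (let tryNumber = 1; tryNumber <= maxTries; tryNumber++) {
    try {
      return await attempt(tryNumber);
    } catch (err) {
      lastError = err;
      if (tryNumber < maxTries) {
        console.warn(`Attempt ${tryNumber} failed (${err.message}); retrying...`);
        // Pause so a dropped connection has a moment to recover.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every try failed: surface the last error to the caller.
  throw lastError;
}
```

For example, wrapping a synchronous call to Node's `child_process.execSync` in the `attempt` callback would rerun that command each time it gets cut off, up to `maxTries` times.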
Now that this is working, I can get rid of the `ls -lR` step (which is effectively redundant anyway as `docker cp` is chatty about what files it's copying), and enable the final cleanup of the temporary directory.
The docker cp command was preserving the UID/GID of the copied files even though I didn't pass it the `-a` parameter (whose purpose is to preserve the UID/GID of the copied files). To work around this issue, we set the file ownership to 33/33 before copying the files into Docker.
The previous solution was too Linux-y; this one doesn't rely on `sudo` or `id` working in a Git Bash environment on Windows.
I'm willing to rewrite in JS if necessary, though I'd like to see how it performs in a Git Bash environment first, to avoid doing the work if it turns out to be unnecessary.
I asked ChatGPT to port it over to Node TypeScript, and I thought it did a good first pass: https://chat.team-gpt.com/lt-lexical-tools/664585252bf6048a1b9a3f67
Not done yet, so don't try to run this yet.
It seems
Edit: Switching to a different Internet connection made no difference. I'm going to rewrite the `mongodump`/`mongorestore` step to use a `MongoClient` connection instead. Since I can't use the
Now takes project ID or URL as parameter
Was used during development as a way to test remote connection, no longer needed.
Defaults to qa/staging for obvious reasons
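Accepting either a project ID or a URL, and defaulting to staging (the two commit messages above), might look like the sketch below. Everything here is an assumption for illustration: the function name `parseProjectArg`, the placeholder context names, and treating every host except `languageforge.org` as staging. One subtlety worth a test: editor URLs can carry a later `!/editor/entry/<entryId>` part (as in the Bash error quoted in the description), and entry IDs are also 24 hex characters, so the pattern anchors on `/app/lexicon/`.

```javascript
// Placeholder context names; the real script uses its own
// `staging_context` / `prod_context` settings instead.
const STAGING_CONTEXT = "staging-context";
const PROD_CONTEXT = "prod-context";

// A Mongo ObjectId is 24 hexadecimal characters.
const OBJECT_ID = /^[0-9a-f]{24}$/i;

// Hypothetical: turn the command-line argument into a project ID plus
// the kube context to use.
function parseProjectArg(arg) {
  const trimmed = arg.trim();
  if (OBJECT_ID.test(trimmed)) {
    // Bare ID: there is no host to go on, so default to staging.
    return { projectId: trimmed, context: STAGING_CONTEXT };
  }
  const url = new URL(trimmed); // throws a TypeError if this isn't a URL
  // Take the ID right after /app/lexicon/, not an entry ID further on.
  const match = url.pathname.match(/\/app\/lexicon\/([0-9a-f]{24})/i);
  if (!match) {
    throw new Error(`No project ID found in ${arg}`);
  }
  const isProd = url.hostname === "languageforge.org";
  return { projectId: match[1], context: isProd ? PROD_CONTEXT : STAGING_CONTEXT };
}
```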
Our staging server has a port defined on the db service, but our production server does not. Switching to port forward to `deploy/db`, which will automatically select the Mongo pod (which *does* have a port open to forward to).
This will save a bit of time when `kubectl cp` is being reliable. Also force the `languageforge` namespace, just in case.
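The port-forward approach from the commits above (forward to `deploy/db`, force the `languageforge` namespace) can be sketched as an argument list for `kubectl port-forward`. This is a hedged sketch: the function names are hypothetical, and it assumes Mongo listens on its default port, 27017.

```javascript
// Arguments for `kubectl port-forward` to the Mongo pod. Targeting
// `deploy/db` lets kubectl select a pod from the db deployment itself,
// so we don't depend on the db service exposing a port.
// Assumption: Mongo listens on its default port, 27017.
function portForwardArgs(context, localPort, remotePort = 27017) {
  return [
    `--context=${context}`,
    "--namespace=languageforge", // force the namespace, as the commit does
    "port-forward",
    "deploy/db",
    `${localPort}:${remotePort}`,
  ];
}

// Connection string for a client talking through the forwarded port.
function forwardedMongoUri(localPort) {
  return `mongodb://localhost:${localPort}`;
}
```

In use, these arguments would go to `child_process.spawn("kubectl", ...)`, and once the forward is up, a `MongoClient` from the `mongodb` package could connect to `forwardedMongoUri(localPort)`.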
NOTE: If you get an error like If this happens a lot, I'll open a separate issue to track that bugfix.
UPDATE: Nope, this is happening a lot; it's apparently quite common. I'll include a mitigation in this PR rather than a separate issue.
Before including pictures and audio in the tarball, make sure they're really there, and skip them if they are a broken symlink.
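One way to do the check described above is a shell test run inside the app container. This is a sketch of an alternative, not the script's actual code (which, judging by the error quoted further down, uses `readlink`): the function names are hypothetical, and `pod` is assumed to be an already-resolved pod name. `[ -e ... ]` follows symlinks, so a broken symlink is reported as missing. The path is wrapped in double quotes, since Windows has trouble with single-quoted command-line parameters.

```javascript
// Shell snippet that prints "yes" if `path` exists and "no" otherwise.
// `[ -e ... ]` follows symlinks, so a broken symlink is reported as "no".
function existsCheckCommand(path) {
  // Keep the sketch simple: refuse characters that would need escaping
  // inside double quotes.
  if (/["$`\\]/.test(path)) {
    throw new Error(`Unsupported characters in path: ${path}`);
  }
  return `[ -e "${path}" ] && echo yes || echo no`;
}

// `kubectl exec` arguments that run the check in the app container.
function existsCheckArgs(context, pod, path) {
  return [
    `--context=${context}`,
    "--namespace=languageforge",
    "exec", pod, "-c", "app", "--",
    "sh", "-c", existsCheckCommand(path),
  ];
}
```

Passing these as an argument array (e.g. to Node's `child_process.execFileSync("kubectl", args)`) should keep the local shell out of the picture, so the double quotes are only interpreted by `sh` inside the container.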
nice work, this will be nice to have for debugging issues. Let me know when you would like us to test it out.
This is like crazy white magic to me 🪄 🤓 👏
It seems to work 🚀 , but not flawlessly just yet. I'm getting:
throw new Error(`Unexpected result from readlink ${name}: ${result}`);
^
Error: Unexpected result from readlink /var/www/html/assets/lexicon/sf_test-chris-01/audio: -n no'
Might have something to do with Windows. Changing the occurrences of `echo -n` to use `printf` (e.g. `printf "no"`) seems to fix it for me.
Grrr. Windows, what are you doing? That It's those little subtle differences that get you when trying to write cross-platform scripts. Another thing I could do here is give up on the
It's possible that Windows is doing something strange here that's causing the `echo` to be handled by the Windows shell instead of as part of the kubectl input passed to `sh`. Switching to plain `echo` and then stripping newlines from the result should produce the same result without any cross-platform hiccups.
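The "plain `echo`, then strip newlines" approach can be sketched as a small parser for the remote output. The name `parseYesNo` and the yes/no protocol are hypothetical; the point is that an unrecognized answer such as `-n no` (the output quoted above) is rejected loudly instead of being misread.

```javascript
// Interpret the output of a remote yes/no check. Plain `echo` appends a
// newline (and "\r\n" may show up on Windows), so strip line endings here
// rather than relying on `echo -n`, which, as the `-n no` output above
// shows, can end up printed literally.
function parseYesNo(name, result) {
  const answer = String(result).replace(/[\r\n]+/g, "").trim();
  if (answer === "yes") return true;
  if (answer === "no") return false;
  throw new Error(`Unexpected result from check of ${name}: ${answer}`);
}
```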
I want to say Windows typically uses double quotes (
That makes things difficult, because Linux assigns different meanings to single quotes vs double quotes; for example, single quotes don't do I might be able to rewrite the
Yup, that's true. Are you saying you cannot make it work with double quotes?
Does it work on Linux like this?
Windows has issues with single-quotes for quoting command-line params, but thankfully Linux handles double-quotes correctly in all the places I used single-quotes, so we'll just switch to double-quotes everywhere.
Just pushed a commit making it work with double quotes, which I started even before your comment. :-) I would have pushed it an hour ago, but Docker Hub was giving me just a trickle of bandwidth, so it took nearly an hour to download the Docker images and run local LF to test it.
Just found a bug: if the lexicon collection exists but is empty, you get I'll fix it, but if it takes too long then I won't spend too much time on it, because a project with no lexical entries at all is not one that we're likely to need to copy to local LF in order to troubleshoot. :-)
Mongo doesn't like it when you call `.insertMany` and pass it an empty list. You'd think they would handle that case gracefully, but they don't and Mongo throws an error "Invalid BulkOperation, Batch cannot be empty". So we will skip calling `.insertMany` if there are no records to insert.
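The guard described above amounts to checking the list length before calling `insertMany`. A minimal sketch, with `insertIfAny` as a hypothetical name; `insertedCount` is the field the MongoDB Node driver reports on an `insertMany` result.

```javascript
// Insert documents only when there are some: the MongoDB driver rejects
// an empty batch ("Invalid BulkOperation, Batch cannot be empty").
async function insertIfAny(collection, docs) {
  if (docs.length === 0) {
    return 0; // nothing to copy, e.g. an empty lexicon collection
  }
  const result = await collection.insertMany(docs);
  return result.insertedCount;
}
```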
Unfortunately, I'm pretty stuck.
Copying the assets tar to my Windows machine has multiple problems.
- I had to normalize the path (actually, I'm not sure if that was necessary, because it didn't help)
- It seems that kubectl cp interprets the Drive in a windows path as a pod name (after the bug has already been fixed twice, apparently 😉). kubectl cp command fails when passing full Windows path such as c:\Temp\foobar.txt kubernetes/kubernetes#101985 (comment)
- I can't use a relative path, because my code is on D and my temp folder on C
- When I copied the code to C to try it out I got:
error: unexpected EOF
kubectl cp failed Error: Command failed: kubectl --context="aws-rke" --namespace=languageforge cp app-7659cf6f57-n9cmx:/tmp/assets-sf_test-chris-02_lf.tar ..\..\AppData\Local\Temp\lfbackup-i56u10\assets-sf_test-chris-02_lf.tar
error: unexpected EOF
. Will try to continue with rsync...
Ensuring rsync exists in target container...
Error from server (Forbidden): deployments.apps "app" is forbidden: User "u-j7l8z" cannot get resource "deployments" in API group "apps" in the namespace "default"
Cleaning up temporary directory C:\Users\tim\AppData\Local\Temp\lfbackup-i56u10...
node:internal/errors:932
const err = new Error(message);
^
Error: Command failed: kubectl exec --context="aws-rke" -c app deploy/app -- bash -c "which rsync || (apt update && apt install rsync -y)"
Error from server (Forbidden): deployments.apps "app" is forbidden: User "u-j7l8z" cannot get resource "deployments" in API group "apps" in the namespace "default"
I have no idea what the `error: unexpected EOF` is about 🙁.
I don't really want to set up k8s in WSL just to test this 😕.
If it's working now in Linux, then maybe you should just merge it. 🤷
The "unexpected EOF" error is because The bit about
Okay, I'll dismiss your "changes requested" review from earlier so that GitHub will allow me to merge this.
You might be able to work around the "drive letter interpreted as a pod name" error by using the obscure
Feature request from the meeting: we want to automatically clean up the tar file on the server when the script exits, so we don't leave a bunch of asset tarballs lying around until the next container restart.
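The cleanup requested above can be sketched with `try`/`finally`, so the remote tarball is removed whether the copy succeeds or fails. This is a hypothetical sketch: `withCleanup` and `removeRemoteTarArgs` are illustrative names, and the tar path would be whatever the script actually creates under `/tmp`.

```javascript
// Run `main`, then always run `cleanup`, whether `main` succeeded or threw.
// A failing cleanup is logged instead of hiding main's own result or error.
async function withCleanup(main, cleanup) {
  try {
    return await main();
  } finally {
    try {
      await cleanup();
    } catch (err) {
      console.error(`Cleanup failed: ${err.message}`);
    }
  }
}

// Hypothetical: arguments that delete the asset tarball from the app pod.
// `rm -f` does not fail if the file was never created.
function removeRemoteTarArgs(context, pod, tarPath) {
  return [`--context=${context}`, "--namespace=languageforge", "exec", pod, "-c", "app", "--", "rm", "-f", tarPath];
}
```

A `finally` block does not run if the process is killed by Ctrl-C under Node's default SIGINT handling, so covering that case would also need something like a `process.on("SIGINT", ...)` handler.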
Also use the pod name instead of `deploy/app`, since not every user account has access to deployment objects, at least on production.
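Resolving the pod name could be done by parsing the output of `kubectl get pods -o name`, which prints one `pod/<name>` per line. The same `--namespace=languageforge` flag matters here: the Forbidden errors quoted earlier were raised against namespace `default`. This sketch assumes, based on names like `app-7659cf6f57-n9cmx`, that pods from the `app` deployment start with `app-`; `pickAppPod` is a hypothetical name.

```javascript
// Pick the app pod from `kubectl get pods -o name` output.
// Assumption: pods created by the `app` deployment are named "app-...".
function pickAppPod(getPodsOutput) {
  const names = getPodsOutput
    .split(/\r?\n/)
    .map((line) => line.trim().replace(/^pod\//, ""))
    .filter((name) => name.startsWith("app-"));
  if (names.length === 0) {
    throw new Error("No app pod found");
  }
  return names[0];
}
```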
@myieye - As we discussed, leaving this bit of the work for you. Commit 45d8294 adds a comment in the place where you'd want to make that substitution.
I made the path change you suggested (1e3ced2) and it worked! 🥳
Fixes #1542
Description
Node script (`backup.mjs`) to copy projects from staging or production to your local Docker dev environment.
Usage: First, edit the script and make sure the `staging_context` and `prod_context` values match the names you've given to your Kubernetes contexts. (If unsure, run `kubectl config get-contexts` to see what context names you have on your system.) Then run `node backup.mjs MongoID`. For example, to copy https://staging.languageforge.org/app/lexicon/5dbf805650b51914727e06c4, you'd copy the Mongo ID out of that URL and run `node backup.mjs 5dbf805650b51914727e06c4`.
Alternatively, you can just paste the URL as the command-line argument, and the script will automatically extract the project ID. Be careful to quote the URL, as some characters might have special meaning to the shell. (For example, in Bash on Linux, the `!` character means "find a previous command that starts with this text". If you don't put quotes around the URL, you'll get an error from Bash saying `bash: !/editor/entry/5dbf806cbea602641cc27e61?sortBy=Default: event not found`. And they must be single quotes, because double quotes don't remove the special meaning of `!`.)
Dependencies
You need `kubectl` and `docker` installed. Also, the script first tries to copy assets using `kubectl cp`, but if that fails, it falls back to `rsync`. On Windows, you might need to install a Windows build of rsync from the msys project.
Checklist
Testing
Testers, use the following instructions against our staging environment. Post your findings as a comment and include any meaningful screenshots, etc.
Describe how to verify your changes and provide any necessary test data.
`node backup.mjs copiedProjectID`