Allow cloning without using symbolic links to the destination folder #799
Comments
Without a link, I don't know how to provide atomic updates. If you only pull one time, maybe, but if you want to pull a new version, consuming apps could see a half-updated tree. |
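The atomic-update point above can be illustrated with a minimal Python sketch. It is not git-sync's actual code: the `publish` and `demo_publish` helpers and the `rev-a`/`rev-b` directory names are hypothetical, and the sketch assumes a local POSIX filesystem where `rename(2)` replaces the destination atomically. Each new version is checked out into its own directory, and only the symlink is switched, so a reader sees either the old tree or the new one.

```python
import os
import tempfile


def publish(root: str, link_name: str, target_dir: str) -> None:
    """Point root/link_name at target_dir with a single atomic rename."""
    link_path = os.path.join(root, link_name)
    tmp_path = os.path.join(root, f".{link_name}.tmp-{os.getpid()}")
    # Clean up a temporary link left behind by an interrupted earlier run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    # Create the new link under a temporary name first...
    os.symlink(target_dir, tmp_path)
    # ...then rename it over the published name. The rename acts on the
    # link itself, so readers never observe a missing or half-written link.
    os.replace(tmp_path, link_path)


def demo_publish() -> list:
    """Publish two hypothetical revisions and read each through the link."""
    seen = []
    with tempfile.TemporaryDirectory() as root:
        for rev in ("rev-a", "rev-b"):
            worktree = os.path.join(root, rev)
            os.mkdir(worktree)
            with open(os.path.join(worktree, "VERSION"), "w") as f:
                f.write(rev)
            # A relative target keeps the link valid if root is mounted elsewhere.
            publish(root, "current", rev)
            with open(os.path.join(root, "current", "VERSION")) as f:
                seen.append(f.read())
    return seen


if __name__ == "__main__":
    print(demo_publish())
```

A process that opened a file through the old link keeps reading the old tree, so the old directory can only be removed once no reader depends on it.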
@thockin I think for |
Proposals and PRs are welcome, but really my first pref is to fix broken consumers. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
I faced the same problem. The tool looked very promising, but the main container of the pod is not designed to interpret symlinks, so any reference to a file within the path where the repository has been cloned fails. I would like to contribute a merge request to solve this problem, but I currently don't have the necessary skills to do so. If this feature (copying the latest commit of the git repo into a folder under a fixed path, without symlinks) becomes possible in the future, I will likely want to use git-sync. In the meantime, I'm forced to look for alternatives (I'm currently building my own custom container based on 'alpine/git' that will do what I need). Thanks anyway to the contributors of this project; even if your tool didn't fit my needs in the end, I can see how much work you have put in! |
I am curious what "not designed to interpret symlinks" really means. For the most part, if you do NOTHING, it works. The places I have historically found not to work are apps that are trying to be smart, and are actually being dumb as a result. I would be happy to consider a variant of --one-time which syncs into a real directory without a symlink (it should not be super complicated; it just needs good tests and some thinking about the exact details). I would consider, under some duress, a similar mode outside of --one-time, but that's going to have more impactful design tradeoffs. I don't have personal time to implement this right now, but am happy to guide someone who wants to explore. |
Here's a reason not to use symlinks. Consider the following situation:
What happens is that git-sync clones the repo and creates a symlink to the current ref. So far so good. Then the other components start up, mount the NFS volume, and can see the contents of the repo at the symlink. So far so good. Now, the second that symlink is updated to a new ref because someone pushed to the repo, the Airflow components start failing with filesystem errors:
It turns out that symlink changes don't seem to be propagated in NFS. Having actual directories would be better in this case. I'm not worried about partial reads: Airflow constantly polls the dags directory, so any partial reads will resolve themselves. |
I would not expect a shared volume for this, unless the repo is REALLY large. In Kubernetes a local I will have to learn more about symlinks on NFS - it's not a case I have really considered. |
It's more to prevent constant git clones and the latency they involve. The situation git-sync appears to be built for is cloning a repo and supplying it to a single pod on a single node, where that pod is reasonably long-lived. The problem with doing that in a tool like Airflow is that it regularly generates thousands upon thousands of short-lived task pods (depending on the number and size of the jobs you're running, of course), and with that approach each task pod triggers its own git clone. When your git repo is stored somewhere like GitHub, that generates a lot of traffic. It also stresses k8s's DNS infrastructure, and we see random job failures caused by DNS resolution issues in the git-sync container rather than by the job itself. |
Thanks for the context. Note that a non-symlink approach will either end up with inconsistent state in the directory or a short period where the directory does not exist (and likely stale file handles, too). |
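The tradeoff described above can be made concrete with a short Python sketch of one possible non-symlink approach. This is an illustration only, not a proposed git-sync design: `replace_real_dir`, `demo_replace`, and the `on_gap` hook are hypothetical names. The new tree is prepared in a staging directory and moved into place with two renames, and the hook shows that the destination briefly does not exist between them.

```python
import os
import shutil
import tempfile


def replace_real_dir(staging: str, dest: str, on_gap=None) -> None:
    """Swap a real directory into place; dest is missing for a short window."""
    old = dest + ".old"
    if os.path.isdir(old):
        shutil.rmtree(old)  # leftover from an interrupted earlier run
    had_previous = os.path.isdir(dest)
    if had_previous:
        os.rename(dest, old)  # dest disappears here
        if on_gap is not None:
            on_gap()  # a consumer polling right now finds no directory
    os.rename(staging, dest)  # dest reappears with the new content
    if had_previous:
        # Readers holding open files in the old tree may now hit stale handles.
        shutil.rmtree(old)


def demo_replace():
    """Install two hypothetical versions and record what the gap looks like."""
    gap_exists = []
    with tempfile.TemporaryDirectory() as root:
        dest = os.path.join(root, "repo")
        for version in ("v1", "v2"):
            staging = os.path.join(root, "staging")
            os.mkdir(staging)
            with open(os.path.join(staging, "VERSION"), "w") as f:
                f.write(version)
            replace_real_dir(
                staging, dest,
                on_gap=lambda: gap_exists.append(os.path.exists(dest)),
            )
        with open(os.path.join(dest, "VERSION")) as f:
            content = f.read()
        return content, os.path.islink(dest), gap_exists


if __name__ == "__main__":
    print(demo_replace())
```

Copying files over an existing directory instead would avoid the missing-directory window, at the cost of consumers seeing a mix of old and new files while the copy is in progress.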
Is there any way to tell git-sync not to create a symlink for my destination path?
This is critical because git-sync is, by its nature, used alongside other tools, and the current behavior requires those tools to be able to handle symbolic links.
I recently hit this problem using Kaniko (see GoogleContainerTools/kaniko#2721).
It seems that Kaniko can't resolve the symbolic link.
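One possible consumer-side workaround, sketched here in Python under stated assumptions, is to resolve the symlink to its real directory before handing the path to a tool that cannot follow links. The `resolve_checkout` and `demo_resolve` helpers and the `current`/`rev-a` names are hypothetical, and whether a given tool such as Kaniko accepts the resolved path is not confirmed by this thread.

```python
import os
import tempfile


def resolve_checkout(link_path: str) -> str:
    """Return the real directory behind a published symlink."""
    real = os.path.realpath(link_path)
    if not os.path.isdir(real):
        raise FileNotFoundError(f"{link_path} does not resolve to a directory")
    return real


def demo_resolve():
    """Create a hypothetical checkout plus link, then resolve the link."""
    with tempfile.TemporaryDirectory() as root:
        worktree = os.path.join(root, "rev-a")
        os.mkdir(worktree)
        link = os.path.join(root, "current")
        os.symlink("rev-a", link)
        real = resolve_checkout(link)
        # The resolved path names the checkout itself and is not a link.
        return os.path.basename(real), os.path.islink(real)


if __name__ == "__main__":
    print(demo_resolve())
```

The resolved path is a snapshot: after the next sync the link points elsewhere and the old directory may be removed, so this fits one-shot consumers better than long-running ones.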