v2.6.2 unable to mount on macOS #538

Closed
creachadair opened this issue Oct 24, 2024 · 21 comments

Comments

@creachadair
Contributor

After updating from v2.6.1 to v2.6.2, I am unable to mount a filesystem successfully on macOS. I get the following diagnostic:

mount: init: 19=operation not supported by device

It works fine if I revert to v2.6.1, but I have not yet figured out which specific commit is the culprit. I will do some more digging and see if I can isolate it further.

@creachadair
Contributor Author

creachadair commented Oct 24, 2024

I did a git bisect, and commit f829e36 is the first place where it fails, though I do not understand clearly why that should have this effect.

Possibly this is related to the fact that I explicitly call WaitMount in my driver?

@creachadair
Contributor Author

I think there may be a shadowing bug (or at least an unintended use) here:

if err := cmd.Wait(); err != nil {

Since the goroutine is not started till the outer err is already known to be nil, the reassignment inside the if is not what gets reported on the channel.
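
A minimal sketch of the shadowing pattern described above, for illustration only. `failingWait`, `reportShadowed`, and `reportFixed` are hypothetical names standing in for `cmd.Wait` and the goroutine in the mount code; this is not the library's actual source.

```go
package main

import (
	"errors"
	"fmt"
)

// failingWait stands in for cmd.Wait returning an error (hypothetical).
func failingWait() error { return errors.New("exit status 1") }

// reportShadowed mimics the pattern under discussion: `if err :=` declares a
// new err scoped to the if statement, so the send reports the outer err.
func reportShadowed(wait func() error) error {
	var err error // outer err, already known to be nil at this point
	ready := make(chan error, 1)
	go func() {
		if err := wait(); err != nil {
			// This assigns the inner err, which goes out of scope
			// at the end of the if statement.
			err = fmt.Errorf("mount helper failed: %v", err)
		}
		ready <- err // outer err: still nil
		close(ready)
	}()
	return <-ready
}

// reportFixed declares err once inside the goroutine, so the wrapped error
// is the value that reaches the channel.
func reportFixed(wait func() error) error {
	ready := make(chan error, 1)
	go func() {
		err := wait()
		if err != nil {
			err = fmt.Errorf("mount helper failed: %v", err)
		}
		ready <- err
		close(ready)
	}()
	return <-ready
}

func main() {
	fmt.Println(reportShadowed(failingWait)) // <nil>
	fmt.Println(reportFixed(failingWait))    // mount helper failed: exit status 1
}
```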

That said: This is not new with that commit, and I don't see an immediate causal relationship to this bug, except insofar as where the channel gets read maybe.

@creachadair
Contributor Author

I did a bit more picking at this, and it appears that if control does not return from mount, requests from the kernel are not routed properly, and the mount_macfuse process appears to stall. This makes a certain amount of sense, since NewServer in fuse/server.go does not call handleInit until the initial mount function returns.

@creachadair
Contributor Author

@hanwen Do you have opinions about the right way to address this? It seems to me like there are a couple options:

  • We could revert f829e36
  • We could make the Darwin-specific mount function handle initialization before returning

I'm happy to do some work to plumb it together, but I don't want to start hacking without getting a second opinion. 🙂

@hanwen
Owner

hanwen commented Oct 26, 2024

thanks for the sleuthing. I was afraid something like this might be going on.

The solution is to partially revert the commit: the ready channel shouldn't be in the Server struct, but it should be passed to the mount function. Also, the gist of your analysis should be in a comment somewhere.

Maybe I should bite the bullet and buy a mac for testing the osx flavor of this. I keep hearing bad things about disallowed kext loading. Is it completely disallowed in the latest versions of macos, or is there a developer switch that enables it?

@creachadair
Contributor Author

The solution is to partially revert the commit: the ready channel shouldn't be in the Server struct, but it should be passed to the mount function. Also, the gist of your analysis should be in a comment somewhere.

Sounds good. I'll dig into it a bit more this weekend and send you a PR to consider.

Maybe I should bite the bullet and buy a mac for testing the osx flavor of this. I keep hearing bad things about disallowed kext loading. Is it completely disallowed in the latest versions of macos, or is there a developer switch that enables it?

You can disable SIP, but since I don't work on the FUSE extension itself I just re-approve it each time it gets updated (which is a click-ops step in the system settings and a reboot, ugh).

But once the extension is approved you don't need to do extra steps to load and unload it; the approval is just for whether to trust the developer signature. In principle the author could get on the allowlist but Apple charges a lot for that and I suspect he doesn't make enough from working on this to pay for it.

@hanwen
Owner

hanwen commented Oct 26, 2024

In principle the author could get on the allowlist but Apple charges a lot for that and I suspect he doesn't make enough from working on this to pay for it.

Google had to convert SrcFS to use NFS, so I think this is not just a matter of paying enough.

@creachadair
Contributor Author

Google had to convert SrcFS to use NFS, so I think this is not just a matter of paying enough.

Unfortunate, but it makes sense. I have seen a couple attempts to shim the FUSE interface into an NFS server, but regrettably none that work reliably enough for me to switch to them yet 😞

@creachadair
Contributor Author

creachadair commented Oct 26, 2024

Edit: The below turns out to be some other issue, which went away after a reboot.

More wrinkles: If I revert f829e36 at the head of v2.6.2, I am able to mount a filesystem normally. However, if I revert that commit at the tip of master, I get a different type of failure, where the mount blocks and will not respond to EINTR at all (previously EINTR would stop the mount process and the reported error from above would result).

If I SIGQUIT it does stop, and the relevant goroutine trace shows it stalled in getConnection trying to read from the Unix socket to get its descriptor (trimmed for legibility):

fuse.getConnection(0x14000247200?)
        software/pr/go-fuse/fuse/mount.go:45 +0xac fp=0x1400017a8b0 sp=0x1400017a7d0 pc=0x102c6c65c
fuse.mount({0x16d44b7f5, 0xd}, 0x14000237b00, 0x1400021a8c0)
        software/pr/go-fuse/fuse/mount_darwin.go:64 +0x4ac fp=0x1400017aa70 sp=0x1400017a8b0 pc=0x102c6d06c
fuse.NewServer({0x102e5eca8, 0x1400017e2c0}, {0x16d44b7f5, 0xd}, 0x1400017ac28?)
        software/pr/go-fuse/fuse/server.go:229 +0x350 fp=0x1400017abd0 sp=0x1400017aa70 pc=0x102c77ea0
fs.Mount({0x16d44b7f5, 0xd}, {0x102e57548?, 0x14000248240?}, 0x102db9c20?)
        software/pr/go-fuse/fs/mount.go:27 +0xa8 fp=0x1400017ad00 sp=0x1400017abd0 pc=0x102c880e8

I haven't had a chance to trace through the more recent changes to master since v2.6.2 to see what may have affected it, only to note that reverting is no longer sufficient.

@creachadair
Contributor Author

Hmm, now that I dig a bit further, I think maybe the behaviour seen in the above comment is confounded by some other issues. More investigation is required.

@creachadair
Contributor Author

I bisected again, and it appears that commit e68e570 also interacts with this issue. Reverting it is a little tricky, as there are some later changes built atop it, but as an experiment I checked out at commit b89a90e (the commit just before that in history), reverted f829e36, and the mount still works.

When I repeat the same experiment at commit e68e570 (reverting the channel change), I get:

2024/10/26 12:14:59 writer: Write/Writev failed, err: 22=invalid argument. opcode: INIT
2024/10/26 12:14:59 Error: mount: init: 22=invalid argument

This appears to come from the systemWrite call here.

@creachadair
Contributor Author

Oh this is interesting, I just noticed:

mount: init: 19=operation not supported by device

The INIT opcode is 26, not 19, and indeed opcode 19 is not defined at all. So where is that coming from in this case?

@creachadair
Contributor Author

creachadair commented Oct 26, 2024

I have found another interesting thing, in fuse/server.go:

	req.serializeHeader(req.outPayloadSize())

	if req.inHeader().Opcode == _OP_INIT && ms.kernelSettings.Minor <= 22 {
		// v8-v22 don't have TimeGran and further fields.
		req.outHeader().Length = uint32(sizeOfOutHeader) + 24
	}

Because we serialize the header before making the adjustment, it appears we get the wrong header layout in cases where the minor version is 22 or lower, which includes Darwin (presently set to 19 in the latest release).

I found that if I reorder this code:

	if req.inHeader().Opcode == _OP_INIT && ms.kernelSettings.Minor <= 22 {
		// v8-v22 don't have TimeGran and further fields.
		req.outHeader().Length = uint32(sizeOfOutHeader) + 24
	}
	req.serializeHeader(req.outPayloadSize())

then, combined with a revert of f829e36, I am able to mount a filesystem correctly with macOS off the tip of master.
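
A simplified sketch of the general ordering hazard, not of go-fuse's internals. It assumes that serialization snapshots the header into a separate byte buffer, which may not match what `serializeHeader` actually does. The `outHeader` type and the `serialize`, `serializeThenAdjust`, and `adjustThenSerialize` helpers are hypothetical; the 16-byte reply header and the 24-byte compat INIT payload follow the `sizeOfOutHeader + 24` adjustment quoted above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// outHeader mirrors the 16-byte FUSE reply header (length, status, unique).
// It is a stand-in for illustration, not go-fuse's internal type.
type outHeader struct {
	Length uint32
	Status int32
	Unique uint64
}

const (
	sizeOfOutHeader  = 16 // bytes in the reply header
	compatInitOutLen = 24 // INIT reply payload size for protocol minor <= 22
)

// serialize copies the header fields into a fresh byte slice.
func serialize(h outHeader) []byte {
	buf := make([]byte, sizeOfOutHeader)
	binary.LittleEndian.PutUint32(buf[0:4], h.Length)
	binary.LittleEndian.PutUint32(buf[4:8], uint32(h.Status))
	binary.LittleEndian.PutUint64(buf[8:16], h.Unique)
	return buf
}

// wireLength reads the length field back out of the serialized bytes.
func wireLength(buf []byte) uint32 { return binary.LittleEndian.Uint32(buf[0:4]) }

// serializeThenAdjust serializes first, so the later adjustment never
// reaches the bytes that would be written to the kernel.
func serializeThenAdjust(h outHeader) []byte {
	buf := serialize(h)
	h.Length = sizeOfOutHeader + compatInitOutLen
	return buf
}

// adjustThenSerialize applies the compat length before serializing.
func adjustThenSerialize(h outHeader) []byte {
	h.Length = sizeOfOutHeader + compatInitOutLen
	return serialize(h)
}

func main() {
	// A header sized for a longer, newer-protocol INIT payload.
	h := outHeader{Length: sizeOfOutHeader + 64, Unique: 1}
	fmt.Println(wireLength(serializeThenAdjust(h))) // 80
	fmt.Println(wireLength(adjustThenSerialize(h))) // 40
}
```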

@creachadair
Contributor Author

creachadair commented Oct 26, 2024

I sent #539 https://review.gerrithub.io/c/hanwen/go-fuse/+/1203164 to address the ordering issue, which I think we want irrespective of the other issue (which I will send separately).

hanwen pushed a commit that referenced this issue Oct 27, 2024
Older kernel versions do not support all the fields of the current INIT header.
Adjust the header size before marshaling the header payload.

Updates #538

Change-Id: I06933c281af5c46bde6bfecaf5269274d20b8ad6
@creachadair
Contributor Author

creachadair commented Oct 27, 2024

More digging reveals that Darwin wants not only the INIT callback, but also the initial STATFS, before it will return control from the mount_osxfuse binary. This means in the current plumbing, we need to hold off blocking till later than I realized.

Reverting the channel change outright fixes this because it allows the caller to begin serving requests before blocking on the channel. But with my putative fix (where it waits after handleInit returns), it still stalls at startup.

@hanwen Sadly, in light of this, I think maybe the original construction—where you had the channel hanging off the Server type directly—might have been a better approach. It's annoying that this is only needed for Darwin, but it definitely is needed there. 😞

@creachadair
Contributor Author

Not for merging, but I put together https://github.com/creachadair/go-fuse/pull/2 as a prototype of another way we might solve this. If you think this is OK I can turn it into a proper PR on Gerrit.

@hanwen
Owner

hanwen commented Oct 28, 2024

hmm. I'm starting to think that the original wasn't so bad, because the ready <-chan error is mostly self-explanatory.

@creachadair
Contributor Author

creachadair commented Oct 28, 2024

hmm. I'm starting to think that the original wasn't so bad, because the ready <-chan error is mostly self-explanatory.

Yes and no—the meaning of the error is clear, but the semantics of when it has to be closed were never spelled out, and are fiddly to document. My proposal in that draft PR is meant to avoid the need for managing the channel at all in cases where it's not needed. YMMV, though.

(That said, simply reverting that change would certainly be simpler to do 🙂)

@hanwen
Owner

hanwen commented Oct 30, 2024

see https://review.gerrithub.io/c/hanwen/go-fuse/+/1203281

looks like there is movement on the kext front, osxfuse/osxfuse#1025

@creachadair
Contributor Author

creachadair commented Oct 30, 2024

looks like there is movement on the kext front, osxfuse/osxfuse#1025

Oh, that would be excellent.

Thank you for the change!

@creachadair
Contributor Author

With v2.6.3 patched, this appears to be resolved. Thank you very much for the quick turnaround on this.
