-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Readdir support #11 #12
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,7 @@ | |
//! unified Rust-y interface to these calls. | ||
|
||
use std::{ | ||
ffi::OsStr, | ||
fs::File, | ||
io::{Error, ErrorKind, Result}, | ||
path::Path, | ||
|
@@ -23,11 +24,11 @@ cfg_if::cfg_if! { | |
if #[cfg(windows)] { | ||
mod win; | ||
|
||
use win::OpenOptionsImpl; | ||
use win::{OpenOptionsImpl, ReadDirImpl, DirEntryImpl}; | ||
} else { | ||
mod unix; | ||
|
||
use unix::OpenOptionsImpl; | ||
use unix::{OpenOptionsImpl, ReadDirImpl, DirEntryImpl}; | ||
} | ||
} | ||
|
||
|
@@ -183,6 +184,11 @@ impl OpenOptions { | |
/// This will honour the options set for creation/append etc, but will only | ||
/// operate relative to d. To open a file with an absolute path, use the | ||
/// stdlib fs::OpenOptions. | ||
/// | ||
/// Note: On Windows this uses low level APIs that do not perform path | ||
/// separator translation: if passing a path containing a separator, it must | ||
/// be a platform native one. e.g. `foo\\bar` on Windows, vs `foo/bar` on | ||
/// most other OS's. | ||
pub fn open_at<P: AsRef<Path>>(&self, d: &mut File, p: P) -> Result<File> { | ||
self._impl.open_at(d, OpenOptions::ensure_root(p.as_ref())?) | ||
} | ||
|
@@ -198,6 +204,66 @@ impl OpenOptions { | |
} | ||
} | ||
|
||
/// Iterate over the contents of a directory. Created by calling read_dir() on | ||
/// an opened directory. Each item yielded by the iterator is an io::Result to | ||
/// allow communication of io errors as the iterator is advanced. | ||
/// | ||
/// To the greatest extent possible the underlying OS semantics are preserved. | ||
/// That means that `.` and `..` entries are exposed, and that no sort order is | ||
/// guaranteed by the iterator. | ||
#[derive(Debug)] | ||
pub struct ReadDir<'a> { | ||
_impl: ReadDirImpl<'a>, | ||
} | ||
|
||
impl<'a> ReadDir<'a> { | ||
pub fn new(d: &'a mut File) -> Result<Self> { | ||
Ok(ReadDir { | ||
_impl: ReadDirImpl::new(d)?, | ||
}) | ||
} | ||
} | ||
|
||
impl Iterator for ReadDir<'_> { | ||
type Item = Result<DirEntry>; | ||
|
||
fn next(&mut self) -> Option<Result<DirEntry>> { | ||
self._impl | ||
.next() | ||
.map(|entry| entry.map(|_impl| DirEntry { _impl })) | ||
} | ||
} | ||
|
||
/// The returned type for each entry found by [`read_dir`]. | ||
/// | ||
/// Each entry represents a single entry inside the directory. Platforms that | ||
/// provide rich metadata may in future expose this through methods or extension | ||
/// traits on DirEntry. | ||
/// | ||
/// For now however, only the [`name()`] is exposed. This does not imply any | ||
/// additional IO for most workloads: metadata returned from a directory listing | ||
/// is inherently racy: presuming that what was a dir, or symlink etc when the | ||
/// directory was listed, will still be the same when opened is fallible. | ||
/// Instead, use open_at to open the contents, and then process based on the | ||
/// type of content found. | ||
#[derive(Debug)] | ||
pub struct DirEntry { | ||
_impl: DirEntryImpl, | ||
} | ||
|
||
impl DirEntry { | ||
pub fn name(&self) -> &OsStr { | ||
self._impl.name() | ||
} | ||
} | ||
|
||
/// Read the children of the directory d. | ||
/// | ||
/// See [`ReadDir`] and [`DirEntry`] for details. | ||
pub fn read_dir(d: &mut File) -> Result<ReadDir> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I somewhat would have expected this would be a method on a specific There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, so I can understand that. There is a consistency question. The stdlib's OpenOptions struct is a programmable factory basically. I've chosen to put mkdir_at and open_at (and in future symlink_at and delete_at) on a similar struct, at least for now. That permits a foreign File like that implements AsDeref to be used I guess, which an extension trait wouldn't so much. readdir doesn't seem to need OpenOptions which is why I made it a free function for now. The type it works on is File because one doesn't know when a path is opened whether it is a directory or not. We could do a type conversion thing where we check the metadata, but std lib File's don't have free metadata last I checked, so we'd be adding a syscall hidden behind a type conversion, which feels strange to me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You could have a It might make client code feel more natural: Of course on Linux (at least) it doesn't have to be a dir; you can have an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. perhaps. We can try a few things and see whats nice. e.g.
would certainly work. I'm just not sure its better yet :) |
||
ReadDir::new(d) | ||
} | ||
|
||
pub mod os { | ||
cfg_if::cfg_if! { | ||
if #[cfg(windows)] { | ||
|
@@ -214,14 +280,15 @@ pub mod testsupport; | |
#[cfg(test)] | ||
mod tests { | ||
use std::{ | ||
ffi::OsStr, | ||
fs::{rename, File}, | ||
io::{Error, ErrorKind, Result, Seek, SeekFrom, Write}, | ||
path::PathBuf, | ||
}; | ||
|
||
use tempfile::TempDir; | ||
|
||
use crate::{testsupport::open_dir, OpenOptions, OpenOptionsWriteMode}; | ||
use crate::{read_dir, testsupport::open_dir, DirEntry, OpenOptions, OpenOptionsWriteMode}; | ||
|
||
/// Create a directory parent, open it, then rename it to renamed-parent and | ||
/// create another directory in its place. returns the file handle and the | ||
|
@@ -464,4 +531,46 @@ mod tests { | |
} | ||
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn readdir() -> Result<()> { | ||
let (_tmp, mut parent_dir, _pathname) = setup()?; | ||
assert_eq!( | ||
2, // . and .. | ||
read_dir(&mut parent_dir)? | ||
.collect::<Result<Vec<DirEntry>>>()? | ||
.len() | ||
); | ||
let dir_present = | ||
|children: &Vec<DirEntry>, name: &OsStr| children.iter().any(|e| e.name() == name); | ||
|
||
let mut options = OpenOptions::default(); | ||
options.create_new(true).write(OpenOptionsWriteMode::Write); | ||
options.open_at(&mut parent_dir, "1")?; | ||
options.open_at(&mut parent_dir, "2")?; | ||
options.open_at(&mut options.mkdir_at(&mut parent_dir, "child")?, "3")?; | ||
let children = read_dir(&mut parent_dir)?.collect::<Result<Vec<_>>>()?; | ||
assert_eq!( | ||
5, | ||
children.len(), | ||
"directory contains 5 entries (., .., 1, 2, child)" | ||
); | ||
assert!(dir_present(&children, OsStr::new("1")), "{:?}", children); | ||
assert!(dir_present(&children, OsStr::new("2")), "{:?}", children); | ||
assert!( | ||
dir_present(&children, OsStr::new("child")), | ||
"{:?}", | ||
children | ||
); | ||
|
||
{ | ||
let mut child = OpenOptions::default() | ||
.read(true) | ||
.open_at(&mut parent_dir, "child")?; | ||
let children = read_dir(&mut child)?.collect::<Result<Vec<_>>>()?; | ||
assert_eq!(3, children.len(), "{:?}", children); | ||
assert!(dir_present(&children, OsStr::new("3")), "{:?}", children); | ||
} | ||
Ok(()) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point about it being racy, but, does
open_at
give you a way to avoid the race consistently.If there's an entry that's flipping between being a symlink, directory, file, and something else, and I want to readlink, readlink, read, or error out... I'd need a
nofollow
option to start with, and then I guess the application code needs to retry various options until something succeeds? Seems a bit messy?(Probably not a reason to block addition of this with just the name to start with, just a thought.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats ... fascinating. What bakes my brain here is that this is a new fd - where is the state coming from? (The previous iteration over the DIR* contents should be wiped out when fdopendir is called : https://docs.rs/nix/0.24.2/src/nix/dir.rs.html#57
And we never actually read from the original fd: the pattern is:
Dir
calls closedir correctly., which closes fd 4.It works on Linux here outside docker. Doesn't work within docker. That means we don't require a kernel version change to trigger the situation.
Linux tr2vm 5.13.0-52-generic #59-Ubuntu SMP Wed Jun 15 20:17:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
libc6:amd64 2.34-0ubuntu3.2
The failing docker container contents:
ii libc6:amd64 2.35-0ubuntu3.1 amd64 GNU C Library: Shared libraries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NoFollow is absolutely needed yes. We also need an inode abstraction (to permit safeish
open_at(child, '..')
) to walk back up the tree. But then you don't need to retry, you just open, interrogate, and if a symlink follow the pointer from userspace, otherwise process the object you obtained from open_at.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Installed an impish container (same as host os that 'works') and it still fails - so this is looking very specific to docker.
Going to try a couple things :). Also I've realised nix has poor readdir hygiene, filing a bug there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nix-rust/nix#1783
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leaves a strong hint (from the NetBSD man page in particular) nix-rust/nix#1784
So I think seek might be right, though I'm still suspicious of underlying things here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It failed for me first time, outside of Docker, on Pop_OS 22.04 Linux lift
5.18.10-76051810-generic #202207071639~1659108431~22.04~c9172fb SMP PREEMPT_DYNAMIC Fri J x86_64 x86_64 x86_64 GNU/Linux
I can't immediately cite chapter and verse but I'm not surprised that dup'd file descriptors share a seek position.
https://man7.org/linux/man-pages/man2/dup.2.html
https://man7.org/linux/man-pages/man2/open.2.html
So my model was that you have a single open file object in the kernel with two fd numbers pointing to it. If you read from it by one, you have to seek it back in the other.
That does not explain why it would succeed in some cases though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But with
O_NOFOLLOW
you'll fail to open the file. You need to then retry withreadlinkat
.I split out #13 on this too.
The other case I though of where
statat
will help beyondfstat
is a file that you don't have permission to read.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the success in some cases is very confusing. We may need to add a note that the same file object is in use and thus the file location offset can be updated by readdir.
O_PATH permits fstat after opening, but the portability concerns are there :/.
The advice for programs that want to be secure and run on Darwin, from that stackexchange, is to tell folk that they need to grant read access to permit such software to work, or document its vulnerability to races on Darwin.