Skip to content

Commit

Permalink
MFC r344183-r344187, r344333-r344334, r344407, r344857, r344865 (by cem)
Browse files Browse the repository at this point in the history
r344183:
FUSE: Respect userspace FS "do-not-cache" of file attributes

The FUSE protocol demands that kernel implementations cache user filesystem
file attributes (vattr data) for a maximum period of time in the range of
[0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or 10
seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely.  This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

In the future, as an optimization, we should implement notify_inval_entry,
etc, which provide userspace filesystems a way of evicting the kernel cache.

One potentially bogus access to invalid cached attribute data was left in
fuse_io_strategy.  It is restricted behind the undocumented and non-default
"vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are
deadcode and can be eliminated?

Some minor APIs changed to facilitate this:

1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data").

2. cache_attrs() respects the provided TTL and only caches in the FUSE
inode if TTL > 0.  It also grows an "out" argument, which, if non-NULL,
stores the translated fuse_attr (even if not suitable for caching).

3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help
avoid programming mistakes.

4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE
link op was weakened (only performed when we have a valid attr cache).  The
check is racy in a multi-writer network filesystem anyway -- classic TOCTOU.
We have to trust any userspace filesystem that rejects local caching to
account for it correctly.

PR:		230258 (inspired by; does not fix)

r344184:
FUSE: Respect userspace FS "do-not-cache" of path components

The FUSE protocol demands that kernel implementations cache user filesystem
path components (lookup/cnp data) for a maximum period of time in the range
of [0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or
10 seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely.  This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup
if the user filesystem did not set a zero second TTL.

PR:		230258 (inspired by; does not fix)

r344185:
FUSE: Only "dirty" cached file size when data is dirty

Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update
the buf cache bounds as a result of either a read from the underlying FUSE
filesystem, or as part of a write-through type operation (like truncate =>
VOP_SETATTR).  In these cases, do not set the FN_SIZECHANGE flag, which
indicates that an inode's data is dirty (in particular, that the local buf
cache and fvdat->filesize have dirty extended data).

PR:		230258 (related)

r344186:
FUSE: The FUSE design expects writethrough caching

At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol
specifies only clean data to be cached.

Prior to this change, we implement and default to writeback caching.  This
is ok enough for local only filesystems without hardlinks, but violates the
general design contract with FUSE and breaks distributed filesystems or
concurrent access to hardlinks of the same inode.

In this change, add cache mode as an extension of cache enable/disable.  The
new modes are UC (was: cache disabled), WT (default), and WB (was: cache
enabled).

For now, WT caching is implemented as write-around, which meets the goal of
only caching clean data.  WT can be better than WA for workloads that
frequently read data that was recently written, but WA is trivial to
implement.  Note that this has no effect on O_WRONLY-opened files, which
were already coerced to write-around.

Refs:
  * https://sourceforge.net/p/fuse/mailman/message/8902254/
  * vgough/encfs#315

PR:		230258 (inspired by)

r344187:
FUSE: Refresh cached file size when it changes (lookup)

The cached fvdat->filesize is indepedent of the (mostly unused)
cached_attrs, and we failed to update it when a cached (but perhaps
inactive) vnode was found during VOP_LOOKUP to have a different size than
cached.

As noted in the code comment, this can occur in distributed filesystems or
with other kinds of irregular file behavior (anything is possible in FUSE).

We do something similar in fuse_vnop_getattr already.

PR:		230258 (as reported in description; other issues explored in
			comments are not all resolved)
Reported by:	MooseFS FreeBSD Team <freebsd AT moosefs.com>
Submitted by:	Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)

r344333:
fuse: add descriptions for remaining sysctls

(Except reclaim revoked; I don't know what that goal of that one is.)

r344334:
Fuse: whitespace and style(9) cleanup

Take a pass through fixing some of the most egregious whitespace issues in
fs/fuse.  Also fix some style(9) warts while here.  Not 100% cleaned up, but
somewhat less painful to look at and edit.

No functional change.

r344407:
fuse: Fix a regression introduced in r337165

On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to
use a buf cache block size in excess of permitted size.  This did not affect
most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB.
The issue was discovered and reported using a custom kernel with a DFLTPHYS
of 512kB.

PR:		230260 (comment #9)
Reported by:	ken@

r344857:
FUSE: Prevent trivial panic

When open(2) was invoked against a FUSE filesystem with an unexpected flags
value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic.

For now, prevent the panic by rejecting such VOP_OPENs with EINVAL.

This is not considered the correct long term fix, but does prevent an
unprivileged denial-of-service.

PR:		236329
Reported by:	asomers
Reviewed by:	asomers
Sponsored by:	Dell EMC Isilon

r344865:
fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf

On GENERIC kernels with empty loader.conf, there is no functional change.
DFLTPHYS and MAXBSIZE are both 64kB at the moment.  This change allows
larger bufcache block sizes to be used when either MAXBSIZE (custom kernel)
or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher
than the default.

Suggested by:	ken@
  • Loading branch information
asomers committed Sep 6, 2019
1 parent fc0d9df commit 5326c19
Show file tree
Hide file tree
Showing 13 changed files with 640 additions and 653 deletions.
41 changes: 21 additions & 20 deletions sys/fs/fuse/fuse.h
Original file line number Diff line number Diff line change
Expand Up @@ -197,26 +197,27 @@ do { \
#define FUSE_TRACE 0
#endif

#define DEBUGX(cond, fmt, ...) do { \
if (((cond))) { \
printf("%s: " fmt, __func__, ## __VA_ARGS__); \
} } while (0)

#define fuse_lck_mtx_lock(mtx) do { \
DEBUGX(FUSE_DEBUG_LOCK, "0: lock(%s): %s@%d by %d\n", \
__STRING(mtx), __func__, __LINE__, curthread->td_proc->p_pid); \
mtx_lock(&(mtx)); \
DEBUGX(FUSE_DEBUG_LOCK, "1: lock(%s): %s@%d by %d\n", \
__STRING(mtx), __func__, __LINE__, curthread->td_proc->p_pid); \
} while (0)

#define fuse_lck_mtx_unlock(mtx) do { \
DEBUGX(FUSE_DEBUG_LOCK, "0: unlock(%s): %s@%d by %d\n", \
__STRING(mtx), __func__, __LINE__, curthread->td_proc->p_pid); \
mtx_unlock(&(mtx)); \
DEBUGX(FUSE_DEBUG_LOCK, "1: unlock(%s): %s@%d by %d\n", \
__STRING(mtx), __func__, __LINE__, curthread->td_proc->p_pid); \
} while (0)
#define DEBUGX(cond, fmt, ...) do { \
if (((cond))) { \
printf("%s: " fmt, __func__, ## __VA_ARGS__); \
} \
} while (0)

#define fuse_lck_mtx_lock(mtx) do { \
DEBUGX(FUSE_DEBUG_LOCK, "0: lock(%s): %s@%d by %d\n", \
__STRING(mtx), __func__, __LINE__, curthread->td_proc->p_pid); \
mtx_lock(&(mtx)); \
DEBUGX(FUSE_DEBUG_LOCK, "1: lock(%s): %s@%d by %d\n", \
__STRING(mtx), __func__, __LINE__, curthread->td_proc->p_pid); \
} while (0)

#define fuse_lck_mtx_unlock(mtx) do { \
DEBUGX(FUSE_DEBUG_LOCK, "0: unlock(%s): %s@%d by %d\n", \
__STRING(mtx), __func__, __LINE__, curthread->td_proc->p_pid); \
mtx_unlock(&(mtx)); \
DEBUGX(FUSE_DEBUG_LOCK, "1: unlock(%s): %s@%d by %d\n", \
__STRING(mtx), __func__, __LINE__, curthread->td_proc->p_pid); \
} while (0)

void fuse_ipc_init(void);
void fuse_ipc_destroy(void);
Expand Down
2 changes: 1 addition & 1 deletion sys/fs/fuse/fuse_device.c
Original file line number Diff line number Diff line change
Expand Up @@ -317,7 +317,7 @@ fuse_device_read(struct cdev *dev, struct uio *uio, int ioflag)
return (err);
}

static __inline int
static inline int
fuse_ohead_audit(struct fuse_out_header *ohead, struct uio *uio)
{
FS_DEBUG("Out header -- len: %i, error: %i, unique: %llu; iovecs: %d\n",
Expand Down
25 changes: 9 additions & 16 deletions sys/fs/fuse/fuse_file.c
Original file line number Diff line number Diff line change
Expand Up @@ -88,14 +88,11 @@ __FBSDID("$FreeBSD$");
static int fuse_fh_count = 0;

SYSCTL_INT(_vfs_fusefs, OID_AUTO, filehandle_count, CTLFLAG_RD,
&fuse_fh_count, 0, "");
&fuse_fh_count, 0, "number of open FUSE filehandles");

int
fuse_filehandle_open(struct vnode *vp,
fufh_type_t fufh_type,
struct fuse_filehandle **fufhp,
struct thread *td,
struct ucred *cred)
fuse_filehandle_open(struct vnode *vp, fufh_type_t fufh_type,
struct fuse_filehandle **fufhp, struct thread *td, struct ucred *cred)
{
struct fuse_dispatcher fdi;
struct fuse_open_in *foi;
Expand All @@ -114,8 +111,8 @@ fuse_filehandle_open(struct vnode *vp,
/* NOTREACHED */
}
/*
* Note that this means we are effectively FILTERING OUT open() flags.
*/
* Note that this means we are effectively FILTERING OUT open() flags.
*/
oflags = fuse_filehandle_xlate_to_oflags(fufh_type);

if (vnode_isdir(vp)) {
Expand Down Expand Up @@ -159,10 +156,8 @@ fuse_filehandle_open(struct vnode *vp,
}

int
fuse_filehandle_close(struct vnode *vp,
fufh_type_t fufh_type,
struct thread *td,
struct ucred *cred)
fuse_filehandle_close(struct vnode *vp, fufh_type_t fufh_type,
struct thread *td, struct ucred *cred)
{
struct fuse_dispatcher fdi;
struct fuse_release_in *fri;
Expand Down Expand Up @@ -265,10 +260,8 @@ fuse_filehandle_getrw(struct vnode *vp, fufh_type_t fufh_type,
}

void
fuse_filehandle_init(struct vnode *vp,
fufh_type_t fufh_type,
struct fuse_filehandle **fufhp,
uint64_t fh_id)
fuse_filehandle_init(struct vnode *vp, fufh_type_t fufh_type,
struct fuse_filehandle **fufhp, uint64_t fh_id)
{
struct fuse_vnode_data *fvdat = VTOFUD(vp);
struct fuse_filehandle *fufh;
Expand Down
94 changes: 42 additions & 52 deletions sys/fs/fuse/fuse_file.h
Original file line number Diff line number Diff line change
Expand Up @@ -67,75 +67,65 @@
#include <sys/vnode.h>

typedef enum fufh_type {
FUFH_INVALID = -1,
FUFH_RDONLY = 0,
FUFH_WRONLY = 1,
FUFH_RDWR = 2,
FUFH_MAXTYPE = 3,
FUFH_INVALID = -1,
FUFH_RDONLY = 0,
FUFH_WRONLY = 1,
FUFH_RDWR = 2,
FUFH_MAXTYPE = 3,
} fufh_type_t;
_Static_assert(FUFH_RDONLY == O_RDONLY, "RDONLY");
_Static_assert(FUFH_WRONLY == O_WRONLY, "WRONLY");
_Static_assert(FUFH_RDWR == O_RDWR, "RDWR");

struct fuse_filehandle {
uint64_t fh_id;
fufh_type_t fh_type;
uint64_t fh_id;
fufh_type_t fh_type;
};

#define FUFH_IS_VALID(f) ((f)->fh_type != FUFH_INVALID)

static __inline__
fufh_type_t
static inline fufh_type_t
fuse_filehandle_xlate_from_mmap(int fflags)
{
if (fflags & (PROT_READ | PROT_WRITE)) {
return FUFH_RDWR;
} else if (fflags & (PROT_WRITE)) {
return FUFH_WRONLY;
} else if ((fflags & PROT_READ) || (fflags & PROT_EXEC)) {
return FUFH_RDONLY;
} else {
return FUFH_INVALID;
}
if (fflags & (PROT_READ | PROT_WRITE))
return FUFH_RDWR;
else if (fflags & (PROT_WRITE))
return FUFH_WRONLY;
else if ((fflags & PROT_READ) || (fflags & PROT_EXEC))
return FUFH_RDONLY;
else
return FUFH_INVALID;
}

static __inline__
fufh_type_t
static inline fufh_type_t
fuse_filehandle_xlate_from_fflags(int fflags)
{
if ((fflags & FREAD) && (fflags & FWRITE)) {
return FUFH_RDWR;
} else if (fflags & (FWRITE)) {
return FUFH_WRONLY;
} else if (fflags & (FREAD)) {
return FUFH_RDONLY;
} else {
panic("FUSE: What kind of a flag is this (%x)?", fflags);
}
if ((fflags & FREAD) && (fflags & FWRITE))
return FUFH_RDWR;
else if (fflags & (FWRITE))
return FUFH_WRONLY;
else if (fflags & (FREAD))
return FUFH_RDONLY;
else
panic("FUSE: What kind of a flag is this (%x)?", fflags);
}

static __inline__
int
static inline int
fuse_filehandle_xlate_to_oflags(fufh_type_t type)
{
int oflags = -1;

switch (type) {

case FUFH_RDONLY:
oflags = O_RDONLY;
break;

case FUFH_WRONLY:
oflags = O_WRONLY;
break;

case FUFH_RDWR:
oflags = O_RDWR;
break;

default:
break;
}

return oflags;
int oflags = -1;

switch (type) {
case FUFH_RDONLY:
case FUFH_WRONLY:
case FUFH_RDWR:
oflags = type;
break;
default:
break;
}

return oflags;
}

int fuse_filehandle_valid(struct vnode *vp, fufh_type_t fufh_type);
Expand Down
25 changes: 7 additions & 18 deletions sys/fs/fuse/fuse_internal.c
Original file line number Diff line number Diff line change
Expand Up @@ -373,23 +373,18 @@ fuse_internal_readdir_processdata(struct uio *uio,

/* remove */

#define INVALIDATE_CACHED_VATTRS_UPON_UNLINK 1
int
fuse_internal_remove(struct vnode *dvp,
struct vnode *vp,
struct componentname *cnp,
enum fuse_opcode op)
{
struct fuse_dispatcher fdi;
struct fuse_vnode_data *fvdat;
int err;

struct vattr *vap = VTOVA(vp);

#if INVALIDATE_CACHED_VATTRS_UPON_UNLINK
int need_invalidate = 0;
uint64_t target_nlink = 0;

#endif
int err = 0;
err = 0;
fvdat = VTOFUD(vp);

debug_printf("dvp=%p, cnp=%p, op=%d\n", vp, cnp, op);

Expand All @@ -399,13 +394,6 @@ fuse_internal_remove(struct vnode *dvp,
memcpy(fdi.indata, cnp->cn_nameptr, cnp->cn_namelen);
((char *)fdi.indata)[cnp->cn_namelen] = '\0';

#if INVALIDATE_CACHED_VATTRS_UPON_UNLINK
if (vap->va_nlink > 1) {
need_invalidate = 1;
target_nlink = vap->va_nlink;
}
#endif

err = fdisp_wait_answ(&fdi);
fdisp_destroy(&fdi);
return err;
Expand Down Expand Up @@ -483,13 +471,13 @@ fuse_internal_newentry_core(struct vnode *dvp,
if ((err = fuse_internal_checkentry(feo, vtyp))) {
return err;
}
err = fuse_vnode_get(mp, feo->nodeid, dvp, vpp, cnp, vtyp);
err = fuse_vnode_get(mp, feo, feo->nodeid, dvp, vpp, cnp, vtyp);
if (err) {
fuse_internal_forget_send(mp, cnp->cn_thread, cnp->cn_cred,
feo->nodeid, 1);
return err;
}
cache_attrs(*vpp, feo);
cache_attrs(*vpp, feo, NULL);

return err;
}
Expand Down Expand Up @@ -563,6 +551,7 @@ fuse_internal_vnode_disappear(struct vnode *vp)

ASSERT_VOP_ELOCKED(vp, "fuse_internal_vnode_disappear");
fvdat->flag |= FN_REVOKED;
fvdat->valid_attr_cache = false;
cache_purge(vp);
}

Expand Down
Loading

0 comments on commit 5326c19

Please sign in to comment.