Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafety improvements #1683

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion core/src/environment.rs
Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,7 @@ impl<K: Hash + Eq, V: PartialEq> Environment<K, V> {

/// Creates an iterator that visits all layers from the most recent one to the oldest.
/// The element iterator type is `Rc<HashMap<K, V>>`.
pub fn iter_layers(&self) -> EnvLayerIter<'_, K, V> {
pub fn iter_layers<'slf>(&'slf self) -> EnvLayerIter<'slf, K, V> {
EnvLayerIter {
env: if !self.was_cloned() {
Some(NonNull::from(self))
Expand Down
31 changes: 21 additions & 10 deletions core/src/identifier.rs
Original file line number Diff line number Diff line change
Expand Up @@ -240,9 +240,11 @@ mod interner {
/// The interner, which serves a double purpose: it pre-allocates space
/// so that [Ident](super::Ident) labels are created faster
/// and it makes it so that labels are stored only once, saving space.
pub(crate) struct Interner<'a>(RwLock<InnerInterner<'a>>);
// NOTE: We set the lifetime parameter of InnerInterner to 'static since
// it's arbitrary, and there's no reason to expose it to users of Interner
pub(crate) struct Interner(RwLock<InnerInterner<'static>>);
Comment on lines -243 to +245
Copy link
Member

@yannham yannham Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Radvendii , it seems that it used to be the fact that the lifetime of the interner and the inner interner were the same one, before this change. Or am I misunderstanding something? Does your first example of UB still happens with the original code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it still happens with the original code.

Remember that in Interner<'a>(RwLock<InnerInterner<'a>) , there is nothing limiting 'a to the liferime of the Interner. If rust needs it to be 'static it will just insert 'static in here anyways, so specifying 'static doesn't allow any behaviour that wasn't allowed before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. But it's a private item already, as is InnerInterner, so you can only access the InnerInterner of an interner from within the identifier::interner module, is that right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, I start to believe that the lifetime of InnerInterner is useless, because we only ever instantiate it to static. So maybe we should just bite the bullet and use static references, saying that they aren't static at all, but only the actual implementation of InnerInterner can touch it. And document as well that an Interner should never ever change the underlying inner interner after construction. However, the inner interner being private should make it relatively ok. Another solution is to inline the inner interner in the Interner to avoid this situation, but this means more locks, and more coupling. I don't know if this is worth the price: in any case, we should make it clear that this module has unsafe parts and that you shouldn't touch it unless you know what you're doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility is to explore Pin, which believe allows to express those kind of constraints (for example you can't swap a Pin<..> with something else, just drop it, which might be sufficient to avoid your UB example ?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And finally, maybe we should use NonNull instead of references with lifetimes, which would avoid to lie about their lifetime (would still require unsafe, but at least won't make it possible to do conversion to &'static from safe code)


impl<'a> Interner<'a> {
impl Interner {
/// Creates an empty [Interner].
pub(crate) fn new() -> Self {
Self(RwLock::new(InnerInterner::new()))
Expand All @@ -258,28 +260,32 @@ mod interner {
///
/// This operation cannot fails since the only way to have a [Symbol] is to have
/// [interned](Interner::intern) the corresponding string first.
pub(crate) fn lookup(&self, sym: Symbol) -> &str {
pub(crate) fn lookup<'slf>(&'slf self, sym: Symbol) -> &'slf str {
// SAFETY: We are making the returned &str lifetime the same as our struct,
// which is okay here since the InnerInterner uses a typed_arena which prevents
// deallocations, so the reference will be valid while the InnerInterner exists,
// hence while the struct exists.
unsafe { std::mem::transmute(self.0.read().unwrap().lookup(sym)) }
unsafe { std::mem::transmute::<&'_ str, &'slf str>(self.0.read().unwrap().lookup(sym)) }
}
}

/// The main part of the Interner.
struct InnerInterner<'a> {
/// 'internal should never be exposed to the outside, as it is an
/// implementation detail and could be set to anything. It should be treated
/// within the implementation of InnerInterner as the lifetime of the object
/// itself, since that's what it is turned into in the `lookup()` method.
struct InnerInterner<'internal> {
/// Preallocates space where strings are stored.
arena: Mutex<Arena<u8>>,

/// Prevents the arena from creating different [Symbols](Symbol) for the same string.
map: HashMap<&'a str, Symbol>,
map: HashMap<&'internal str, Symbol>,

/// Allows retrieving a string from a [Symbol].
vec: Vec<&'a str>,
vec: Vec<&'internal str>,
}

impl<'a> InnerInterner<'a> {
impl<'internal> InnerInterner<'internal> {
/// Creates an empty [InnerInterner].
fn new() -> Self {
Self {
Expand All @@ -299,9 +305,12 @@ mod interner {
// This is okay since the lifetime of the arena is identical to the one of the struct.
// It is also okay to use it from inside the mutex, since typed_arena does not allow
// deallocation, so references are valid until the arena drop, which is tied to the
// struct drop.
// struct drop. Since there is no 'self lifetime, we use 'internal instead, which
// at least has 'internal: 'self.
let in_string = unsafe {
std::mem::transmute(self.arena.lock().unwrap().alloc_str(string.as_ref()))
std::mem::transmute::<&'_ str, &'internal str>(
self.arena.lock().unwrap().alloc_str(string.as_ref()),
)
};
let sym = Symbol(self.vec.len() as u32);
self.vec.push(in_string);
Expand All @@ -313,6 +322,8 @@ mod interner {
/// This operation cannot fails since the only way to have a [Symbol]
/// is to have [interned](InnerInterner::intern) the corresponding string first.
fn lookup(&self, sym: Symbol) -> &str {
// The lifetime implicitly shrinks from 'internal to 'self, which
// prevents 'internal from leaking out.
self.vec[sym.0 as usize]
}
}
Expand Down
73 changes: 53 additions & 20 deletions core/src/term/array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -157,29 +157,50 @@ impl IntoIterator for Array {
// to the inner slice, then we can:
// - Drop the elements outside our inner view,
// - Move out the elements inside out inner view.
// - Drop the rest of the elements when we're dropped
// Otherwise, we clone everything.

unsafe {
let mut inner: Rc<[ManuallyDrop<RichTerm>]> = transmute(self.inner);
let idx = self.start;
let end = self.end;

if let Some(slice) = Rc::get_mut(&mut inner) {
for term in &mut slice[..idx] {
//
// If we start as a shared reference, we could become the only reference
// later, but we clone everything anyways, so we don't end up in an
// invalid in-between state where some terms have been freed manually
// and others haven't.

let inner = if Rc::strong_count(&self.inner) != 1 || Rc::weak_count(&self.inner) != 0 {
IntoIterInner::Shared(self.inner)
} else {
unsafe {
let mut inner =
transmute::<Rc<[RichTerm]>, Rc<[ManuallyDrop<RichTerm>]>>(self.inner);
let slice =
Rc::get_mut(&mut inner).expect("non-unique Rc after checking for uniqueness");
for term in &mut slice[..self.start] {
ManuallyDrop::drop(term)
}
for term in &mut slice[end..] {
for term in &mut slice[self.end..] {
ManuallyDrop::drop(term)
}
}

IntoIter { inner, idx, end }
IntoIterInner::Owned(inner)
}
};
IntoIter {
inner,
idx: self.start,
end: self.end,
}
}
}

enum IntoIterInner {
Shared(Rc<[RichTerm]>),
// This shouldn't really be an Rc. It should only ever have one reference.
// There may be a way to rewrite this once unsized locals are stabilized.
// But for now, there is no good way to repackage the inner array, so we
// keep it in an Rc.
Owned(Rc<[ManuallyDrop<RichTerm>]>),
}
pub struct IntoIter {
inner: Rc<[ManuallyDrop<RichTerm>]>,
inner: IntoIterInner,
idx: usize,
end: usize,
}
Expand All @@ -191,14 +212,14 @@ impl Iterator for IntoIter {
if self.idx == self.end {
None
} else {
let term = match Rc::get_mut(&mut self.inner) {
None => self.inner[..self.end]
.get(self.idx)
.cloned()
.map(ManuallyDrop::into_inner),
Some(slice) => slice[..self.end]
.get_mut(self.idx)
.map(|t| unsafe { ManuallyDrop::take(t) }),
let term = match &mut self.inner {
IntoIterInner::Shared(inner) => inner.get(self.idx).cloned(),
IntoIterInner::Owned(inner) => unsafe {
Rc::get_mut(inner)
.expect("non-unique Rc after checking for uniqueness")
.get_mut(self.idx)
.map(|t| ManuallyDrop::take(t))
},
};

self.idx += 1;
Expand All @@ -207,6 +228,18 @@ impl Iterator for IntoIter {
}
}

impl Drop for IntoIter {
fn drop(&mut self) {
// drop the rest of the items we haven't iterated through
if let IntoIterInner::Owned(inner) = &mut self.inner {
let inner = Rc::get_mut(inner).expect("non-unique Rc after checking for uniqueness");
for term in &mut inner[self.idx..self.end] {
unsafe { ManuallyDrop::drop(term) }
}
}
}
}

impl ExactSizeIterator for IntoIter {
fn len(&self) -> usize {
self.end - self.idx
Expand Down
Loading