Skip to content

Commit

Permalink
Merge pull request #142 from Alexhuszagh/optunsafe
Browse files Browse the repository at this point in the history
This ensures any ambiguity from the method names would not cause potential issues among maintainers. There's 2 abstractions that contain 193/250 of the unsafe expressions and everything is clearly documented and the safety requirements are clearly documented and effectively every call of these unsafe methods is made within the trait/struct.

Closes #100
Closes #138
  • Loading branch information
Alexhuszagh authored Sep 15, 2024
2 parents 4313694 + 559a1cf commit e3f01cb
Show file tree
Hide file tree
Showing 12 changed files with 55 additions and 196 deletions.
6 changes: 2 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -142,7 +142,7 @@ Lexical is highly customizable, and contains numerous other optional features:
- **f16**:   Add support for numeric conversions to-and-from 16-bit floats.
<blockquote>Adds <code>f16</code>, a half-precision IEEE-754 floating-point type, and <code>bf16</code>, the Brain Float 16 type, and numeric conversions to-and-from these floats. Note that since these are storage formats, and therefore do not have native arithmetic operations, all conversions are done using an intermediate <code>f32</code>.</blockquote>

To ensure the safety when bounds checking is disabled, we extensively fuzz the all numeric conversion routines. See the [Safety](#safety) section below for more information.
To ensure memory safety, we extensively fuzz the all numeric conversion routines. See the [Safety](#safety) section below for more information.

Lexical also places a heavy focus on code bloat: with algorithms both optimized for performance and size. By default, this focuses on performance, however, using the `compact` feature, you can also opt-in to reduced code size at the cost of performance. The compact algorithms minimize the use of pre-computed tables and other optimizations at a major cost to performance.

Expand Down Expand Up @@ -304,9 +304,7 @@ A benchmark on writing floats generated via a random-number generator and parsed

## Safety

Due to the use of memory unsafe code in the integer and float writers, we extensively fuzz our float writers and parsers. The fuzz harnesses may be found under [fuzz](https://github.com/Alexhuszagh/rust-lexical/tree/main/fuzz), and are run continuously. So far, we've parsed and written over 72 billion floats.

Due to the simple logic of the integer writers, and the lack of memory safety in the integer parsers, we minimally fuzz both, and test it with edge-cases, which has shown no memory safety issues to date.
Due to the use of memory unsafe code in the library, we extensively fuzz our float writers and parsers. The fuzz harnesses may be found under [fuzz](https://github.com/Alexhuszagh/rust-lexical/tree/main/fuzz), and are run continuously. So far, we've parsed and written over 72 billion floats.

## Platform Support

Expand Down
16 changes: 5 additions & 11 deletions lexical-parse-float/src/bigint.rs
Original file line number Diff line number Diff line change
Expand Up @@ -586,8 +586,7 @@ impl<const SIZE: usize> StackVec<SIZE> {
pub fn from_u16(x: u16) -> Self {
let mut vec = Self::new();
assert!(1 <= vec.capacity());
// SAFETY: safe since we can always add 1 item.
unsafe { vec.push_unchecked(x as Limb) };
_ = vec.try_push(x as Limb);
vec.normalize();
vec
}
Expand All @@ -598,8 +597,7 @@ impl<const SIZE: usize> StackVec<SIZE> {
let mut vec = Self::new();
debug_assert!(1 <= vec.capacity());
assert!(1 <= SIZE);
// SAFETY: safe since we can always add 1 item (validated in the asset).
unsafe { vec.push_unchecked(x as Limb) };
_ = vec.try_push(x as Limb);
vec.normalize();
vec
}
Expand All @@ -611,14 +609,10 @@ impl<const SIZE: usize> StackVec<SIZE> {
debug_assert!(2 <= vec.capacity());
assert!(2 <= SIZE);
if LIMB_BITS == 32 {
// SAFETY: safe since we can always add 2 items (validated in the asset).
unsafe {
vec.push_unchecked(x as Limb);
vec.push_unchecked((x >> 32) as Limb);
}
_ = vec.try_push(x as Limb);
_ = vec.try_push((x >> 32) as Limb);
} else {
// SAFETY: safe since we can always add 1 item.
unsafe { vec.push_unchecked(x as Limb) };
_ = vec.try_push(x as Limb);
}
vec.normalize();
vec
Expand Down
18 changes: 8 additions & 10 deletions lexical-parse-float/src/parse.rs
Original file line number Diff line number Diff line change
Expand Up @@ -485,7 +485,7 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
let decimal_point = options.decimal_point();
let exponent_character = options.exponent();
debug_assert!(format.is_valid());
debug_assert!(!byte.is_done());
debug_assert!(!byte.is_buffer_empty());
let bits_per_digit = shared::log2(format.mantissa_radix()) as i64;
let bits_per_base = shared::log2(format.exponent_base()) as i64;

Expand All @@ -503,7 +503,7 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
// We must have a format like `0x`, `0d`, `0o`. Note:
is_prefix = true;
if iter.read_if_value(base_prefix, format.case_sensitive_base_prefix()).is_some()
&& iter.is_done()
&& iter.is_buffer_empty()
{
return Err(Error::Empty(iter.cursor()));
}
Expand Down Expand Up @@ -553,7 +553,6 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
let mut implicit_exponent: i64;
let int_end = n_digits as i64;
let mut fraction_digits = None;
// TODO: Change this to something different from read_if_value but same idea
if byte.first_is_cased(decimal_point) {
// SAFETY: byte cannot be empty due to first_is
unsafe { byte.step_unchecked() };
Expand Down Expand Up @@ -596,23 +595,23 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
let is_exponent = byte
.first_is(exponent_character, format.case_sensitive_exponent() && cfg!(feature = "format"));
if is_exponent {
// SAFETY: byte cannot be empty due to `first_is` from `is_exponent`.`
unsafe { byte.step_unchecked() };

// Check float format syntax checks.
#[cfg(feature = "format")]
{
// NOTE: We've overstepped for the safety invariant before.
if format.no_exponent_notation() {
return Err(Error::InvalidExponent(byte.cursor()));
return Err(Error::InvalidExponent(byte.cursor() - 1));
}
// Check if we have no fraction but we required exponent notation.
if format.no_exponent_without_fraction() && fraction_digits.is_none() {
return Err(Error::ExponentWithoutFraction(byte.cursor()));
return Err(Error::ExponentWithoutFraction(byte.cursor() - 1));
}
}

// SAFETY: byte cannot be empty due to `first_is` from `is_exponent`.`
// TODO: Fix: we need a read_if for bytes themselves?
unsafe { byte.step_unchecked() };
let is_negative = parse_exponent_sign(&mut byte)?;

let before = byte.current_count();
parse_digits::<_, _, FORMAT>(byte.exponent_iter(), |digit| {
if explicit_exponent < 0x10000000 {
Expand Down Expand Up @@ -679,7 +678,6 @@ pub fn parse_partial_number<'a, const FORMAT: u128>(
n_digits = n_digits.saturating_sub(1);
}
if zeros.first_is_cased(decimal_point) {
// TODO: Fix with some read_if like logic
// SAFETY: safe since zeros cannot be empty due to first_is
unsafe { zeros.step_unchecked() };
}
Expand Down
2 changes: 1 addition & 1 deletion lexical-parse-float/src/slow.rs
Original file line number Diff line number Diff line change
Expand Up @@ -685,7 +685,7 @@ pub fn compare_bytes<const FORMAT: u128>(
let mut integer = number.integer.bytes::<{ FORMAT }>();
let mut integer_iter = integer.integer_iter();
integer_iter.skip_zeros();
if integer_iter.is_done() {
if integer_iter.is_buffer_empty() {
// Cannot be empty, since we must have at least **some** significant digits.
let mut fraction = number.fraction.unwrap().bytes::<{ FORMAT }>();
let mut fraction_iter = fraction.fraction_iter();
Expand Down
14 changes: 6 additions & 8 deletions lexical-parse-integer/src/algorithm.rs
Original file line number Diff line number Diff line change
Expand Up @@ -116,19 +116,17 @@ macro_rules! fmt_invalid_digit {
// NOTE: If we're using the `take_n` optimization where it can't
// be the end, then the iterator cannot be done. So, in that case,
// we need to end.
if is_suffix && $is_end && $iter.is_done() {
if is_suffix && $is_end && $iter.is_buffer_empty() {
// Break out of the loop, we've finished parsing.
break;
} else if is_suffix {
} else if !$iter.is_buffer_empty() {
// Haven't finished parsing, so we're going to call
// invalid_digit!. Need to ensure we include the
// base suffix in that.

// TODO: This isn't localized and needs to be fixed

// SAFETY: safe since the iterator is not empty, as checked
// in `$iter.is_done()` and `$is_end`. If `$is_end` is false,
// then we have more elements in our
// in `$iter.is_buffer_empty()`. Adding in the check hopefully
// will be elided since it's a known constant.
unsafe { $iter.step_unchecked() };
}
}
Expand Down Expand Up @@ -580,7 +578,7 @@ macro_rules! algorithm {

let is_negative = parse_sign::<T, FORMAT>(&mut byte)?;
let mut iter = byte.integer_iter();
if iter.is_done() {
if iter.is_buffer_empty() {
return into_error!(Empty, iter.cursor());
}

Expand All @@ -603,7 +601,7 @@ macro_rules! algorithm {
// We must have a format like `0x`, `0d`, `0o`. Note:
if iter.read_if_value(base_prefix, format.case_sensitive_base_prefix()).is_some() {
is_prefix = true;
if iter.is_done() {
if iter.is_buffer_empty() {
return into_error!(Empty, iter.cursor());
} else {
start_index += 1;
Expand Down
73 changes: 18 additions & 55 deletions lexical-util/src/iterator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,16 +20,8 @@ pub use crate::skip::{AsBytes, Bytes};
/// methods for reading data and accessing underlying data. The readers
/// can either be contiguous or non-contiguous, although performance and
/// some API methods may not be available for both.
///
/// # Safety
///
/// // TODO: FIX CORRECTNESS DOCUMENTATION
/// This trait is effectively safe but the implementor must guarantee that
/// `is_empty` is implemented correctly. For most implementations, this can
/// be `self.as_slice().is_empty()`, where `as_slice` is implemented as
/// `&self.bytes[self.index..]`.
#[cfg(feature = "parse")]
pub unsafe trait Iter<'a> {
pub trait Iter<'a> {
/// Determine if the buffer is contiguous in memory.
const IS_CONTIGUOUS: bool;

Expand Down Expand Up @@ -59,6 +51,17 @@ pub unsafe trait Iter<'a> {
self.get_buffer().len()
}

/// Get if no bytes are available in the buffer.
///
/// This operators on the underlying buffer: that is,
/// it returns if [as_slice] would return an empty slice.
///
/// [as_slice]: Iter::as_slice
#[inline(always)]
fn is_buffer_empty(&self) -> bool {
self.cursor() >= self.get_buffer().len()
}

/// Get the current index of the iterator in the slice.
fn cursor(&self) -> usize;

Expand Down Expand Up @@ -86,42 +89,22 @@ pub unsafe trait Iter<'a> {
/// non-contiguous iterators this can be smaller.
fn current_count(&self) -> usize;

// TODO: DOCUMENT
// PROPERTIES

// TODO: Fix this naming convention
/// Get if no bytes are available in the buffer.
///
/// This operators on the underlying buffer: that is,
/// it returns if [as_slice] would return an empty slice.
///
/// [as_slice]: Iter::as_slice
#[inline(always)]
fn is_empty(&self) -> bool {
self.as_slice().is_empty()
}

/// Determine if the buffer is contiguous.
#[inline(always)]
fn is_contiguous(&self) -> bool {
Self::IS_CONTIGUOUS
}

// TODO: Ensure we get this **RIGHT**

/// Get the next value available without consuming it.
///
/// This does **NOT** skip digits, and directly fetches the item
/// from the underlying buffer.
///
/// # Safety
///
/// An implementor must implement `is_empty` correctly in
/// order to guarantee the traitt is safe: `is_empty` **MUST**
/// ensure that one value remains, if the iterator is non-
/// contiguous this means advancing the iterator to the next
/// position.
fn first(&self) -> Option<&'a u8>;
#[inline(always)]
fn first(&self) -> Option<&'a u8> {
self.get_buffer().get(self.cursor())
}

/// Check if the next element is a given value.
#[inline(always)]
Expand Down Expand Up @@ -249,14 +232,6 @@ pub unsafe trait Iter<'a> {
/// A default implementation is provided for slice iterators.
/// This trait **should never** return `null` from `as_ptr`, or be
/// implemented for non-contiguous data.
///
/// # Safety
///
/// TODO: Fix the safety documentation here...
/// The safe methods are sound as long as the caller ensures that
/// the methods for `read_32`, `read_64`, etc. check the bounds
/// of the underlying contiguous buffer and is only called on
/// contiguous buffers.
pub trait DigitsIter<'a>: Iterator<Item = &'a u8> + Iter<'a> {
// TODO: Fix the documentation
/// Get if the iterator cannot return any more elements.
Expand All @@ -278,28 +253,16 @@ pub trait DigitsIter<'a>: Iterator<Item = &'a u8> + Iter<'a> {
/// ensure [peek] is always safe.
///
/// If you would like to see if the cursor is at the end of the buffer,
/// see [is_done] or [is_empty] instead.
/// see [is_buffer_empty] instead.
///
/// [is_done]: DigitsIter::is_done
/// [is_empty]: Iter::is_empty
/// [is_buffer_empty]: Iter::is_buffer_empty
/// [peek]: DigitsIter::peek
#[inline(always)]
#[allow(clippy::wrong_self_convention)]
fn is_consumed(&mut self) -> bool {
self.peek().is_none()
}

/// Get if the buffer underlying the iterator is empty.
///
/// This might not be the same thing as [is_consumed]: [is_consumed]
/// checks if any more elements may be returned, which may require
/// peeking the next value. Consumed merely checks if the
/// iterator has an empty slice. It is effectively a cheaper,
/// but weaker variant of [is_consumed].
///
/// [is_consumed]: DigitsIter::is_consumed
fn is_done(&self) -> bool;

/// Peek the next value of the iterator, without consuming it.
///
/// Note that this can modify the internal state, by skipping digits
Expand Down
6 changes: 3 additions & 3 deletions lexical-util/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@
//! correctly.
//!
//! To see if the cursor is at the end of the buffer, use
//! [DigitsIter::is_done][is_done].
//! [is_buffer_empty][crate::iterator::Iter::is_buffer_empty].
//!
//! Any iterators must be peekable: you must be able to read and return the next
//! value without advancing the iterator past that point. For iterators that
Expand All @@ -66,7 +66,7 @@
//! like:
//!
//! ```rust,ignore
//! unsafe impl<_> DigitsIter<_> for MyIter {
//! impl<_> DigitsIter<_> for MyIter {
//! fn peek(&mut self) -> Option<u8> {
//! loop {
//! let value = self.bytes.get(self.index)?;
Expand Down Expand Up @@ -95,7 +95,7 @@
//! }
//! ```
//!
//! [is_done]: iterator::DigitsIter::is_done
//! [is_buffer_empty]: iterator::Iter::is_buffer_empty
//! [is_consumed]: iterator::DigitsIter::is_consumed
//! [peek]: iterator::DigitsIter::peek
Expand Down
Loading

0 comments on commit e3f01cb

Please sign in to comment.