replace regex with custom parsers #113
Hey! Can you share the exact commands you ran to get that information? 🙏 I am a bit surprised the binary size would reduce as |
Hi, and thanks for responding! Yes, I think you are partly right... Though, if I read the dependency tree correctly,

My sample program looks like this (Cargo.toml + main.rs):

Cargo.toml:

    [package]
    name = "rrules-test"
    edition = "2021"

    [profile.release]
    strip = true

    [dependencies]
    rrule = "0.12"

main.rs:

    fn main() {
        let exp = std::env::args().nth(1).unwrap();
        let rrule: rrule::RRuleSet = exp.parse().unwrap();
        println!("{:#?}", rrule);
    }

When I build this program in release mode, the binary is 3979720 bytes before this PR and 2812328 bytes after it.
Fixed a lint:
I missed that, makes sense. I have to admit that I am a bit skeptical of merging this, as the regex parsing has worked well and, IIRC, was copied from a much more widely used rrule implementation in another language. The size reduction is quite impressive though! Do you have a use case where the size reduction would be beneficial? I can imagine that this crate is already too big to be used on any resource-constrained device.
Yes, it's for use on a resource-constrained device. :) Not that 1 MiB is impossible to fit, but it's quite a waste of both precious IoT bandwidth and disk space.
    Some(part) => part.as_str() == "Z",
    None => false,
    // Parse date (YYYYMMDD).
    let year = val[0..4]
This will panic if the character at byte 3 is a multi-byte character.
Such an input would be absolute garbage, but the parser should not crash in such a case and return an error instead.
Given that valid RRULEs are strictly ASCII, you can probably treat this as a raw byte array instead of a string.
This will panic if the character at byte 3 is a multi-byte character.
That's very true, thanks for catching! And I agree that it should be handled gracefully.
And yes, either ensuring that the string is ASCII or using .get(..) instead of [..] would probably solve this. I will add some tests and a fix.
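To illustrate the .get(..) approach mentioned above, here is a minimal sketch. The helper name `parse_year` is hypothetical and is not taken from the PR; it only shows that `str::get` returns `None` where index slicing would panic.

```rust
/// Hypothetical helper (not from the PR): read the YYYY part of a date string.
/// `str::get` returns `None` when the range is out of bounds or does not fall
/// on a UTF-8 character boundary, where `val[0..4]` would panic.
fn parse_year(val: &str) -> Option<i32> {
    val.get(0..4)?.parse().ok()
}

fn main() {
    // Well-formed input parses normally.
    assert_eq!(parse_year("20240131"), Some(2024));
    // Too short: `get` yields `None` instead of panicking.
    assert_eq!(parse_year("202"), None);
    // 'é' occupies bytes 3 and 4, so byte index 4 is not a char boundary.
    assert_eq!(parse_year("202é0131"), None);
}
```

The caller can then map `None` to whatever parse error type the crate uses.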
Given that valid RRULEs are strictly ASCII, you can probably treat this as a raw byte array instead of a string.
Btw, raw byte slices (&[u8]) don't have the convenient integer parsing methods, which is why I didn't convert the input and instead just check that it is all ASCII.
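The "check that it's all ASCII" approach could look roughly like the sketch below. The function `parse_date` and its return type are assumptions made for illustration, not the code in this PR. The idea is that once a string is known to be ASCII, every byte index is a character boundary, so index slicing within the checked length cannot panic.

```rust
/// Hypothetical helper (not from the PR): parse a YYYYMMDD prefix.
/// Returns `None` on malformed input instead of panicking.
fn parse_date(val: &str) -> Option<(i32, u32, u32)> {
    // In an ASCII string every byte is one character, so slicing by byte
    // index is safe as long as the length has been checked.
    if !val.is_ascii() || val.len() < 8 {
        return None;
    }
    // Require digits only, so inputs such as "+123" are rejected as well.
    if !val.as_bytes()[..8].iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let year = val[0..4].parse().ok()?;
    let month = val[4..6].parse().ok()?;
    let day = val[6..8].parse().ok()?;
    Some((year, month, day))
}

fn main() {
    assert_eq!(parse_date("20240131"), Some((2024, 1, 31)));
    // Non-ASCII input is rejected up front.
    assert_eq!(parse_date("2024é131"), None);
    // Too short.
    assert_eq!(parse_date("2024"), None);
}
```

This sketch does not validate ranges (for example a month of 13); that would be a separate step.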
Rebased on main and added a test and a fix for the error @WhyNotHugo pointed out.
Replace use of regex with custom parsers to decrease dependencies and binary size.
This PR removes both regex and lazy_static as direct dependencies, mainly to decrease the binary size. In release mode with a stripped binary (x86_64), a simple RRuleSet parse example shrinks by ~1.1 MiB compared to a build from before this PR.
I haven't measured how it compares performance wise.
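As a rough illustration of what replacing a regex with a hand-written parser can look like, the sketch below splits a `KEY=VALUE;KEY=VALUE` string using only standard-library string methods. The function `split_properties` is hypothetical and is not the parser added in this PR.

```rust
/// Hypothetical helper (not from the PR): split "KEY=VALUE;KEY=VALUE" into
/// pairs without a regex. Returns `None` if any non-empty part lacks '='.
fn split_properties(input: &str) -> Option<Vec<(&str, &str)>> {
    input
        .split(';')
        // Skip empty parts, e.g. from a trailing ';'.
        .filter(|part| !part.is_empty())
        // `split_once` returns `None` when '=' is missing.
        .map(|part| part.split_once('='))
        // Collecting into `Option<Vec<_>>` stops at the first `None`.
        .collect()
}

fn main() {
    assert_eq!(
        split_properties("FREQ=DAILY;COUNT=3"),
        Some(vec![("FREQ", "DAILY"), ("COUNT", "3")])
    );
    // A part without '=' makes the whole parse fail.
    assert_eq!(split_properties("FREQ=DAILY;COUNT"), None);
}
```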