Excessive stack usage during initialization #1149

Palkovsky · 2024-01-11T13:19:51Z

Palkovsky
Jan 11, 2024

Hello,

My use-case is currently quite niche, because I'm running the regex crate in Windows kernel space. One of the problems I've encountered is excessive stack usage during initialization (the call to Regex::new or RegexSet::new).

Specifically, the new function in regex-automata/meta/strategy.rs uses ~10kB of stack by itself (or ~20kB with perf-dfa-full enabled). If the regex initialization happens somewhere deeper in the call stack, the total usage becomes even greater. -Z emit-stack-sizes can be used to dump the stack usage.

Luckily, the WDK offers KeExpandKernelStackAndCalloutEx, so I was able to initialize the engine anyway using the extended stack. But I'd rather use this solution only as a guardrail, since it has its limitations: it does not permit allocating more than ~64kB of stack. The matching part doesn't seem to exhibit similar stack consumption, so the problem is specific to initialization.
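For readers outside the kernel, a rough user-space analogue of this guardrail is to run the initializer on a thread with an explicitly sized stack. The sketch below is not the WDK API and does not use the regex crate: `stack_hungry_init` and `with_stack` are hypothetical names, and the 16 KiB buffer and 256 KiB stack are arbitrary numbers chosen for illustration.

```rust
use std::thread;

// Hypothetical stand-in for a stack-hungry initializer such as `Regex::new`.
// The 16 KiB buffer is an arbitrary illustration, not a measured figure.
#[inline(never)]
fn stack_hungry_init() -> usize {
    let buf = [1u8; 16 * 1024];
    // black_box keeps the optimizer from eliminating the buffer.
    std::hint::black_box(&buf).iter().map(|&b| b as usize).sum()
}

/// Runs `f` on a freshly spawned thread with `stack_bytes` of stack
/// and returns its result.
fn with_stack<T, F>(stack_bytes: usize, f: F) -> std::io::Result<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let handle = thread::Builder::new().stack_size(stack_bytes).spawn(f)?;
    // A panic in `f` surfaces here; a real stack overflow aborts the process instead.
    Ok(handle.join().expect("initializer thread panicked"))
}

fn main() {
    let n = with_stack(256 * 1024, stack_hungry_init).expect("failed to spawn thread");
    println!("initialized, checksum = {n}");
}
```

As with the kernel callout, this only moves the problem: the requested stack still has to be large enough for the deepest call chain during initialization.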

In theory, regex-lite should consume less stack, since there's much less to initialize. On the other hand, it is much slower (I believe it uses only the PikeVM?) and possibly too slow for the kernel space. Also it lacks RegexSet support, which offers great performance benefits when there's a large number of regexes.

I created a simple program simulating this issue in the Linux user space (using rlimit): https://github.com/Palkovsky/regex-stack-size/blob/master/src/main.rs:

default_features = false, features = ["perf", "unicode"]

// 20kB stack -> Overflow
damacek@ubuntu:~/code/regex-stack-size$ cargo run --release 20000 test '.*t'
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

// 30kB stack -> Overflow
damacek@ubuntu:~/code/regex-stack-size$ cargo run --release 30000 test '.*t'
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

// 40kB stack -> Okay
damacek@ubuntu:~/code/regex-stack-size$ cargo run --release 40000 test '.*t'
The haystack matches the regex pattern.

default_features = false, features = ["perf", "perf-dfa-full", "unicode"]

// 40kB stack -> Overflow
damacek@ubuntu:~/code/regex-stack-size$ cargo run --release 40000 test '.*t'
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

// 50kB stack -> Okay
damacek@ubuntu:~/code/regex-stack-size$ cargo run --release 50000 test '.*t'
The haystack matches the regex pattern.

Is there currently an intention to make the initialization less stack-intensive?

BurntSushi · 2024-01-11T13:24:59Z

BurntSushi
Jan 11, 2024
Maintainer

Making initialization stack-intensive was not intentional. And to be honest, I'm not even aware of what specifically is causing it. It could be an easy fix, but it will require investigation. I'm not sure when I'll get to it. But if someone can identify specific spots in the code that are using a lot of stack, that will likely make it quicker on my end to fix it. (That's assuming there are some specific spots using a lot of stack. If instead the stack usage is diffuse, then this might be harder to fix.)

0 replies

Shaddy · 2024-01-12T12:48:48Z

Shaddy
Jan 12, 2024

Core created here

regex/regex-automata/src/meta/strategy.rs

Line 153 in 027eebd

let mut core = Core::new(info.clone(), pre.clone(), hirs)?;

#[derive(Debug)]
struct Core {
    info: RegexInfo,
    pre: Option<Prefilter>,
    nfa: NFA,
    nfarev: Option<NFA>,
    pikevm: wrappers::PikeVM,
    backtrack: wrappers::BoundedBacktracker,
    onepass: wrappers::OnePass,
    hybrid: wrappers::Hybrid,
    dfa: wrappers::DFA,
}

It appears to reserve a significant amount of stack when compiled, especially for wrappers::DFA::new. I'm not sure how to fix it easily; perhaps boxing would help, but I'm not that familiar with the code.

6 replies

Palkovsky Jan 12, 2024
Author

I'm afraid that boxing only Core itself will only help a little. The functions called in Core::new are consuming a lot of stack as well.

I managed to solve this issue by maniacally moving parts of the initialization onto the heap, but there are over 100 LoC changed. This is probably non-mergeable, but might be some kind of starting point: https://github.com/rust-lang/regex/compare/master...Palkovsky:regex:reduce-stack-usage?expand=1

Palkovsky Jan 12, 2024
Author

Personally, I'd start from ensuring that types such as pub(crate) struct ByteSet([bool; 256]); or pub struct ByteClasses([u8; 256]); are never kept on the stack.
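To put numbers on that suggestion, here is a minimal sketch that measures such types. The two tuple structs are local stand-ins with the same shapes as the ones quoted above, not the regex-automata types themselves, and `Inline` and `Boxed` are hypothetical aggregates invented for the comparison.

```rust
use std::mem::size_of;

// Local stand-ins mirroring the quoted shapes; not the real crate types.
#[allow(dead_code)]
struct ByteSet([bool; 256]);
#[allow(dead_code)]
struct ByteClasses([u8; 256]);

// Hypothetical aggregate that embeds both tables by value.
#[allow(dead_code)]
struct Inline {
    set: ByteSet,
    classes: ByteClasses,
}

// Same fields, but each table lives behind a heap pointer.
#[allow(dead_code)]
struct Boxed {
    set: Box<ByteSet>,
    classes: Box<ByteClasses>,
}

fn main() {
    println!("ByteSet:     {} bytes", size_of::<ByteSet>());
    println!("ByteClasses: {} bytes", size_of::<ByteClasses>());
    println!("Inline:      {} bytes", size_of::<Inline>());
    println!("Boxed:       {} bytes", size_of::<Boxed>());
}
```

Each table is 256 bytes, so every by-value copy that lands in a stack frame costs at least that much, while the boxed variant is two pointers wide. Note that Box needs an allocator, so this trade is only available where alloc is.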

BurntSushi Jan 12, 2024
Maintainer

It's not quite as simple as that, because this library supports core-only (so no allocation). And indeed, ByteClasses are used in that context. And zero-copy deserialization needs to be supported.

Let's try a different tack here. Can you share with me the process you're using (which commands you're running) to determine where a lot of stack is being used?

Palkovsky Jan 12, 2024
Author

Sure, I'll prepare a "test harness" that gives quicker feedback about stack usage. I'm gonna use https://github.com/japaric/stack-sizes for that. Will come back to you when it's ready.

Palkovsky Jan 12, 2024
Author

I've prepared a test harness to observe stack consumption: https://github.com/Palkovsky/regex-stack-size/tree/master. The exact steps to obtain stack info can be found in xtask/main.rs; you can just run cargo xtask to get the stack analysis.

Pre-requisites:

It works only on Linux, since -Z emit-stack-sizes works only for ELF targets
It requires a nightly compiler installed

Notes:

The resulting functions are sorted by the stack usage
The compiler performs a lot of implicit inlining. This will not display inlined functions, unless #[inline(never)] is specified.
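The inlining note can be illustrated with a small sketch. `build_table` is a hypothetical function and its 4 KiB buffer is arbitrary; the point is only that the attribute keeps the function as a separate symbol, so a per-function stack report can attribute the frame to it rather than to its caller. The frame size actually reported will depend on compiler version and optimization level.

```rust
// Hypothetical helper; the name and buffer size are arbitrary.
// Without #[inline(never)] the optimizer may fold this frame into the caller,
// and a per-function stack report would then show the caller instead.
#[inline(never)]
fn build_table() -> u32 {
    let table = [7u8; 4096];
    // black_box keeps the buffer from being optimized away.
    std::hint::black_box(&table).iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    println!("{}", build_table());
}
```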
