How to ensure regexes in a RegexSet are mutually exclusive?
#837
Thanks for the great work. Is it possible to somehow ensure that all regexes in a … Thanks again.
Replies: 2 comments 5 replies
It's possible in theory, because you can take any two automata and intersect them. If the resulting automaton has any reachable match states, then there must be some overlap. Conversely, if there are no reachable match states in the intersection, then the sets of strings matched by the two regexes must be disjoint.

But this crate doesn't (and will never) expose the necessary utilities to do such a thing, because it's not terribly practical to do in the context of a general purpose regex library. Short of that, I don't know of a way to answer such a query in the general case. If you can assume something about the regexes you're using, then there are likely shortcuts, e.g., [a-z]{2} and [a-z…

My guess is that the XY problem may be at work here. It might be more helpful if you describe the higher level problem you're trying to solve.
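A minimal sketch of the intersection check described in the answer, in Rust using only the standard library. The regex crate doesn't expose its automata, so this assumes two small DFAs built by hand; the `Dfa` type and the `lower_repeat` and `overlap_witness` helpers are hypothetical, for illustration only. A breadth-first search over pairs of states (the product automaton) either reaches a pair where both DFAs accept, which yields a shortest string both patterns match, or runs out of reachable pairs, which means the two languages are disjoint.

```rust
use std::collections::{BTreeMap, HashMap, HashSet, VecDeque};

/// Hypothetical hand-built DFA (not a regex crate type). A missing
/// transition means "go to the dead state", which never accepts.
pub struct Dfa {
    pub start: usize,
    pub accepting: Vec<bool>,
    pub trans: Vec<BTreeMap<char, usize>>,
}

impl Dfa {
    /// DFA for the anchored pattern `[a-z]{lo,hi}` (whole-string match).
    /// State i means "i letters consumed so far"; assumes lo <= hi.
    pub fn lower_repeat(lo: usize, hi: usize) -> Dfa {
        let mut trans = Vec::new();
        let mut accepting = Vec::new();
        for i in 0..=hi {
            let mut row = BTreeMap::new();
            if i < hi {
                for c in 'a'..='z' {
                    row.insert(c, i + 1);
                }
            }
            trans.push(row);
            accepting.push(i >= lo);
        }
        Dfa { start: 0, accepting, trans }
    }
}

/// Breadth-first search of the product automaton of `a` and `b`.
/// Returns a shortest string accepted by both (an overlap witness),
/// or `None` if no pair of accepting states is reachable.
pub fn overlap_witness(a: &Dfa, b: &Dfa) -> Option<String> {
    let start = (a.start, b.start);
    // For each reached product state: (previous state, input char).
    let mut parent: HashMap<(usize, usize), ((usize, usize), char)> = HashMap::new();
    let mut seen: HashSet<(usize, usize)> = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(start);
    queue.push_back(start);
    while let Some((p, q)) = queue.pop_front() {
        if a.accepting[p] && b.accepting[q] {
            // Follow parent links back to the start to rebuild the string.
            let mut chars = Vec::new();
            let mut cur = (p, q);
            while cur != start {
                let (prev, c) = parent[&cur];
                chars.push(c);
                cur = prev;
            }
            chars.reverse();
            return Some(chars.into_iter().collect());
        }
        // Only symbols both DFAs can consume lead to live product states.
        for (&c, &p2) in &a.trans[p] {
            if let Some(&q2) = b.trans[q].get(&c) {
                let next = (p2, q2);
                if seen.insert(next) {
                    parent.insert(next, ((p, q), c));
                    queue.push_back(next);
                }
            }
        }
    }
    None
}
```

For example, pairing `lower_repeat(2, 2)` with `lower_repeat(4, 5)` returns `None`, while pairing it with `lower_repeat(1, 3)` returns `Some("aa")`. Checking a whole set this way means one product search per pair of patterns, and a DFA compiled from a real regex can have far more states (exponentially many in the worst case), so treat this as a sketch of the idea rather than a practical tool.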
Thanks @BurntSushi. Let's say we take configuration from the user for routing URLs to files. One capture group in the URL match denotes the folder slug, and another capture group denotes the file path within the matched folder. How can we ensure the configuration will never lead to overlaps or ambiguities? The folder slug may be a sub-domain name, a sub-path within a domain, etc.
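If the configuration format can be restricted instead of accepting arbitrary regexes, the overlap check can become simple. A sketch under that assumption: each route is a literal host (covering the sub-domain case) plus a literal path prefix (covering the sub-path case), and everything after the prefix is the file path. `Route`, `routes_overlap` and `find_conflicts` are hypothetical names, not part of any crate. Under these assumptions, two routes can match the same URL exactly when they share a host and one prefix's path segments are a leading run of the other's.

```rust
/// Hypothetical restricted route: a literal host plus a literal path
/// prefix such as "/docs/api". The rest of the URL path is the file path.
#[derive(Debug, Clone, PartialEq)]
pub struct Route {
    pub host: String,
    pub prefix: String,
}

/// Splits a prefix into non-empty segments, so "/docs/" and "docs" agree.
fn segments(prefix: &str) -> Vec<&str> {
    prefix.split('/').filter(|s| !s.is_empty()).collect()
}

/// Two routes can match a common URL iff they have the same host and
/// the shorter prefix's segments equal the start of the longer one's.
pub fn routes_overlap(a: &Route, b: &Route) -> bool {
    if a.host != b.host {
        return false;
    }
    let (sa, sb) = (segments(&a.prefix), segments(&b.prefix));
    let n = sa.len().min(sb.len());
    sa[..n] == sb[..n]
}

/// Returns the index pairs (i, j), i < j, of every conflicting pair.
pub fn find_conflicts(routes: &[Route]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for i in 0..routes.len() {
        for j in (i + 1)..routes.len() {
            if routes_overlap(&routes[i], &routes[j]) {
                out.push((i, j));
            }
        }
    }
    out
}
```

With routes for example.com at /blog and /blog/2021, the pair is reported, while /blog and /blogs are not, because the comparison is per path segment rather than per character. Hosts are compared as exact strings here; a real router would likely normalize them (for example, lowercasing) first.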