Creating a DFA for regex with backreferences if the strings are of finite and known length #1247
-
Hello! I understand that under most circumstances, a regex using backreferences cannot be recognized by a DFA since the language isn't regular. However, does this change if the strings recognized by that language are of a predetermined length? My question is inspired by regex crosswords and a brief section on a blog post about solving them. The MIT crossword regex Given this hypothetical DFA, possibly even a Another idea would be to take the DFA, perform a topological sort to find the set of possible characters at a given position in the input string, and perform constraint solving and searching/backtracking to solve the regex crossword. This would also require access to the DFA's states. Is this possible? Sorry for what amounted to a bit of a ramble. The |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 2 replies
-
If you can translate the regex with backreferences to an actual regular expressing, then, yes, of course, you could generate a DFA. But if you're talking about a regex with over a million different branches, then good luck. It probably won't go over well. And note that this crate has a limit on the total number of states in a DFA that it can generate (based on how many states can be represented in ~32 bits). That limit might seem constraining, but in practice, a DFA that big is going to take a very very long time to generate and generally be completely unwieldy to actually use.
Well, what would the API you want actually look like? I claim that it would basically amount to what's on the |
Beta Was this translation helpful? Give feedback.
-
If you're just doing this "for fun," then one very quick way to do this would be to just fork |
Beta Was this translation helpful? Give feedback.
-
Incidentally, I just wrote a code generator that converts a DFA to Rust code here: https://github.com/BurntSushi/duration-unit-lookup/blob/master/gendfa/main.rs It uses It is not a "full" example in the sense that the code generator is correct for all possible DFAs. But it should be a good starting point depending on what you're doing. |
Beta Was this translation helpful? Give feedback.
If you can translate the regex with backreferences to an actual regular expressing, then, yes, of course, you could generate a DFA.
But if you're talking about a regex with over a million different branches, then good luck. It probably won't go over well. And note that this crate has a limit on the total number of states in a DFA that it can generate (based on how many states can be represented in ~32 bits). That limit might seem constraining, but in practice, a DFA that big is going to take a very very long time to generate and generally be completely unwieldy to actually use.