Implement AMBIG_MULTIPLE for llvm, Rust, go, and the vmops dump #486

Merged (22 commits) on Aug 16, 2024

Commits on Jul 27, 2024

  1. Move out retlist construction.

    katef committed Jul 27, 2024
    366fce6

Commits on Jul 28, 2024

  1. Construct retlist ahead of time for all vm-based formats.

    Now the VM IR ops point to retlist entries, rather than carrying their own endid sets. This means we can use the same indexing to de-dup them for every VM-based format.
    katef committed Jul 28, 2024
    684433c
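
The de-duplication described in this commit can be illustrated with a minimal sketch in Rust. libfsm's own implementation is in C and is not shown on this page; the `RetList` type and its `intern`, `get` and `len` methods are hypothetical names. The sketch assumes only that each distinct set of end ids is stored once and referred to by index.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a retlist: each distinct set of end ids is stored once.
#[derive(Default)]
pub struct RetList {
    entries: Vec<Vec<u32>>,
    index: HashMap<Vec<u32>, usize>,
}

impl RetList {
    /// Returns the index of the entry for `ids`, adding a new entry if needed.
    /// The ids are sorted and de-duplicated first so that equal sets compare equal.
    pub fn intern(&mut self, ids: &[u32]) -> usize {
        let mut key = ids.to_vec();
        key.sort_unstable();
        key.dedup();
        if let Some(&i) = self.index.get(&key) {
            return i;
        }
        let i = self.entries.len();
        self.entries.push(key.clone());
        self.index.insert(key, i);
        i
    }

    /// Looks up the id set stored at index `i`.
    pub fn get(&self, i: usize) -> &[u32] {
        &self.entries[i]
    }

    /// Number of distinct id sets stored so far.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Interning `[1]`, `[2, 1]` and `[1, 2]` in that order would give indices 0, 1 and 1, so ops that accept with the same id set share one entry.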

Commits on Jul 29, 2024

  1. Add AMBIG_MULTIPLE output for go.

    The generated code looks like this:
    ```go
    package fsm_fsm
    
    var ret0 []uint = []uint{1}
    var ret1 []uint = []uint{2}
    var ret2 []uint = []uint{1, 2}
    var ret3 []uint = []uint{0, 1, 2}
    
    func fsm_Match(data string) (bool, []uint) {
    	var idx = ^uint(0)
    
    	if idx++; idx >= uint(len(data)) {
    		return true, ret0
    	}
    
    	if data[idx] == 'a' {
    		goto l3
    	}
    
    	if data[idx] == 'b' {
    		goto l2
    	}
    
    	if data[idx] != 'c' {
    		return false, nil
    	}
    
    l0: // e.g. "c"
    	if idx++; idx >= uint(len(data)) {
    		return true, ret2
    	}
    
    	if data[idx] <= '`' {
    		return false, nil
    	}
    
    	if data[idx] <= 'b' {
    		goto l1
    	}
    
    	if data[idx] == 'c' {
    		goto l0
    	}
    
    	{
    		return false, nil
    	}
    
    l1: // e.g. "aa"
    	if idx++; idx >= uint(len(data)) {
    		return true, ret1
    	}
    
    	if data[idx] <= '`' {
    		return false, nil
    	}
    
    	if data[idx] <= 'c' {
    		goto l1
    	}
    
    	{
    		return false, nil
    	}
    
    l2: // e.g. "b"
    	if idx++; idx >= uint(len(data)) {
    		return true, ret3
    	}
    
    	if data[idx] == 'a' {
    		goto l1
    	}
    
    	if data[idx] == 'b' {
    		goto l2
    	}
    
    	if data[idx] == 'c' {
    		goto l1
    	}
    
    	{
    		return false, nil
    	}
    
    l3: // e.g. "a"
    	if idx++; idx >= uint(len(data)) {
    		return true, ret1
    	}
    
    	if data[idx] == 'a' {
    		goto l1
    	}
    
    	if data[idx] == 'b' {
    		goto l2
    	}
    
    	if data[idx] == 'c' {
    		goto l1
    	}
    
    	{
    		return false, nil
    	}
    
    }
    ```
    katef committed Jul 29, 2024
    5092212

Commits on Jul 31, 2024

  1. 7cdb00b
  2. Whitespace.

    katef committed Jul 31, 2024
    7e970d7

Commits on Aug 2, 2024

  1. f2bae91
  2. ac4a66a
  3. Support for endids for the vmops output.

    I've handled AMBIG_NONE here, but I haven't distinguished between the other ambig modes: they are all presented as an array of ids, even when there is just a single element. That's because I don't see any reason to give these specialised APIs for the current use-cases for this generated code, which is supposed to be a direct representation of our VM opcodes.
    katef committed Aug 2, 2024
    e484294

Commits on Aug 6, 2024

  1. First cut at AMBIG_MULTIPLE for llvm.

    It's a bit rough, this isn't what I want to end up with, but I wanted to
    commit this as a waypoint.
    katef committed Aug 6, 2024
    2698127
  2. Rework stop: to index into an array for return values.

    This means the phi instruction now only carries an array index.
    
    Many thanks to @mcy for advice and patient help here.
    katef committed Aug 6, 2024
    7426501
  3. Don't use undef for values the caller is expected to not access, use `poison`.
    
    Thanks to @mcy for this.
    katef committed Aug 6, 2024
    f69dbfc

Commits on Aug 7, 2024

  1. Split overriding comments to a separate hook.

    This allows callers to default the codegen for accepting states (in particular outputting the values for endids) independently of commenting caller-specific meanings for the IDs.
    katef committed Aug 7, 2024
    292fd1c
  2. No need to emit a bitmap here.

    Originally I'd intended this as a demonstration of how various applications can handle ambiguities differently. But now that we have library support for AMBIG_MULTIPLE, I think this is just confusing.
    katef committed Aug 7, 2024
    bb4e81d
  3. Use id count to indicate match failure, and >= 0 means success.

    The idea here is just to trim down %rt to:
    ```
    %rt = type { ptr, i64 }
    ```
    
    where we clearly don't need the full range of a uint64_t for the count of unique ids.
    
    I've purposefully not done the same for single-id interfaces. I don't want to mix success/failure with an id *value*, because the values are opaque (i.e. the meaning of an id value is the responsibility of the caller). Whereas here for AMBIG_MULTIPLE I'm mixing success/failure with the count, not with the id values.
    
    Suggested by @mcy, thank you.
    katef committed Aug 7, 2024
    6c17934
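
A caller-side view of the `{ ptr, i64 }` convention from this commit might look like the following Rust sketch. This is an illustration, not the generated LLVM code: the `Rt` field names, the `u32` element type, and the `decode` helper are all assumptions. The only thing taken from the commit message is that a count of 0 or more means success, so a negative count is read as failure.

```rust
/// Mirrors `%rt = type { ptr, i64 }` from the commit message.
/// Field names and the u32 element type are assumptions for illustration.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Rt {
    pub ids: *const u32,
    pub count: i64,
}

/// Interprets a result: a negative count means no match,
/// and a count >= 0 means a match carrying that many ids.
///
/// # Safety
/// When `count > 0`, `ids` must point to `count` valid u32 values
/// that stay alive for the rest of the program.
pub unsafe fn decode(rt: Rt) -> Option<&'static [u32]> {
    if rt.count < 0 {
        return None;
    }
    if rt.count == 0 {
        // Avoid building a slice from a pointer that may be null.
        return Some(&[]);
    }
    Some(unsafe { std::slice::from_raw_parts(rt.ids, rt.count as usize) })
}
```

Because success and failure are folded into the count rather than into an id, the id values themselves stay opaque to this helper, which matches the reasoning given in the commit message.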
  4. Clarification.

    katef committed Aug 7, 2024
    c44bfd2
  5. Add AMBIG_MULTIPLE output for Rust.

    The generated code looks like this:
    ```rust
    pub fn fsm_main(input: &str) -> Option<&'static [u32]> {
        use Label::*;
        static RET0: [u32; 1] = [1];
        static RET1: [u32; 1] = [2];
        static RET2: [u32; 3] = [0, 1, 2];
    
        let mut bytes = input.bytes();
    
        pub enum Label {
            Ls, L0,
        }
    
        let mut l = Ls;
    
        loop {
            match l {
                Ls => { // e.g. ""
                    let c = match bytes.next() {
                        None => return Some(&RET0) /* "x?" */,
                        Some(c) => c,
                    };
                    if c != b'x' { return None }
                    let c = match bytes.next() {
                        None => return Some(&RET2) /* "x", "x?", "x+" */,
                        Some(c) => c,
                    };
                    if c != b'x' { return None }
                    l = L0; continue;
                }
    
                L0 => { // e.g. "xx"
                    let c = match bytes.next() {
                        None => return Some(&RET1) /* "x+" */,
                        Some(c) => c,
                    };
                    if c != b'x' { return None }
                    l = L0; continue;
                }
            }
        }
    }
    ```
    katef committed Aug 7, 2024
    ba70069
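
One way a caller might consume the `Option<&'static [u32]>` returned by the generated `fsm_main` is sketched below. The `PATTERNS` table and the `pattern_names` helper are hypothetical; the mapping of ids 0, 1 and 2 to "x", "x?" and "x+" is inferred from the comments in the generated code above, and in a real program the meaning of each id is up to the caller.

```rust
/// Caller-side table giving a meaning to each end id.
/// The ordering is inferred from the comments in the generated code.
const PATTERNS: [&str; 3] = ["x", "x?", "x+"];

/// Translates a matcher result into pattern names.
/// `None` (no match) is passed through unchanged.
/// Ids without a table entry are skipped rather than causing a panic.
pub fn pattern_names(result: Option<&[u32]>) -> Option<Vec<&'static str>> {
    result.map(|ids| {
        ids.iter()
            .filter_map(|&id| PATTERNS.get(id as usize).copied())
            .collect()
    })
}
```

With the generated function in scope, `pattern_names(fsm_main("x"))` would receive the three-element id set and report all three patterns, which is the point of AMBIG_MULTIPLE: one input can match several patterns at once.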
  6. Factor out print_ret().

    katef committed Aug 7, 2024
    218e90c
  7. 021d169

Commits on Aug 11, 2024

  1. Whoops... wrong exit status.

    katef committed Aug 11, 2024
    719b1b2

Commits on Aug 12, 2024

  1. 690f0e2

Commits on Aug 14, 2024

  1. Separate calling points for the comments hook.

    This gives better control over whitespace and punctuation between the hooks. For example, we can output "<accept>, <comment>\n" with a comma between, and that sits more nicely for single-line comments. Previously these had to be "<accept> <comment>,\n".
    katef committed Aug 14, 2024
    0f4d838
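
The effect of calling the two hooks at separate points can be sketched as follows. This is a Rust illustration under stated assumptions, not libfsm's C hook API: the `Hooks` struct, its `accept` and `comment` fields, and `emit_accept_line` are hypothetical names. The idea shown is that the code generator, rather than either hook, owns the comma and the newline.

```rust
/// Hypothetical hook pair: one renders the accept value,
/// the other renders a caller-specific comment.
pub struct Hooks<'a> {
    pub accept: &'a dyn Fn(&mut String, &[u32]),
    pub comment: Option<&'a dyn Fn(&mut String, &[u32])>,
}

/// Emits one line for an accepting state as "<accept>, <comment>\n".
/// The comma is written between the two hook calls, so the comment
/// comes after the punctuation instead of before it.
pub fn emit_accept_line(hooks: &Hooks, ids: &[u32]) -> String {
    let mut out = String::new();
    (hooks.accept)(&mut out, ids);
    out.push(',');
    if let Some(comment) = hooks.comment {
        out.push(' ');
        comment(&mut out, ids);
    }
    out.push('\n');
    out
}
```

If both hooks were invoked from a single calling point, the generator could only add punctuation after the comment, which gives the older "<accept> <comment>,\n" layout.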

Commits on Aug 16, 2024

  1. Typo.

    Spotted by Scott.
    katef committed Aug 16, 2024
    ab0a411