Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove orphan nodes from serialized egraphs #368

Closed
wants to merge 3 commits into from

Conversation

oflatt
Copy link
Member

@oflatt oflatt commented Apr 23, 2024

When a maxmim number of serialized enodes per function is set, or when users use delete, the current serialization code leaves off children.
This results in confusing visualizations and ill-typed terms. We ran into this in eggcc.

Here I have an ugly solution that removes nodes until a fixed point, similar to extraction. It ensures all the nodes in the serialized egraph have children.

@oflatt oflatt requested a review from a team as a code owner April 23, 2024 23:30
@oflatt oflatt requested review from mwillsey and saulshanabrook and removed request for a team April 23, 2024 23:30
@oflatt oflatt marked this pull request as draft April 23, 2024 23:50
@oflatt
Copy link
Member Author

oflatt commented Apr 23, 2024

This still has problems- rewriting now

@oflatt
Copy link
Member Author

oflatt commented Apr 24, 2024

I'm quite unhappy with this PR, but I don't see another way to deal with the problems with orphaned nodes.
Perhaps @saulshanabrook has a suggestion for how to clean it up?

@oflatt oflatt marked this pull request as ready for review April 24, 2024 00:15
@yihozhang
Copy link
Collaborator

Do you have an example where things can go off?

@yihozhang
Copy link
Collaborator

oh nvm, I understand the problem now with the comments in the PR. I think it is on the users to not use the delete action that could lead to e-classes being ungrounded? Do we create ungrounded e-classes with delete in eggcc?

I don't have strong opinions about what we should do with viz under size limits.

@saulshanabrook
Copy link
Member

saulshanabrook commented Apr 24, 2024

When a maxmim number of serialized enodes per function is set, or when users use delete, the current serialization code leaves off children.
This results in confusing visualizations and ill-typed terms. We ran into this in eggcc.

I hadn't thought about the delete case! It seems a bit pathological... Do you find it coming up somewhere? I wonder if should even allow this behavior in the first place...

In terms of the max number of nodes, I agree it produces visualizations that have the wrong number of arguments for functions, if they were omitted they will just not be shown.

Again for this, if you have a particular example that would be helpful to look at. The trimming is meant as some sort of debugging tool... to still give you a sense of the graph without having to omit all of it. Maybe based on the example you are looking at we could come up with a reasonable way to make it more helpful?

EDIT: One idea might if a node points to an empty e-class that was removed bc of the max # of nodes, we could add a dummy node like "" to fill it, so the edge can still exist.

@oflatt
Copy link
Member Author

oflatt commented Apr 24, 2024

We are running into this case with eggcc, since we are abusing delete recently. It's easy to write an example, just delete a leaf node: https://egraphs-good.github.io/egglog/?program=XQAAgACTAAAAAAAAAAAUGQgnfMUD9dO10C6gYDgGpv3OlpxDkJUt6jGNQtRZOLSCLUYu5JKbot9I5KG4je3lTY-sI3WWWcM4zLmo_r3QQzlEdQTC1KY2y57eIrs_X_9nHgAA

I think for visualization we should at least mark that we left off some nodes due to size constraints, and have a big warning.

For non-viz use cases, I'm not sure what to do. Give an error? A dummy eclass seems like a bad idea because it's not something that is actually there

@oflatt
Copy link
Member Author

oflatt commented Apr 24, 2024

Current hack for eggcc is to use subsumption instead, and because serialization ignores subsumption right now we are ok

@saulshanabrook
Copy link
Member

Yeah I think we want to do something on the delete case at least...

If we changed the serialization format to point to e-classes instead of e-nodes then we could represent this OK.

@saulshanabrook
Copy link
Member

After we discussed this yesterday, we thought it made the most sense to support this at the serialization level by adding the ability to represent empty e-classes in the serialized format. I opened egraphs-good/egraph-serialize#11 to track that and egraphs-good/egraph-serialize#12 to track adding this algorithm present here to the underlying serialized format.

In tandem with egraphs-good/egraph-serialize#11, we can open a new PR here that fixes how we omit missing nodes by using the new support instead of just omitting them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants