Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Define a uast.ContentOf helper #377

Merged
merged 1 commit into from
Mar 26, 2019
Merged

Define a uast.ContentOf helper #377

merged 1 commit into from
Mar 26, 2019

Conversation

dennwc
Copy link
Member

@dennwc dennwc commented Mar 14, 2019

Define a uast.ContentOf helper as requested in #376.

Fixes #376

Signed-off-by: Denys Smirnov [email protected]

@dennwc dennwc self-assigned this Mar 14, 2019
@dennwc dennwc requested review from creachadair and bzz March 14, 2019 15:48
Copy link
Contributor

@creachadair creachadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally this change seems fine, but I have a couple of comments for discussion.

uast/types.go Outdated Show resolved Hide resolved
@@ -245,6 +245,11 @@ func RolesOf(n nodes.Node) role.Roles {
}

// TokenOf is a helper for getting node token (see KeyToken).
//
// The token is an exact code snippet that represents a given AST node. It only works for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily in this change, but we should think about how to document expectations for the user of the UAST, about which nodes can be expected to have exact snippets and which do not. (For example: Do numeric literals have one? I think so, but the reader could not be sure, and I couldn't find a table of rules anywhere).

Anyway, I think that's something we should figure out how to document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly: I'm worried about the expectations set up by the native/annotated vs. semantic split: If someone calls TokenOf(n) on a Semantic node n, will they get "" always? Or will it just magically work sometimes and not others? Or will it work but return canonicalized text?

For the caller, I think that disparity is going to be frustrating. Maybe we could fix it by making the API distinguish ("TokenOf always returns an empty string in Semantic mode") or maybe we should just have a single API (TextOf?) whose return value is canonicalized or not depending on mode. I don't know the right answer here—and we don't necessarily need to figure it out right now—but I am concerned that we are adding to the API surface only to satisfy a particular use case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TokenOf by definition should always either return a code snippet or nothing. Unfortunately defining all the node types that will have the token is impossible since it differs between the languages even for trivial nodes like string literals (C# has no token in the literal node, but has a separate token node).

In general, I plan for TokenOf to always return a code snippet for any native AST. But it needs access to the source file to do it and a different kind of API. In SDK v3 I think it makes sense to replace all node types with opaque *Node that will also keep a pointer to the original source file, so we can always get snippets based on the node position.

Regarding Semantic nodes, it's a bit tricky. We can still get the snippet based on positions, but Semantic nodes might have a completely different structure and the snippet for inner node might be larger than a snipped for the parent node. Because of this, it may make sense to give a different API promise. We may allow storing both Native and Semantic nodes (+ the source) and clearly communicate in docs that to get the code snippet (token) for a subtree, the user will first need to jump from Semantic node to associated Native. This will may help to draw a clear and easy to understand boundary for the end user.

For the ContentOf, as explained in the original issue, it's not meant to be very precise in contrast to TokenOf. It's just a quick way to get any text content for a specific node, thus it will try to first use any canonical text field of Semantic nodes, and if they are not available - fallback to TokenOf. So they really serve a different purpose.

Copy link
Contributor

@bzz bzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the motivation for this helper, have read the code but have not really understood how this works.

And here is my +1 not to block the merge 😆

On the serious note though - it's completely not clear to me how a user is expected to discover this, esp with regarding the difference in annotated/semantic modes 😕

@dennwc
Copy link
Member Author

dennwc commented Mar 15, 2019

I see that there is no understanding in the way how it works and/or the difference between the two, so let's discuss it in more details. The change may wait a bit.

@bzz
Copy link
Contributor

bzz commented Mar 19, 2019

Makes sense to me.

Also, form the API user's perspective, most probably we should have a clear shared understanding on how developer's experience of using go-client with these helpers is "transferable" to using python and jvm client APIs.

I did not have a chance to dig deeper yet, but just assume that the user needs will be the same (like #376) but the solution for different clients might be different.

@dennwc
Copy link
Member Author

dennwc commented Mar 20, 2019

Right, but this is change to SDK to allow this kind of functionality in general. Client libraries may still expose/document it differently.

So, is there any suggestions for the better naming, better docs, or anything else that will clarify the API contract for those two functions?

@dennwc
Copy link
Member Author

dennwc commented Mar 26, 2019

@bzz @creachadair If there are no specific suggestions here, I'm merging this.

@creachadair
Copy link
Contributor

@bzz @creachadair If there are no specific suggestions here, I'm merging this.

I think it's OK to go ahead and merge it. The more general issue of how to go back and forth between the UAST structure and the underlying tokens & text is a larger one we'll have to continue to think about, but this solves an existing problem.

@bzz
Copy link
Contributor

bzz commented Mar 26, 2019

Right, but this is change to SDK to allow this kind of functionality in general. Client libraries may still expose/document it differently.

yes, but this change is clearly driven by the go-client user feature request. My only concern is that we basically keep "enhancing" only one language client with such changes, and the only suggestion is not to just do it, but also to have some shared bigger plan on how do we make things like that exposed in other clients and how do we manage these expectations for there users.

E.g shall we have an issue somewhere, documenting the need of exposing this to clients so users beyond go could benefit from that?

@dennwc
Copy link
Member Author

dennwc commented Mar 26, 2019

The general notion is to port/expose all the features of the Go client to other clients as well, assuming some adaptation to the target language.

As a note, I consider everything exposed by uast package in SDK a "development preview". Once we expose it in the libuast or the Go client - it becomes a supported API. But this is not mentioned anywhere in the docs.

Opened an issue for this: #387

Copy link
Contributor

@bzz bzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 thank you for clarifications, that makes perfect sense now!

@dennwc dennwc merged commit bbb1495 into bblfsh:master Mar 26, 2019
@dennwc dennwc deleted the content_of branch March 26, 2019 15:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants