Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New traversal proposal. #1523

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open

New traversal proposal. #1523

wants to merge 27 commits into from

Conversation

ml86
Copy link
Contributor

@ml86 ml86 commented Nov 11, 2021

This is my proposal for a new generic traversal supposed to be used as replacement for our current Traversal class to build up our DSL steps.
This proposal offers three advantages when compared to the current state:

  1. DSL steps only need to be defined once to be usable on single elements as well on collections/traversals of elements and it does so without wrapping single elements in some kind of single element collections.
  2. It is equally fast or faster when using C2 JIT (the default one on JVM 8 and 11) and it is much faster when using Graal JIT (can be enable in JVM 11 via flags or by using GraalVM). In fact for the performance tests i made so far the new DSL with Graal JIT runs as fast as using no traversal wrapper at all.
  3. DSL step definitions can be used for lazy iterations via Iterator as well as non lazy iterations via Iterable just buy including a different implicit PipeOps iterator.

Before i go into some of the details i dont want to spare you the drawbacks:

  1. People writing DSL steps will be exposed to more complex generic programming than it is currently the case. E.g. higher kinded generic type parameters are used. I have come up with a strict schema for the DSL steps in order to keep complexity low.
  2. We wont be able to swap in the new traversal without some stylistic changes in the usage pattern of steps which take implicit parameters: E.g. repeat(...)(_.emit) would need to become repeat(..., _.emit). With scala 3 we would not need to make that change.

I will now provide an example of the new DSL step definitions for the referencingIdentifiers step on LOCAL nodes:

// Implicit function which wraps a single LOCAL node into a LocalTraversalNew.
// This is quite analog to the current DSL and goes into some language package object.
// FT[_] will be explained futher below.
implicit def toLocalTraversalNew1[I <: nodes.Local, FT[_]](trav: I): LocalTraversalNew[I, Trav1, FT] = {
  new LocalTraversalNew(trav: Trav1[I])
}

// Implicit function which wraps an input traversal(IT) of LOCAL nodes into a LocalTraversalNew. 
// This is quite analog to the current DSL and goes into some language package object.
// FT[_] will be explained futher below.
implicit def toLocalTraversalNew2[I <: nodes.Local, IT[_], FT[_]](trav: IT[I]): LocalTraversalNew[I, IT, FT] = {
  new LocalTraversalNew(trav)
}

// Extension class which works on any input traversal(IT) of LOCAL nodes.
class LocalTraversalNew[I <: nodes.Local, IT[_], FT[_]](val pipe: IT[I]) extends AnyVal {
  // Extension method takes implicit traversal operation ops1 which provides the basic traversal operations
  // like map and flatMap for the input traversal type (IT). Some of the basic traversal operations like flatMap
  // ​have an output traversal type which is different from IT. We call it flatMap traversal(FT).
  // The seconds traversal operation ops2 is than required to again provide the basic traversal operations
  // for the result of flatMap.
  def referencingIdentifiers(implicit ops1: TravOps[IT, FT], ops2: TravOps[FT, FT]) = {
    pipe
      .flatMap(_._refIn.asScala)
      .filter(_.label == NodeTypes.IDENTIFIER)
      .cast[Identifier]
  }
}

Keep in mind this PR is just a proposal so it contains some stuff off less importance.
The new traversal core definitions are under semanticcpg/src/main/scala/io/shiftleft/semanticcpg/langv2/ and
MethodTraversal.scala, AstNodeTraversal.scala and LocalTraversal.scala contain some example step implementations.
Please ignore semanticcpg/src/main/scala/io/shiftleft/semanticcpg/language/MySteps.scala and
semanticcpg/src/main/scala/io/shiftleft/semanticcpg/language/PipeOps.scala. They contain my first attempt for a new traversal but that approach while easier for the step writer turned out to be too slow.

@fabsx00
Copy link
Contributor

fabsx00 commented Nov 11, 2021

From your description, the advantages clearly outweight the disadvantages. The one thing I ask myself when seeing the example code is whether we can find any ways of making the complexity less visible to the language writer. @mpollmeier what are your thoughts?

@mpollmeier
Copy link
Contributor

Just discussed this in detail with markus, and he's looking into a few additional points (e.g. support for PathAwareTraversal and .repeat()(_.until) with type application). We'll let it sink in over the weekend.

Additional benefit that's not in the description yet: the end user (not the language designer) doesn't need to learn about Traversal - she can just continue to use Iterator or whatever collection they have at hand.

Additional downside: when the language designer adds a non-trivial step (like in LocalTraversalNew.referencingIdentifiers), she only has the methods that we defined in AnyTraversal at hand. Currently, with Traversal, we have all collection methods, including things like zipWithIndex, partition etc. We'd need to make sure we cover enough ground here.

We also discussed and agreed that it makes sense to bring this in before upgrading to Scala 3, mostly because changing the Scala version and traversal impl wold mean a hard cut for all existing Scala 2 users.

It'll be crucial to document this really really well, especially those additional generic terms like InputTraversal(IT) / FlatMapTraversal(IT) and TravNOps.

Overall I'm positive for this PR. The complexity certainly goes up, but the benefits seem to outweigh the downsides.

Copy link
Contributor

@bbrehm bbrehm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale, please re-request review once you're ready to continue.

Copy link
Contributor

@fabsx00 fabsx00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants