Replies: 5 comments 1 reply

---

This looks super interesting and I like the idea of nanopasses. Could you explain the benchmark in more detail? How does it search the nodes? What "real world" use case does it model? I hope to have more time tomorrow to look into this in detail.

---

Is there any prior art on nanopasses combined with query architectures? I could see it working, but it could cause a lot of nested calling if we have a query that takes 5-10 passes to resolve.

As for flat trees, this seems quite similar to data-oriented programming, in that it exploits cache locality. The question then becomes whether the passes have a consistent order in which they traverse the trees. The library appears to assume pre-order, which may not be true for every pass.

In my experience, it's in the interest of most compilers to move to a block-based SSA/three-address-code intermediate representation, simply because you remove the arbitrary nesting that you can get with trees. It also makes optimizations like dead code elimination and constant folding a lot easier. Perhaps this is a way to gain performance as well, since block-based IRs probably involve less pointer chasing (see the sketch below).
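As a minimal sketch of what such a block-based, three-address-style IR could look like (all names here are illustrative, not an actual Rome/rowan API):

```rust
// Illustrative block-based, three-address-style IR. In SSA fashion each
// instruction defines at most one value; a value is just the index of the
// instruction that produced it, so nothing is nested and passes like
// constant folding become linear scans over `instrs` instead of tree walks.
type Value = usize;

enum Instr {
    Const(i64),        // define a constant value
    Add(Value, Value), // sum of two previously defined values
    Return(Value),     // block terminator: return a value
}

struct Block {
    instrs: Vec<Instr>,
}

struct Function {
    blocks: Vec<Block>,
}
```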

---

As I said before, I'm excited that we start exploring new representations for the AST. Another interesting read is about the Zig compiler's rework of its memory layout, which seems related.

What's unclear to me right now is the scope of what you're proposing. Are you proposing to use this new AST for linters only, only for additional passes right after the parser (scope analysis, rewriting the tree), or to replace rowan completely? Is this a structure we could use for transforms as well that heavily mutate the tree?

Also, what are the advantages and disadvantages of this new approach? We'll probably lose caching. How important do we find caching? It would help if you could extend your explanation to cover the use cases and advantages/disadvantages of the current and new approach.

Having a way to perform a cheap second pass after parsing would be useful to implement many of JavaScript's semantic checks or to rewrite the AST, for example, rewriting the left-hand side of an assignment to an … That raises the question of how performant tree modifications are, or whether this will require re-creating the whole tree every time we change a single node.

Only giving pre-order traversal isn't sufficient for the formatter. The formatter oftentimes needs to look at the predecessor token to properly format comments or to determine the number of lines to insert between two statements. And this happens in the critical path: improving the complexity of retrieving the predecessor token matters.

Have you also looked into alternatives? For example, I believe our rowan implementation is more complicated than it must be, and it often allocates …

---

Doing some quick research on cache hit/miss for trivia:

- Trivia: 22,849 of 29,574 = 77%
- Trivia: 335,245 of 505,603 = 66%

---

Expanding on this research, I tried to create a pipeline of stages. The first obvious implementation is:

```rust
trait PipelineStage {
    fn handle(&mut self, node: &NodeOrToken<SyntaxNode, SyntaxToken>);
}

struct DynPipeline {
    stages: Vec<Box<dyn PipelineStage>>,
}
```

If we run this on our 200/5k/jquery test suite with just one pass counting how many functions there are in each file, we get:
On a "normal" file it takes less than 1ms to iterate through all nodes and check its kind against If we increase to 10 passes we see an increase in the time spent. I honestly thought the increase would be linear, but it wasn't. I would not be surprised in a more complex scenario, with different passes, that it will approach linear increase.
In the "normal" file we are already spending +1ms just navigating the tree. Not ideal. There is one "easy" improvement we can do. We can exchange the dynamic dispatch for static dispatch. Rust does not have variadics generics like C++ (https://en.cppreference.com/w/cpp/language/parameter_pack). So I will need to cheat. struct Pipeline<TStage>
where
TStage: PipelineStage,
{
stage: TStage,
}
impl<TStage> Pipeline<TStage>
where
TStage: PipelineStage,
{
pub fn run(&mut self, result: Parse<JsAnyRoot>) {
let root = result.syntax();
let all: Vec<_> = root.descendants_with_tokens().collect();
for t in all.iter() {
self.stage.handle(t);
}
}
} This allows me to have zero dynamic disptach. let mut pipeline: Pipeline<CountFunctionsStage> = Pipeline::new();
pipeline.run(result); With the problem of being limited with just one stage. I can break this limitation doing: #[derive(Default)]
struct LinkedListStages<TCurrent, TNext>
where
TCurrent: PipelineStage,
TNext: PipelineStage,
{
current: TCurrent,
next: TNext,
}
impl<TCurrent, TNext> PipelineStage for LinkedListStages<TCurrent, TNext>
where
TCurrent: PipelineStage,
TNext: PipelineStage,
{
fn handle(&mut self, node: &NodeOrToken<SyntaxNode, SyntaxToken>) {
self.current.handle(node);
self.next.handle(node);
}
} I can sugar the creation of the pipeline with type Stages2<A, B> = LinkedListStages<A, B>;
type Stages3<A, B, C> = Stages2<A, Stages2<B, C>>;
type Stages4<A, B, C, D> = Stages3<A, B, Stages2<C, D>>;
type Stages5<A, B, C, D, E> = Stages4<A, B, C, Stages2<D, E>>; or with macros. This allows me to do: let mut pipeline: Pipeline<
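For example, a declarative macro could generate the nested type directly. This is an untested sketch, not code from the POC:

```rust
// Hypothetical: expand `stages!(A, B, C)` into
// `LinkedListStages<A, LinkedListStages<B, C>>` by recursing on the list.
macro_rules! stages {
    ($last:ty) => { $last };
    ($head:ty, $($tail:ty),+ $(,)?) => {
        LinkedListStages<$head, stages!($($tail),+)>
    };
}

// Usage: type Stages3<A, B, C> = stages!(A, B, C);
```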
Either way, this allows me to do:

```rust
let mut pipeline: Pipeline<
    Stages10<
        CountFunctionsStage,
        CountFunctionsStage,
        CountFunctionsStage,
        CountFunctionsStage,
        CountFunctionsStage,
        CountFunctionsStage,
        CountFunctionsStage,
        CountFunctionsStage,
        CountFunctionsStage,
        CountFunctionsStage,
    >,
> = Pipeline::new();
pipeline.run(result);
```

The advantage is that, with everything static, the optimizer shows its value and we get:
This puts us under 1ms again for the "normal" file, or five times faster than using dynamic dispatch. I still consider this approach of navigating all nodes searching for the ones you want the "worst case scenario". We must support it, and this approach seems to be the best we can do at the moment.

---

We will soon start doing semantic analysis and using our CST/AST for more complex processing. This raises the question of how we are going to architect this. One possibility is using nanopasses:

- Generics in Small Doses: Nanopass Compilation with Haskell
  https://offog.org/publications/fita200811-generics.pdf
- LLVM for Grad Students
  https://www.cs.cornell.edu/~asampson/blog/llvm.html
- and others...
To assess the feasibility of using nanopasses I did a quick check: how long does it take to traverse all nodes of a small file (200 lines), a big file (5000 lines), and huge files (jquery)? I also implemented a new way to build and traverse our nodes for comparison, called "flat tree" below.
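The timing harness for such a check could be as simple as this (a sketch, not the actual benchmark code from the POC):

```rust
use std::time::Instant;

// Time `passes` full pre-order traversals over a parsed tree and report
// how many elements were visited in total.
fn bench_traversal(root: &SyntaxNode, passes: usize) {
    let start = Instant::now();
    let mut visited = 0usize;
    for _ in 0..passes {
        visited += root.descendants_with_tokens().count();
    }
    println!("visited {} elements in {:?}", visited, start.elapsed());
}
```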
For more details on the flat tree: https://github.com/mamcx/tree-flat
Traversing is much faster because it is pretty much iterating a `Vec`. At 100 nanopasses, both solutions are pretty similar if we cache all nodes. This is very interesting because we have a nice solution already at our door. But...
Using "flat trees" allows me to more easily choose "columnar" modelling and have a
Vec<JsSyntaxKind>
, which means we can find certain kinds using SIMD... and that makes all the difference.Nothing changed when parsing.
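To make the "columnar" idea concrete, here is a minimal sketch (illustrative only; tree-flat's real API differs): kinds live in their own contiguous array, so finding a kind is a tight linear scan that the compiler can auto-vectorize.

```rust
// Columnar layout: one contiguous array per field, stored in pre-order.
// `kinds[i]` is the kind of node i; `parents[i]` is its parent's index.
struct FlatTree {
    kinds: Vec<u16>, // stand-in for JsSyntaxKind's raw representation
    parents: Vec<u32>,
}

impl FlatTree {
    // Find every node of a given kind: a branch-light loop over one
    // array, which the optimizer can turn into SIMD compares.
    fn find_kind(&self, target: u16) -> Vec<u32> {
        self.kinds
            .iter()
            .enumerate()
            .filter(|&(_, &k)| k == target)
            .map(|(i, _)| i as u32)
            .collect()
    }
}
```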
The table above shows that we can find all nodes of a certain kind a hundred times and still be two or three times faster than doing it only once with our current code. This is almost 300x faster.

In the most complex case, jquery, we are traversing and comparing all nodes a hundred times, and we are still faster than just parsing today. This means it seems possible to have a pipeline of nanopasses with negligible costs for traversing our parsed code.

It is also possible that we can improve parsing even more by moving from an "Events + TreeSink" model to one building the flat tree directly.
To play with this code: https://github.com/rome/tools/tree/poc/flat-trees

My suggestion is that we continue this POC. One open question is `rome_rowan`: do we still need it? Opinions?