R3: RESTful Pattern Recognition (R3) for the Apache SkyWalking AI pipeline.
Modern APIs are often written in a RESTful convention. For example, the below endpoint contains four parameters, meaning each instance of this endpoint will be different although they share the same pattern.
/api/v{apiVersion}/artists/{artistid}/moments/{postid}/comments/{commentid}
While the DevOps team could setup rules for grouping such URI patterns, it quickly gets overwhelming when there are numerous endpoints across services.
R3 is a project that entirely eliminates the need for writing complex expressions to group RESTful endpoints for runtime performance analysis tasks.
IMPORTANT The R3 algorithm is based on machine learning and, as with any algorithm, it doesn't guarantee 100% accuracy (still, it's highly accurate). However, it offers a powerful and convenient solution for grouping RESTful endpoints in any scenario.
Currently, R3 offers a simple gRPC service that could be deployed easily at local or containerized environments.
The simple server is the best way to get started, which could steadily serve 500+ SkyWalking services * 3000 uris per minute).
To run the R3 service on localhost:
python -m servers.simple.run
To deploy as a container:
docker run -d --name r3 -p 17128:17128 r3:latest
The following URL would recognize the pattern as /api/users/{var}
, since the last part of URL are different for each instance.
- /api/users/cbf11b02ea464447b507e8852c32190a
- /api/users/5e363a4a18b7464b8cbff1a7ee4c91ca
- /api/users/44cf77fc351f4c6c9c4f1448f2f12800
- /api/users/38d3be5f9bd44f7f98906ea049694511
- /api/users/5ad14302e7924f4aa1d60e58d65b3dd2
The following URL would keep the original URL, not parametrized, since the all part of URL are word.
- /api/sale
- /api/product_sale
- /api/ProductSale
The following URL would keep the original URL, not parametrized, since the sample count is lower than the threshold(combine_min_url_count
).
If the sample count equals or bigger than the threshold, the URL would be parametrized.
Such as the threshold is 3
, the following URL would keep the original URL, not parametrized.
- /api/fetch1
- /api/fetch2
But the following URL would be parametrized to /api/{var}
, since the sample count is bigger than the threshold.
- /api/fetch1
- /api/fetch2
- /api/fetch3
If the part of URI contains version number, such as v1
, v2
, v3
, the version number part would not be parametrized.
Such as the following URL would not be parametrized:
- /test/v1
- /test/v2
- /test/v3
If still not matter the other part of URI to be parametrized, such as the following URI would be parametrized to /test/v1/{var}
and /test/v999/{var}
.
- /test/v1/cbf11b02ea464447b507e8852c32190a
- /test/v1/5e363a4a18b7464b8cbff1a7ee4c91ca
- /test/v1/38d3be5f9bd44f7f98906ea049694511
- /test/v999/1
- /test/v999/2
- /test/v999/3
If you are curious how the algorithm actually works or decided to improve upon it, please first read the URIDrain Overview and checkout the algorithm live demo by running below commands:
To run a demo of the algorithm (implemented with Gradio):
- Install the dependencies with
make install
(ormake env
if you plan to contribute code) - Run
python demo.demo_gradio
- Open
http://localhost:8080
in your browser or access through remote gradio service from the web by settinglaunch(share=True)
- Enjoy!
This project is dual-licensed under MIT and Apache 2.0.
The URIDrain algorithm implemented in this project is a modified version of the upstream Drain3 log clustering algorithm.
Therefore, the modified algorithm is also licensed under MIT as the upstream. The remaining utilities and services are licensed under Apache 2.0, which also allows commercial usage as long as users adhere to the license terms.
We welcome contributions from the community to make R3 more robust. Whether it's bug fixes, feature enhancements, or new ideas, your input is valuable.