Search very slow when logline has only one token (e.g. from masking) #37
When looking at [...]
    if cluster.log_template_tokens[0] == tokens[0]:
        return cluster
This seems to speed up the whole thing considerably. The effect is less visible with max_clusters activated, but that setting seems to have a very strong negative effect on performance in any case.
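The check quoted in this comment can be read as a fast path for loglines that consist of a single token. What follows is only a minimal sketch of that idea, not Drain3's actual search code: the `Cluster` class and the `match_single_token` helper are hypothetical names introduced for illustration, and only the `log_template_tokens` attribute name is taken from the snippet.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


@dataclass
class Cluster:
    """Hypothetical stand-in for a log cluster (not the Drain3 class)."""
    cluster_id: int
    log_template_tokens: Tuple[str, ...]


def match_single_token(
    clusters: Iterable[Cluster], tokens: Sequence[str]
) -> Optional[Cluster]:
    """Return the first one-token cluster whose only token equals tokens[0].

    Assumption: for a one-token logline, equality of the first token is an
    exact template match, so no similarity scoring is needed.
    """
    if len(tokens) != 1:
        return None  # not a one-token line: use the regular search instead
    for cluster in clusters:
        template = cluster.log_template_tokens
        if len(template) == 1 and template[0] == tokens[0]:
            return cluster
    return None
```

A caller would try this helper first and fall back to the normal tree search whenever it returns `None`. Whether this removes the slowdown described in this issue depends on where the time is actually spent, so it should be confirmed by profiling.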
Typical usage of Drain is for extracting up to a few thousand templates. Perhaps with the new ...
I had to work on some other topics but want to get back to Drain. I plan to test it with billions of streamed loglines, and it seems that, for example, the patch shown here still brings some huge improvements. When I use e.g. ... Perhaps it makes sense to add some ...
Another demo with a significantly bigger log file would be a great addition.
I have a lot of lines that are masked completely because they contain a lot of rubbish, e.g. resulting in a single token (the mask), which then becomes the template.
For some reason, search on these lines is extremely slow (given that we already have a bigger search tree), although it should actually be very fast because they have only one token.
I cannot give a good example of the log due to confidentiality but perhaps this issue/limitation is generally known already?