Hmm, this is a challenging task, and I'm sure I'm missing context from the description alone. I wonder whether standard consolidation/grouping practices would help here, e.g. normalizing logs early in the pipeline and introducing a set of patterns to check against. If you add more details and examples, maybe we can come up with better recommendations.
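To make "normalize early and check against patterns" a bit more concrete, here is a minimal sketch in plain Python, not tied to any particular log shipper's API. The idea is that the per-log differences (multi-line start condition, timestamp format) live in one rule table, so a single generic step can serve many files instead of one transform per log type. Every name here (`Rule`, `RULES`, `rule_for`, `group_events`, `extract_timestamp`), the file globs, and the log formats are assumptions made up for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatch
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical per-log-family rule: one row of data instead of one transform."""
    file_glob: str            # which file paths this rule applies to
    event_start: re.Pattern   # a line matching this begins a new event
    ts_pattern: re.Pattern    # group 1 captures the timestamp text
    ts_format: str            # strptime format for the captured text


# Illustrative rules only; real paths and formats would come from your own logs.
RULES = [
    Rule(
        "*/app-java/*.log",
        re.compile(r"^\d{4}-\d{2}-\d{2} "),
        re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"),
        "%Y-%m-%d %H:%M:%S",
    ),
    Rule(
        "*/worker/*.log",
        re.compile(r"^\["),
        re.compile(r"^\[([^\]]+)\]"),
        "%Y-%m-%dT%H:%M:%S",
    ),
]


def rule_for(path: str) -> Optional[Rule]:
    """Return the first rule whose glob matches the file path, or None."""
    for rule in RULES:
        if fnmatch(path, rule.file_glob):
            return rule
    return None


def group_events(lines: Iterable[str], rule: Rule) -> Iterator[str]:
    """Merge continuation lines into the event that precedes them."""
    event = []
    for line in lines:
        if rule.event_start.search(line) and event:
            yield "\n".join(event)
            event = []
        event.append(line)
    if event:
        yield "\n".join(event)


def extract_timestamp(event: str, rule: Rule) -> Optional[datetime]:
    """Parse the event's timestamp with the rule's format; None if it does not fit."""
    match = rule.ts_pattern.search(event)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), rule.ts_format)
    except ValueError:
        return None
```

With something like this, adding a new log family means adding a row to `RULES` rather than a new transform, and files that match no rule can be routed to a catch-all path for inspection. Whether the same table-driven shape maps cleanly onto your pipeline's transform stage depends on details not given in the description.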
-
I collect a large number of logs, and the simplest way to manage them is to use each filename as a source. However, that can produce too many sources and hurt performance, so we merge some of them. The problem then moves to the transform stage: different logs may need different conditions for multi-line matching or different rules for timestamp extraction, which could lead to a very large number of transforms, possibly over a thousand. What are some good solutions for this scenario? Our daily log volume is more than 30TB, and the logging scenarios are complex: encodings are inconsistent and there are many distinct log collections, so the cleansing rules are complex as well.