
one question, how to do Incremental learning in drain3 training? #97

Open
CH-nolyn opened this issue Feb 21, 2024 · 6 comments

Comments

@CH-nolyn

import json
import logging
import sys
import time
from util.config_reader import initialize_template_config
from util.httpserver_operation import training_post_model
from drain3.file_persistence import FilePersistence
from drain3 import TemplateMiner


def process_log_training(raw_log_path, query_data):
    logger = logging.getLogger(__name__)
    logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(message)s')
    scenario = query_data["scenario"]
    output_file = f"{scenario}/drain3_state.bin"
    persistence = FilePersistence(output_file)

    template_miner = TemplateMiner(persistence, config=initialize_template_config(profiling_enabled=True))

    line_count = 0
    with open(raw_log_path, encoding='utf-8') as f:
        lines = f.readlines()

    start_time = time.time()
    batch_start_time = start_time
    batch_size = 10000
    # train line by line
    for line in lines:
        line = line.rstrip()
        result = template_miner.add_log_message(line)
        line_count += 1
        if line_count % batch_size == 0:
            time_took = time.time() - batch_start_time
            rate = batch_size / time_took
            logger.info(f"Processing line: {line_count}, rate {rate:.1f} lines/sec, "
                        f"{len(template_miner.drain.clusters)} clusters so far.")
            batch_start_time = time.time()
        if result["change_type"] != "none":
            result_json = json.dumps({
                result["cluster_id"]: {
                    "template_mined": result["template_mined"]
                }
            })
            logger.info(f"Input ({line_count}): " + line)
            logger.info("Result: " + result_json)

    time_took = time.time() - start_time
    rate = line_count / time_took
    logger.info(
        f"--- Done processing file in {time_took:.2f} sec. Total of {line_count} lines, rate {rate:.1f} lines/sec, "
        f"{len(template_miner.drain.clusters)} clusters")

    sorted_clusters = sorted(template_miner.drain.clusters, key=lambda it: it.size, reverse=True)
    for cluster in sorted_clusters:
        logger.info(cluster)

    print("Prefix Tree:")
    template_miner.drain.print_tree()
    template_miner.profiler.report(0)

    training_post_model(output_file)

This is my training code. How can I continue training on new logs with a previously trained model?

@Superskyyy
Collaborator

Training on new logs is trivial: as long as you have the previous TemplateMiner state serialized to some external storage (e.g., in memory, a pickle file, or Redis), just load it back from storage and continue adding log lines to it.
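The serialize / restore / keep-training cycle can be sketched with the standard library alone. `ToyMiner` below is a hypothetical stand-in for drain3's TemplateMiner (it just groups lines by their first token), not drain3's API; the point is only the pattern of persisting state after one run and resuming from it in the next:

```python
import pickle
import tempfile
from pathlib import Path


class ToyMiner:
    """Hypothetical stand-in for TemplateMiner: groups lines by first token."""

    def __init__(self):
        self.clusters = {}  # first token -> number of lines seen

    def add_log_message(self, line: str) -> None:
        tokens = line.split()
        key = tokens[0] if tokens else ""
        self.clusters[key] = self.clusters.get(key, 0) + 1


state_path = Path(tempfile.mkdtemp()) / "toy_state.bin"

# First training run: mine a batch of logs, then persist the state.
miner = ToyMiner()
for line in ["connect from 10.0.0.1", "connect from 10.0.0.2", "disk full"]:
    miner.add_log_message(line)
state_path.write_bytes(pickle.dumps(miner))

# Later run: restore the state and continue training on new logs.
miner2 = pickle.loads(state_path.read_bytes())
miner2.add_log_message("connect from 10.0.0.3")

print(sorted(miner2.clusters.items()))  # earlier counts are preserved
```

With drain3 itself, the same effect comes from constructing the TemplateMiner with a persistence handler pointing at the old state, as in the training script above.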

@CH-nolyn
Author

CH-nolyn commented Apr 24, 2024 via email

@Superskyyy
Collaborator

import os
import pathlib
from typing import Optional

from drain3.persistence_handler import PersistenceHandler


class FilePersistence(PersistenceHandler):
    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def save_state(self, state: bytes) -> None:
        pathlib.Path(self.file_path).write_bytes(state)

    def load_state(self) -> Optional[bytes]:
        if not os.path.exists(self.file_path):
            return None

        return pathlib.Path(self.file_path).read_bytes()

Calling load_state will suit your needs.
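As a quick sanity check, the handler above is just a bytes round-trip through a file. This sketch exercises `save_state` / `load_state` standalone, with the base class dropped and a temporary path substituted so it runs without drain3 installed:

```python
import os
import pathlib
import tempfile
from typing import Optional


class FilePersistence:
    """Same save/load logic as drain3's FilePersistence, minus the base class."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def save_state(self, state: bytes) -> None:
        pathlib.Path(self.file_path).write_bytes(state)

    def load_state(self) -> Optional[bytes]:
        if not os.path.exists(self.file_path):
            return None
        return pathlib.Path(self.file_path).read_bytes()


path = os.path.join(tempfile.mkdtemp(), "drain3_state.bin")
persistence = FilePersistence(path)

assert persistence.load_state() is None  # nothing saved yet
persistence.save_state(b"serialized miner state")
assert persistence.load_state() == b"serialized miner state"
```

In drain3, passing such a handler to TemplateMiner makes the miner load the saved state during construction, so a second run of the training script against the same state file resumes where the first left off.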

@CH-nolyn
Author

CH-nolyn commented Apr 24, 2024 via email

@Superskyyy
Collaborator

Thanks, but one question is how to code it.


I can write up a small example for you when I have time, but you might need to wait until after the May 1st holiday.

@CH-nolyn
Author

CH-nolyn commented Apr 25, 2024 via email
