Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Related to issue#335, Prevent concurrent malheur and support malheur … #343

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

SeanKim777
Copy link
Contributor

Related to issue #335
Allow malheur running in 'cluster' and 'increment' mode depending on number of analysis reports collected.
Run malheur in 'increment' will reduce its execution time for newly created mist report
Also, prevent concurrent malheur running by use pid file configured in reporting.conf

@mallorybobalice
Copy link

hey, thanks very much for this. I'll try have a thorough read of the commit by the end of the week and to test it on our test box syncing a 2-3k samples to it. (while it's no subs for Brad and co revieweing it -who knows maybe I'll find something interesting bug wise - in a positive peer review way)

Is there any way we could also add a force malheur analysis button in the summary dashboard page or a helper script just in case we want to force it? ( guess I could always vary the threshold )

Is there anything i need to know about the pull request before i try to merge and configure it on the test system?

@mallorybobalice
Copy link

(e.g. wipe existing malheur report or output folders or leave as is?)

@spender-sandbox
Copy link
Owner

I'm not sure about this change. For starters, wouldn't a simple multiprocessing Lock make more sense than maintaining a pid file?

Also, if I remember right, when I was initially testing with malheur, increment mode didn't give me good results (but I was testing at the time with a small number of samples). Have you run a comparison to see if the increment mode with 10k samples or so in the db will show a similarity with a recent sample that also had increment mode run on it?

-Brad

@SeanKim777
Copy link
Contributor Author

@mallorybobalice, not sure whether we need force malheur scan under my change because all the analsysis will have its own mist data generated and increment mode scanned if there are more than 10K analysis(default value for threshold) mist data existing.

@spender-sandbox I just found multiprocessing Lock usage on other module after I have committed changes...my bad, will change if I have spare time shortly...but its functional :)
I have compared with increment VS cluster mode then the result rendered on Django web UI was same. Basically cluster id eg. C000-0123 in malheur.txt was same. The only thing required for increment mode was merge increment mode output file to malheur.txt(previous scan result) (commented on source)
Average scan time with increment mode for my dataset (now 20K analyses) is about 20~30 seconds but as @rieck mentioned on #335 it is expected to take similar amount of time for any new analysis mist data which is a lot better than cluster mode

@spender-sandbox
Copy link
Owner

That cleaned it up nicely -- I am still a bit concerned about this though, as this lock should be highly contentious. If someone's running 100 instances of process.py fully loaded, and to complete processing the lock must be held, and one instance holds it for 20-30 seconds at a time, then 100 instances all at the malheur stage will take close to an hour to complete. That seems like a showstopper to me.

-Brad

@SeanKim777
Copy link
Contributor Author

SeanKim777 commented Nov 3, 2016

@spender-sandbox You are right. I didn't thought about that many instances may running because I only using single instance. but tested with 3 instances though

So it would better to register Malheur Application As a Scheduled Service(use cron or ??) to be executed every configured interval with option to scan directory which will hold bunch of mist data created by malheur.py module during that interval

Current malheur.py module need to be changed to just put newly created mist data into a directory which will be used by MAASS. Directory name should be changed dynamically for each running of MAASS.

Also, I don't think malheur in Cuckoo need to be run in a 'cluster' mode and this may cause more trouble in the future. but need to think about compatibility with older version of cuckoo

I am look forward to your comment about this and please give me what kind of tool(?) for MAASS would be good? cron? or anything else I don't know?

@mallorybobalice
Copy link

mallorybobalice commented Nov 13, 2016

@spender-sandbox - Brad, is there any way you could please help Sean fix this for process.py compatibility?

I guess you could say retention.py should be a cron job (yet it's a reporting module), so I don't quite see why malheur.py in batch mode can't be a reporting module ala - 'trigger , put a bookmark type thing in, don't lock up any processing jobs- anything after comes in the next run ' ?
thoughts?

@mallorybobalice
Copy link

@spender-sandbox final bump :( ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants