Refactor Backend Class for Readability, Testing, Coordinator-less Functionality #295

Merged
19 commits merged on Jan 24, 2023

ryanohoro (Collaborator) commented Jan 24, 2023

Describe the change

Makes coordinator an optional parameter for distribute(). When no coordinator is given, distribute() falls back to using File() objects to pass data.

Updates all scanners to use emit_file() for extracted files, in support of running Strelka without a Redis coordinator

Adds a test for distribute()

Adds a main function to the Strelka Python module that permits rudimentary Python-only "local" scanning (with embedded config files), for #182

Imports internal fork code for processing Redis tasks as JSON objects with a filename attribute. Closes #294

Adds a ScannerException class that scanners can throw to provide verbose exception messages

Adds a split_words option to ScanOcr (default: True). When true, the text field is a list of words; when false, it is the whole text as a single string (without newlines)

Clears the flags field of cached scanners to prevent persistence

Improves error handling in ScanOcr, ScanNf, ScanLsb

Removes "application/msword" from ScanEncryptedZip in backend.yaml; it was erroneously added in a previous PR
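
The coordinator-less fallback described above can be pictured with a minimal sketch. This is not the code from this PR: the File fields, the FakeCoordinator class, the read_data() helper, and the signatures of emit_file() and distribute() shown here are all simplified assumptions for illustration. The idea is only that extracted bytes live in the coordinator when one is supplied and on the File object otherwise.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical stand-in for Strelka's File object (fields are assumptions)."""
    name: str = ""
    data: bytes = b""
    depth: int = 0
    pointer: str = ""  # key into the coordinator, used only when one exists


class FakeCoordinator:
    """In-memory stand-in for a Redis coordinator (illustration only)."""

    def __init__(self):
        self.store = {}

    def put(self, key, data):
        self.store[key] = data

    def get(self, key):
        return self.store.get(key, b"")


def emit_file(parent, data, name, coordinator=None):
    """Create a child File. With a coordinator, store the bytes there;
    without one, carry the bytes on the File object itself."""
    child = File(name=name, depth=parent.depth + 1)
    if coordinator is not None:
        child.pointer = f"data:{name}"
        coordinator.put(child.pointer, data)
    else:
        child.data = data
    return child


def read_data(file, coordinator=None):
    """Fetch a file's bytes from the coordinator if present, else from the File."""
    if coordinator is not None and file.pointer:
        return coordinator.get(file.pointer)
    return file.data


def distribute(file, coordinator=None):
    """Simplified distribute(): coordinator is optional, and the event is the
    same whichever path supplied the data."""
    data = read_data(file, coordinator)
    return {"file": {"name": file.name, "depth": file.depth, "size": len(data)}}
```

In this sketch, calling distribute(child) and distribute(child, coordinator=coord) yields the same event for the same bytes, which is the property a test for distribute() would likely want to check.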

Describe testing procedures

src/python$ python setup.py install
src/python$ pip install -r requirements.txt
src/python$ strelka strelka/tests/fixtures/test.html
starting local analysis...
{"file": {"depth": 0, "flavors": {"mime": ["text/html"], "yara": ["html_file"]}, "name": "strelka/tests/fixtures/test.html", "scanners": ["ScanEntropy", "ScanFooter", "ScanHash", "ScanHeader", "ScanHtml", "ScanYara"], "size": 5875, "tree": {"node": "d6fd90b3-ba36-44fc-a45b-e7ca40c58fe2", "root": "d6fd90b3-ba36-44fc-a45b-e7ca40c58fe2"}}, "scan": {"entropy": {"elapsed": 3.6e-05, "entropy": 4.847574566795829}, "footer": {"elapsed": 2e-05, "footer": "pan></span>\n</p>\n\n\n<p>&nbsp;</p>\n\n\n</body>\n</html>", "backslash": "pan></span>\\n</p>\\n\\n\\n<p>&nbsp;</p>\\n\\n\\n</body>\\n</html>"}, "hash": {"elapsed": 0.004696, "md5": "ba4ffdba7f62b2333a23a97d3ba5f1f6", "sha1": "a1f900c64ed49bc111462c6fd91546640b5ac20c", "sha256": "38e2d4d56acf228fcebbbf5a60a16bb36ffcee490299ea52c9b1ffbcbeb62db8", "ssdeep": "96:qWJQC5siJJ+tH6STSTSTSTkvsAw2gF3BgwQWhhSTSTSTSTS/sItklIy7STSTSTSh:qOQGsiJJO3eeeIvspjJPyeeeefklCeew", "tlsh": "T14AC16713EF67021152BDA0E9E0BF4A64D494560CA3465BF4B2AE477ABFCD93136122CC"}, "header": {"elapsed": 3e-05, "header": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <title", "backslash": "<!DOCTYPE html>\\n<html lang=\"en\">\\n<head>\\n    <title"}, "html": {"elapsed": 0.005584, "total": {"scripts": 2, "forms": 0, "inputs": 0, "frames": 0, "extracted": 1, "spans": 35}, "title": "Lorem Ipsum", "scripts": [{"src": "https://example.com/example.js", "type": "text/javascript"}], "spans": [{"style": "font-size:11pt"}, {"style": "background-color:white"}, {"style": "font-family:Calibri,sans-serif"}, {"style": "font-size:52.5pt"}, {"style": "color:black"}, {"style": "font-size:12pt"}, {"style": "font-family:\"Times New Roman\",serif"}, {"style": "font-size:10.5pt"}, {"style": "font-family:\"Arial\",sans-serif"}]}, "yara": {"elapsed": 0.002338, "matches": ["test"]}}}
{"file": {"depth": 1, "flavors": {"mime": ["text/plain"], "yara": ["javascript_file"]}, "name": "script_1", "scanners": ["ScanEntropy", "ScanFooter", "ScanHash", "ScanHeader", "ScanJavascript", "ScanYara"], "size": 221, "source": "ScanHtml", "tree": {"node": "b13705b5-ee6b-4d02-b6bc-b17bd81b7744", "parent": "d6fd90b3-ba36-44fc-a45b-e7ca40c58fe2", "root": "d6fd90b3-ba36-44fc-a45b-e7ca40c58fe2"}}, "scan": {"entropy": {"elapsed": 3.1e-05, "entropy": 4.620200029985679}, "footer": {"elapsed": 1.7e-05, "footer": "   document.body.appendChild(newParagraphElement)\n", "backslash": "   document.body.appendChild(newParagraphElement)\\n"}, "hash": {"elapsed": 7e-05, "md5": "ed2a6dffc68bcbe361f4539b5f423d66", "sha1": "172771134de76ede1df66cfa95a839237e485c40", "sha256": "8c3e97cc7103eec2f8959b0f27e2011f09f26386131b075a59f2423c791917ff", "ssdeep": "6:8/tuR78mgO9lV3K0Ji8mOFf0/tuRhBeJY1lLB/etuRMv:8/tuRYu80J17F8/tuRhBein2tuRu", "tlsh": "T159D0A715143A07E4A34AB04F24344394F870045A30173115545F4CCF6F20E922485494"}, "header": {"elapsed": 1.3e-05, "header": "\n    newParagraphElement = document.createElement(", "backslash": "\\n    newParagraphElement = document.createElement("}, "javascript": {"elapsed": 0.031265, "tokens": ["Identifier", "Punctuator", "String"], "strings": ["span", "Lorem Ipsum"], "identifiers": ["newParagraphElement", "document", "createElement", "textLoremIpsum", "createTextNode", "appendChild", "body"], "beautified": true}, "yara": {"elapsed": 7e-05, "matches": ["test"]}}}
============================= test session starts ==============================
platform linux -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /strelka
plugins: mock-3.10.0, unordered-0.5.2
collected 98 items

tests/test_distribute.py .
tests/test_required_for_scanner.py .
tests/test_scan_base64.py .
tests/test_scan_base64_pe.py .
tests/test_scan_batch.py .
tests/test_scan_bmp_eof.py .
tests/test_scan_bzip2.py .
tests/test_scan_capa.py ...
tests/test_scan_ccn.py .
tests/test_scan_delay.py .
tests/test_scan_dmg.py ...
tests/test_scan_docx.py .
tests/test_scan_elf.py .
tests/test_scan_email.py .
tests/test_scan_encrypted_doc.py ....
tests/test_scan_encrypted_zip.py ..
tests/test_scan_entropy.py .
tests/test_scan_exception.py .
tests/test_scan_exiftool.py ..
tests/test_scan_footer.py ..
tests/test_scan_gif.py .
tests/test_scan_gzip.py .
tests/test_scan_hash.py .
tests/test_scan_header.py ..
tests/test_scan_html.py .
tests/test_scan_ini.py .
tests/test_scan_iso.py .
tests/test_scan_javascript.py .
tests/test_scan_jpeg.py ..
tests/test_scan_json.py .
tests/test_scan_libarchive.py ......
tests/test_scan_lnk.py .
tests/test_scan_lzma.py .
tests/test_scan_macho.py .
tests/test_scan_manifest.py .
tests/test_scan_msi.py .
tests/test_scan_nf.py ....
tests/test_scan_ocr.py ...
tests/test_scan_ole.py ....
tests/test_scan_pcap.py ..
tests/test_scan_pdf.py .
tests/test_scan_pe.py .
tests/test_scan_pgp.py ....
tests/test_scan_plist.py .
tests/test_scan_png_eof.py ...
tests/test_scan_qr.py ...
tests/test_scan_rar.py .
tests/test_scan_seven_zip.py .....
tests/test_scan_strings.py .
tests/test_scan_tar.py .
tests/test_scan_upx.py .
tests/test_scan_url.py ..
tests/test_scan_vhd.py ..
tests/test_scan_x509.py ..
tests/test_scan_xml.py .
tests/test_scan_yara.py .
tests/test_scan_zip.py ..

======================= 98 passed, 29 warnings in 38.63s =======================

============================= test session starts ==============================
platform linux -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /strelka
plugins: mock-3.10.0, unordered-0.5.2
collected 161 items

tests_configuration/test_scanner_assignment.py ................................................................................
tests_configuration/test_taste.py .................................................................................
======================= 161 passed, 4 warnings in 7.88s ========================
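
The local scan earlier in this transcript prints one JSON event per line, with events linked through file.tree.node and file.tree.parent. As a rough sketch of how that output could be consumed, the helper below (index_events is a hypothetical name, not part of Strelka) indexes events by node and groups child nodes under their parent. It assumes only the field names visible in the output above.

```python
import json


def index_events(lines):
    """Index local-scan output lines by node id and group children by parent.

    Returns (events, children): events maps node id -> event dict,
    children maps parent node id -> list of child node ids.
    """
    events, children = {}, {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip status lines such as "starting local analysis..."
        event = json.loads(line)
        tree = event["file"]["tree"]
        events[tree["node"]] = event
        if "parent" in tree:
            children.setdefault(tree["parent"], []).append(tree["node"])
    return events, children
```

For the run shown above, the root HTML event would have one child, the script_1 event extracted by ScanHtml.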

Sample output

The split_words option in ScanOcr controls the shape of the text field in output events:

split_words: false

"ocr": {
  "elapsed": 0.671064,
  "text": "Lorem Ipsum Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras lobortis sem dui. Morbi at magna quis ligula faucibusconsectetur feugiat at purus. Sed nec lorem nibh. Nam vel libero odio. Vivamus tempus non enim egestas pretium.Vestibulum turpis arcu, maximus nec libero quis, imperdiet suscipit purus. Vestibulum blandit quis lacus nonsollicitudin. Nullam non convallis dui, et aliquet risus. Sed accumsan ullamcorper vehicula. Proin non urna facilisis,condimentum eros quis, suscipit purus. Morbi euismod imperdiet neque fermentum dictum. Integer aliquam, erat sitamet fringilla tempus, mauris ligula blandit sapien, et varius sem mauris eu diam. Sed fringilla neque est, in laoreetfelis tristique in. Donec luctus velit a posuere posuere. Suspendisse sodales pellentesque quam."
},

split_words: true (default)

"ocr": {
  "elapsed": 0.645343,
  "text": [
    "Lorem",
    "Ipsum",
    "Lorem",
    "ipsum",
    "dolor",
    "sit",
    "amet,",
    "consectetur",
    "adipiscing",
    "elit.",
    "Cras",
    "lobortis",
    "sem",
    "dui.",
    "Morbi",
    "at",
    "magna",
    "quis",
    "ligula",
    "faucibus",
    "consectetur",
    "feugiat",
    "at",
    "purus.",
    "Sed",
    "nec",
    "lorem",
    "nibh.",
    "Nam",
    "vel",
    "libero",
    "odio.",
    "Vivamus",
    "tempus",
    "non",
    "enim",
    "egestas",
    "pretium.",
    "Vestibulum",
    "turpis",
    "arcu,",
    "maximus",
    "nec",
    "libero",
    "quis,",
    "imperdiet",
    "suscipit",
    "purus.",
    "Vestibulum",
    "blandit",
    "quis",
    "lacus",
    "non",
    "sollicitudin.",
    "Nullam",
    "non",
    "convallis",
    "dui,",
    "et",
    "aliquet",
    "risus.",
    "Sed",
    "accumsan",
    "ullamcorper",
    "vehicula.",
    "Proin",
    "non",
    "urna",
    "facilisis,",
    "condimentum",
    "eros",
    "quis,",
    "suscipit",
    "purus.",
    "Morbi",
    "euismod",
    "imperdiet",
    "neque",
    "fermentum",
    "dictum.",
    "Integer",
    "aliquam,",
    "erat",
    "sit",
    "amet",
    "fringilla",
    "tempus,",
    "mauris",
    "ligula",
    "blandit",
    "sapien,",
    "et",
    "varius",
    "sem",
    "mauris",
    "eu",
    "diam.",
    "Sed",
    "fringilla",
    "neque",
    "est,",
    "in",
    "laoreet",
    "felis",
    "tristique",
    "in.",
    "Donec",
    "luctus",
    "velit",
    "a",
    "posuere",
    "posuere.",
    "Suspendisse",
    "sodales",
    "pellentesque",
    "quam."
  ]
},
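
The two output shapes above can be reproduced with a small sketch. The helper name format_ocr_text is hypothetical, and the exact newline handling in ScanOcr is an assumption based on the PR description ("without newlines") and the sample output, where words at line ends appear joined.

```python
def format_ocr_text(raw, split_words=True):
    """Shape OCR text the way the split_words option is described.

    split_words=True  -> list of whitespace-separated words
    split_words=False -> one string with newlines removed (assumed behavior)
    """
    if split_words:
        return raw.split()
    return raw.replace("\r", "").replace("\n", "")
```

Returning a list is convenient for per-word matching downstream, while the single string keeps the event compact; consumers need to handle whichever type the configuration selects.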

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of and tested my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

@phutelmyer phutelmyer merged commit 4b1a098 into target:master Jan 24, 2023
Development

Successfully merging this pull request may close these issues.

file.name Field Missing From Root File Using Oneshot
2 participants