GitHub - jgawrilo/butler_install: A KYC tool for building a persona profile via simple search.

Overview

Butler is a web-based Know Your Customer (KYC) application that helps fill in the slots of an entity profile. It combines human-in-the-loop feedback with a simple search query that can reach the open web, the dark web, and enterprise search repositories.

It's capable of leveraging SRI's Lighthouse Search backend for free text search and information correlation.

Use Cases

The primary use case is to help analysts who start with a small piece of information, such as a phone number or user handle, and need to understand the complete profile of the entity or persona. Often a Google search and a spreadsheet are used, and few tools exist that aggregate and analyze search results so that the relevant profile information is captured. Resolving an entity is also often ambiguous, so Butler clusters pages in an attempt to group similar pages and information together.

Installation

Dependencies

At a high level, Butler depends on four software projects:

CoreNLP Server (used for entity and information extraction)
Elasticsearch (used as the application database)
Butler Server (scraping, analytic, and data processing component)
Butler UI (User interface)

Full Docker (Recommended Installation for testing)

This installation works on Linux and OS X and requires Docker and Git on your machine. It has been tested with Docker version 17.09.0.

It is recommended to configure Docker with 2 CPUs and 8 GB of Memory for basic use.

This installation runs each of the four software components listed above in a separate Docker container.

CoreNLP Server runs on port 9000.
Elasticsearch runs on port 9200.
Butler Server runs on port 5000.
Butler UI runs on port 3000.
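
Because each component binds a fixed port, it can be useful to probe those ports: before installing, all four should be free, and after running start.sh all four should accept connections. The following is a minimal sketch of such a check and is not part of the Butler project. The host 127.0.0.1 and the names BUTLER_PORTS, port_is_open, and check_services are assumptions made for illustration; only the port numbers come from the list above.

```python
import socket

# Port numbers are taken from the list above.
# The service labels and the default host are assumptions for a local Docker install.
BUTLER_PORTS = {
    "CoreNLP Server": 9000,
    "Elasticsearch": 9200,
    "Butler Server": 5000,
    "Butler UI": 3000,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def check_services(host: str = "127.0.0.1", ports: dict = BUTLER_PORTS) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'listening' if is_up else 'not reachable'}")
```

A port that reports "listening" before installation likely belongs to another process and may prevent the corresponding container from starting.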

# Go get the project!
git clone https://github.com/jgawrilo/butler_install.git

# Move into the project directory!
cd butler_install

# Install the full app!
./install.sh

# Start the app (all containers) for use/testing!
./start.sh

# Head to http://localhost:3000 in your browser. Use the application! See 'Testing' below if you don't know what to do.  Go ahead! Try it out!

# Stop the app (all containers) when you're done!
./stop.sh

Testing

Head to http://localhost:3000 and ensure the application screen loads. When it does, start a project called 'justin'.
Type 'justin gawrilow' in the search bar and hit enter. This starts mining results.
The search might take a few minutes to complete. Keep in mind that the tool goes to the open (and possibly dark) web and pulls results on the fly, taking screenshots, parsing HTML, and trying to fill out a profile.
After some time you should see a few results come back. Click on the clusters in the treemap or legend to check out the pages.
You can also check out the profile by clicking 'Profile' in the upper right corner.
To get more results, click on the 'More' button in the upper left. Again, after some time you'll see even more pages associated with 'justin gawrilow'.
To start another project, click the button in the upper right and then click 'Close', or just close the browser. You can always go back to your old project or start a new one.
For more information on how to use more features of the tool, please see the User Manual below.

Optional Configuration

Endpoint Configuration

Dark Web Search or Enterprise Elasticsearch hook up

Scaling out search infrastructure and page scraping

It's possible to speed up the search and scraping aspect of Butler by installing gg on separate servers and then adding those endpoints to the butler_server config.json.

Doing this will essentially distribute the search to these servers and will limit the calls any one server will receive.

E.g.,

"search_boxes":["http://40.167.321.126:7777/get_urls"]
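
As a rough illustration of the configuration change described above, the sketch below appends an endpoint to the "search_boxes" list in a config.json file without duplicating entries. It is not taken from the Butler code base: the helper names, the example host names, and the assumption that config.json is a flat JSON object are all hypothetical. Only the "search_boxes" key, the port 7777, and the /get_urls path come from the example above. The assign_round_robin helper merely illustrates how spreading queries across endpoints limits the calls any one server receives; how Butler actually schedules requests is not documented here.

```python
import json
from pathlib import Path


def add_search_box(config: dict, endpoint: str) -> dict:
    """Return a copy of config with endpoint in "search_boxes" (no duplicates)."""
    boxes = list(config.get("search_boxes", []))
    if endpoint not in boxes:
        boxes.append(endpoint)
    return {**config, "search_boxes": boxes}


def update_config_file(path: str, endpoint: str) -> None:
    """Read a JSON config file, add the endpoint, and write the file back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    updated = add_search_box(config, endpoint)
    config_path.write_text(json.dumps(updated, indent=2) + "\n")


def assign_round_robin(queries: list, boxes: list) -> list:
    """Illustrative only: pair each query with an endpoint in rotation."""
    if not boxes:
        raise ValueError("at least one search box endpoint is required")
    return [(query, boxes[i % len(boxes)]) for i, query in enumerate(queries)]


# Hypothetical usage; the path and host name are placeholders:
# update_config_file("config.json", "http://search-box-1.example:7777/get_urls")
```

With two endpoints configured, assign_round_robin sends every other query to each server, so each one handles roughly half of the calls.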

User Manual

Please download this brief to understand more details about the application: Butler Cheat Sheet

License and Acknowledgements

Licensed under Apache-2.0. Developed under the DARPA Memex program.

