A database system for the efficient storage and querying of Linux audit logs.
Read more about the purpose of this system and its design here.
By Deven Bansod, Raghav Bhat, and Jitin George.
This assumes python2.7 is installed.
sudo apt-get install python-audit auditd
pip install -r requirements.txt
This requires Java to be installed. See this for installing Java 8 on newer versions of Ubuntu.
wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.org/repo stable/' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt-get update
sudo apt install neo4j
sudo neo4j start
- Open http://localhost:7474
- Set up and remember the server password (the initial username and password are both neo4j)
git clone https://github.com/joh/when-changed
cd when-changed
sudo python setup.py install
https://gtvault-my.sharepoint.com/:f:/g/personal/dbansod3_gatech_edu/Ek_MxbalghtHk4Gr5T7uyGQBFCGxjQtsfjyu_EYdkosWnA?e=bTUJbf
export PASS=<neo4j-password>
du -h /var/lib/neo4j/data/databases/graph.db/
sudo bash upload.sh neo4j $PASS data/1.log
Note: Running this automatically creates the indexes that are used while querying.
du -h /var/lib/neo4j/data/databases/graph.db/
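The du -h calls before and after ingestion show how much the graph database grows. To record the same measurement from Python, for example in a benchmarking script, a sketch like the one below could be used. It sums apparent file sizes, which is what du --apparent-size reports. Plain du -h counts allocated disk blocks instead, so the two numbers will differ slightly. The function names are hypothetical, and reading /var/lib/neo4j usually requires root.

```python
import os

def directory_size(path):
    """Sum the apparent sizes (in bytes) of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def human(n):
    """Format a byte count roughly like `du -h` (powers of 1024)."""
    for unit in ["B", "K", "M", "G"]:
        if n < 1024:
            return "%.1f%s" % (n, unit)
        n /= 1024.0
    return "%.1fT" % n

# Example (run as root):
#   before = directory_size("/var/lib/neo4j/data/databases/graph.db/")
#   ... run upload.sh ...
#   after = directory_size("/var/lib/neo4j/data/databases/graph.db/")
#   print(human(after - before))
```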
when-changed being-watched.log "sudo -E bash watcher.sh"
The process should run partially, then halt because the number of new lines is less than the specified threshold.
The process should run completely.
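The watcher ingests only the lines added since the last run and waits until enough of them have accumulated. The sketch below illustrates that batching logic under stated assumptions. watcher.sh itself uses git diff, whereas this sketch uses Python's difflib as a stand-in. The threshold name and value (MIN_NEW_LINES) are hypothetical.

```python
import difflib

# Hypothetical threshold; watcher.sh's actual value may differ.
MIN_NEW_LINES = 1000

def added_lines(previous, current):
    """Return the lines present in `current` but not in `previous` (a line diff)."""
    return [line[2:] for line in difflib.ndiff(previous, current)
            if line.startswith("+ ")]

def lines_to_ingest(previous, current, threshold=MIN_NEW_LINES):
    """Return the new lines if there are at least `threshold` of them,
    otherwise an empty list (halt and wait for more lines)."""
    new = added_lines(previous, current)
    return new if len(new) >= threshold else []
```

Here `previous` would be the snapshot of the log that was last ingested and `current` the live log. Waiting for a minimum batch size avoids starting a full ingestion run for every single appended audit record.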
du -h /var/lib/neo4j/data/databases/graph.db/
Now that the demo data has been ingested by the system, we can run queries on the data. The CLI is designed to be interactive.
python main.py 127.0.0.1:7687 neo4j $PASS
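The three arguments are the address of Neo4j's Bolt listener (7687 is Neo4j's default Bolt port), the username, and the password exported earlier. The following is a hypothetical sketch of how such arguments could be parsed and turned into a Bolt URI. It is not main.py's actual code. The commented driver call uses the official neo4j Python driver's `GraphDatabase.driver`.

```python
import argparse

def parse_cli_args(argv=None):
    """Parse `<host:port> <user> <password>` (hypothetical sketch, not main.py)."""
    parser = argparse.ArgumentParser(description="Interactive audit-log query CLI (sketch)")
    parser.add_argument("address", help="host:port of Neo4j's Bolt listener")
    parser.add_argument("user", help="Neo4j username")
    parser.add_argument("password", help="Neo4j password")
    args = parser.parse_args(argv)
    # The official driver expects a URI with a bolt:// scheme
    args.uri = "bolt://" + args.address
    return args

# With the official driver installed, a connection could then be opened with:
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver(args.uri, auth=(args.user, args.password))
```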
reducer/ folder in 1:
- Folder contains the Causality Preserving Reduction (CPR) algorithm we use to shrink the logs
- We modified the original source code to make it compatible with our log-ingestion pipeline
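CPR shrinks logs by merging repeated events between the same two entities, but only when merging cannot change any causal dependency. The sketch below is a simplified illustration of that idea and is not the modified code in reducer/. It assumes each event is an information flow from `src` to `dst` with a hypothetical `Event` record. A later identical event is folded into an earlier one unless, in between, something flowed into the source or out of the destination.

```python
from collections import namedtuple

# One information-flow event: data moves from `src` to `dst` during [start, end].
Event = namedtuple("Event", ["src", "dst", "op", "start", "end"])

def cpr_reduce(events):
    """Simplified causality-preserving reduction.

    `events` must be sorted by start time. A later event with the same
    (src, dst, op) is merged into the earlier one unless some event in
    between flowed INTO src or OUT OF dst.
    """
    reduced = []
    open_keys = {}  # (src, dst, op) -> index in `reduced` still eligible for merging
    for e in events:
        key = (e.src, e.dst, e.op)
        if key in open_keys:
            i = open_keys[key]
            # Extend the merged event's time window instead of storing a new event
            reduced[i] = reduced[i]._replace(end=e.end)
        else:
            open_keys[key] = len(reduced)
            reduced.append(e)
        # This event flows into e.dst and out of e.src, so it blocks merging for
        # any other open key whose source is e.dst or whose destination is e.src.
        for k in list(open_keys):
            if k != key and (k[0] == e.dst or k[1] == e.src):
                del open_keys[k]
    return reduced
```

For example, three consecutive reads of /etc/passwd by the same bash process collapse into one event. A fourth read that happens after vim writes to /etc/passwd stays separate, because the write may have changed the data it read.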
parser/ folder in 1:
- Folder contains the log parser, which takes in raw system call logs, strips out irrelevant fields (e.g. register values), and drops irrelevant system calls (e.g. execve).
- We modified the set of parameters that is returned and made the code compatible with our pipeline
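The parsing step can be illustrated with a minimal sketch. It splits an audit record into key=value fields, strips the register arguments a0 to a3, and discards records for unwanted system calls. It is not the repository's parser. It assumes interpreted records, such as those printed by `ausearch -i`, where the syscall field holds a name. Raw audit.log stores syscall numbers instead (for example, 59 for execve on x86_64). The drop list and function names are hypothetical.

```python
import shlex

# Register arguments of the syscall; not useful for provenance queries.
REGISTER_FIELDS = {"a0", "a1", "a2", "a3"}
# Hypothetical drop list; the README gives execve as an example of an irrelevant call.
DROP_SYSCALLS = {"execve"}

def parse_record(line):
    """Split one audit record into a dict of key=value fields (quotes removed)."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

def clean_records(lines):
    """Yield SYSCALL records with register fields stripped and dropped syscalls removed."""
    for line in lines:
        record = parse_record(line)
        if record.get("type") != "SYSCALL":
            continue
        if record.get("syscall") in DROP_SYSCALLS:
            continue
        yield dict((k, v) for k, v in record.items() if k not in REGISTER_FIELDS)
```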
cli/ and root directories, which include:
- Script to ingest data into Neo4j (upload.sh)
- Neo4j queries: ~20 queries that are relevant to system call log analysis (cli/query_functions.py)
- Command-line interface (cli/*)
- Git diff-based system to ingest only new logs into the database (watcher.sh)
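The query functions wrap Cypher queries that run against the ingested graph. The snippet below only illustrates what one parameterized query function might look like. The graph schema (Process and File nodes joined by a WROTE relationship with a time property) and the function name are hypothetical and are not taken from the repository. The function accepts any object with a `run(query, parameters)` method, which matches the signature of a neo4j driver `Session`.

```python
# Hypothetical schema: (:Process {exe})-[:WROTE {time}]->(:File {path})
FILES_WRITTEN_BY = (
    "MATCH (p:Process {exe: $exe})-[w:WROTE]->(f:File) "
    "RETURN f.path AS path, w.time AS time ORDER BY w.time"
)

def files_written_by(session, exe):
    """Return (path, time) pairs for every file written by processes running `exe`."""
    # Passing `exe` as a query parameter avoids building Cypher by string concatenation
    result = session.run(FILES_WRITTEN_BY, {"exe": exe})
    return [(record["path"], record["time"]) for record in result]
```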