GitHub - ycombinator/es-enron: Elasticsearch demo using Enron email dataset

Prerequisite

Download dataset.tgz from here into the same folder where you cloned this repository.

Preparation

The dataset.tgz file contains an archive of all Enron emails, de-duplicated and parsed into JSON files. Each JSON file in the archive represents one email message.
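Because each file holds one message, the extracted archive can be inspected with a few lines of Node.js. The following is a minimal sketch, not part of the repository: the function name `readEmails` and the directory argument are assumptions, and the field names inside each JSON file are not documented here, so the sketch only parses the files without assuming any schema.

```javascript
const fs = require("fs");
const path = require("path");

// Read every .json file directly inside `dir` and parse it as one email.
// `dir` is a placeholder: pass whatever folder extracting dataset.tgz created.
function readEmails(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .sort() // stable, name-ordered result
    .map((name) => ({
      file: name,
      email: JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")),
    }));
}
```

Calling `Object.keys(readEmails(someDir)[0].email)` on the extracted folder is a quick way to see which fields the parsed messages actually carry.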

The compressed dataset is 252MB. Uncompressed into individual JSON files, it grows to 1.3GB.

Install Node.js, MySQL, and Elasticsearch. Make sure MySQL and Elasticsearch are running.
Uncompress the archive.

tar xvf dataset.tgz

Load the emails into Elasticsearch.

npm install   # if you haven't run this already
./load_into_elasticsearch.sh

Load the emails into MySQL.

./load_into_mysql.sh
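The two load scripts above do the real work, and their internals are not reproduced here. As a rough illustration of what loaders like these typically produce, the sketch below turns parsed email objects into the two line-oriented formats involved: an NDJSON body for the Elasticsearch bulk API, and tab-separated rows suitable for MySQL's LOAD DATA with its default field and line terminators. The function names, the index name `enron`, and the column list are assumptions made for the example, not values taken from the repository.

```javascript
// Build an Elasticsearch _bulk request body (NDJSON): one action line
// followed by one document line per email, each terminated by "\n".
// The index name "enron" is an assumption, not taken from the repository.
function toBulkBody(emails, index = "enron") {
  return emails
    .map(
      (email) =>
        JSON.stringify({ index: { _index: index } }) +
        "\n" +
        JSON.stringify(email) +
        "\n"
    )
    .join("");
}

// Turn one email into a tab-separated row for MySQL's LOAD DATA defaults
// (fields terminated by tab, lines by newline, backslash as escape).
// `columns` is a hypothetical list of field names to export, in order.
function toTsvRow(email, columns) {
  return columns
    .map((col) => {
      const value = email[col];
      if (value === null || value === undefined) return "\\N"; // read back as NULL
      return String(value)
        .replace(/\\/g, "\\\\") // escape backslashes first
        .replace(/\t/g, "\\t")
        .replace(/\n/g, "\\n")
        .replace(/\r/g, "\\r");
    })
    .join("\t");
}
```

Escaping backslashes before tabs and newlines matters: doing it afterwards would double the backslashes that the later replacements introduce. Older Elasticsearch versions may also expect a `_type` in the action line, so treat the action format here as a starting point.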

Appendix

The original Enron email dataset was taken from https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz. This is an archive of all Enron emails in EML format, where each file represents one email message. Some of these messages are duplicated in multiple files.

The parse_email_files.js script will parse the original Enron email dataset into JSON files, after de-duplicating them.
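How parse_email_files.js decides that two files hold the same message is not described here. One common approach, shown below purely as a sketch under that caveat, is to hash a few identifying fields and keep only the first message seen for each hash. The function name `dedupe` and the field names `from`, `date`, `subject`, and `body` are hypothetical.

```javascript
const crypto = require("crypto");

// Keep the first message seen for each distinct content key.
// The key is a SHA-256 hash over a few fields; the field names are
// hypothetical and may not match what parse_email_files.js uses.
function dedupe(messages) {
  const seen = new Set();
  const unique = [];
  for (const msg of messages) {
    const key = crypto
      .createHash("sha256")
      .update([msg.from, msg.date, msg.subject, msg.body].join("\u0000"))
      .digest("hex");
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(msg);
    }
  }
  return unique;
}
```

Joining the fields with a NUL character keeps adjacent values from running together, so two messages whose fields differ only in where one value ends and the next begins still produce different keys.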

The included dataset.tgz file is an archive of exactly these JSON files.
