Create an Elasticsearch cluster in Azure, host MongoDB, get data from Traackr, and do something cool with maps. Scale up.
##Setup servers
- Create a GitHub repo.
- Go to the Azure portal and create some nodes named cm-es-9200.cloudapp.net through cm-es-9204.cloudapp.net, so that each node name matches the port you're going to use.
- Set up MongoDB on the first node.
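
The naming scheme above (one hostname per port, 9200 through 9204) can be generated rather than typed by hand. This is only a minimal sketch: the helper name `node_endpoints` and the constants are hypothetical, and it assumes each node listens on the port embedded in its name.

```python
# Hypothetical helper: derive the node hostnames from the port range.
BASE_PORT = 9200   # first port used in the naming scheme
NODE_COUNT = 5     # cm-es-9200 through cm-es-9204


def node_endpoints(base_port=BASE_PORT, count=NODE_COUNT):
    """Return (hostname, port) pairs where each hostname embeds its port."""
    return [
        (f"cm-es-{port}.cloudapp.net", port)
        for port in range(base_port, base_port + count)
    ]


endpoints = node_endpoints()
for host, port in endpoints:
    print(f"{host}:{port}")
```

Keeping the list in one place makes it easier to feed the same hostnames into whatever client or config file needs them later.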
##Mongo DB
Installation guide: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-red-hat-centos-or-fedora-linux/

Create a /etc/yum.repos.d/mongodb.repo file to hold the following configuration information for the MongoDB repository:

    [mongodb]
    name=MongoDB Repository
    baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
    gpgcheck=0
    enabled=1

Then become root with "sudo su -" and install the packages:

    yum install mongo-10gen mongo-10gen-server

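
A malformed repo file is an easy mistake at this step. As a rough sketch, the repository definition can be sanity-checked with Python's standard `configparser` module. This assumes the file is plain INI, which is only an approximation of how yum itself reads it; the helper name `load_repo` is hypothetical.

```python
import configparser

# Copy of the repository definition from this guide.
REPO_TEXT = """\
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
"""


def load_repo(text, section="mongodb"):
    """Parse INI-style repo text and return the requested section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser[section]


repo = load_repo(REPO_TEXT)
print("baseurl:", repo["baseurl"])
print("enabled:", repo.getboolean("enabled"))
print("gpgcheck:", repo.getboolean("gpgcheck"))
```

To check the real file instead of the embedded copy, read /etc/yum.repos.d/mongodb.repo and pass its contents to `load_repo`.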
Connect to the shared Mongo instance and run a couple of queries:

    mongo cluster-7-data-00.sl.hackreduce.net:28953/traackr
    > db.posts.find()
    > db.influencers.find()

To install Mongo on Windows, download the 64-bit version, unpack the zip file, and put the Mongo bin directory in your PATH.

Build and run the Mongo loader:

    mvn package
    mvn assembly:assembly
    cluster-7-data-00.sl.hackreduce.net:28953
    cd target
    java -jar hackday*
    java -jar hackday-mongo-loader.jar -c posts -d traackr -m cluster-7-data-00.sl.hackreduce.net:28953
    cd java/mondo-data
    mvn package; mvn assembly:assembly; java -jar target/hackday-mongo-loader.jar -c influencers -d traackr -m cluster-7-data-00.sl.hackreduce.net:28953 -o 10

##Project Home
[here](https://github.com/depahelix/es-hack-1000)
##Goal 1
Clone the project and build it with Maven:

    git clone https://github.com/hackreduce/elasticsearch-hackathon

See: [elasticsearch-hackathon](https://github.com/hackreduce/elasticsearch-hackathon)
##Goal 2
Index some data.
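
One common way to index many documents at once is the Elasticsearch `_bulk` endpoint, which takes a newline-delimited body: an action line followed by the document source, one pair per document. The sketch below only builds that body with the standard library and sends nothing. The function name `bulk_index_body`, the index name "posts", the type name "post", and the "title" field are all illustrative assumptions; `_type` is included optionally because older Elasticsearch versions expect it.

```python
import json


def bulk_index_body(index, docs, doc_type=None):
    """Serialize docs into the newline-delimited body of a _bulk request."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # needed by older Elasticsearch versions
        if "_id" in doc:
            meta["_id"] = str(doc["_id"])
        # The id travels in the action line, so keep it out of the source.
        source = {key: value for key, value in doc.items() if key != "_id"}
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample documents.
body = bulk_index_body(
    "posts",
    [{"_id": 1, "title": "hello"}, {"title": "no id"}],
    doc_type="post",
)
print(body, end="")
```

The resulting string can be POSTed to the cluster's `/_bulk` endpoint with any HTTP client or with one of the Elasticsearch client libraries.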
##Section 3
hack/reduce elasticsearch sept 2013
ElasticSearch Hackathon Material
All attendees:
- Git and git client (to download or share code)
- A GitHub account (to share your creations)
- Text editor or IDE of choice
- Either the native Java Client (see the provided skeleton Java ES Client project), or an ElasticSearch client for the language of your choice: http://www.elasticsearch.org/guide/clients/
It's recommended that you download and play with Elasticsearch locally if only to get familiar with the basic commands.
http://www.elasticsearch.org/guide/reference/setup/installation/
- Loaded on the Elasticsearch cluster on cluster-7-slave-00.sl.hackreduce.net (a visual representation of the cluster can be seen through the ElasticSearch Head Plugin)
- Two indices are available:
  - wikipedia: collection of English Wikipedia articles and tweets. About 13 million records. Mapping: https://gist.github.com/imotov/5169928
  - enron: collection of emails from the Enron Email Dataset. About 0.5 million records. Mapping: https://gist.github.com/imotov/5169937
This data is loaded in MongoDB so that you can re-index it into ES in any way you find interesting:
- Loaded on Mongo instance on cluster-7-data-00.sl.hackreduce.net
- Mongo URI: mongodb://cluster-7-data-00.sl.hackreduce.net:28953
- Database name: traackr
- Two collections are available:
  - posts: collection of articles and tweets. About 23 million records. JSON data structure: https://gist.github.com/gpstathis/5170137
  - influencers: collection of authors corresponding to the articles in the "posts" collection. About 85K records. JSON data structure: https://gist.github.com/gpstathis/5170171
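
With about 23 million posts, re-indexing one document per request is likely to be slow, so it usually helps to read the collection in fixed-size batches. The following is a minimal sketch of lazy batching over any iterable, such as a Mongo cursor; the helper name `batches` and the sample data are hypothetical, and a sensible batch size depends on the cluster.

```python
from itertools import islice


def batches(records, size):
    """Yield lists of at most `size` items from any iterable, lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in for a database cursor: a generator of seven fake records.
sample = ({"n": i} for i in range(7))
sizes = [len(chunk) for chunk in batches(sample, 3)]
print(sizes)
```

Because the helper never materializes the whole input, memory use stays bounded by the batch size.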
- Plugin Directory
- Native Script
- Analysis - https://github.com/elasticsearch/elasticsearch-analysis-icu, https://github.com/spinscale/elasticsearch-opennlp-plugin
- River
- REST API
- Script Facets
- CSV data loader (Ruby)
- JSON data loader (Ruby)
- CSV data loader (Perl)
- JSON data loader (Clojure)
- Enron data loader (Python)
- Two skeleton projects are available to get you up and running right away: Java or Python
- Using the Java driver
- Java Driver Examples Code
- Using the Python driver
- Python driver tutorial
- How to connect to the Hack/Reduce MongoDB Shell via local client:
  - Install MongoDB in your local environment
    - Ubuntu / Debian:

          sudo apt-get update; sudo apt-get install mongodb

    - Fedora / RedHat:

          sudo yum install mongodb

  - Test if installed successfully:

        mongo --version

  - Connect to Mongo instance on cluster-7-data-00.sl.hackreduce.net:

        mongo cluster-7-data-00.sl.hackreduce.net:28953/traackr

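
The connection steps can also be scripted. As a small sketch, the snippet below assembles the `mongo` shell command from the host, port, and database listed in this document and checks whether the shell is on the PATH. It does not open a connection; the function name `mongo_shell_command` is hypothetical.

```python
import shutil

# Connection details as listed in this document.
HOST = "cluster-7-data-00.sl.hackreduce.net"
PORT = 28953
DATABASE = "traackr"


def mongo_shell_command(host=HOST, port=PORT, database=DATABASE):
    """Return the argv list for opening the mongo shell on the database."""
    return ["mongo", f"{host}:{port}/{database}"]


command = mongo_shell_command()
print(" ".join(command))

# Only report whether the shell is installed; nothing is executed here.
if shutil.which("mongo") is None:
    print("mongo shell not found on PATH; install MongoDB first")
```

Passing the returned list to `subprocess.run` would start the shell, assuming the instance is still reachable.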