vagrant-geotrellis-mesos-spark

This is a Vagrant project that attempts to produce a local development environment for GeoTrellis, on top of Apache Spark, on top of Apache Mesos.

Local Development

A combination of Vagrant 1.5+, Ansible 1.6+, and the vagrant-hostmanager Vagrant plugin is used to set up the development environment for this project. It consists of the following virtual machines:

  • leader
  • follower01
  • follower02

The leader virtual machine is overloaded with a Mesos leader, Marathon, ZooKeeper, and an HDFS NameNode. The follower* virtual machines are Mesos followers, as well as HDFS DataNodes.

Use the following command to bring up a local development environment:

$ vagrant up

Note: This step may prompt you for a password so that the vagrant-hostmanager plugin can add records to the virtual machine host's /etc/hosts file.
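
Once provisioning finishes, vagrant status should report all three machines as running. The output below is illustrative and assumes the VirtualBox provider:

$ vagrant status
leader                    running (virtualbox)
follower01                running (virtualbox)
follower02                running (virtualbox)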

After provisioning is complete, you can view the web console for each service by navigating to the addresses below:

Service UIs

Service  | Port  | URL
-------- | ----- | -----------------------
Mesos    | 5050  | http://localhost:5050
Marathon | 8080  | http://localhost:8080
HDFS     | 50070 | http://localhost:50070
Accumulo | 50095 | http://localhost:50095
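
To check that a service is up without opening a browser, you can poke its HTTP endpoint from the host. The state.json endpoint shown here existed on Mesos masters of this era, but treat the exact path as an assumption for your Mesos version:

$ curl -s http://localhost:5050/master/state.json | head -c 200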

Caching

In order to speed things up, you may want to consider using a local caching proxy. The VAGRANT_PROXYCONF_ENDPOINT environment variable provides a way to supply a caching proxy endpoint for the virtual machines to use:

$ VAGRANT_PROXYCONF_ENDPOINT="http://192.168.96.10:8123/" vagrant up

Alternatively, you can install the vagrant-cachier plugin.
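
If you go the plugin route, vagrant-cachier installs like any other Vagrant plugin:

$ vagrant plugin install vagrant-cachier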

Testing

Testing the Mesos/Spark integration consists of running a few tasks in the spark-shell from the Mesos leader.

First, log in to the Mesos leader:

$ vagrant ssh leader

From there, set the following environment variables:

vagrant@leader:~$ export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
vagrant@leader:~$ export MASTER=mesos://zk://zookeeper.service.geotrellis-spark.internal:2181/mesos
vagrant@leader:~$ export SPARK_EXECUTOR_URI="http://d3kbcqa49mib13.cloudfront.net/spark-1.2.1-bin-cdh4.tgz"
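
Before downloading Spark, it is worth confirming that the native Mesos library referenced above actually exists on the leader; if your build placed it elsewhere, adjust MESOS_NATIVE_LIBRARY accordingly:

vagrant@leader:~$ ls -l $MESOS_NATIVE_LIBRARY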

Next, download and extract the Spark 1.2.1 distribution for CDH4 locally:

vagrant@leader:~$ wget $SPARK_EXECUTOR_URI
vagrant@leader:~$ tar xzf spark-1.2.1-bin-cdh4.tgz

From here we can launch the spark-shell and run the test program:

vagrant@leader:~$ ./spark-1.2.1-bin-cdh4/bin/spark-shell
scala> val data = 1 to 10000
scala> val distData = sc.parallelize(data)
scala> distData.filter(_ < 10).collect()

If all goes well, you should see Spark distribute pieces of the filter across the follower* virtual machines.
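
As an additional smoke test, you can run a small reduction over the same RDD from the spark-shell session above. The expected value is plain arithmetic (twice the sum of 1 through 10000), not captured output:

scala> distData.map(_ * 2).reduce(_ + _)   // expect 100010000 (= 2 * 50005000)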
