We will be running Spark in single-node mode.
Prerequisites: None
Approximate time: 20 minutes
Please go through the 'screen' lab first. The instructor will provide details.
There is no 'install' step. Just unzip/untar and run :-) (Copy and paste the following commands into a terminal; do not include the $ in your commands.)
$ cd
$ rm -rf spark # cleanup existing spark installation (if any)
$ tar xvf files/spark-1.6.1-bin-hadoop2.6.tgz
$ mv spark-1.6.1-bin-hadoop2.6 spark
Now we have Spark installed in the ~/spark directory.
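As an optional sanity check, you can confirm the unpacked tarball has the expected top-level directories (bin, conf, sbin). The sketch below simulates the layout with a temporary directory so it is self-contained; on your lab machine, point SPARK_HOME at ~/spark instead.

```shell
# Simulate an unpacked Spark tarball in a temp dir (on the lab machine,
# set SPARK_HOME="$HOME/spark" instead of creating this mock layout).
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/bin" "$SPARK_HOME/conf" "$SPARK_HOME/sbin"

# Count how many of the expected directories are present.
found=0
for d in bin conf sbin; do
  [ -d "$SPARK_HOME/$d" ] && found=$((found + 1))
done
echo "found $found of 3 expected directories"
```

If any directory is missing, re-extract the tarball before continuing.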
$ ~/spark/sbin/start-all.sh
Verify Spark is running with the 'jps' command:
$ jps
Your output may look like this:
30624 Jps
30431 Master
30565 Worker
You will see the Master and Worker processes running. (Your process IDs, the first column, will probably differ.)
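The check above can also be scripted. The sketch below parses a sample of the jps output shown above so it is self-contained; on the lab machine you would pipe the real command instead, e.g. `jps | grep -cE 'Master|Worker'`.

```shell
# Sample jps output (PIDs from the example above; yours will differ).
jps_output="30624 Jps
30431 Master
30565 Worker"

# Count the lines naming the two Spark daemons we expect.
daemons=$(echo "$jps_output" | grep -cE 'Master|Worker')
echo "running Spark daemons: $daemons"
```

A count of 2 means both the Master and the Worker are up.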
The Spark UI will be at port 8080 of the host. In a browser, go to http://your_spark_host_address:8080 (be sure to use the 'public' IP address).
Bingo! Now we have Spark running.
You will see a screen similar to this screenshot.
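If you are working over SSH without a browser handy, you can check the UI from the command line instead. This is a sketch, assuming the UI is on localhost port 8080; substitute your host's address if checking from another machine.

```shell
# Probe the Spark UI port; -s silences progress, -f treats HTTP errors
# as failures, --max-time caps the wait at 5 seconds.
if curl -sf --max-time 5 http://localhost:8080 >/dev/null 2>&1; then
  msg="Spark UI is reachable on port 8080"
else
  msg="Spark UI not reachable (is Spark running on this host?)"
fi
echo "$msg"
```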
To explore:
- Are the Master and Worker running on the same node?
- Inspect the memory and CPU available to the Spark worker.
- Note the Spark master URI; it will be something like spark://host_name:7077. We will need this for later labs.
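Besides the UI, the master URI can be pulled out of the master's startup log. The sketch below greps a sample log line so it is self-contained; the exact log path on your machine is an assumption (Spark's standalone scripts write under ~/spark/logs by default).

```shell
# Sample line from a master startup log (hostname is hypothetical);
# on the lab machine you would grep ~/spark/logs/*Master*.out instead.
sample_log="16/05/01 10:00:00 INFO Master: Starting Spark master at spark://myhost:7077"

# Extract the spark:// URI token from the line.
master_uri=$(echo "$sample_log" | grep -oE 'spark://[^ ]+')
echo "$master_uri"
```

Save this URI; later labs will pass it to spark-submit and spark-shell.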
Do the following to update the labs to the latest version:
$ cd ~/spark-labs
$ git pull # this will update the labs to latest
Here is a virtual machine for you: https://s3.amazonaws.com/elephantscale-public/vm/CentOS.ova
It is in the OVA format, usable in both VMware and VirtualBox.
Password: spark
If you have to do a Windows install, here is the magic:
http://nishutayaltech.blogspot.com/2015/04/how-to-run-apache-spark-on-windows7-in.html
Set up winutils as described.
For example:
\projects
\projects\spark-labs
\projects\spark-1.4.1-bin-hadoop2.4
\projects\winutils\bin\winutils
Set the system environment variable:
HADOOP_HOME=\projects\winutils
With this in place, download spark-1.4.1-bin-hadoop2.4.
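The Windows layout and environment variable above can also be set from a command prompt. This is a sketch using the example paths from this section; adjust them to your own layout, and note that setx takes effect only in newly opened command prompts.

```bat
rem Create the example directory layout (paths from the example above).
mkdir \projects\winutils\bin
rem Place winutils.exe into \projects\winutils\bin as described
rem in the linked blog post.

rem Persist the environment variable for future sessions.
setx HADOOP_HOME \projects\winutils
```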