
LIVY-246 Support multiple Spark home in runtime #318

Open · wants to merge 6 commits into base: master

Conversation

ayushiagarwal

Made code changes to enable multiple Spark versions on Livy for Spark batch jobs. Users can pass sparkVersion at runtime for batch jobs, and SPARK_HOME and SPARK_CONF_DIR will be set according to the given sparkVersion.

The path for livy.server.spark-home_$version needs to be set in livy.conf; only the versions added there will be supported.

Assumption:

  • bin/livy-env.sh will continue to have SPARK_HOME and SPARK_CONF_DIR, which will be used as the default Spark version for batch and interactive sessions
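A minimal, self-contained sketch of the resolution order this PR describes (hypothetical stand-ins: a plain Scala Map replaces LivyConf and its entries, and the key names are illustrative only):

```scala
// Hypothetical sketch of version-aware SPARK_HOME resolution. A plain Map
// stands in for LivyConf; the real PR reads livy.conf entries instead.
object SparkHomeResolver {
  def sparkHome(conf: Map[String, String],
                env: Map[String, String],
                version: Option[String]): Option[String] =
    version match {
      // A requested version must have a configured home, else it is unsupported.
      case Some(v) =>
        conf.get(s"livy.server.spark-home-$v").orElse(
          throw new IllegalArgumentException(s"Spark version: $v is not supported"))
      // No version requested: fall back to the default home, then the env var.
      case None =>
        conf.get("livy.server.spark-home").orElse(env.get("SPARK_HOME"))
    }
}
```

This mirrors the behavior discussed below: an unknown version fails fast, while omitting the version preserves the old default lookup.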

@codecov-io

codecov-io commented Apr 26, 2017

Codecov Report

Merging #318 into master will decrease coverage by 2.62%.
The diff coverage is 62.5%.


@@             Coverage Diff              @@
##             master     #318      +/-   ##
============================================
- Coverage     70.46%   67.84%   -2.63%     
+ Complexity      713      686      -27     
============================================
  Files            96       96              
  Lines          5079     5091      +12     
  Branches        754      756       +2     
============================================
- Hits           3579     3454     -125     
- Misses          995     1156     +161     
+ Partials        505      481      -24
Impacted Files Coverage Δ Complexity Δ
...loudera/livy/server/batch/CreateBatchRequest.scala 100% <100%> (ø) 18 <1> (+1) ⬆️
.../com/cloudera/livy/server/batch/BatchSession.scala 85.18% <100%> (-2%) 11 <0> (-2)
...er/src/main/scala/com/cloudera/livy/LivyConf.scala 87.96% <50%> (-6.44%) 15 <4> (-1)
.../com/cloudera/livy/utils/SparkProcessBuilder.scala 53.84% <50%> (-3.3%) 11 <1> (-1)
...in/scala/com/cloudera/livy/server/LivyServer.scala 2% <0%> (-32.67%) 2% <0%> (-5%)
.../main/scala/com/cloudera/livy/utils/SparkApp.scala 61.53% <0%> (-19.24%) 1% <0%> (ø)
...in/java/com/cloudera/livy/rsc/ContextLauncher.java 67.16% <0%> (-13.44%) 14% <0%> (-5%)
.../com/cloudera/livy/utils/LineBufferedProcess.scala 60% <0%> (-13.34%) 5% <0%> (-2%)
...a/livy/server/interactive/InteractiveSession.scala 54.63% <0%> (-9.27%) 34% <0%> (-8%)
...ore/src/main/scala/com/cloudera/livy/Logging.scala 63.63% <0%> (-9.1%) 0% <0%> (ø)
... and 19 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7212e3f...da6edff. Read the comment docs.

@ajbozarth (Contributor) left a comment


I'm not the best to review how well this accomplishes its purpose, but I've given config-related feedback where I can.

version.map {version => Option(get(s"livy.server.spark-home_$version"))
.orElse(throw new IllegalArgumentException(
s"Spark version: $version is not supported"))
}.getOrElse(Option(get(s"livy.server.spark-home")).orElse(sys.env.get("SPARK_HOME")))
Contributor:

I think we should still use SPARK_HOME here as before, not livy.server.spark-home directly.

# livy.server.spark-home_1.5.2=<path>/spark-1.5.2
# livy.server.spark-home_1.6.3=<path>/spark-1.6.3
# livy.server.spark-home_2.0.1=<path>/spark-2.0.1
# livy.server.spark-home_2.1.0=<path>/spark-2.1.0
Contributor:

The _ should be a - to match the conf style.

val builder = new SparkProcessBuilder(livyConf)
val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
request.sparkVersion.map({ value =>
builder.env("SPARK_CONF_DIR", livyConf.sparkHome(request.sparkVersion)
Contributor:

So I'm still figuring out the ins and outs of Livy, but I'm not quite sure why SPARK_CONF_DIR needs to be updated here. Where is it used?


Hi @ajbozarth, if we don't set SPARK_CONF_DIR and it is already set in the environment, the older value takes precedence, so a Spark version change might not take effect.
For example, on the Hortonworks Distribution, SPARK_CONF_DIR/spark-defaults.conf sets "spark.yarn.jar" to "hdfs:///hdp/spark-assembly.1.6.2.jar"; if we don't overwrite SPARK_CONF_DIR, that spark-assembly jar will always be used regardless of the chosen Spark version.

@@ -246,14 +246,20 @@ class LivyConf(loadDefaults: Boolean) extends ClientConf[LivyConf](null) {
def sparkDeployMode(): Option[String] = Option(get(LIVY_SPARK_DEPLOY_MODE)).filterNot(_.isEmpty)

/** Return the location of the spark home directory */
def sparkHome(): Option[String] = Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
def sparkHome(version: Option[String] = None): Option[String] = {
version.map {version => Option(get(s"livy.server.spark-home_$version"))
Contributor:

I think we should abstract livy.server.spark-home_$version out here into a def like the vals above. Something along the lines of:
def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)

@ayushiagarwal (Author)

Fixed the changes suggested. Please review

@ajbozarth (Contributor) left a comment


Thanks for the updates, added some follow up comments

@@ -245,15 +245,25 @@ class LivyConf(loadDefaults: Boolean) extends ClientConf[LivyConf](null) {
/** Return the spark deploy mode Livy sessions should use. */
def sparkDeployMode(): Option[String] = Option(get(LIVY_SPARK_DEPLOY_MODE)).filterNot(_.isEmpty)

/** Return the spark home version */
def SPARK_HOME_VER(version: String): Option[String] = {
Contributor:

I was thinking more along the lines of adding
def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)
to the LivyConf object above as a def similar to the vals, then doing Option(get(SPARK_HOME_VER(version))), on which you could then do an orElse throw Exception inside the sparkHome def as before.

Author:

If I declare def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null), then I can't pass it inside def sparkHome(version: Option[String] = None): Option[String] as

  • version.map { version => Option(get(SPARK_HOME_VER(version))).orElse(throw ...) }, since map will not accept null values
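The null-default concern can be checked in isolation. A minimal sketch, with a hypothetical get that returns null for missing keys (as an Entry with a null default would):

```scala
// Sketch: wrapping a possibly-null lookup in Option(...) turns null into
// None, so chaining map/orElse afterwards is safe. `get` and `lookup` are
// hypothetical stand-ins, not the real LivyConf methods.
def get(key: String): String =
  Map("livy.server.spark-home-2.1.0" -> "/opt/spark-2.1.0").getOrElse(key, null)

def lookup(version: String): Option[String] =
  Option(get(s"livy.server.spark-home-$version"))
```

Here a missing key yields None rather than Some(null), which is what makes the Option(get(...)) pattern discussed above workable.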

Contributor:

If I'm understanding you correctly, that's where my comment below would fix it. I'll edit the comment below to clarify it.

/** Return the location of the spark home directory */
def sparkHome(): Option[String] = Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
def sparkHome(version: Option[String] = None): Option[String] = {
version.map {version => SPARK_HOME_VER(version)
@ajbozarth (Contributor) commented Apr 28, 2017:

On a second look it may be better to use a case here. Assuming the SPARK_HOME_VER def I suggested above, I recommend something like:

version match {
   case Some(version) => Option(get(SPARK_HOME_VER(version))).orElse(throw ...)
   case None => Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
}

where the None case is just the old code. This would be easier to read overall.

@ayushiagarwal (Author)

Fixed the suggested changes. Please review

@ayushiagarwal (Author)

Moved the version in with the rest of the Entrys and also added code to set up SPARK_CONF_DIR. We need to set SPARK_CONF_DIR, otherwise spark-submit will take the SPARK_CONF_DIR from the default environment, so it is set according to the Spark version.

@ajbozarth (Contributor) left a comment


Thanks for the fixes. I found one nit and made a comment on the SPARK_CONF_DIR code you added (I understand why it's needed now)

@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/usr/bin/env bash
Contributor:

I think this was added by accident

@@ -62,7 +63,11 @@ object BatchSession {
request.conf, request.jars, request.files, request.archives, request.pyFiles, livyConf))
require(request.file != null, "File is required.")

val builder = new SparkProcessBuilder(livyConf)
val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
request.sparkVersion.map({ value =>
Contributor:

You don't need maps here, this should work fine as:

val sparkConfDir = livyConf.sparkHome(request.sparkVersion).map(_ + File.separator + "conf")
builder.env("SPARK_CONF_DIR", sparkConfDir)

Author:

I think we will still need a map in builder.env, since sparkConfDir is an Option and the builder.env declaration needs a String. Agreed we can remove the first map of request.sparkVersion. Will make the changes and deploy.
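The agreed shape can be sketched with a hypothetical builder stand-in (the real SparkProcessBuilder.env takes a String, which is why the Option must be unwrapped with foreach rather than passed directly):

```scala
import java.io.File
import scala.collection.mutable

// Hypothetical stand-in for SparkProcessBuilder: env(key, value) takes a
// plain String, so an Option[String] cannot be passed to it directly.
class EnvBuilder {
  val vars = mutable.Map.empty[String, String]
  def env(key: String, value: String): Unit = vars(key) = value
}

// Derive the conf dir from the resolved home; set the env var only when
// a home was actually resolved (foreach is a no-op on None).
def setSparkConfDir(builder: EnvBuilder, sparkHome: Option[String]): Unit =
  sparkHome
    .map(_ + File.separator + "conf")
    .foreach(builder.env("SPARK_CONF_DIR", _))
```

With no version-specific home resolved, the env var is simply left unset and the default environment applies.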

@ayushiagarwal (Author)

Fixed the suggested changes in SPARK_CONF_DIR. Please review.

@ajbozarth (Contributor) left a comment

LGTM

@jerryshao (Contributor)

What's the difference compared to #232?

@ajbozarth (Contributor)

@ayushiagarwal, do you want to reopen this on apache/incubator-livy and address @jerryshao's concern about his similar PR?

@karthikknnatarajan

@ayushiagarwal @ajbozarth
We are looking for such a feature in Livy. Any idea when this will be merged into master? I can help get this change merged in.

@ajbozarth (Contributor)

If @ayushiagarwal lets this continue to sit idle for another week, I think it would be OK for you to reopen this on apache/incubator-livy yourself, @karthikknnatarajan.
No promises on whether it gets merged though; @jerryshao has an alternate implementation (#232) that he hasn't moved over either.

@karthikknnatarajan

Sounds good. I will drop a note here before I create a new PR on apache/incubator-livy.

6 participants