LIVY-239: Moving the logic to generate session IDs from Session Manager to SessionStore #220

Open
wants to merge 2 commits into
base: master

meisam
Contributor

@meisam meisam commented Nov 3, 2016

Summary of changes:

  • Removes the AtomicCounter that is used in SessionManager to generate session IDs.
  • Generates the session IDs in SessionStore instead of SessionManager.
  • Writes the newly generated session ID to the store each time nextSessionId is called.

Task-url: https://issues.cloudera.org/browse/LIVY-239

@codecov-io

codecov-io commented Nov 3, 2016

Current coverage is 71.78% (diff: 54.54%)

Merging #220 into master will increase coverage by 0.25%

@@             master       #220   diff @@
==========================================
  Files            91         89     -2   
  Lines          4697       4516   -181   
  Methods           0          0          
  Messages          0          0          
  Branches        811        764    -47   
==========================================
- Hits           3360       3242   -118   
+ Misses          861        836    -25   
+ Partials        476        438    -38   

Powered by Codecov. Last update 69ac11e...eb4ab11

.map(_.nextSessionId).getOrElse(0)
store.set(sessionManagerPath(sessionType), SessionManagerState(nextSessionId + 1))
nextSessionId
Contributor

IIUC, the nextSessionId you return is the value that was previously persisted, and it is not incremented before being returned.

Also, every time you get a session ID you fetch it from persistent storage. I think this is unnecessary and adds extra overhead.

Contributor Author

@meisam meisam Nov 3, 2016

@jerryshao The Scaladoc comments are a bit misleading. The value stored in the session store is the next unused session ID.
Performance-wise, this change calls the session store as many times as the previous code. The only difference is that previously SessionManager called SessionStore.set, and now SessionStore calls it itself.
Please take a look at SessionManager (lines 81 to 84) in 5e8474e.

Contributor

Sorry about the poor doc. My bad. Would you mind fixing it?
How about fixing the doc and renaming val nextSessionId = store.get to lastUsedSessionId?

Contributor Author

I have to correct myself. Prior to this commit, there was one read from SessionStore on recovery or on server start. There was also one write per nextId() call. Now we have one read and one write to SessionStore per nextId() call.
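
The difference in store traffic can be illustrated with a small sketch. Everything here is hypothetical: `CountingStore` is a stand-in for the real state store that only counts its own reads and writes, and the two generator classes are simplified versions of the "before" and "after" behaviour described above, not the actual Livy code.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical stand-in for a state store that counts its own reads and writes.
class CountingStore {
  private val data = mutable.Map.empty[String, Int]
  var reads = 0
  var writes = 0

  def get(key: String): Option[Int] = { reads += 1; data.get(key) }
  def set(key: String, value: Int): Unit = { writes += 1; data(key) = value }
}

// Before: the counter lives in memory. The store is read once at startup
// and written once per generated ID.
class InMemoryCounterIds(store: CountingStore, key: String) {
  private val counter = new AtomicInteger(store.get(key).getOrElse(0))

  def nextId(): Int = {
    val id = counter.getAndIncrement()
    store.set(key, id + 1) // persist the next unused ID
    id
  }
}

// After: every call reads the next unused ID from the store
// and writes back its successor.
class StoreBackedIds(store: CountingStore, key: String) {
  def nextId(): Int = synchronized {
    val id = store.get(key).getOrElse(0)
    store.set(key, id + 1)
    id
  }
}
```

Generating three IDs costs one read and three writes with the in-memory counter, and three reads and three writes with the store-backed version.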

Contributor

I think this is necessary for HA consistency.
Can we come up with a smart way to pay this price only in HA mode and not in recovery mode?

Contributor Author

These are my initial thoughts on it.
We can refactor the logic that generates session IDs into a separate trait/class with two implementations: one for single-node and one for multi-node. The single-node mode can simply use a local atomic value, but the multi-node mode needs a more sophisticated implementation to guarantee uniqueness and consistency across all nodes.
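
A rough sketch of what such a split might look like. The trait and class names are illustrative assumptions, not part of Livy, and the multi-node variant simply takes the shared "fetch and increment" primitive as a function rather than committing to a particular backend.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical interface; the name is illustrative and not part of Livy.
trait SessionIdGenerator {
  def nextId(): Int
}

// Single-node mode: a local atomic counter is enough, seeded with the
// next unused ID recovered from the store at startup.
class LocalSessionIdGenerator(firstUnusedId: Int = 0) extends SessionIdGenerator {
  private val counter = new AtomicInteger(firstUnusedId)
  override def nextId(): Int = counter.getAndIncrement()
}

// Multi-node mode: delegates to an atomic "fetch and increment" primitive
// shared by all servers (for example a ZooKeeper-backed counter).
class SharedSessionIdGenerator(fetchAndIncrement: () => Long) extends SessionIdGenerator {
  // toIntExact throws instead of silently truncating if the counter outgrows Int.
  override def nextId(): Int = Math.toIntExact(fetchAndIncrement())
}
```

Callers would depend only on `SessionIdGenerator`, so the choice between the two modes could be made once at server startup.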

Contributor

I think we are thinking along the same lines, with one difference: you are adding the session ID generation to SessionStore, while I'm thinking about adding it to StateStore.

I would prefer to keep HA logic out of SessionStore and move this logic into ZookeeperStateStore. Having a single code path in SessionStore makes isolating HA-related changes easier.
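
One way to picture this layering is sketched below. The classes are simplified stand-ins that reuse the names from this discussion; the `nextValue` method and the key format are assumptions made for illustration, and the real StateStore interface may look different.

```scala
import scala.collection.mutable

// Simplified stand-in; `nextValue` is a hypothetical addition to the interface.
abstract class StateStore {
  // Atomically returns the current counter value at `key` and stores value + 1.
  def nextValue(key: String): Long
}

// Non-HA backend: a lock within one JVM is sufficient.
class InMemoryStateStore extends StateStore {
  private val counters = mutable.Map.empty[String, Long]

  override def nextValue(key: String): Long = synchronized {
    val current = counters.getOrElse(key, 0L)
    counters(key) = current + 1
    current
  }
}

// SessionStore keeps a single code path, whatever backend is plugged in.
class SessionStore(store: StateStore) {
  def getNextSessionId(sessionType: String): Int =
    Math.toIntExact(store.nextValue(s"$sessionType/nextSessionId")) // illustrative key
}
```

An HA backend would then only need to provide its own `nextValue`, for example on top of a ZooKeeper recipe, without any change to SessionStore.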

@alex-the-man
Contributor

Is this done for implementing HA?

@meisam
Contributor Author

meisam commented Nov 3, 2016

@tc0312 Yes. This change is needed to implement multi-node HA.

* If no SessionManagerState is stored, it returns 0.
* It saves the new session ID to the session store.
Contributor

Nit: Extra space here.

- def getNextSessionId(sessionType: String): Int = {
-   store.get[SessionManagerState](sessionManagerPath(sessionType))
+ def getNextSessionId(sessionType: String): Int = synchronized {
+   val nextSessionId = store.get[SessionManagerState](sessionManagerPath(sessionType))
Contributor

@alex-the-man alex-the-man Nov 3, 2016

Reviewing this with HA in mind: I think it's possible for another livy-server instance to update SessionManagerState between the store.get() and store.set() calls.
I think we should add locking, or optimistic locking plus retry, here to avoid servers stepping on each other. Or better, we could change the interface of StateStore to make use of sequence nodes in ZooKeeper.
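
The optimistic locking plus retry option could look roughly like the sketch below. `VersionedStore` is a hypothetical in-memory model: its `writeIfUnchanged` stands in for a conditional write such as a ZooKeeper update that carries an expected version, and none of these names exist in Livy.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical versioned entry: the stored value plus a version number.
final case class Versioned(value: Int, version: Int)

// In-memory model of a store that supports conditional writes.
class VersionedStore {
  private val state = new AtomicReference(Versioned(0, 0))

  def read(): Versioned = state.get()

  // Succeeds only if the entry is still exactly the one that was read.
  def writeIfUnchanged(expected: Versioned, newValue: Int): Boolean =
    state.compareAndSet(expected, Versioned(newValue, expected.version + 1))
}

object OptimisticIds {
  // Reads the next unused ID, then writes back its successor only if nobody
  // else changed the entry in between; otherwise reads again and retries.
  @tailrec
  def nextId(store: VersionedStore, retriesLeft: Int = 10): Option[Int] =
    if (retriesLeft <= 0) None
    else {
      val seen = store.read()
      if (store.writeIfUnchanged(seen, seen.value + 1)) Some(seen.value)
      else nextId(store, retriesLeft - 1)
    }
}
```

A write based on a stale read is rejected instead of overwriting a newer value, so two servers racing between the read and the write cannot hand out the same ID.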

Contributor Author

That's right! For multi-node HA, I am overriding this method in ZooKeeperStore and using the distributed atomic long recipe (https://curator.apache.org/curator-recipes/distributed-atomic-long.html) to guarantee that session IDs are unique and are generated and stored atomically. For single-node HA, synchronizing on the session store instance would suffice.

Contributor

With your ZooKeeperStore change, who's doing nextSessionId + 1?

Contributor Author

SessionStore calls into the distributed atomic long recipe from Apache Curator.
This is how I am doing it:

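  import java.io.IOException
  import scala.annotation.tailrec
  import org.apache.curator.framework.recipes.atomic.DistributedAtomicLong

  // curatorClient, zkPath, retryPolicy, logger and MAX_RETRY come from the enclosing class.
  val distributedSessionId = new DistributedAtomicLong(curatorClient, zkPath, retryPolicy)

  def nextBatchSessionId: Int = {
    recursiveTry(distributedSessionId, MAX_RETRY) match {
      case Some(sessionId) =>
        sessionId.toInt
      case None =>
        val msg = "Failed to get the next session id from Zookeeper"
        logger.warn(msg)
        throw new IOException(msg)
    }
  }

  // Tries the atomic increment up to retryCount times. On success, returns the value
  // the counter held before the increment, which is the ID to hand out.
  @tailrec
  private def recursiveTry(counter: DistributedAtomicLong, retryCount: Int): Option[Long] = {
    if (retryCount <= 0) {
      // Check the budget before incrementing so a successful increment is never discarded.
      None
    } else {
      val updatedValue = counter.increment()
      if (updatedValue.succeeded()) {
        Some(updatedValue.preValue().longValue())
      } else {
        recursiveTry(counter, retryCount - 1)
      }
    }
  }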

Contributor

Is this code in SessionStore or in a subclass of SessionStore?

Contributor

Do you mean ZooKeeperStateStore?

Contributor Author

Yes, it is in ZooKeeperStateStore. Sorry!

Contributor

Did you add a new method to StateStore to generate a unique ID?

Contributor Author

Yes I did. But as you said in your comment, moving the logic out of SessionStore makes a lot more sense.

> I would prefer to keep HA logic out of SessionStore and move this logic into ZookeeperStateStore. Having a single code path in SessionStore makes isolating HA-related changes easier.

Contributor Author

I am going to update the pull request based on the feedback.

sessionStore.saveNextSessionId(sessionType, idCounter.get())
id
}
// sessionStore.getNextSessionId is guaranteed to return atomic and returns unique IDs.
Contributor

Nit: Extra space between return & atomic.

@jerryshao
Contributor

Can we have a complete plan for Livy HA before doing this small refactoring? My concern is that we have not settled on a plan for Livy HA yet; if the plan ends up changing, these changes may become obsolete.

@meisam
Contributor Author

meisam commented Nov 4, 2016

@jerryshao There is a JIRA ticket open for it: https://issues.cloudera.org/browse/LIVY-231
Design doc: https://github.com/meisam/livy/wiki/Multi-node-HA-for-batch-sessions
Preliminary pull request for LIVY-231: #222

@alex-the-man alex-the-man added this to the HA milestone Nov 8, 2016
This commit also
- Updates the test cases around SessionStore.
- Removes the SessionManagerState class (not needed anymore). We are
  storing a Long value that can be incremented.

Task-url: https://issues.cloudera.org/browse/LIVY-239