
Integrate glow with Kubernetes #42

Open
justicezyx opened this issue Apr 23, 2016 · 18 comments

@justicezyx
Collaborator

justicezyx commented Apr 23, 2016

TL;DR: the cluster management functionality can be built to work natively on top of Kubernetes' APIs.

The Glow framework would be extended to manage its jobs through Kubernetes' APIs. A new logical working unit, the controller, is introduced, which manages the master/mapper/reducer jobs.

A Glow application is built as a standalone binary. The user specifies the parameters, such as how many mappers/reducers, the input/output, etc. The user then starts the application on Kubernetes with Kubernetes' own basic client tool. When the application starts, it takes on the controller role and starts the other jobs based on the user-configured parameters.

The benefit would be that Glow can run as a standalone batch job that works natively on top of Kubernetes. To run the Glow application, it would need to be built as a Docker image, and it can then be started and terminated on demand, eliminating the cost of maintaining a dedicated cluster.
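
For illustration, a minimal, hypothetical sketch of the role-switching idea; the flag names and defaults below are invented for this sketch and are not part of glow:

```go
// Hypothetical sketch of the "one binary, multiple roles" idea: the same
// executable inspects a role flag (names made up for illustration) and
// either acts as the controller or as a mapper/reducer worker.
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	role := flag.String("role", "controller", "controller | mapper | reducer")
	mappers := flag.Int("mappers", 4, "number of mapper jobs the controller starts")
	reducers := flag.Int("reducers", 2, "number of reducer jobs the controller starts")
	flag.Parse()

	switch *role {
	case "controller":
		// In the proposed design, the controller would call the Kubernetes API
		// here to create the mapper/reducer jobs from the same image.
		fmt.Printf("controller: would request %d mappers and %d reducers from Kubernetes\n",
			*mappers, *reducers)
	case "mapper", "reducer":
		// Workers run the actual map/reduce task assigned to them.
		fmt.Printf("%s: running assigned task\n", *role)
	default:
		fmt.Fprintf(os.Stderr, "unknown role %q\n", *role)
		os.Exit(1)
	}
}
```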

@chrislusf
Owner

Thanks for the good idea! With this, we do not need "glow master" to manage the available nodes.

Is there a good Kubernetes Go API? I cannot find an easy-to-follow example.

@justicezyx
Collaborator Author

http://kubernetes.io/ is the general site.

https://github.com/kubernetes/kubernetes/tree/30891c7f3f1ba48d57cda24ede9ba9f65ab305f8/pkg/client/restclient
This is the Go REST client.
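
The linked package has since been repackaged as the standalone k8s.io/client-go library. A minimal sketch of driving Kubernetes from Go with it, assuming the program runs inside a cluster:

```go
// Minimal sketch using k8s.io/client-go (the standalone packaging of the
// REST client linked above). Lists pods in the "default" namespace as a
// smoke test of the client; assumes in-cluster configuration.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config reads the service account token mounted into the pod.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range pods.Items {
		fmt.Println(p.Name, p.Status.Phase)
	}
}
```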

I am trying to get approval from my manager to allocate time to work on this. How about we spend some time reading the docs, and then sync up to write a design doc for this?

@chrislusf
Owner

Thanks! It'll be great if you can get the approval.

The related logic is all under github.com/chrislusf/glow/driver/scheduler . I hope it is simple to read. Let me know if you have any questions.

@joeblew99

Nomad would also be a good option for this type of workload. It's also much easier, in my opinion.

Nomad is from HashiCorp and integrates with Consul.


@chrislusf
Owner

I see that similar frameworks usually need to integrate with existing cluster management systems, which could be whatever users are already running with their other systems. So we would need to integrate with Mesos, YARN, Nomad, Kubernetes, AWS, etc. We can start with any one of them, whichever is easiest for whoever is working on this.

btw: Currently the "glow master" manages not only node locations but also dataset locations. The dataset locations are there so that users can run "glow receive" to peek into a dataset. This may not be that important.

@justicezyx
Collaborator Author

SGTM, I can start with the Kubernetes implementation as this issue suggested.

To get started, a relatively naive approach is to start the master and agent nodes inside the driver program, then continue the existing workflow by contacting the master node. This would involve a lot of redundancy, because Kubernetes has functionality that overlaps with the master.

The benefit, though, is that it is easier to get familiar with Kubernetes and quicker to reach a usable prototype.
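
As a rough, hypothetical sketch of this naive approach (the subcommand names come from this thread; the real glow CLI flags are deliberately left out and would need to be filled in):

```go
// Hypothetical sketch of the naive approach: the driver process spawns a
// local "glow master" and a "glow agent" before running the existing
// workflow against them. Flags are intentionally omitted; the real glow
// CLI options would go where the comments indicate.
package main

import (
	"log"
	"os/exec"
	"time"
)

func startProcess(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		log.Fatalf("failed to start %s %v: %v", name, args, err)
	}
	return cmd
}

func main() {
	master := startProcess("glow", "master") // plus the usual master flags
	time.Sleep(time.Second)                  // crude wait for the master to come up
	agent := startProcess("glow", "agent")   // plus the usual agent flags

	// ... the driver would now submit its flow to the local master as before ...
	_ = master
	_ = agent
}
```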

Does this sound reasonable?

@chrislusf
Owner

The cluster management section is poorly documented. I would recommend you help write some documents first, get familiar with the existing process, organize the code into a better structure, and then extend it.

I think the driver can skip starting the master node, which does almost the same work as Kubernetes. The driver can just ask Kubernetes for nodes and start agents there.
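
As a hedged illustration of "ask Kubernetes for nodes" (the kubeconfig path is a placeholder, and actually starting an agent per node is only stubbed out):

```go
// Sketch using k8s.io/client-go: list the cluster's nodes, which is the
// "ask Kubernetes for nodes" step; starting a glow agent on each node is
// only stubbed out with a print statement.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; outside a cluster the client is usually
	// configured from a local kubeconfig file.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes.Items {
		fmt.Println("would start a glow agent on node", n.Name)
	}
}
```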

@justicezyx
Collaborator Author

justicezyx commented May 23, 2016

Thanks Chris; first, let me congratulate you on your new position. :) I was dormant for the past few weeks due to some personal affairs.

Your suggestion of starting with the cluster management documentation is spot on. I also agree with your suggestion of skipping the master node.

I will start with what you suggested. Will keep updating this thread in the process.

@justicezyx
Collaborator Author

From what I learned in the past few days, I think the distributed execution of Glow works as follows:

  1. The user starts the binary, which runs in 'driver' mode. I use 'driver' to refer to this running process.
  2. The driver processes a FlowContext, which is statically compiled into the driver binary; figures out the command line flags for the worker processes; and submits task requests to the cluster leader.
  3. The driver continues to send requests to, and process replies from, the cluster leader to start all tasks.
  4. A worker process can figure out its inputs and outputs from its command line flags, with help from the driver to resolve the identities of the other running worker processes.

I have a rather preliminary plan for integrating with K8s, which is to let the driver contact K8s directly and start all tasks directly on a K8s cluster. That seems to fit correctly into the picture described above. When tasks are started, the driver will know their identities, and worker processes should be able to contact the driver or K8s for the identities of the other running worker processes.

Chris, could you please point me to the code where step 2 happens? In particular, I am interested in the code that figures out the command line flags for a task.

@chrislusf
Owner

chrislusf commented Jul 14, 2016

For step 2: the driver will request resources from the glow master, and then the driver will ask the glow agents to start the executors.

The driver talks to the agents via a protocol-buffer-based message. You can search for "ControlMessage_StartRequest" in the code to see how it is used.
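
The exact message layout lives in glow's generated protobuf code, but as a generic illustration of the pattern (a protobuf message marshaled and written to the agent's TCP connection with a length prefix; the framing and placeholder message below are assumptions, not glow's actual wire format):

```go
// Illustrative only: a typical way to send a protobuf-based control message
// over TCP, length-prefixed so the receiver knows how many bytes to read.
// In glow this would be the generated ControlMessage with its StartRequest
// field set; a placeholder message is used here so the sketch compiles alone.
package main

import (
	"encoding/binary"
	"log"
	"net"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/wrapperspb"
)

func sendControlMessage(agentAddr string, msg proto.Message) error {
	conn, err := net.Dial("tcp", agentAddr)
	if err != nil {
		return err
	}
	defer conn.Close()

	data, err := proto.Marshal(msg)
	if err != nil {
		return err
	}

	// 4-byte big-endian length prefix, then the marshaled payload.
	var length [4]byte
	binary.BigEndian.PutUint32(length[:], uint32(len(data)))
	if _, err := conn.Write(length[:]); err != nil {
		return err
	}
	_, err = conn.Write(data)
	return err
}

func main() {
	// Placeholder message and address; a real caller would pass the generated
	// ControlMessage and the agent's actual host:port.
	if err := sendControlMessage("agent-host:8931", wrapperspb.String("start")); err != nil {
		log.Fatal(err)
	}
}
```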

@justicezyx
Collaborator Author

Thanks.

@justicezyx justicezyx self-assigned this Jul 15, 2016
@justicezyx
Collaborator Author

I spent most of my time reading the code and writing tests. Now I think I have a good enough understanding of glow's internals, and I have a very rough idea of how to integrate glow with k8s:

  1. Add the capability for the driver to talk to the k8s scheduler.

The driver asks k8s to schedule a job with N replicas. I use "workers" to refer to the running processes on K8s. N here is the number of task groups as decided by glow/driver/plan; each task group will be run by a worker. (A sketch of this step follows at the end of this comment.)

Afterwards, the workers start talking to the driver, and the driver passes the execution parameters (flow_id, task_id, etc.) to them. The code for producing execution parameters and executing work according to them is already there. Right now, this code primarily interfaces directly with command line flags and is scattered between the driver and the scheduler; we will need to adapt it into a cleaner RPC interface so it can be reused in the K8s integration.

  2. Users need to build the Docker image of the glow binary before starting the driver. Note that when we say 'driver', it is still the same binary. Glow's architecture can be summarized as one binary with the internal logic to split the workload and execute it distributedly.

The driver will accept an argument which is the Docker image ID, and use that to create the replicas of itself.

We should provide a glow builder tool that builds a glow program and produces a Docker image for the same binary.

  3. We can provide a convenient way to run the driver on K8s itself, so that we have everything running natively inside K8s.

Any comments on this high-level idea? If there are no objections, I plan to proceed to write a document with more details, primarily focused on how the driver works with K8s and Docker through their APIs.
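
As a hedged sketch of step 1, using k8s.io/client-go to create a batch Job with Parallelism = N; the image name, namespace, and worker flag are placeholders, not part of glow:

```go
// Sketch of step 1: the driver asks Kubernetes to run N worker replicas by
// creating a batch/v1 Job with Parallelism = N. The image name, namespace,
// and the -role flag are placeholders for illustration.
package main

import (
	"context"
	"log"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func int32ptr(i int32) *int32 { return &i }

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	n := int32(4) // number of task groups, as decided by glow/driver/plan

	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "glow-workers"},
		Spec: batchv1.JobSpec{
			Parallelism: int32ptr(n),
			Completions: int32ptr(n),
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "glow-worker",
						Image: "example.com/my-glow-app:latest", // placeholder image ID
						Args:  []string{"-role=worker"},         // placeholder flag
					}},
				},
			},
		},
	}

	if _, err := clientset.BatchV1().Jobs("default").Create(context.TODO(), job, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Printf("requested %d glow workers from Kubernetes", n)
}
```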

@chrislusf
Owner

Currently the glow agents download the binary and start it as executors. Can we share a k8s image and act just like the glow agent?


@justicezyx
Collaborator Author

Thanks Chris!

Can we share a k8s image and act just like the glow agent?

It is probably impossible in K8s. K8s does not provide an interface for users to communicate directly with a kubelet. A kubelet runs on each machine and executes binaries, and kubelets are controlled entirely by the K8s scheduler and controllers.

K8s' model is opaque, in that applications are isolated from the details of running tasks. Based on my limited knowledge, Mesos is probably more friendly to your suggestion; it seems to provide a resource-offering mechanism that would let the glow master request resources and deploy the agents.

@chrislusf
Owner

Good to learn this. You know more about k8s than I do. Please go ahead, and thanks for your effort!

@justicezyx
Collaborator Author

Thanks Chris!

I guess I will need a few months to finish the complete document.

BTW, the manager approval never happened. My manager does not seem so supportive of such a 20% project, so I never bothered to discuss it with him...

But going forward, I plan to set aside regular time outside work hours for this project.

@chrislusf
Owner

Hi @justicezyx Yaxiong, I am working on https://github.com/chrislusf/gleam now, which I think is much more performant and flexible. I would like to know whether I can get some free credit on the GCE platform. I am not full time on this, so the 60-day free trial does not fit me well.

@justicezyx
Collaborator Author

justicezyx commented Jan 18, 2017 via email
