Hello.
I apologize if I have missed something obvious, but I am using glow to map and reduce time series. I would like to do a reduce or ReduceByKey on every time slice (for instance, a ReduceByKey over all events received in the last minute).
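To make the intent concrete (independently of glow), the per-slice aggregation I am after is essentially the following; the Event type, the 30-second slice length, and the summed Value field are placeholders, and the usual time import is assumed:

// Glow-free illustration of "ReduceByKey per time slice".
type Event struct {
	Key   string
	Time  time.Time
	Value int
}

// reducePerSlice groups events by (key, 30s slice) and folds their values together.
func reducePerSlice(events []Event) map[string]map[time.Time]int {
	out := map[string]map[time.Time]int{}
	for _, e := range events {
		slice := e.Time.Truncate(30 * time.Second) // slice boundary
		if out[e.Key] == nil {
			out[e.Key] = map[time.Time]int{}
		}
		out[e.Key][slice] += e.Value // the "reduce" step, e.g. a sum
	}
	return out
}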
Right now, I am setting the code up to be distributed and, following the tutorial, have put my flows in the func init() section so that they are static (instantiated only once, the same way, on every node).
The data comes from an unbounded stream (i.e., not from a bounded file). So I have something like this:
func init() {
	mapRecordsToMetadata.
		Channel(mapRecordsToMetadataInput).
		Map(mapTimeSeriesToTimeSliceFunc).
		Map(mapConvertColumnValuesFunc).
		// ... some more maps and filters
		ReduceByKey(reduceByFlowKey).
		AddOutput(mapRecordsToMetadataOutput)
}
// letsDoIt uses mapRecordsToMetadata to map and reduce all events for a given key during a time slice
func letsDoIt(streamedEvents chan []string) chan GroupedEventsByKeyChan {
	out := make(chan GroupedEventsByKeyChan)
	go func() {
		for evt := range streamedEvents {
			mapRecordsToMetadataInput <- evt
		}
	}()
	go func() {
		for evt := range mapRecordsToMetadataOutput {
			out <- evt
		}
	}()
	return out
}
I have simplified a bit, but hopefully this is enough to get the idea. Now, ReduceByKey blocks until I close the mapRecordsToMetadataInput channel (which makes sense). However, if I do that, I can't really use my flow mapRecordsToMetadata anymore (is there a way to replace the input channel and restart it?).
Conceptually, I would "close" my input channel (mapRecordsToMetadataInput) at every time slice where I want the aggregate to run (i.e., every 30 seconds), so that my ReduceByKey would run on that interval's inputs.
My only option seems to be to keep the "map" operations in the init() section (fed by mapRecordsToMetadataInput) and to move the ReduceByKey() operation into a dynamic flow, recreating that dynamic flow every 30 seconds in my case.
Something like this:
func init() {
	mapRecordsToMetadata.
		Channel(mapRecordsToMetadataInput).
		Map(mapTimeSeriesToTimeSliceFunc).
		Map(mapConvertColumnValuesFunc).
		// ... some more maps and filters
		// Removed the ReduceByKey
		AddOutput(mapRecordsToMetadataOutput)
}
func letsDoIt(streamedEvents chan []string) chan GroupedEventsByKeyChan {
	out := make(chan GroupedEventsByKeyChan)
	go func() {
		for evt := range streamedEvents {
			mapRecordsToMetadataInput <- evt
		}
	}()
	go func() {
		nextInterval := time.Now().Add(30 * time.Second)
		// Dynamic flow for the current time slice.
		reduceInChan := make(chan EventsByKey)
		reduceFlow := flow.New()
		reduceFlow.
			Channel(reduceInChan).
			ReduceByKey(reduceByFlowKey).
			AddOutput(out)
		for evt := range mapRecordsToMetadataOutput {
			if evt.Time.After(nextInterval) {
				// Flush: close this slice's input so its ReduceByKey runs,
				// then recreate the dynamic flow for the next slice.
				close(reduceInChan)
				reduceInChan = make(chan EventsByKey)
				reduceFlow = flow.New()
				reduceFlow.
					Channel(reduceInChan).
					ReduceByKey(reduceByFlowKey).
					AddOutput(out)
				nextInterval = nextInterval.Add(30 * time.Second)
			}
			reduceInChan <- evt
		}
	}()
	return out
}
Is this the "right", canonical way to proceed? Does it scale? Or is there a small missing feature that would allow us to "flush" our static flows at fixed intervals or on demand, so that we can handle streaming use cases in a more streamlined fashion?
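For reference, by "flushing at fixed intervals" I mean roughly the following, with the slice rotation driven by a wall-clock time.Ticker instead of the event timestamps; the newReduceFlow helper is hypothetical and would just wrap the flow.New()...ReduceByKey(...) construction shown above:

// Sketch: rotate the dynamic reduce flow on a ticker rather than on event time.
go func() {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	reduceInChan := newReduceFlow(out) // hypothetical helper returning the flow's input channel
	for {
		select {
		case evt, ok := <-mapRecordsToMetadataOutput:
			if !ok {
				close(reduceInChan) // upstream is done: flush the last slice
				return
			}
			reduceInChan <- evt
		case <-ticker.C:
			close(reduceInChan)               // flush: this slice's ReduceByKey can now emit
			reduceInChan = newReduceFlow(out) // start the next slice
		}
	}
}()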