WIP: FlatGeobuf async stream #966
Hmm, ok, I think it's not too difficult to […] This'd be ok-ish for a Stream<Result<FgbFeature, Error>> (supposedly, though it does look correct):

    use flatgeobuf::{FgbFeature, Error, HttpFgbReader, AsyncFeatureIter};
    use futures::stream::Stream;
    use std::pin::Pin;
    use futures::task::{Context, Poll};
    use futures::Future;

    pub struct FgbStream<T>
    where
        T: AsyncRead + AsyncSeek + Unpin + Send,
    {
        iter: AsyncFeatureIter<T>,
    }

    impl<T> Stream for FgbStream<T>
    where
        T: AsyncRead + AsyncSeek + Unpin + Send,
    {
        type Item = Result<FgbFeature, Error>;

        fn poll_next(
            mut self: Pin<&mut Self>,
            cx: &mut Context<'_>,
        ) -> Poll<Option<Self::Item>> {
            // Use the async next method of AsyncFeatureIter
            let future = self.iter.next();
            futures::pin_mut!(future);
            match future.poll(cx) {
                Poll::Ready(Some(Ok(feature))) => Poll::Ready(Some(Ok(feature))),
                Poll::Ready(Some(Err(err))) => Poll::Ready(Some(Err(err))),
                Poll::Ready(None) => Poll::Ready(None), // End of stream
                Poll::Pending => Poll::Pending, // Still waiting
            }
        }
    }
Thanks for the input! That's really cool to see how to use […] I think in theory it's not too hard to wrap up into a record batch. You just continue calling […]
Yeah, it looks like a fresh GeoTableBuilder per batch, and more or less what you're doing in read_flatgeobuf_async. No need for the stream of FgbFeatures I think:

    impl<T> Stream for FgbStream<T>
    where
        T: AsyncRead + AsyncSeek + Unpin + Send,
    {
        type Item = Result<RecordBatch, Error>;

        fn poll_next(
            mut self: Pin<&mut Self>,
            cx: &mut Context<'_>,
        ) -> Poll<Option<Self::Item>> {
            let fut = self.read_batch();
            // Rest of the original
        }

        async fn read_batch(&mut self) {
            // Construct GeoTableBuilder, conventional loop call of next on the iter, stopping at batch_size.
        }
    }

I'd say read_batch would be 95% source identical to read_flatgeobuf_async; it's really just the batch limiting, and grabbing the 0th record batch vs returning the GeoTable. Really depends if you want to avoid doubling up on stream interfaces 🤷 (that or one of the approaches runs into lifetime/Send issues).
Ok cool that's a big help to get me unblocked. Hopefully the rest can be very similar to #933
It compiles at least! Now to see if it works!
I tried running the test locally but it just hangs forever 🥲; I must be doing something wrong
Ahah! I bet it's the else case in […] That being said... The hanging is a bit weird 🤔; handling short batches (and trailing ones: I'm fairly certain it'd truncate to exactly a multiple of the batch size with the current code) might not completely solve it.
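The batching loop discussed above (keep calling next until batch_size is reached, and still emit a short trailing batch) could be sketched roughly as follows. Everything here is a stand-in for illustration: MockFeatureIter, its next signature, read_batch, block_on and batch_all are hypothetical names, plain u32 values replace features, and a Vec replaces the record batch. It uses only std so it runs without flatgeobuf or arrow.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an async feature iterator (not the flatgeobuf type).
struct MockFeatureIter {
    features: std::vec::IntoIter<u32>,
}

impl MockFeatureIter {
    /// Assumed shape for this sketch: Ok(None) signals the end of the input.
    async fn next(&mut self) -> Result<Option<u32>, String> {
        Ok(self.features.next())
    }
}

/// Pull up to `batch_size` items. A short trailing batch is still returned;
/// `None` is returned only when nothing at all was read.
async fn read_batch(
    iter: &mut MockFeatureIter,
    batch_size: usize,
) -> Result<Option<Vec<u32>>, String> {
    let mut batch = Vec::with_capacity(batch_size);
    while batch.len() < batch_size {
        match iter.next().await? {
            Some(feature) => batch.push(feature),
            None => break, // input exhausted: keep what we have so far
        }
    }
    Ok(if batch.is_empty() { None } else { Some(batch) })
}

/// Waker that does nothing; enough for a mock that never really waits.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for the demo only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Drain the mock iterator into batches of at most `batch_size` items.
fn batch_all(features: Vec<u32>, batch_size: usize) -> Result<Vec<Vec<u32>>, String> {
    let mut iter = MockFeatureIter { features: features.into_iter() };
    let mut batches = Vec::new();
    while let Some(batch) = block_on(read_batch(&mut iter, batch_size))? {
        batches.push(batch);
    }
    Ok(batches)
}
```

With five items and a batch size of two, batch_all returns three batches, the last holding the single leftover item, so nothing is truncated to a multiple of the batch size.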
Ah, ok, this makes sense now - the fut of next_batch needs to be long-lived (rather than constructed in poll_next repeatedly), and /that/ needs to be polled (going by examples in the parquet stream implementation, […]). So you pretty much have either:

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) {
        // Initialise the fut for the next batch, loop poll it (the approach used in the parquet stream reader)
        let fut = self.next_batch();
        // Probably necessary
        futures::pin_mut!(fut);
        loop {
            match ready!(fut.poll_unpin(cx)) {
                Some(Ok(batch)) => {
                    // Break out of the loop
                    return Poll::Ready(Some(Ok(batch)));
                },
                // Error case as per usual
                None => {
                    // Close out the stream
                    return Poll::Ready(None);
                }
                // No need for an explicit pending case
            }
        }
    }

Or: […]
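One way to picture the "long-lived future" point is the sketch below, which keeps the in-flight future in the struct and hands the reader back when the future completes. This is only an illustration under stated assumptions, not the code from this PR: the Stream trait is a local stand-in that mirrors the shape of futures::Stream so that only std is needed, and MockReader, YieldOnce, BatchStream and drain are hypothetical names. The mock returns Pending once per batch, so a version that rebuilt the future on every poll_next call would never make progress with it.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local stand-in with the same shape as `futures::Stream` (std only).
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Returns Pending on the first poll and Ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hypothetical batch source producing the numbers `next..end`.
struct MockReader { next: u32, end: u32 }

impl MockReader {
    /// Takes `self` by value and returns it, so the future borrows nothing.
    async fn next_batch(mut self, batch_size: u32) -> (Self, Option<Vec<u32>>) {
        YieldOnce(false).await; // simulate waiting on I/O
        if self.next >= self.end {
            return (self, None);
        }
        let stop = (self.next + batch_size).min(self.end);
        let batch: Vec<u32> = (self.next..stop).collect();
        self.next = stop;
        (self, Some(batch))
    }
}

type BatchFuture = Pin<Box<dyn Future<Output = (MockReader, Option<Vec<u32>>)> + Send>>;

struct BatchStream {
    reader: Option<MockReader>,     // Some only while no read is in flight
    in_flight: Option<BatchFuture>, // survives across poll_next calls
    batch_size: u32,
}

impl BatchStream {
    fn new(start: u32, end: u32, batch_size: u32) -> Self {
        Self { reader: Some(MockReader { next: start, end }), in_flight: None, batch_size }
    }
}

impl Stream for BatchStream {
    type Item = Vec<u32>;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Vec<u32>>> {
        let this = self.get_mut(); // fine: every field is Unpin
        if this.in_flight.is_none() {
            // Start a read only when none is pending; finished if the reader is gone.
            let Some(reader) = this.reader.take() else { return Poll::Ready(None) };
            let fut: BatchFuture = Box::pin(reader.next_batch(this.batch_size));
            this.in_flight = Some(fut);
        }
        let fut = this.in_flight.as_mut().expect("in-flight future was just set");
        match fut.as_mut().poll(cx) {
            Poll::Pending => Poll::Pending, // keep the same future for the next poll
            Poll::Ready((reader, batch)) => {
                this.in_flight = None;
                if batch.is_some() {
                    this.reader = Some(reader); // hand the reader back for the next batch
                }
                Poll::Ready(batch)
            }
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-poll the stream to completion; returns the batches and the number of Pending polls.
fn drain(mut stream: BatchStream) -> (Vec<Vec<u32>>, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let (mut batches, mut pendings) = (Vec::new(), 0);
    loop {
        match Pin::new(&mut stream).poll_next(&mut cx) {
            Poll::Ready(Some(batch)) => batches.push(batch),
            Poll::Ready(None) => return (batches, pendings),
            Poll::Pending => pendings += 1,
        }
    }
}
```

Moving the reader into the future and getting it back on completion is what lets the stored future be 'static, which sidesteps holding a borrow of self across polls. A real implementation would also carry a Result in the item type.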
Alright, update on this - I (finally) have a functional version of this (lodging a PR shortly). Indirecting via an Option (to avoid constant simultaneous borrowing problems), and dropping the […]
I don't really know how to implement this.

I'd like to implement something like RecordBatchStream, a struct that implements Stream<Item = Result<RecordBatch>>. But I don't understand the low-level implementation of Stream and poll_next well enough to do that.

FlatGeobuf exposes the AsyncFeatureIter struct, which implements next but does not implement Stream.

Ideally I want a high level API to transform the AsyncFeatureIter into a stream. I tried to use https://github.com/tokio-rs/async-stream but got stuck because AsyncFeatureIter does not implement Stream.

Additionally, even if I'm able to implement this method using async-stream, I really want the struct itself to implement Stream rather than having a method that returns an opaque impl Stream.

@H-Plus-Time maybe you have some thoughts on this? Ideally we would be able to plumb this through to JS like we have with the GeoParquet reader as well.