Skip to content

Commit

Permalink
rust: remove copy in GCS reading (tensorflow#4675)
Browse files Browse the repository at this point in the history
Summary:
We now let `gcs::Client::read` return `Bytes` rather than `Vec<u8>` so
that it can pass the response body directly from the HTTP client to the
caller, without additional copying.

For end-to-end speed, benchmarks suggest that this is inconclusive:

dataset     | before (s)   | after (s)   | improvement factor
------------|--------------|-------------|--------------------
`mnist`     | 1.12 ± 0.14  | 1.05 ± 0.06 | 1.05 ± 0.14
`edge_cgan` | 21.0 ± 2.0   | 20.1 ± 0.6  | 1.05 ± 0.11

The confidence intervals clearly overlap. But this change eliminates a
large copy and simplifies the code a bit, so it’s nice to have anyway.

Test Plan:
Type checking suffices; smoke testing `gs://tensorboard-bench-logs/mnist`
with `--load_fast` also checks out.

wchargin-branch: rust-gcs-read-no-copy
  • Loading branch information
wchargin authored Feb 10, 2021
1 parent 842296a commit 510d225
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 4 deletions.
1 change: 1 addition & 0 deletions tensorboard/data/server/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ rust_library(
"//tensorboard/data/server/cargo:async_stream",
"//tensorboard/data/server/cargo:base64",
"//tensorboard/data/server/cargo:byteorder",
"//tensorboard/data/server/cargo:bytes",
"//tensorboard/data/server/cargo:clap",
"//tensorboard/data/server/cargo:crc",
"//tensorboard/data/server/cargo:env_logger",
Expand Down
8 changes: 4 additions & 4 deletions tensorboard/data/server/gcs/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ limitations under the License.

//! Client for listing and reading GCS files.
use bytes::Bytes;
use log::debug;
use reqwest::{
blocking::{Client as HttpClient, RequestBuilder, Response},
Expand Down Expand Up @@ -126,17 +127,16 @@ impl Client {
bucket: &str,
object: &str,
range: RangeInclusive<u64>,
) -> reqwest::Result<Vec<u8>> {
) -> reqwest::Result<Bytes> {
let mut url = Url::parse(STORAGE_BASE).unwrap();
url.path_segments_mut().unwrap().extend(&[bucket, object]);
// With "Range: bytes=a-b", if `b >= 2**63` then GCS ignores the range entirely.
let max_max = (1 << 63) - 1;
let range = format!("bytes={}-{}", range.start(), range.end().min(&max_max));
let res = self.send_authenticated(self.http.get(url).header("Range", range))?;
if res.status() == StatusCode::RANGE_NOT_SATISFIABLE {
return Ok(Vec::new());
return Ok(Bytes::new());
}
let body = res.error_for_status()?.bytes()?;
Ok(body.to_vec())
res.error_for_status()?.bytes()
}
}

0 comments on commit 510d225

Please sign in to comment.