Avoid libc buffered IO #294
For reference, my benchmarking has been done on a [...] The background is that many EC2 instances are relatively short-lived and never install RPMs except at launch, so any time spent installing RPMs is time spent scaling up the system that would be better spent running the customer workload.
librepo/downloader.c
Outdated
@@ -619,7 +619,7 @@ lr_writecb(char *ptr, size_t size, size_t nmemb, void *userdata)
 if (range_start <= 0 && range_end <= 0) {
 // Write everything curl give to you
 target->writecb_recieved += all;
-return fwrite(ptr, size, nmemb, target->f);
+return write(target->fd, ptr, all);
Replacing fwrite() with write() is not so simple:
- write() expects a size_t as its last argument, while you pass a gint64. The two do not match on a 32-bit memory model, e.g. on a 32-bit x86 platform.
- write() returns a signed ssize_t, while fwrite() returns an unsigned size_t. Your return statement does not match the lr_writecb() prototype.
- write() can return before writing all the data (a short write), or fail with errno == EINTR, and in both cases you need to call write() again.
Please fix these issues in all the places where you replaced fwrite() with write().
Fixed.
I included some code that keeps writes to reasonable IO sizes, which also avoids these signed/unsigned issues.
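For readers following along, the three points raised above can be illustrated with a minimal sketch. This is not the code from the patch: `write_all` is a hypothetical helper name, and the loop only shows one common way to handle short writes and EINTR while keeping the byte count in an unsigned size_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not librepo's actual code: write all `len` bytes
 * of `buf` to `fd`, retrying on EINTR and continuing after short writes.
 * Returns `len` on success, or the number of bytes written before a
 * non-recoverable error (errno is left as set by write()). */
static size_t
write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(fd >= 0);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;   /* interrupted before any data was written */
            break;          /* real error (or no progress): stop here */
        }
        done += (size_t)n;  /* n > 0 here, so the cast is safe */
    }
    return done;
}
```

Because the helper returns a size_t byte count, a curl write callback could return it directly: libcurl treats a return value that differs from size * nmemb as an error, so a short count from `write_all` would abort the transfer.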
librepo/downloader.c
Outdated
@@ -684,7 +684,7 @@ lr_writecb(char *ptr, size_t size, size_t nmemb, void *userdata)
 }

 assert(nmemb > 0);
-cur_written = fwrite(ptr, size, nmemb, target->f);
+cur_written = write(target->fd, ptr, size * nmemb);
What if size*nmemb overflows size_t? fwrite() handled that for you. Now you need to implement the check yourself.
Updated to check for overflow and fall back to a well known size.
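As an illustration of the kind of check being discussed, here is a minimal sketch. `checked_mul` is a hypothetical name, and the sketch only detects the overflow; it does not reproduce the fallback size the patch uses, which is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute size * nmemb, reporting overflow instead
 * of silently wrapping. Returns 1 and stores the product in *out on
 * success, or 0 if the product does not fit in a size_t. */
static int
checked_mul(size_t size, size_t nmemb, size_t *out)
{
    assert(out != NULL);
    /* Dividing SIZE_MAX by size avoids performing the overflowing
     * multiplication in the first place. */
    if (size != 0 && nmemb > SIZE_MAX / size)
        return 0;
    *out = size * nmemb;
    return 1;
}
```

The division-based test is the portable form; compiler builtins for checked arithmetic exist in GCC and Clang but are not standard C.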
librepo/downloader.c
Outdated
@@ -1033,9 +1033,9 @@ remove_librepo_xattr(LrDownloadTarget * target)
 gboolean
 lr_zck_clear_header(LrTarget *target, GError **err)
 {
-assert(target && target->f && target->target && target->target->path);
+assert(target && target->fd && target->target && target->target->path);
fd is an int, and fd == 0 is a valid descriptor. You should compare it to -1, or at least check for < 0, which indicates an invalid descriptor.
fixed
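The sentinel convention requested here (and in the similar comments further down) can be sketched as follows. `FD_INVALID`, `fd_is_valid`, and `close_and_reset` are hypothetical names used only for illustration; the patch may spell this differently.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical name for the "no descriptor" sentinel. 0 cannot serve
 * this purpose because it is a valid descriptor (normally stdin). */
#define FD_INVALID (-1)

/* A descriptor value is usable only if it is non-negative. */
static int
fd_is_valid(int fd)
{
    return fd >= 0;
}

/* Close *fd if it holds a descriptor and reset it to the sentinel, so
 * that a later check or a second call sees "nothing open". */
static void
close_and_reset(int *fd)
{
    assert(fd != NULL);
    if (fd_is_valid(*fd)) {
        close(*fd);
        *fd = FD_INVALID;
    }
}
```

With this convention, the asserts in the hunks above would test `target->fd >= 0` (or `!= -1`) rather than relying on the truthiness of the value.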
librepo/downloader.c
Outdated
@@ -1295,7 +1295,7 @@ static gboolean
 check_zck(LrTarget *target, GError **err)
 {
 assert(!err || *err == NULL);
-assert(target && target->f && target->target);
+assert(target && target->fd && target->target);
Again. fd == 0 is a valid value. Compare to -1.
fixed
librepo/downloader.c
Outdated
-fclose(target->f);
-target->f = NULL;
+close(target->fd);
+target->fd = 0;
Assign -1.
fixed
librepo/downloader.c
Outdated
@@ -1573,8 +1563,7 @@ prepare_next_transfer(LrDownload *dd, gboolean *candidatefound, GError **err)

 if (target->original_offset == -1) {
 // Determine offset
-fseek(target->f, 0L, SEEK_END);
-gint64 determined_offset = ftell(target->f);
+gint64 determined_offset = lseek(target->fd, 0L, SEEK_END);
lseek() returns off_t. Not gint64. Those won't match on 32-bit platforms without large file support enabled.
Fixed (in the upcoming updated patch...).
There's this snippet below from the original code:
gint64 used_offset = target->original_offset;
g_debug("%s: Used offset for download resume: %"G_GINT64_FORMAT,
__func__, used_offset);
c_rc = curl_easy_setopt(h, CURLOPT_RESUME_FROM_LARGE,
(curl_off_t) used_offset);
So there may be other places in the existing code where the types don't match. Possibly this should all just move to off_t?
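One way to keep the lseek() result in its native type and only convert after checking it is sketched below. `end_offset` is a hypothetical helper, `int64_t` stands in for gint64 so the example does not need GLib, and defining `_FILE_OFFSET_BITS` in the source file is an assumption made for the sketch (real builds usually set it through compiler flags).

```c
#define _FILE_OFFSET_BITS 64  /* assumption: request a 64-bit off_t; must
                                 come before any system header */
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Fail the build if off_t could ever be wider than the 64-bit type the
 * result is converted to. */
_Static_assert(sizeof(off_t) <= sizeof(int64_t),
               "off_t must fit in int64_t");

/* Hypothetical helper: return the end-of-file offset of `fd`, or -1 if
 * lseek() fails (for example on a pipe). The result stays an off_t
 * until the error check has been done. */
static int64_t
end_offset(int fd)
{
    assert(fd >= 0);
    off_t end = lseek(fd, 0, SEEK_END);
    if (end == (off_t)-1)
        return -1;
    return (int64_t)end;
}
```

The same pattern would apply in the other direction when passing an offset to curl: convert once, at the boundary, to curl_off_t, as the quoted snippet already does.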
librepo/downloader.c
Outdated
-if (target->f != NULL) {
-fclose(target->f);
-target->f = NULL;
+if (target->fd != 0) {
Compare to -1 and set to -1.
fixed
librepo/downloader.c
Outdated
-fclose(target->f);
-target->f = NULL;
+close(target->fd);
+target->fd = 0;
Set to -1.
fixed.
librepo/downloader.c
Outdated
-fclose(target->f);
-target->f = NULL;
+close(target->fd);
+target->fd = 0;
Set to -1.
fixed.
librepo/downloader.c
Outdated
@@ -2803,7 +2791,7 @@ lr_download(GSList *targets,
 for (GSList *elem = dd.targets; elem; elem = g_slist_next(elem)) {
 LrTarget *target = elem->data;
 assert(target->curl_handle == NULL);
-assert(target->f == NULL);
+assert(target->fd == 0);
Compare against -1.
fixed
In general, I like your patch, but please address the different semantics I pointed out in-line.
Thanks for the eyes on it. I can clean up the review comments and spin a rev2.
The FILE-related IO functions in libc buffer inside the C library and are generally less performant than the file-descriptor-based open()/read()/write() functions. The FILE-based approach somewhat limits librepo's maximum throughput and increases CPU usage. In my benchmarks of a reposync of the Amazon Linux 2023 x86-64 repositories, moving to file-descriptor-based IO saves about 1 second of user time and 0.5 seconds of system time, for a wall-clock benefit of a few seconds (102s vs 99s).
a38226c to ebd4be0
I think I've managed to address the comments and ensure correct behavior in (hopefully) all error conditions.
Overall, it looks fine. One small change, though.
* We write up to 32MB at a time, mainly to ensure that the below loops
* never bitrot, and it's regularly tested with real world RPMs.
*/
#define WRITECB_CHUNK_MAX 32*1024*1024
Please indent to match the line starting point.