Fix WARC headers parsing when record has Content-Length: 0 and record after it. #42
Hi @jedireza! I'd appreciate your review on this PR, thanks!
Thanks for creating a PR.
Fix WARC headers parsing when record has Content-Length: 0 and record after it.
So is there a bug you're trying to fix? If so, could we first add a failing test case?
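For illustration, the case in the PR title (a record with `Content-Length: 0` immediately followed by another record) could be exercised roughly like the sketch below. This is not the crate's parser or API: `ToyRecord`, `parse_records`, `find_subslice`, and `sample_input` are hypothetical names, and the only assumption about the format is the WARC layout of header block, blank line, body of `Content-Length` bytes, then two CRLFs.

```rust
/// Hypothetical minimal record type, used only for this sketch.
#[derive(Debug, PartialEq)]
struct ToyRecord {
    headers: Vec<(String, String)>,
    body: Vec<u8>,
}

/// Returns the index of the first occurrence of `needle` in `haystack`.
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Toy parser: splits a byte stream into records using Content-Length.
fn parse_records(mut input: &[u8]) -> Result<Vec<ToyRecord>, String> {
    let mut records = Vec::new();
    while !input.is_empty() {
        // The header block ends at the first blank line (CRLF CRLF).
        let header_end = find_subslice(input, b"\r\n\r\n").ok_or("unterminated header block")?;
        let header_text = std::str::from_utf8(&input[..header_end]).map_err(|e| e.to_string())?;
        let mut lines = header_text.split("\r\n");
        let version = lines.next().unwrap_or("");
        if !version.starts_with("WARC/") {
            return Err(format!("bad version line: {version:?}"));
        }
        let mut headers = Vec::new();
        let mut content_length = 0usize;
        for line in lines {
            let (name, value) = line
                .split_once(':')
                .ok_or_else(|| format!("bad header line: {line:?}"))?;
            let name = name.trim().to_ascii_lowercase();
            let value = value.trim().to_owned();
            if name == "content-length" {
                content_length = value.parse().map_err(|_| "bad content-length".to_owned())?;
            }
            headers.push((name, value));
        }
        // The body is exactly `content_length` bytes, which may be zero.
        let body_start = header_end + 4;
        let body_end = body_start + content_length;
        let record_end = body_end + 4;
        // Every record, including an empty one, is closed by CRLF CRLF.
        if input.len() < record_end || &input[body_end..record_end] != b"\r\n\r\n" {
            return Err("missing record terminator".to_owned());
        }
        records.push(ToyRecord { headers, body: input[body_start..body_end].to_vec() });
        input = &input[record_end..];
    }
    Ok(records)
}

/// Two records back to back; the first one has an empty body.
fn sample_input() -> &'static [u8] {
    b"WARC/1.0\r\nWARC-Type: resource\r\nContent-Length: 0\r\n\r\n\r\n\r\n\
      WARC/1.0\r\nWARC-Type: resource\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n"
}
```

A test built on this would assert that `parse_records(sample_input())` yields two records, the first with an empty body. With an empty body, the blank line that ends the headers sits directly against the record terminator, so the second record's `WARC/1.0` line must not be read as part of the first record's headers.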
Also fixed flaky test `record::verify_display` by sorting header names.
I didn't know this was a flaky test, but I was able to confirm it. I'm not sure we should incur the cost of a sort on every display.
Here is another way to make `verify_display` stable:
    fn verify_display() {
        let header_entries = vec![
            (WarcHeader::WarcType, b"dunno".to_vec()),
            (WarcHeader::Date, b"2024-01-01T00:00:00Z".to_vec()),
        ];

        let headers = RawRecordHeader {
            version: "1.0".to_owned(),
            headers: header_entries.into_iter().collect(),
        };

        let output = headers.to_string();

        let expected_lines = vec![
            "WARC/1.0",
            "warc-type: dunno",
            "warc-date: 2024-01-01T00:00:00Z",
            "",
        ];
        let actual_lines: Vec<_> = output.lines().collect();

        // Sort only the header lines (everything between the version line
        // and the trailing empty line) so the comparison ignores map order.
        let mut expected_headers: Vec<_> = expected_lines[1..expected_lines.len() - 1].to_vec();
        expected_headers.sort();

        let mut actual_headers: Vec<_> = actual_lines[1..actual_lines.len() - 1].to_vec();
        actual_headers.sort();

        // verify parts
        assert_eq!(actual_lines[0], expected_lines[0]); // WARC version
        assert_eq!(actual_headers, expected_headers); // headers (sorted)
        assert_eq!(actual_lines.last(), expected_lines.last()); // empty line
    }
3e4bbd1 to 7f662f8
@jedireza Thanks for coming back on this! The test case … Regarding …
…rd after it. Validated against Python implementation: https://github.com/webrecorder/warcio Also fixed flaky test `record::verify_display` by sorting header names in the test.
7f662f8 to 7d7a2d4
Thanks for the changes! This makes sense to me now. Had to reacquaint myself with this code and warc a bit.
Thanks for the quick response and PR approval! I'd appreciate it if you could merge it and release a new version of the crate, so that I can update the dependencies in my projects instead of using a fork.
Going to merge this and make it a patch release.
Published as |