Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap input files rather than reading #109

Open
evmar opened this issue Jan 21, 2024 · 5 comments · May be fixed by #112
Open

mmap input files rather than reading #109

evmar opened this issue Jan 21, 2024 · 5 comments · May be fixed by #112

Comments

@evmar
Copy link
Owner

evmar commented Jan 21, 2024

[Consolidating some comments from #107 and #108]

Currently we read input files into a local buffer. I had thought this was better than mmap because:

  1. we need a trailing nul bye on the file for parser reasons
  2. performance -- I can't remember where I had read it, but I recall the argument that mmap can be costlier than reading when it's a large file you're reading sequentially, due to some kernel overhead related to setting up all the memory mapping data structures(?)

However, it turns out that you can just mmap a private page at the end of the file to write the nul, and @Colecf says "Judging based on n2's tracing output, just reading the file into memory takes over twice as long as the android fork's entire ninja invocation".

@evmar
Copy link
Owner Author

evmar commented Jan 21, 2024

Relevant code in Android's Ninja fork. (Not sure how to get a permalink to that commit but the file is unlikely to change much in the future, but it's the function MapFile.)

@evmar
Copy link
Owner Author

evmar commented Jan 21, 2024

I wrote a mini benchmark here.

On this Mac Air M2 for a random 1.4gb file:

without madvise:

  ./target/release/read-file mmap $F ran
    1.25 ± 0.01 times faster than ./target/release/read-file read $F

with madvise:

  ./target/release/read-file mmap $F ran
    1.36 ± 0.02 times faster than ./target/release/read-file read $F

So I conclude at least for this hw/OS that mmap is faster, and madvise helps too.

@Colecf
Copy link
Contributor

Colecf commented Jan 21, 2024

Relevant code in Android's Ninja fork. (Not sure how to get a permalink to that commit but the file is unlikely to change much in the future, but it's the function MapFile.)

Sorry, this link should be permanent.

@Sarcasm
Copy link

Sarcasm commented Jan 22, 2024

Just a thought regarding "correctness" more than performances.

Is there any risk of the mmaped build.ninja changing during the build?
I had a bad experience trying to mmap user-controlled file, an update of the file at the wrong time could cause my process to crash with SIGBUS.
Edit: it could just be I didn't use the proper type of memory mapping.

The build.ninja file can be regenerated during build by the build system (I think cmake does this when a CMakeLists.txt changes), can this invalidate the memory mapping?

@evmar
Copy link
Owner Author

evmar commented Jan 29, 2024

can this invalidate the memory mapping?

The Linux man pages says it's undefined behavior. The Mac ones don't say either way. It is true that we wouldn't want to preserve the mapping beyond the point where the file might be rewritten though.

@Colecf Colecf linked a pull request Feb 28, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants