Sensitive to the order inputs and outputs are listed? #62
Comments
Making sure I follow, it's something like
and you reordered baz1/baz2/baz3?
Pretty much, yep. The inputs were listed as implicit dependencies, and then listed on the command line as well. The dependency line was changing, but not the command line. I think the same issue existed for implicit outputs as well.
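For illustration (not code from n2 itself), a minimal Rust sketch of why reordering alone is enough to change the signature, assuming the signature hashes the input names in the order they are listed in build.ninja; baz1/baz2/baz3 are just the placeholder names from this exchange:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash a list of input paths in the order they appear in build.ninja.
fn signature(inputs: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    for name in inputs {
        name.hash(&mut h);
    }
    h.finish()
}

fn main() {
    // Same set of files, different listed order: the signatures differ,
    // so the build step looks dirty even though nothing on disk changed.
    let a = signature(&["baz1", "baz2", "baz3"]);
    let b = signature(&["baz3", "baz1", "baz2"]);
    assert_ne!(a, b);
    println!("{:x} vs {:x}", a, b);
}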
Hrmm. On the one hand, this might be easy to fix. On the other, it feels like if you're changing your build file around it's not so bad for ninja to rebuild? How would you feel about this if there were better debugging output available?
Better debugging facilities would be great - they'd be useful for a bunch of other cases as well, and "build file change triggers rebuild" does make sense.
Thinking about it more, n2 doesn’t have as much debug info available (see the discussion of manifests on https://neugierig.org/software/blog/2022/03/n2.html ), hrm… |
Thanks for implementing that. Yeah, I remember your note about hashes losing information about the build from that article, and I ran into the same thing when using Bazel. The approach I used in my quick-and-dirty hack was to gather the same data the hash uses and serialize it:
diff --git a/src/graph.rs b/src/graph.rs
index 506023d..f1f60a6 100644
--- a/src/graph.rs
+++ b/src/graph.rs
@@ -377,7 +377,7 @@ const UNIT_SEPARATOR: u8 = 0x1F;
fn hash_files(
hasher: &mut std::collections::hash_map::DefaultHasher,
graph: &Graph,
- file_state: &mut FileState,
+ file_state: &FileState,
ids: &[FileId],
) {
for &id in ids {
@@ -395,6 +395,31 @@ fn hash_files(
}
}
+pub fn dump_files(
+ dump: &mut Vec<(String, i64)>,
+ graph: &Graph,
+ file_state: &FileState,
+ ids: &[FileId],
+) {
+ for &id in ids {
+ let name = &graph.file(id).name;
+ let mtime = file_state
+ .get(id)
+ .unwrap_or_else(|| panic!("no state for {:?}", name));
+ let mtime = match mtime {
+ MTime::Stamp(mtime) => mtime,
+ MTime::Missing => panic!("missing file: {:?}", name),
+ };
+ dump.push((
+ name.clone(),
+ mtime
+ .duration_since(SystemTime::UNIX_EPOCH)
+ .unwrap()
+ .as_millis() as i64,
+ ));
+ }
+}
+
// Hashes the inputs of a build to compute a signature.
// Prerequisite: all referenced files have already been stat()ed and are present.
// (It doesn't make sense to hash a build with missing files, because it's out
@@ -417,6 +442,27 @@ pub fn hash_build(
Ok(Hash(hasher.finish()))
}
+#[derive(Default, Debug, serde::Serialize, serde::Deserialize, PartialEq)]
+pub struct BuildDump {
+ pub ins: Vec<(String, i64)>,
+ pub outs: Vec<(String, i64)>,
+ pub cmdline: String,
+}
+
+pub fn dump_build(
+ graph: &Graph,
+ file_state: &FileState,
+ build: &Build,
+) -> std::io::Result<BuildDump> {
+ let mut dump = BuildDump::default();
+ dump_files(&mut dump.ins, graph, file_state, build.dirtying_ins());
+ dump_files(&mut dump.ins, graph, file_state, build.discovered_ins());
+ dump_files(&mut dump.outs, graph, file_state, build.outs());
+ dump.cmdline = build.cmdline.clone().unwrap_or_default();
+ assert!(build.rspfile.is_none());
+
+ Ok(dump)
+}
#[derive(Default)]
pub struct Hashes(HashMap<BuildId, Hash>);
diff --git a/src/work.rs b/src/work.rs
index 227f5bb..9363c6f 100644
--- a/src/work.rs
+++ b/src/work.rs
@@ -4,8 +4,10 @@ use crate::{
canon::canon_path, db, densemap::DenseMap, graph::*, progress, progress::Progress, signal,
smallmap::SmallMap, task, trace,
};
+use std::collections::HashMap;
use std::collections::HashSet;
use std::collections::VecDeque;
+use std::path::Path;
use std::path::PathBuf;
use std::time::Duration;
@@ -292,6 +294,7 @@ pub struct Work<'a> {
last_hashes: &'a Hashes,
build_states: BuildStates,
runner: task::Runner,
+ dump: HashMap<String, BuildDump>,
}
impl<'a> Work<'a> {
@@ -315,6 +318,7 @@ impl<'a> Work<'a> {
last_hashes,
build_states: BuildStates::new(builds, pools),
runner: task::Runner::new(parallelism),
+ dump: Default::default(),
}
}
@@ -598,6 +602,13 @@ impl<'a> Work<'a> {
// so compare the hash against the last hash.
// TODO: skip this whole function if no previous hash is present.
let hash = hash_build(self.graph, &mut self.file_state, build)?;
+
+ assert!(build.cmdline.is_some());
+ self.dump.insert(
+ build.cmdline.clone().unwrap(),
+ dump_build(self.graph, &self.file_state, build)?,
+ );
+
Ok(self.last_hashes.changed(id, hash))
}
@@ -725,6 +736,37 @@ impl<'a> Work<'a> {
// "interrupted by user" and exit with success, and in that case we
// don't want n2 to print a "succeeded" message afterwards.
let success = tasks_failed == 0 && !signal::was_interrupted();
+ if success {
+ let dump_path = Path::new(".n2-dump");
+ if dump_path.exists() {
+ let previous_dump: HashMap<String, BuildDump> =
+ serde_json::from_slice(&std::fs::read(dump_path).unwrap()).unwrap();
+
+ for (k, current) in &self.dump {
+ match previous_dump.get(k) {
+ Some(prev) => {
+ if prev != current {
+ let old = serde_json::to_string(prev).unwrap();
+ let new = serde_json::to_string(current).unwrap();
+ println!(
+ "{k}: {:#?}",
+ json_diff::compare_jsons(&old, &new).unwrap()
+ );
+ }
+ }
+ None => {
+ println!("not seen before: {}", k);
+ }
+ }
+ }
+
+ for (k, v) in previous_dump {
+ // add any previous entries that are missing to current dump
+ self.dump.entry(k).or_insert(v);
+ }
+ };
+ std::fs::write(dump_path, serde_json::to_vec(&self.dump).unwrap()).unwrap();
+ }
Ok(success.then(|| tasks_done))
}
If the extra work were gated behind a tool like explain so the extra costs were only paid when debugging, could that be a possible solution to the all-we-have-is-a-hash problem?
Ok, I did this. I made it indirect through a trait so the debug-printer and real calculation are guaranteed to cover the same data. Output looks like:
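A minimal sketch of that trait-indirection idea, using hypothetical names rather than n2's actual API: one walk over the signature data calls into a trait, and the real hasher and the debug printer are simply two implementations of it:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Everything that feeds the signature goes through this trait, so the real
// hash and the debug output are guaranteed to cover the same data.
trait SignatureSink {
    fn file(&mut self, name: &str, mtime_millis: i64);
    fn cmdline(&mut self, cmdline: &str);
}

// Real signature: feed the fields into a hasher.
struct HashSink(DefaultHasher);
impl SignatureSink for HashSink {
    fn file(&mut self, name: &str, mtime_millis: i64) {
        name.hash(&mut self.0);
        mtime_millis.hash(&mut self.0);
    }
    fn cmdline(&mut self, cmdline: &str) {
        cmdline.hash(&mut self.0);
    }
}

// Debug output: record the same fields as readable text.
struct ExplainSink(Vec<String>);
impl SignatureSink for ExplainSink {
    fn file(&mut self, name: &str, mtime_millis: i64) {
        self.0.push(format!("file {} mtime {}", name, mtime_millis));
    }
    fn cmdline(&mut self, cmdline: &str) {
        self.0.push(format!("cmdline {}", cmdline));
    }
}

// One walk over a build's data; both sinks see exactly the same calls.
fn visit_build<S: SignatureSink>(sink: &mut S, files: &[(&str, i64)], cmdline: &str) {
    for &(name, mtime) in files {
        sink.file(name, mtime);
    }
    sink.cmdline(cmdline);
}

fn main() {
    let files = [("foo.c", 1_700_000_000_000_i64), ("foo.o", 1_700_000_000_500)];
    let cmdline = "cc -c foo.c -o foo.o";
    let mut hash = HashSink(DefaultHasher::new());
    let mut explain = ExplainSink(Vec::new());
    visit_build(&mut hash, &files, cmdline);
    visit_build(&mut explain, &files, cmdline);
    println!("signature: {:x}", hash.0.finish());
    for line in explain.0 {
        println!("{}", line);
    }
}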
Thanks Evan, this is great - I already found myself reaching for it today while debugging a build issue. After those initial migration issues I mentioned previously, everything has been smooth sailing, and I'm still loving the new status output. Wish I'd made the switch earlier!
Wow thanks! I think Anki is super cool so please let me know if there's anything else you'd like in n2, even something minor! |
Thank you for your work on both ninja and this project - the simplicity and speed is great.
I considerably prefer the progress output of n2, so I have been looking at moving our build over to it. The first issue I ran into was #39, and that was an easy fix (though resolving the variables at build.ninja creation time increased the build.ninja size from about 1.6MB to 1.9MB).
The other issue took considerably longer to debug - I was seeing some rules being rebuilt on subsequent invocations of n2, and had no idea why. I couldn't fall back on ninja -d explain, as the issue didn't happen under ninja, and n2 has no native explain tool. I ended up hacking together a solution that dumps the mtimes of files on each run and compares them, to make it clear which root change was triggering a build step to be rebuilt, and finally figured out that it wasn't the mtimes that were changing, but the order of the input and output files in build.ninja. Our build generator was writing the cmdlines consistently, but the inputs/outputs were being accumulated into hashmaps before being written out, leading to non-deterministic output.
Once I knew what the problem was, it was easy to do the sorting in the build generator, but I thought I would mention it in case you feel it's worth solving on n2's end for consistency with ninja, or at least documenting as a migration gotcha.
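For reference, a sketch of the kind of generator-side fix described above, with hypothetical names (not the actual generator code): sort the inputs and outputs that were accumulated in hash-based containers before writing them out, so build.ninja is byte-identical on every run:

use std::collections::HashSet;

// Emit one build statement with a deterministic input/output order, even though
// the edges were accumulated in a HashSet whose iteration order is arbitrary.
fn write_build(rule: &str, outs: &HashSet<String>, ins: &HashSet<String>) -> String {
    let mut outs: Vec<&str> = outs.iter().map(String::as_str).collect();
    let mut ins: Vec<&str> = ins.iter().map(String::as_str).collect();
    outs.sort_unstable();
    ins.sort_unstable();
    format!("build {}: {} {}\n", outs.join(" "), rule, ins.join(" "))
}

fn main() {
    let ins: HashSet<String> = ["baz2", "baz3", "baz1"].iter().map(|s| s.to_string()).collect();
    let outs: HashSet<String> = ["out.o".to_string()].into_iter().collect();
    // Always prints "build out.o: cc baz1 baz2 baz3", regardless of hash order.
    print!("{}", write_build("cc", &outs, &ins));
}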