Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft of single component rebuilds for spin watch #2478

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

garikAsplund
Copy link
Contributor

@garikAsplund garikAsplund commented Apr 29, 2024

I'm still new to larger projects and Rust in general, so hopefully this isn't too janky.

This potentially closes #1417.

What this draft does is make the watched_changes receiver a tuple with a Uuid and a String. This allows paths for where the file change originated to be passed into the build_components function in the buildifier. The logic in build_components is effective for what projects I have up and running, but could use a test or inspection. Essentially if there is an empty string in the receiver tuple, it does a full build since it's the first execution. Any changes after that are matched to component IDs and selectively built with -c.

Unfortunately, since spawn_watchexec contains the notifier, the tuples have to be added to artifact and manifest senders and receivers as well. I tried to make the String an Option<String> but that seemed to overly complicate things. That could be because I'm not as familiar with idiomatic Rust.

Another small thing I noticed was that the debounce value is a u64. That could be set very high and never rebuild. I don't know if it's worth changing that to a u16 and then pass it to Duration::from_millis as a u64? If the user enters too large of a value, clap will emit an error.

Also, if the debounce value is high--anything over a few seconds--then there is potential for multiple components to be rebuilt. In this case Vec<String> or some other similar structure is better suited and would need to be refactored.

One last thing--I tried making changes within the manifest but never got it to rebuild automatically in response. That doesn't appear to be the intent though. I'm guessing that stems from ManifestFilterFactory.

@itowlson
Copy link
Contributor

Thanks for looking at this @garikAsplund - I've bounced off it a couple of times myself but had forgotten to remove the "good first issue" tag...

Unfortunately, I am sceptical about this being the way to go. There are some easy ways to improve it (e.g. send a list of affected files rather than the CSV debug string), but I'm not sure this tackles the broad problem of "what happens if two changes happen close together". You'll start a spin build -c foo for the first change, but then that gets cancelled by the second change which runs spin build -c bar - doesn't this leave foo not being rebuilt? There is also a problem if another component then changes - the watch::Receiver only tells you the single most recent value - so now bar is missed and only baz gets built. The challenge is to accumulate the list of components needing rebuilding.

I'm also a bit confused about how the buildifier is inferring which components to rebuild. It seems to be receiving a CSV string of changed files and scanning that to see if it contains any component IDs? But that can't work: component IDs are not file names. Perhaps I'm misunderstanding? It wouldn't be the first time... grin But generally I'm wary of having to infer the changed components from the list of files: this seems like it involves a lot of re-reading and glob matching.

The strategy I've taken with this - and, to be clear, which has failed twice, so take this with a grain of salt - was to create independent watchers per component, each knowing the associated component ID. Each watcher would then be able to say "a file for component X has changed." But then something needed to accumulate those into the watch channel, so that when the receiver detected a change, it atomically grabbed the accumulated list and could process the full set. (Plus if more came in before it had done the list, it would either ignore them or they would continue to accumulate.)

As you can see, the problem ends up being quite a subtle synchronisation issue, and my suspicion is that we're going to need a new synchronisation structure rather than relying solely on the watch channel. Your approach of sending file lists for the receiver to convert into component lists is an intriguing innovation but I'm not sure it solves that fundamental problem.

I apologise that the issue labelling implies that this is more approachable than (I think) it really is. Please do feel free to keep at it if you relish the challenge, or if you have an insight that bypasses my worries! And please do correct me if I've misunderstood your approach here!

And thanks again for taking a crack at this!

@garikAsplund
Copy link
Contributor Author

garikAsplund commented Apr 29, 2024

@itowlson Thanks for the feedback. That seems about right!

My first inclination was that every component needed its own watcher, but when I realized there was other info available I ran with that since it seemed simpler. I guess I also didn't see the -o flag for spin add which as you say complicates or negates my current approach of yanking file paths as a proxy for component IDs via paths_of() on line 337 of watch.rs.

Do you have shared code available for attempts at componentized watchers? I may take a look at that and see about adding a collection of component IDs to build values on refresh.

Also, do you have any input about desired/expected behavior of manifest file changes? Alternatively I could open an issue there and investigate that.

@itowlson
Copy link
Contributor

I don't have anything even close to working I'm afraid.

Manifest file changes must trigger a full rebuild and restart, and must reload the watch configuration (because we may now be watching different paths). This is currently managed by the ManifestFilterFactory, Reconfiguriser and ReconfigurableWatcher.

@garikAsplund
Copy link
Contributor Author

Is that working? Whenever I make changes to spin.toml there's no rebuild. Am I missing something with config or flagging?

@itowlson
Copy link
Contributor

It should work... if it's not then that's a bug.

@itowlson
Copy link
Contributor

I think you're right. It's not firing for me either. That's a bug: I'll take a look. Thanks!

@garikAsplund
Copy link
Contributor Author

@itowlson I'm not saying this is by any means complete, but I think this updated version takes into account some of the points you brought up yesterday.

I kept at the indirect approach of using paths_of as an entry point. Since the manifest has workdirs you can match on that for each component ID. That's what I went with and it seems to be resulting in desired behavior for rebuilds of multiple components with longer debounce times.

It seems like the watcher joins paths of changed files with ", " so it was easy to construct a collection of affected files/components that way. Is that correct?

There's some hacky stuff with selecting when to do a full rebuild with the startup and when the manifest changes. It's doubtful I've taken all considerations into account since I'm not as familiar with the whole process there.

Please take a look and see if this is way off base, somewhat fruitful, or in between.

@itowlson
Copy link
Contributor

Thanks for this! I do think this could be a fruitful avenue and I appreciate you coming up with it and exploring it. However, I have a couple of comments, one a minor tactical one, but one fundamental.

  1. paths_of returns a human-readable string, used for tracing. If you are going to interact with the path list programmatically, you should be sending it as a Vec<PathBuf> or Vec<String> or such like. Don't try to pick apart the tracing string.

  2. None of this helps with the fundamental watch channel problem. As you'll see from the watch documentation, a watch channel retains only the last sent value. So if you send a message saying "hey alice.txt has changed", and then moments later another saying "hey bob.txt has changed", then moments after _that "hey carol.txt has changed, busy day out there on the filesystem huh," the receiver may entirely miss one or more of these messages depending on when they are calling the receive method (e.g. changed or borrow_and_update). For example, if the receiver is awaiting a changed when alice is sent, it sees the latest value as alice.txt, it does some processing, comes back, and now yep there is a new latest value and it's... carol.txt. We completely missed bob.txt and now the user is going "why can I see some of my changes but not others." The current watch system works because we don't care what has changed, we only care that something has changed, and we represent that by sending random GUIDs. If we miss a GUID we don't care because we are going to do a full rebuild regardless. But if the value is semantically important then we need to remember that we only see the single latest value not all values since the last receive. My personal feeling is that we will need to rip out watch in favour of a sync structure that retains history, but there are challenges here around atomic drains. (If this is a topic that interests you, Mara Bos has a phenomenal book on it, free online at https://marabos.nl/atomics/.)

Hope this is helpful! It's a knotty problem...

@garikAsplund
Copy link
Contributor Author

Hmmmm, I guess I'm curious as to why it seems to work for me then 🤔

I don't mean to take away valuable time, but have you tried spin watch with this implementation? How granular would updates have to be for files to be missed and how likely is that? I've tried making changes in multiple files as fast as I can and it catches only those files. After a rebuild it seems to be flushed and catch the next round properly, too. I know there are times--like a merge or switching between branches--when a bunch of files would change simultaneously. That seems to work as well.

As to your points:

  1. Good to know. Currently this works by passing a &String around, then a &str, then splitting it into Vec<&str>. Should be a simple refactor to do the same or similar but with a PathBuf.
  2. Reading the watch documentation it looks like borrow_and_update() is the right call here since its within the loop. Not sure if that provides an escape hatch to save from a larger redesign.

Running spin watch --listen localhost:3003 -d 3000 yields these logs after making changes to these two components within 3 seconds:
change files manually

And the same thing with git checkout main from another branch where every component has changes:
git checkout

@itowlson
Copy link
Contributor

itowlson commented May 1, 2024

Yes, multiple changes within the debounce period get accumulated within watchexec, so trigger a single update. The race condition happens when you have changes that happen outside the debounce period - so that it sends multiple notifications - but within the check-to-check interval (basically, within an iteration of the loop) - so that one "latest" value gets overwritten by another before ever being seen.

And for sure I'm not surprised it works for you. I'd expect it to work for me too! This sort of thing manifests in an unpredictable race condition: it will cut in only when the timing is just right. (The specific timing condition is during a major customer demo.)

Again, the fundamental problem is that the current synchronisation mechanism is not well suited to the goal. We now want a mechanism that tells us which components have changed since last build. A mechanism that tells us the latest component that changed, with a hope that we pick up changes fast enough that we don't miss anything, is not that. Even if we get lucky 99% of the time! We shouldn't be reluctant to change the mechanism to meet the new need.

For example, tokio::sync::channel::mpsc and broadcast are closer to our needs because they can hold multiple values. watchexec could send changes on as they happen, and then the builidifier could pull off all pending changes and consolidate them. There is a subtlety around "how do we know we've removed all the values" which is why these aren't a slam dunk. But to my mind they're closer than watch.

Hope that all makes sense, I've already rewritten this message a few times to try to focus it, but still not quite happy with how it's come out... happy to discuss further for sure!

@garikAsplund
Copy link
Contributor Author

garikAsplund commented May 1, 2024

Ok, I think I more fully understand the issue and am able to reproduce it myself by staggering a bunch of file changes in and out of debounce periods--there is something lost in the mix.

Appreciate the thoughtful response.

I'm still curious and will poke around but I'm not promising anything. I'm not opposed to looking at the options you mentioned, but is a queue of use here? Or how/when/where exactly are the values within the check-to-check interval being replaced or lost? I guess are they ever accessible or is that what you're getting at with by bringing up alternative methods?

@itowlson
Copy link
Contributor

itowlson commented May 1, 2024

Yes, q queue is definitely useful, but I wasn't able to find a queue that was awaitable. It could certainly be a building block though! The characteristics I was thinking of for the synchronisation structure were something like:

  • Can hold multiple values
  • Ideally, multiple senders / single receiver. But multiple senders makes things harder in terms of atomicity and might not be feasible.
  • Sending a value appends it to the current contents (this could include deduplicating, or that could be down to whatever processes the values - doesn't really matter)
  • Receive operation:
    • Suspends if there is no pending data
    • Wakes as soon as there is pending data, and atomically drains that data and returns it to the caller. Any data sent after this point becomes pending for the next receive.

We might be able to do this by combining a watch channel (of Uuid, for awaiting changes) with an Arc<RwLock<Vec/HashSet/Queue>> to hold the pending list. The send operation would lock the collection and append to it, then send a new Uuid in the watch channel. The receive operation would await a change in the watch channel, and when it got one, would lock the collection and drain it. We'd need to look carefully at timing and potential races though. (E.g. with this sketch there is a risk of spurious wakes with an empty change set, although I think those could be managed and if not then the receiver could always ignore them. But there may be more subtle flaws in the idea. There usually are.)

And I may be overthinking this. Like polling a shared queue on a interval that's imperceptibly short to users but extremely long at the CPU scale (say every 10ms) might feel less beautiful than a simple receive().await but be perfectly responsive without incurring load. There are definitely options to explore here if you feel enthusiastic - but I totally hear you about not wanting to promise anything, and totally understand if you want to step away from this thorny little thicket!

@garikAsplund
Copy link
Contributor Author

garikAsplund commented May 1, 2024

Good stuff. I definitely have more of an appreciation for the nuances now!

Appears that mpsc and broadcast are more or less queues and looking around at other projects they're used quite a bit. For example, here's Deno's file watcher:

pub struct WatcherCommunicator {
  /// Send a list of paths that should be watched for changes.
  paths_to_watch_tx: tokio::sync::mpsc::UnboundedSender<Vec<PathBuf>>,

  /// Listen for a list of paths that were changed.
  changed_paths_rx: tokio::sync::broadcast::Receiver<Option<Vec<PathBuf>>>,

  /// Send a message to force a restart.
  restart_tx: tokio::sync::mpsc::UnboundedSender<()>,

  restart_mode: Mutex<WatcherRestartMode>,

  banner: String,
}
...
impl WatcherCommunicator {
  pub fn watch_paths(&self, paths: Vec<PathBuf>) -> Result<(), AnyError> {
    self.paths_to_watch_tx.send(paths).map_err(AnyError::from)
  }

  pub fn force_restart(&self) -> Result<(), AnyError> {
    // Change back to automatic mode, so that HMR can set up watching
    // from scratch.
    *self.restart_mode.lock() = WatcherRestartMode::Automatic;
    self.restart_tx.send(()).map_err(AnyError::from)
  }

  pub async fn watch_for_changed_paths(
    &self,
  ) -> Result<Option<Vec<PathBuf>>, AnyError> {
    let mut rx = self.changed_paths_rx.resubscribe();
    rx.recv().await.map_err(AnyError::from)
  }

  pub fn change_restart_mode(&self, restart_mode: WatcherRestartMode) {
    *self.restart_mode.lock() = restart_mode;
  }

  pub fn print(&self, msg: String) {
    log::info!("{} {}", self.banner, msg);
  }
}

They use notify instead of watchexec, but those projects look to be closely related. The watch_recv function probably has a bunch of good info to influence this redesign, too.

I'm not sure exactly how it's all related, but then there's the whole separate issue of HMR or workarounds in Rust and web spaces like what Dioxus did last year with hot reloading. Not at all diving into that mess, just mentioning it to be thorough--also see here. It depends on what the full spec of spin watch is or wants to be.

@itowlson
Copy link
Contributor

itowlson commented May 1, 2024

I would definitely not worry about hot reloading for now...!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🆕 Triage Needed
Development

Successfully merging this pull request may close these issues.

Spin watch builds all components when a single component is changed
2 participants