Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checkpointing to metagraph build #197

Draft
wants to merge 59 commits into
base: master
Choose a base branch
from
Draft

Add checkpointing to metagraph build #197

wants to merge 59 commits into from

Conversation

danieldanciu
Copy link
Contributor

As discussed via chat, there are multiple checkpoints after each major operation: generate kmers, generate reverse complements, splitting into chunks, generate dummy1, concatenate chunks, generate dummy2..k.

The client interface exposes only 2 phases:

  • phase 1 builds everything up to and including dummy k-mers
  • phase 2 finishes the build

I tested by inserting exit statements in the code after each phase and making sure the build is successful on resume.
I also added functional tests for building in phases.

@karasikov
Copy link
Member

Could you add a test where it builds one graph, kills the process, so it fails to build to the end.
Then it does the same thing for a different graph. Then restarts building both graphs in parallel, and checks that both graphs are correctly generated.

So we check that there are no collisions here and you can correctly generate multiple graphs in parallel on the same machine.

@danieldanciu
Copy link
Contributor Author

danieldanciu commented Sep 8, 2020 via email

@danieldanciu
Copy link
Contributor Author

danieldanciu commented Sep 9, 2020 via email

Comment on lines +20 to +24
if (std::filesystem::exists(checkpoint_file_)) {
std::ifstream f(checkpoint_file_);
f >> checkpoint_;
if (checkpoint_ > 0) {
f >> kmer_dir_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write the phase too. Otherwise, how do you know which phase this checkpoint corresponds to?

If you assume that different phases have different checkpoints, like 1: (1,2,3), 2: (4,5,6), then make the interface not allow setting checkpoint = 0 for the second phase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like you said, the phases correspond to different checkpoints, so writing the phase is not useful - it just tells the program up to which checkpoint it should go before stopping.
So if the users enters phase=1, we might make it up to checkpoint 2, and then resume the operation from checkpoint 2 and finalize the phase.

Setting checkpoint_=0 for phase_=2 is perfectly valid. One tells us how far we got with the computation, the other how far we wish to go. The other way around, setting checkpoint_=5 and phase_=1 could be invalid, but in our case phase_=1 corresponds to having all checkpoints done, so any combination of valid checkpoint (0..5) and phase values (1..2) is fine.

@@ -703,7 +819,7 @@ void recover_dummy_nodes(const KmerCollector &kmer_collector,
return kmer::transform<KMER>(reinterpret_cast<const KMER_REAL &>(v), k + 1) + kmer_delta;
}
},
real_split_by_W, true
real_split_by_W, false /* remove sources */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a variable name commented out

Suggested change
real_split_by_W, false /* remove sources */
real_split_by_W, false // remove sources

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the standard way of commenting parameters represented by constants. You can only use // here because the param happens to be at the end of the line.

@karasikov
Copy link
Member

It's better to have non-deterministic tests than no tests.
The expected result is well determined. The second run of metagraph started after a previous run that was killed, must construct a valid graph.
So it's a perfectly defined test, isn't it?

I can add a test that builds 2 graphs in parallel (using phases), but I won't kill the process as that causes the test to be undeterministic.

On Tue, 8 Sep 2020 at 19:31, Mikhail Karasikov @.***> wrote: Could you add a test where it builds one graph, kills the process, so it fails to build to the end. Then it does the same thing for a different graph. Then restarts building both graphs in parallel, and checks that both graphs are correctly generated. So we check that there are no collisions here and you can correctly generate multiple graphs in parallel on the same machine. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#197 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOYQLCNGGYLC7VF3SD5WZLSEZS75ANCNFSM4RAHV52Q .

@karasikov karasikov changed the base branch from dev to master June 10, 2021 09:11
@hmusta hmusta marked this pull request as draft October 25, 2023 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants