-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More crutest and dsc updates to support Volume layer activities. #1472
Conversation
These changes should be a fix for both issues: #1451 #1457 This adds support for crutest to use a provided dsc endpoint to construct a Volume object. Moved the existing volume creation steps to a new function, and added another option on how we can create a Volume. The two previous ways of creating a volume are not changed (though I changed a log message and added some warnings). The new code is in taking the dsc provided endpoint and using that to construct a volume. Additional dsc changes were made to help provide Volume info. Renamed things in dsc to better reflect what information they hold. Specifically, update a bunch of region set comments, as dsc just controls crucible-downstairs processes, and does not know which ones are part of what region set. New dsc commands: get_ds_uuid: Returns the UUID for the given client ID. all_running: Returns true if all downstairs that dsc knows about are currently in Running state. get_region_count: Returns the total number of regions that dsc knows about. New dsc behavior. dsc will now wait on all downstairs starting before taking any commands. The ability for dsc to answer a request can be used by a test to confirm that all downstairs had started. Add the ability to supply a dsc endpoint to crutest-cli #1459 tools/test_replay.sh transitioned to using the new --dsc option, as that test already required a dsc endpoint and was using a hard coded default value for it. tools/test_restart_repair.sh was updated to wait for dsc to report that all downstairs are online after a restart. This avoids a race where we told dsc to start, and then start crutest, but the downstairs are not yet online. All the tests that use dsc will eventually transition to using it to construct a Volume, but I'm pushing that work to another PR. There are more changes coming, specifically: Updating replace-before-acive and replace-reconcile to use the new dsc option correctly instead of a default. Updating tools/test_* to not require targets for crutest and instead use dsc option. More updates in crutest to update the current RegionInfo struct to be aware of multiple sub-volumes. #1454 Possibly some updates to the BlockIO trait. #1455 Other work this enables Making tests that use multiple sub-volumes. Layering the current set of tests in a way so we can run the same tests with a single sub-volume and with multiple sub-volumes
crutest/src/cli.rs
Outdated
} else { | ||
println!("dsc: Need to be connected first"); | ||
match dsc_cmd { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you write this as
} else if let DscCommand::Connect { server } = dsc_cmd {
instead of the nested match
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's much better. Done in 11fb248
@@ -139,7 +137,7 @@ enum Workload { | |||
ReplaceBeforeActive { | |||
/// URL location of the running dsc server | |||
#[clap(long, default_value = "http://127.0.0.1:9998", action)] | |||
dsc: String, | |||
dsc_str: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the new global dsc
argument here (and in ReplaceReconcile
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can and will! Coming in my next PR.
It also requires some changes to the tests themselves, and I felt this PR had enough in it.
if !opt.target.is_empty() { | ||
warn!(test_log, "targets are ignored when VCR is provided"); | ||
} | ||
let volume = Volume::construct(vcr, pr, block_io_logger).await.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we warn or error if dsc
is provided here?
Personally, I would vote for errors (including for the lines that are warnings above), since invalid CLI arguments are probably a sign that the user is doing something wrong (and no one reads the logs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are not quite ready to make these hard errors yet, though you are right that no one reads the logs, unless there is a failure. This could lead to someone thinking they are testing one thing, when in fact they are testing another, which is not good.
My next PR (yes, I'm using that again) has more changes that will get us closer to not
needing both targets and a VCR file. The
There are some cases where we use the targets provided for building test specific args. The ReplaceBeforeActive
test you pointed to above is one of these cases.
I've made an issue to come back and harden up the errors here.
For dsc
specifically, we could, in theory, create a volume from a VCR file, then use DSC to connect and control the downstairs themselves. I'm trying to figure out how we will support having sub-volumes with different sizes and how to ingest that into crutest, and I was able to do that by layering a few dsc create commands in a way that did it.
Another idea is to support multiple dsc
endopints. I'm still not sure which way is better.
let res = dsc_client.dsc_get_region_count().await.unwrap(); | ||
let regions = res.into_inner(); | ||
|
||
let sv_count = regions / 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error if sv_count
is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 11fb248
crutest/src/main.rs
Outdated
let mut extent_info_result = None; | ||
for target in &crucible_opts.target { | ||
let port = target.port() + crucible_common::REPAIR_PORT_OFFSET; | ||
println!("look at: http://{}:{} ", target.ip(), port); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info!(test_log, ...)
instead of println!
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot I had a logger here (or possibly when I wrote this before I moved it)
Fixed in 11fb248
crutest/src/main.rs
Outdated
let repair_client = repair_client::new(&repair_url); | ||
match repair_client.get_region_info().await { | ||
Ok(ri) => { | ||
println!("RI is: {:?}", ri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 11fb248
crutest/src/main.rs
Outdated
break; | ||
} | ||
Err(e) => { | ||
println!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here (warn!
perhaps?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I consider myself warned in 11fb248
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small comments, but generally LGTM
If anyone else want's to comment (or has comments in progress) let me know, otherwise I'll merge this |
This adds support for crutest to use a provided dsc endpoint to construct a Volume object.
These changes should be a fix for both issues:
#1451 and #1457
Moved the existing volume creation steps to a new function, and added another option on how we can create a Volume. The two previous ways of creating a volume are not changed (though I changed a log message and added some warnings). The new code is in taking the dsc provided endpoint and using that to construct a volume.
Additional dsc changes were made to help provide Volume info.
Renamed things in dsc to better reflect what information they hold. Specifically, update a bunch of region set comments, as dsc just controls crucible-downstairs processes, and does not know which ones are part of what region set.
New dsc commands
New dsc behavior
Other changes
tools/test_replay.sh
transitioned to using the new --dsc option, as that test already required a dsc endpoint and was using a hard coded default value for it.tools/test_restart_repair.sh
was updated to wait for dsc to report that all downstairs are online after a restart. This avoids a race where we told dsc to start, and then start crutest, but the downstairs are not yet online.All the tests that use dsc will eventually transition to using it to construct a Volume, but I'm pushing that work to another PR.
There are more changes coming, specifically
crutest's
RegionInfo
needs to support Volume level. #1454Some BlockOps need more logic at the Volume layer #1455
Other work this enables