enable servers to use pmi to get rank and count #626

Merged: 1 commit, Aug 9, 2021

@adammoody (Collaborator) commented May 4, 2021

For the servers to acquire their rank and the number of servers in the job, one must currently create a hostfile. This is automatic when using the unifyfs utility to launch the servers, but it is cumbersome and error-prone when launching the servers directly, which is sometimes helpful. For example:

export UNIFYFS_SERVER_HOSTFILE=/path/to/hostlist
rm -f $UNIFYFS_SERVER_HOSTFILE
echo $SLURM_NNODES > $UNIFYFS_SERVER_HOSTFILE
srun -n $SLURM_NNODES -N $SLURM_NNODES /bin/hostname >> $UNIFYFS_SERVER_HOSTFILE

srun -n2 -N2 /path/to/install/bin/unifyfsd

And if one forgets to define a hostfile, all servers start up assuming they are rank 0 in a one-process job, which can be confusing to debug.
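To make the hostfile method concrete, here is a minimal C sketch of how a server might turn such a file into a rank and a server count. It assumes the layout produced by the commands above (the count on the first line, then one hostname per line) and that a server's rank is the position of its own hostname in that list; the helper name `hostfile_rank` is hypothetical and not taken from the UnifyFS sources. The NULL-path branch mirrors the fallback just described, where every server believes it is rank 0 of 1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive (rank, size) from a hostfile whose first
 * line is the server count and whose remaining lines are hostnames.
 * Returns 0 on success, -1 on error. */
static int hostfile_rank(const char *path, const char *myhost,
                         int *rank, int *size)
{
    if (path == NULL) {
        /* No hostfile defined: every server falls back to rank 0 of 1. */
        *rank = 0;
        *size = 1;
        return 0;
    }

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }

    /* First line: number of servers (e.g. $SLURM_NNODES). */
    int count = 0;
    if (fscanf(fp, "%d", &count) != 1 || count <= 0) {
        fclose(fp);
        return -1;
    }

    /* Remaining lines: hostnames; our rank is the index of our own name. */
    char name[256];
    int found = -1;
    for (int i = 0; i < count; i++) {
        if (fscanf(fp, "%255s", name) != 1) {
            break; /* fewer hostnames than the count promised */
        }
        if (found < 0 && strcmp(name, myhost) == 0) {
            found = i;
        }
    }
    fclose(fp);

    if (found < 0) {
        return -1; /* this host is not listed in the file */
    }
    *rank = found;
    *size = count;
    return 0;
}
```

A server would call it with `getenv("UNIFYFS_SERVER_HOSTFILE")` and the name returned by `gethostname()`. Since `srun` does not guarantee the order of the `/bin/hostname` lines, the ranks in this sketch are consistent only because every server reads the same file.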

This PR enables the servers to use PMI2/PMIX to acquire the number of servers and their rank within the set. If PMI is enabled and if UNIFYFS_SERVER_HOSTFILE is not set, then the servers use PMI to get the rank and server count. If PMI is not enabled or if UNIFYFS_SERVER_HOSTFILE is defined, then the servers use the host file method.
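The selection rule above can be sketched in C as follows. This is an illustration under assumptions, not the code in this PR: `bootstrap_method`, `choose_bootstrap`, and the `pmi_enabled` flag (standing in for however PMI2/PMIX support is switched on at build time) are hypothetical names. The hostfile path is passed in so the caller can supply `getenv("UNIFYFS_SERVER_HOSTFILE")`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two ways a server learns its rank/size. */
typedef enum {
    BOOTSTRAP_HOSTFILE, /* hostfile method (rank 0 of 1 if no file is set) */
    BOOTSTRAP_PMI       /* query rank and size through PMI2 or PMIX */
} bootstrap_method;

/* hostfile:    value of UNIFYFS_SERVER_HOSTFILE, or NULL if unset.
 * pmi_enabled: stand-in for the build-time PMI2/PMIX switch (assumed). */
static bootstrap_method choose_bootstrap(const char *hostfile, bool pmi_enabled)
{
    /* An explicitly defined hostfile always takes precedence. */
    if (hostfile != NULL) {
        return BOOTSTRAP_HOSTFILE;
    }
    /* No hostfile: use PMI when it is available. */
    if (pmi_enabled) {
        return BOOTSTRAP_PMI;
    }
    /* Neither: fall back to the hostfile method's defaults. */
    return BOOTSTRAP_HOSTFILE;
}
```

Giving the hostfile priority means a user can still force the hostfile method at runtime on a PMI-enabled build.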

When PMI is available, this simplifies the task of launching the servers manually through the job launcher as the hostfile can be avoided. In particular, the above simplifies to just:

srun -n2 -N2 /path/to/install/bin/unifyfsd

@adammoody added the WIP label May 4, 2021
@adammoody (Collaborator, Author) commented:

Maybe we could check whether the hostfile is set at runtime, and use it if so, even if PMI2/PMIX is selected?

Similarly, for the key/value store, we could check the SHAREDFS path: use the file system if it is defined, and otherwise use PMI if PMI is enabled?

@adammoody force-pushed the pmirank branch 2 times, most recently from 094652e to 463eb7a, May 4, 2021 01:18
@adammoody (Collaborator, Author) commented Jul 19, 2021

One problem with exchanging server addresses through PMI2: at least one version of SLURM's PMI2 seems to use ; characters to separate key/value pairs. The Margo address contains a ; character, which confuses SLURM, so a put with a value like:

PMI2_KVS_Put("unifyfs.margo-svr", "ofi+tcp;ofi_rxm://123.123.123.123:55555")

leads to an error like:

slurmstepd: error: mpi/pmi2: no value for key ;ofi_rxm://123.123.123.123:55555; in req

As a fix, I put in a hack to convert any ; character in the value string to a ! character when inserting a key/value in PMI2. All ! chars are then converted to ; in the value string of a PMI2 get call. This assumes that the value string does not actually use the ! character, which looks to be safe for margo address strings for now.
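A minimal C sketch of that substitution follows. The function names are hypothetical, and it assumes the value lives in a writable, NUL-terminated buffer that can be rewritten in place before `PMI2_KVS_Put` and after the corresponding get. As noted above, the round trip is only lossless if the original value contains no `!` characters.

```c
/* Replace every occurrence of `from` with `to` in a NUL-terminated
 * string, modifying it in place. */
static void replace_char(char *s, char from, char to)
{
    for (char *p = s; *p != '\0'; p++) {
        if (*p == from) {
            *p = to;
        }
    }
}

/* Before the PMI2 put: hide ';' so SLURM does not treat it as a
 * key/value separator. */
static void pmi2_encode_value(char *value)
{
    replace_char(value, ';', '!');
}

/* After the PMI2 get: restore the original ';' characters. */
static void pmi2_decode_value(char *value)
{
    replace_char(value, '!', ';');
}
```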

@adammoody force-pushed the pmirank branch 2 times, most recently from f113726 to 50a3089, July 20, 2021 20:16
@adammoody removed the WIP label Jul 20, 2021
@adammoody added the WIP label Jul 21, 2021
@adammoody (Collaborator, Author) commented:

Finding some strange behavior in testing, so putting this back on WIP while trying to figure it out.

@adammoody (Collaborator, Author) commented:

Setting the following does fix the problem I was chasing:

export ABT_THREAD_STACKSIZE=256000

Hooray for @CamStan!

Under TotalView, I could see that a data structure had a NULL pointer. Since I'm changing the order of the startup calls, I thought I had broken an ABT_init call, so I was chasing that. However, after a few more breakpoints the problem shifted to a segfault in clock_gettime(), and I never touched anything that should affect that.

@CamStan (Member) commented Aug 2, 2021

> Setting the following does fix the problem I was chasing:
>
> export ABT_THREAD_STACKSIZE=256000

Oh nice! I thought I'd tried this in an attempt to resolve the issue I was seeing, but I'll try it again and see if it helps there as well.

@adammoody (Collaborator, Author) commented:

Confirmed that this works on both lassen and quartz with the new setting, but it fails with a segfault in both cases without it. The segfaults come from different parts of the code.

@CamStan (Member) commented Aug 3, 2021

I don't know what I did when I first tried fixing my problem by setting ABT_THREAD_STACKSIZE, but this time it appears to have fixed it.

@adammoody (Collaborator, Author) commented:

Margo sets the ABT stack size to 2MB by default. PR #659 changes things to let Margo call ABT_init for us, in which case its 2MB setting is used. With that, things work for me without having to set ABT_THREAD_STACKSIZE in the environment.
