How do I... additions #1165
Comments
Ah I filed the sample metadata part of this already https://github.com/pystatgen/sgkit/issues/1151
Just noting that merging separate contigs could be easy with (Looks like the Eh on second thought that
Okay I think this should do it for merging 2 datasets with no overlapping contigs, which I think will be the most common case. If we wanted to do this safely we'd need to build an index over samples and merge on (This was almost there: somehow

    import xarray as xr

    def concat_chrs(chr1, chr2):
        """Concatenate two datasets that have no contigs in common.

        Assumes both datasets hold the same samples in the same order.
        """
        new_ds_dict = {}
        # Concatenate contig_id and increment chr2 variant_contig indexes by chr1.contigs.size
        new_ds_dict['contig_id'] = xr.concat([chr1.contig_id, chr2.contig_id], dim='contigs')
        new_ds_dict['variant_contig'] = xr.concat(
            [chr1.variant_contig, chr2.variant_contig + chr1.contigs.size], dim='variants'
        )
        # Concatenate remaining variant data variables
        data_vars_variants = [
            'call_genotype',
            'call_genotype_mask',
            'variant_allele',
            'variant_id',
            'variant_position',
        ]
        for dv in data_vars_variants:
            new_ds_dict[dv] = xr.concat([chr1[dv], chr2[dv]], dim='variants')
        # Copy over sample data variables from chr1
        data_vars_samples = [
            'sample_family_id',
            'sample_id',
            'sample_maternal_id',
            'sample_member_id',
            'sample_paternal_id',
            'sample_phenotype',
            'sample_sex',
        ]
        for dv in data_vars_samples:
            new_ds_dict[dv] = chr1[dv]
        return xr.Dataset(data_vars=new_ds_dict)
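For the "do this safely" part, here is a rough sketch of one way to line the samples up before concatenating. It assumes `sample_id` uniquely identifies each sample and that both datasets contain exactly the same set of samples. `make_toy_ds` and `align_samples` are hypothetical helper names made up for this example; they are not part of sgkit.

```python
import numpy as np
import xarray as xr


def make_toy_ds(contig, positions, sample_ids):
    """Build a tiny sgkit-like dataset for one contig (hypothetical helper).

    Every genotype call is filled with the sample's column index so that
    a reordering of the samples dimension is easy to see.
    """
    n_variants, n_samples = len(positions), len(sample_ids)
    gt = np.broadcast_to(
        np.arange(n_samples, dtype="int8")[None, :, None],
        (n_variants, n_samples, 2),
    ).copy()
    return xr.Dataset(
        {
            "contig_id": ("contigs", np.array([contig])),
            "variant_contig": ("variants", np.zeros(n_variants, dtype=int)),
            "variant_position": ("variants", np.array(positions)),
            "call_genotype": (("variants", "samples", "ploidy"), gt),
            "sample_id": ("samples", np.array(sample_ids)),
        }
    )


def align_samples(ds1, ds2):
    """Return ds2 with its samples reordered to match ds1 (hypothetical helper).

    Raises ValueError if sample IDs are duplicated or the sample sets differ.
    """
    ids1 = ds1["sample_id"].values.tolist()
    ids2 = ds2["sample_id"].values.tolist()
    if len(set(ids1)) != len(ids1) or len(set(ids2)) != len(ids2):
        raise ValueError("sample_id values must be unique")
    if set(ids1) != set(ids2):
        raise ValueError("datasets do not contain the same samples")
    # Map each sample ID to its position in ds2, then read those positions
    # off in ds1's order.
    lookup = {sid: i for i, sid in enumerate(ids2)}
    order = [lookup[sid] for sid in ids1]
    return ds2.isel(samples=order)


chr1 = make_toy_ds("1", [100, 200], ["S1", "S2", "S3"])
chr2 = make_toy_ds("2", [50], ["S3", "S1", "S2"])
chr2_aligned = align_samples(chr1, chr2)
```

After this, `chr2_aligned` has its samples in the same order as `chr1`, so the positional concatenation along `variants` no longer silently mixes up samples. Datasets with only partially overlapping samples would need a different policy (inner or outer join), which this sketch deliberately refuses to guess.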
Operations I find myself doing regularly
We have a TODO in the docs for Adding custom data to a Dataset so the first 2 solutions should probably go there as well.