-
If I try to get rmsd of two proteins with different number of atoms, it throws an error like this:
Is there a way to overcome the error? |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment
-
The problem is that If both structures are conformations of the same protein, but one misses side chains, some residues, etc. , you can filter the atoms that appear in both structures via If however your proteins are merely homologous to each other, it is not that simple. In this case my approach would be a sequence alignment of both protein sequences via Example: If the alignment looks like the following
the selected CA atoms would come from Here is a small example script that demonstrates this idea for the superimposition of streptavidin onto avidin: import numpy as np
import biotite.structure as struc
import biotite.sequence.align as align
import biotite.structure.io.pdbx as pdbx
import biotite.database.rcsb as rcsb
avidin_file = pdbx.PDBxFile.read(rcsb.fetch("1VYO", "pdbx", "."))
strept_file = pdbx.PDBxFile.read(rcsb.fetch("3RY2", "pdbx", "."))
# 'use_author_fields=False' is important here, in order to ensure
# that the residue ID corresponds to the sequence position
avidin = pdbx.get_structure(avidin_file, model=1, use_author_fields=False)
strept = pdbx.get_structure(strept_file, model=1, use_author_fields=False)
avidin = avidin[(avidin.chain_id == "A") & struc.filter_amino_acids(avidin)]
strept = strept[(strept.chain_id == "A") & struc.filter_amino_acids(strept)]
avidin_seq = pdbx.get_sequence(avidin_file)[0]
strept_seq = pdbx.get_sequence(strept_file)[0]
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignment = align.align_optimal(
avidin_seq, strept_seq, matrix, gap_penalty=(-10, -1), max_number=1
)[0]
print("Alignment:")
print(alignment)
print()
print("Trace:")
print(alignment.trace)
trace_without_gaps = alignment.trace[(alignment.trace != -1).all(axis=1)]
# The residue ID is the sequence position + 1
select_res_ids = trace_without_gaps + 1
# Remove columns where the residue ID misses in either structure
select_res_ids = select_res_ids[np.isin(select_res_ids[:,0], avidin.res_id)]
select_res_ids = select_res_ids[np.isin(select_res_ids[:,1], strept.res_id)]
# Select related CA atoms
avidin_ca = avidin[
(avidin.atom_name == "CA") & np.isin(avidin.res_id, select_res_ids[:,0])
]
strept_ca = strept[
(strept.atom_name == "CA") & np.isin(strept.res_id, select_res_ids[:,1])
]
_, transformation = struc.superimpose(avidin_ca, strept_ca)
# Apply rotation/translation from superimposition to original structures
strept = struc.superimpose_apply(strept, transformation) Output:
An even more sophisticated approach would be a structure alignment instead (https://www.biotite-python.org/examples/gallery/structure/pb_alignment.html), but this is probably to much effort for your use case. |
Beta Was this translation helpful? Give feedback.
The problem is that
superimpose()
needs to unambiguously know, which atoms to superimpose onto each other, which does not work for different lengths of atom arrays. How to solve this issue depends on the similarity of both structures:If both structures are conformations of the same protein, but one misses side chains, some residues, etc. , you can filter the atoms that appear in both structures via
biotite.structure.filter_intersection()
. An example of this can be found in https://www.biotite-python.org/examples/gallery/structure/ku_superimposition.html.If however your proteins are merely homologous to each other, it is not that simple. In this case my approach would be a sequence align…