What if two PDBs with different numbers of atoms are to be compared and the RMSD is to be calculated? #288

ssunidhi · 2021-02-23T05:58:54Z

When I try to compute the RMSD of two proteins with different numbers of atoms, I get an error like this:

The mobile array (8633 atoms) and the fixed array (1023 atoms), have an unequal amount of atoms

Is there a way to overcome the error?

padix-key (Maintainer) · 2021-02-23T09:55:47Z

The problem is that superimpose() needs to know unambiguously which atoms to superimpose onto each other, which is not possible when the two atom arrays have different lengths. How to solve this depends on how similar the two structures are:

If both structures are conformations of the same protein, but one of them lacks side chains, some residues, etc., you can filter for the atoms that appear in both structures via biotite.structure.filter_intersection(). An example of this can be found at https://www.biotite-python.org/examples/gallery/structure/ku_superimposition.html.
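To illustrate what such an intersection filter achieves, here is a minimal, library-independent sketch in plain NumPy. It is not the implementation of filter_intersection(): the helper name `intersection_mask`, the choice of (residue ID, atom name) as atom key, and all coordinates are assumptions made up for this example.

```python
import numpy as np

def intersection_mask(keys, other_keys):
    """Boolean mask: True where an atom key also occurs in `other_keys`."""
    other = set(other_keys)
    return np.array([key in other for key in keys], dtype=bool)

# Hypothetical atoms, identified by (residue ID, atom name)
full_keys = [(1, "N"), (1, "CA"), (1, "CB"), (2, "N"), (2, "CA"), (3, "CA")]
# Same protein, but the side chain of residue 1 and all of residue 3 are missing
partial_keys = [(1, "N"), (1, "CA"), (2, "N"), (2, "CA")]

# Placeholder coordinates, one row per atom
full_coord = np.arange(18, dtype=float).reshape(6, 3)
partial_coord = np.arange(12, dtype=float).reshape(4, 3)

# Keep only the atoms that occur in both structures
full_common = full_coord[intersection_mask(full_keys, partial_keys)]
partial_common = partial_coord[intersection_mask(partial_keys, full_keys)]

# Both selections now have the same number of atoms
print(full_common.shape, partial_common.shape)
```

Note that the pairing only works if both structures list their shared atoms in the same order, which is usually the case for two files of the same protein.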

If, however, your proteins are merely homologous to each other, it is not that simple. In this case my approach would be to align both protein sequences via biotite.sequence.align.align_optimal(). Then I would select from both structures those CA atoms whose alignment column contains no gap. These atoms can then be superimposed.

Example:

If the alignment looks like the following

V-W
ISF

the selected CA atoms would come from V and W in the first structure and from I and F in the second structure.
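The same selection can be written down for this toy alignment with a few lines of NumPy. This is only a sketch: the trace array is typed in by hand here (in Biotite it is available as `alignment.trace`), with each row holding the sequence indices of one alignment column and -1 marking a gap.

```python
import numpy as np

seq1 = "VW"   # first sequence, without the gap character
seq2 = "ISF"  # second sequence

# Hand-written trace of the alignment V-W / ISF:
# one row per column, (index in seq1, index in seq2), -1 marks a gap
trace = np.array([[0, 0], [-1, 1], [1, 2]])

# Keep only the columns where neither sequence has a gap
gap_free = trace[(trace != -1).all(axis=1)]

selected1 = [seq1[i] for i in gap_free[:, 0]]
selected2 = [seq2[j] for j in gap_free[:, 1]]
print(selected1, selected2)  # ['V', 'W'] ['I', 'F']
```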

Here is a small example script that demonstrates this idea for the superimposition of streptavidin onto avidin:

import numpy as np
import biotite.structure as struc
import biotite.sequence.align as align
import biotite.structure.io.pdbx as pdbx
import biotite.database.rcsb as rcsb


avidin_file = pdbx.PDBxFile.read(rcsb.fetch("1VYO", "pdbx", "."))
strept_file = pdbx.PDBxFile.read(rcsb.fetch("3RY2", "pdbx", "."))
# 'use_author_fields=False' is important here, in order to ensure
# that the residue ID corresponds to the sequence position
avidin = pdbx.get_structure(avidin_file, model=1, use_author_fields=False)
strept = pdbx.get_structure(strept_file, model=1, use_author_fields=False)
avidin = avidin[(avidin.chain_id == "A") & struc.filter_amino_acids(avidin)]
strept = strept[(strept.chain_id == "A") & struc.filter_amino_acids(strept)]
avidin_seq = pdbx.get_sequence(avidin_file)[0]
strept_seq = pdbx.get_sequence(strept_file)[0]

matrix = align.SubstitutionMatrix.std_protein_matrix()
alignment = align.align_optimal(
    avidin_seq, strept_seq, matrix, gap_penalty=(-10, -1), max_number=1
)[0]
print("Alignment:")
print(alignment)
print()
print("Trace:")
print(alignment.trace)

trace_without_gaps = alignment.trace[(alignment.trace != -1).all(axis=1)]
# The residue ID is the sequence position + 1
select_res_ids = trace_without_gaps + 1
# Remove columns where the residue is missing in either structure
select_res_ids = select_res_ids[np.isin(select_res_ids[:,0], avidin.res_id)]
select_res_ids = select_res_ids[np.isin(select_res_ids[:,1], strept.res_id)]
# Select the corresponding CA atoms
avidin_ca = avidin[
    (avidin.atom_name == "CA") & np.isin(avidin.res_id, select_res_ids[:,0])
]
strept_ca = strept[
    (strept.atom_name == "CA") & np.isin(strept.res_id, select_res_ids[:,1])
]

_, transformation = struc.superimpose(avidin_ca, strept_ca)
# Apply rotation/translation from superimposition to original structures
strept = struc.superimpose_apply(strept, transformation)

Output:

Alignment:
ARKCSLTGKWTNDLGSNMTIGAVNSRGEFTGTYITAV-TATSNEIKESPLHGTQNTINKRTQPTFGFTVN
A-EAGITGTWYNQLGSTFIVTA-GADGALTGTYESAVGNAESRYVLTGRYDSAPATDGSGT--ALGWTVA

WK----FSESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVNDIGDDWKATRVGINIFTRLR-TQKE
WKNNYRNAHSATTWSGQ-YV--GGAEARINTQWLLTSGTTE-ANAWKSTLVGHDTFTKVKPSAAS

Trace:
[[  0   0]
 [  1  -1]
 [  2   1]
 [  3   2]
 [  4   3]
 [  5   4]
 [  6   5]
 [  7   6]
 [  8   7]
 [  9   8]
 [ 10   9]
 [ 11  10]
 [ 12  11]
 [ 13  12]
 [ 14  13]
 [ 15  14]
 [ 16  15]
 [ 17  16]
 [ 18  17]
 [ 19  18]
 [ 20  19]
 [ 21  20]
 [ 22  -1]
 [ 23  21]
 [ 24  22]
 [ 25  23]
 [ 26  24]
 [ 27  25]
 [ 28  26]
 [ 29  27]
 [ 30  28]
 [ 31  29]
 [ 32  30]
 [ 33  31]
 [ 34  32]
 [ 35  33]
 [ 36  34]
 [ -1  35]
 [ 37  36]
 [ 38  37]
 [ 39  38]
 [ 40  39]
 [ 41  40]
 [ 42  41]
 [ 43  42]
 [ 44  43]
 [ 45  44]
 [ 46  45]
 [ 47  46]
 [ 48  47]
 [ 49  48]
 [ 50  49]
 [ 51  50]
 [ 52  51]
 [ 53  52]
 [ 54  53]
 [ 55  54]
 [ 56  55]
 [ 57  56]
 [ 58  57]
 [ 59  58]
 [ 60  -1]
 [ 61  -1]
 [ 62  59]
 [ 63  60]
 [ 64  61]
 [ 65  62]
 [ 66  63]
 [ 67  64]
 [ 68  65]
 [ 69  66]
 [ 70  67]
 [ -1  68]
 [ -1  69]
 [ -1  70]
 [ -1  71]
 [ 71  72]
 [ 72  73]
 [ 73  74]
 [ 74  75]
 [ 75  76]
 [ 76  77]
 [ 77  78]
 [ 78  79]
 [ 79  80]
 [ 80  81]
 [ 81  82]
 [ 82  -1]
 [ 83  83]
 [ 84  84]
 [ 85  -1]
 [ 86  -1]
 [ 87  85]
 [ 88  86]
 [ 89  87]
 [ 90  88]
 [ 91  89]
 [ -1  90]
 [ 92  91]
 [ 93  92]
 [ 94  93]
 [ 95  94]
 [ 96  95]
 [ 97  96]
 [ 98  97]
 [ 99  98]
 [100  99]
 [101 100]
 [102 101]
 [103 102]
 [104 103]
 [105  -1]
 [106 104]
 [107 105]
 [108 106]
 [109 107]
 [110 108]
 [111 109]
 [112 110]
 [113 111]
 [114 112]
 [115 113]
 [116 114]
 [117 115]
 [118 116]
 [119 117]
 [120 118]
 [121 119]
 [122 120]
 [123 121]
 [ -1 122]
 [124 123]
 [125 124]
 [126 125]
 [127 126]]
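The script above stops after the superimposition, while the original question asks for the RMSD. With Biotite, keeping the first return value of superimpose() (the fitted CA atoms) and passing it together with avidin_ca to biotite.structure.rmsd() should give that number. For readers who want to see what happens underneath, here is a NumPy-only sketch of a Kabsch superimposition followed by an RMSD calculation. It is not Biotite's implementation; the function names `kabsch_fit` and `rmsd` and the random coordinates are assumptions for illustration, standing in for the paired CA coordinates selected above.

```python
import numpy as np

def kabsch_fit(fixed, mobile):
    """Rigidly superimpose `mobile` onto `fixed`; both are (n, 3), paired row by row."""
    fixed_center = fixed.mean(axis=0)
    mobile_center = mobile.mean(axis=0)
    f = fixed - fixed_center
    m = mobile - mobile_center
    # SVD of the 3x3 covariance matrix yields the optimal rotation
    u, _, vt = np.linalg.svd(m.T @ f)
    # Flip one axis if needed, so the result is a rotation and not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return m @ rotation.T + fixed_center

def rmsd(a, b):
    """Root-mean-square deviation between two paired (n, 3) coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Made-up coordinates: `mobile` is `fixed` rotated by 40 degrees and shifted
rng = np.random.default_rng(0)
fixed = rng.normal(size=(10, 3))
angle = np.deg2rad(40)
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
mobile = fixed @ rot_z.T + np.array([5.0, -3.0, 2.0])

rmsd_before = rmsd(fixed, mobile)
rmsd_after = rmsd(fixed, kabsch_fit(fixed, mobile))
print(rmsd_before, rmsd_after)
```

Because the made-up mobile coordinates are an exact rigid copy of the fixed ones, the RMSD after fitting is numerically zero; for two homologous proteins it would be a positive value in Å.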

An even more sophisticated approach would be a structure alignment instead (https://www.biotite-python.org/examples/gallery/structure/pb_alignment.html), but this is probably too much effort for your use case.
