Tutorial 3: Evaluation

[1]:
import homelette as hm

Introduction

Welcome to the third tutorial for homelette. In this tutorial, we will explore which evaluation metrics are implemented in homelette and how to use them.

Model evaluation is an important step in any homology modelling procedure. In most practical scenarios, you will end up with more than one possible model and have to decide which one is “best”. Multiple models can result from trying out different templates or combinations of templates, from using different modelling algorithms, or from using an algorithm that can generate multiple models.

The following evaluation metrics are implemented in homelette:

  • evaluation.Evaluation_dope: DOPE score from modeller [1]

  • evaluation.Evaluation_soap_protein: SOAP score from modeller for the evaluation of single proteins [2]

  • evaluation.Evaluation_soap_pp: SOAP score from modeller for the evaluation of protein complexes [2]

  • evaluation.Evaluation_qmean4: QMEAN4 score [3,4]

  • evaluation.Evaluation_qmean6: QMEAN6 score [3,4]

  • evaluation.Evaluation_qmeandisco: QMEAN DisCo score [3,4,5]

  • evaluation.Evaluation_mol_probity: MolProbity score for the structural evaluation of proteins [6,7,8]

All files necessary for running this tutorial are already prepared and deposited in the following directory: homelette/example/data/. If you execute this tutorial from homelette/example/, you don’t have to adapt any of the paths.

homelette comes with extensive documentation. You can either check out our online documentation, compile a local version of the documentation in homelette/docs/ with sphinx, or use the help() function in Python.
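For example, the documentation of a specific class can be looked up from within a Python session. A minimal sketch using the built-in help() function:

import homelette as hm

# print the docstring of an evaluation class to the console
help(hm.evaluation.Evaluation_dope)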

Model Generation

In order to have a few models to evaluate, we will briefly generate some models of ARAF as we have done in previous tutorials (please check Tutorial 1 and Tutorial 2 for more information on this part).

[2]:
# get alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')

# annotate the alignment
aln.get_sequence('ARAF').annotate(
    seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(
    seq_type = 'structure',
    pdb_code = '3NY5',
    begin_res = '1',
    begin_chain = 'A',
    end_res = '81',
    end_chain = 'A')

# initialize task object
t = hm.Task(
    task_name = 'Tutorial3',
    target = 'ARAF',
    alignment = aln,
    overwrite = True)

# generate models with modeller
t.execute_routine(
    tag = 'modeller',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 5)

# generate models with altmod
t.execute_routine(
    tag = 'altmod',
    routine = hm.routines.Routine_altmod_default,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 5)

We have now generated 10 models: 5 with modeller and another 5 with altmod.
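As a quick check, the generated models can be listed from the Task object. A minimal sketch relying only on the Task.models list and the Model.evaluation dictionary introduced below:

# list the file name and tag of every generated model
for model in t.models:
    print(model.evaluation['model'], model.evaluation['tag'])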

Model Evaluation using evaluation

Similar to routines, evaluations can be executed on their own, although it is recommended to use the interface provided by the Task object (see next section). To show how an evaluation can be executed on its own, we will take one of the previously generated models as an example:

[3]:
# example model
model = t.models[0]
model
[3]:
<homelette.organization.Model at 0x7f7f681ae250>

Every Model object has a Model.evaluation attribute in which information about the model and its evaluations is collected:

[4]:
model.evaluation
[4]:
{'model': 'modeller_1.pdb', 'tag': 'modeller', 'routine': 'automodel_default'}

After performing an evaluation, this dictionary will be updated with the results of the evaluation:

[5]:
hm.evaluation.Evaluation_dope(model, quiet=True)
model.evaluation
[5]:
{'model': 'modeller_1.pdb',
 'tag': 'modeller',
 'routine': 'automodel_default',
 'dope': -7216.8564453125,
 'dope_z_score': -1.5211129532811163}
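Further evaluations can be applied to the same model in the same way, and each adds its own entries to the dictionary. A minimal sketch, assuming the SOAP dependencies are installed:

# add the SOAP protein score to the same model
hm.evaluation.Evaluation_soap_protein(model, quiet=True)
model.evaluation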

The interface to evaluations is relatively simple:

[6]:
?hm.evaluation.Evaluation_dope
Init signature:
hm.evaluation.Evaluation_dope(
    model: Type[ForwardRef('Model')],
    quiet: bool = False,
) -> None
Docstring:
Class for evaluating a model with DOPE score.

Will dump the following entries to the model.evaluation dictionary:

* dope
* dope_z_score

Parameters
----------
model : Model
    The model object to evaluate
quiet : bool
    If True, will perform evaluation with suppressing stdout (default
    False). Needs to be False for running it asynchronously, as done
    when running Task.evaluate_models with multiple cores

Attributes
----------
model : Model
    The model object to evaluate
output : dict
    Dictionary that all outputs will be dumped into

Raises
------
ImportError
    Unable to import dependencies

Notes
-----
DOPE is a statistical potential for the evaluation of homology models [1]_.
For further information, please check the modeller documentation or the
associated publication.

References
----------
.. [1] Shen, M., & Sali, A. (2006). Statistical potential for assessment
   and prediction of protein structures. Protein Science, 15(11),
   2507–2524. https://doi.org/10.1110/ps.062416606
File:           /usr/local/src/homelette-1.4/homelette/evaluation.py
Type:           type
Subclasses:

Evaluations take only two arguments:

  • model: A Model object

  • quiet: A boolean value determining whether any output to the console should be suppressed.

Unlike routines, evaluations are executed as soon as the object is initialized.
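Because evaluations run as soon as they are initialized, evaluating several models by hand only requires constructing one evaluation object per model. A minimal sketch (the Task interface described in the next section is usually more convenient):

# manually evaluate every model in the task with the DOPE score
for model in t.models:
    hm.evaluation.Evaluation_dope(model, quiet=True)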

Model Evaluation using Task and evaluation

Using the interface to evaluations implemented in Task objects has several advantages: multiple models can be evaluated with multiple evaluation metrics in a single command, and multi-threading can be enabled (see Tutorial 5 for more details). The method for running evaluations through a Task object is called evaluate_models.

[7]:
?hm.Task.evaluate_models
Signature:
hm.Task.evaluate_models(
    self,
    *args: Type[ForwardRef('evaluation.Evaluation')],
    n_threads: int = 1,
) -> None
Docstring:
Evaluates models using one or multiple evaluation metrics

Parameters
----------
*args: Evaluation
    Evaluation objects that will be applied to the models
n_threads : int, optional
    Number of threads used for model evaluation (default is 1, which
    deactivates parallelization)

Returns
-------
None
File:      /usr/local/src/homelette-1.4/homelette/organization.py
Type:      function
[8]:
# running dope and soap at the same time
t.evaluate_models(hm.evaluation.Evaluation_dope,
                  hm.evaluation.Evaluation_soap_protein)
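If several CPU cores are available, the same call can be parallelized via the n_threads argument of evaluate_models (see Tutorial 5 for details). A minimal sketch, assuming 4 cores are available:

# evaluate with DOPE and SOAP using 4 threads in parallel
t.evaluate_models(hm.evaluation.Evaluation_dope,
                  hm.evaluation.Evaluation_soap_protein,
                  n_threads=4)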

After running evaluations, the contents of all Model.evaluation dictionaries can be compiled into a pandas data frame as follows:

[9]:
t.get_evaluation()
[9]:
model tag routine dope dope_z_score soap_protein
0 modeller_1.pdb modeller automodel_default -7216.856445 -1.521113 -44167.968750
1 modeller_2.pdb modeller automodel_default -7274.457520 -1.576995 -45681.269531
2 modeller_3.pdb modeller automodel_default -7126.735352 -1.433681 -43398.992188
3 modeller_4.pdb modeller automodel_default -7225.522461 -1.529520 -42942.808594
4 modeller_5.pdb modeller automodel_default -7128.661621 -1.435550 -41418.894531
5 altmod_1.pdb altmod altmod_default -8148.456055 -2.424912 -53440.839844
6 altmod_2.pdb altmod altmod_default -8187.364258 -2.462659 -49991.304688
7 altmod_3.pdb altmod altmod_default -8202.568359 -2.477409 -53909.824219
8 altmod_4.pdb altmod altmod_default -8170.016602 -2.445829 -52208.402344
9 altmod_5.pdb altmod altmod_default -8145.944336 -2.422475 -50776.855469

On the combination of different evaluation metrics

Oftentimes it is useful to evaluate models with several different metrics. However, this raises the question of how to base a decision on multiple metrics. There are several solutions to this problem, each with its own advantages and disadvantages. Here we want to mention two: combining the z-scores of the different metrics, and combining the metrics by Borda count.

In the following, we show how to combine multiple scores into one Borda score. In short, Borda count aggregates the ranks of each model under the individual metrics into a single score: for each metric, a model receives points according to how many other models it outranks, and the points are then summed over all metrics.

Note

Be careful: for some metrics, lower values are better (DOPE, SOAP, MolProbity), while for others, higher values are better (QMEAN). Make sure all rankings point in the same direction before combining them (see the sketch below).
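For a higher-is-better metric, the ranking direction has to be inverted before points are assigned. A minimal, hypothetical sketch on toy data (the qmean4 column and the toy values are only illustrations and are not part of the data frame generated in this tutorial):

import pandas as pd

# toy data frame with a hypothetical higher-is-better score
toy = pd.DataFrame({'model': ['model_a', 'model_b', 'model_c'],
                    'qmean4': [0.55, 0.72, 0.61]})

# rank in descending order so that rank 1 is again the best model
toy['rank_qmean'] = toy['qmean4'].rank(ascending=False)
toy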

[10]:
df = t.get_evaluation()
df = df.drop(labels=['routine', 'tag'], axis=1)

# rank by dope and soap
df['rank_dope'] = df['dope'].rank()
df['rank_soap'] = df['soap_protein'].rank()

# calculate points based on rank
n = df.shape[0]
df['points_dope'] = n - df['rank_dope']
df['points_soap'] = n - df['rank_soap']

df
[10]:
model dope dope_z_score soap_protein rank_dope rank_soap points_dope points_soap
0 modeller_1.pdb -7216.856445 -1.521113 -44167.968750 8.0 7.0 2.0 3.0
1 modeller_2.pdb -7274.457520 -1.576995 -45681.269531 6.0 6.0 4.0 4.0
2 modeller_3.pdb -7126.735352 -1.433681 -43398.992188 10.0 8.0 0.0 2.0
3 modeller_4.pdb -7225.522461 -1.529520 -42942.808594 7.0 9.0 3.0 1.0
4 modeller_5.pdb -7128.661621 -1.435550 -41418.894531 9.0 10.0 1.0 0.0
5 altmod_1.pdb -8148.456055 -2.424912 -53440.839844 4.0 2.0 6.0 8.0
6 altmod_2.pdb -8187.364258 -2.462659 -49991.304688 2.0 5.0 8.0 5.0
7 altmod_3.pdb -8202.568359 -2.477409 -53909.824219 1.0 1.0 9.0 9.0
8 altmod_4.pdb -8170.016602 -2.445829 -52208.402344 3.0 3.0 7.0 7.0
9 altmod_5.pdb -8145.944336 -2.422475 -50776.855469 5.0 4.0 5.0 6.0
[11]:
# calculate borda score and borda rank
df['borda_score'] = df['points_dope'] + df['points_soap']
df['borda_rank'] = df['borda_score'].rank(ascending=False)

df = df.drop(labels=['rank_dope', 'rank_soap', 'points_dope', 'points_soap'], axis=1)
df.sort_values(by='borda_rank')
[11]:
model dope dope_z_score soap_protein borda_score borda_rank
7 altmod_3.pdb -8202.568359 -2.477409 -53909.824219 18.0 1.0
5 altmod_1.pdb -8148.456055 -2.424912 -53440.839844 14.0 2.5
8 altmod_4.pdb -8170.016602 -2.445829 -52208.402344 14.0 2.5
6 altmod_2.pdb -8187.364258 -2.462659 -49991.304688 13.0 4.0
9 altmod_5.pdb -8145.944336 -2.422475 -50776.855469 11.0 5.0
1 modeller_2.pdb -7274.457520 -1.576995 -45681.269531 8.0 6.0
0 modeller_1.pdb -7216.856445 -1.521113 -44167.968750 5.0 7.0
3 modeller_4.pdb -7225.522461 -1.529520 -42942.808594 4.0 8.0
2 modeller_3.pdb -7126.735352 -1.433681 -43398.992188 2.0 9.0
4 modeller_5.pdb -7128.661621 -1.435550 -41418.894531 1.0 10.0

The model with the highest Borda score, which corresponds to the lowest Borda rank, is the best model according to the combination of DOPE and SOAP scores.
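As an alternative to the Borda count, the z-score combination mentioned above can be computed directly from the evaluation data frame. A minimal sketch (the column names z_dope, z_soap and combined_z are only illustrations; since lower DOPE and SOAP values are better, lower combined z-scores indicate better models):

# compile evaluations and standardize each metric (z-score)
df = t.get_evaluation()
df['z_dope'] = (df['dope'] - df['dope'].mean()) / df['dope'].std()
df['z_soap'] = (df['soap_protein'] - df['soap_protein'].mean()) / df['soap_protein'].std()

# average the z-scores; lower combined values indicate better models
df['combined_z'] = df[['z_dope', 'z_soap']].mean(axis=1)
df.sort_values(by='combined_z')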

Further reading

You are now familiar with the evaluation features implemented in homelette. For further reading, please consider checking out the other tutorials:

  • Tutorial 1: Learn about the basics of homelette.

  • Tutorial 2: Learn more about already implemented routines for homology modelling.

  • Tutorial 4: Learn about extending homelette’s functionality by defining your own modelling routines and evaluation metrics.

  • Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.

  • Tutorial 6: Learn about modelling protein complexes.

  • Tutorial 7: Learn about assembling custom pipelines.

  • Tutorial 8: Learn about automated template identification, alignment generation and template processing.

References

[1] Shen, M., & Sali, A. (2006). Statistical potential for assessment and prediction of protein structures. Protein Science, 15(11), 2507–2524. https://doi.org/10.1110/ps.062416606

[2] Dong, G. Q., Fan, H., Schneidman-Duhovny, D., Webb, B., Sali, A., & Tramontano, A. (2013). Optimized atomic statistical potentials: Assessment of protein interfaces and loops. Bioinformatics, 29(24), 3158–3166. https://doi.org/10.1093/bioinformatics/btt560

[3] Benkert, P., Tosatto, S. C. E., & Schomburg, D. (2008). QMEAN: A comprehensive scoring function for model quality assessment. Proteins: Structure, Function and Genetics, 71(1), 261–277. https://doi.org/10.1002/prot.21715

[4] Benkert, P., Biasini, M., & Schwede, T. (2011). Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics, 27(3), 343–350. https://doi.org/10.1093/bioinformatics/btq662

[5] Studer, G., Rempfer, C., Waterhouse, A. M., Gumienny, R., Haas, J., & Schwede, T. (2020). QMEANDisCo-distance constraints applied on model quality estimation. Bioinformatics, 36(6), 1765–1771. https://doi.org/10.1093/bioinformatics/btz828

[6] Davis, I. W., Leaver-Fay, A., Chen, V. B., Block, J. N., Kapral, G. J., Wang, X., Murray, L. W., Arendall, W. B., Snoeyink, J., Richardson, J. S., & Richardson, D. C. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Research, 35(suppl_2), W375–W383. https://doi.org/10.1093/NAR/GKM216

[7] Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S., & Richardson, D. C. (2010). MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D: Biological Crystallography, 66(1), 12–21. https://doi.org/10.1107/S0907444909042073

[8] Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B., Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S., & Richardson, D. C. (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein Science, 27(1), 293–315. https://doi.org/10.1002/pro.3330

Session Info

[12]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette           1.4
pandas              1.5.3
session_info        1.0.0
-----
PIL                 7.0.0
altmod              NA
anyio               NA
asttokens           NA
attr                19.3.0
babel               2.12.1
backcall            0.2.0
certifi             2022.12.07
chardet             3.0.4
charset_normalizer  3.1.0
comm                0.1.2
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.6
decorator           4.4.2
executing           1.2.0
fastjsonschema      NA
idna                3.4
importlib_metadata  NA
importlib_resources NA
ipykernel           6.21.3
ipython_genutils    0.2.0
jedi                0.18.2
jinja2              3.1.2
json5               NA
jsonschema          4.17.3
jupyter_events      0.6.3
jupyter_server      2.4.0
jupyterlab_server   2.20.0
kiwisolver          1.0.1
markupsafe          2.1.2
matplotlib          3.1.2
modeller            10.4
more_itertools      NA
mpl_toolkits        NA
nbformat            5.7.3
numexpr             2.8.4
numpy               1.24.2
ost                 2.3.1
packaging           20.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.1.1
prometheus_client   NA
promod3             3.2.1
prompt_toolkit      3.0.38
psutil              5.5.1
ptyprocess          0.7.0
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.14.0
pyparsing           2.4.6
pyrsistent          NA
pythonjsonlogger    NA
pytz                2022.7.1
qmean               NA
requests            2.28.2
rfc3339_validator   0.1.4
rfc3986_validator   0.1.1
send2trash          NA
sitecustomize       NA
six                 1.12.0
sniffio             1.3.0
stack_data          0.6.2
swig_runtime_data4  NA
tornado             6.2
traitlets           5.9.0
urllib3             1.26.15
wcwidth             NA
websocket           1.5.1
yaml                6.0
zipp                NA
zmq                 25.0.1
-----
IPython             8.11.0
jupyter_client      8.0.3
jupyter_core        5.2.0
jupyterlab          3.6.1
notebook            6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:37