Tutorial 3: Evaluation
[1]:
import homelette as hm
Introduction
Welcome to the third tutorial for homelette. In this tutorial, we will explore which evaluation metrics are implemented in homelette and how to use them.
Model evaluation is an important step in any homology modelling procedure. In most practical scenarios, you will end up with more than one possible model and have to decide which one is "best". Multiple models can result from trying out different templates or combinations of templates, from using different model-generating algorithms, or from using an algorithm that can generate multiple models.
The following evaluation metrics are implemented in homelette:

- evaluation.Evaluation_dope: DOPE score from modeller [1]
- evaluation.Evaluation_soap_protein: SOAP score from modeller for the evaluation of single proteins [2]
- evaluation.Evaluation_soap_pp: SOAP score from modeller for the evaluation of protein complexes [2]
- evaluation.Evaluation_qmean4: QMEAN4 score [3,4]
- evaluation.Evaluation_qmean6: QMEAN6 score [3,4]
- evaluation.Evaluation_qmeandisco: QMEAN DisCo score [3,4,5]
- evaluation.Evaluation_mol_probity: MolProbity score for the structural evaluation of proteins [6,7,8]
All files necessary for running this tutorial are already prepared and deposited in the following directory: homelette/example/data/. If you execute this tutorial from homelette/example/, you don't have to adapt any of the paths.
homelette comes with extensive documentation. You can either check out our online documentation, compile a local version of the documentation in homelette/docs/ with sphinx, or use the help() function in Python.
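For example, the help() function can be called on any homelette module or class from within Python (a quick illustration, not one of the original notebook cells):

# show the documentation of the evaluation module
help(hm.evaluation)
# or of a single evaluation class
help(hm.evaluation.Evaluation_dope)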
Model Generation
In order to have a few models to evaluate, we will briefly generate some models of ARAF as we have done in previous tutorials (please check Tutorial 1 and Tutorial 2 for more information on this part).
[2]:
# get alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')

# annotate the alignment
aln.get_sequence('ARAF').annotate(
    seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(
    seq_type = 'structure',
    pdb_code = '3NY5',
    begin_res = '1',
    begin_chain = 'A',
    end_res = '81',
    end_chain = 'A')

# initialize task object
t = hm.Task(
    task_name = 'Tutorial3',
    target = 'ARAF',
    alignment = aln,
    overwrite = True)

# generate models with modeller
t.execute_routine(
    tag = 'modeller',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 5)

# generate models with altmod
t.execute_routine(
    tag = 'altmod',
    routine = hm.routines.Routine_altmod_default,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 5)
We have now generated 10 models: 5 with modeller and another 5 with altmod.
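As a quick sanity check (a minimal sketch, not one of the original notebook cells), the generated models can be inspected through the Task.models attribute used below:

# count the generated models and list their tags
print(len(t.models))                            # expected: 10
print([m.evaluation['tag'] for m in t.models])  # 5x 'modeller', 5x 'altmod'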
Model Evaluation using evaluation
Similar to routines, evaluations can be executed on their own, although it is recommended to use the interface provided by the Task object (see next section). To showcase how an evaluation can be executed on its own, we will take one of the previously generated models as an example:
[3]:
# example model
model = t.models[0]
model
[3]:
<homelette.organization.Model at 0x7f7f681ae250>
Every Model object has a Model.evaluation attribute where information about the model and its evaluations is collected:
[4]:
model.evaluation
[4]:
{'model': 'modeller_1.pdb', 'tag': 'modeller', 'routine': 'automodel_default'}
After performing an evaluation, this dictionary will be updated with the results of the evaluation:
[5]:
hm.evaluation.Evaluation_dope(model, quiet=True)
model.evaluation
[5]:
{'model': 'modeller_1.pdb',
'tag': 'modeller',
'routine': 'automodel_default',
'dope': -7216.8564453125,
'dope_z_score': -1.5211129532811163}
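Since Model.evaluation is a plain dictionary, individual scores can be accessed directly:

# access a single score from the evaluation dictionary
model.evaluation['dope']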
The interface to evaluations is relatively simple:
[6]:
?hm.evaluation.Evaluation_dope
Init signature:
hm.evaluation.Evaluation_dope(
    model: Type[ForwardRef('Model')],
    quiet: bool = False,
) -> None
Docstring:
Class for evaluating a model with DOPE score.

Will dump the following entries to the model.evaluation dictionary:

* dope
* dope_z_score

Parameters
----------
model : Model
    The model object to evaluate
quiet : bool
    If True, will perform evaluation while suppressing stdout (default
    False). Needs to be False for running it asynchronously, as done
    when running Task.evaluate_models with multiple cores

Attributes
----------
model : Model
    The model object to evaluate
output : dict
    Dictionary that all outputs will be dumped into

Raises
------
ImportError
    Unable to import dependencies

Notes
-----
DOPE is a statistical potential for the evaluation of homology models [1]_.
For further information, please check the modeller documentation or the
associated publication.

References
----------
.. [1] Shen, M., & Sali, A. (2006). Statistical potential for assessment
   and prediction of protein structures. Protein Science, 15(11),
   2507–2524. https://doi.org/10.1110/ps.062416606
File:           /usr/local/src/homelette-1.4/homelette/evaluation.py
Type:           type
Subclasses:
Evaluations take only two arguments:

- model: a Model object.
- quiet: a boolean value determining whether any output to the console should be suppressed.

Unlike routines, evaluations are executed as soon as the object is initialized.
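Any other evaluation follows the same pattern; for example, the SOAP protein score could be applied to the same model like this (a minimal sketch, analogous to the DOPE example above):

# evaluations run as soon as the object is initialized
hm.evaluation.Evaluation_soap_protein(model, quiet=True)
# model.evaluation now additionally contains the SOAP results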
Model Evaluation using Task and evaluation
Using the interface to evaluations that is implemented in Task objects has several advantages: it is possible to evaluate multiple models with multiple evaluation metrics in one command. In addition, multi-threading can be enabled (see Tutorial 5 for more details). The method to run evaluations with a Task object is called evaluate_models.
[7]:
?hm.Task.evaluate_models
Signature:
hm.Task.evaluate_models(
    self,
    *args: Type[ForwardRef('evaluation.Evaluation')],
    n_threads: int = 1,
) -> None
Docstring:
Evaluates models using one or multiple evaluation metrics

Parameters
----------
*args : Evaluation
    Evaluation objects that will be applied to the models
n_threads : int, optional
    Number of threads used for model evaluation (default is 1, which
    deactivates parallelization)

Returns
-------
None
File:      /usr/local/src/homelette-1.4/homelette/organization.py
Type:      function
[8]:
# running dope and soap at the same time
t.evaluate_models(hm.evaluation.Evaluation_dope,
                  hm.evaluation.Evaluation_soap_protein)
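As the signature above shows, evaluate_models also accepts an n_threads argument; parallel evaluation (covered in detail in Tutorial 5) could be enabled like in this sketch:

# evaluate all models using 4 threads
t.evaluate_models(hm.evaluation.Evaluation_dope,
                  hm.evaluation.Evaluation_soap_protein,
                  n_threads=4)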
After running evaluations, the outputs collected in each Model.evaluation dictionary can be compiled into a pandas data frame as follows:
[9]:
t.get_evaluation()
[9]:
|   | model          | tag      | routine           | dope         | dope_z_score | soap_protein  |
|---|----------------|----------|-------------------|--------------|--------------|---------------|
| 0 | modeller_1.pdb | modeller | automodel_default | -7216.856445 | -1.521113    | -44167.968750 |
| 1 | modeller_2.pdb | modeller | automodel_default | -7274.457520 | -1.576995    | -45681.269531 |
| 2 | modeller_3.pdb | modeller | automodel_default | -7126.735352 | -1.433681    | -43398.992188 |
| 3 | modeller_4.pdb | modeller | automodel_default | -7225.522461 | -1.529520    | -42942.808594 |
| 4 | modeller_5.pdb | modeller | automodel_default | -7128.661621 | -1.435550    | -41418.894531 |
| 5 | altmod_1.pdb   | altmod   | altmod_default    | -8148.456055 | -2.424912    | -53440.839844 |
| 6 | altmod_2.pdb   | altmod   | altmod_default    | -8187.364258 | -2.462659    | -49991.304688 |
| 7 | altmod_3.pdb   | altmod   | altmod_default    | -8202.568359 | -2.477409    | -53909.824219 |
| 8 | altmod_4.pdb   | altmod   | altmod_default    | -8170.016602 | -2.445829    | -52208.402344 |
| 9 | altmod_5.pdb   | altmod   | altmod_default    | -8145.944336 | -2.422475    | -50776.855469 |
On the combination of different evaluation metrics
Oftentimes it is useful to use several different metrics to evaluate models. However, this raises the problem of having to base a decision on multiple metrics at once. There are several solutions to this problem, each with its own advantages and disadvantages. Here we want to mention two: combining the z-scores of the different metrics (see the sketch at the end of this section) and combining the metrics by Borda count.
In the following, we show how to combine multiple scores into one Borda score. In short, Borda count aggregates the ranks of the models under each individual metric into a single score: for n models, a model receives n - rank points per metric, and the points are summed across metrics.
Note
Be careful: for some metrics, lower values are better (DOPE, SOAP, MolProbity), while for others, higher values are better (QMEAN). The ranking direction has to be chosen accordingly (see the sketch after the next cell).
[10]:
df = t.get_evaluation()
df = df.drop(labels=['routine', 'tag'], axis=1)
# rank by dope and soap
df['rank_dope'] = df['dope'].rank()
df['rank_soap'] = df['soap_protein'].rank()
# calculate points based on rank
n = df.shape[0]
df['points_dope'] = n - df['rank_dope']
df['points_soap'] = n - df['rank_soap']
df
[10]:
|   | model          | dope         | dope_z_score | soap_protein  | rank_dope | rank_soap | points_dope | points_soap |
|---|----------------|--------------|--------------|---------------|-----------|-----------|-------------|-------------|
| 0 | modeller_1.pdb | -7216.856445 | -1.521113    | -44167.968750 | 8.0       | 7.0       | 2.0         | 3.0         |
| 1 | modeller_2.pdb | -7274.457520 | -1.576995    | -45681.269531 | 6.0       | 6.0       | 4.0         | 4.0         |
| 2 | modeller_3.pdb | -7126.735352 | -1.433681    | -43398.992188 | 10.0      | 8.0       | 0.0         | 2.0         |
| 3 | modeller_4.pdb | -7225.522461 | -1.529520    | -42942.808594 | 7.0       | 9.0       | 3.0         | 1.0         |
| 4 | modeller_5.pdb | -7128.661621 | -1.435550    | -41418.894531 | 9.0       | 10.0      | 1.0         | 0.0         |
| 5 | altmod_1.pdb   | -8148.456055 | -2.424912    | -53440.839844 | 4.0       | 2.0       | 6.0         | 8.0         |
| 6 | altmod_2.pdb   | -8187.364258 | -2.462659    | -49991.304688 | 2.0       | 5.0       | 8.0         | 5.0         |
| 7 | altmod_3.pdb   | -8202.568359 | -2.477409    | -53909.824219 | 1.0       | 1.0       | 9.0         | 9.0         |
| 8 | altmod_4.pdb   | -8170.016602 | -2.445829    | -52208.402344 | 3.0       | 3.0       | 7.0         | 7.0         |
| 9 | altmod_5.pdb   | -8145.944336 | -2.422475    | -50776.855469 | 5.0       | 4.0       | 5.0         | 6.0         |
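If a higher-is-better metric such as QMEAN had been included, its ranking direction would have to be flipped; a minimal sketch assuming a hypothetical 'qmean4' column (not computed in this tutorial):

# for higher-is-better metrics, rank in descending order
# ('qmean4' is hypothetical here; it is not part of this data frame)
df['rank_qmean4'] = df['qmean4'].rank(ascending=False)
df['points_qmean4'] = n - df['rank_qmean4']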
[11]:
# calculate borda score and borda rank
df['borda_score'] = df['points_dope'] + df['points_soap']
df['borda_rank'] = df['borda_score'].rank(ascending=False)
df = df.drop(labels=['rank_dope', 'rank_soap', 'points_dope', 'points_soap'], axis=1)
df.sort_values(by='borda_rank')
[11]:
|   | model          | dope         | dope_z_score | soap_protein  | borda_score | borda_rank |
|---|----------------|--------------|--------------|---------------|-------------|------------|
| 7 | altmod_3.pdb   | -8202.568359 | -2.477409    | -53909.824219 | 18.0        | 1.0        |
| 5 | altmod_1.pdb   | -8148.456055 | -2.424912    | -53440.839844 | 14.0        | 2.5        |
| 8 | altmod_4.pdb   | -8170.016602 | -2.445829    | -52208.402344 | 14.0        | 2.5        |
| 6 | altmod_2.pdb   | -8187.364258 | -2.462659    | -49991.304688 | 13.0        | 4.0        |
| 9 | altmod_5.pdb   | -8145.944336 | -2.422475    | -50776.855469 | 11.0        | 5.0        |
| 1 | modeller_2.pdb | -7274.457520 | -1.576995    | -45681.269531 | 8.0         | 6.0        |
| 0 | modeller_1.pdb | -7216.856445 | -1.521113    | -44167.968750 | 5.0         | 7.0        |
| 3 | modeller_4.pdb | -7225.522461 | -1.529520    | -42942.808594 | 4.0         | 8.0        |
| 2 | modeller_3.pdb | -7126.735352 | -1.433681    | -43398.992188 | 2.0         | 9.0        |
| 4 | modeller_5.pdb | -7128.661621 | -1.435550    | -41418.894531 | 1.0         | 10.0       |
The model with the highest Borda score (equivalently, the lowest Borda rank) is the best model according to the combination of DOPE and SOAP scores.
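The z-score combination mentioned at the beginning of this section could look like the following sketch (not part of the original tutorial code): each metric is standardized across all models, and the z-scores are summed. Since both DOPE and SOAP are lower-is-better, no sign flip is needed, and the lowest combined z-score marks the best model.

# combine metrics by summing per-metric z-scores
df['z_dope'] = (df['dope'] - df['dope'].mean()) / df['dope'].std()
df['z_soap'] = (df['soap_protein'] - df['soap_protein'].mean()) / df['soap_protein'].std()
df['combined_z'] = df['z_dope'] + df['z_soap']
# lowest combined z-score = best model
df.sort_values(by='combined_z')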
Further reading
You are now familiar with using the implemented evaluation features of homelette. For further reading, please consider checking out the other tutorials:

- Tutorial 1: Learn about the basics of homelette.
- Tutorial 2: Learn more about already implemented routines for homology modelling.
- Tutorial 4: Learn about extending homelette’s functionality by defining your own modelling routines and evaluation metrics.
- Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.
- Tutorial 6: Learn about modelling protein complexes.
- Tutorial 7: Learn about assembling custom pipelines.
- Tutorial 8: Learn about automated template identification, alignment generation and template processing.
References
[1] Shen, M., & Sali, A. (2006). Statistical potential for assessment and prediction of protein structures. Protein Science, 15(11), 2507–2524. https://doi.org/10.1110/ps.062416606
[2] Dong, G. Q., Fan, H., Schneidman-Duhovny, D., Webb, B., Sali, A., & Tramontano, A. (2013). Optimized atomic statistical potentials: Assessment of protein interfaces and loops. Bioinformatics, 29(24), 3158–3166. https://doi.org/10.1093/bioinformatics/btt560
[3] Benkert, P., Tosatto, S. C. E., & Schomburg, D. (2008). QMEAN: A comprehensive scoring function for model quality assessment. Proteins: Structure, Function and Genetics, 71(1), 261–277. https://doi.org/10.1002/prot.21715
[4] Benkert, P., Biasini, M., & Schwede, T. (2011). Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics, 27(3), 343–350. https://doi.org/10.1093/bioinformatics/btq662
[5] Studer, G., Rempfer, C., Waterhouse, A. M., Gumienny, R., Haas, J., & Schwede, T. (2020). QMEANDisCo-distance constraints applied on model quality estimation. Bioinformatics, 36(6), 1765–1771. https://doi.org/10.1093/bioinformatics/btz828
[6] Davis, I. W., Leaver-Fay, A., Chen, V. B., Block, J. N., Kapral, G. J., Wang, X., Murray, L. W., Arendall, W. B., Snoeyink, J., Richardson, J. S., & Richardson, D. C. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Research, 35(suppl_2), W375–W383. https://doi.org/10.1093/NAR/GKM216
[7] Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S., & Richardson, D. C. (2010). MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D: Biological Crystallography, 66(1), 12–21. https://doi.org/10.1107/S0907444909042073
[8] Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B., Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S., & Richardson, D. C. (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein Science, 27(1), 293–315. https://doi.org/10.1002/pro.3330
Session Info
[12]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette 1.4
pandas 1.5.3
session_info 1.0.0
-----
PIL 7.0.0
altmod NA
anyio NA
asttokens NA
attr 19.3.0
babel 2.12.1
backcall 0.2.0
certifi 2022.12.07
chardet 3.0.4
charset_normalizer 3.1.0
comm 0.1.2
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.6
decorator 4.4.2
executing 1.2.0
fastjsonschema NA
idna 3.4
importlib_metadata NA
importlib_resources NA
ipykernel 6.21.3
ipython_genutils 0.2.0
jedi 0.18.2
jinja2 3.1.2
json5 NA
jsonschema 4.17.3
jupyter_events 0.6.3
jupyter_server 2.4.0
jupyterlab_server 2.20.0
kiwisolver 1.0.1
markupsafe 2.1.2
matplotlib 3.1.2
modeller 10.4
more_itertools NA
mpl_toolkits NA
nbformat 5.7.3
numexpr 2.8.4
numpy 1.24.2
ost 2.3.1
packaging 20.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.1.1
prometheus_client NA
promod3 3.2.1
prompt_toolkit 3.0.38
psutil 5.5.1
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.14.0
pyparsing 2.4.6
pyrsistent NA
pythonjsonlogger NA
pytz 2022.7.1
qmean NA
requests 2.28.2
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
send2trash NA
sitecustomize NA
six 1.12.0
sniffio 1.3.0
stack_data 0.6.2
swig_runtime_data4 NA
tornado 6.2
traitlets 5.9.0
urllib3 1.26.15
wcwidth NA
websocket 1.5.1
yaml 6.0
zipp NA
zmq 25.0.1
-----
IPython 8.11.0
jupyter_client 8.0.3
jupyter_core 5.2.0
jupyterlab 3.6.1
notebook 6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:37