Tutorial 1: Basics

[1]:

import homelette as hm

Introduction

Welcome to the first tutorial on how to use the homelette package. In this example, we will generate homology models using both modeller [1,2] and ProMod3 [3,4] and then evaluate them using the DOPE score [5].

homelette is a Python package that delivers a unified interface to various homology modelling and model evaluation software. It is also easily customizable and extendable. Through a series of 7 tutorials, you will learn how to work with homelette as well as how to extend and adapt it to your specific needs.

In tutorial 1, you will learn how to:

Import an alignment.
Generate homology models using a predefined routine with modeller.
Generate homology models using a predefined routine with ProMod3.
Evaluate these models.

In this example, we will generate a protein structure for the RBD domain of ARAF. ARAF is a RAF kinase important in MAPK signalling. As a template, we will choose a close relative of ARAF called BRAF, specifically the structure with the PDB code 3NY5.

All files necessary for running this tutorial are already prepared and deposited in the following directory: homelette/example/data/. If you execute this tutorial from homelette/example/, you don’t have to adapt any of the paths.

homelette comes with an extensive documentation. You can either check out our online documentation, compile a local version of the documentation in homelette/docs/ or use the help() function in Python.

Alignment

The basis for a good homology model is a good alignment between your target and your template(s). There are many ways to generate alignments. Depending on the scope of your project, you might want to generate extensive, high-quality multiple sequence alignments from annotated sequence libraries of your sequences of interest using specific software such as t-coffee [6,7], or get a web service such as HH-Pred [8,9] to search for potential templates and align them.

For this example, we have already provided an alignment for you.

homelette has its own Alignment class which is used to work with alignments. You can import alignments from different file types, write alignments to different file types, select a subset of sequences, calculate sequence identity and print the alignment to screen. For more information, please check out the documentation.

[2]:

# read in the alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')

# print to screen to check alignment
aln.print_clustal(line_wrap=70)

ARAF        ---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE
3NY5        HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE


ARAF        LIVEVL------
3NY5        LHVEVLENVPLT

The template aligns nicely to our target. We can also check how much sequence identity these two sequences share:

[3]:

# calculate identity
aln.calc_identity_target('ARAF')

[3]:

	sequence_1	sequence_2	identity
0	ARAF	3NY5	57.53

The two sequences share a high amount of sequence identity, which is a good sign that our homology model might be reliable.

modeller expects the sequences handed to it to be annotated to a minimal degree. It is usually a good idea to annotate any template given to modeller in addition to the required PDB identifier with beginning and end residues and chains. This can be done as such:

[4]:

# annotate the alignment
aln.get_sequence('ARAF').annotate(seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(seq_type = 'structure',
                              pdb_code = '3NY5',
                              begin_res = '1',
                              begin_chain = 'A',
                              end_res = '81',
                              end_chain = 'A')

For more information on the sequence annotation, please check the documentation.

Template Structures

For the sake of consistency, we recommend adjusting the residue count to start with residue 1 for each model and ignore missing residues. A good tool for handling PDB structures is pdb-tools (available here) [10].

Model Generation

After importing our alignment, checking it manually, calculating sequence identities and annotating the sequences, as well as taking about the templates we are using, we are now able to proceed with the model generation.

Before starting modelling and evaluation, we need to set up a Task object. The purpose of Task objects is to simplify the interface to modelling and evaluation methods. Task objects are alignment-specific and target-specific.

[5]:

# set up task object
t = hm.Task(
    task_name = 'Tutorial1',
    target = 'ARAF',
    alignment = aln,
    overwrite = True)

Upon initialization, the task object will check if there is a folder in the current working directory that corresponds to the given task_name. If no such folder is available, a new one will be created.

After initialization of the Task object, we can start with homology modelling. For this, we use the execute_routine function of the task object, which applies the chosen homology modelling method with the chosen target, alignment and template(s).

[6]:

# generate models with modeller
t.execute_routine(
    tag = 'example_modeller',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single')

It is possible to use the same Task object to create models from multiple different routine-template combinations.

[7]:

# generate models with promod3
t.execute_routine(
    tag = 'example_promod3',
    routine = hm.routines.Routine_promod3,
    templates = ['3NY5'],
    template_location = './data/single')

Model Evaluation

Similarly to modelling, model evaluation is performed through the evaluate_models function of the Task object. This function is an easy interface to perform one or more evaluation methods on the models deposited in the task object.

[8]:

# perform evaluation
t.evaluate_models(hm.evaluation.Evaluation_dope)

The Task.get_evaluation function retrieves the evaluation for all models in the Task object as a pandas data frame.

[9]:

t.get_evaluation()

[9]:

	model	tag	routine	dope	dope_z_score
0	example_modeller_1.pdb	example_modeller	automodel_default	-7274.457520	-1.576995
1	example_promod3_1.pdb	example_promod3	promod3	-7642.868652	-1.934412

For more details on the available evaluation methods please check out the documentation and the Tutorial 3.

References

[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626

[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3

[3] Biasini, M., Schmidt, T., Bienert, S., Mariani, V., Studer, G., Haas, J., Johner, N., Schenk, A. D., Philippsen, A., & Schwede, T. (2013). OpenStructure: An integrated software framework for computational structural biology. Acta Crystallographica Section D: Biological Crystallography, 69(5), 701–709. https://doi.org/10.1107/S0907444913007051

[4] Studer, G., Tauriello, G., Bienert, S., Biasini, M., Johner, N., & Schwede, T. (2021). ProMod3—A versatile homology modelling toolbox. PLOS Computational Biology, 17(1), e1008667. https://doi.org/10.1371/JOURNAL.PCBI.1008667

[5] Shen, M., & Sali, A. (2006). Statistical potential for assessment and prediction of protein structures. Protein Science, 15(11), 2507–2524. https://doi.org/10.1110/ps.062416606

[6] Notredame, C., Higgins, D. G., & Heringa, J. (2000). T-coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1), 205–217. https://doi.org/10.1006/jmbi.2000.4042

[7] Wallace, I. M., O’Sullivan, O., Higgins, D. G., & Notredame, C. (2006). M-Coffee: Combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Research, 34(6), 1692–1699. https://doi.org/10.1093/nar/gkl091

[8] Söding, J., Biegert, A., & Lupas, A. N. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research, 33(suppl_2), W244–W248. https://doi.org/10.1093/NAR/GKI408

[9] Zimmermann, L., Stephens, A., Nam, S. Z., Rau, D., Kübler, J., Lozajic, M., Gabler, F., Söding, J., Lupas, A. N., & Alva, V. (2018). A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core. Journal of Molecular Biology, 430(15), 2237–2243. https://doi.org/10.1016/J.JMB.2017.12.007

[10] Rodrigues, J. P. G. L. M., Teixeira, J. M. C., Trellet, M., & Bonvin, A. M. J. J. (2018). pdb-tools: a swiss army knife for molecular structures. F1000Research 2018 7:1961, 7, 1961. https://doi.org/10.12688/f1000research.17456.1

Session Info

[10]:

# session info
import session_info
session_info.show(html = False, dependencies = True)

-----
homelette           1.4
pandas              1.5.3
session_info        1.0.0
-----
PIL                 7.0.0
altmod              NA
anyio               NA
asttokens           NA
attr                19.3.0
babel               2.12.1
backcall            0.2.0
certifi             2022.12.07
chardet             3.0.4
charset_normalizer  3.1.0
comm                0.1.2
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.6
decorator           4.4.2
executing           1.2.0
fastjsonschema      NA
idna                3.4
importlib_metadata  NA
importlib_resources NA
ipykernel           6.21.3
ipython_genutils    0.2.0
jedi                0.18.2
jinja2              3.1.2
json5               NA
jsonschema          4.17.3
jupyter_events      0.6.3
jupyter_server      2.4.0
jupyterlab_server   2.20.0
kiwisolver          1.0.1
markupsafe          2.1.2
matplotlib          3.1.2
modeller            10.4
more_itertools      NA
mpl_toolkits        NA
nbformat            5.7.3
numexpr             2.8.4
numpy               1.24.2
ost                 2.3.1
packaging           20.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.1.1
prometheus_client   NA
promod3             3.2.1
prompt_toolkit      3.0.38
psutil              5.5.1
ptyprocess          0.7.0
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.14.0
pyparsing           2.4.6
pyrsistent          NA
pythonjsonlogger    NA
pytz                2022.7.1
qmean               NA
requests            2.28.2
rfc3339_validator   0.1.4
rfc3986_validator   0.1.1
send2trash          NA
sitecustomize       NA
six                 1.12.0
sniffio             1.3.0
stack_data          0.6.2
swig_runtime_data4  NA
tornado             6.2
traitlets           5.9.0
urllib3             1.26.15
wcwidth             NA
websocket           1.5.1
yaml                6.0
zipp                NA
zmq                 25.0.1
-----
IPython             8.11.0
jupyter_client      8.0.3
jupyter_core        5.2.0
jupyterlab          3.6.1
notebook            6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:34