Tutorial 1: Basics
[1]:
import homelette as hm
Introduction
Welcome to the first tutorial on how to use the homelette package. In this example, we will generate homology models using both modeller [1,2] and ProMod3 [3,4] and then evaluate them using the DOPE score [5].
homelette is a Python package that provides a unified interface to various homology modelling and model evaluation software. It is also easy to customize and extend. Through a series of eight tutorials, you will learn how to work with homelette as well as how to extend and adapt it to your specific needs.
In tutorial 1, you will learn how to:
Import an alignment.
Generate homology models using a predefined routine with modeller.
Generate homology models using a predefined routine with ProMod3.
Evaluate these models.
In this example, we will generate a homology model of the Ras-binding domain (RBD) of ARAF. ARAF is a RAF kinase involved in MAPK signalling. As a template, we will use a structure of BRAF, a close relative of ARAF, specifically the structure with the PDB code 3NY5.
All files necessary for running this tutorial are already prepared and deposited in the following directory: homelette/example/data/. If you execute this tutorial from homelette/example/, you don’t have to adapt any of the paths.
homelette comes with extensive documentation. You can check out our online documentation, compile a local version of the documentation in homelette/docs/, or use the help() function in Python.
Alignment
The basis for a good homology model is a good alignment between your target and your template(s). There are many ways to generate alignments. Depending on the scope of your project, you might want to generate extensive, high-quality multiple sequence alignments from annotated sequence libraries of your sequences of interest using dedicated software such as T-Coffee [6,7], or use a web service such as HHpred [8,9] to search for potential templates and align them.
For this example, we have already provided an alignment for you.
homelette has its own Alignment class which is used to work with alignments. You can import alignments from different file types, write alignments to different file types, select a subset of sequences, calculate sequence identity and print the alignment to screen. For more information, please check out the documentation.
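homelette reads and manipulates alignment files for you. Purely to illustrate what an aligned FASTA file contains and what selecting a subset of sequences means, here is a minimal sketch in plain Python. It assumes that aln_1.fasta_aln is an ordinary FASTA alignment, as the file name suggests; read_fasta_alignment and select_sequences are hypothetical helpers, not part of the homelette API.

```python
# Hypothetical helpers for illustration only; homelette's Alignment class
# provides its own import, export and selection methods.


def read_fasta_alignment(text):
    """Parse FASTA-formatted alignment text into {name: aligned_sequence}."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            name = line[1:].split()[0]  # keep the first word of the header
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line)  # sequence lines may be wrapped
    aligned = {key: ''.join(parts) for key, parts in sequences.items()}
    # all sequences of an alignment have the same length, gaps included
    if len({len(seq) for seq in aligned.values()}) > 1:
        raise ValueError('sequences in the alignment differ in length')
    return aligned


def select_sequences(alignment, names):
    """Return a sub-alignment containing only the requested sequences."""
    return {name: alignment[name] for name in names}


example = """>ARAF
---GTVKVYLPNKQRTVV
>3NY5
HQKPIVRVFLPNKQRTVV
"""
aln_dict = read_fasta_alignment(example)
print(select_sequences(aln_dict, ['3NY5']))
```

Gaps are kept as '-' characters, so every sequence keeps the full alignment length; this is what allows positions to be compared column by column later on.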
[2]:
# read in the alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')
# print to screen to check alignment
aln.print_clustal(line_wrap=70)
ARAF ---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE
3NY5 HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE
ARAF LIVEVL------
3NY5 LHVEVLENVPLT
The template aligns nicely to our target. We can also check how much sequence identity these two sequences share:
[3]:
# calculate identity
aln.calc_identity_target('ARAF')
[3]:
sequence_1 | sequence_2 | identity | |
---|---|---|---|
0 | ARAF | 3NY5 | 57.53 |
The two sequences share a high degree of sequence identity (57.53 %), which is a good sign that our homology model might be reliable.
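To make the identity value above less of a black box, here is a minimal sketch that recomputes it from the two aligned sequences printed earlier. It counts columns where both sequences carry the same residue and divides by the number of residues in the shorter ungapped sequence. This denominator is an assumption on our part: it reproduces the 57.53 shown above (42 identical positions out of 73 ARAF residues), but homelette's documentation is the authority on its exact definition. percent_identity is a hypothetical helper, not homelette's calc_identity_target.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two aligned sequences of equal length.

    Assumed convention: identical non-gap columns divided by the number of
    residues in the shorter ungapped sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError('aligned sequences must have equal length')
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != '-')
    shorter = min(len(seq_a.replace('-', '')), len(seq_b.replace('-', '')))
    return 100 * identical / shorter


# aligned sequences copied from the print_clustal output above
araf = ('---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE'
        'LIVEVL------')
template = ('HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE'
            'LHVEVLENVPLT')

print(round(percent_identity(araf, template), 2))  # 57.53
```

Other conventions (for example dividing by the number of aligned, gap-free columns) give noticeably different values for the same alignment, so identities from different tools are only comparable when they use the same definition.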
modeller expects the sequences handed to it to carry a minimal annotation. In addition to the required PDB identifier, it is usually a good idea to annotate every template given to modeller with its first and last residue and the corresponding chains. This can be done as follows:
[4]:
# annotate the alignment
aln.get_sequence('ARAF').annotate(seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(seq_type = 'structure',
pdb_code = '3NY5',
begin_res = '1',
begin_chain = 'A',
end_res = '81',
end_chain = 'A')
For more information on the sequence annotation, please check the documentation.
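These annotation fields correspond to the header line of MODELLER's PIR alignment format, whose colon-separated fields are: type, code, first residue, first chain, last residue, last chain, protein name, source, resolution and R-factor. homelette writes the alignment for modeller itself; the sketch below, built around a hypothetical helper pir_entry, only illustrates how the values passed to annotate() would map onto such an entry. Treat the exact output as an assumption, not a copy of what homelette produces.

```python
def pir_entry(code, sequence, seq_type='sequence', begin_res='', begin_chain='',
              end_res='', end_chain=''):
    """Build a MODELLER-style PIR entry (hypothetical helper for illustration)."""
    fields = [seq_type, code, begin_res, begin_chain, end_res, end_chain,
              '', '', '', '']  # name, source, resolution, R-factor left empty
    return f'>P1;{code}\n' + ':'.join(fields) + f'\n{sequence}*'


# template annotated with residue range 1-81 on chain A, as in the cell above
print(pir_entry('3NY5', 'HQKPIVRVFLPNKQRTVV', seq_type='structure',
                begin_res='1', begin_chain='A', end_res='81', end_chain='A'))
# the target only needs its type and code
print(pir_entry('ARAF', '---GTVKVYLPNKQRTVV'))
```

The begin and end residues are how modeller locates the matching segment in the template's PDB file, which is why consistent residue numbering in the template structures matters.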
Template Structures
For the sake of consistency, we recommend renumbering each template structure so that its residue numbering starts with residue 1, and ignoring missing residues. A good tool for handling PDB structures is pdb-tools [10].
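One way to read this recommendation is to shift all residue numbers by a constant offset so that the first residue becomes 1, leaving any gaps caused by missing residues in place. The minimal sketch below does this on the fixed-column ATOM/HETATM records of a PDB file, where the residue number occupies columns 23 to 26. shift_residue_numbers is a hypothetical helper written for this example; for real structures a dedicated tool such as pdb-tools is the more robust choice.

```python
def shift_residue_numbers(pdb_lines, start=1):
    """Shift residue numbers so the first residue is `start`; gaps are kept."""
    offset = None
    shifted = []
    for line in pdb_lines:
        if line.startswith(('ATOM', 'HETATM')):
            resseq = int(line[22:26])  # residue number, PDB columns 23-26
            if offset is None:
                offset = start - resseq  # fixed by the first residue encountered
            # write the new number right-aligned into the same four columns
            line = f'{line[:22]}{resseq + offset:>4}{line[26:]}'
        shifted.append(line)  # other records (TER, HEADER, ...) pass through
    return shifted


lines = [
    'ATOM      1  CA  GLY A  45      11.104   6.134  -6.504  1.00  0.00           C',
    'ATOM      2  CA  SER A  46      12.560   9.210  -5.980  1.00  0.00           C',
    'ATOM      3  CA  LYS A  50      15.001  10.337  -4.112  1.00  0.00           C',
]
for line in shift_residue_numbers(lines):
    print(line)
```

Residues 45, 46 and 50 become 1, 2 and 6: the numbering now starts at 1, while the gap left by the missing residues 47 to 49 is untouched.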
Model Generation
After importing our alignment, checking it manually, calculating sequence identities, annotating the sequences and discussing the template structures, we can now proceed with model generation.
Before starting modelling and evaluation, we need to set up a Task object. The purpose of Task objects is to simplify the interface to modelling and evaluation methods. Task objects are alignment-specific and target-specific.
[5]:
# set up task object
t = hm.Task(
task_name = 'Tutorial1',
target = 'ARAF',
alignment = aln,
overwrite = True)
Upon initialization, the task object will check if there is a folder in the current working directory that corresponds to the given task_name. If no such folder is available, a new one will be created.
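The folder check described above can be sketched with pathlib. This is a minimal illustration of "reuse the folder if it exists, otherwise create it", not homelette's code; ensure_task_folder is a hypothetical name, and the sketch deliberately does not model what the overwrite argument does.

```python
import tempfile
from pathlib import Path


def ensure_task_folder(task_name, base_dir='.'):
    """Return (folder, created): the task folder, created only if missing."""
    folder = Path(base_dir) / task_name
    created = not folder.is_dir()
    if created:
        folder.mkdir(parents=True)
    return folder, created


# use a throwaway directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_task_folder('Tutorial1', base_dir=tmp)
    second = ensure_task_folder('Tutorial1', base_dir=tmp)
    print(first[1], second[1])  # True False
```

Because the folder is only created when missing, running the notebook a second time reuses the existing Tutorial1 folder instead of failing.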
After initialization of the Task object, we can start with homology modelling. For this, we use the execute_routine method of the Task object, which applies the chosen homology modelling routine to the task's target and alignment using the given template(s).
[6]:
# generate models with modeller
t.execute_routine(
tag = 'example_modeller',
routine = hm.routines.Routine_automodel_default,
templates = ['3NY5'],
template_location = './data/single')
It is possible to use the same Task object to create models from multiple different routine-template combinations.
[7]:
# generate models with promod3
t.execute_routine(
tag = 'example_promod3',
routine = hm.routines.Routine_promod3,
templates = ['3NY5'],
template_location = './data/single')
Model Evaluation
Similarly to modelling, model evaluation is performed through the evaluate_models method of the Task object. This method provides an easy interface for applying one or more evaluation methods to the models stored in the task object.
[8]:
# perform evaluation
t.evaluate_models(hm.evaluation.Evaluation_dope)
The Task.get_evaluation method retrieves the evaluation for all models in the Task object as a pandas data frame.
[9]:
t.get_evaluation()
[9]:
| | model | tag | routine | dope | dope_z_score |
|---|---|---|---|---|---|
| 0 | example_modeller_1.pdb | example_modeller | automodel_default | -7274.457520 | -1.576995 |
| 1 | example_promod3_1.pdb | example_promod3 | promod3 | -7642.868652 | -1.934412 |
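Since the evaluation comes back as a pandas data frame, ranking models is ordinary pandas work. DOPE is an energy-like score, so lower (more negative) values indicate better models. The sketch below re-enters the table above by hand so that it runs without homelette; in the notebook you would use the result of t.get_evaluation() instead.

```python
import pandas as pd

# evaluation table as printed above; in the notebook: evaluation = t.get_evaluation()
evaluation = pd.DataFrame({
    'model': ['example_modeller_1.pdb', 'example_promod3_1.pdb'],
    'tag': ['example_modeller', 'example_promod3'],
    'routine': ['automodel_default', 'promod3'],
    'dope': [-7274.457520, -7642.868652],
    'dope_z_score': [-1.576995, -1.934412],
})

# lower (more negative) DOPE z-scores rank first
ranked = evaluation.sort_values('dope_z_score').reset_index(drop=True)
best = ranked.loc[0, 'model']
print(best)  # example_promod3_1.pdb
```

Sorting by the z-score rather than the raw DOPE value is a choice for this sketch; both give the same order here.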
For more details on the available evaluation methods, please check out the documentation and Tutorial 3.
Further Reading
Congratulations, you are now familiar with the basic functionality of homelette. You can now load an alignment, are familiar with the Task object and can perform homology modelling and evaluate your models.
Please note that there are other, more advanced tutorials, which will teach you more about how to use homelette:
Tutorial 2: Learn more about already implemented routines for homology modelling.
Tutorial 3: Learn about the evaluation metrics available with homelette.
Tutorial 4: Learn about extending homelette’s functionality by defining your own modelling routines and evaluation metrics.
Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.
Tutorial 6: Learn about modelling protein complexes.
Tutorial 7: Learn about assembling custom pipelines.
Tutorial 8: Learn about automated template identification, alignment generation and template processing.
References
[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626
[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3
[3] Biasini, M., Schmidt, T., Bienert, S., Mariani, V., Studer, G., Haas, J., Johner, N., Schenk, A. D., Philippsen, A., & Schwede, T. (2013). OpenStructure: An integrated software framework for computational structural biology. Acta Crystallographica Section D: Biological Crystallography, 69(5), 701–709. https://doi.org/10.1107/S0907444913007051
[4] Studer, G., Tauriello, G., Bienert, S., Biasini, M., Johner, N., & Schwede, T. (2021). ProMod3—A versatile homology modelling toolbox. PLOS Computational Biology, 17(1), e1008667. https://doi.org/10.1371/JOURNAL.PCBI.1008667
[5] Shen, M., & Sali, A. (2006). Statistical potential for assessment and prediction of protein structures. Protein Science, 15(11), 2507–2524. https://doi.org/10.1110/ps.062416606
[6] Notredame, C., Higgins, D. G., & Heringa, J. (2000). T-coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1), 205–217. https://doi.org/10.1006/jmbi.2000.4042
[7] Wallace, I. M., O’Sullivan, O., Higgins, D. G., & Notredame, C. (2006). M-Coffee: Combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Research, 34(6), 1692–1699. https://doi.org/10.1093/nar/gkl091
[8] Söding, J., Biegert, A., & Lupas, A. N. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research, 33(suppl_2), W244–W248. https://doi.org/10.1093/NAR/GKI408
[9] Zimmermann, L., Stephens, A., Nam, S. Z., Rau, D., Kübler, J., Lozajic, M., Gabler, F., Söding, J., Lupas, A. N., & Alva, V. (2018). A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core. Journal of Molecular Biology, 430(15), 2237–2243. https://doi.org/10.1016/J.JMB.2017.12.007
[10] Rodrigues, J. P. G. L. M., Teixeira, J. M. C., Trellet, M., & Bonvin, A. M. J. J. (2018). pdb-tools: a swiss army knife for molecular structures. F1000Research, 7, 1961. https://doi.org/10.12688/f1000research.17456.1
Session Info
[10]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette 1.4
pandas 1.5.3
session_info 1.0.0
-----
PIL 7.0.0
altmod NA
anyio NA
asttokens NA
attr 19.3.0
babel 2.12.1
backcall 0.2.0
certifi 2022.12.07
chardet 3.0.4
charset_normalizer 3.1.0
comm 0.1.2
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.6
decorator 4.4.2
executing 1.2.0
fastjsonschema NA
idna 3.4
importlib_metadata NA
importlib_resources NA
ipykernel 6.21.3
ipython_genutils 0.2.0
jedi 0.18.2
jinja2 3.1.2
json5 NA
jsonschema 4.17.3
jupyter_events 0.6.3
jupyter_server 2.4.0
jupyterlab_server 2.20.0
kiwisolver 1.0.1
markupsafe 2.1.2
matplotlib 3.1.2
modeller 10.4
more_itertools NA
mpl_toolkits NA
nbformat 5.7.3
numexpr 2.8.4
numpy 1.24.2
ost 2.3.1
packaging 20.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.1.1
prometheus_client NA
promod3 3.2.1
prompt_toolkit 3.0.38
psutil 5.5.1
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.14.0
pyparsing 2.4.6
pyrsistent NA
pythonjsonlogger NA
pytz 2022.7.1
qmean NA
requests 2.28.2
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
send2trash NA
sitecustomize NA
six 1.12.0
sniffio 1.3.0
stack_data 0.6.2
swig_runtime_data4 NA
tornado 6.2
traitlets 5.9.0
urllib3 1.26.15
wcwidth NA
websocket 1.5.1
yaml 6.0
zipp NA
zmq 25.0.1
-----
IPython 8.11.0
jupyter_client 8.0.3
jupyter_core 5.2.0
jupyterlab 3.6.1
notebook 6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:34