Tutorial 2: Modelling
[1]:
import os
import homelette as hm
Introduction
Welcome to the second tutorial for homelette. In this tutorial, we will further explore the already implemented methods for generating homology models.
Currently, the following software packages for generating homology models have been integrated in the homelette homology modelling interface:
modeller: A robust package for homology modelling with a long history which is widely used [1,2]
altmod: A modification to the standard modeller modelling procedure that has been reported to increase the quality of models [3]
ProMod3: The modelling engine behind the popular SwissModel web platform [4,5]
Specifically, the following routines are implemented in homelette. For more details on the individual routines, please check the documentation or their respective docstrings.
routines.Routine_automodel_default
routines.Routine_automodel_slow
routines.Routine_altmod_default
routines.Routine_altmod_slow
routines.Routine_promod3
In this example, we will generate models for the Ras-binding domain (RBD) of ARAF. ARAF is a RAF kinase important in MAPK signalling. As a template, we will choose a close relative of ARAF called BRAF, specifically the structure with the PDB code 3NY5.
All files necessary for running this tutorial are already prepared and deposited in the following directory: homelette/example/data/. If you execute this tutorial from homelette/example/, you don’t have to adapt any of the paths.
homelette comes with extensive documentation. You can either check out our online documentation, compile a local version of the documentation in homelette/docs/ with sphinx, or use the help() function in Python.
Alignment
For this tutorial, we will use the same alignment and template as for Tutorial 1.
[2]:
# read in the alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')
# print to screen to check alignment
aln.print_clustal(line_wrap=70)
ARAF ---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE
3NY5 HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE
ARAF LIVEVL------
3NY5 LHVEVLENVPLT
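As a quick plausibility check on the alignment printed above, the fraction of identical residues between target and template can be computed with plain Python. This is a minimal sketch and not part of homelette: the sequences are copied by hand from the printed alignment, and the helper name percent_identity as well as the choice to count only columns without gaps are assumptions made for illustration.

```python
# aligned sequences copied from the Clustal output above (wrapped lines joined)
araf = ('---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE'
        'LIVEVL------')
tmpl = ('HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE'
        'LHVEVLENVPLT')


def percent_identity(seq_a, seq_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError('aligned sequences must have equal length')
    # keep only columns in which both sequences carry a residue
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != '-' and b != '-']
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


identity = percent_identity(araf, tmpl)
print(f'{identity:.1f} % identity over gap-free columns')
```

Other definitions of sequence identity (for example, dividing by the full alignment length) give different numbers, so the value is only meaningful together with the definition used.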
[3]:
# annotate the alignment
aln.get_sequence('ARAF').annotate(seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(seq_type = 'structure',
                                  pdb_code = '3NY5',
                                  begin_res = '1',
                                  begin_chain = 'A',
                                  end_res = '81',
                                  end_chain = 'A')
Model Generation using routines
The building blocks in homelette that take care of model generation are called Routines. There are a number of pre-defined routines, and it is also possible to construct custom routines (see Tutorial 4). Every routine in homelette expects the same set of required arguments, and some accept a few optional ones as well.
[4]:
?hm.routines.Routine_automodel_default
Init signature:
hm.routines.Routine_automodel_default(
alignment: Type[ForwardRef('Alignment')],
target: str,
templates: Iterable,
tag: str,
n_threads: int = 1,
n_models: int = 1,
) -> None
Docstring:
Class for performing homology modelling using the automodel class from
modeller with a default parameter set.
Parameters
----------
alignment : Alignment
The alignment object that will be used for modelling
target : str
The identifier of the protein to model
templates : Iterable
The iterable containing the identifier(s) of the template(s) used
for the modelling
tag : str
The identifier associated with a specific execution of the routine
n_threads : int
Number of threads used in model generation (default 1)
n_models : int
Number of models generated (default 1)
Attributes
----------
alignment : Alignment
The alignment object that will be used for modelling
target : str
The identifier of the protein to model
templates : Iterable
The iterable containing the identifier(s) of the template(s) used for
the modelling
tag : str
The identifier associated with a specific execution of the routine
n_threads : int
Number of threads used for model generation
n_models : int
Number of models generated
routine : str
The identifier associated with a specific routine
models : list
List of models generated by the execution of this routine
Raises
------
ImportError
Unable to import dependencies
Notes
-----
The following modelling parameters can be set when initializing this
Routine object:
* n_models
* n_threads
The following modelling parameters are set for this class:
+-----------------------+---------------------------------------+
| modelling             | value                                 |
| parameter             |                                       |
+=======================+=======================================+
| model_class           | modeller.automodel.automodel          |
+-----------------------+---------------------------------------+
| library_schedule      | modeller.automodel.autosched.normal   |
+-----------------------+---------------------------------------+
| md_level              | modeller.automodel.refine.very_fast   |
+-----------------------+---------------------------------------+
| max_var_iterations    | 200                                   |
+-----------------------+---------------------------------------+
| repeat_optmization    | 1                                     |
+-----------------------+---------------------------------------+
File: /usr/local/src/homelette-1.4/homelette/routines.py
Type: type
Subclasses:
The following arguments are required for all pre-defined routines:
alignment: The alignment object used for modelling.
target: The identifier of the target sequence in the alignment object.
templates: An iterable containing the identifier(s) of the templates for this modelling routine. homelette expects that templates are uniquely identified by their identifier in the alignment and in the template PDB file(s). Routines based on modeller work with one or multiple templates, whereas Routine_promod3 only accepts a single template per run.
tag: Each executed routine is given a tag which will be used to name the generated models.
In addition, pre-defined routines expect the template PDBs to be present in the current working directory.
The routine Routine_automodel_default has two optional arguments:
n_models: The number of models that should be produced in this run, as routines based on modeller are able to produce an arbitrary number of models.
n_threads: Enables multi-threading for the execution of this routine. For more information on parallelization in homelette, please check out Tutorial 5.
While it is generally recommended to execute routines using Task objects (see next section), it is also possible to execute them directly. Since the template file has to be in the current working directory, we temporarily change the working directory to a prepared directory where we can execute the routine (this code assumes that your working directory is homelette/examples).
[5]:
# change directory
os.chdir('data/single')
# print content of directory to screen
print('Files before modelling:\n' + ' '.join(os.listdir()) + '\n\n')
# perform modelling
routine = hm.routines.Routine_automodel_default(
    alignment=aln,
    target='ARAF',
    templates=['3NY5'],
    tag='model')
routine.generate_models()
print('Files after modelling:\n' + ' '.join(os.listdir()) + '\n')
# remove model
os.remove('model_1.pdb')
# change back to tutorial directory
os.chdir('../..')
Files before modelling:
3NY5.pdb aln_1.fasta_aln 4G0N.pdb
Files after modelling:
model_1.pdb 3NY5.pdb aln_1.fasta_aln 4G0N.pdb
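The directory change in the cell above can also be wrapped in a context manager, so that the original working directory is restored even if model generation raises an exception. The following is a minimal sketch using only the standard library; working_directory is a hypothetical helper and not part of homelette, and a temporary directory stands in for data/single.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the working directory and restore it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # runs on normal exit and when the body raises
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        with working_directory(tmp):
            # a routine would be created and executed here
            raise RuntimeError('simulated failure during modelling')
    except RuntimeError:
        pass
    restored = os.getcwd()

print(restored == start)
```

With such a helper, the calls to os.chdir('data/single') and os.chdir('../..') collapse into a single with block around routine.generate_models().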
Model Generation using Task and routines
homelette has Task objects that allow for easier use of Routines and Evaluations (see also Tutorial 3). Task objects help to direct and organize modelling pipelines. It is strongly recommended to use Task objects to execute routines and evaluations.
For more information on Task objects, please check out the documentation or Tutorial 1.
[6]:
# set up task object
t = hm.Task(
    task_name = 'Tutorial2',
    target = 'ARAF',
    alignment = aln,
    overwrite = True)
Using the Task object, we can now begin to generate our models with different routines using the Task.execute_routine method.
[7]:
?hm.Task.execute_routine
Signature:
hm.Task.execute_routine(
self,
tag: str,
routine: Type[ForwardRef('routines.Routine')],
templates: Iterable,
template_location: str = '.',
**kwargs,
) -> None
Docstring:
Generates homology models using a specified modelling routine
Parameters
----------
tag : str
The identifier associated with this combination of routine and
template(s). Has to be unique between all routines executed by the
same task object
routine : Routine
The routine object used to generate the models
templates : list
The iterable containing the identifier(s) of the template(s) used
for model generation
template_location : str, optional
The location of the template PDB files. They should be named
according to their identifiers in the alignment (i.e. for a
sequence named "1WXN" to be used as a template, it is expected that
there will be a PDB file named "1WXN.pdb" in the specified template
location (default is current working directory)
**kwargs
Named parameters passed directly on to the Routine object when the
modelling is performed. Please check the documentation in order to
make sure that the parameters passed on are available with the
Routine object you intend to use
Returns
-------
None
File: /usr/local/src/homelette-1.4/homelette/organization.py
Type: function
As we can see, Task.execute_routine expects a number of arguments from the user:
tag: Each executed routine is given a tag which will be used to name the generated models. This is useful for differentiating between different routines executed by the same Task, for example if different templates are used.
routine: Here the user can set which routine will be used for generating the homology model(s), arguably the most important setting.
templates: An iterable containing the identifier(s) of the templates for this modelling routine. homelette expects that templates are uniquely identified by their identifier(s) in the alignment and in the template location.
template_location: The folder where the PDB file(s) used as template(s) are found.
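Because template files are expected to be named after their identifiers in the alignment (for example 3NY5.pdb for the identifier 3NY5), it can be useful to check the template location before starting a modelling run. The sketch below uses only the standard library; missing_templates is a hypothetical helper and not part of homelette, and a temporary directory with an empty placeholder file stands in for a real template folder.

```python
import os
import tempfile


def missing_templates(templates, template_location='.'):
    """Return the identifiers without a matching '<identifier>.pdb' file."""
    return [
        identifier for identifier in templates
        if not os.path.isfile(
            os.path.join(template_location, identifier + '.pdb'))
    ]


with tempfile.TemporaryDirectory() as tmp:
    # create an empty placeholder for one template only
    open(os.path.join(tmp, '3NY5.pdb'), 'w').close()
    result = missing_templates(['3NY5', '4G0N'], template_location=tmp)

print(result)  # ['4G0N']
```

An empty list means that every identifier has a corresponding file in the given location.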
We now generate some models with the pre-defined routines of homelette:
[8]:
# model generation with modeller
t.execute_routine(
    tag = 'example_modeller',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single')
# model generation with altmod
t.execute_routine(
    tag = 'example_altmod',
    routine = hm.routines.Routine_altmod_default,
    templates = ['3NY5'],
    template_location = './data/single')
# model generation with promod3
t.execute_routine(
    tag = 'example_promod3',
    routine = hm.routines.Routine_promod3,
    templates = ['3NY5'],
    template_location = './data/single')
As mentioned before, some modelling routines have optional arguments, such as n_models for Routine_automodel_default. We can pass these optional arguments to Task.execute_routine, which passes them on to the selected routine:
[9]:
# multiple model generation with modeller
t.execute_routine(
    tag = 'example_modeller_more_models',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 10)
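The general Python mechanism behind this pass-through is keyword-argument forwarding with **kwargs. The sketch below illustrates the pattern with stand-ins only: DummyRoutine and this execute_routine function are hypothetical and do not reproduce homelette's actual implementation. It also shows why the docstring asks you to check that the parameters are available for the chosen routine: in this sketch, a misspelled keyword is rejected when the routine class is instantiated.

```python
class DummyRoutine:
    """Hypothetical stand-in for a routine class (not part of homelette)."""

    def __init__(self, alignment, target, templates, tag,
                 n_threads=1, n_models=1):
        self.tag = tag
        self.templates = list(templates)
        self.n_threads = n_threads
        self.n_models = n_models


def execute_routine(tag, routine, templates, **kwargs):
    """Instantiate the routine class, forwarding extra keyword arguments."""
    return routine(alignment=None, target='ARAF',
                   templates=templates, tag=tag, **kwargs)


# optional argument is forwarded; n_threads keeps its default
r = execute_routine('example', DummyRoutine, ['3NY5'], n_models=10)
print(r.n_models, r.n_threads)  # 10 1

# a keyword the routine does not know raises a TypeError
try:
    execute_routine('example', DummyRoutine, ['3NY5'], n_model=10)
except TypeError as err:
    error_message = str(err)
    print('rejected:', error_message)
```

Note that the routine is handed over as a class, not as an instance, which matches how routines are passed to Task.execute_routine in the cells above.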
Models generated using Task objects are stored as Model objects in the Task:
[10]:
t.models
[10]:
[<homelette.organization.Model at 0x7f421f7f9280>,
<homelette.organization.Model at 0x7f421f7cf7f0>,
<homelette.organization.Model at 0x7f421f8f4370>,
<homelette.organization.Model at 0x7f421f8dfca0>,
<homelette.organization.Model at 0x7f421f8df2e0>,
<homelette.organization.Model at 0x7f421f8da2b0>,
<homelette.organization.Model at 0x7f421f8da400>,
<homelette.organization.Model at 0x7f421f8da370>,
<homelette.organization.Model at 0x7f421f806220>,
<homelette.organization.Model at 0x7f421f806cd0>,
<homelette.organization.Model at 0x7f421f806a00>,
<homelette.organization.Model at 0x7f421f806f10>,
<homelette.organization.Model at 0x7f421f806280>]
In conclusion, we have learned how to use a single Task object to generate models with different modelling routines. We have also learned how to pass optional arguments on to the executed routines.
In this example, the target, the alignment and the templates were kept identical. Varying the templates would be straightforward, provided that the other templates are included in the alignment. For varying alignments and targets, new Task objects would need to be created. This is a design choice that is meant to encourage users to try out different routines or templates/template combinations. When using different routines or multiple templates, it is recommended to indicate this using the tag argument of Task.execute_routine (e.g. tag='automodel_3NY5'). Similarly, using a single Task object for multiple targets or alignments is discouraged, and we recommend using multiple Task objects for these modelling approaches.
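The tagging recommendation can be followed mechanically. The sketch below builds tags from a routine label and the template identifiers and checks that they are unique, since tags have to be unique between all routines executed by the same Task. make_tag is a hypothetical helper and not part of homelette, and the second template 4G0N is used for illustration only (it is not part of the alignment in this tutorial).

```python
def make_tag(routine_label, templates):
    """Join a routine label and the sorted template identifiers into one tag."""
    return '_'.join([routine_label] + sorted(templates))


# hypothetical combinations of routine label and template(s)
combinations = [
    ('automodel', ['3NY5']),
    ('altmod', ['3NY5']),
    ('automodel', ['4G0N', '3NY5']),
]
tags = [make_tag(label, templates) for label, templates in combinations]

# tags must be unique between all routines executed by the same Task
if len(tags) != len(set(tags)):
    raise ValueError('duplicate tags: ' + ', '.join(tags))

print(tags)  # ['automodel_3NY5', 'altmod_3NY5', 'automodel_3NY5_4G0N']
```

Sorting the template identifiers makes the tag independent of the order in which the templates are listed.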
Further Reading
You are now familiar with model generation in homelette.
Please note that there are other tutorials, which will teach you more about how to use homelette:
Tutorial 1: Learn about the basics of homelette.
Tutorial 3: Learn about the evaluation metrics available with homelette.
Tutorial 4: Learn about extending homelette’s functionality by defining your own modelling routines and evaluation metrics.
Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.
Tutorial 6: Learn about modelling protein complexes.
Tutorial 7: Learn about assembling custom pipelines.
Tutorial 8: Learn about automated template identification, alignment generation and template processing.
References
[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626
[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3
[3] Janson, G., Grottesi, A., Pietrosanto, M., Ausiello, G., Guarguaglini, G., & Paiardini, A. (2019). Revisiting the “satisfaction of spatial restraints” approach of MODELLER for protein homology modeling. PLoS Computational Biology, 15(12), e1007219. https://doi.org/10.1371/journal.pcbi.1007219
[4] Biasini, M., Schmidt, T., Bienert, S., Mariani, V., Studer, G., Haas, J., Johner, N., Schenk, A. D., Philippsen, A., & Schwede, T. (2013). OpenStructure: An integrated software framework for computational structural biology. Acta Crystallographica Section D: Biological Crystallography, 69(5), 701–709. https://doi.org/10.1107/S0907444913007051
[5] Studer, G., Tauriello, G., Bienert, S., Biasini, M., Johner, N., & Schwede, T. (2021). ProMod3—A versatile homology modelling toolbox. PLOS Computational Biology, 17(1), e1008667. https://doi.org/10.1371/JOURNAL.PCBI.1008667
Session Info
[11]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette 1.4
session_info 1.0.0
-----
PIL 7.0.0
altmod NA
anyio NA
asttokens NA
attr 19.3.0
babel 2.12.1
backcall 0.2.0
certifi 2022.12.07
chardet 3.0.4
charset_normalizer 3.1.0
comm 0.1.2
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.6
decorator 4.4.2
executing 1.2.0
fastjsonschema NA
idna 3.4
importlib_metadata NA
importlib_resources NA
ipykernel 6.21.3
ipython_genutils 0.2.0
jedi 0.18.2
jinja2 3.1.2
json5 NA
jsonschema 4.17.3
jupyter_events 0.6.3
jupyter_server 2.4.0
jupyterlab_server 2.20.0
kiwisolver 1.0.1
markupsafe 2.1.2
matplotlib 3.1.2
modeller 10.4
more_itertools NA
mpl_toolkits NA
nbformat 5.7.3
numexpr 2.8.4
numpy 1.24.2
ost 2.3.1
packaging 20.3
pandas 1.5.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.1.1
prometheus_client NA
promod3 3.2.1
prompt_toolkit 3.0.38
psutil 5.5.1
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.14.0
pyparsing 2.4.6
pyrsistent NA
pythonjsonlogger NA
pytz 2022.7.1
qmean NA
requests 2.28.2
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
send2trash NA
sitecustomize NA
six 1.12.0
sniffio 1.3.0
stack_data 0.6.2
swig_runtime_data4 NA
tornado 6.2
traitlets 5.9.0
urllib3 1.26.15
wcwidth NA
websocket 1.5.1
yaml 6.0
zipp NA
zmq 25.0.1
-----
IPython 8.11.0
jupyter_client 8.0.3
jupyter_core 5.2.0
jupyterlab 3.6.1
notebook 6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:35