Tutorial 2: Modelling

[1]:

import os

import homelette as hm

Introduction

Welcome to the second tutorial for homelette. In this tutorial, we will further explore the methods already implemented for generating homology models.

Currently, the following software packages for generating homology models have been integrated in the homelette homology modelling interface:

modeller: A robust package for homology modelling with a long history which is widely used [1,2]
altmod: A modification to the standard modeller modelling procedure that has been reported to increase the quality of models [3]
ProMod3: The modelling engine behind the popular SwissModel web platform [4,5]

Specifically, the following routines are implemented in homelette. For more details on the individual routines, please check the documentation or their respective docstring.

routines.Routine_automodel_default
routines.Routine_automodel_slow
routines.Routine_altmod_default
routines.Routine_altmod_slow
routines.Routine_promod3
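
Each of these routines relies on external software that has to be importable from Python, so it can help to check what is available before choosing a routine. The following is a minimal sketch using only the standard library. The module names are taken from the session information at the end of this tutorial; the mapping from routines to modules and the helper names are assumptions made for illustration and are not part of the homelette API.

```python
import importlib.util

# Assumed mapping from routine family to the Python modules it needs.
# The module names appear in the session info of this tutorial; the
# grouping itself is illustrative and not something homelette exposes.
BACKENDS = {
    'Routine_automodel_*': ['modeller'],
    'Routine_altmod_*': ['modeller', 'altmod'],
    'Routine_promod3': ['ost', 'promod3'],
}


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_backends(backends=BACKENDS):
    """Map each routine family to True if all of its modules are importable."""
    return {
        family: all(is_importable(name) for name in modules)
        for family, modules in backends.items()
    }


for family, ok in available_backends().items():
    print(f"{family:22s} {'available' if ok else 'missing'}")
```

A routine whose back end is reported as missing would be expected to fail with the ImportError mentioned in the routine docstrings.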

In this example, we will generate models for the Ras-binding domain (RBD) of ARAF. ARAF is a RAF kinase important in MAPK signalling. As a template, we will choose a close relative of ARAF called BRAF, specifically the structure with the PDB code 3NY5.

All files necessary for running this tutorial are already prepared and deposited in the following directory: homelette/example/data/. If you execute this tutorial from homelette/example/, you don’t have to adapt any of the paths.

homelette comes with extensive documentation. You can check out our online documentation, compile a local version of the documentation in homelette/docs/ with sphinx, or use the help() function in Python.

Alignment

For this tutorial, we will use the same alignment and template as for Tutorial 1.

[2]:

# read in the alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')

# print to screen to check alignment
aln.print_clustal(line_wrap=70)

ARAF        ---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE
3NY5        HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE


ARAF        LIVEVL------
3NY5        LHVEVLENVPLT
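
Beyond visual inspection, a quick numerical check of the alignment can be useful before modelling. The sketch below computes sequence identity and target coverage in plain Python from the two aligned sequences printed above. It does not use the homelette API; the function name alignment_stats and the exact definitions of identity and coverage are choices made for this illustration.

```python
# Aligned sequences copied from the Clustal-style output above
# (the two wrapped blocks are joined into one string per sequence).
araf = ('---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEE'
        'LIVEVL------')
tmpl = ('HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ---KKPIGWDTDISWLTGEE'
        'LHVEVLENVPLT')


def alignment_stats(seq_a, seq_b, gap='-'):
    """Return (identity, coverage) for two aligned sequences.

    identity: identical residues / columns where both sequences have a residue
    coverage: columns where both have a residue / residues in seq_a
    """
    if len(seq_a) != len(seq_b):
        raise ValueError('aligned sequences must have the same length')
    paired = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not paired:
        raise ValueError('sequences share no aligned residues')
    identical = sum(a == b for a, b in paired)
    residues_a = sum(a != gap for a in seq_a)
    return identical / len(paired), len(paired) / residues_a


identity, coverage = alignment_stats(araf, tmpl)
print(f'identity: {identity:.1%}, target coverage: {coverage:.1%}')
```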

[3]:

# annotate the alignment
aln.get_sequence('ARAF').annotate(seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(seq_type = 'structure',
                              pdb_code = '3NY5',
                              begin_res = '1',
                              begin_chain = 'A',
                              end_res = '81',
                              end_chain = 'A')

Model Generation using `routines`

The building blocks in homelette that take care of model generation are called Routines. There are a number of pre-defined routines, and it is also possible to construct custom routines (see Tutorial 4). All routines in homelette share a common set of required arguments, and some accept additional optional ones.

[4]:

?hm.routines.Routine_automodel_default

Init signature:
hm.routines.Routine_automodel_default(
    alignment: Type[ForwardRef('Alignment')],
    target: str,
    templates: Iterable,
    tag: str,
    n_threads: int = 1,
    n_models: int = 1,
) -> None
Docstring:
Class for performing homology modelling using the automodel class from
modeller with a default parameter set.

Parameters
----------
alignment : Alignment
    The alignment object that will be used for modelling
target : str
    The identifier of the protein to model
templates : Iterable
    The iterable containing the identifier(s) of the template(s) used
    for the modelling
tag : str
    The identifier associated with a specific execution of the routine
n_threads : int
    Number of threads used in model generation (default 1)
n_models : int
    Number of models generated (default 1)

Attributes
----------
alignment : Alignment
    The alignment object that will be used for modelling
target : str
    The identifier of the protein to model
templates : Iterable
    The iterable containing the identifier(s) of the template(s) used for
    the modelling
tag : str
    The identifier associated with a specific execution of the routine
n_threads : int
    Number of threads used for model generation
n_models : int
    Number of models generated
routine : str
    The identifier associated with a specific routine
models : list
    List of models generated by the execution of this routine

Raises
------
ImportError
    Unable to import dependencies

Notes
-----
The following modelling parameters can be set when initializing this
Routine object:

* n_models
* n_threads

The following modelling parameters are set for this class:

+-----------------------+---------------------------------------+
| modelling             | value                                 |
| parameter             |                                       |
+=======================+=======================================+
| model_class           | modeller.automodel.automodel          |
+-----------------------+---------------------------------------+
| library_schedule      | modeller.automodel.autosched.normal   |
+-----------------------+---------------------------------------+
| md_level              | modeller.automodel.refine.very_fast   |
+-----------------------+---------------------------------------+
| max_var_iterations    | 200                                   |
+-----------------------+---------------------------------------+
| repeat_optmization    | 1                                     |
+-----------------------+---------------------------------------+
File:           /usr/local/src/homelette-1.4/homelette/routines.py
Type:           type
Subclasses:

The following arguments are required for all pre-defined routines:

alignment: The alignment object used for modelling.
target: The identifier of the target sequence in the alignment object.
templates: An iterable containing the identifier(s) of the templates for this modelling routine. homelette expects that templates are uniquely identified by their identifier in the alignment and in the template PDB file(s). Routines based on modeller work with one or multiple templates, whereas Routine_promod3 only accepts a single template per run.
tag: Each executed routine is given a tag which will be used to name the generated models.

In addition, pre-defined routines expect the template PDBs to be present in the current working directory.

The routine Routine_automodel_default has two optional arguments:

n_models: the number of models that should be produced in this run, as routines based on modeller are able to produce an arbitrary number of models.
n_threads: enables multi-threading for the execution of this routine. For more information on parallelization in homelette, please check out Tutorial 5.

While it is generally recommended to execute routines using Task objects (see next section), it is also possible to execute them directly. Because the template file has to be in the current working directory, we temporarily change to a prepared directory where we can execute the routine (this code assumes that your working directory is homelette/examples).

[5]:

# change directory
os.chdir('data/single')
# print content of directory to screen
print('Files before modelling:\n' + ' '.join(os.listdir()) + '\n\n')

# perform modelling
routine = hm.routines.Routine_automodel_default(
    alignment=aln,
    target='ARAF',
    templates=['3NY5'],
    tag='model')
routine.generate_models()

print('Files after modelling:\n' + ' '.join(os.listdir()) + '\n')

# remove model
os.remove('model_1.pdb')

# change back to tutorial directory
os.chdir('../..')

Files before modelling:
3NY5.pdb aln_1.fasta_aln 4G0N.pdb


Files after modelling:
model_1.pdb 3NY5.pdb aln_1.fasta_aln 4G0N.pdb
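
The cell above changes the working directory and changes it back by hand. If modelling raises an exception in between, the session stays in data/single. One possible way around this is a small context manager that always restores the previous directory. The sketch below uses only the standard library; working_directory is a hypothetical helper and not part of homelette.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # runs even if the body raises, so we never stay in `path`
        os.chdir(previous)


# Demonstration with a throwaway directory. In the tutorial, one would pass
# 'data/single' and call routine.generate_models() inside the with block.
start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()
    after = os.getcwd()

print('changed inside block:', inside != start)
print('restored afterwards: ', after == start)
```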

Model Generation using `Task` and `routines`

homelette has Task objects that allow for easier use of Routines and Evaluations (see also Tutorial 3). Task objects help to direct and organize modelling pipelines. It is strongly recommended to use Task objects to execute routines and evaluations.

For more information on Task objects, please check out the documentation or Tutorial 1.

[6]:

# set up task object
t = hm.Task(
    task_name = 'Tutorial2',
    target = 'ARAF',
    alignment = aln,
    overwrite = True)

Using the Task object, we can now begin to generate our models with different routines using the Task.execute_routine method.

[7]:

?hm.Task.execute_routine

Signature:
hm.Task.execute_routine(
    self,
    tag: str,
    routine: Type[ForwardRef('routines.Routine')],
    templates: Iterable,
    template_location: str = '.',
    **kwargs,
) -> None
Docstring:
Generates homology models using a specified modelling routine

Parameters
----------
tag : str
    The identifier associated with this combination of routine and
    template(s). Has to be unique between all routines executed by the
    same task object
routine : Routine
    The routine object used to generate the models
templates : list
    The iterable containing the identifier(s) of the template(s) used
    for model generation
template_location : str, optional
    The location of the template PDB files. They should be named
    according to their identifiers in the alignment (i.e. for a
    sequence named "1WXN" to be used as a template, it is expected that
    there will be a PDB file named "1WXN.pdb" in the specified template
    location (default is current working directory)
**kwargs
    Named parameters passed directly on to the Routine object when the
    modelling is performed. Please check the documentation in order to
    make sure that the parameters passed on are available with the
    Routine object you intend to use

Returns
-------
None
File:      /usr/local/src/homelette-1.4/homelette/organization.py
Type:      function

As we can see, Task.execute_routine expects a number of arguments from the user:

tag: Each executed routine is given a tag which will be used to name the generated models. This is useful for differentiating between different routines executed by the same Task, for example if different templates are used.
routine: Here the user can set which routine will be used for generating the homology model(s), arguably the most important setting.
templates: An iterable containing the identifier(s) of the templates for this modelling routine. homelette expects that templates are uniquely identified by their identifier(s) in the alignment and in the template location.
template_location: The folder where the PDB file(s) used as template(s) are found.
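
Because template files are expected to be named after their identifiers in the alignment, it can be worth verifying this before starting a run. The following sketch checks a folder for files named "<identifier>.pdb" using only the standard library. The helper missing_templates is hypothetical, and the demonstration uses a temporary directory with empty placeholder files rather than real structures.

```python
from pathlib import Path
import tempfile


def missing_templates(templates, template_location='.'):
    """Return the template identifiers without a matching '<id>.pdb' file."""
    location = Path(template_location)
    return [t for t in templates if not (location / f'{t}.pdb').is_file()]


# Demonstration in a temporary directory that mimics data/single
with tempfile.TemporaryDirectory() as tmp:
    for name in ('3NY5.pdb', '4G0N.pdb', 'aln_1.fasta_aln'):
        Path(tmp, name).touch()
    missing_a = missing_templates(['3NY5'], tmp)
    missing_b = missing_templates(['3NY5', '1WXN'], tmp)

print(missing_a)
print(missing_b)
```

An empty list means every requested template has a file in the given location.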

We now generate some models with the pre-defined routines of homelette:

[8]:

# model generation with modeller
t.execute_routine(
    tag = 'example_modeller',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single')

# model generation with altmod
t.execute_routine(
    tag = 'example_altmod',
    routine = hm.routines.Routine_altmod_default,
    templates = ['3NY5'],
    template_location = './data/single')

# model generation with promod3
t.execute_routine(
    tag = 'example_promod3',
    routine = hm.routines.Routine_promod3,
    templates = ['3NY5'],
    template_location = './data/single')

As mentioned before, some modelling routines have optional arguments, such as n_models for Routine_automodel_default. We can pass these optional arguments to Task.execute_routine, which passes them on to the selected routine:

[9]:

# multiple model generation with modeller
t.execute_routine(
    tag = 'example_modeller_more_models',
    routine = hm.routines.Routine_automodel_default,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 10)
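
The forwarding of optional arguments can be pictured with a toy example. The classes below are stand-ins written for this illustration and are not homelette's implementation. They only mirror the signatures shown in the docstrings above: the routine class is handed to the task, which instantiates it and passes any extra keyword arguments through unchanged.

```python
class ToyRoutine:
    """Stand-in for a routine; NOT homelette's implementation."""

    def __init__(self, alignment, target, templates, tag,
                 n_threads=1, n_models=1):
        self.alignment = alignment
        self.target = target
        self.templates = templates
        self.tag = tag
        self.n_threads = n_threads
        self.n_models = n_models

    def generate_models(self):
        # pretend to build models; names follow the '<tag>_<i>.pdb' pattern
        return [f'{self.tag}_{i}.pdb' for i in range(1, self.n_models + 1)]


class ToyTask:
    """Stand-in for a task that forwards **kwargs to the routine."""

    def __init__(self, target, alignment):
        self.target = target
        self.alignment = alignment
        self.models = []

    def execute_routine(self, tag, routine, templates, **kwargs):
        # the routine *class* is passed in and instantiated here; extra
        # keyword arguments reach the routine's constructor unchanged
        r = routine(self.alignment, self.target, templates, tag, **kwargs)
        self.models.extend(r.generate_models())


task = ToyTask(target='ARAF', alignment=None)
task.execute_routine('one', ToyRoutine, ['3NY5'])
task.execute_routine('many', ToyRoutine, ['3NY5'], n_models=3)
print(task.models)

# a misspelled or unsupported keyword is rejected by the routine itself
try:
    task.execute_routine('bad', ToyRoutine, ['3NY5'], n_model=3)
except TypeError as err:
    print('rejected:', err)
```

In this toy setup, an argument the routine does not know leads to a TypeError, which is why the docstring of Task.execute_routine advises checking which parameters a routine supports.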

Models generated using Task objects are stored as Model objects in the Task:

[10]:

t.models

[10]:

[<homelette.organization.Model at 0x7f421f7f9280>,
 <homelette.organization.Model at 0x7f421f7cf7f0>,
 <homelette.organization.Model at 0x7f421f8f4370>,
 <homelette.organization.Model at 0x7f421f8dfca0>,
 <homelette.organization.Model at 0x7f421f8df2e0>,
 <homelette.organization.Model at 0x7f421f8da2b0>,
 <homelette.organization.Model at 0x7f421f8da400>,
 <homelette.organization.Model at 0x7f421f8da370>,
 <homelette.organization.Model at 0x7f421f806220>,
 <homelette.organization.Model at 0x7f421f806cd0>,
 <homelette.organization.Model at 0x7f421f806a00>,
 <homelette.organization.Model at 0x7f421f806f10>,
 <homelette.organization.Model at 0x7f421f806280>]
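
The 13 entries correspond to the four execute_routine calls above: three runs with a single model each and one run with n_models = 10. A small sketch of how such a list could be summarised per tag is given below. It uses stand-in objects so that it runs without homelette, and it assumes that Model objects carry a tag attribute; please check the Model documentation before relying on this.

```python
from collections import Counter, namedtuple

# Stand-in for homelette's Model objects. Only a `tag` attribute is assumed
# here; this is an illustration, not the real class.
FakeModel = namedtuple('FakeModel', ['tag'])

# Mimic the four execute_routine calls above: three runs with the default
# n_models=1 and one run with n_models=10.
runs = {
    'example_modeller': 1,
    'example_altmod': 1,
    'example_promod3': 1,
    'example_modeller_more_models': 10,
}
models = [FakeModel(tag) for tag, n in runs.items() for _ in range(n)]


def models_per_tag(models):
    """Count how many models were generated under each tag."""
    return Counter(m.tag for m in models)


counts = models_per_tag(models)
for tag, n in counts.items():
    print(f'{tag:30s} {n}')
print('total:', sum(counts.values()))
```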

In conclusion, we have learned how to use a single Task object to generate models with different modelling routines. We have also learned how to pass optional arguments on to the executed routines.

In this example, the target, the alignment and the templates were kept identical. Varying the templates would be straightforward, provided that the other templates are included in the alignment. For varying alignments or targets, new Task objects would need to be created. This is a design choice that is meant to encourage users to try out different routines or templates/template combinations within one Task. When using different routines or multiple templates, it is recommended to indicate this using the tag argument of Task.execute_routine (e.g. tag='automodel_3NY5'). Conversely, using a single Task object for multiple targets or alignments is discouraged; we recommend using a separate Task object for each.
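
One possible way to keep tags informative and unique is to derive them from the routine and the templates used. The sketch below is plain Python and independent of homelette. The helper make_tag, the routine labels and the template combinations are placeholders chosen for illustration; any template used in practice must also be present in the alignment.

```python
from itertools import product


def make_tag(routine_label, templates):
    """Build a tag such as 'automodel_3NY5' from a routine label and templates."""
    return '_'.join([routine_label, *templates])


# Placeholder labels and template combinations for illustration only
routine_labels = ['automodel', 'altmod']
template_sets = [['3NY5'], ['4G0N'], ['3NY5', '4G0N']]

tags = [make_tag(r, t) for r, t in product(routine_labels, template_sets)]

# tags have to be unique between all routines executed by the same Task
if len(tags) != len(set(tags)):
    raise ValueError('duplicate tags generated')

for tag in tags:
    print(tag)
```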

References

[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626

[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3

[3] Janson, G., Grottesi, A., Pietrosanto, M., Ausiello, G., Guarguaglini, G., & Paiardini, A. (2019). Revisiting the “satisfaction of spatial restraints” approach of MODELLER for protein homology modeling. PLoS Computational Biology, 15(12), e1007219. https://doi.org/10.1371/journal.pcbi.1007219

[4] Biasini, M., Schmidt, T., Bienert, S., Mariani, V., Studer, G., Haas, J., Johner, N., Schenk, A. D., Philippsen, A., & Schwede, T. (2013). OpenStructure: An integrated software framework for computational structural biology. Acta Crystallographica Section D: Biological Crystallography, 69(5), 701–709. https://doi.org/10.1107/S0907444913007051

[5] Studer, G., Tauriello, G., Bienert, S., Biasini, M., Johner, N., & Schwede, T. (2021). ProMod3—A versatile homology modelling toolbox. PLoS Computational Biology, 17(1), e1008667. https://doi.org/10.1371/journal.pcbi.1008667

Session Info

[11]:

# session info
import session_info
session_info.show(html = False, dependencies = True)

-----
homelette           1.4
session_info        1.0.0
-----
PIL                 7.0.0
altmod              NA
anyio               NA
asttokens           NA
attr                19.3.0
babel               2.12.1
backcall            0.2.0
certifi             2022.12.07
chardet             3.0.4
charset_normalizer  3.1.0
comm                0.1.2
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.6
decorator           4.4.2
executing           1.2.0
fastjsonschema      NA
idna                3.4
importlib_metadata  NA
importlib_resources NA
ipykernel           6.21.3
ipython_genutils    0.2.0
jedi                0.18.2
jinja2              3.1.2
json5               NA
jsonschema          4.17.3
jupyter_events      0.6.3
jupyter_server      2.4.0
jupyterlab_server   2.20.0
kiwisolver          1.0.1
markupsafe          2.1.2
matplotlib          3.1.2
modeller            10.4
more_itertools      NA
mpl_toolkits        NA
nbformat            5.7.3
numexpr             2.8.4
numpy               1.24.2
ost                 2.3.1
packaging           20.3
pandas              1.5.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.1.1
prometheus_client   NA
promod3             3.2.1
prompt_toolkit      3.0.38
psutil              5.5.1
ptyprocess          0.7.0
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.14.0
pyparsing           2.4.6
pyrsistent          NA
pythonjsonlogger    NA
pytz                2022.7.1
qmean               NA
requests            2.28.2
rfc3339_validator   0.1.4
rfc3986_validator   0.1.1
send2trash          NA
sitecustomize       NA
six                 1.12.0
sniffio             1.3.0
stack_data          0.6.2
swig_runtime_data4  NA
tornado             6.2
traitlets           5.9.0
urllib3             1.26.15
wcwidth             NA
websocket           1.5.1
yaml                6.0
zipp                NA
zmq                 25.0.1
-----
IPython             8.11.0
jupyter_client      8.0.3
jupyter_core        5.2.0
jupyterlab          3.6.1
notebook            6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:35