Tutorial 4: Extending homelette

[1]:
import homelette as hm

import contextlib
import glob
import os.path
import sys

from modeller import environ, Selection
from modeller.automodel import LoopModel

Introduction

Welcome to the fourth tutorial on homelette. In this tutorial, we will discuss how to implement custom building blocks, either for generating or for evaluating models. These custom building blocks can be integrated into homology modelling pipelines.

This is probably the most important tutorial in the series. After this tutorial, you will be able to implement your own routines into the homelette framework, which gives you complete control over the homology modelling pipelines you want to establish!

Please note that we encourage users to share custom routines and evaluation metrics if they think they might be useful for the community. In our online documentation, there is a dedicated section for these contributions. If you are interested, please contact us on GitHub or via email.

Alignment

For this tutorial, we use the same alignment as in Tutorial 1. As in Tutorial 1, the alignment is imported and annotated, and a Task object is created.

[2]:
# read in the alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')

# annotate the alignment
aln.get_sequence('ARAF').annotate(
    seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(
    seq_type = 'structure',
    pdb_code = '3NY5',
    begin_res = '1',
    begin_chain = 'A',
    end_res = '81',
    end_chain = 'A')

# initialize task object
t = hm.Task(
    task_name = 'Tutorial4',
    target = 'ARAF',
    alignment = aln,
    overwrite = True)

Defining custom routines

As an example for a custom routine, we will implement a routine based on the LoopModel class from modeller [1,2], loosely following the tutorial on the modeller web page (section Loop Refining).

[3]:
class Routine_loopmodel(hm.routines.Routine): # (1)
    '''
    Custom routine for modeller loop modelling.
    '''
    def __init__(self, alignment, target, templates, tag, n_models=1, n_loop_models=1): # (2)
        hm.routines.Routine.__init__(self, alignment, target, templates, tag)
        self.routine = 'loopmodel' # string identifier of routine

        self.n_models = n_models
        self.n_loop_models = n_loop_models

    def generate_models(self): # (3)
        # (4) process alignment
        self.alignment.select_sequences([self.target] + self.templates)
        self.alignment.remove_redundant_gaps()
        # write alignment to temporary file
        self.alignment.write_pir('.tmp.pir')

        # (5) define custom loop model class
        class MyLoop(LoopModel):
            # set residues that will be refined by loop modelling
            def select_loop_atoms(self):
                return Selection(self.residue_range('18:A', '22:A'))

        with contextlib.redirect_stdout(None): # (6) suppress modeller output to stdout
            # (7) set up modeller environment
            env = environ()
            env.io.hetatm = True

            # initialize model
            m = MyLoop(env,
                       alnfile='.tmp.pir',
                       knowns=self.templates,
                       sequence=self.target)

            # set modelling parameters
            m.blank_single_chain = False
            m.starting_model = 1
            m.ending_model = self.n_models
            m.loop.starting_model = 1
            m.loop.ending_model = self.n_loop_models

            # make models
            m.make()

        # (8) capture output
        for pdb in glob.glob('{}.BL*.pdb'.format(self.target)):
            self.models.append(
                hm.Model(os.path.realpath(os.path.expanduser(pdb)),
                         self.tag, self.routine))

        # (9) rename files with method from hm.routines.Routine
        self._rename_models()

        # (10) clean up
        self._remove_files(
            '{}.B99*.pdb'.format(self.target),
            '{}.D00*'.format(self.target),
            '{}.DL*'.format(self.target),
            '{}.IL*'.format(self.target),
            '{}.ini'.format(self.target),
            '{}.lrsr'.format(self.target),
            '{}.rsr'.format(self.target),
            '{}.sch'.format(self.target),
            '.tmp*')

The numbered comments in the definition of the custom routine above are explained here:

  1. Our custom routine in this example inherits from the parent class Routine defined in homelette. This is not strictly necessary; however, the parent class has a few useful methods already implemented that we will make use of (see steps 2, 9 and 10).

  2. Every routine needs to accept these arguments: alignment, target, templates, tag. In our case, we simply pass them on to the parent method Routine.__init__, which saves them as attributes and introduces the attribute self.models, where models will be deposited after generation.

  3. Every routine needs a generate_models method. Usually, functionality for, you guessed it, model generation is packed in there.

  4. modeller requires the alignment as a file in PIR format. The following few lines of code format the alignment and then produce the required file.

  5. The following lines closely follow the modeller tutorial for loop modelling. This part implements a custom LoopModel class that defines a specific set of residues to be considered for loop modelling.

  6. modeller writes a lot of output to stdout, and using contextlib is a way to suppress this output. If you want to see all the output from modeller, either delete the with statement or write with contextlib.redirect_stdout(sys.stdout): instead.

  7. The following lines follow closely the modeller tutorial for loop modelling. This part initializes the model and generates the models requested.

  8. The final models generated will be called ARAF.BL00010001.pdb and so on. These lines of code find these PDB files and add them to the Routine_loopmodel.models list as Model objects. After execution by a Task object, the Model objects in this list will be added to the Task.models list.

  9. Models generated will be renamed according to the tag given using the parent class method Routine._rename_models.

  10. Temporary files from modeller as well as the temporary alignment file are removed from the folder using the parent class method Routine._remove_files.
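
The output suppression in step 6 relies only on Python's standard library. A minimal sketch of the mechanism, independent of modeller (redirecting to an io.StringIO buffer instead of None, so that the captured text can be inspected):

```python
import contextlib
import io

# anything printed inside the with block is redirected to the buffer
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print('verbose output from a modelling run')

# nothing reached the terminal; the text sits in the buffer instead
captured = buffer.getvalue()
```

Redirecting to None, as in the routine above, discards the output entirely.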


Now, after implementing the routine, let’s try it out in practice. As explained in Tutorial 2, we will be using the Task.execute_routine interface for that:

[4]:
# perform modelling
t.execute_routine(
    tag = 'custom_loop',
    routine = Routine_loopmodel,
    templates = ['3NY5'],
    template_location = './data/single',
    n_models = 2,
    n_loop_models = 2)
[5]:
# check generated models
t.models
[5]:
[<homelette.organization.Model at 0x7f211ff3fa30>,
 <homelette.organization.Model at 0x7f211ff54a30>,
 <homelette.organization.Model at 0x7f211ff54d90>,
 <homelette.organization.Model at 0x7f211ff54e20>]

In practice, a valid routine only needs to adhere to a small number of formal criteria to fit in the homelette framework:

  • It needs to be an object.

  • It needs to have an __init__ method that can handle the named arguments alignment, target, templates and tag.

  • It needs a generate_models method.

  • It needs an attribute models in which generated models are stored as Model objects in a list.

Any object that satisfies these criteria can be used in the framework.
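
To make this contract concrete, here is a minimal, hypothetical skeleton (the class name is a placeholder) that satisfies all four criteria without inheriting from hm.routines.Routine:

```python
class Routine_minimal:
    '''
    Hypothetical skeleton routine satisfying the formal criteria.
    '''
    def __init__(self, alignment, target, templates, tag):
        # required named arguments, stored as attributes
        self.alignment = alignment
        self.target = target
        self.templates = templates
        self.tag = tag
        # required attribute: list that will hold the generated models
        self.models = list()

    def generate_models(self):
        # a real implementation would generate models here and
        # append them to self.models as hm.Model objects
        pass
```

Since the framework only relies on these attributes and methods, such a class could wrap any external modelling tool.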

Defining custom evaluations

As an example for a custom evaluation, we will implement a sample evaluation that counts the number of residues in the models.

[6]:
class Evaluation_countresidues():
    '''
    Custom evaluation: counting CA atoms
    '''
    def __init__(self, model, quiet=True): # (1)
        self.model = model
        self.output = dict()
        # (2) perform evaluation
        self.evaluate()
        # (3) update model.evaluation
        self.model.evaluation.update(self.output)

    def evaluate(self): # (4)
        # (5) parse model pdb
        pdb = self.model.parse_pdb()

        # count number of CA atoms in PDB
        n_residues = pdb['name'].eq('CA').sum()

        # append to output
        self.output['n_residues'] = n_residues

The numbered comments in the definition of the custom evaluation are explained here:

  1. The __init__ function takes exactly 2 arguments: model and quiet. quiet is a boolean value indicating whether output to stdout should be suppressed (not applicable in this case).

  2. All evaluation metrics are executed upon initialization.

  3. The custom_evaluation.output dictionary is merged with the Model.evaluation dictionary to make the output of our evaluation metrics available to the model.

  4. Here we define the function where the actual evaluation takes place.

  5. For the actual evaluation, we make use of the Model.parse_pdb method, which parses the PDB file associated with a specific Model object into a pandas data frame. This can be useful for a number of evaluations (accessing residues, coordinates, etc.).

Note

If more arguments are required for a custom evaluation, we recommend storing them as attributes in the Model objects and then accessing these attributes while running the evaluation.
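
This pattern can be sketched as follows (the evaluation class, attribute name and values are hypothetical, not part of homelette): attach the extra argument to the Model object before evaluating, and read it back with a default inside the evaluation:

```python
class Evaluation_cutoff:
    '''
    Hypothetical evaluation reading an extra parameter from the model.
    '''
    def __init__(self, model, quiet=True):
        self.model = model
        self.output = dict()
        self.evaluate()
        self.model.evaluation.update(self.output)

    def evaluate(self):
        # read the parameter stored on the model beforehand,
        # falling back to a default if it was never set
        cutoff = getattr(self.model, 'cutoff', 10.0)
        self.output['cutoff_used'] = cutoff

# usage sketch: attach the parameter before calling evaluate_models
# for model in t.models:
#     model.cutoff = 5.0
# t.evaluate_models(Evaluation_cutoff)
```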

Now we apply our custom evaluation to our previously generated models using the Task.evaluate_models interface (for more details, see Tutorial 3):

[7]:
t.evaluate_models(Evaluation_countresidues)
t.get_evaluation()
[7]:
model tag routine n_residues
0 custom_loop_1.pdb custom_loop loopmodel 73
1 custom_loop_2.pdb custom_loop loopmodel 73
2 custom_loop_3.pdb custom_loop loopmodel 73
3 custom_loop_4.pdb custom_loop loopmodel 73

In practice, the formal requirements for a custom evaluation are the following:

  • It has to be an object.

  • __init__ has the two arguments model and quiet. More arguments would work in conjunction with Task.evaluate_models only if defaults are set and used. We recommend storing more arguments as attributes in the Model object and then accessing them during the evaluation.

  • It executes evaluation on initialization.

  • On finishing the evaluation, it updates the Model.evaluation dictionary with the results of the evaluation.
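
These requirements can again be condensed into a minimal, hypothetical skeleton (metric name and value are placeholders):

```python
class Evaluation_minimal:
    '''
    Hypothetical skeleton evaluation satisfying the formal requirements.
    '''
    def __init__(self, model, quiet=True):
        self.model = model
        self.output = dict()
        # evaluation is executed on initialization
        self.evaluate()
        # results are merged into the Model.evaluation dictionary
        self.model.evaluation.update(self.output)

    def evaluate(self):
        # a real metric would be computed here
        self.output['my_metric'] = 1.0
```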

Further reading

Congratulations on finishing the tutorial on extending homelette.

Please note again that our online documentation has a page collecting user-submitted custom routines and evaluation metrics. Users are encouraged to share anything they have implemented that could be useful for the community. If you are interested, please contact us on GitHub or via email.

There are more tutorials which might interest you:

  • Tutorial 1: Learn about the basics of homelette.

  • Tutorial 2: Learn more about already implemented routines for homology modelling.

  • Tutorial 3: Learn about the evaluation metrics available with homelette.

  • Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.

  • Tutorial 6: Learn about modelling protein complexes.

  • Tutorial 7: Learn about assembling custom pipelines.

  • Tutorial 8: Learn about automated template identification, alignment generation and template processing.

References

[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626

[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3

Session Info

[8]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette           1.4
modeller            10.4
pandas              1.5.3
session_info        1.0.0
-----
PIL                 7.0.0
altmod              NA
anyio               NA
asttokens           NA
attr                19.3.0
babel               2.12.1
backcall            0.2.0
certifi             2022.12.07
chardet             3.0.4
charset_normalizer  3.1.0
comm                0.1.2
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.6
decorator           4.4.2
executing           1.2.0
fastjsonschema      NA
idna                3.4
importlib_metadata  NA
importlib_resources NA
ipykernel           6.21.3
ipython_genutils    0.2.0
jedi                0.18.2
jinja2              3.1.2
json5               NA
jsonschema          4.17.3
jupyter_events      0.6.3
jupyter_server      2.4.0
jupyterlab_server   2.20.0
kiwisolver          1.0.1
markupsafe          2.1.2
matplotlib          3.1.2
more_itertools      NA
mpl_toolkits        NA
nbformat            5.7.3
numexpr             2.8.4
numpy               1.24.2
ost                 2.3.1
packaging           20.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.1.1
prometheus_client   NA
promod3             3.2.1
prompt_toolkit      3.0.38
psutil              5.5.1
ptyprocess          0.7.0
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.14.0
pyparsing           2.4.6
pyrsistent          NA
pythonjsonlogger    NA
pytz                2022.7.1
qmean               NA
requests            2.28.2
rfc3339_validator   0.1.4
rfc3986_validator   0.1.1
send2trash          NA
sitecustomize       NA
six                 1.12.0
sniffio             1.3.0
stack_data          0.6.2
swig_runtime_data4  NA
tornado             6.2
traitlets           5.9.0
urllib3             1.26.15
wcwidth             NA
websocket           1.5.1
yaml                6.0
zipp                NA
zmq                 25.0.1
-----
IPython             8.11.0
jupyter_client      8.0.3
jupyter_core        5.2.0
jupyterlab          3.6.1
notebook            6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:36