Tutorial 4: Extending homelette
[1]:
import homelette as hm
import contextlib
import glob
import os.path
import sys
from modeller import environ, Selection
from modeller.automodel import LoopModel
Introduction
Welcome to the forth tutorial on homelette
. In this tutorial, we will discuss how to implement custom building blocks, either for generating or for evaluating models. These custom building blocks can be integrated in homology modelling pipelines.
This is probably the most important tutorial in the series. After this tutorial, you will be able to implement your own routines into the homelette
framework, which gives you complete control over the homology modelling pipelines you want to establish!
Please note that we encourage users to share custom routines and evaluation metrics if they think they might be useful for the community. In our online documentation, there is a dedicated section for these contributions. If you are interested, please contact us on GitHub or via email.
Alignment
For this tutorial, we are using the same alignment as in Tutorial 1. Identical to Tutorial 1, the alignment is imported and annotated and a Task
object is created.
[2]:
# read in the alignment
aln = hm.Alignment('data/single/aln_1.fasta_aln')
# annotate the alignment
aln.get_sequence('ARAF').annotate(
seq_type = 'sequence')
aln.get_sequence('3NY5').annotate(
seq_type = 'structure',
pdb_code = '3NY5',
begin_res = '1',
begin_chain = 'A',
end_res = '81',
end_chain = 'A')
# initialize task object
t = hm.Task(
task_name = 'Tutorial4',
target = 'ARAF',
alignment = aln,
overwrite = True)
Defining custom routines
As an example for a custom routine, we will implement a LoopModel
class from modeller
[1,2] loosely following this tutorial on the modeller
web page (in the section Loop Refining).
[3]:
class Routine_loopmodel(hm.routines.Routine): # (1)
'''
Custom routine for modeller loop modelling.
'''
def __init__(self, alignment, target, templates, tag, n_models=1, n_loop_models=1): # (2)
hm.routines.Routine.__init__(self, alignment, target, templates, tag)
self.routine = 'loopmodel' # string identifier of routine
self.n_models = n_models
self.n_loop_models = n_loop_models
def generate_models(self): # (3)
# (4) process alignment
self.alignment.select_sequences([self.target] + self.templates)
self.alignment.remove_redundant_gaps()
# write alignemnt to temporary file
self.alignment.write_pir('.tmp.pir')
# (5) define custom loop model class
class MyLoop(LoopModel):
# set residues that will be refined by loop modelling
def select_loop_atoms(self):
return Selection(self.residue_range('18:A', '22:A'))
with contextlib.redirect_stdout(None): # (6) suppress modeller output to stdout
# (7) set up modeller environment
env = environ()
env.io.hetatm = True
# initialize model
m = MyLoop(env,
alnfile='.tmp.pir',
knowns=self.templates,
sequence=self.target)
# set modelling parameters
m.blank_single_chain = False
m.starting_model = 1
m.ending_model = self.n_models
m.loop.starting_model = 1
m.loop.ending_model = self.n_loop_models
# make models
m.make()
# (8) capture output
for pdb in glob.glob('{}.BL*.pdb'.format(self.target)):
self.models.append(
hm.Model(os.path.realpath(os.path.expanduser(pdb)),
self.tag, self.routine))
# (9) rename files with method from hm.routines.Routine
self._rename_models()
# (10) clean up
self._remove_files(
'{}.B99*.pdb'.format(self.target),
'{}.D00*'.format(self.target),
'{}.DL*'.format(self.target),
'{}.IL*'.format(self.target),
'{}.ini'.format(self.target),
'{}.lrsr'.format(self.target),
'{}.rsr'.format(self.target),
'{}.sch'.format(self.target),
'.tmp*')
The lines of code in the definition of the custom routine above that are marked with numbers get special comments here:
Our custom routine in this example inherits from a parent class
Routine
defined inhomelette
. This is not strictly necessary, however, the parent class has a few useful functions already implemented that we will make use of (see steps 2, 9, 10)Every routine needs to accept these arguments:
alignment
,target
,templates
,tag
. In our case, we just hand them through to the parent methodRoutine.__init__
that saves them as attributes, as well as introduces the attributeself.models
where models will be deposited after generation.Every routine needs a
generate_models
method. Usually, functionality for, you guessed it, model generation is packed in there.modeller
requires the aligment as a file in PIR format. The following few lines of code format the alignment and then produce the required file.The following lines follow closely the
modeller
tutorial for loop modelling. This part implements a customLoopModel
class that defines a specific set of residue to be considered for loop modelling.modeller
writes a lot of output to stdout, and usingcontextlib
is a way to suppress this output. If you want to see all the output frommodeller
, either delete thewith
statement or writewith contextlib.redirect_stdout(sys.stdout):
instead.The following lines follow closely the
modeller
tutorial for loop modelling. This part initializes the model and generates the models requested.The final models generated will be called
ARAF.BL00010001.pdb
and so on. These lines of code find these PDB files and add them to theRoutine_loopmodel.models
list asModel
s. After execution by aTask
objects,Model
objects in this list will be added to theTask.models
list.Models generated will be renamed according to the tag given using the parent class method
Routine._rename_models
.Temporary files from modeller as well as the temporary alignment file are removed from the folder using the parent class method
Routine._remove_files
.
Now, after implementing the routine, let’s try it out in practice. As explained in Tutorial 2, we will be using the Task.execute_routine
interface for that:
[4]:
# perform modelling
t.execute_routine(
tag = 'custom_loop',
routine = Routine_loopmodel,
templates = ['3NY5'],
template_location = './data/single',
n_models = 2,
n_loop_models = 2)
[5]:
# check generated models
t.models
[5]:
[<homelette.organization.Model at 0x7f211ff3fa30>,
<homelette.organization.Model at 0x7f211ff54a30>,
<homelette.organization.Model at 0x7f211ff54d90>,
<homelette.organization.Model at 0x7f211ff54e20>]
In practice, a valid routine only needs to adhere to a small number of formal criteria to fit in the homelette
framework:
It needs to be an object.
It needs to have an
__init__
method that can handle the named argumentsalignment
,target
,templates
andtag
.It needs a
generate_models
method.It needs an attribute
models
in which generated models are stored asModel
objects in list.
Any object that satisfies these criteria can be used in the framework.
Defining custom evaluations
As an example for a custom evaluation, we will implement a sample evaluation that counts the number of residues in the models.
[6]:
class Evaluation_countresidues():
'''
Custom evaluation: counting CA atoms
'''
def __init__(self, model, quiet=True): # (1)
self.model = model
self.output = dict()
# (2) perform evaluation
self.evaluate()
# (3) update model.evaluation
self.model.evaluation.update(self.output)
def evaluate(self): # (4)
# (5) parse model pdb
pdb = self.model.parse_pdb()
# count number of CA atoms in PDB
n_residues = pdb['name'].eq('CA').sum()
# append to output
self.output['n_residues'] = n_residues
The lines of code marked with numbers in the definiton of the custom evaluation get special comments here:
The
__init__
function takes exactly 2 arguments:model
andquiet
.quiet
is a boolean value indicating whether output to stdout should be suppressed (not applicable in this case).All evaluation metrics are executed upon initialization.
The
custom_evaluation.output
dictionary is merged with theModel.evaluation
dictionary to make the output of our evaluation metrics available to the model.Here we define the function where the actual evaluation takes place.
For the actual evaluation, we make use of the
Model.parse_pdb
method, which parses the PDB file associated to a specific model object to apandas
data frame. This can be useful for a number of evaluations (access residues, coordinates, etc.)
Note
If more arguments are required for a custom evaluation, we recomment to store them as attributes in the Model
objects and then access these attributes while running the evaluation.
Now we apply our custom evaluation to our previously generated models using the Task.evaluate_models
interface (for more details, see Tutorial 3):
[7]:
t.evaluate_models(Evaluation_countresidues)
t.get_evaluation()
[7]:
model | tag | routine | n_residues | |
---|---|---|---|---|
0 | custom_loop_1.pdb | custom_loop | loopmodel | 73 |
1 | custom_loop_2.pdb | custom_loop | loopmodel | 73 |
2 | custom_loop_3.pdb | custom_loop | loopmodel | 73 |
3 | custom_loop_4.pdb | custom_loop | loopmodel | 73 |
In practice, the formal requirements for a custom evaluation are the following:
It has to be an object.
__init__
has the two argumentsmodel
andquiet
. More arguments would work in conjunction withTask.evaluate_models
only if defaults are set and used. We recommend storing more arguments as attributes in theModel
object and then accessing them during the evaluation.It executes evaluation on initialization.
On finishing the evaluation, it updates the
Model.evaluation
dictionary with the results of the evaluation.
Further reading
Congratulations on finishing the tutorial on extending homelette
.
Please take again notice that on our online documentation, there is a page collecting user-submitted custom routines and evaluation metrics. User are encouraged to share if they implemented something which they might think could be useful for the community. If you are interested, please contact us on GitHub or via email.
There are more tutorials which might interest you:
Tutorial 1: Learn about the basics of
homelette
.Tutorial 2: Learn more about already implemented routines for homology modelling.
Tutorial 3: Learn about the evaluation metrics available with
homelette
.Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.
Tutorial 6: Learn about modelling protein complexes.
Tutorial 7: Learn about assembling custom pipelines.
Tutorial 8: Learn about automated template identification, alignment generation and template processing.
References
[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626
[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3
Session Info
[8]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette 1.4
modeller 10.4
pandas 1.5.3
session_info 1.0.0
-----
PIL 7.0.0
altmod NA
anyio NA
asttokens NA
attr 19.3.0
babel 2.12.1
backcall 0.2.0
certifi 2022.12.07
chardet 3.0.4
charset_normalizer 3.1.0
comm 0.1.2
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.6
decorator 4.4.2
executing 1.2.0
fastjsonschema NA
idna 3.4
importlib_metadata NA
importlib_resources NA
ipykernel 6.21.3
ipython_genutils 0.2.0
jedi 0.18.2
jinja2 3.1.2
json5 NA
jsonschema 4.17.3
jupyter_events 0.6.3
jupyter_server 2.4.0
jupyterlab_server 2.20.0
kiwisolver 1.0.1
markupsafe 2.1.2
matplotlib 3.1.2
more_itertools NA
mpl_toolkits NA
nbformat 5.7.3
numexpr 2.8.4
numpy 1.24.2
ost 2.3.1
packaging 20.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.1.1
prometheus_client NA
promod3 3.2.1
prompt_toolkit 3.0.38
psutil 5.5.1
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.14.0
pyparsing 2.4.6
pyrsistent NA
pythonjsonlogger NA
pytz 2022.7.1
qmean NA
requests 2.28.2
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
send2trash NA
sitecustomize NA
six 1.12.0
sniffio 1.3.0
stack_data 0.6.2
swig_runtime_data4 NA
tornado 6.2
traitlets 5.9.0
urllib3 1.26.15
wcwidth NA
websocket 1.5.1
yaml 6.0
zipp NA
zmq 25.0.1
-----
IPython 8.11.0
jupyter_client 8.0.3
jupyter_core 5.2.0
jupyterlab 3.6.1
notebook 6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:36