Tutorial 6: Complex Modelling
[1]:
import homelette as hm
Introduction
Welcome to the 6th tutorial on homelette about homology modelling of complex structures.
Several issues make modelling protein complexes a separate topic from the homology modelling of single structures:
- Usually, a complex structure is required as a template.
- Not all modelling programs can perform complex modelling.
- Not all evaluation metrics developed for homology modelling are applicable to complex structures.
- You need multiple alignments.
homelette is able to use modeller-based modelling routines for complex modelling [1,2], and has some specific classes in place that make complex modelling easier for the user:
- A function to assemble appropriate complex alignments
- Special modelling classes for complex modelling
- Special evaluation metrics for complex modelling
For this tutorial, we will build models for ARAF in complex with HRAS. As templates, we will use the structures [4G0N](https://www.rcsb.org/structure/4G0N) (RAF1 in complex with HRAS) and 3NY5 (BRAF).
Alignment
Since all current modelling routines for protein complexes are modeller-based, an alignment according to the modeller specification has to be constructed. homelette has the helper function assemble_complex_aln in the homelette.alignment submodule that is able to do that:
[2]:
?hm.alignment.assemble_complex_aln
Signature:
hm.alignment.assemble_complex_aln(
*args: Type[ForwardRef('Alignment')],
names: dict,
) -> Type[ForwardRef('Alignment')]
Docstring:
Assemble complex alignments compatible with MODELLER from individual
alignments.
Parameters
----------
*args : Alignment
The input alignments
names : dict
Dictionary instructing how sequences in the different alignment objects
are supposed to be arranged in the complex alignment. The keys are the
names of the sequences in the output alignments. The values are
iterables of the sequence names from the input alignments in the order
they are supposed to appear in the output alignment. Any value that can
not be found in the alignment signals that this position in the complex
alignment should be filled with gaps.
Returns
-------
Alignment
Assembled complex alignment
Examples
--------
>>> aln1 = hm.Alignment(None)
>>> aln1.sequences = {
... 'seq1_1': hm.alignment.Sequence('seq1_1', 'HELLO'),
... 'seq2_1': hm.alignment.Sequence('seq2_1', 'H---I'),
... 'seq3_1': hm.alignment.Sequence('seq3_1', '-HI--')
... }
>>> aln2 = hm.Alignment(None)
>>> aln2.sequences = {
... 'seq2_2': hm.alignment.Sequence('seq2_2', 'KITTY'),
... 'seq1_2': hm.alignment.Sequence('seq1_2', 'WORLD')
... }
>>> names = {'seq1': ('seq1_1', 'seq1_2'),
... 'seq2': ('seq2_1', 'seq2_2'),
... 'seq3': ('seq3_1', 'gaps')
... }
>>> aln_assembled = hm.alignment.assemble_complex_aln(
... aln1, aln2, names=names)
>>> aln_assembled.print_clustal()
seq1 HELLO/WORLD
seq2 H---I/KITTY
seq3 -HI--/-----
File: /usr/local/src/homelette-1.4/homelette/alignment.py
Type: function
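The assembly logic described in the docstring can be illustrated without homelette. The following is a minimal pure-Python sketch, not the library's implementation: the function name assemble_sketch and the use of plain dictionaries in place of Alignment objects are assumptions made for illustration. It joins the selected sequences with the '/' chain-break character and fills any name that cannot be found with gaps of the corresponding alignment length.

```python
def assemble_sketch(alignments, names):
    """Join aligned sequences from several alignments into complex sequences.

    alignments: list of dicts mapping sequence name -> aligned sequence
    names: dict mapping output name -> tuple of input names, one per alignment
    """
    assembled = {}
    for new_name, old_names in names.items():
        parts = []
        for aln, old_name in zip(alignments, old_names):
            if old_name in aln:
                parts.append(aln[old_name])
            else:
                # name not found: fill this chain with gaps
                aln_length = len(next(iter(aln.values())))
                parts.append('-' * aln_length)
        # '/' marks the chain break between the individual alignments
        assembled[new_name] = '/'.join(parts)
    return assembled


# toy data taken from the docstring example (plain dicts, hypothetical)
aln1 = {'seq1_1': 'HELLO', 'seq2_1': 'H---I', 'seq3_1': '-HI--'}
aln2 = {'seq2_2': 'KITTY', 'seq1_2': 'WORLD'}
names = {
    'seq1': ('seq1_1', 'seq1_2'),
    'seq2': ('seq2_1', 'seq2_2'),
    'seq3': ('seq3_1', 'gaps'),
}
assembled = assemble_sketch([aln1, aln2], names)
for name, seq in assembled.items():
    print(name, seq)
```

The sketch reproduces the three assembled sequences shown in the docstring example; the real function additionally works on Alignment objects and returns one.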
In our case, we assemble an alignment from two different alignments: aln_1, which contains ARAF, RAF1 (4G0N) and BRAF (3NY5), and aln_2, which contains an HRAS sequence and the HRAS sequence from 4G0N.
[3]:
# import single alignments
aln1_file = 'data/complex/aln_eff.fasta_aln'
aln2_file = 'data/complex/aln_ras.fasta_aln'
aln_1 = hm.Alignment(aln1_file)
aln_2 = hm.Alignment(aln2_file)
# build dictionary that indicates how sequences should be assembled
names = {
    'ARAF': ('ARAF', 'HRAS'),
    '4G0N': ('4G0N', '4G0N'),
    '3NY5': ('3NY5', ''),
}
# assemble alignment
aln = hm.alignment.assemble_complex_aln(aln_1, aln_2, names=names)
aln.remove_redundant_gaps()
aln.print_clustal(line_wrap=70)
ARAF ---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLI---KGRKTVTAWDTAIAPLD
4G0N -TSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLI
3NY5 HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ------KKPIGWDTDISWLT
ARAF GEELIVEVL------/MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLD
4G0N GEELQVDFL------/MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLD
3NY5 GEELHVEVLENVPLT/------------------------------------------------------
ARAF ILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAART
4G0N ILDTAGQEE--AMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAART
3NY5 ----------------------------------------------------------------------
ARAF VESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQ-
4G0N VESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH
3NY5 ------------------------------------------
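In the printed alignment, '/' separates the two chains, and 3NY5 contains only gaps after the chain break because it is a single-chain template. A quick sanity check of an assembled alignment can be sketched in plain Python. The helper chain_coverage is hypothetical and operates on plain strings (the toy sequences from the docstring example), not on homelette objects.

```python
def chain_coverage(aligned_seq, chain_break='/'):
    """Return, for each chain, whether the aligned sequence has any residues."""
    return [any(char != '-' for char in chain)
            for chain in aligned_seq.split(chain_break)]


# toy complex alignment (hypothetical stand-in for an assembled alignment)
toy_alignment = {
    'seq1': 'HELLO/WORLD',
    'seq2': 'H---I/KITTY',
    'seq3': '-HI--/-----',
}

# all sequences should have the same length and the same number of chains
lengths = {len(seq) for seq in toy_alignment.values()}
n_chains = {seq.count('/') + 1 for seq in toy_alignment.values()}
assert len(lengths) == 1 and len(n_chains) == 1

# a sequence covering every chain could serve as a complex template
coverage = {name: chain_coverage(seq) for name, seq in toy_alignment.items()}
for name, covered in coverage.items():
    print(name, covered)
```

A sequence whose coverage is True for every chain plays the role of 4G0N here, while one with a False entry plays the role of 3NY5.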
After assembling the complex alignment, we annotate it as usual:
[4]:
# annotate alignment
aln.get_sequence('ARAF').annotate(seq_type='sequence')
aln.get_sequence('4G0N').annotate(seq_type='structure',
                                  pdb_code='4G0N',
                                  begin_res='1',
                                  begin_chain='A')
aln.get_sequence('3NY5').annotate(seq_type='structure',
                                  pdb_code='3NY5',
                                  begin_res='1',
                                  begin_chain='A')
Modelling
There are 4 routines available specifically for complex modelling based on modeller [1,2] and altmod [3]. They run with the same parameters as their counterparts for single structure modelling, except that they handle naming of new chains and residue numbers a bit differently.
The following routines are available for complex modelling:
- Routine_complex_automodel_default
- Routine_complex_automodel_slow
- Routine_complex_altmod_default
- Routine_complex_altmod_slow
[5]:
# initialize task object
t = hm.Task(task_name='Tutorial6',
            alignment=aln,
            target='ARAF',
            overwrite=True)
Modelling can be performed with Task.execute_routine as usual.
[6]:
# generate models based on a complex template
t.execute_routine(tag='automodel_' + '4G0N',
                  routine=hm.routines.Routine_complex_automodel_default,
                  templates=['4G0N'],
                  template_location='data/complex/',
                  n_models=20,
                  n_threads=5)
Not all templates have to be complex templates; it is perfectly acceptable to mix complex templates and single templates. However, at least one complex template should be used in order to convey information about the orientation of the proteins relative to each other.
[7]:
# generate models based on a complex and a single template
t.execute_routine(tag='automodel_' + '_'.join(['4G0N', '3NY5']),
                  routine=hm.routines.Routine_complex_automodel_default,
                  templates=['4G0N', '3NY5'],
                  template_location='data/complex',
                  n_models=20,
                  n_threads=5)
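The tags used in the two cells above follow a simple pattern: a prefix followed by the template names joined with underscores. A small helper can build such tags and check that each template set contains at least one complex template. This is an illustrative sketch only; make_tag and the complex_templates set are hypothetical and not part of homelette.

```python
def make_tag(prefix, templates):
    """Build a tag such as 'automodel_4G0N_3NY5' from a list of templates."""
    return '_'.join([prefix] + list(templates))


# assumption: templates known to contain both chains of the complex
complex_templates = {'4G0N'}
template_sets = [['4G0N'], ['4G0N', '3NY5']]

tags = []
for templates in template_sets:
    # at least one complex template is needed to define the orientation
    if not complex_templates.intersection(templates):
        raise ValueError(f'no complex template in {templates}')
    tags.append(make_tag('automodel', templates))

print(tags)
```

Each generated tag could then be passed as the tag argument of Task.execute_routine together with the corresponding templates list.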
Evaluation
Not all evaluation metrics are designed to evaluate complex structures. For example, the SOAP score has different statistical potentials for single proteins (Evaluation_soap_protein) and for protein complexes (Evaluation_soap_pp) [4].
[8]:
# perform evaluation
t.evaluate_models(hm.evaluation.Evaluation_mol_probity,
                  hm.evaluation.Evaluation_soap_pp,
                  n_threads=5)
[9]:
# show a bit of the evaluation
t.get_evaluation().sort_values(by='soap_pp_all').head()
[9]:
| | model | tag | routine | mp_score | soap_pp_all | soap_pp_atom | soap_pp_pair |
---|---|---|---|---|---|---|---|
32 | automodel_4G0N_3NY5_13.pdb | automodel_4G0N_3NY5 | complex_automodel_default | 2.25 | -9502.636719 | -7770.577637 | -1732.059326 |
39 | automodel_4G0N_3NY5_20.pdb | automodel_4G0N_3NY5 | complex_automodel_default | 2.15 | -9486.243164 | -7656.946777 | -1829.296143 |
28 | automodel_4G0N_3NY5_9.pdb | automodel_4G0N_3NY5 | complex_automodel_default | 2.46 | -9475.368164 | -7769.337891 | -1706.030396 |
29 | automodel_4G0N_3NY5_10.pdb | automodel_4G0N_3NY5 | complex_automodel_default | 2.72 | -9458.609375 | -7647.797852 | -1810.811646 |
9 | automodel_4G0N_10.pdb | automodel_4G0N | complex_automodel_default | 2.39 | -9405.662109 | -7718.845215 | -1686.817139 |
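Because several metrics are computed, it can be useful to combine them when selecting models. The following is a minimal sketch of one possible approach, a rank sum over the MolProbity score and the combined SOAP-PP score, using the five rows shown above. It assumes that lower values are better for both metrics; the helper rank_sum is hypothetical and not part of homelette.

```python
# the five rows from the evaluation output above, reduced to two metrics
rows = [
    {'model': 'automodel_4G0N_3NY5_13.pdb', 'mp_score': 2.25, 'soap_pp_all': -9502.636719},
    {'model': 'automodel_4G0N_3NY5_20.pdb', 'mp_score': 2.15, 'soap_pp_all': -9486.243164},
    {'model': 'automodel_4G0N_3NY5_9.pdb', 'mp_score': 2.46, 'soap_pp_all': -9475.368164},
    {'model': 'automodel_4G0N_3NY5_10.pdb', 'mp_score': 2.72, 'soap_pp_all': -9458.609375},
    {'model': 'automodel_4G0N_10.pdb', 'mp_score': 2.39, 'soap_pp_all': -9405.662109},
]


def rank_sum(rows, metrics):
    """Sum the per-metric ranks (1 = lowest value) for each model."""
    totals = {row['model']: 0 for row in rows}
    for metric in metrics:
        ordered = sorted(rows, key=lambda row: row[metric])
        for rank, row in enumerate(ordered, start=1):
            totals[row['model']] += rank
    return totals


totals = rank_sum(rows, ['mp_score', 'soap_pp_all'])
# print models from best (lowest rank sum) to worst
for model, total in sorted(totals.items(), key=lambda item: item[1]):
    print(model, total)
```

When working with the DataFrame returned by t.get_evaluation(), the list of row dictionaries could be obtained with the pandas method to_dict('records').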
Further reading
Congratulations on finishing the tutorial about complex modelling in homelette. The following tutorials might also be of interest to you:
- Tutorial 1: Learn about the basics of homelette.
- Tutorial 2: Learn more about already implemented routines for homology modelling.
- Tutorial 3: Learn about the evaluation metrics available with homelette.
- Tutorial 4: Learn about extending homelette's functionality by defining your own modelling routines and evaluation metrics.
- Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.
- Tutorial 7: Learn about assembling custom pipelines.
- Tutorial 8: Learn about automated template identification, alignment generation and template processing.
References
[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626
[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3
[3] Janson, G., Grottesi, A., Pietrosanto, M., Ausiello, G., Guarguaglini, G., & Paiardini, A. (2019). Revisiting the “satisfaction of spatial restraints” approach of MODELLER for protein homology modeling. PLoS Computational Biology, 15(12), e1007219. https://doi.org/10.1371/journal.pcbi.1007219
[4] Dong, G. Q., Fan, H., Schneidman-Duhovny, D., Webb, B., Sali, A., & Tramontano, A. (2013). Optimized atomic statistical potentials: Assessment of protein interfaces and loops. Bioinformatics, 29(24), 3158–3166. https://doi.org/10.1093/bioinformatics/btt560
Session Info
[10]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette 1.4
pandas 1.5.3
session_info 1.0.0
-----
PIL 7.0.0
altmod NA
anyio NA
asttokens NA
attr 19.3.0
babel 2.12.1
backcall 0.2.0
certifi 2022.12.07
chardet 3.0.4
charset_normalizer 3.1.0
comm 0.1.2
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.6
decorator 4.4.2
executing 1.2.0
fastjsonschema NA
idna 3.4
importlib_metadata NA
importlib_resources NA
ipykernel 6.21.3
ipython_genutils 0.2.0
jedi 0.18.2
jinja2 3.1.2
json5 NA
jsonschema 4.17.3
jupyter_events 0.6.3
jupyter_server 2.4.0
jupyterlab_server 2.20.0
kiwisolver 1.0.1
markupsafe 2.1.2
matplotlib 3.1.2
modeller 10.4
more_itertools NA
mpl_toolkits NA
nbformat 5.7.3
numexpr 2.8.4
numpy 1.24.2
ost 2.3.1
packaging 20.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.1.1
prometheus_client NA
promod3 3.2.1
prompt_toolkit 3.0.38
psutil 5.5.1
ptyprocess 0.7.0
pure_eval 0.2.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.14.0
pyparsing 2.4.6
pyrsistent NA
pythonjsonlogger NA
pytz 2022.7.1
qmean NA
requests 2.28.2
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
send2trash NA
sitecustomize NA
six 1.12.0
sniffio 1.3.0
stack_data 0.6.2
swig_runtime_data4 NA
tornado 6.2
traitlets 5.9.0
urllib3 1.26.15
wcwidth NA
websocket 1.5.1
yaml 6.0
zipp NA
zmq 25.0.1
-----
IPython 8.11.0
jupyter_client 8.0.3
jupyter_core 5.2.0
jupyterlab 3.6.1
notebook 6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:40