Tutorial 6: Complex Modelling

[1]:
import homelette as hm

Introduction

Welcome to the sixth homelette tutorial, which covers homology modelling of protein complexes.

Several issues make the modelling of protein complexes a separate topic from the homology modelling of single structures:

  • Usually, a complex structure is required as a template.

  • Not all modelling programs can perform complex modelling.

  • Not all evaluation metrics developed for homology modelling are applicable to complex structures.

  • You need multiple alignments (one per chain), which have to be combined into a single complex alignment.

homelette can use modeller-based modelling routines for complex modelling [1,2] and provides dedicated functionality that makes complex modelling easier for the user:

  • A function to assemble appropriate complex alignments

  • Special modelling classes for complex modelling

  • Special evaluation metrics for complex modelling

For this tutorial, we will build models for ARAF in complex with HRAS. As templates, we will use the structures 4G0N (https://www.rcsb.org/structure/4G0N; RAF1 in complex with HRAS) and 3NY5 (BRAF).

Alignment

Since all current modelling routines for protein complexes are modeller-based, the alignment has to follow the modeller specification for complexes. The helper function assemble_complex_aln in the homelette.alignment submodule constructs such an alignment:

[2]:
?hm.alignment.assemble_complex_aln
Signature:
hm.alignment.assemble_complex_aln(
    *args: Type[ForwardRef('Alignment')],
    names: dict,
) -> Type[ForwardRef('Alignment')]
Docstring:
Assemble complex alignments compatible with MODELLER from individual
alignments.

Parameters
----------
*args : Alignment
    The input alignments
names : dict
    Dictionary instructing how sequences in the different alignment objects
    are supposed to be arranged in the complex alignment. The keys are the
    names of the sequences in the output alignments. The values are
    iterables of the sequence names from the input alignments in the order
    they are supposed to appear in the output alignment. Any value that can
    not be found in the alignment signals that this position in the complex
    alignment should be filled with gaps.

Returns
-------
Alignment
    Assembled complex alignment

Examples
--------
>>> aln1 = hm.Alignment(None)
>>> aln1.sequences = {
...     'seq1_1': hm.alignment.Sequence('seq1_1', 'HELLO'),
...     'seq2_1': hm.alignment.Sequence('seq2_1', 'H---I'),
...     'seq3_1': hm.alignment.Sequence('seq3_1', '-HI--')
...     }
>>> aln2 = hm.Alignment(None)
>>> aln2.sequences = {
...     'seq2_2': hm.alignment.Sequence('seq2_2', 'KITTY'),
...     'seq1_2': hm.alignment.Sequence('seq1_2', 'WORLD')
...     }
>>> names = {'seq1': ('seq1_1', 'seq1_2'),
...          'seq2': ('seq2_1', 'seq2_2'),
...          'seq3': ('seq3_1', 'gaps')
...     }
>>> aln_assembled = hm.alignment.assemble_complex_aln(
...     aln1, aln2, names=names)
>>> aln_assembled.print_clustal()
seq1        HELLO/WORLD
seq2        H---I/KITTY
seq3        -HI--/-----
File:      /usr/local/src/homelette-1.4/homelette/alignment.py
Type:      function

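In our case, we assemble the complex alignment from two alignments: aln_1, which contains ARAF, RAF1 (4G0N) and BRAF (3NY5), and aln_2, which contains an HRAS sequence and the HRAS sequence from 4G0N. Because 3NY5 has no HRAS partner, its second entry is a name that does not occur in aln_2 (here an empty string), so that position is filled with gaps.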

[3]:
# import single alignments
aln1_file = 'data/complex/aln_eff.fasta_aln'
aln2_file = 'data/complex/aln_ras.fasta_aln'

aln_1 = hm.Alignment(aln1_file)
aln_2 = hm.Alignment(aln2_file)

# build dictionary that indicates how sequences should be assembled
names = {
    'ARAF': ('ARAF', 'HRAS'),
    '4G0N': ('4G0N', '4G0N'),
    '3NY5': ('3NY5', ''),
}

# assemble alignment
aln = hm.alignment.assemble_complex_aln(aln_1, aln_2, names=names)
aln.remove_redundant_gaps()
aln.print_clustal(line_wrap=70)
ARAF        ---GTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLI---KGRKTVTAWDTAIAPLD
4G0N        -TSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLI
3NY5        HQKPIVRVFLPNKQRTVVPARCGVTVRDSLKKAL--RGLIPECCAVYRIQ------KKPIGWDTDISWLT


ARAF        GEELIVEVL------/MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLD
4G0N        GEELQVDFL------/MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLD
3NY5        GEELHVEVLENVPLT/------------------------------------------------------


ARAF        ILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAART
4G0N        ILDTAGQEE--AMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAART
3NY5        ----------------------------------------------------------------------


ARAF        VESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQ-
4G0N        VESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH
3NY5        ------------------------------------------
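
The call to remove_redundant_gaps in the cell above cleans up the assembled alignment. As a rough illustration of the idea, the following sketch drops every column in which all sequences carry a gap. This is an assumption about what "redundant" means here, and the function drop_all_gap_columns as well as the toy sequences are hypothetical, not part of homelette.

```python
def drop_all_gap_columns(sequences, gap="-"):
    """Remove every alignment column in which all sequences have a gap.

    sequences: dict mapping sequence name -> aligned string; all strings
               are assumed to have the same length.
    """
    names = list(sequences)
    # transpose the alignment: one tuple of characters per column
    columns = list(zip(*(sequences[name] for name in names)))
    kept = [col for col in columns if any(char != gap for char in col)]
    # transpose back, one string per sequence
    return {
        name: "".join(col[i] for col in kept)
        for i, name in enumerate(names)
    }


# toy complex alignment; columns 3 and 4 contain only gaps
toy = {
    "A": "AC--G/T-",
    "B": "A---G/TT",
    "C": "----G/--",
}

cleaned = drop_all_gap_columns(toy)
for name, seq in cleaned.items():
    print(name, seq)
```

Note that the chain break character "/" is never a gap, so the column that separates the two chains is always kept.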


After assembling the complex alignment, we annotate it as usual:

[4]:
# annotate alignment
aln.get_sequence('ARAF').annotate(seq_type='sequence')
aln.get_sequence('4G0N').annotate(seq_type = 'structure',
                              pdb_code = '4G0N',
                              begin_res = '1',
                              begin_chain = 'A')
aln.get_sequence('3NY5').annotate(seq_type = 'structure',
                              pdb_code = '3NY5',
                              begin_res = '1',
                              begin_chain = 'A')

Modelling

Four routines are available specifically for complex modelling, based on modeller [1,2] and altmod [3]. They take the same parameters as their counterparts for single-structure modelling, but handle the naming of new chains and the residue numbering slightly differently.

The following routines are available for complex modelling:

  • Routine_complex_automodel_default

  • Routine_complex_automodel_slow

  • Routine_complex_altmod_default

  • Routine_complex_altmod_slow

[5]:
# initialize task object
t = hm.Task(task_name='Tutorial6',
            alignment=aln,
            target='ARAF',
            overwrite=True)

Modelling can be performed with Task.execute_routine as usual.

[6]:
# generate models based on a complex template
t.execute_routine(tag='automodel_' + '4G0N',
                  routine=hm.routines.Routine_complex_automodel_default,
                  templates = ['4G0N'],
                  template_location='data/complex/',
                  n_models=20,
                  n_threads=5)

Not all templates have to be complex templates; complex templates and single templates can be mixed freely. However, at least one complex template should be used in order to convey information about the orientation of the proteins relative to each other.

[7]:
# generate models based on a complex and a single template
t.execute_routine(tag='automodel_' + '_'.join(['4G0N', '3NY5']),
                  routine=hm.routines.Routine_complex_automodel_default,
                  templates = ['4G0N', '3NY5'],
                  template_location='data/complex',
                  n_models=20,
                  n_threads=5)
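
When several routines and template sets are combined, it can help to plan the runs before starting them. The sketch below is only a planning helper under stated assumptions: the names COMPLEX_TEMPLATES, make_tag and plan are hypothetical, and the tag scheme simply mirrors the 'automodel_' + '_'.join(templates) pattern used in the cells above. It also checks that every template set contains at least one complex template.

```python
# templates that contain both chains of the complex (assumption for this tutorial)
COMPLEX_TEMPLATES = {"4G0N"}

template_sets = [["4G0N"], ["4G0N", "3NY5"]]
routine_names = [
    "Routine_complex_automodel_default",
    "Routine_complex_altmod_default",
]


def make_tag(routine_name, templates):
    """Build a tag such as 'automodel_4G0N_3NY5' from routine and templates."""
    short = routine_name[len("Routine_complex_"):]
    if short.endswith("_default"):
        short = short[: -len("_default")]
    return short + "_" + "_".join(templates)


plan = []
for routine_name in routine_names:
    for templates in template_sets:
        # at least one complex template is needed to orient the chains
        if not COMPLEX_TEMPLATES.intersection(templates):
            raise ValueError(f"no complex template in {templates}")
        plan.append((make_tag(routine_name, templates), routine_name, templates))

for tag, routine_name, templates in plan:
    print(tag, routine_name, templates)
    # With homelette installed and the Task object t from above, each entry
    # could be passed on like this:
    # t.execute_routine(tag=tag,
    #                   routine=getattr(hm.routines, routine_name),
    #                   templates=templates,
    #                   template_location='data/complex/',
    #                   n_models=20,
    #                   n_threads=5)
```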

Evaluation

Not all evaluation metrics are designed to evaluate complex structures. For example, the SOAP score has different statistical potentials for single proteins (Evaluation_soap_protein) and for protein complexes (Evaluation_soap_pp) [4].

[8]:
# perform evaluation
t.evaluate_models(hm.evaluation.Evaluation_mol_probity,
                  hm.evaluation.Evaluation_soap_pp,
                  n_threads=5)
[9]:
# show a bit of the evaluation
t.get_evaluation().sort_values(by='soap_pp_all').head()
[9]:
    model                       tag                  routine                    mp_score  soap_pp_all   soap_pp_atom  soap_pp_pair
32  automodel_4G0N_3NY5_13.pdb  automodel_4G0N_3NY5  complex_automodel_default  2.25      -9502.636719  -7770.577637  -1732.059326
39  automodel_4G0N_3NY5_20.pdb  automodel_4G0N_3NY5  complex_automodel_default  2.15      -9486.243164  -7656.946777  -1829.296143
28  automodel_4G0N_3NY5_9.pdb   automodel_4G0N_3NY5  complex_automodel_default  2.46      -9475.368164  -7769.337891  -1706.030396
29  automodel_4G0N_3NY5_10.pdb  automodel_4G0N_3NY5  complex_automodel_default  2.72      -9458.609375  -7647.797852  -1810.811646
9   automodel_4G0N_10.pdb       automodel_4G0N       complex_automodel_default  2.39      -9405.662109  -7718.845215  -1686.817139
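
Sorting by a single score ignores the other metrics. One simple way to combine them is a rank sum, sketched below in plain Python on the five rows shown above. This combination scheme is an assumption chosen for illustration, not something homelette prescribes; the helper rank_sum is hypothetical. For both the MolProbity score and SOAP-PP, lower values are better.

```python
# values copied from the evaluation output above
evaluation = [
    {"model": "automodel_4G0N_3NY5_13.pdb", "mp_score": 2.25, "soap_pp_all": -9502.636719},
    {"model": "automodel_4G0N_3NY5_20.pdb", "mp_score": 2.15, "soap_pp_all": -9486.243164},
    {"model": "automodel_4G0N_3NY5_9.pdb", "mp_score": 2.46, "soap_pp_all": -9475.368164},
    {"model": "automodel_4G0N_3NY5_10.pdb", "mp_score": 2.72, "soap_pp_all": -9458.609375},
    {"model": "automodel_4G0N_10.pdb", "mp_score": 2.39, "soap_pp_all": -9405.662109},
]


def rank_sum(rows, metrics):
    """Sum the per-metric ranks of each model (rank 0 = best = lowest value)."""
    totals = {row["model"]: 0 for row in rows}
    for metric in metrics:
        ordered = sorted(rows, key=lambda row: row[metric])
        for rank, row in enumerate(ordered):
            totals[row["model"]] += rank
    return totals


totals = rank_sum(evaluation, ["mp_score", "soap_pp_all"])

# order by combined rank; ties are broken by the SOAP-PP score
ranking = sorted(
    evaluation,
    key=lambda row: (totals[row["model"]], row["soap_pp_all"]),
)

for row in ranking:
    print(f'{totals[row["model"]]:>2}  {row["model"]}')
```

On these five rows, the two models with the best SOAP-PP scores also come out on top of the combined ranking, since they have the two lowest MolProbity scores as well.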

Further reading

Congratulations on finishing the tutorial about complex modelling in homelette. The following tutorials might also be of interest to you:

  • Tutorial 1: Learn about the basics of homelette.

  • Tutorial 2: Learn more about already implemented routines for homology modelling.

  • Tutorial 3: Learn about the evaluation metrics available with homelette.

  • Tutorial 4: Learn about extending homelette’s functionality by defining your own modelling routines and evaluation metrics.

  • Tutorial 5: Learn about how to use parallelization in order to generate and evaluate models more efficiently.

  • Tutorial 7: Learn about assembling custom pipelines.

  • Tutorial 8: Learn about automated template identification, alignment generation and template processing.

References

[1] Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234(3), 779–815. https://doi.org/10.1006/jmbi.1993.1626

[2] Webb, B., & Sali, A. (2016). Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics, 54(1), 5.6.1-5.6.37. https://doi.org/10.1002/cpbi.3

[3] Janson, G., Grottesi, A., Pietrosanto, M., Ausiello, G., Guarguaglini, G., & Paiardini, A. (2019). Revisiting the “satisfaction of spatial restraints” approach of MODELLER for protein homology modeling. PLoS Computational Biology, 15(12), e1007219. https://doi.org/10.1371/journal.pcbi.1007219

[4] Dong, G. Q., Fan, H., Schneidman-Duhovny, D., Webb, B., Sali, A., & Tramontano, A. (2013). Optimized atomic statistical potentials: Assessment of protein interfaces and loops. Bioinformatics, 29(24), 3158–3166. https://doi.org/10.1093/bioinformatics/btt560

Session Info

[10]:
# session info
import session_info
session_info.show(html = False, dependencies = True)
-----
homelette           1.4
pandas              1.5.3
session_info        1.0.0
-----
PIL                 7.0.0
altmod              NA
anyio               NA
asttokens           NA
attr                19.3.0
babel               2.12.1
backcall            0.2.0
certifi             2022.12.07
chardet             3.0.4
charset_normalizer  3.1.0
comm                0.1.2
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
debugpy             1.6.6
decorator           4.4.2
executing           1.2.0
fastjsonschema      NA
idna                3.4
importlib_metadata  NA
importlib_resources NA
ipykernel           6.21.3
ipython_genutils    0.2.0
jedi                0.18.2
jinja2              3.1.2
json5               NA
jsonschema          4.17.3
jupyter_events      0.6.3
jupyter_server      2.4.0
jupyterlab_server   2.20.0
kiwisolver          1.0.1
markupsafe          2.1.2
matplotlib          3.1.2
modeller            10.4
more_itertools      NA
mpl_toolkits        NA
nbformat            5.7.3
numexpr             2.8.4
numpy               1.24.2
ost                 2.3.1
packaging           20.3
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
platformdirs        3.1.1
prometheus_client   NA
promod3             3.2.1
prompt_toolkit      3.0.38
psutil              5.5.1
ptyprocess          0.7.0
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.14.0
pyparsing           2.4.6
pyrsistent          NA
pythonjsonlogger    NA
pytz                2022.7.1
qmean               NA
requests            2.28.2
rfc3339_validator   0.1.4
rfc3986_validator   0.1.1
send2trash          NA
sitecustomize       NA
six                 1.12.0
sniffio             1.3.0
stack_data          0.6.2
swig_runtime_data4  NA
tornado             6.2
traitlets           5.9.0
urllib3             1.26.15
wcwidth             NA
websocket           1.5.1
yaml                6.0
zipp                NA
zmq                 25.0.1
-----
IPython             8.11.0
jupyter_client      8.0.3
jupyter_core        5.2.0
jupyterlab          3.6.1
notebook            6.5.3
-----
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-4.15.0-206-generic-x86_64-with-glibc2.29
-----
Session information updated at 2023-03-15 23:40