homelette.organization

The homelette.organization submodule contains classes for organizing workflows.

Task is an object orchestrating model generation and evaluation.

Model is an object used for storing information about generated models.

Tutorials

For an introduction to homelette’s workflow, Tutorial 1 is useful. Assembling custom pipelines is discussed in Tutorial 7.

Classes

The following classes are part of this submodule:


class homelette.organization.Task(task_name: str, target: str, alignment: Type[Alignment], task_directory: str = None, overwrite: bool = False)

Class for directing modelling and evaluation.

It is designed for the modelling of one target sequence from one or multiple templates.

If an already existing folder with models is specified, the Task object will load those models in automatically. In this case, it can also be used exclusively for evaluation purposes.

Parameters:
  • task_name (str) – The name of the task

  • target (str) – The identifier of the protein to model

  • alignment (Alignment) – The alignment object that will be used for modelling

  • task_directory (str, optional) – The directory that will be used for this modelling task (default is creating a new one based on the task_name)

  • overwrite (bool, optional) – Determines how an already existing directory for the given task_name or task_directory is handled: if True, the directory and all its contents are overwritten; if False, the models it contains are imported (default is False)

Variables:
  • task_name (str) – The name of the task

  • task_directory (str) – The directory that will be used for this modelling task (default is to use the task_name)

  • target (str) – The identifier of the protein to model

  • alignment (Alignment) – The alignment object that will be used for modelling

  • models (list) – List of models generated or imported by this task

  • routines (list) – List of modelling routines executed by this task

Return type:

None
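The interaction between task_directory and overwrite described above can be illustrated with a small standalone sketch. It does not reproduce homelette's implementation: the helper name prepare_task_directory is hypothetical, and the assumption that importable models are the "*.pdb" files directly inside the directory is made for illustration only.

```python
import shutil
from pathlib import Path
from typing import List


def prepare_task_directory(task_directory: str, overwrite: bool = False) -> List[str]:
    """Hypothetical helper mirroring the documented overwrite behaviour.

    Returns the names of PDB files that remain in the directory, i.e. the
    files that would be candidates for import when overwrite is False.
    """
    path = Path(task_directory)
    if path.exists() and overwrite:
        # overwrite=True: discard the directory and all of its contents
        shutil.rmtree(path)
    # Create the directory if it is missing or was just removed
    path.mkdir(parents=True, exist_ok=True)
    # Assumption: models are stored as "*.pdb" files directly in the directory
    return sorted(p.name for p in path.glob("*.pdb"))
```

With overwrite=False, an existing directory is left untouched and its PDB files are reported; with overwrite=True, the returned list is always empty because the directory has been recreated.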

execute_routine(tag: str, routine: Type[routines.Routine], templates: Iterable, template_location: str = '.', **kwargs) -> None

Generates homology models using a specified modelling routine

Parameters:
  • tag (str) – The identifier associated with this combination of routine and template(s). Must be unique among all routines executed by the same Task object

  • routine (Routine) – The routine object used to generate the models

  • templates (Iterable) – An iterable containing the identifier(s) of the template(s) used for model generation

  • template_location (str, optional) – The location of the template PDB files. They should be named according to their identifiers in the alignment (e.g. for a sequence named “1WXN” to be used as a template, a PDB file named “1WXN.pdb” is expected in the specified template location) (default is current working directory)

  • **kwargs – Named parameters passed directly on to the Routine object when the modelling is performed. Please check the documentation in order to make sure that the parameters passed on are available with the Routine object you intend to use

Return type:

None
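Because template files are looked up by identifier, it can help to verify the naming convention before starting a modelling run. The following is a minimal sketch using only the standard library; the function missing_templates is hypothetical and not part of homelette.

```python
from pathlib import Path
from typing import Iterable, List


def missing_templates(templates: Iterable[str], template_location: str = ".") -> List[str]:
    """Hypothetical pre-flight check for the '<identifier>.pdb' convention.

    Returns the template identifiers for which no matching PDB file
    exists in template_location.
    """
    location = Path(template_location)
    return [t for t in templates if not (location / f"{t}.pdb").is_file()]
```

An empty return value means every identifier in templates has a corresponding PDB file in the given location.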

evaluate_models(*args: Type[evaluation.Evaluation], n_threads: int = 1) -> None

Evaluates models using one or multiple evaluation metrics

Parameters:
  • *args (Evaluation) – Evaluation objects that will be applied to the models

  • n_threads (int, optional) – Number of threads used for model evaluation (default is 1, which deactivates parallelization)

Return type:

None

get_evaluation() -> pandas.DataFrame

Returns the evaluation of all models as a pandas DataFrame.

Returns:

DataFrame containing the evaluations of all models

Return type:

pd.DataFrame

class homelette.organization.Model(model_file: str, tag: str, routine: str)

Interface used to interact with created protein structure models.

Parameters:
  • model_file (str) – The file location of the PDB file for this model

  • tag (str) – The tag that was used when generating this model (see Task.execute_routine for more details)

  • routine (str) – The name of the routine that was used to generate this model

Variables:
  • model_file (str) – The file location of the PDB file for this model

  • tag (str) – The tag that was used when generating this model (see Task.execute_routine for more details)

  • routine (str) – The name of the routine that was used to generate this model

  • info (dict) – Dictionary that can be used to store metadata about the model (e.g. for some evaluation metrics)

Return type:

None

parse_pdb() -> pandas.DataFrame

Parses the ATOM and HETATM records of the PDB file into a pandas DataFrame. This is useful for giving some evaluation methods access to data from the PDB file.

Return type:

pd.DataFrame

Notes

Information is extracted according to the PDB file specification (version 3.30) and columns are named accordingly. See https://www.wwpdb.org/documentation/file-format for more information.
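To make the fixed-width layout concrete, here is a minimal standalone sketch of extracting ATOM and HETATM records by column position. It is not homelette's implementation: it returns a list of dictionaries rather than a DataFrame, the function name parse_atom_records is hypothetical, and the column names are taken from the field names in the PDB format specification, which may differ from the names used by parse_pdb.

```python
from typing import Dict, List

# (field name, start, end) as 0-based, end-exclusive slices, taken from the
# ATOM/HETATM record layout in the PDB format specification (version 3.30)
ATOM_FIELDS = [
    ("record", 0, 6),
    ("serial", 6, 11),
    ("name", 12, 16),
    ("altLoc", 16, 17),
    ("resName", 17, 20),
    ("chainID", 21, 22),
    ("resSeq", 22, 26),
    ("iCode", 26, 27),
    ("x", 30, 38),
    ("y", 38, 46),
    ("z", 46, 54),
    ("occupancy", 54, 60),
    ("tempFactor", 60, 66),
    ("element", 76, 78),
    ("charge", 78, 80),
]
INT_FIELDS = {"serial", "resSeq"}
FLOAT_FIELDS = {"x", "y", "z", "occupancy", "tempFactor"}


def parse_atom_records(pdb_text: str) -> List[Dict[str, object]]:
    """Hypothetical parser: one dictionary per ATOM/HETATM line."""
    rows = []
    for line in pdb_text.splitlines():
        if line[:6] not in ("ATOM  ", "HETATM"):
            continue  # skip HEADER, TER, END and all other record types
        line = line.ljust(80)  # tolerate lines with trailing columns missing
        row: Dict[str, object] = {}
        for field, start, end in ATOM_FIELDS:
            value = line[start:end].strip()
            if field in INT_FIELDS:
                row[field] = int(value) if value else None
            elif field in FLOAT_FIELDS:
                row[field] = float(value) if value else None
            else:
                row[field] = value
        rows.append(row)
    return rows
```

If pandas is available, the resulting list of dictionaries can be passed to pandas.DataFrame to obtain a tabular view with one row per atom.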

get_sequence() -> str

Retrieve the 1-letter amino acid sequence of the PDB file associated with the Model object.

Returns:

Amino acid sequence

Return type:

str
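One way such a sequence could be derived from per-atom residue information is sketched below. This is an illustration only and not homelette's implementation: the function sequence_from_residues is hypothetical, and mapping unrecognised residue names to "X" is an assumption, since the handling of non-standard residues is not described here.

```python
from typing import Iterable, Tuple

# Three-letter to one-letter codes for the 20 standard amino acids
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_residues(residues: Iterable[Tuple[str, int, str]]) -> str:
    """Hypothetical helper: collapse per-atom rows into a 1-letter sequence.

    residues yields (chainID, resSeq, resName) for every atom, in file order.
    """
    sequence = []
    previous = None
    for chain_id, res_seq, res_name in residues:
        key = (chain_id, res_seq)
        if key == previous:
            continue  # another atom of the residue already recorded
        previous = key
        # Assumption: unknown residue names are reported as "X"
        sequence.append(THREE_TO_ONE.get(res_name, "X"))
    return "".join(sequence)
```

The input tuples correspond to the chain identifier, residue number, and residue name columns of the ATOM records in a PDB file.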

rename(new_name: str) -> None

Rename the PDB file associated with the Model object.

Parameters:

new_name (str) – New name of PDB file

Return type:

None