homelette.organization
The homelette.organization submodule contains classes for organizing workflows.
Task is an object orchestrating model generation and evaluation.
Model is an object used for storing information about generated models.
Tutorials
For an introduction to homelette’s workflow, see Tutorial 1. Assembling custom pipelines is discussed in Tutorial 7.
Classes
The following classes are part of this submodule:
- class homelette.organization.Task(task_name: str, target: str, alignment: Type[Alignment], task_directory: str = None, overwrite: bool = False)
Class for directing modelling and evaluation.
It is designed for the modelling of one target sequence from one or multiple templates.
If an already existing folder with models is specified, the Task object will load those models in automatically. In this case, it can also be used exclusively for evaluation purposes.
- Parameters:
task_name (str) – The name of the task
target (str) – The identifier of the protein to model
alignment (Alignment) – The alignment object that will be used for modelling
task_directory (str, optional) – The directory that will be used for this modelling task (default is creating a new one based on the task_name)
overwrite (bool, optional) – Determines what happens if a directory already exists for the given task_name or task_directory: if True, the directory and all its contents are overwritten; if False, the models it contains are imported (default is False)
- Variables:
task_name (str) – The name of the task
task_directory (str) – The directory that will be used for this modelling task (default is to use the task_name)
target (str) – The identifier of the protein to model
alignment (Alignment) – The alignment object that will be used for modelling
models (list) – List of models generated or imported by this task
routines (list) – List of modelling routines executed by this task
- Return type:
None
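The behaviour described for overwrite can be pictured with a small standard-library sketch. This is not homelette's implementation: the function name prepare_task_directory is hypothetical, and treating every *.pdb file in the directory as a model to import is an assumption made only for illustration.

```python
import glob
import os
import shutil


def prepare_task_directory(task_directory, overwrite=False):
    """Hypothetical helper mirroring the documented overwrite semantics.

    Returns the paths of pre-existing model files that would be imported
    (an empty list for a fresh or overwritten directory).
    """
    if os.path.isdir(task_directory):
        if overwrite:
            # overwrite=True: discard the directory and all its contents
            shutil.rmtree(task_directory)
            os.makedirs(task_directory)
            return []
        # overwrite=False: keep the directory and pick up existing models
        # (assumption: models are recognised by their .pdb extension)
        return sorted(glob.glob(os.path.join(task_directory, "*.pdb")))
    # No directory yet: create a new, empty one
    os.makedirs(task_directory)
    return []
```

With overwrite left at its default, calling such a helper twice on the same directory is non-destructive, which is what allows an existing folder to be reused purely for evaluation.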
- execute_routine(tag: str, routine: Type[routines.Routine], templates: Iterable, template_location: str = '.', **kwargs) None
Generates homology models using a specified modelling routine
- Parameters:
tag (str) – The identifier associated with this combination of routine and template(s). Must be unique among all routines executed by the same Task object
routine (Routine) – The routine object used to generate the models
templates (list) – The iterable containing the identifier(s) of the template(s) used for model generation
template_location (str, optional) – The location of the template PDB files. They should be named according to their identifiers in the alignment, e.g. for a sequence named “1WXN” to be used as a template, a PDB file named “1WXN.pdb” is expected in the specified template location (default is the current working directory)
**kwargs – Named parameters passed directly on to the Routine object when the modelling is performed. Please check the documentation in order to make sure that the parameters passed on are available with the Routine object you intend to use
- Return type:
None
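The file naming convention expected for template_location can be checked before a routine is run. The following is a minimal sketch using only the standard library; missing_template_files is a hypothetical helper and not part of homelette.

```python
import os


def missing_template_files(templates, template_location="."):
    """Return the expected template PDB paths that do not exist.

    Each template identifier is expected to have a file named
    "<identifier>.pdb" in template_location, as described above.
    """
    expected = [
        os.path.join(template_location, f"{identifier}.pdb")
        for identifier in templates
    ]
    return [path for path in expected if not os.path.isfile(path)]
```

An empty return value means every identifier passed as templates has a matching PDB file in the given location.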
- evaluate_models(*args: Type[evaluation.Evaluation], n_threads: int = 1) None
Evaluates models using one or multiple evaluation metrics
- Parameters:
*args (Evaluation) – Evaluation objects that will be applied to the models
n_threads (int, optional) – Number of threads used for model evaluation (default is 1, which deactivates parallelization)
- Return type:
None
- get_evaluation() pandas.DataFrame
Return the evaluation of all models as a pandas dataframe.
- Returns:
Dataframe containing the evaluations of all models
- Return type:
pd.DataFrame
- class homelette.organization.Model(model_file: str, tag: str, routine: str)
Interface used to interact with created protein structure models.
- Parameters:
model_file (str) – The file location of the PDB file for this model
tag (str) – The tag that was used when generating this model (see Task.execute_routine for more details)
routine (str) – The name of the routine that was used to generate this model
- Variables:
model_file (str) – The file location of the PDB file for this model
tag (str) – The tag that was used when generating this model (see Task.execute_routine for more details)
routine (str) – The name of the routine that was used to generate this model
info (dict) – Dictionary that can be used to store metadata about the model (e.g. for some evaluation metrics)
- Return type:
None
- parse_pdb() pandas.DataFrame
Parses ATOM and HETATM records in the PDB file into a pandas dataframe. Useful for giving some evaluation methods access to data from the PDB file.
- Return type:
pd.DataFrame
Notes
Information is extracted according to the PDB file specification (version 3.30) and columns are named accordingly. See https://www.wwpdb.org/documentation/file-format for more information.
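To illustrate what extraction according to the fixed-column layout of the PDB specification looks like, here is a minimal standard-library sketch. It is not homelette's parser: the function name parse_atom_records is hypothetical, the dictionary keys follow the field names of the specification's coordinate section, and the column names homelette actually uses may differ. The sketch also assumes that all numeric fields are filled in.

```python
def parse_atom_records(lines):
    """Parse ATOM and HETATM lines into a list of dictionaries.

    Column positions follow the PDB format (version 3.30) coordinate
    section; Python slices are 0-based and end-exclusive.
    """
    records = []
    for line in lines:
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue  # skip every other record type
        # Pad short lines so that the slices below are always valid
        line = line.rstrip("\n").ljust(80)
        records.append({
            "record": line[0:6].strip(),
            "serial": int(line[6:11]),
            "name": line[12:16].strip(),
            "altLoc": line[16].strip(),
            "resName": line[17:20].strip(),
            "chainID": line[21].strip(),
            "resSeq": int(line[22:26]),
            "iCode": line[26].strip(),
            "x": float(line[30:38]),
            "y": float(line[38:46]),
            "z": float(line[46:54]),
            "occupancy": float(line[54:60]),
            "tempFactor": float(line[60:66]),
            "element": line[76:78].strip(),
            "charge": line[78:80].strip(),
        })
    return records
```

A list of dictionaries like this can be passed directly to pandas.DataFrame to obtain one row per atom and one column per field.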
- get_sequence() str
Retrieve the 1-letter amino acid sequence of the PDB file associated with the Model object.
- Returns:
Amino acid sequence
- Return type:
str
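One way to derive a 1-letter sequence from per-atom residue information is sketched below. This is an illustration only, not the method homelette uses: sequence_from_residues is a hypothetical helper, and mapping unknown residue names to "X" is an assumption, since the handling of non-standard residues is not stated here.

```python
# Standard 3-letter to 1-letter codes for the 20 canonical amino acids
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_residues(residues):
    """Build a 1-letter sequence from per-atom residue information.

    residues is an iterable of (chainID, resSeq, iCode, resName) tuples
    in file order, one tuple per atom. Consecutive atoms belonging to
    the same residue contribute a single letter.
    """
    sequence = []
    previous = None
    for chain_id, res_seq, i_code, res_name in residues:
        key = (chain_id, res_seq, i_code)
        if key == previous:
            continue  # still inside the same residue
        previous = key
        # Assumption: unknown residue names are written as "X"
        sequence.append(THREE_TO_ONE.get(res_name, "X"))
    return "".join(sequence)
```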
- rename(new_name: str) None
Rename the PDB file associated with the Model object.
- Parameters:
new_name (str) – New name of PDB file
- Return type:
None