homelette.pdb_io

The homelette.pdb_io submodule contains an object for parsing and manipulating PDB files. Several constructor functions are available that read PDB files from disk or download them from the internet.

Functions and classes

Functions and classes present in homelette.pdb_io are listed below:


homelette.pdb_io.read_pdb(file_name: str) → PdbObject

Reads PDB from file.

Parameters:

file_name (str) – PDB file name

Return type:

PdbObject

Notes

If the PDB file contains multiple MODELs, only the first model is kept.

homelette.pdb_io.download_pdb(pdbid: str) → PdbObject

Download PDB from the RCSB.

Parameters:

pdbid (str) – PDB identifier

Return type:

PdbObject

Notes

If the downloaded PDB file contains multiple MODELs, only the first model is kept.

class homelette.pdb_io.PdbObject(lines: Iterable)

Object encapsulating functionality for processing PDB files.

Parameters:

lines (Iterable) – The lines of the PDB

Variables:

lines – The lines of the PDB, filtered for ATOM and HETATM records

Return type:

None

Notes

Please construct instances of PdbObject using the constructor functions read_pdb or download_pdb.

If the PDB file contains multiple MODELs, only the first model is kept.
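The two behaviours noted above (keeping only ATOM and HETATM records, and discarding everything after the first MODEL) can be illustrated with a short standard-library sketch. The function name keep_first_model_atoms is hypothetical, and this is not homelette's implementation; it only assumes that the first model ends at the first ENDMDL record.

```python
from typing import Iterable, List


def keep_first_model_atoms(lines: Iterable[str]) -> List[str]:
    """Keep ATOM/HETATM records and stop at the end of the first MODEL."""
    kept = []
    for line in lines:
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "ENDMDL":
            break  # assumption: everything after the first model is dropped
        if record in ("ATOM", "HETATM"):
            kept.append(line.rstrip("\n"))
    return kept


example = [
    "HEADER    EXAMPLE",
    "MODEL        1",
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM    2  O   HOH A   2      12.000   7.000  -5.000  1.00  0.00           O",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1      10.000   5.000  -6.000  1.00  0.00           N",
    "ENDMDL",
]
filtered = keep_first_model_atoms(example)
print(len(filtered))  # 2
```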

write_pdb(file_name) → None

Write PDB to file.

Parameters:

file_name (str) – The name of the file to write the PDB to.

Return type:

None

parse_to_pd() → pandas.DataFrame

Parses the PDB into a pandas DataFrame.

Return type:

pandas.DataFrame

Notes

Information is extracted according to the PDB file specification (version 3.30) and columns are named accordingly. See https://www.wwpdb.org/documentation/file-format for more information.
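To make the fixed-column layout concrete, the following sketch parses a single ATOM record using the column ranges of the PDB file specification. It is an illustration only, not homelette's code: parse_atom_line is a hypothetical helper, and the dictionary keys are the field names from the specification, which are assumed (not confirmed) to match the DataFrame column names.

```python
def parse_atom_line(line: str) -> dict:
    """Split one ATOM/HETATM record into fields by fixed column positions."""
    line = line.ljust(80)  # pad short lines so every slice is valid
    return {
        "record": line[0:6].strip(),       # columns 1-6
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "altLoc": line[16].strip(),        # column 17
        "resName": line[17:20].strip(),    # columns 18-20
        "chainID": line[21].strip(),       # column 22
        "resSeq": int(line[22:26]),        # columns 23-26
        "iCode": line[26].strip(),         # column 27
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "tempFactor": float(line[60:66]),  # columns 61-66
        "element": line[76:78].strip(),    # columns 77-78
        "charge": line[78:80].strip(),     # columns 79-80
    }


# Build a correctly aligned 80-character record instead of typing it by hand.
ATOM_FORMAT = (
    "{:6s}{:5d} {:^4s}{:1s}{:3s} {:1s}{:4d}{:1s}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}{:2s}"
)
example_line = ATOM_FORMAT.format(
    "ATOM", 1, "CA", "", "MET", "A", 1, "", 11.104, 6.134, -6.504, 1.00, 20.5, "C", ""
)
parsed = parse_atom_line(example_line)
print(parsed["resName"], parsed["chainID"], parsed["resSeq"])  # MET A 1
```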

get_sequence(ignore_missing: bool = True) → str

Retrieve the 1-letter amino acid sequence of the PDB, grouped by chain.

Parameters:

ignore_missing (bool) – Controls how unmodelled residues are handled. If True, they are ignored when generating the sequence (default). If False, they are represented in the sequence by the character X.

Returns:

Amino acid sequence

Return type:

str
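The effect of ignore_missing can be sketched for a single chain as follows. This is an illustration under stated assumptions rather than homelette's implementation: sequence_from_residues is a hypothetical helper, unmodelled residues are assumed to show up as gaps in the residue numbering, one X is assumed per missing residue number, and only the 20 standard amino acids are handled.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_residues(residues, ignore_missing=True):
    """Build a 1-letter sequence from ordered (resSeq, resName) pairs of one chain."""
    parts = []
    previous = None
    for res_seq, res_name in residues:
        if not ignore_missing and previous is not None:
            # Assumption: each skipped residue number is one unmodelled residue.
            parts.append("X" * (res_seq - previous - 1))
        parts.append(THREE_TO_ONE[res_name])
        previous = res_seq
    return "".join(parts)


residues = [(1, "MET"), (2, "ALA"), (5, "GLY")]  # residues 3 and 4 are unmodelled
print(sequence_from_residues(residues))                        # MAG
print(sequence_from_residues(residues, ignore_missing=False))  # MAXXG
```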

get_chains() → list

Extract all chains present in the PDB.

Return type:

list

transform_extract_chain(chain) → PdbObject

Extract chain from PDB.

Parameters:

chain (str) – The chain ID to be extracted.

Return type:

PdbObject

transform_renumber_residues(starting_res: int = 1) → PdbObject

Renumber residues in PDB.

Parameters:

starting_res (int) – Residue number to start renumbering at (default 1)

Return type:

PdbObject

Notes

Missing (i.e. unmodelled) residues in the PDB are not considered in the renumbering, so gaps in the original numbering are closed. If multiple chains are present in the PDB, numbering continues from one chain to the next.
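The renumbering rules in the note can be illustrated with a small sketch that works on per-atom (chain ID, residue number) pairs. The helper renumber is hypothetical and is not homelette's implementation; it assumes that a new residue starts whenever the (chain ID, residue number) pair changes between consecutive atoms.

```python
def renumber(residue_keys, starting_res=1):
    """Return new residue numbers for per-atom (chainID, resSeq) pairs in file order."""
    new_numbers = []
    current = starting_res - 1
    previous_key = None
    for key in residue_keys:
        if key != previous_key:
            # A new residue begins: gaps are ignored and the counter is
            # not reset when the chain changes.
            current += 1
            previous_key = key
        new_numbers.append(current)
    return new_numbers


# Two atoms of residue 5, then residues 6 and 9 (7 and 8 unmodelled), then chain B.
atoms = [("A", 5), ("A", 5), ("A", 6), ("A", 9), ("B", 1), ("B", 2)]
print(renumber(atoms))                   # [1, 1, 2, 3, 4, 5]
print(renumber(atoms, starting_res=10))  # [10, 10, 11, 12, 13, 14]
```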

transform_change_chain_id(new_chain_id) → PdbObject

Replace chain ID for every entry in PDB.

Parameters:

new_chain_id (str) – New chain ID.

Return type:

PdbObject

transform_remove_hetatm() → PdbObject

Remove all HETATM entries from PDB.

Return type:

PdbObject

transform_filter_res_name(selection: Iterable, mode: str = 'out') → PdbObject

Filter PDB by residue name.

Parameters:
  • selection (Iterable) – The residue names to filter by.

  • mode (str) – Filtering mode. If mode = “out”, the selection will be filtered out (default). If mode = “in”, everything except the selection will be filtered out.

Return type:

PdbObject
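The two filtering modes can be sketched on plain PDB lines as shown below. This is an illustration, not homelette's implementation: filter_res_name and make_line are hypothetical helpers, the example lines are stripped down to the record name and residue name, and raising ValueError for an unknown mode is an assumption.

```python
def filter_res_name(lines, selection, mode="out"):
    """Filter PDB lines by residue name (columns 18-20)."""
    selection = set(selection)
    if mode == "out":
        # Remove the selected residue names, keep everything else.
        return [line for line in lines if line[17:20].strip() not in selection]
    if mode == "in":
        # Keep only the selected residue names.
        return [line for line in lines if line[17:20].strip() in selection]
    raise ValueError("mode must be 'in' or 'out'")  # assumption: invalid modes are rejected


def make_line(record, res_name):
    """Minimal example line: record name in columns 1-6, residue name in 18-20."""
    return f"{record:<6}{'':11}{res_name:>3}"


lines = [make_line("ATOM", "MET"), make_line("HETATM", "HOH"), make_line("ATOM", "ALA")]
without_water = filter_res_name(lines, ["HOH"])         # default mode "out"
only_water = filter_res_name(lines, ["HOH"], mode="in")
print(len(without_water), len(only_water))  # 2 1
```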

transform_filter_res_seq(lower: int, upper: int) → PdbObject

Filter PDB by residue number.

Parameters:
  • lower (int) – Lower bound of range to filter with.

  • upper (int) – Upper bound of range to filter with, inclusive.

Return type:

PdbObject
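A range filter on residue numbers could look like the following sketch. It is not homelette's implementation: filter_res_seq and make_line are hypothetical helpers, and while the upper bound is documented as inclusive, treating the lower bound as inclusive as well is an assumption.

```python
def filter_res_seq(lines, lower, upper):
    """Keep PDB lines whose residue number (columns 23-26) lies in [lower, upper]."""
    # Assumption: both bounds are inclusive.
    return [line for line in lines if lower <= int(line[22:26]) <= upper]


def make_line(res_seq):
    """Minimal example line: record name in columns 1-6, residue number in 23-26."""
    return f"{'ATOM':<6}{'':16}{res_seq:>4d}"


lines = [make_line(n) for n in (1, 2, 3, 4, 5)]
selected = filter_res_seq(lines, 2, 4)
print([int(line[22:26]) for line in selected])  # [2, 3, 4]
```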

transform_concat(*others: PdbObject) → PdbObject

Concatenate PDB with other PDBs.

Parameters:

*others (PdbObject) – Any number of PDBs.

Return type:

PdbObject