`homelette.pdb_io`

The homelette.pdb_io submodule provides PdbObject, a class for parsing and manipulating PDB files, along with constructor functions that read PDB files from disk or download them from the RCSB.

Functions and classes

Functions and classes present in homelette.pdb_io are listed below:

PdbObject
read_pdb()
download_pdb()

homelette.pdb_io.read_pdb(file_name: str) → PdbObject

Reads PDB from file.

Parameters: file_name (str) – PDB file name
Return type: PdbObject

Notes

If a PDB file with multiple MODELs is read, only the first model will be retained.
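The behaviour described in this note can be illustrated with a minimal sketch. It is not homelette's implementation: `keep_first_model` is a hypothetical helper that assumes models are delimited by the `MODEL` and `ENDMDL` records of the PDB format, and it also applies the ATOM/HETATM filtering that PdbObject documents for its `lines` attribute.

```python
from typing import Iterable, List


def keep_first_model(lines: Iterable[str]) -> List[str]:
    """Return the ATOM/HETATM records of the first model only.

    Hypothetical helper for illustration, not part of homelette.
    """
    kept = []
    for line in lines:
        # ENDMDL closes the current model; everything after the first
        # ENDMDL belongs to later models and is discarded.
        if line.startswith("ENDMDL"):
            break
        if line.startswith(("ATOM", "HETATM")):
            kept.append(line)
    return kept


# Shortened placeholder records; only the record name matters here.
example = [
    "MODEL        1",
    "ATOM  first model, atom 1",
    "HETATM first model, ligand",
    "ENDMDL",
    "MODEL        2",
    "ATOM  second model, atom 1",
    "ENDMDL",
]
first_model = keep_first_model(example)
print(first_model)
```

Files without MODEL records contain no `ENDMDL` line, so in this sketch all of their ATOM and HETATM records are kept.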

homelette.pdb_io.download_pdb(pdbid: str) → PdbObject

Download PDB from the RCSB.

Parameters: pdbid (str) – PDB identifier
Return type: PdbObject

Notes

If a PDB file with multiple MODELs is downloaded, only the first model will be retained.

class homelette.pdb_io.PdbObject(lines: Iterable)

Object encapsulating functionality regarding the processing of PDB files

Parameters: lines (Iterable) – The lines of the PDB
Variables: lines – The lines of the PDB, filtered for ATOM and HETATM records
Return type: None

See also

read_pdb, download_pdb

Notes

Please construct instances of PdbObject using the constructor functions read_pdb and download_pdb.

If a PDB file with multiple MODELs is read, only the first model will be retained.

write_pdb(file_name) → None

Write PDB to file.

Parameters: file_name (str) – The name of the file to write the PDB to.
Return type: None

parse_to_pd() → pandas.DataFrame

Parse the PDB into a pandas DataFrame.

Return type: pandas.DataFrame

Notes

Information is extracted according to the PDB file specification (version 3.30) and columns are named accordingly. See https://www.wwpdb.org/documentation/file-format for more information.
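As a rough illustration of what extraction "according to the PDB file specification" means, the sketch below slices a single ATOM record by the fixed column ranges of the format. `parse_atom_record` is a hypothetical helper, not homelette code; the dictionary keys follow the field names used in the specification, and it returns a plain dict rather than a DataFrame to stay dependency-free.

```python
def parse_atom_record(line: str) -> dict:
    """Split one ATOM/HETATM record into its fixed-width fields.

    Hypothetical helper; slices are 0-based, spec columns are 1-based.
    """
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "altLoc": line[16].strip(),       # column 17
        "resName": line[17:20].strip(),   # columns 18-20
        "chainID": line[21],              # column 22
        "resSeq": int(line[22:26]),       # columns 23-26
        "iCode": line[26].strip(),        # column 27
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "tempFactor": float(line[60:66]), # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78
        "charge": line[78:80].strip(),    # columns 79-80
    }


# An 80-column ATOM record assembled field by field so the widths are visible.
atom_line = "".join([
    "ATOM  ", "    1", " ", " N  ", " ", "MET", " ", "A", "   1", " ", "   ",
    "  27.340", "  24.430", "   2.614", "  1.00", "  9.67", " " * 10, " N", "  ",
])
record = parse_atom_record(atom_line)
print(record["resName"], record["chainID"], record["resSeq"])
```

Applying such a function to every line of a PDB and collecting the dictionaries would give a table with one row per atom, which is the general shape a per-atom DataFrame takes.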

get_sequence(ignore_missing: bool = True) → str

Retrieve the 1-letter amino acid sequence of the PDB, grouped by chain.

Parameters: ignore_missing (bool) – Controls how unmodelled residues are handled. If True (default), they are ignored when generating the sequence. If False, they are represented in the sequence by the character X.
Returns: Amino acid sequence
Return type: str
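The effect of ignore_missing can be sketched for a single chain as follows. This is an illustration under stated assumptions, not homelette's implementation: `sequence_for_chain` is a hypothetical helper, unmodelled residues are assumed to appear as gaps in the residue numbering, and residue names outside the 20 standard amino acids are mapped to X as a simplification of this sketch.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_for_chain(residues, ignore_missing=True):
    """Build a 1-letter sequence from (residue_number, residue_name) pairs.

    Hypothetical helper; `residues` must be sorted by residue number.
    """
    letters = []
    previous = None
    for number, name in residues:
        if previous is not None and not ignore_missing:
            # Assumption: a jump in numbering marks unmodelled residues.
            letters.append("X" * (number - previous - 1))
        letters.append(THREE_TO_ONE.get(name, "X"))
        previous = number
    return "".join(letters)


# Residues 3 and 4 are absent from this made-up chain.
chain_a = [(1, "MET"), (2, "ALA"), (5, "GLY"), (6, "LYS")]
print(sequence_for_chain(chain_a))
print(sequence_for_chain(chain_a, ignore_missing=False))
```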

get_chains() → list

Extract all chains present in the PDB.

Return type: list

transform_extract_chain(chain) → PdbObject

Extract chain from PDB.

Parameters: chain (str) – The chain ID to be extracted.
Return type: PdbObject

transform_renumber_residues(starting_res: int = 1) → PdbObject

Renumber residues in PDB.

Parameters: starting_res (int) – Residue number to start renumbering at (default 1)
Return type: PdbObject

Notes

Missing residues in the PDB (i.e. unmodelled) will not be considered in the renumbering. If multiple chains are present in the PDB, numbering will be continued from one chain to the next one.
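The two points in this note (gaps are closed, and numbering continues across chains) can be sketched as below. `renumber` is a hypothetical helper, not homelette code; it works on per-atom (chain ID, residue number) pairs in file order and, as a simplification, ignores insertion codes.

```python
def renumber(residue_ids, starting_res=1):
    """Assign consecutive residue numbers to per-atom residue identifiers.

    Hypothetical helper; `residue_ids` holds one (chain_id, residue_number)
    tuple per atom, in file order.
    """
    new_numbers = []
    current = starting_res - 1
    previous = None
    for res_id in residue_ids:
        # A change in chain or residue number starts a new residue.
        if res_id != previous:
            current += 1
            previous = res_id
        new_numbers.append(current)
    return new_numbers


# Chain A lacks residues 7 and 8; chain B restarts at 1 in the input.
atoms = [("A", 5), ("A", 5), ("A", 6), ("A", 9), ("B", 1), ("B", 1), ("B", 2)]
print(renumber(atoms))
print(renumber(atoms, starting_res=10))
```

In this sketch the gap between residues 6 and 9 disappears, and the first residue of chain B receives the number following the last residue of chain A.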

transform_change_chain_id(new_chain_id) → PdbObject

Replace chain ID for every entry in PDB.

Parameters: new_chain_id (str) – New chain ID.
Return type: PdbObject

transform_remove_hetatm() → PdbObject

Remove all HETATM entries from PDB.

Return type: PdbObject

transform_filter_res_name(selection: Iterable, mode: str = 'out') → PdbObject

Filter PDB by residue name.

Parameters:

selection (Iterable) – For which residue names to filter
mode (str) – Filtering mode. If mode = “out”, the selection will be filtered out (default). If mode = “in”, everything except the selection will be filtered out.

Return type:

PdbObject
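The two filtering modes can be sketched on raw PDB lines as follows. This is an illustration only: `filter_res_name` here is a standalone hypothetical function, not the PdbObject method, and `make_line` is a hypothetical helper that builds truncated records with the residue name in columns 18-20.

```python
def make_line(record, serial, atom_name, res_name):
    """Build a truncated PDB record (up to the residue number); hypothetical."""
    return f"{record:<6}{serial:>5} {atom_name:<4} {res_name:>3} A{serial:>4}"


def filter_res_name(lines, selection, mode="out"):
    """Keep or drop lines by residue name (columns 18-20 of each record)."""
    if mode not in ("out", "in"):
        raise ValueError("mode must be 'out' or 'in'")
    selected = set(selection)
    kept = []
    for line in lines:
        in_selection = line[17:20].strip() in selected
        if mode == "out" and not in_selection:
            kept.append(line)  # selection is removed
        elif mode == "in" and in_selection:
            kept.append(line)  # everything except the selection is removed
    return kept


pdb_lines = [
    make_line("ATOM", 1, " CA ", "MET"),
    make_line("ATOM", 2, " CA ", "ALA"),
    make_line("HETATM", 3, " O  ", "HOH"),
]
without_water = filter_res_name(pdb_lines, ["HOH"])          # mode="out"
only_water = filter_res_name(pdb_lines, ["HOH"], mode="in")
print(len(without_water), len(only_water))
```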

transform_filter_res_seq(lower: int, upper: int) → PdbObject

Filter PDB by residue number.

Parameters:

lower (int) – Lower bound of range to filter with.
upper (int) – Upper bound of range to filter with, inclusive.

Return type:

PdbObject

transform_concat(*others: PdbObject) → PdbObject

Concatenate PDB with other PDBs.

Parameters: *others (PdbObject) – Any number of PDBs.
Return type: PdbObject
