mdsa_tools.Data_gen_hbond

A module for creating and manipulating systems representations of molecular dynamics trajectories. Most MD groups have access to HPC resources, which makes tasks like high-dimensional clustering tractable. On standard workstations, we recommend down-sampling or masking datasets before use.
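
As a rough illustration of the down-sampling and masking recommended above, the sketch below works on a plain NumPy array standing in for a stack of per-frame adjacency matrices. The array contents, the stride of 10, and the residue range are assumptions chosen for illustration and are not part of mdsa_tools.

```python
import numpy as np

# Hypothetical stand-in for a stack of per-frame adjacency matrices,
# shape (n_frames, n_residues, n_residues).
n_frames, n_residues = 1000, 50
systems = np.zeros((n_frames, n_residues, n_residues), dtype=np.int32)

# Down-sample: keep every 10th frame.
stride = 10
downsampled = systems[::stride]

# Mask: keep only a block of residues (zero-based positions 10..29),
# selecting the same positions along both residue axes.
keep = np.arange(10, 30)
masked = downsampled[:, keep][:, :, keep]

print(downsampled.shape)  # (100, 50, 50)
print(masked.shape)       # (100, 20, 20)
```

Reducing frames shrinks the array linearly, while masking residues shrinks each matrix quadratically, so masking is often the larger saving.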

For AMBER users, we also recommend the mdsa_tools.Cpptraj_import module, which imports the results of the CPPTRAJ hbond command in series form and maps atomic hbond counts to the residue level.

IMPORTANT NOTE is that this module expects your

See Also

mdsa_tools.Cpptraj_import.cpptraj_hbond_import

Classes

TrajectoryProcessor([trajectory_path, ...])

class mdsa_tools.Data_gen_hbond.TrajectoryProcessor(trajectory_path=None, topology_path=None, one_indexed=None, preloaded_trajectory=None)

Bases: object

Parameters:

trajectory_path:str: Path to a trajectory file in any format supported by mdtraj.
topology_path:str: Path to the topology file for the trajectory you would like to load.

Attributes:

system_representation:np.ndarray or None: Array of adjacency matrices representing residue–residue interactions for each frame of the trajectory. Shape = (n_frames, n_residues+1, n_residues+1). Initialized as None until create_system_representations is called.
filtered_representation:np.ndarray or None: Subset of the system representation containing only residues of interest. Generated by create_filtered_representations. Useful for focused analyses.
feature_matrix:np.ndarray or None: Matrix representation of the system suitable for downstream dimensionality reduction and clustering workflows. Placeholder attribute, populated by analysis routines.
topology:mdtraj.Topology: MDTraj topology object corresponding to the loaded trajectory. Provides residue and atom indexing used throughout representation building.
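
To make the idea of a feature matrix concrete, here is a minimal sketch of one common way to turn a stack of adjacency matrices into a frames-by-features matrix: flattening each frame's matrix into a single row. This is an assumption about a plausible layout, not a statement of how feature_matrix is actually populated; the random data is purely illustrative.

```python
import numpy as np

# Hypothetical toy data: 5 frames of 4x4 residue-residue matrices.
rng = np.random.default_rng(0)
n_frames, n_residues = 5, 4
systems = rng.integers(0, 3, size=(n_frames, n_residues, n_residues))

# One row per frame, one column per (residue_i, residue_j) pair.
feature_matrix = systems.reshape(n_frames, -1)

print(feature_matrix.shape)  # (5, 16)
```

With this layout, the entry for residue pair (i, j) in frame f sits at column i * n_residues + j of row f, which is the shape most dimensionality-reduction and clustering routines expect.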

Methods

`Process_trajectory`(trajectory, array_template)	Processes an individual frame of template array and fills in hydrogen bonding values.
`create_attributes`(trajectory[, granularity, ...])	Returns the atom-to-residue dictionary and template array for processing
`create_filtered_representations`(residues_to_keep)	Filters array representations to contain only residues of interest
`create_system_representations`([trajectory, ...])	Wraps the steps for creating systems representations into a single method

Notes

Unless the trajectory file is in a format that includes its own topology information, pass the topology as a separate argument (topology_path); otherwise loading will raise an error.

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.prmtop")
>>> tp.trajectory
<mdtraj.Trajectory with 1000 frames, 2000 atoms>

Process_trajectory(trajectory, array_template, atom_to_residue=None, granularity=None, one_indexed=None) → ndarray

Processes an individual frame of template array and fills in hydrogen bonding values.

Parameters:

trajectory:mdtraj.Trajectory: An MDTraj trajectory object from which adjacency matrices are computed directly.
array_template:np.ndarray,shape=(n_frames,n_residues,n_residues): An empty array holding one n_residues × n_residues adjacency matrix for each of the n_frames frames.
atom_to_residue:Dict, atom_to_residue[atom_index]=residue_index: Dictionary containing atom-to-residue index mappings.

Returns:

array_template:np.ndarray,shape=(n_frames,n_residues,n_residues): A reference to the original array. The array is updated in place, so the return value is the same object in memory; it is returned for thoroughness.
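
The in-place behaviour described above can be illustrated with a small NumPy sketch. The function fill_in_place is a hypothetical stand-in, not the library's Process_trajectory; it only shows that the returned array and the template passed in are the same object.

```python
import numpy as np

def fill_in_place(template):
    """Hypothetical helper: writes into `template` and returns the same object."""
    template[0, 0, 1] = 1
    template[0, 1, 0] = 1
    return template

template = np.zeros((2, 3, 3), dtype=int)
filled = fill_in_place(template)

# Both names refer to one array, so the template itself has changed.
same_object = filled is template
print(same_object)  # True
```

If an untouched copy of the template is needed afterwards, passing template.copy() instead keeps the original empty.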

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.prmtop")
>>> atom_to_residue, template = tp.create_attributes(tp.trajectory)
>>> filled = tp.Process_trajectory(tp.trajectory, template, atom_to_residue)
>>> filled.shape
(1000, 495, 495)

create_attributes(trajectory, granularity=None, one_indexed=None) → Tuple[ndarray, Dict]

Returns the atom-to-residue dictionary and template array for processing.

Parameters:

trajectory:mdtraj.Trajectory

Returns:

atom_to_residue:Dict, atom_to_residue[atom_index]=residue_index: Dictionary containing atom to residue mappings
template_array:np.ndarray, shape=(n_frames,n_residues,n_residues): Array containing one adjacency matrix per frame. Shape depends on the number of residues in the trajectory and the number of frames.

Notes

The atom-to-residue dictionary matters because the function used to extract hydrogen bonding information returns hydrogen bonds at the atomic level, whereas this particular “systems” representation needs them at the residue level.

The template array is allocated once and modified later, so only one data structure is created, which improves efficiency.
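
A minimal sketch of the atom-to-residue aggregation described in these notes, using a toy system. It assumes atom-level hydrogen bonds arrive as (donor, hydrogen, acceptor) index triplets, which is the layout returned by MDTraj functions such as baker_hubbard; whether the library uses that function, and whether it counts bonds symmetrically as done here, are assumptions.

```python
import numpy as np

# Hypothetical toy system: 6 atoms across 3 residues (zero-based indices).
atom_to_residue = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
n_residues = 3

# Atom-level hydrogen bonds as (donor, hydrogen, acceptor) atom indices.
hbonds = np.array([[0, 1, 2], [2, 3, 4], [0, 1, 3]])

# Residue-level adjacency matrix for a single frame.
adjacency = np.zeros((n_residues, n_residues), dtype=int)
for donor, _hydrogen, acceptor in hbonds:
    i = atom_to_residue[int(donor)]
    j = atom_to_residue[int(acceptor)]
    adjacency[i, j] += 1
    if i != j:
        # Mirror the count so the matrix stays symmetric.
        adjacency[j, i] += 1

print(adjacency)
```

Here two atomic bonds connect residues 0 and 1, so that pair gets a count of 2, while residues 1 and 2 share a single bond.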

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.prmtop")
>>> atom_to_residue, template = tp.create_attributes(tp.trajectory)
>>> len(atom_to_residue)
2000
>>> template.shape
(1000, 495, 495)

create_filtered_representations(residues_to_keep, systems_representation=None)

Filters array representations to contain only residues of interest

Parameters:

systems_representation: np.ndarray, shape=(n_frames,n_residues,n_residues): Array containing adjacency matrices for every frame. Shape is dependent on residues in trajectory and number of frames.
residues_to_keep: An array listing the residues of interest.
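
The filtering step can be sketched with plain NumPy indexing. This is an illustration under stated assumptions, not the library's implementation: it treats residues_to_keep as zero-based positions along both residue axes and ignores any extra index row or column the real representation may carry.

```python
import numpy as np

# Hypothetical toy stack: 4 frames of 6x6 matrices with distinct values.
n_frames, n_residues = 4, 6
systems = np.arange(n_frames * n_residues * n_residues).reshape(
    n_frames, n_residues, n_residues
)

# Assumed to be zero-based positions along the residue axes.
residues_to_keep = [1, 3, 4]

# np.ix_ builds an open mesh so the same residues are selected
# on both the row and column axes of every frame.
rows, cols = np.ix_(residues_to_keep, residues_to_keep)
filtered = systems[:, rows, cols]

print(filtered.shape)  # (4, 3, 3)
```

Each filtered entry filtered[f, a, b] equals systems[f, residues_to_keep[a], residues_to_keep[b]], so the pairwise relationships among the kept residues are preserved.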

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.prmtop")
>>> tp.create_system_representations()
>>> filtered = tp.create_filtered_representations(residues_to_keep=[10, 20, 30])
>>> filtered.shape
(1000, 4, 4)

create_system_representations(trajectory=None, granularity=None)

Wraps the steps for creating systems representations into a single method

Parameters:

trajectory:mdtraj.Trajectory: An MDTraj trajectory object. It is normally created when the class is instantiated, but can also be passed explicitly as this argument.
granularity:str,default=

Returns:

systems:np.ndarray, shape=(n_frames,n_residues,n_residues): Array containing one adjacency matrix per frame. Shape depends on the number of residues in the trajectory and the number of frames.

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.pdb")
>>> systems = tp.create_system_representations()
>>> systems.shape
(1000, 495, 495)