mdsa_tools.Data_gen_hbond

A module for creating and manipulating systems representations of molecular dynamics trajectories. MD groups typically have access to HPC resources, which make tasks like high-dimensional clustering tractable. On standard workstations, we recommend down-sampling or masking datasets before use.
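As a rough illustration of the down-sampling and masking recommended above, the sketch below works on a plain NumPy coordinate array rather than a real trajectory. The array shape, the stride of 10, and the choice of atoms to keep are assumptions made for the example; none of them come from mdsa_tools.

```python
import numpy as np

# Hypothetical stand-in for trajectory coordinates: (n_frames, n_atoms, 3).
rng = np.random.default_rng(0)
coords = rng.random((1000, 200, 3))

# Down-sample: keep every 10th frame.
stride = 10
downsampled = coords[::stride]

# Mask: keep only a subset of atoms (here, arbitrarily, the first 50).
atom_mask = np.zeros(coords.shape[1], dtype=bool)
atom_mask[:50] = True
reduced = downsampled[:, atom_mask, :]

print(reduced.shape)  # (100, 50, 3)
```

When loading real data, MDTraj's load function accepts stride and atom_indices arguments that serve the same purpose before the data reaches memory.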

For AMBER users, we also recommend the CPPTRAJ_IMPORT module, which imports the results of the hbond command in series form and maps atomic hbond counts to the residue level.

IMPORTANT NOTE is that this module expects your

See Also

mdsa_tools.Cpptraj_import.cpptraj_hbond_import

Classes

TrajectoryProcessor([trajectory_path, ...])

class mdsa_tools.Data_gen_hbond.TrajectoryProcessor(trajectory_path=None, topology_path=None, one_indexed=None, preloaded_trajectory=None)

Bases: object

Parameters:
trajectory_path:str

Path to a trajectory file in any format supported by MDTraj.

topology_path:str

Path to the topology file that corresponds to the trajectory being loaded.

Attributes:
system_representation:np.ndarray or None

Array of adjacency matrices representing residue–residue interactions for each frame of the trajectory. Shape = (n_frames, n_residues+1, n_residues+1). Initialized as None until create_system_representations is called.

filtered_representation:np.ndarray or None

Subset of the system representation containing only residues of interest. Generated by create_filtered_representations. Useful for focused analyses.

feature_matrix:np.ndarray or None

Matrix representation of the system suitable for downstream dimensionality reduction and clustering workflows. Placeholder attribute, populated by analysis routines.

topology:mdtraj.Topology

MDTraj topology object corresponding to the loaded trajectory. Provides residue and atom indexing used throughout representation building.
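How feature_matrix is populated is not specified here. One common way to turn a stack of symmetric adjacency matrices into a 2-D matrix for dimensionality reduction is to keep the strict upper triangle of each frame. The sketch below assumes that approach and uses made-up data; it is not the module's own routine.

```python
import numpy as np

# Hypothetical stack of adjacency matrices: (n_frames, n_residues, n_residues).
n_frames, n_residues = 5, 4
rng = np.random.default_rng(1)
upper = np.triu(rng.integers(0, 3, size=(n_frames, n_residues, n_residues)), k=1)
systems = upper + upper.transpose(0, 2, 1)  # symmetric, zero diagonal

# Indices of the strict upper triangle, one entry per residue pair.
rows, cols = np.triu_indices(n_residues, k=1)

# One row per frame, one column per residue pair.
features = systems[:, rows, cols]

print(features.shape)  # (5, 6)
```

Keeping only one triangle avoids duplicating every residue pair, which halves the number of columns passed to clustering or PCA.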

Methods

Process_trajectory(trajectory, array_template)

Processes an individual frame of template array and fills in hydrogen bonding values.

create_attributes(trajectory[, granularity, ...])

Returns the atom-to-residue dictionary and the template array for processing.

create_filtered_representations(residues_to_keep)

Filters array representations to contain only residues of interest.

create_system_representations([trajectory, ...])

Wraps the operations for creating systems representations into a single method.

Notes

Unless the trajectory file is in a format that includes its own topology information, pass the topology as a separate argument (topology_path); otherwise loading will raise an error.

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.prmtop")
>>> tp.trajectory
<mdtraj.Trajectory with 1000 frames, 2000 atoms>

Process_trajectory(trajectory, array_template, atom_to_residue=None, granularity=None, one_indexed=None) → ndarray

Processes an individual frame of template array and fills in hydrogen bonding values.

Parameters:
trajectory:mdtraj.Trajectory

An MDTraj trajectory object that is used for computing adjacency matrices directly from trajectories.

array_template:np.ndarray,shape=(n_frames,n_residues,n_residues)

An empty array of shape (n_frames,n_residues,n_residues) that holds one n_residues × n_residues adjacency matrix per frame.

atom_to_residue:Dict, atom_to_residue[atom_index]=residue_index

Dictionary containing atom to residue index mappings

Returns:
array_template:np.ndarray,shape=(n_frames,n_residues,n_residues)

A reference to the original array. The array is updated in place in memory; it is also returned for thoroughness.
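The in-place behaviour described above can be illustrated with a small stand-alone sketch. The helper fill_in_place is hypothetical and only mimics the pattern of writing into the caller's array and returning the same object; it does not compute hydrogen bonds.

```python
import numpy as np

def fill_in_place(template: np.ndarray, value: int) -> np.ndarray:
    """Hypothetical helper: write into the given array and return it."""
    template[...] = value  # modifies the caller's array, no copy is made
    return template

template = np.zeros((3, 2, 2), dtype=int)
filled = fill_in_place(template, 1)

print(filled is template)  # True
print(template.sum())      # 12
```

Because both names point at the same memory, callers who need the empty template afterwards should copy it before passing it in.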

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.prmtop")
>>> atom_to_residue, template = tp.create_attributes(tp.trajectory)
>>> filled = tp.Process_trajectory(tp.trajectory, template, atom_to_residue)
>>> filled.shape
(1000, 495, 495)

create_attributes(trajectory, granularity=None, one_indexed=None) → Tuple[ndarray, Dict]

Returns the atom-to-residue dictionary and the template array for processing.

Parameters:
trajectory:mdtraj.Trajectory
Returns:
atom_to_residue:Dict, atom_to_residue[atom_index]=residue_index

Dictionary containing atom to residue mappings

template_array: np.ndarray, shape=(n_frames,n_residues,n_residues)

Array containing adjacency matrices for every frame. Its shape depends on the number of residues in the trajectory and the number of frames.

Notes

The atom-to-residue dictionary matters because the function used to extract hydrogen bonding information reports hydrogen bonds at the atomic level, whereas this "systems" representation needs them at the residue level.

The template array is created once and modified later, so only a single data structure is allocated, which improves efficiency.
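The atom-to-residue step described in these notes can be sketched as follows. The atom pairs are made-up input standing in for the output of a hydrogen-bond detection routine, and the symmetric counting scheme is an assumption; the module's actual aggregation may differ.

```python
import numpy as np

# Hypothetical mapping for 6 atoms spread over 3 residues.
atom_to_residue = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
n_residues = 3

# Hypothetical atom-level hydrogen bonds for one frame: (donor, acceptor).
atom_pairs = np.array([[0, 2], [1, 3], [3, 4]])

# Translate atom indices to residue indices.
res_i = np.array([atom_to_residue[a] for a in atom_pairs[:, 0]])
res_j = np.array([atom_to_residue[a] for a in atom_pairs[:, 1]])

# Accumulate counts symmetrically; np.add.at handles repeated index pairs.
adjacency = np.zeros((n_residues, n_residues), dtype=int)
np.add.at(adjacency, (res_i, res_j), 1)
np.add.at(adjacency, (res_j, res_i), 1)

print(adjacency)
```

Here two atomic bonds between residues 0 and 1 collapse into a single residue-level entry with a count of 2.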

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.prmtop")
>>> atom_to_residue, template = tp.create_attributes(tp.trajectory)
>>> len(atom_to_residue)
2000
>>> template.shape
(1000, 495, 495)

create_filtered_representations(residues_to_keep, systems_representation=None)

Filters array representations to contain only residues of interest.

Parameters:
systems_representation: np.ndarray, shape=(n_frames,n_residues,n_residues)

Array containing adjacency matrices for every frame. Shape is dependent on residues in trajectory and number of frames.

residues_to_keep:

An array listing the residues of interest.
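Selecting a subset of residues from every frame can be sketched with NumPy indexing. The data and the zero-based positions in keep are hypothetical, and this is only one way such a filter could be written, not necessarily how the method is implemented.

```python
import numpy as np

# Hypothetical stack of adjacency matrices: (n_frames, n_residues, n_residues).
systems = np.arange(2 * 5 * 5).reshape(2, 5, 5)

# Hypothetical residues of interest, given as zero-based positions.
keep = np.array([0, 2, 4])

# Select the same rows and columns in every frame.
filtered = systems[:, keep[:, None], keep[None, :]]

print(filtered.shape)  # (2, 3, 3)
```

This sketch returns a 3 × 3 block per frame. The shapes documented in this section carry one extra row and column, which the sketch does not attempt to reproduce.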

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.prmtop")
>>> tp.create_system_representations()
>>> filtered = tp.create_filtered_representations(residues_to_keep=[10, 20, 30])
>>> filtered.shape
(1000, 4, 4)

create_system_representations(trajectory=None, granularity=None)

Wraps the operations for creating systems representations into a single method.

Parameters:
trajectory:mdtraj.Trajectory

An MDTraj trajectory object. It is normally created when the class is instantiated, but it can also be passed explicitly here.

granularity:str,default=
Returns:
Systems: np.ndarray, shape=(n_frames,n_residues,n_residues)

Array containing adjacency matrices for every frame. Its shape depends on the number of residues in the trajectory and the number of frames.

Examples

>>> tp = TrajectoryProcessor("traj.mdcrd", "topology.pdb")
>>> systems = tp.create_system_representations()
>>> systems.shape
(1000, 495, 495)