mdsa_tools.Cpptraj_import
This module provides utilities for importing and structuring hydrogen bond data generated by cpptraj.
It parses hbond … out <file> series output tables, extracts residue–residue hydrogen-bond pairs from the header, and loads the corresponding time series into adjacency-matrix representations. These system-level matrices can then be used directly for downstream analyses (e.g., clustering, dimensionality reduction, visualization).
Rather than functioning as a generic wrapper, the focus here is on:
Mapping atom-level cpptraj outputs to residue-level indices.
Constructing per-frame residue×residue H-bond matrices.
Providing template arrays for efficient modification and integration with the broader mdsa_tools pipeline.
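As a rough illustration of the per-frame matrix construction described above, here is a minimal NumPy sketch. The helper name `series_to_adjacency` is hypothetical and not part of mdsa_tools. The choice to sum columns that share a residue pair is also an assumption, since the module may aggregate differently.

```python
import numpy as np

def series_to_adjacency(indices, data, n_residues):
    """Hypothetical helper (not the mdsa_tools API): expand an
    (n_frames, n_pairs) series into (n_frames, n_residues, n_residues)."""
    data = np.asarray(data)
    mats = np.zeros((data.shape[0], n_residues, n_residues), dtype=data.dtype)
    for col, (res1, res2) in enumerate(indices):
        i, j = res1 - 1, res2 - 1          # 1-based header indices -> 0-based
        mats[:, i, j] += data[:, col]      # assumption: atom-level columns that
        if i != j:                         # share a residue pair are summed
            mats[:, j, i] += data[:, col]  # mirror to keep each matrix symmetric
    return mats

# Two frames, three H-bond columns; the first two columns share residues 1 and 3.
mats = series_to_adjacency([(1, 3), (1, 3), (2, 4)], [[1, 1, 0], [0, 1, 1]], 4)
print(mats[0])
```

Summing keeps a count of atom-level bonds per residue pair; if only presence or absence matters, the result could be clipped to 0/1 afterwards.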
See Also
- mdsa_tools.Viz.visualize_reduction
Plot PCA/UMAP embeddings of reduced systems.
- mdsa_tools.Data_gen_hbond.create_system_representations
Build system-level adjacency matrices from trajectories.
Classes
- cpptraj_hbond_import(filepath, topology, res_of_interest)
Lightweight loader for cpptraj hydrogen-bond (series) output.
- class mdsa_tools.Cpptraj_import.cpptraj_hbond_import(filepath, topology, res_of_interest)
Bases: object
Lightweight loader for cpptraj hydrogen-bond (series) output.
The constructor takes the path to the desired data file and the associated topology. During initialization, the header is parsed to determine residue–residue index pairs and the time-series data are loaded into memory.
- Parameters:
- filepath : str or pathlib.Path
Path to a cpptraj hbond … out <file> series output table.
- topology : str or pathlib.Path
Path to a topology file readable by MDTraj (e.g., AMBER prmtop).
- Attributes:
- indices : list of tuple of int
Parsed residue index pairs (res1, res2) (1-based, AMBER resSeq style) in the same order as data columns (excluding #Frame).
- data : np.ndarray of int, shape=(n_frames, n_pairs)
Binary/indicator series where each column corresponds to a header pair in indices. Rows are frames.
- topology : mdtraj.Topology
Loaded MDTraj topology associated with the series table.
Methods
- edgelist_single_frame([topology, granularity])
Create an upper‑triangle residue–residue edge template for one frame.
- extract_headers([filepath])
Parse the cpptraj hydrogen-bond header to get residue–residue pairs.
- iterate_frames([data, headers])
Map each frame's series values onto an edge‑vector scaffold.
- lookup_table_from_edgelist([edge_list_template])
Build a fast (i, j) → row lookup for the edge template.
- Returns:
- None
Object is initialized with parsed attributes.
Notes
This class assumes a header where the first field is #Frame followed by columns named like <prefix>_<res1>@<atom1>_<res2>@<atom2>.
Examples
>>> loader = cpptraj_hbond_import("hbonds.dat", "system.prmtop")
>>> loader.indices[:3]
[(12, 34), (12, 35), (25, 30)]
>>> loader.data.shape
(n_frames, n_pairs)
- edgelist_single_frame(topology=None, granularity=None)
Create an upper‑triangle residue–residue edge template for one frame.
- Parameters:
- topology : mdtraj.Topology or str or pathlib.Path or None, optional
Topology object or path. If None, uses self.topology.
- granularity : {'residue'}, optional
Placeholder for future atom‑ or group‑level variants. Only residue‑level edges are constructed at the moment.
- Returns:
- np.ndarray of int, shape (E, 2)
Each row is a pair (i, j) with i < j using 0‑based MDTraj residue indices. The set corresponds to the upper triangle of an n_residues × n_residues matrix.
Notes
Use lookup_table_from_edgelist() to convert (i, j) pairs to contiguous row indices for vectorized time‑series storage.
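The upper-triangle template described here can be approximated with `np.triu_indices`. This is a sketch of the idea only, not the method's actual implementation. The helper name `upper_triangle_edges` is hypothetical, and the residue count is passed in directly, where real code would likely take it from the `n_residues` property of an MDTraj topology.

```python
import numpy as np

def upper_triangle_edges(n_residues):
    """Hypothetical helper: all (i, j) pairs with i < j, 0-based."""
    # k=1 skips the diagonal, so self-pairs (i, i) are excluded.
    i, j = np.triu_indices(n_residues, k=1)
    return np.column_stack((i, j))

edges = upper_triangle_edges(4)
print(edges.shape)  # (6, 2), since 4 * 3 / 2 = 6 unique pairs
```

Storing one row per unordered pair keeps a frame's representation at E = n(n-1)/2 values rather than n² for a symmetric matrix.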
- extract_headers(filepath=None)
Parse the cpptraj hydrogen-bond header to get residue–residue pairs.
This reads only the first line of a cpptraj hbond … out <file> series table and extracts the residue indices for each H-bond column. It expects a leading #Frame column followed by columns named like <prefix>_<res1>@<atom1>_<res2>@<atom2> (e.g., HB_12@N_34@O). The returned pairs are 1-based residue indices (AMBER resSeq style), ordered exactly as the data columns appear in the file.
- Parameters:
- filepath : str or pathlib.Path
Path to the cpptraj hbond series output file. Must contain a header line beginning with #Frame and column names formatted as described above.
- Returns:
- indices : list of tuple of int
A list of (res1, res2) residue index pairs (1-based) corresponding to the non-#Frame columns in the header, in column order. These indices are intended to be used later to place column values into a residue×residue adjacency matrix at positions [res1-1, res2-1].
- Raises:
- FileNotFoundError
If filepath does not exist.
- ValueError
If a header token cannot be parsed into integer residue indices.
Notes
Only the first line is inspected; data lines are not parsed here.
Column names must contain at least three underscore-separated tokens: a freeform prefix, <res1>@<atom1>, and <res2>@<atom2>.
Examples
Suppose the header line looks like:
#Frame HB_12@N_34@O HB_12@N_35@O HB_25@O_30@H
Then:
indices = obj.extract_headers("hbonds.dat")
# indices == [(12, 34), (12, 35), (25, 30)]
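For readers who want to see the parsing rule spelled out, the following standalone sketch applies the column-name convention described above to a header string. The function `parse_hbond_header` is hypothetical and works on a line of text rather than a file path. It assumes that the last two underscore-separated tokens carry the residue numbers, which lets the freeform prefix contain underscores.

```python
def parse_hbond_header(header_line):
    """Hypothetical helper: return 1-based (res1, res2) pairs in column order."""
    fields = header_line.split()
    if not fields or fields[0] != "#Frame":
        raise ValueError("header must start with #Frame")
    pairs = []
    for name in fields[1:]:
        tokens = name.split("_")
        if len(tokens) < 3:
            raise ValueError(f"cannot parse column name: {name!r}")
        # Assumption: the last two tokens are <res1>@<atom1> and <res2>@<atom2>.
        # int() raises ValueError when the residue part is not an integer.
        res1 = int(tokens[-2].split("@")[0])
        res2 = int(tokens[-1].split("@")[0])
        pairs.append((res1, res2))
    return pairs

pairs = parse_hbond_header("#Frame HB_12@N_34@O HB_12@N_35@O HB_25@O_30@H")
print(pairs)  # [(12, 34), (12, 35), (25, 30)]
```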
- iterate_frames(data=None, headers=None)
Map each frame’s series values onto an edge‑vector scaffold.
- Parameters:
- data : np.ndarray or None, shape (n_frames, n_pairs)
Integer/binary cpptraj series. Defaults to self.data.
- headers : list[tuple[int, int]] or None
Header residue pairs (res1, res2) (1‑based). Defaults to self.residuelevel_indices.
- Returns:
- None
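The mapping step can be pictured as scattering each data column into the slot of its residue pair in a per-frame edge vector. The sketch below is one possible reading under stated assumptions, not the method's actual code. `frames_to_edge_vectors` is a hypothetical name, it returns the array instead of storing it on the object, and it sums columns that resolve to the same residue pair.

```python
import numpy as np

def frames_to_edge_vectors(data, headers, n_residues):
    """Hypothetical helper: (n_frames, n_pairs) series -> (n_frames, E) vectors."""
    data = np.asarray(data)
    i, j = np.triu_indices(n_residues, k=1)
    # Map each 0-based (i, j) pair with i < j to its row in the edge template.
    edge_row = {pair: row for row, pair in enumerate(zip(i.tolist(), j.tolist()))}
    out = np.zeros((data.shape[0], len(edge_row)), dtype=data.dtype)
    for col, (res1, res2) in enumerate(headers):
        a, b = sorted((res1 - 1, res2 - 1))  # 1-based -> 0-based, ordered so a <= b
        if a == b:
            continue  # intra-residue bonds have no slot in an i < j template
        out[:, edge_row[(a, b)]] += data[:, col]  # assumption: duplicates are summed
    return out

vectors = frames_to_edge_vectors([[1, 0], [1, 1]], [(3, 1), (2, 4)], n_residues=4)
print(vectors)
```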
- lookup_table_from_edgelist(edge_list_template=None)
Build a fast (i, j) → row lookup for the edge template.
- Parameters:
- edge_list_template : np.ndarray of int or None, shape (E, 2)
Output of edgelist_single_frame(). If None, a template is generated from the current topology.
- Returns:
- np.ndarray of int, shape (n_residues, n_residues)
Dense table pair2row where pair2row[i, j] gives the row index into the edge list for pair (i, j) (0‑based). Symmetric with diagonal set to -1 as a sentinel for “no mapping”.
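A dense lookup of this shape can be built with two fancy-indexed assignments. The following is a minimal sketch under the conventions stated above (symmetric table, -1 on the diagonal). `build_pair2row` is a hypothetical name, not the library's function.

```python
import numpy as np

def build_pair2row(edges, n_residues):
    """Hypothetical helper: dense (n_residues, n_residues) pair -> row table."""
    edges = np.asarray(edges)
    # Start from -1 everywhere so unmapped entries (the diagonal) keep the sentinel.
    pair2row = np.full((n_residues, n_residues), -1, dtype=int)
    rows = np.arange(len(edges))
    pair2row[edges[:, 0], edges[:, 1]] = rows  # upper triangle
    pair2row[edges[:, 1], edges[:, 0]] = rows  # mirrored lower triangle
    return pair2row

edges = np.array([[0, 1], [0, 2], [1, 2]])
table = build_pair2row(edges, 3)
print(table)
```

With the table in hand, `table[i, j]` resolves a residue pair to its edge row in constant time and in either order, at the cost of O(n²) memory.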