mdsa_tools.Cpptraj_import

This module provides utilities for importing and structuring hydrogen bond data generated by cpptraj.

It parses hbond … out <file> series output tables, extracts residue–residue hydrogen-bond pairs from the header, and loads the corresponding time series into adjacency-matrix representations. These system-level matrices can then be used directly for downstream analyses (e.g., clustering, dimensionality reduction, visualization).

Rather than acting as a generic cpptraj wrapper, this module focuses on:

  • Mapping atom-level cpptraj outputs to residue-level indices.

  • Constructing per-frame residue×residue H-bond matrices.

  • Providing template arrays for efficient modification and integration with the broader mdsa_tools pipeline.

See Also

mdsa_tools.Viz.visualize_reduction

Plot PCA/UMAP embeddings of reduced systems.

mdsa_tools.Data_gen_hbond.create_system_representations

Build system-level adjacency matrices from trajectories.

Classes

cpptraj_hbond_import(filepath, topology, ...)

Lightweight loader for cpptraj hydrogen-bond (series) output.

class mdsa_tools.Cpptraj_import.cpptraj_hbond_import(filepath, topology, res_of_interest)

Bases: object

Lightweight loader for cpptraj hydrogen-bond (series) output.

Init takes the filepath to the desired data file and the associated topology. After construction, the header is parsed to determine residue–residue index pairs and the time-series data are loaded into memory.
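
As a rough illustration of the loading step, the sketch below reads a small series table with NumPy. It is not the class's actual implementation: the in-memory table is a made-up miniature, and the variable names raw and data are chosen here for illustration only.

```python
import io
import numpy as np

# Hypothetical miniature of a cpptraj series table (not real output).
table = io.StringIO(
    "#Frame HB_12@N_34@O HB_12@N_35@O HB_25@O_30@H\n"
    "1 1 0 1\n"
    "2 0 0 1\n"
)

# np.loadtxt treats lines starting with '#' as comments by default,
# so the '#Frame ...' header line is skipped here.
raw = np.loadtxt(table, dtype=int, ndmin=2)

# Drop the leading frame-number column, keeping one column per H-bond pair.
data = raw[:, 1:]
print(data.shape)
```

Because the header is skipped as a comment in this approach, the column names have to be read separately, which is the role extract_headers() plays in this class.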

Parameters:
filepath : str or pathlib.Path

Path to a cpptraj hbond … out <file> series output table.

topology : str or pathlib.Path

Path to a topology file readable by MDTraj (e.g., AMBER prmtop).

Attributes:
indices : list of tuple of int

Parsed residue index pairs (res1, res2) (1-based, AMBER resSeq style) in the same order as data columns (excluding #Frame).

data : np.ndarray of int, shape=(n_frames, n_pairs)

Binary/indicator series where each column corresponds to a header pair in indices. Rows are frames.

topology : mdtraj.Topology

Loaded MDTraj topology associated with the series table.

Methods

edgelist_single_frame([topology, granularity])

Create an upper‑triangle residue–residue edge template for one frame.

extract_headers([filepath])

Parse the cpptraj hydrogen-bond header to get residue–residue pairs.

iterate_frames([data, headers])

Map each frame's series values onto an edge‑vector scaffold.

lookup_table_from_edgelist([edge_list_template])

Build a fast (i, j) row lookup for the edge template.

Returns:
None

Object is initialized with parsed attributes.

Notes

This class assumes a header where the first field is #Frame followed by columns named like <prefix>_<res1>@<atom1>_<res2>@<atom2>.

Examples

>>> loader = cpptraj_hbond_import("hbonds.dat", "system.prmtop")
>>> loader.indices[:3]
[(12, 34), (12, 35), (25, 30)]
>>> loader.data.shape
(n_frames, n_pairs)

edgelist_single_frame(topology=None, granularity=None)

Create an upper‑triangle residue–residue edge template for one frame.

Parameters:
topology : mdtraj.Topology or str or pathlib.Path or None, optional

Topology object or path. If None, uses self.topology.

granularity : {'residue'}, optional

Placeholder for future atom‑ or group‑level variants. Only residue‑level edges are constructed at the moment.

Returns:
np.ndarray of int, shape (E, 2)

Each row is a pair (i, j) with i < j using 0‑based MDTraj residue indices. The set corresponds to the strict upper triangle (diagonal excluded) of an n_residues × n_residues matrix.

Notes

Use lookup_table_from_edgelist() to convert (i, j) pairs to contiguous row indices for vectorized time‑series storage.
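
A minimal sketch of how such an upper-triangle template could be built with NumPy is shown below. The helper name upper_triangle_edges is hypothetical and not part of mdsa_tools; in practice the residue count would presumably come from the MDTraj topology (its n_residues attribute).

```python
import numpy as np

def upper_triangle_edges(n_residues):
    """Hypothetical helper: all 0-based (i, j) pairs with i < j."""
    # k=1 starts one step above the diagonal, so self-pairs are excluded.
    i, j = np.triu_indices(n_residues, k=1)
    return np.column_stack((i, j))

edges = upper_triangle_edges(4)
print(edges.shape)  # E = n_residues * (n_residues - 1) / 2 rows, 2 columns
```

Rows come out in row-major order of the upper triangle, so the pair (0, 1) is first and (n_residues - 2, n_residues - 1) is last.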

extract_headers(filepath=None)

Parse the cpptraj hydrogen-bond header to get residue–residue pairs.

This reads only the first line of a cpptraj hbond … out <file> series table and extracts the residue indices for each H-bond column. It expects a leading #Frame column followed by columns named like <prefix>_<res1>@<atom1>_<res2>@<atom2> (e.g., HB_12@N_34@O). The returned pairs are 1-based residue indices (AMBER resSeq style), ordered exactly as the data columns appear in the file.

Parameters:
filepath : str or pathlib.Path

Path to the cpptraj hbond series output file. Must contain a header line beginning with #Frame and column names formatted as described above.

Returns:
indices : list of tuple of int

A list of (res1, res2) residue index pairs (1-based) corresponding to the non-#Frame columns in the header, in column order. These indices are intended to be used later to place column values into a residue×residue adjacency matrix at positions [res1-1, res2-1].

Raises:
FileNotFoundError

If filepath does not exist.

ValueError

If a header token cannot be parsed into integer residue indices.

Notes

  • Only the first line is inspected; data lines are not parsed here.

  • Column names must contain at least three underscore-separated tokens: a freeform prefix, <res1>@<atom1>, and <res2>@<atom2>.
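
The token rules above can be sketched as a small standalone parser. This is an illustration of the described format only, not the method's actual source; the function name parse_header_pairs is hypothetical, and the sketch assumes atom names contain no underscores.

```python
def parse_header_pairs(header_line):
    """Hypothetical parser for '<prefix>_<res1>@<atom1>_<res2>@<atom2>' columns."""
    pairs = []
    for token in header_line.split()[1:]:  # skip the leading '#Frame' field
        parts = token.split("_")
        if len(parts) < 3:
            raise ValueError(f"cannot parse header token: {token!r}")
        # The last two underscore-separated parts are '<res>@<atom>';
        # int() raises ValueError if the residue field is not an integer.
        res1 = int(parts[-2].split("@")[0])
        res2 = int(parts[-1].split("@")[0])
        pairs.append((res1, res2))
    return pairs

header = "#Frame HB_12@N_34@O HB_12@N_35@O HB_25@O_30@H"
print(parse_header_pairs(header))
```

Taking the last two tokens, rather than the second and third, lets a freeform prefix contain underscores of its own.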

Examples

Suppose the header line looks like:

#Frame HB_12@N_34@O HB_12@N_35@O HB_25@O_30@H

Then:

indices = obj.extract_headers("hbonds.dat")
# indices == [(12, 34), (12, 35), (25, 30)]
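
To show how these 1-based pairs relate to matrix positions [res1-1, res2-1], here is a hedged sketch that places one frame of values into a residue×residue matrix. The frame values and the system size are made up, and mirroring each entry to make the matrix symmetric is an assumption of this sketch rather than documented behavior.

```python
import numpy as np

indices = [(12, 34), (12, 35), (25, 30)]  # 1-based pairs, as in the example above
frame = np.array([1, 0, 1])               # hypothetical values for a single frame
n_residues = 40                           # hypothetical system size

adj = np.zeros((n_residues, n_residues), dtype=int)
for (res1, res2), value in zip(indices, frame):
    # Shift from 1-based header indices to 0-based array positions.
    adj[res1 - 1, res2 - 1] = value
    adj[res2 - 1, res1 - 1] = value  # assumption: treat the bond as undirected

print(adj[11, 33], adj[11, 34], adj[24, 29])
```

If several atom-level columns map to the same residue pair, plain assignment keeps only the last one; whether to sum or take the maximum instead is a design choice this sketch leaves open.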

iterate_frames(data=None, headers=None)

Map each frame’s series values onto an edge‑vector scaffold.

Parameters:
data : np.ndarray or None, shape (n_frames, n_pairs)

Integer/binary cpptraj series. Defaults to self.data.

headers : list[tuple[int, int]] or None

Header residue pairs (res1, res2) (1‑based). Defaults to self.residuelevel_indices.

Returns:
None
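
One way to picture this mapping is the sketch below, which scatters each frame's column values into a per-frame edge vector. It is a guess at the general technique, not the method's implementation: all inputs are made up, and since this reference does not say where the method stores its result, the sketch simply keeps it in a local array named edge_values.

```python
import numpy as np

n_residues = 5                              # hypothetical system size
i, j = np.triu_indices(n_residues, k=1)     # 0-based upper-triangle pairs
n_edges = len(i)

# Symmetric (i, j) -> edge-row lookup; -1 on the diagonal means "no mapping".
pair2row = np.full((n_residues, n_residues), -1, dtype=int)
pair2row[i, j] = np.arange(n_edges)
pair2row[j, i] = pair2row[i, j]

headers = [(1, 3), (2, 5)]                  # hypothetical 1-based header pairs
data = np.array([[1, 0], [1, 1], [0, 1]])   # hypothetical series, 3 frames

# Convert each 1-based header pair to its 0-based edge row.
rows = np.array([pair2row[r1 - 1, r2 - 1] for r1, r2 in headers])
# A row of -1 would silently index the last edge, so intra-residue pairs
# would need to be filtered out before this step.
assert (rows >= 0).all()

# One edge vector per frame; columns not named in the header stay zero.
edge_values = np.zeros((data.shape[0], n_edges), dtype=int)
edge_values[:, rows] = data
print(edge_values.shape)
```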

lookup_table_from_edgelist(edge_list_template=None)

Build a fast (i, j) row lookup for the edge template.

Parameters:
edge_list_template : np.ndarray of int or None, shape (E, 2)

Output of edgelist_single_frame(). If None, a template is generated from the current topology.

Returns:
np.ndarray of int, shape (n_residues, n_residues)

Dense table pair2row where pair2row[i, j] gives the row index into the edge list for pair (i, j) (0‑based). Symmetric with diagonal set to -1 as a sentinel for “no mapping”.
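
A dense table with these properties could be built as in the following sketch. The function name build_pair2row and its explicit n_residues argument are hypothetical and chosen for illustration; only the pair2row semantics (symmetric, -1 on the diagonal) are taken from the description above.

```python
import numpy as np

def build_pair2row(edges, n_residues):
    """Hypothetical helper: dense (i, j) -> edge-row table, -1 where unmapped."""
    pair2row = np.full((n_residues, n_residues), -1, dtype=int)
    rows = np.arange(len(edges))
    pair2row[edges[:, 0], edges[:, 1]] = rows  # upper triangle
    pair2row[edges[:, 1], edges[:, 0]] = rows  # mirrored lower triangle
    return pair2row

# Build a small upper-triangle edge template and its lookup table.
i, j = np.triu_indices(4, k=1)
edges = np.column_stack((i, j))
table = build_pair2row(edges, 4)
print(table)
```

The dense table trades O(n_residues²) memory for constant-time lookups, which keeps the per-frame mapping a plain array-indexing operation.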