mdsa_tools.subdomain_explorations

Use results of systems analysis to explore potential preferred structural conformations

Clustering PCA/UMAP embeddings at different target dimensions.
Pulling H-bond values via systems_analysis.extract_hbond_values() and using those in replicate maps instead of k-means labels.
Cohesion over time, transition matrices, implied timescales.

See Also

mdsa_tools.Cpptraj_import.cpptraj_hbond_import

Classes

subdomain_explorations([labels, centers, ...])

Use results of systems analysis to explore potential preferred structural conformations

class mdsa_tools.subdomain_explorations.subdomain_explorations(labels=None, centers=None, reduced_coordinates=None, frame_scale=None)

Bases: object

Use results of systems analysis to explore potential preferred structural conformations

Clustering PCA/UMAP embeddings at different target dimensions.
Pulling H-bond values via systems_analysis.extract_hbond_values() and using those in replicate maps instead of k-means labels.
Cohesion over time, transition matrices.

Attributes:

labels (array-like of int or None): Cluster labels per frame (0-based).
centers (np.ndarray or None): Cluster centers in the same space as the reduced coordinates.
reduced_coordinates (np.ndarray or None): Low-dimensional embedding coordinates (e.g., PCA/UMAP).
frame_scale (list[int] or None): Number of frames per replicate.
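Since frame_scale gives the number of frames per replicate, the per-frame arrays are presumably stored flat, with replicates concatenated in order. The following is a minimal sketch of how such a flat array could be split back into replicates. The helper name replicate_slices is hypothetical and is not part of mdsa_tools.

```python
import numpy as np

def replicate_slices(frame_scale):
    """Return one slice per replicate into a flat per-frame array.

    Hypothetical helper (not part of mdsa_tools); assumes replicates are
    concatenated in the same order as the entries of frame_scale.
    """
    ends = np.cumsum(frame_scale)
    starts = ends - np.asarray(frame_scale)
    return [slice(int(s), int(e)) for s, e in zip(starts, ends)]

# Two replicates with 4 and 3 frames respectively.
frame_scale = [4, 3]
labels = np.array([0, 0, 1, 1, 1, 0, 0])
per_replicate = [labels[s] for s in replicate_slices(frame_scale)]
```

Here per_replicate holds one label array per replicate, so any per-replicate computation can avoid mixing frames across replicate boundaries.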

Methods

`create_transition_probability_matrix`([...])	Build a row-normalized transition matrix from labels (no cross-replicate jumps).
`evaluate_cohesion_shrinkingwindow`([labels, ...])	Shrinking-from-the-start window (aka keep the tail).
`evaluate_cohesion_slidingwindow`([labels, ...])	Fixed-size sliding window per replicate.
`rmsd_from_centers`([X, labels, centers])	Per-cluster RMSD of points to their assigned cluster center.

Notes

Intentionally lightweight: common artifacts are stashed so you don’t have to pass them to every call.

create_transition_probability_matrix(labels=None, frame_list=None, lag=None)

Build a row-normalized transition matrix from labels (no cross-replicate jumps).

Parameters:

labels (array-like, optional): Override stored labels. Integer states per frame (0-based).
frame_list (list[int], optional): Override stored frame_scale. Frames per replicate.
lag (int, default 1): Transition lag (frames).

Returns:

np.ndarray of shape (n_states+1, n_states+1): The first row and first column are headers holding the state ids; the remaining entries are the transition probabilities.

Notes

Rows with zero outgoing counts are all zeros.
Prints raw counts pre-normalization for sanity check.
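To illustrate the behavior described above, here is a minimal NumPy sketch of a row-normalized transition matrix that counts transitions only within each replicate. It is not the library's implementation: the function name is hypothetical, states are assumed to be 0..max(labels), and the value of the top-left header cell (set to 0 here) is an assumption.

```python
import numpy as np

def transition_matrix_sketch(labels, frame_list, lag=1):
    """Illustrative only; not the mdsa_tools implementation."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    labels = np.asarray(labels, dtype=int)
    n_states = int(labels.max()) + 1  # assumes states are 0..max(labels)
    counts = np.zeros((n_states, n_states), dtype=float)

    start = 0
    for n_frames in frame_list:
        rep = labels[start:start + n_frames]
        start += n_frames
        # Pair frame t with frame t + lag inside this replicate only,
        # so no transition spans two replicates.
        np.add.at(counts, (rep[:-lag], rep[lag:]), 1.0)

    # Row-normalize; rows with zero outgoing counts stay all zeros.
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums,
                      out=np.zeros_like(counts), where=row_sums > 0)

    # Add a header row and column holding the state ids.
    out = np.zeros((n_states + 1, n_states + 1))
    out[0, 1:] = np.arange(n_states)
    out[1:, 0] = np.arange(n_states)
    out[1:, 1:] = probs
    return out

# Two replicates: [0, 0, 1] and [1, 0]. The 1 -> 1 pair at the replicate
# boundary is not counted.
matrix = transition_matrix_sketch([0, 0, 1, 1, 0], frame_list=[3, 2], lag=1)
```

In this example state 0 moves to states 0 and 1 once each, and state 1 moves to state 0 once, so the probability rows are (0.5, 0.5) and (1.0, 0.0).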

evaluate_cohesion_shrinkingwindow(labels=None, centers=None, reduced_coordinates=None, frame_scale=None, step_size=None)

Shrinking-from-the-start window (aka keep the tail).

At step j, drop the first creepingstart frames of each replicate (creepingstart is the left edge of the window, which advances by step_size each step) and use the rest.

Parameters:

labels, centers, reduced_coordinates, frame_scale (optional): Override stored attributes.
step_size (int, default 10): How much to move the left edge each step.

Returns:

pandas.DataFrame: Columns: ['cluster', 'rmsd', 'window'].

Notes

Complements the sliding-window view: it asks whether cohesion improves as early frames are discarded.
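The shrinking-window idea can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, RMSD is taken as the square root of the mean squared Euclidean distance to the assigned center, windows are numbered from 1, and rows are returned as a list of dicts rather than a DataFrame to keep the sketch dependency-light.

```python
import numpy as np

def per_cluster_rmsd(X, labels, centers):
    """RMSD of points to their assigned center, per cluster present."""
    out = []
    for k in np.unique(labels):
        d = np.linalg.norm(X[labels == k] - centers[k], axis=1)
        out.append((int(k), float(np.sqrt(np.mean(d ** 2)))))
    return out

def shrinking_window_sketch(X, labels, centers, frame_scale, step_size=10):
    """Illustrative only; not the mdsa_tools implementation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    # Index of the first frame of each replicate in the flat arrays.
    offsets = np.concatenate(([0], np.cumsum(frame_scale)[:-1]))

    rows, left_edge, window = [], 0, 1
    while left_edge < max(frame_scale):
        # Keep the tail of every replicate: frames left_edge..end.
        idx = np.array(
            [i for off, n in zip(offsets, frame_scale)
             for i in range(off + min(left_edge, n), off + n)],
            dtype=int,
        )
        for k, r in per_cluster_rmsd(X[idx], labels[idx], centers):
            rows.append({"cluster": k, "rmsd": r, "window": window})
        left_edge += step_size
        window += 1
    return rows

# One replicate whose first two frames sit far from the center.
X = np.array([[2.0], [2.0], [0.0], [0.0]])
rows = shrinking_window_sketch(X, labels=[0, 0, 0, 0], centers=[[0.0]],
                               frame_scale=[4], step_size=2)
```

In this toy case the RMSD drops once the first two frames are discarded, which is the kind of equilibration signal this view is meant to expose. Passing rows to pandas.DataFrame would yield the documented columns.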

evaluate_cohesion_slidingwindow(labels=None, centers=None, reduced_coordinates=None, frame_scale=None, step_size=None)

Fixed-size sliding window per replicate.

At window j, take a slice of length step_size from each replicate, concatenate, then compute per-cluster RMSD to centers for that slice. Advance by step_size each step.

Parameters:

labels, centers, reduced_coordinates, frame_scale (optional): Override stored attributes.
step_size (int, default 10): Window length (in frames) and hop size.

Returns:

pandas.DataFrame: Columns: ['cluster', 'rmsd', 'window'], where window is 1-based.

Notes

Replicates shorter than the current window contribute nothing.
Windows never cross replicate boundaries.
Handy for checking “settling”/drift of clusters over time.
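A minimal NumPy sketch of the sliding-window procedure described above is given here. It is illustrative rather than the library's code: the function names are hypothetical, RMSD is assumed to be the square root of the mean squared Euclidean distance to the assigned center, a partial final window of a replicate is assumed to be kept, and rows are returned as a list of dicts instead of a DataFrame.

```python
import numpy as np

def per_cluster_rmsd(X, labels, centers):
    """RMSD of points to their assigned center, per cluster present."""
    out = []
    for k in np.unique(labels):
        d = np.linalg.norm(X[labels == k] - centers[k], axis=1)
        out.append((int(k), float(np.sqrt(np.mean(d ** 2)))))
    return out

def sliding_window_sketch(X, labels, centers, frame_scale, step_size=10):
    """Illustrative only; not the mdsa_tools implementation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    # Index of the first frame of each replicate in the flat arrays.
    offsets = np.concatenate(([0], np.cumsum(frame_scale)[:-1]))
    n_windows = -(-max(frame_scale) // step_size)  # ceiling division

    rows = []
    for j in range(n_windows):
        idx = []
        for off, n in zip(offsets, frame_scale):
            lo, hi = j * step_size, min((j + 1) * step_size, n)
            if lo < n:  # replicates that ended earlier contribute nothing
                idx.extend(range(off + lo, off + hi))
        idx = np.array(idx, dtype=int)
        for k, r in per_cluster_rmsd(X[idx], labels[idx], centers):
            rows.append({"cluster": k, "rmsd": r, "window": j + 1})
    return rows

# Replicate 1 has 4 frames, replicate 2 has 2; window length is 2.
X = np.array([[0.0], [0.0], [1.0], [1.0], [0.0], [0.0]])
rows = sliding_window_sketch(X, labels=[0] * 6, centers=[[0.0]],
                             frame_scale=[4, 2], step_size=2)
```

Window 1 draws two frames from each replicate, while window 2 draws only from the longer replicate because the shorter one has already ended. Passing rows to pandas.DataFrame would yield the documented columns.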

rmsd_from_centers(X=None, labels=None, centers=None)

Per-cluster RMSD of points to their assigned cluster center.

Parameters:

X (np.ndarray, shape (n_samples, n_dims), optional): Points in embedding space (PCA/UMAP). Defaults to stored coordinates.
labels (array-like of int, shape (n_samples,), optional): Cluster labels for each row of X. Defaults to stored labels.
centers (np.ndarray, shape (n_states, n_dims), optional): Cluster centers. Defaults to stored centers.

Returns:

np.ndarray of shape (n_present_states, 2): Columns: (cluster_id, rmsd). Cluster ids as int, rmsd as float.

Notes

Uses Euclidean norm in the embedding space; no cluster-size weighting.
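As a worked illustration, the per-cluster quantity can be written as the square root of the mean squared Euclidean distance between each point and its assigned center. The sketch below assumes that definition, which the source does not spell out; the function name is hypothetical. It stores cluster ids as floats in a single array, which may differ from how the library handles dtypes.

```python
import numpy as np

def rmsd_from_centers_sketch(X, labels, centers):
    """Illustrative only; not the mdsa_tools implementation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    rows = []
    # Only clusters that actually appear in labels are reported.
    for k in np.unique(labels):
        dists = np.linalg.norm(X[labels == k] - centers[k], axis=1)
        rows.append((k, np.sqrt(np.mean(dists ** 2))))
    return np.array(rows, dtype=float).reshape(-1, 2)

X = np.array([[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]])
labels = np.array([0, 0, 2])            # cluster 1 has no members
centers = np.array([[1.0, 0.0], [9.0, 9.0], [5.0, 5.0]])
result = rmsd_from_centers_sketch(X, labels, centers)
```

Cluster 0 has two points, each at distance 1 from its center, so its RMSD is 1. Cluster 2 has a single point sitting on its center, so its RMSD is 0. Cluster 1 is absent from the labels and therefore gets no row, matching the (n_present_states, 2) return shape.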