mdsa_tools.subdomain_explorations

Use results of systems analysis to explore potential preferred structural conformations:

  • Clustering PCA/UMAP embeddings at different target dimensions.

  • Pulling H-bond values via systems_analysis.extract_hbond_values() and using those in replicate maps instead of k-means labels.

  • Cohesion over time, transition matrices, implied timescales.

See Also

mdsa_tools.Cpptraj_import.cpptraj_hbond_import

Classes

subdomain_explorations([labels, centers, ...])

Use results of systems analysis to explore potential preferred structural conformations

class mdsa_tools.subdomain_explorations.subdomain_explorations(labels=None, centers=None, reduced_coordinates=None, frame_scale=None)

Bases: object

Use results of systems analysis to explore potential preferred structural conformations:

  • Clustering PCA/UMAP embeddings at different target dimensions.

  • Pulling H-bond values via systems_analysis.extract_hbond_values() and using those in replicate maps instead of k-means labels.

  • Cohesion over time, transition matrices.

Attributes:
labels : array-like of int or None

Cluster labels per frame (0-based).

centers : np.ndarray or None

Cluster centers in the same space as reduced coordinates.

reduced_coordinates : np.ndarray or None

Low-dimensional embedding coordinates (e.g., PCA/UMAP).

frame_scale : list[int] or None

Number of frames per replicate.

Methods

create_transition_probability_matrix([...])

Build a row-normalized transition matrix from labels (no cross-replicate jumps).

evaluate_cohesion_shrinkingwindow([labels, ...])

Shrinking-from-the-start window (aka keep the tail).

evaluate_cohesion_slidingwindow([labels, ...])

Fixed-size sliding window per replicate.

rmsd_from_centers([X, labels, centers])

Per-cluster RMSD of points to their assigned cluster center.

Notes

Intentionally lightweight: common artifacts are stashed so you don’t have to pass them to every call.

create_transition_probability_matrix(labels=None, frame_list=None, lag=None)

Build a row-normalized transition matrix from labels (no cross-replicate jumps).

Parameters:
labels : array-like, optional

Override stored labels. Integer states per frame (0-based).

frame_list : list[int], optional

Override stored frame_scale. Frames per replicate.

lag : int, default 1

Transition lag (frames).

Returns:
np.ndarray

Array of shape (n_states+1, n_states+1); the first row and first column are headers holding the state ids.

Notes

  • Rows with zero outgoing counts are all zeros.

  • Prints raw counts pre-normalization for sanity check.
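The counting and normalization described above can be sketched with NumPy alone. This is an illustrative reconstruction, not the library's implementation: the helper name `transition_matrix_sketch` is hypothetical, the number of states is assumed to be `max(labels) + 1`, and the top-left corner of the header row/column is assumed to be 0.

```python
import numpy as np


def transition_matrix_sketch(labels, frame_list, lag=1):
    """Hypothetical helper: row-normalized transition matrix with id headers."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    labels = np.asarray(labels, dtype=int)
    n_states = int(labels.max()) + 1  # assumption: states are 0..max(labels)
    counts = np.zeros((n_states, n_states), dtype=float)

    start = 0
    for n_frames in frame_list:
        rep = labels[start:start + n_frames]
        # Pair frame t with frame t+lag inside this replicate only,
        # so no transition is counted across a replicate boundary.
        np.add.at(counts, (rep[:-lag], rep[lag:]), 1)
        start += n_frames

    # Row-normalize; rows with zero outgoing counts stay all zeros.
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums,
                      out=np.zeros_like(counts), where=row_sums > 0)

    # Header row/column carry the state ids (corner value 0 is an assumption).
    out = np.zeros((n_states + 1, n_states + 1))
    out[0, 1:] = np.arange(n_states)
    out[1:, 0] = np.arange(n_states)
    out[1:, 1:] = probs
    return out


# Two replicates of three frames each; the 1 -> 1 pair that straddles
# the replicate boundary (frames 2 and 3) is not counted.
labels = [0, 0, 1, 1, 0, 2]
tpm = transition_matrix_sketch(labels, frame_list=[3, 3], lag=1)
```

In this toy input, state 2 is never left, so its row of probabilities is all zeros, matching the first note above.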

evaluate_cohesion_shrinkingwindow(labels=None, centers=None, reduced_coordinates=None, frame_scale=None, step_size=None)

Shrinking-from-the-start window (aka keep the tail).

At step j, drop the first creepingstart frames of each replicate (the left edge, which advances by step_size each step) and use the rest.

Parameters:
labels, centers, reduced_coordinates, frame_scale : optional

Override stored attributes.

step_size : int, default 10

How much to move the left edge each step.

Returns:
pandas.DataFrame

Columns: ['cluster', 'rmsd', 'window'].

Notes

Complements the sliding-window view by asking whether cohesion improves as early frames are discarded.
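A minimal sketch of the shrinking-window idea, under several assumptions that the text above does not confirm: the left edge starts at 0 and moves by step_size, windows are numbered from 1, replicates shorter than the current left edge are skipped, and RMSD is the square root of the mean squared Euclidean distance to the assigned center. The helper name `shrinking_window_sketch` is hypothetical, and it returns plain records rather than a DataFrame to stay dependency-light.

```python
import numpy as np


def shrinking_window_sketch(labels, centers, coords, frame_scale, step_size=10):
    """Hypothetical helper: per-cluster RMSD after dropping early frames."""
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Index of the first frame of each replicate in the concatenated arrays.
    offsets = np.concatenate(([0], np.cumsum(frame_scale)[:-1]))

    records = []
    left_edge, window = 0, 1
    while left_edge < max(frame_scale):
        # Keep the tail of every replicate that is long enough.
        idx = np.concatenate([
            np.arange(off + left_edge, off + n)
            for off, n in zip(offsets, frame_scale) if left_edge < n
        ])
        for k in np.unique(labels[idx]):
            pts = coords[idx][labels[idx] == k]
            sq_dist = np.sum((pts - centers[k]) ** 2, axis=1)
            records.append({"cluster": int(k),
                            "rmsd": float(np.sqrt(sq_dist.mean())),
                            "window": window})
        left_edge += step_size
        window += 1
    return records


# One cluster centered at the origin of a 1-D embedding; replicate 1 starts
# far from the center and settles, replicate 2 is short.
coords = np.array([[4.0], [4.0], [0.0], [0.0], [2.0], [2.0]])
labels = np.zeros(6, dtype=int)
centers = np.array([[0.0]])
shrink_records = shrinking_window_sketch(labels, centers, coords,
                                         frame_scale=[4, 2], step_size=2)
```

Passing the records to `pandas.DataFrame` would give a frame with the same three column names as the documented return value.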

evaluate_cohesion_slidingwindow(labels=None, centers=None, reduced_coordinates=None, frame_scale=None, step_size=None)

Fixed-size sliding window per replicate.

At window j, take a slice of length step_size from each replicate, concatenate, then compute per-cluster RMSD to centers for that slice. Advance by step_size each step.

Parameters:
labels, centers, reduced_coordinates, frame_scale : optional

Override stored attributes.

step_size : int, default 10

Window length (in frames) and hop size.

Returns:
pandas.DataFrame

Columns: ['cluster', 'rmsd', 'window'] where window is 1-based.

Notes

  • Replicates shorter than the current window contribute nothing.

  • Windows never cross replicate boundaries.

  • Handy for checking “settling”/drift of clusters over time.
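The windowing rules above can be illustrated with a short NumPy sketch. It is not the library's code: the helper name `sliding_window_sketch` is hypothetical, a trailing window that runs past the end of a replicate is assumed to be truncated at the replicate end, and RMSD is assumed to be the square root of the mean squared Euclidean distance to the assigned center. Records are returned as dictionaries rather than a DataFrame to keep the example dependency-light.

```python
import numpy as np


def sliding_window_sketch(labels, centers, coords, frame_scale, step_size=10):
    """Hypothetical helper: per-cluster RMSD in fixed-size per-replicate windows."""
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Index of the first frame of each replicate in the concatenated arrays.
    offsets = np.concatenate(([0], np.cumsum(frame_scale)[:-1]))
    n_windows = int(np.ceil(max(frame_scale) / step_size))

    records = []
    for j in range(n_windows):
        lo = j * step_size
        idx = []
        for off, n in zip(offsets, frame_scale):
            if lo < n:  # replicates shorter than this window contribute nothing
                hi = min(lo + step_size, n)  # never cross the replicate boundary
                idx.extend(range(off + lo, off + hi))
        idx = np.array(idx, dtype=int)
        for k in np.unique(labels[idx]):
            pts = coords[idx][labels[idx] == k]
            sq_dist = np.sum((pts - centers[k]) ** 2, axis=1)
            records.append({"cluster": int(k),
                            "rmsd": float(np.sqrt(sq_dist.mean())),
                            "window": j + 1})  # 1-based window index
    return records


# Replicate 1 has four frames, replicate 2 has two; window length is 2.
coords = np.array([[0.0], [0.0], [3.0], [3.0], [1.0], [1.0]])
labels = np.zeros(6, dtype=int)
centers = np.array([[0.0]])
slide_records = sliding_window_sketch(labels, centers, coords,
                                      frame_scale=[4, 2], step_size=2)
```

In this toy input, window 1 pools the first two frames of both replicates, while window 2 uses replicate 1 only because replicate 2 has already ended.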

rmsd_from_centers(X=None, labels=None, centers=None)

Per-cluster RMSD of points to their assigned cluster center.

Parameters:
X : np.ndarray, shape (n_samples, n_dims), optional

Points in embedding space (PCA/UMAP). Defaults to stored coordinates.

labels : array-like of int, shape (n_samples,), optional

Cluster labels for each row of X. Defaults to stored labels.

centers : np.ndarray, shape (n_states, n_dims), optional

Cluster centers. Defaults to stored centers.

Returns:
np.ndarray of shape (n_present_states, 2)

Columns: (cluster_id, rmsd). Cluster ids as int, rmsd as float.

Notes

Uses Euclidean norm in the embedding space; no cluster-size weighting.
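As a rough illustration of the computation, the sketch below assumes RMSD means the square root of the mean squared Euclidean distance from each point to its assigned center, and that only clusters present in the labels are reported. The helper name `rmsd_from_centers_sketch` is hypothetical; this is not the library's implementation.

```python
import numpy as np


def rmsd_from_centers_sketch(X, labels, centers):
    """Hypothetical helper: (cluster_id, rmsd) rows for clusters present in labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)

    rows = []
    for k in np.unique(labels):
        # Squared Euclidean distance of each member to its own center.
        sq_dist = np.sum((X[labels == k] - centers[k]) ** 2, axis=1)
        rows.append((int(k), float(np.sqrt(sq_dist.mean()))))
    # A float array holds both columns; ids can be cast back to int on read.
    return np.array(rows, dtype=float).reshape(-1, 2)


# Three points, three centers; cluster 2 has no members and is omitted.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 0, 1])
centers = np.array([[1.0, 0.0], [10.0, 10.0], [5.0, 5.0]])
rmsd_table = rmsd_from_centers_sketch(X, labels, centers)
```

Each cluster gets one RMSD value regardless of how many points it contains, which is one reading of the "no cluster-size weighting" note.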