mdsa_tools.subdomain_explorations
Use results of systems analysis to explore potential preferred structural conformations
- Clustering PCA/UMAP embeddings at different target dimensions.
- Pulling H-bond values via systems_analysis.extract_hbond_values() and using those in replicate maps instead of k-means labels.
- Cohesion over time, transition matrices, implied timescales.
See Also
mdsa_tools.Cpptraj_import.cpptraj_hbond_import
Classes
subdomain_explorations([labels, centers, ...]) Use results of systems analysis to explore potential preferred structural conformations
- class mdsa_tools.subdomain_explorations.subdomain_explorations(labels=None, centers=None, reduced_coordinates=None, frame_scale=None)
Bases: object
Use results of systems analysis to explore potential preferred structural conformations
- Clustering PCA/UMAP embeddings at different target dimensions.
- Pulling H-bond values via systems_analysis.extract_hbond_values() and using those in replicate maps instead of k-means labels.
- Cohesion over time, transition matrices.
- Attributes:
- labels : array-like of int or None
Cluster labels per frame (0-based).
- centers : np.ndarray or None
Cluster centers in the same space as reduced coordinates.
- reduced_coordinates : np.ndarray or None
Low-dimensional embedding coordinates (e.g., PCA/UMAP).
- frame_scale : list[int] or None
Number of frames per replicate.
Methods
create_transition_probability_matrix([labels, ...]) Build a row-normalized transition matrix from labels (no cross-replicate jumps).
evaluate_cohesion_shrinkingwindow([labels, ...]) Shrinking-from-the-start window (aka keep the tail).
evaluate_cohesion_slidingwindow([labels, ...]) Fixed-size sliding window per replicate.
rmsd_from_centers([X, labels, centers]) Per-cluster RMSD of points to their assigned cluster center.
Notes
Intentionally lightweight: common artifacts are stashed so you don’t have to pass them to every call.
- create_transition_probability_matrix(labels=None, frame_list=None, lag=None)
Build a row-normalized transition matrix from labels (no cross-replicate jumps).
- Parameters:
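- labels : array-like, optional
Override stored labels. Integer states per frame (0-based).
- frame_list : list[int], optional
Override stored frame_scale. Frames per replicate.
- lag : int, default 1
Transition lag (frames). The signature shows lag=None; the documented default of 1 presumably applies when it is left unset.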
- Returns:
- np.ndarray
Array of shape (n_states+1, n_states+1); the first row and first column are headers holding the state ids.
Notes
Rows with zero outgoing counts are all zeros.
Prints raw counts pre-normalization for sanity check.
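The counting and normalization described above can be sketched with NumPy alone. This is an illustration of the technique, not the library's implementation: the function name `transition_matrix_sketch` is hypothetical, and the header layout (state ids in the first row and column, corner cell left at 0) is an assumption based on the Returns description.

```python
import numpy as np

def transition_matrix_sketch(labels, frame_list, lag=1):
    """Row-normalized transition matrix with a header row/column of state ids."""
    labels = np.asarray(labels, dtype=int)
    n_states = int(labels.max()) + 1
    counts = np.zeros((n_states, n_states), dtype=float)

    start = 0
    for n_frames in frame_list:
        rep = labels[start:start + n_frames]
        # Pair each frame with the frame `lag` steps later, within this replicate only,
        # so no transition is counted across a replicate boundary.
        for src, dst in zip(rep[:-lag], rep[lag:]):
            counts[src, dst] += 1
        start += n_frames

    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with zero outgoing counts stay all zeros instead of dividing by zero.
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

    out = np.zeros((n_states + 1, n_states + 1))
    out[0, 1:] = np.arange(n_states)  # header row (assumed layout)
    out[1:, 0] = np.arange(n_states)  # header column (assumed layout)
    out[1:, 1:] = probs
    return out

# Two replicates of three frames each: [0, 0, 1] and [1, 1, 0].
T = transition_matrix_sketch([0, 0, 1, 1, 1, 0], frame_list=[3, 3], lag=1)
```

In this toy input the last frame of the first replicate and the first frame of the second are both state 1. Because replicates are handled separately, that pair is not counted as a 1-to-1 transition.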
- evaluate_cohesion_shrinkingwindow(labels=None, centers=None, reduced_coordinates=None, frame_scale=None, step_size=None)
Shrinking-from-the-start window (aka keep the tail).
At step j, drop the first creepingstart frames of each replicate (the left edge, which advances by step_size at every step) and use the rest.
- Parameters:
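- labels, centers, reduced_coordinates, frame_scale : optional
Override stored attributes.
- step_size : int, default 10
How much to move the left edge each step. The signature shows step_size=None; the documented default of 10 presumably applies when it is left unset.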
- Returns:
- pandas.DataFrame
Columns: [‘cluster’, ‘rmsd’, ‘window’].
Notes
Complements the sliding-window view: it asks whether cohesion improves as you discard early frames.
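A minimal sketch of the shrinking-window idea follows, using NumPy only. The names `cluster_rmsd` and `shrinking_window_sketch` are hypothetical, the first window is assumed to include every frame (left edge at 0), and the result is a list of records rather than the pandas.DataFrame the method returns.

```python
import numpy as np

def cluster_rmsd(X, labels, centers):
    """Return [(cluster_id, rmsd), ...] for the clusters present in labels."""
    rows = []
    for k in np.unique(labels):
        dists = np.linalg.norm(X[labels == k] - centers[k], axis=1)
        rows.append((int(k), float(np.sqrt(np.mean(dists ** 2)))))
    return rows

def shrinking_window_sketch(X, labels, centers, frame_scale, step_size=10):
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    # Offset of each replicate's first frame in the concatenated arrays.
    starts = [0] + [int(v) for v in np.cumsum(frame_scale)[:-1]]
    records, window, left = [], 1, 0
    while left < max(frame_scale):
        idx = []
        for start, n_frames in zip(starts, frame_scale):
            if left < n_frames:
                # Keep the tail of this replicate: frames left .. n_frames - 1.
                idx.extend(range(start + left, start + n_frames))
        for cluster, rmsd in cluster_rmsd(X[idx], labels[idx], centers):
            records.append({"cluster": cluster, "rmsd": rmsd, "window": window})
        left += step_size  # move the left edge
        window += 1
    return records

# Two replicates of four frames; each starts far from the center and then settles.
X = [[3.0], [3.0], [1.0], [1.0], [3.0], [3.0], [1.0], [1.0]]
shrink = shrinking_window_sketch(X, [0] * 8, [[0.0]], frame_scale=[4, 4], step_size=2)
```

Here the RMSD falls from the first window to the second, which is the pattern to expect when early frames are still relaxing toward the cluster center.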
- evaluate_cohesion_slidingwindow(labels=None, centers=None, reduced_coordinates=None, frame_scale=None, step_size=None)
Fixed-size sliding window per replicate.
At window j, take a slice of length step_size from each replicate, concatenate, then compute per-cluster RMSD to centers for that slice. Advance by step_size each step.
- Parameters:
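- labels, centers, reduced_coordinates, frame_scale : optional
Override stored attributes.
- step_size : int, default 10
Window length (in frames) and hop size. The signature shows step_size=None; the documented default of 10 presumably applies when it is left unset.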
- Returns:
- pandas.DataFrame
Columns: [‘cluster’, ‘rmsd’, ‘window’] where window is 1-based.
Notes
Replicates shorter than the current window contribute nothing.
Windows never cross replicate boundaries.
Handy for checking “settling”/drift of clusters over time.
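The windowing rules above can be sketched as follows, using NumPy only. The names `cluster_rmsd` and `sliding_window_sketch` are hypothetical. The sketch assumes that only full windows are used (a trailing partial window is dropped) and returns a list of records instead of a pandas.DataFrame.

```python
import numpy as np

def cluster_rmsd(X, labels, centers):
    """Return [(cluster_id, rmsd), ...] for the clusters present in labels."""
    rows = []
    for k in np.unique(labels):
        dists = np.linalg.norm(X[labels == k] - centers[k], axis=1)
        rows.append((int(k), float(np.sqrt(np.mean(dists ** 2)))))
    return rows

def sliding_window_sketch(X, labels, centers, frame_scale, step_size=10):
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    # Offset of each replicate's first frame in the concatenated arrays.
    starts = [0] + [int(v) for v in np.cumsum(frame_scale)[:-1]]
    records = []
    n_windows = max(frame_scale) // step_size  # full windows only (assumption)
    for j in range(n_windows):
        lo, hi = j * step_size, (j + 1) * step_size
        idx = []
        for start, n_frames in zip(starts, frame_scale):
            if hi <= n_frames:  # replicate too short for this window: contributes nothing
                idx.extend(range(start + lo, start + hi))
        for cluster, rmsd in cluster_rmsd(X[idx], labels[idx], centers):
            records.append({"cluster": cluster, "rmsd": rmsd, "window": j + 1})
    return records

# Replicate A has four frames, replicate B has two; window length is two frames.
X = [[1.0], [1.0], [3.0], [3.0], [1.0], [1.0]]
slide = sliding_window_sketch(X, [0] * 6, [[0.0]], frame_scale=[4, 2], step_size=2)
```

Window 1 pools the first two frames of both replicates. Window 2 uses only replicate A, because replicate B has already ended.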
- rmsd_from_centers(X=None, labels=None, centers=None)
Per-cluster RMSD of points to their assigned cluster center.
- Parameters:
- X : np.ndarray, shape (n_samples, n_dims), optional
Points in embedding space (PCA/UMAP). Defaults to stored coordinates.
- labels : array-like of int, shape (n_samples,), optional
Cluster labels for each row of X. Defaults to stored labels.
- centers : np.ndarray, shape (n_states, n_dims), optional
Cluster centers. Defaults to stored centers.
- Returns:
- np.ndarray of shape (n_present_states, 2)
Columns: (cluster_id, rmsd). Cluster ids as int, rmsd as float.
Notes
Uses Euclidean norm in the embedding space; no cluster-size weighting.
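As a sketch of the computation described above: for each cluster present in the labels, take the Euclidean distance of every assigned point to that cluster's center, then the square root of the mean squared distance. The function name `rmsd_from_centers_sketch` is hypothetical. The sketch returns a plain float array, so cluster ids come back as floats, which may differ from the library's output dtype.

```python
import numpy as np

def rmsd_from_centers_sketch(X, labels, centers):
    """Per-cluster RMSD of points to their assigned center; rows are (cluster_id, rmsd)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    rows = []
    for k in np.unique(labels):  # only clusters that actually occur
        dists = np.linalg.norm(X[labels == k] - centers[k], axis=1)
        rows.append((k, np.sqrt(np.mean(dists ** 2))))
    return np.array(rows, dtype=float)

X = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [5.0, 7.0]]
labels = [0, 0, 1, 1]
centers = [[1.0, 0.0], [5.0, 6.0]]
result = rmsd_from_centers_sketch(X, labels, centers)
```

Each cluster is reported on its own row, so a cluster with two points and a cluster with two thousand carry the same weight in the output.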