graphtoolbox.utils.attention
Functions
| Function | Description |
| --- | --- |
| `animate_grouped_attention` | Animate attention matrices (mean or standard deviation) across groups of graphs. |
| `attention_to_dense` | Project attention values from edge-list format to dense adjacency tensor. |
| `compute_attention_statistics` | Compute per-head, per-layer attention mean and standard deviation mapped to adjacency matrices. |
| `cosine_similarity_matrix` | Compute the cosine similarity matrix between rows of a matrix. |
| `hierarchical_attention_fusion` | Fuse attention maps hierarchically across heads and layers using spectral fusion. |
| `load_attention_batches` | Load and assemble attention weights dumped in batch files produced by a model. |
| `normalized_laplacian` | Compute the symmetric normalized Laplacian of an adjacency matrix. |
| `pca_analysis_attention` | Perform Principal Component Analysis (PCA) on a selected attention head across graphs. |
| `pca_global_mean` | PCA on the global mean attention matrix (averaged over layers and heads). |
| `pca_per_head` | Perform PCA independently for each (layer, head) pair on dense attention matrices. |
| `plot_attention_statistics` | Plot heatmaps of the average and standard deviation attention matrices for each layer and attention head. |
| `plot_components` | Plot a grid of component heatmaps. |
| `plot_explained_variance` | Plot explained variance ratio of PCA components. |
| `plot_spectral_gap` | Plot the first max_k eigenvalues of a Laplacian and estimate the optimal number of clusters via the spectral gap. |
| `spectral_embedding` | Compute the spectral embedding (first k eigenvectors of the Laplacian). |
| `spectral_fusion` | Perform spectral fusion by computing the average cosine similarity of the spectral embeddings of several adjacency matrices. |
| `umap_analysis_attention` | Perform UMAP-based dimensionality reduction on a selected attention head across graphs. |
- graphtoolbox.utils.attention.load_attention_batches(directory_path: str) → tuple[Tensor, Tensor]
Load and assemble attention weights dumped in batch files produced by a model.
- Parameters:
directory_path (str) – Path to a directory containing attention dump files. The function expects files named like “num_batch{n}.pt” (for example “num_batch0.pt”). Files are processed in ascending numeric order.
- Returns:
A tuple (all_attentions, edge_index):
- all_attentions: tensor of shape [L, H, G_total, E_graph]
- edge_index: tensor of shape [2, E_graph]
- Return type:
tuple[torch.Tensor, torch.Tensor]
- Raises:
RuntimeError – If no valid attention dump files are found in the directory.
ValueError – If sizes are inconsistent within a file.
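A minimal usage sketch; the directory name `attention_dumps/` is a placeholder for any folder containing the `num_batch{n}.pt` files described above:

```python
from graphtoolbox.utils.attention import load_attention_batches

# Placeholder path containing num_batch0.pt, num_batch1.pt, ...
all_attentions, edge_index = load_attention_batches("attention_dumps/")

print(all_attentions.shape)  # [L, H, G_total, E_graph]
print(edge_index.shape)      # [2, E_graph]
```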
- graphtoolbox.utils.attention.compute_attention_statistics(all_attentions: Tensor, edge_index: Tensor) → tuple[Tensor, Tensor]
Compute per-head, per-layer attention mean and standard deviation mapped to adjacency matrices.
- Parameters:
all_attentions (torch.Tensor) – Attention values with shape (L, H, G, E) where L = number of layers, H = number of heads, G = number of graphs, E = number of edges.
edge_index (torch.Tensor or array-like) – Edge indices in shape (2, E) or (E, 2). Node ids need not be contiguous; they will be compacted to a contiguous range.
- Returns:
(mean_adj, std_adj) both tensors of shape (L, H, n_used, n_used), where n_used is the number of unique nodes present in edge_index. Entries for absent node pairs are zero.
- Return type:
tuple[torch.Tensor, torch.Tensor]
- Raises:
ValueError – If edge_index does not have shape (2, E) or (E, 2).
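A sketch on synthetic inputs (the shapes below are illustrative only):

```python
import torch
from graphtoolbox.utils.attention import compute_attention_statistics

# Synthetic example: 2 layers, 4 heads, 8 graphs, 20 edges over 12 nodes
all_attentions = torch.rand(2, 4, 8, 20)    # [L, H, G, E]
edge_index = torch.randint(0, 12, (2, 20))  # [2, E]

mean_adj, std_adj = compute_attention_statistics(all_attentions, edge_index)
# Both outputs have shape (L, H, n_used, n_used); entries are zero where no edge was observed
```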
- graphtoolbox.utils.attention.plot_attention_statistics(avg_attn: Tensor, std_attn: Tensor, **kwargs) → None
Plot heatmaps of the average and standard deviation attention matrices for each layer and attention head.
- Parameters:
avg_attn (torch.Tensor) – Mean attention matrices to be visualized, shape [L, H, N, N].
std_attn (torch.Tensor) – Standard deviation matrices to be visualized, shape [L, H, N, N].
kwargs (dict) – Optional plotting parameters:
- figsize: base size (width, height) for a single head
- fontsize: title font size
- Returns:
None
- Return type:
None
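A sketch using random statistics; in practice `avg_attn` and `std_attn` would come from compute_attention_statistics:

```python
import torch
from graphtoolbox.utils.attention import plot_attention_statistics

# Synthetic [L, H, N, N] statistics standing in for real outputs
avg_attn = torch.rand(2, 4, 12, 12)
std_attn = torch.rand(2, 4, 12, 12) * 0.1

plot_attention_statistics(avg_attn, std_attn, figsize=(3, 3), fontsize=10)
```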
- graphtoolbox.utils.attention.animate_grouped_attention(all_attentions: Tensor, edge_index: Tensor, group_variable: list | ndarray, group_name: str = 'Group', interval: int = 1000, mode: str = 'mean', save: bool = True, **kwargs)
Animate attention matrices (mean or standard deviation) across groups of graphs.
- Parameters:
all_attentions (torch.Tensor of shape [L, H, G, E]) – Attention scores for all layers (L), heads (H), graphs (G), and edges (E).
edge_index (torch.Tensor of shape [2, E]) – Edge indices indicating source and target nodes.
group_variable (array-like of length G) – Group identifier for each graph (e.g., time step, class, or cluster ID).
group_name (str, optional (default="Group")) – Name of the group variable to display in the animation title.
interval (int, optional (default=1000)) – Time interval between frames in milliseconds.
mode ({"mean", "std"}, optional (default="mean")) – Statistic to visualize: either the mean or standard deviation of attention scores.
save (bool, optional (default=True)) – If True, save the animation as a .gif file.
- Returns:
HTML animation of attention matrices for each group, rendered in Jupyter notebooks.
- Return type:
IPython.display.HTML
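A sketch with synthetic attention and a hypothetical hour-of-day grouping:

```python
import torch
import numpy as np
from IPython.display import display
from graphtoolbox.utils.attention import animate_grouped_attention

all_attentions = torch.rand(2, 4, 48, 20)   # [L, H, G, E], synthetic
edge_index = torch.randint(0, 12, (2, 20))  # [2, E]
groups = np.arange(48) % 24                 # hypothetical hour-of-day label per graph

anim = animate_grouped_attention(
    all_attentions, edge_index,
    group_variable=groups, group_name="Hour",
    interval=500, mode="mean", save=False,
)
display(anim)  # renders the HTML animation in a Jupyter notebook
```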
- graphtoolbox.utils.attention.pca_analysis_attention(all_attentions: Tensor, edge_index: Tensor, layer_idx: int = 0, head_idx: int = 0, n_components: int = 10) → None
Perform Principal Component Analysis (PCA) on a selected attention head across graphs.
This function visualizes:
- the explained variance of the principal components (PCs),
- a 2D projection of the data on the first two PCs,
- heatmaps of the top principal components reshaped into attention matrices.
- Parameters:
all_attentions (torch.Tensor of shape [L, H, G, E] or [G, E]) – Attention values. Can be the full tensor from a model or already flattened for a given head.
edge_index (torch.Tensor of shape [2, E] or [E, 2]) – Edge indices (source and target nodes).
layer_idx (int, optional (default=0)) – Index of the attention layer to analyze.
head_idx (int, optional (default=0)) – Index of the attention head to analyze.
n_components (int, optional (default=10)) – Number of principal components to extract and visualize.
- Returns:
Displays plots directly.
- Return type:
None
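A sketch on synthetic attention values; the layer and head indices are arbitrary:

```python
import torch
from graphtoolbox.utils.attention import pca_analysis_attention

all_attentions = torch.rand(2, 4, 50, 20)   # [L, H, G, E], synthetic
edge_index = torch.randint(0, 12, (2, 20))

# Inspect layer 0, head 2 and keep 5 principal components
pca_analysis_attention(all_attentions, edge_index, layer_idx=0, head_idx=2, n_components=5)
```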
- graphtoolbox.utils.attention.umap_analysis_attention(all_attentions: Tensor, edge_index: Tensor, layer_idx: int = 0, head_idx: int = 0, n_neighbors: int = 15, min_dist: float = 0.1, n_components: int = 2) → None
Perform UMAP-based dimensionality reduction on a selected attention head across graphs.
This function visualizes:
- a low-dimensional embedding of attention vectors using UMAP,
- a 2D scatter plot colored cyclically to reflect graph ordering (e.g., temporal).
- Parameters:
all_attentions (torch.Tensor of shape [L, H, G, E] or [G, E]) – Attention weights either for the entire model or already extracted for a given head.
edge_index (torch.Tensor of shape [2, E]) – Edge indices (source and target nodes), required to infer node count if needed.
layer_idx (int, optional (default=0)) – Index of the attention layer to analyze.
head_idx (int, optional (default=0)) – Index of the attention head to analyze.
n_neighbors (int, optional (default=15)) – Number of neighbors for the UMAP algorithm (controls local/global structure).
min_dist (float, optional (default=0.1)) – Minimum distance between embedded points (controls tightness of clusters).
n_components (int, optional (default=2)) – Number of output dimensions (typically 2 for visualization).
- Returns:
Displays a UMAP projection plot.
- Return type:
None
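A sketch on synthetic attention values, using the default UMAP settings documented above:

```python
import torch
from graphtoolbox.utils.attention import umap_analysis_attention

all_attentions = torch.rand(2, 4, 200, 20)  # [L, H, G, E], synthetic
edge_index = torch.randint(0, 12, (2, 20))

umap_analysis_attention(
    all_attentions, edge_index,
    layer_idx=0, head_idx=0,
    n_neighbors=15, min_dist=0.1, n_components=2,
)
```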
- graphtoolbox.utils.attention.normalized_laplacian(A: Tensor, eps: float = 1e-05) → Tensor
Compute the symmetric normalized Laplacian of an adjacency matrix.
- Parameters:
A (torch.Tensor of shape [N, N]) – Adjacency matrix of the graph (must be square and 2D).
eps (float, optional) – Small value added to avoid division by zero.
- Returns:
L – Symmetric normalized Laplacian matrix: L = I - D^{-1/2} A D^{-1/2}.
- Return type:
torch.Tensor of shape [N, N]
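A sketch on a random symmetric toy adjacency matrix:

```python
import torch
from graphtoolbox.utils.attention import normalized_laplacian

A = torch.rand(12, 12)
A = (A + A.T) / 2            # symmetric toy adjacency matrix
L = normalized_laplacian(A)  # L = I - D^{-1/2} A D^{-1/2}, shape [12, 12]
```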
- graphtoolbox.utils.attention.plot_spectral_gap(L: Tensor, max_k: int) → tuple[ndarray, int]
Plot the first max_k eigenvalues of a Laplacian and estimate the optimal number of clusters via the spectral gap.
- Parameters:
L (torch.Tensor of shape [N, N]) – Laplacian matrix.
max_k (int) – Number of smallest eigenvalues to consider.
- Returns:
eigvals (np.ndarray) – First max_k eigenvalues of the Laplacian.
optimal_k (int) – Estimated number of clusters based on the largest spectral gap (elbow method).
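A sketch continuing the toy Laplacian construction above; max_k=10 is an arbitrary choice:

```python
import torch
from graphtoolbox.utils.attention import normalized_laplacian, plot_spectral_gap

A = torch.rand(12, 12)
A = (A + A.T) / 2
L = normalized_laplacian(A)

eigvals, optimal_k = plot_spectral_gap(L, max_k=10)
print(optimal_k)  # number of clusters suggested by the largest spectral gap
```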
- graphtoolbox.utils.attention.spectral_embedding(L: Tensor, k: int, plot: bool = False) → Tensor
Compute the spectral embedding (first k eigenvectors of the Laplacian).
- Parameters:
L (torch.Tensor of shape [N, N]) – Laplacian matrix.
k (int) – Number of leading eigenvectors to return.
plot (bool, optional) – If True, plots the spectrum of eigenvalues.
- Returns:
embedding – Matrix of the first k eigenvectors.
- Return type:
torch.Tensor of shape [N, k]
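A sketch on the same kind of toy Laplacian; k=3 is arbitrary:

```python
import torch
from graphtoolbox.utils.attention import normalized_laplacian, spectral_embedding

A = torch.rand(12, 12)
A = (A + A.T) / 2
L = normalized_laplacian(A)

embedding = spectral_embedding(L, k=3, plot=True)  # shape [12, 3]
```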
- graphtoolbox.utils.attention.cosine_similarity_matrix(X: Tensor) → Tensor
Compute the cosine similarity matrix between rows of a matrix.
- Parameters:
X (torch.Tensor of shape [N, d]) – Input feature matrix.
- Returns:
similarity – Cosine similarity between all pairs of rows in X.
- Return type:
torch.Tensor of shape [N, N]
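A sketch with a random feature matrix standing in for, e.g., a spectral embedding:

```python
import torch
from graphtoolbox.utils.attention import cosine_similarity_matrix

X = torch.rand(12, 3)                     # [N, d] feature matrix
similarity = cosine_similarity_matrix(X)  # [12, 12] pairwise cosine similarities
```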
- graphtoolbox.utils.attention.spectral_fusion(lA: list[Tensor], k: int, **kwargs) → Tensor
Perform spectral fusion by computing the average cosine similarity of the spectral embeddings of several adjacency matrices.
- Parameters:
lA (list of torch.Tensor [N, N]) – List of adjacency matrices to fuse.
k (int) – Number of eigenvectors to use for each spectral embedding.
**kwargs (dict) – Optional arguments passed to spectral_embedding.
- Returns:
A_fused – Fused similarity matrix.
- Return type:
torch.Tensor of shape [N, N]
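A sketch fusing two random adjacency-like matrices; in practice these would be attention-derived adjacency matrices:

```python
import torch
from graphtoolbox.utils.attention import spectral_fusion

# Synthetic adjacency matrices, e.g., mean attention maps of two heads
A1 = torch.rand(12, 12)
A2 = torch.rand(12, 12)

A_fused = spectral_fusion([A1, A2], k=3)  # fused similarity matrix, shape [12, 12]
```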
- graphtoolbox.utils.attention.hierarchical_attention_fusion(attn_tensor: Tensor, k: int, **kwargs) → Tensor
Fuse attention maps hierarchically across heads and layers using spectral fusion.
- Parameters:
attn_tensor (torch.Tensor of shape [L, H, N, N]) – Attention matrices for L layers and H heads.
k (int) – Number of eigenvectors used in spectral embeddings.
**kwargs (dict) – Optional arguments passed to spectral_fusion.
- Returns:
A_final – Final fused similarity matrix after hierarchical fusion.
- Return type:
torch.Tensor of shape [N, N]
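A sketch on a random [L, H, N, N] tensor; real inputs would typically come from attention_to_dense or compute_attention_statistics:

```python
import torch
from graphtoolbox.utils.attention import hierarchical_attention_fusion

attn_tensor = torch.rand(2, 4, 12, 12)                     # [L, H, N, N], synthetic
A_final = hierarchical_attention_fusion(attn_tensor, k=3)  # shape [12, 12]
```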
- graphtoolbox.utils.attention.attention_to_dense(all_attentions: Tensor, edge_index: Tensor, num_nodes: int = 12) → Tensor
Project attention values from edge-list format to dense adjacency tensor.
The function maps an attention tensor of shape [L, H, G, E] (layers, heads, graphs, edges) into a dense tensor of shape [L, H, G, N, N] by scattering edge values onto the (source, target) positions given by edge_index.
- Parameters:
all_attentions (torch.Tensor) – Attention tensor with shape [L, H, G, E].
edge_index (torch.Tensor) – Edge indices; accepted shapes are [2, E] or [E, 2].
num_nodes (int) – Number of nodes N for the output dense adjacency matrices.
- Returns:
Dense attention tensor of shape [L, H, G, N, N].
- Return type:
torch.Tensor
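A sketch on synthetic edge-form attention:

```python
import torch
from graphtoolbox.utils.attention import attention_to_dense

all_attentions = torch.rand(2, 4, 8, 20)    # [L, H, G, E], synthetic
edge_index = torch.randint(0, 12, (2, 20))

dense = attention_to_dense(all_attentions, edge_index, num_nodes=12)
# dense has shape [L, H, G, N, N] = [2, 4, 8, 12, 12]
```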
- graphtoolbox.utils.attention.pca_per_head(all_attentions: Tensor, edge_index: Tensor, num_nodes: int, n_components: int = 10)
Perform PCA independently for each (layer, head) pair on dense attention matrices.
The function converts the edge-form attention tensor [L, H, G, E] into a dense tensor [L, H, G, N, N], reshapes each (layer, head) block into a [G, N*N] matrix, and runs PCA, returning results per (layer, head) pair.
- Parameters:
all_attentions (torch.Tensor) – Attention tensor of shape [L, H, G, E].
edge_index (torch.Tensor) – Edge indices, used to project edges into NxN dense format.
num_nodes (int) – Number of nodes N used for dense matrices.
n_components (int) – Maximum number of PCA components to compute per head.
- Returns:
A list of dicts, one per (layer, head), each containing:
- "layer": int
- "head": int
- "explained_variance": array of explained variance ratios
- "components": numpy array of shape [k, N, N] holding the principal components
- "scores": numpy array of shape [G, k] of transformed samples
- "mean_matrix": numpy array of shape [N, N], the mean attention matrix across the G graphs
- Return type:
list[dict]
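A sketch on synthetic inputs, printing the documented per-(layer, head) result fields:

```python
import torch
from graphtoolbox.utils.attention import pca_per_head

all_attentions = torch.rand(2, 4, 30, 20)   # [L, H, G, E], synthetic
edge_index = torch.randint(0, 12, (2, 20))

results = pca_per_head(all_attentions, edge_index, num_nodes=12, n_components=5)
for r in results:
    print(r["layer"], r["head"], r["explained_variance"][:2])
```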
- graphtoolbox.utils.attention.pca_global_mean(all_attentions: Tensor, edge_index: Tensor, num_nodes: int, n_components: int = 10)
PCA on the global mean attention matrix (averaged over layers and heads).
- Parameters:
all_attentions (torch.Tensor) – Attention tensor [L, H, G, E].
edge_index (torch.Tensor) – Edge indices used to form dense matrices.
num_nodes (int) – Number of nodes N.
n_components (int) – Number of PCA components.
- Returns:
Dictionary containing PCA results similar to pca_per_head.
- Return type:
dict
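A sketch on synthetic inputs; since the returned dictionary is only documented as "similar to pca_per_head", the sketch just inspects its keys:

```python
import torch
from graphtoolbox.utils.attention import pca_global_mean

all_attentions = torch.rand(2, 4, 30, 20)   # [L, H, G, E], synthetic
edge_index = torch.randint(0, 12, (2, 20))

global_pca = pca_global_mean(all_attentions, edge_index, num_nodes=12, n_components=5)
print(global_pca.keys())
```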
- graphtoolbox.utils.attention.plot_explained_variance(explained, title='Explained variance', **kwargs)
Plot explained variance ratio of PCA components.
- Parameters:
explained (array-like) – Iterable of explained variance ratios.
title (str) – Plot title.
kwargs (dict) – Optional plotting parameters (figsize).
- Returns:
None
- Return type:
None
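A sketch with made-up explained-variance ratios; figsize is the documented optional kwarg:

```python
from graphtoolbox.utils.attention import plot_explained_variance

# Illustrative ratios, e.g., taken from a pca_per_head result
explained = [0.45, 0.22, 0.12, 0.08, 0.05]
plot_explained_variance(explained, title="Layer 0, head 0", figsize=(5, 3))
```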
- graphtoolbox.utils.attention.plot_components(components: ndarray, **kwargs)
Plot a grid of component heatmaps.
- Parameters:
components (numpy.ndarray) – Array of principal components with shape [k, N, N].
kwargs (dict) – Optional arguments:
- max_cols: maximum number of columns in the grid (default 5)
- cmap: colormap (default 'rocket_r')
- suptitle: overall figure title
- figsize: figure size override
- Returns:
None
- Return type:
None
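A sketch with random component matrices standing in for a real pca_per_head result:

```python
import numpy as np
from graphtoolbox.utils.attention import plot_components

components = np.random.rand(6, 12, 12)  # [k, N, N], e.g., from a pca_per_head result
plot_components(components, max_cols=3, cmap="rocket_r", suptitle="Principal components")
```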