graphtoolbox.utils.attention

Functions

animate_grouped_attention(all_attentions, ...)

Animate attention matrices (mean or standard deviation) across groups of graphs.

attention_to_dense(all_attentions, edge_index)

Project attention values from edge-list format to dense adjacency tensor.

compute_attention_statistics(all_attentions, ...)

Compute per-head, per-layer attention mean and standard deviation mapped to adjacency matrices.

cosine_similarity_matrix(X)

Compute the cosine similarity matrix between rows of a matrix.

hierarchical_attention_fusion(attn_tensor, ...)

Fuse attention maps hierarchically across heads and layers using spectral fusion.

load_attention_batches(directory_path)

Load and assemble attention weights dumped in batch files produced by a model.

normalized_laplacian(A[, eps])

Compute the symmetric normalized Laplacian of an adjacency matrix.

pca_analysis_attention(all_attentions, ...)

Perform Principal Component Analysis (PCA) on a selected attention head across graphs.

pca_global_mean(all_attentions, edge_index, ...)

PCA on the global mean attention matrix (averaged over layers and heads).

pca_per_head(all_attentions, edge_index, ...)

Perform PCA independently for each (layer, head) pair on dense attention matrices.

plot_attention_statistics(avg_attn, ...)

Plot heatmaps of the average and standard deviation attention matrices for each layer and attention head.

plot_components(components, **kwargs)

Plot a grid of component heatmaps.

plot_explained_variance(explained[, title])

Plot explained variance ratio of PCA components.

plot_spectral_gap(L, max_k)

Plot the first max_k eigenvalues of a Laplacian and estimate the optimal number of clusters via the spectral gap.

spectral_embedding(L, k[, plot])

Compute the spectral embedding (first k eigenvectors of the Laplacian).

spectral_fusion(lA, k, **kwargs)

Perform spectral fusion by computing the average cosine similarity of the spectral embeddings of several adjacency matrices.

umap_analysis_attention(all_attentions, ...)

Perform UMAP-based dimensionality reduction on a selected attention head across graphs.

graphtoolbox.utils.attention.load_attention_batches(directory_path: str) → tuple[Tensor, Tensor][source]

Load and assemble attention weights dumped in batch files produced by a model.

Parameters:

directory_path (str) – Path to a directory containing attention dump files. The function expects files named like “num_batch{n}.pt” (for example “num_batch0.pt”). Files are processed in ascending numeric order.

Returns:

A tuple (all_attentions, edge_index):

  • all_attentions: tensor of shape [L, H, G_total, E_graph]

  • edge_index: tensor of shape [2, E_graph]

Return type:

tuple[torch.Tensor, torch.Tensor]

Raises:
  • RuntimeError – If no valid attention dump files are found in the directory.

  • ValueError – If sizes are inconsistent within a file.
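
A minimal usage sketch; the dump directory name below is a placeholder and assumes the files follow the "num_batch{n}.pt" naming described above.

>>> from graphtoolbox.utils.attention import load_attention_batches
>>> all_attentions, edge_index = load_attention_batches("runs/attention_dumps")  # hypothetical path
>>> all_attentions.shape  # [L, H, G_total, E_graph]
>>> edge_index.shape      # [2, E_graph]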

graphtoolbox.utils.attention.compute_attention_statistics(all_attentions: Tensor, edge_index: Tensor) → tuple[Tensor, Tensor][source]

Compute per-head, per-layer attention mean and standard deviation mapped to adjacency matrices.

Parameters:
  • all_attentions (torch.Tensor) – Attention values with shape (L, H, G, E) where L = number of layers, H = number of heads, G = number of graphs, E = number of edges.

  • edge_index (torch.Tensor or array-like) – Edge indices in shape (2, E) or (E, 2). Node ids need not be contiguous; they will be compacted to a contiguous range.

Returns:

(mean_adj, std_adj) both tensors of shape (L, H, n_used, n_used), where n_used is the number of unique nodes present in edge_index. Entries for absent node pairs are zero.

Return type:

tuple[torch.Tensor, torch.Tensor]

Raises:

ValueError – If edge_index does not have shape (2, E) or (E, 2).
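
Illustrative usage, reusing the tensors returned by load_attention_batches above:

>>> from graphtoolbox.utils.attention import compute_attention_statistics
>>> mean_adj, std_adj = compute_attention_statistics(all_attentions, edge_index)
>>> mean_adj.shape  # (L, H, n_used, n_used)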

graphtoolbox.utils.attention.plot_attention_statistics(avg_attn: Tensor, std_attn: Tensor, **kwargs) → None[source]

Plot heatmaps of the average and standard deviation attention matrices for each layer and attention head.

Parameters:
  • avg_attn (torch.Tensor) – Mean attention matrices to be visualized, shape [L, H, N, N].

  • std_attn (torch.Tensor) – Standard deviation matrices to be visualized, shape [L, H, N, N].

  • kwargs (dict) – Optional plotting parameters:
      - figsize: base size for a single head (width, height)
      - fontsize: title font size

Returns:

None

Return type:

None
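
A short sketch continuing the previous example; the figsize and fontsize values are arbitrary:

>>> from graphtoolbox.utils.attention import plot_attention_statistics
>>> plot_attention_statistics(mean_adj, std_adj, figsize=(3, 3), fontsize=10)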

graphtoolbox.utils.attention.animate_grouped_attention(all_attentions: Tensor, edge_index: Tensor, group_variable: list | ndarray, group_name: str = 'Group', interval: int = 1000, mode: str = 'mean', save: bool = True, **kwargs)[source]

Animate attention matrices (mean or standard deviation) across groups of graphs.

Parameters:
  • all_attentions (torch.Tensor of shape [L, H, G, E]) – Attention scores for all layers (L), heads (H), graphs (G), and edges (E).

  • edge_index (torch.Tensor of shape [2, E]) – Edge indices indicating source and target nodes.

  • group_variable (array-like of length G) – Group identifier for each graph (e.g., time step, class, or cluster ID).

  • group_name (str, optional (default="Group")) – Name of the group variable to display in the animation title.

  • interval (int, optional (default=1000)) – Time interval between frames in milliseconds.

  • mode ({"mean", "std"}, optional (default="mean")) – Statistic to visualize: either the mean or standard deviation of attention scores.

  • save (bool, optional (default=True)) – If True, save the animation as a .gif file.

Returns:

HTML animation of attention matrices for each group, rendered in Jupyter notebooks.

Return type:

IPython.display.HTML
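
Illustrative call; the monthly grouping below is hypothetical and simply assigns one of 12 labels to each graph:

>>> import numpy as np
>>> from graphtoolbox.utils.attention import animate_grouped_attention
>>> groups = np.arange(all_attentions.shape[2]) % 12  # hypothetical group label per graph
>>> anim = animate_grouped_attention(all_attentions, edge_index, groups,
...                                  group_name="Month", mode="mean", save=False)
>>> anim  # renders the HTML animation in a Jupyter notebook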

graphtoolbox.utils.attention.pca_analysis_attention(all_attentions: Tensor, edge_index: Tensor, layer_idx: int = 0, head_idx: int = 0, n_components: int = 10) → None[source]

Perform Principal Component Analysis (PCA) on a selected attention head across graphs.

This function visualizes:

  • The explained variance of the principal components (PCs),

  • A 2D projection of the data on the first two PCs,

  • Heatmaps of the top principal components reshaped into attention matrices.

Parameters:
  • all_attentions (torch.Tensor of shape [L, H, G, E] or [G, E]) – Attention values. Can be the full tensor from a model or already flattened for a given head.

  • edge_index (torch.Tensor of shape [2, E] or [E, 2]) – Edge indices (source and target nodes).

  • layer_idx (int, optional (default=0)) – Index of the attention layer to analyze.

  • head_idx (int, optional (default=0)) – Index of the attention head to analyze.

  • n_components (int, optional (default=10)) – Number of principal components to extract and visualize.

Returns:

Displays plots directly.

Return type:

None
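
A usage sketch; the layer/head indices and component count are arbitrary choices:

>>> from graphtoolbox.utils.attention import pca_analysis_attention
>>> pca_analysis_attention(all_attentions, edge_index, layer_idx=1, head_idx=2, n_components=5)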

graphtoolbox.utils.attention.umap_analysis_attention(all_attentions: Tensor, edge_index: Tensor, layer_idx: int = 0, head_idx: int = 0, n_neighbors: int = 15, min_dist: float = 0.1, n_components: int = 2) → None[source]

Perform UMAP-based dimensionality reduction on a selected attention head across graphs.

This function visualizes:

  • A low-dimensional embedding of attention vectors using UMAP,

  • A 2D scatter plot colored cyclically to reflect graph ordering (e.g., temporal).

Parameters:
  • all_attentions (torch.Tensor of shape [L, H, G, E] or [G, E]) – Attention weights either for the entire model or already extracted for a given head.

  • edge_index (torch.Tensor of shape [2, E]) – Edge indices (source and target nodes), required to infer node count if needed.

  • layer_idx (int, optional (default=0)) – Index of the attention layer to analyze.

  • head_idx (int, optional (default=0)) – Index of the attention head to analyze.

  • n_neighbors (int, optional (default=15)) – Number of neighbors for the UMAP algorithm (controls local/global structure).

  • min_dist (float, optional (default=0.1)) – Minimum distance between embedded points (controls tightness of clusters).

  • n_components (int, optional (default=2)) – Number of output dimensions (typically 2 for visualization).

Returns:

Displays a UMAP projection plot.

Return type:

None
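
Illustrative call; the UMAP hyperparameters below are arbitrary:

>>> from graphtoolbox.utils.attention import umap_analysis_attention
>>> umap_analysis_attention(all_attentions, edge_index, layer_idx=0, head_idx=0,
...                         n_neighbors=30, min_dist=0.05)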

graphtoolbox.utils.attention.normalized_laplacian(A: Tensor, eps: float = 1e-05) → Tensor[source]

Compute the symmetric normalized Laplacian of an adjacency matrix.

Parameters:
  • A (torch.Tensor of shape [N, N]) – Adjacency matrix of the graph (must be square and 2D).

  • eps (float, optional) – Small value added to avoid division by zero.

Returns:

L – Symmetric normalized Laplacian matrix: L = I - D^{-1/2} A D^{-1/2}.

Return type:

torch.Tensor of shape [N, N]
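
A small sketch, assuming mean_adj from compute_attention_statistics is available; any square [N, N] adjacency matrix can be passed instead:

>>> from graphtoolbox.utils.attention import normalized_laplacian
>>> A = mean_adj[0, 0]            # e.g. mean attention of layer 0, head 0
>>> L = normalized_laplacian(A)   # L = I - D^{-1/2} A D^{-1/2}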

graphtoolbox.utils.attention.plot_spectral_gap(L: Tensor, max_k: int) → tuple[ndarray, int][source]

Plot the first max_k eigenvalues of a Laplacian and estimate the optimal number of clusters via the spectral gap.

Parameters:
  • L (torch.Tensor of shape [N, N]) – Laplacian matrix.

  • max_k (int) – Number of smallest eigenvalues to consider.

Returns:

  • eigvals (np.ndarray) – First max_k eigenvalues of the Laplacian.

  • optimal_k (int) – Estimated number of clusters based on the largest spectral gap (elbow method).
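
Continuing from the Laplacian example above; max_k=10 is an arbitrary choice:

>>> from graphtoolbox.utils.attention import plot_spectral_gap
>>> eigvals, optimal_k = plot_spectral_gap(L, max_k=10)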

graphtoolbox.utils.attention.spectral_embedding(L: Tensor, k: int, plot: bool = False) → Tensor[source]

Compute the spectral embedding (first k eigenvectors of the Laplacian).

Parameters:
  • L (torch.Tensor of shape [N, N]) – Laplacian matrix.

  • k (int) – Number of leading eigenvectors to return.

  • plot (bool, optional) – If True, plots the spectrum of eigenvalues.

Returns:

embedding – Matrix of the first k eigenvectors.

Return type:

torch.Tensor of shape [N, k]
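
Illustrative usage with the optimal_k estimated above:

>>> from graphtoolbox.utils.attention import spectral_embedding
>>> U = spectral_embedding(L, k=optimal_k, plot=True)  # shape [N, k]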

graphtoolbox.utils.attention.cosine_similarity_matrix(X: Tensor) → Tensor[source]

Compute the cosine similarity matrix between rows of a matrix.

Parameters:

X (torch.Tensor of shape [N, d]) – Input feature matrix.

Returns:

similarity – Cosine similarity between all pairs of rows in X.

Return type:

torch.Tensor of shape [N, N]
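
A short sketch; applied to a spectral embedding it yields an [N, N] node-similarity matrix:

>>> from graphtoolbox.utils.attention import cosine_similarity_matrix
>>> S = cosine_similarity_matrix(U)  # pairwise cosine similarity of the rows of U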

graphtoolbox.utils.attention.spectral_fusion(lA: list[Tensor], k: int, **kwargs) → Tensor[source]

Perform spectral fusion by computing the average cosine similarity of the spectral embeddings of several adjacency matrices.

Parameters:
  • lA (list of torch.Tensor [N, N]) – List of adjacency matrices to fuse.

  • k (int) – Number of eigenvectors to use for each spectral embedding.

  • **kwargs (dict) – Optional arguments passed to spectral_embedding.

Returns:

A_fused – Fused similarity matrix.

Return type:

torch.Tensor of shape [N, N]
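
Illustrative call that fuses all heads of the first layer of mean_adj; the choice of matrices and k=4 are arbitrary:

>>> from graphtoolbox.utils.attention import spectral_fusion
>>> heads = [mean_adj[0, h] for h in range(mean_adj.shape[1])]  # all heads of layer 0
>>> A_fused = spectral_fusion(heads, k=4)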

graphtoolbox.utils.attention.hierarchical_attention_fusion(attn_tensor: Tensor, k: int, **kwargs) → Tensor[source]

Fuse attention maps hierarchically across heads and layers using spectral fusion.

Parameters:
  • attn_tensor (torch.Tensor of shape [L, H, N, N]) – Attention matrices for L layers and H heads.

  • k (int) – Number of eigenvectors used in spectral embeddings.

  • **kwargs (dict) – Optional arguments passed to spectral_fusion.

Returns:

A_final – Final fused similarity matrix after hierarchical fusion.

Return type:

torch.Tensor of shape [N, N]
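
A usage sketch, assuming mean_adj of shape [L, H, N, N] from compute_attention_statistics; k=4 is arbitrary:

>>> from graphtoolbox.utils.attention import hierarchical_attention_fusion
>>> A_final = hierarchical_attention_fusion(mean_adj, k=4)  # [N, N]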

graphtoolbox.utils.attention.attention_to_dense(all_attentions: Tensor, edge_index: Tensor, num_nodes: int = 12) → Tensor[source]

Project attention values from edge-list format to dense adjacency tensor.

The function maps an attention tensor of shape [L, H, G, E] (layers, heads, graphs, edges) into a dense tensor of shape [L, H, G, N, N] by scattering edge values on (source, target) indices given by edge_index.

Parameters:
  • all_attentions (torch.Tensor) – Attention tensor with shape [L, H, G, E].

  • edge_index (torch.Tensor) – Edge indices; accepted shapes are [2, E] or [E, 2].

  • num_nodes (int) – Number of nodes N for the output dense adjacency matrices.

Returns:

Dense attention tensor of shape [L, H, G, N, N].

Return type:

torch.Tensor
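
Illustrative call; num_nodes=12 matches the default but should be set to the actual node count of the graphs:

>>> from graphtoolbox.utils.attention import attention_to_dense
>>> dense = attention_to_dense(all_attentions, edge_index, num_nodes=12)
>>> dense.shape  # [L, H, G, N, N]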

graphtoolbox.utils.attention.pca_per_head(all_attentions: Tensor, edge_index: Tensor, num_nodes: int, n_components: int = 10)[source]

Perform PCA independently for each (layer, head) pair on dense attention matrices.

The function converts the edge-form attention tensor [L, H, G, E] into dense [L, H, G, N, N], reshapes each (layer, head) block into a [G, N*N] matrix and runs PCA, returning per-(layer,head) results.

Parameters:
  • all_attentions (torch.Tensor) – Attention tensor of shape [L, H, G, E].

  • edge_index (torch.Tensor) – Edge indices, used to project edges into NxN dense format.

  • num_nodes (int) – Number of nodes N used for dense matrices.

  • n_components (int) – Maximum number of PCA components to compute per head.

Returns:

A list of dicts, one per (layer, head), each containing:

  • "layer": int

  • "head": int

  • "explained_variance": array of explained variance ratios

  • "components": numpy array of shape [k, N, N] holding the principal components

  • "scores": numpy array of shape [G, k] of transformed samples

  • "mean_matrix": numpy array of shape [N, N], the mean attention matrix across the G graphs

Return type:

list[dict]
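
A minimal sketch; num_nodes and n_components are placeholders to adapt to the data at hand:

>>> from graphtoolbox.utils.attention import pca_per_head
>>> results = pca_per_head(all_attentions, edge_index, num_nodes=12, n_components=5)
>>> results[0]["layer"], results[0]["head"], results[0]["explained_variance"]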

graphtoolbox.utils.attention.pca_global_mean(all_attentions: Tensor, edge_index: Tensor, num_nodes: int, n_components: int = 10)[source]

PCA on the global mean attention matrix (averaged over layers and heads).

Parameters:
  • all_attentions (torch.Tensor) – Attention tensor [L, H, G, E].

  • edge_index (torch.Tensor) – Edge indices used to form dense matrices.

  • num_nodes (int) – Number of nodes N.

  • n_components (int) – Number of PCA components.

Returns:

Dictionary containing PCA results similar to pca_per_head.

Return type:

dict
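
Illustrative usage mirroring the pca_per_head example:

>>> from graphtoolbox.utils.attention import pca_global_mean
>>> global_pca = pca_global_mean(all_attentions, edge_index, num_nodes=12, n_components=5)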

graphtoolbox.utils.attention.plot_explained_variance(explained, title='Explained variance', **kwargs)[source]

Plot explained variance ratio of PCA components.

Parameters:
  • explained (array-like) – Iterable of explained variance ratios.

  • title (str) – Plot title.

  • kwargs (dict) – Optional plotting parameters (figsize).

Returns:

None

Return type:

None
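
A short sketch reusing the per-head PCA results from above; the title is arbitrary:

>>> from graphtoolbox.utils.attention import plot_explained_variance
>>> plot_explained_variance(results[0]["explained_variance"], title="Layer 0, head 0")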

graphtoolbox.utils.attention.plot_components(components: ndarray, **kwargs)[source]

Plot a grid of component heatmaps.

Parameters:
  • components (numpy.ndarray) – Array of principal components with shape [k, N, N].

  • kwargs (dict) – Optional arguments:
      - max_cols: maximum columns in the grid (default 5)
      - cmap: colormap (default 'rocket_r')
      - suptitle: overall figure title
      - figsize: figure size override

Returns:

None

Return type:

None
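
Illustrative call on the components of the first (layer, head) result; the keyword values are arbitrary:

>>> from graphtoolbox.utils.attention import plot_components
>>> plot_components(results[0]["components"], max_cols=4, suptitle="Layer 0, head 0")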