graphtoolbox.utils.attention

Functions

animate_grouped_attention(all_attentions, ...)

Animate attention matrices (mean or standard deviation) across groups of graphs.

attention_to_dense(all_attentions, edge_index)

Project attention values from edge-list format to dense adjacency tensor.

compute_attention_statistics(all_attentions, ...)

Compute per-head, per-layer attention mean and standard deviation mapped to adjacency matrices.

cosine_similarity_matrix(X)

Compute the cosine similarity matrix between rows of a matrix.

hierarchical_attention_fusion(attn_tensor, ...)

Fuse attention maps hierarchically across heads and layers using spectral fusion.

load_attention_batches(directory_path)

Load and assemble attention weights dumped in batch files produced by a model.

normalized_laplacian(A[, eps])

Compute the symmetric normalized Laplacian of an adjacency matrix.

pca_analysis_attention(all_attentions, ...)

Perform Principal Component Analysis (PCA) on a selected attention head across graphs.

pca_global_mean(all_attentions, edge_index, ...)

PCA on the global mean attention matrix (averaged over layers and heads).

pca_per_head(all_attentions, edge_index, ...)

Perform PCA independently for each (layer, head) pair on dense attention matrices.

plot_attention_statistics(avg_attn, ...)

Plot heatmaps of the average and standard deviation attention matrices for each layer and attention head.

plot_components(components, **kwargs)

Plot a grid of component heatmaps.

plot_explained_variance(explained[, title])

Plot explained variance ratio of PCA components.

plot_spectral_gap(L, max_k)

Plot the first max_k eigenvalues of a Laplacian and estimate the optimal number of clusters via the spectral gap.

spectral_embedding(L, k[, plot])

Compute the spectral embedding (first k eigenvectors of the Laplacian).

spectral_fusion(lA, k, **kwargs)

Perform spectral fusion by computing the average cosine similarity of the spectral embeddings of several adjacency matrices.

umap_analysis_attention(all_attentions, ...)

Perform UMAP-based dimensionality reduction on a selected attention head across graphs.

graphtoolbox.utils.attention.load_attention_batches(directory_path: str) → tuple[Tensor, Tensor][source]

Load and assemble attention weights dumped in batch files produced by a model.

Parameters:

directory_path (str) – Path to a directory containing attention dump files. The function expects files named like “num_batch{n}.pt” (for example “num_batch0.pt”). Files are processed in ascending numeric order.

Returns:

A tuple (all_attentions, edge_index):

  • all_attentions: tensor of shape [L, H, G_total, E_graph]

  • edge_index: tensor of shape [2, E_graph]

Return type:

tuple[torch.Tensor, torch.Tensor]

Raises:
  • RuntimeError – If no valid attention dump files are found in the directory.

  • ValueError – If sizes are inconsistent within a file.
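
A minimal usage sketch; the dump directory name below is a placeholder and assumes the files follow the "num_batch{n}.pt" naming described above.

>>> from graphtoolbox.utils.attention import load_attention_batches
>>> all_attentions, edge_index = load_attention_batches("runs/attention_dumps")  # hypothetical path
>>> all_attentions.shape  # [L, H, G_total, E_graph]
>>> edge_index.shape      # [2, E_graph]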

graphtoolbox.utils.attention.compute_attention_statistics(all_attentions: Tensor, edge_index: Tensor) → tuple[Tensor, Tensor][source]

Compute per-head, per-layer attention mean and standard deviation mapped to adjacency matrices.

Parameters:
  • all_attentions (torch.Tensor) – Attention values with shape (L, H, G, E) where L = number of layers, H = number of heads, G = number of graphs, E = number of edges.

  • edge_index (torch.Tensor or array-like) – Edge indices in shape (2, E) or (E, 2). Node ids need not be contiguous; they will be compacted to a contiguous range.

Returns:

(mean_adj, std_adj) both tensors of shape (L, H, n_used, n_used), where n_used is the number of unique nodes present in edge_index. Entries for absent node pairs are zero.

Return type:

tuple[torch.Tensor, torch.Tensor]

Raises:

ValueError – If edge_index does not have shape (2, E) or (E, 2).
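
Illustrative usage, reusing the tensors returned by load_attention_batches above:

>>> from graphtoolbox.utils.attention import compute_attention_statistics
>>> mean_adj, std_adj = compute_attention_statistics(all_attentions, edge_index)
>>> mean_adj.shape  # (L, H, n_used, n_used)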

graphtoolbox.utils.attention.plot_attention_statistics(avg_attn: Tensor, std_attn: Tensor, **kwargs) → None[source]

Plot heatmaps of the average and standard deviation attention matrices for each layer and attention head.

Parameters:
  • avg_attn (torch.Tensor) – Mean attention matrices to be visualized, shape [L, H, N, N].

  • std_attn (torch.Tensor) – Standard deviation matrices to be visualized, shape [L, H, N, N].

  • kwargs (dict) – Optional plotting parameters:
      - figsize: base size for a single head (width, height)
      - fontsize: title font size

Returns:

None

Return type:

None
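
A short sketch continuing the previous example; the figsize and fontsize values are arbitrary:

>>> from graphtoolbox.utils.attention import plot_attention_statistics
>>> plot_attention_statistics(mean_adj, std_adj, figsize=(3, 3), fontsize=10)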

graphtoolbox.utils.attention.animate_grouped_attention(all_attentions: Tensor, edge_index: Tensor, group_variable: list | ndarray, group_name: str = 'Group', interval: int = 1000, mode: str = 'mean', save: bool = True, **kwargs)[source]

Animate attention matrices (mean or standard deviation) across groups of graphs.

Parameters:
  • all_attentions (torch.Tensor of shape [L, H, G, E]) – Attention scores for all layers (L), heads (H), graphs (G), and edges (E).

  • edge_index (torch.Tensor of shape [2, E]) – Edge indices indicating source and target nodes.

  • group_variable (array-like of length G) – Group identifier for each graph (e.g., time step, class, or cluster ID).

  • group_name (str, optional (default="Group")) – Name of the group variable to display in the animation title.

  • interval (int, optional (default=1000)) – Time interval between frames in milliseconds.

  • mode ({"mean", "std"}, optional (default="mean")) – Statistic to visualize: either the mean or standard deviation of attention scores.

  • save (bool, optional (default=True)) – If True, save the animation as a .gif file.

Returns:

HTML animation of attention matrices for each group, rendered in Jupyter notebooks.

Return type:

IPython.display.HTML
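
Illustrative call; the monthly grouping below is hypothetical and simply assigns one of 12 labels to each graph:

>>> import numpy as np
>>> from graphtoolbox.utils.attention import animate_grouped_attention
>>> groups = np.arange(all_attentions.shape[2]) % 12  # hypothetical group label per graph
>>> anim = animate_grouped_attention(all_attentions, edge_index, groups,
...                                  group_name="Month", mode="mean", save=False)
>>> anim  # renders the HTML animation in a Jupyter notebook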

graphtoolbox.utils.attention.pca_analysis_attention(all_attentions: Tensor, edge_index: Tensor, layer_idx: int = 0, head_idx: int = 0, n_components: int = 10) → None[source]

Perform Principal Component Analysis (PCA) on a selected attention head across graphs.

This function visualizes:

  • The explained variance of the principal components (PCs),

  • A 2D projection of the data on the first two PCs,

  • Heatmaps of the top principal components reshaped into attention matrices.

Parameters:
  • all_attentions (torch.Tensor of shape [L, H, G, E] or [G, E]) – Attention values. Can be the full tensor from a model or already flattened for a given head.

  • edge_index (torch.Tensor of shape [2, E] or [E, 2]) – Edge indices (source and target nodes).

  • layer_idx (int, optional (default=0)) – Index of the attention layer to analyze.

  • head_idx (int, optional (default=0)) – Index of the attention head to analyze.

  • n_components (int, optional (default=10)) – Number of principal components to extract and visualize.

Returns:

Displays plots directly.

Return type:

None
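
A usage sketch; the layer/head indices and component count are arbitrary choices:

>>> from graphtoolbox.utils.attention import pca_analysis_attention
>>> pca_analysis_attention(all_attentions, edge_index, layer_idx=1, head_idx=2, n_components=5)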

graphtoolbox.utils.attention.umap_analysis_attention(all_attentions: Tensor, edge_index: Tensor, layer_idx: int = 0, head_idx: int = 0, n_neighbors: int = 15, min_dist: float = 0.1, n_components: int = 2) → None[source]

Perform UMAP-based dimensionality reduction on a selected attention head across graphs.

This function visualizes:

  • A low-dimensional embedding of attention vectors using UMAP,

  • A 2D scatter plot colored cyclically to reflect graph ordering (e.g., temporal).

Parameters:
  • all_attentions (torch.Tensor of shape [L, H, G, E] or [G, E]) – Attention weights either for the entire model or already extracted for a given head.

  • edge_index (torch.Tensor of shape [2, E]) – Edge indices (source and target nodes), required to infer node count if needed.

  • layer_idx (int, optional (default=0)) – Index of the attention layer to analyze.

  • head_idx (int, optional (default=0)) – Index of the attention head to analyze.

  • n_neighbors (int, optional (default=15)) – Number of neighbors for the UMAP algorithm (controls local/global structure).

  • min_dist (float, optional (default=0.1)) – Minimum distance between embedded points (controls tightness of clusters).

  • n_components (int, optional (default=2)) – Number of output dimensions (typically 2 for visualization).

Returns:

Displays a UMAP projection plot.

Return type:

None
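
Illustrative call; the UMAP hyperparameters below are arbitrary:

>>> from graphtoolbox.utils.attention import umap_analysis_attention
>>> umap_analysis_attention(all_attentions, edge_index, layer_idx=0, head_idx=0,
...                         n_neighbors=30, min_dist=0.05)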

graphtoolbox.utils.attention.normalized_laplacian(A: Tensor, eps: float = 1e-05) → Tensor[source]

Compute the symmetric normalized Laplacian of an adjacency matrix.

Parameters:
  • A (torch.Tensor of shape [N, N]) – Adjacency matrix of the graph (must be square and 2D).

  • eps (float, optional) – Small value added to avoid division by zero.

Returns:

L – Symmetric normalized Laplacian matrix: L = I - D^{-1/2} A D^{-1/2}.

Return type:

torch.Tensor of shape [N, N]
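
A small sketch, assuming mean_adj from compute_attention_statistics is available; any square [N, N] adjacency matrix can be passed instead:

>>> from graphtoolbox.utils.attention import normalized_laplacian
>>> A = mean_adj[0, 0]            # e.g. mean attention of layer 0, head 0
>>> L = normalized_laplacian(A)   # L = I - D^{-1/2} A D^{-1/2}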

graphtoolbox.utils.attention.plot_spectral_gap(L: Tensor, max_k: int) → tuple[ndarray, int][source]

Plot the first max_k eigenvalues of a Laplacian and estimate the optimal number of clusters via the spectral gap.

Parameters:
  • L (torch.Tensor of shape [N, N]) – Laplacian matrix.

  • max_k (int) – Number of smallest eigenvalues to consider.

Returns:

  • eigvals (np.ndarray) – First max_k eigenvalues of the Laplacian.

  • optimal_k (int) – Estimated number of clusters based on the largest spectral gap (elbow method).
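
Continuing from the Laplacian example above; max_k=10 is an arbitrary choice:

>>> from graphtoolbox.utils.attention import plot_spectral_gap
>>> eigvals, optimal_k = plot_spectral_gap(L, max_k=10)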

graphtoolbox.utils.attention.spectral_embedding(L: Tensor, k: int, plot: bool = False) → Tensor[source]

Compute the spectral embedding (first k eigenvectors of the Laplacian).

Parameters:
  • L (torch.Tensor of shape [N, N]) – Laplacian matrix.

  • k (int) – Number of leading eigenvectors to return.

  • plot (bool, optional) – If True, plots the spectrum of eigenvalues.

Returns:

embedding – Matrix of the first k eigenvectors.

Return type:

torch.Tensor of shape [N, k]
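
Illustrative usage with the optimal_k estimated above:

>>> from graphtoolbox.utils.attention import spectral_embedding
>>> U = spectral_embedding(L, k=optimal_k, plot=True)  # shape [N, k]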

graphtoolbox.utils.attention.cosine_similarity_matrix(X: Tensor) → Tensor[source]

Compute the cosine similarity matrix between rows of a matrix.

Parameters:

X (torch.Tensor of shape [N, d]) – Input feature matrix.

Returns:

similarity – Cosine similarity between all pairs of rows in X.

Return type:

torch.Tensor of shape [N, N]
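
A short sketch; applied to a spectral embedding it yields an [N, N] node-similarity matrix:

>>> from graphtoolbox.utils.attention import cosine_similarity_matrix
>>> S = cosine_similarity_matrix(U)  # pairwise cosine similarity of the rows of U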

graphtoolbox.utils.attention.spectral_fusion(lA: list[Tensor], k: int, **kwargs) → Tensor[source]

Perform spectral fusion by computing the average cosine similarity of the spectral embeddings of several adjacency matrices.

Parameters:
  • lA (list of torch.Tensor [N, N]) – List of adjacency matrices to fuse.

  • k (int) – Number of eigenvectors to use for each spectral embedding.

  • **kwargs (dict) – Optional arguments passed to spectral_embedding.

Returns:

A_fused – Fused similarity matrix.

Return type:

torch.Tensor of shape [N, N]
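
Illustrative call that fuses all heads of the first layer of mean_adj; the choice of matrices and k=4 are arbitrary:

>>> from graphtoolbox.utils.attention import spectral_fusion
>>> heads = [mean_adj[0, h] for h in range(mean_adj.shape[1])]  # all heads of layer 0
>>> A_fused = spectral_fusion(heads, k=4)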

graphtoolbox.utils.attention.hierarchical_attention_fusion(attn_tensor: Tensor, k: int, **kwargs) → Tensor[source]

Fuse attention maps hierarchically across heads and layers using spectral fusion.

Parameters:
  • attn_tensor (torch.Tensor of shape [L, H, N, N]) – Attention matrices for L layers and H heads.

  • k (int) – Number of eigenvectors used in spectral embeddings.

  • **kwargs (dict) – Optional arguments passed to spectral_fusion.

Returns:

A_final – Final fused similarity matrix after hierarchical fusion.

Return type:

torch.Tensor of shape [N, N]
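
A usage sketch, assuming mean_adj of shape [L, H, N, N] from compute_attention_statistics; k=4 is arbitrary:

>>> from graphtoolbox.utils.attention import hierarchical_attention_fusion
>>> A_final = hierarchical_attention_fusion(mean_adj, k=4)  # [N, N]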

graphtoolbox.utils.attention.attention_to_dense(all_attentions: Tensor, edge_index: Tensor, num_nodes: int = 12) → Tensor[source]

Project attention values from edge-list format to dense adjacency tensor.

The function maps an attention tensor of shape [L, H, G, E] (layers, heads, graphs, edges) into a dense tensor of shape [L, H, G, N, N] by scattering edge values on (source, target) indices given by edge_index.

Parameters:
  • all_attentions (torch.Tensor) – Attention tensor with shape [L, H, G, E].

  • edge_index (torch.Tensor) – Edge indices; accepted shapes are [2, E] or [E, 2].

  • num_nodes (int) – Number of nodes N for the output dense adjacency matrices.

Returns:

Dense attention tensor of shape [L, H, G, N, N].

Return type:

torch.Tensor
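
Illustrative call; num_nodes=12 matches the default but should be set to the actual node count of the graphs:

>>> from graphtoolbox.utils.attention import attention_to_dense
>>> dense = attention_to_dense(all_attentions, edge_index, num_nodes=12)
>>> dense.shape  # [L, H, G, N, N]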

graphtoolbox.utils.attention.pca_per_head(all_attentions: Tensor, edge_index: Tensor, num_nodes: int, n_components: int = 10)[source]

Perform PCA independently for each (layer, head) pair on dense attention matrices.

The function converts the edge-form attention tensor [L, H, G, E] into dense [L, H, G, N, N], reshapes each (layer, head) block into a [G, N*N] matrix and runs PCA, returning per-(layer,head) results.

Parameters:
  • all_attentions (torch.Tensor) – Attention tensor of shape [L, H, G, E].

  • edge_index (torch.Tensor) – Edge indices, used to project edges into NxN dense format.

  • num_nodes (int) – Number of nodes N used for dense matrices.

  • n_components (int) – Maximum number of PCA components to compute per head.

Returns:

A list of dicts, one per (layer, head), each containing:

  • "layer": int

  • "head": int

  • "explained_variance": array of explained variance ratios

  • "components": numpy array of shape [k, N, N] holding the principal components

  • "scores": numpy array of shape [G, k] of transformed samples

  • "mean_matrix": numpy array of shape [N, N], the mean attention matrix across the G graphs

Return type:

list[dict]
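
A minimal sketch; num_nodes and n_components are placeholders to adapt to the data at hand:

>>> from graphtoolbox.utils.attention import pca_per_head
>>> results = pca_per_head(all_attentions, edge_index, num_nodes=12, n_components=5)
>>> results[0]["layer"], results[0]["head"], results[0]["explained_variance"]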

graphtoolbox.utils.attention.pca_global_mean(all_attentions: Tensor, edge_index: Tensor, num_nodes: int, n_components: int = 10)[source]

PCA on the global mean attention matrix (averaged over layers and heads).

Parameters:
  • all_attentions (torch.Tensor) – Attention tensor [L, H, G, E].

  • edge_index (torch.Tensor) – Edge indices used to form dense matrices.

  • num_nodes (int) – Number of nodes N.

  • n_components (int) – Number of PCA components.

Returns:

Dictionary containing PCA results similar to pca_per_head.

Return type:

dict
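
Illustrative usage mirroring the pca_per_head example:

>>> from graphtoolbox.utils.attention import pca_global_mean
>>> global_pca = pca_global_mean(all_attentions, edge_index, num_nodes=12, n_components=5)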

graphtoolbox.utils.attention.plot_explained_variance(explained, title='Explained variance', **kwargs)[source]

Plot explained variance ratio of PCA components.

Parameters:
  • explained (array-like) – Iterable of explained variance ratios.

  • title (str) – Plot title.

  • kwargs (dict) – Optional plotting parameters (figsize).

Returns:

None

Return type:

None
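
A short sketch reusing the per-head PCA results from above; the title is arbitrary:

>>> from graphtoolbox.utils.attention import plot_explained_variance
>>> plot_explained_variance(results[0]["explained_variance"], title="Layer 0, head 0")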

graphtoolbox.utils.attention.plot_components(components: ndarray, **kwargs)[source]

Plot a grid of component heatmaps.

Parameters:
  • components (numpy.ndarray) – Array of principal components with shape [k, N, N].

  • kwargs (dict) – Optional arguments:
      - max_cols: maximum columns in the grid (default 5)
      - cmap: colormap (default 'rocket_r')
      - suptitle: overall figure title
      - figsize: figure size override

Returns:

None

Return type:

None
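
Illustrative call on the components of the first (layer, head) result; the keyword values are arbitrary:

>>> from graphtoolbox.utils.attention import plot_components
>>> plot_components(results[0]["components"], max_cols=4, suptitle="Layer 0, head 0")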