Functional Attention:
From Pairwise Affinities to Functional Correspondences

Jiefang Xiao¹,², Maolin Gao¹,², Simon Weber³, Guandao Yang⁴, Daniel Cremers¹,²
¹TU Munich   ²MCML   ³PIXL, University of Oxford   ⁴ECE, UT Austin
ICML 2026

Functional Attention (FuncAttn) reinterprets attention as a functional correspondence between learned bases. Queries, keys, and values are projected into a compact spectral space, where a regularized least-squares solve yields a k×k linear operator that transports information between the two spaces. Replacing the n×n token-affinity matrix with this k×k operator reduces the pairwise-interaction term from O(n²) to O(k²), with k ≪ n.
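To make the pipeline concrete, here is a minimal NumPy sketch of one FuncAttn-style head, written under stated assumptions rather than taken from the authors' code. The names `orthonormal_basis` and `functional_attention`, the weight matrices, and the choice of building each learned basis as a linear projection orthonormalized by a reduced QR are all hypothetical. The operator C follows the standard functional-maps ridge solution C = Q̂K̂ᵀ(K̂K̂ᵀ + λI)⁻¹, so no n×n matrix is ever formed.

```python
import numpy as np

def orthonormal_basis(x, w_phi):
    # Hypothetical learned basis: project n tokens to k channels, then
    # orthonormalize the k columns with a reduced QR, so analysis is phi.T @ f.
    phi, _ = np.linalg.qr(x @ w_phi)          # (n, k)
    return phi

def functional_attention(x, w_q, w_k, w_v, w_phi_q, w_phi_k, lam=1e-3):
    """Sketch of a single head; x is (n, d_in), output is (n, d)."""
    phi_q = orthonormal_basis(x, w_phi_q)     # basis of the "query" space
    phi_k = orthonormal_basis(x, w_phi_k)     # basis of the "key" space
    # Spectral coefficients, each (k, d): O(n k d) work, no n x n matrix.
    q_hat = phi_q.T @ (x @ w_q)
    k_hat = phi_k.T @ (x @ w_k)
    v_hat = phi_k.T @ (x @ w_v)
    # Regularized least squares, as in functional maps:
    #   C = argmin ||C k_hat - q_hat||_F^2 + lam ||C||_F^2
    #     = q_hat k_hat^T (k_hat k_hat^T + lam I)^{-1}
    k = k_hat.shape[0]
    gram = k_hat @ k_hat.T + lam * np.eye(k)  # symmetric positive definite
    C = np.linalg.solve(gram, k_hat @ q_hat.T).T   # (k, k)
    # Transport value coefficients into the query basis, then synthesize.
    y = phi_q @ (C @ v_hat)                   # (n, d)
    return y, C
```

Because every output is synthesized from the k columns of `phi_q`, the result lies in that k-dimensional subspace, and the only token-to-token interaction is the k×k matrix `C`. The QR and projections still cost O(nk²) and O(nkd), which is linear in n.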

Abstract

Learning mappings between infinite-dimensional function spaces, or operator learning, is essential for many machine learning applications. Although transformer-based operators are popular, they often rely on token-wise attention. These methods treat continuous fields as discrete tokens and usually ignore the global functional structure. We introduce Functional Attention, which reinterprets attention as a functional correspondence between adaptive bases. Inspired by geometric functional maps, our method replaces softmax affinities with structured linear operators. This yields a compact, generalizable, resolution-invariant representation that explicitly captures global dependencies. Experiments demonstrate that Functional Attention can match state-of-the-art performance in many operator learning tasks, including solving PDEs, 3D segmentation, and regression, while remaining robust to varying discretizations.
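The resolution-invariance claim can be illustrated with a toy setup: if the basis functions are functions on the domain and the spectral coefficients are computed as quadrature approximations of integrals, then the k×k operator converges as the grid is refined, while a softmax affinity matrix would change shape with n. This sketch assumes a fixed cosine basis on [0, 1] in place of learned bases, midpoint quadrature weights 1/n, and two hypothetical sets of descriptor functions; `cosine_basis` and `operator_at_resolution` are illustrative names, not part of the paper.

```python
import numpy as np

def cosine_basis(x, k):
    # (n, k) matrix with columns cos(pi * j * x), j = 0..k-1, on [0, 1].
    return np.cos(np.pi * np.outer(x, np.arange(k)))

def operator_at_resolution(n, k=4, lam=1e-1):
    x = (np.arange(n) + 0.5) / n              # midpoint grid on [0, 1]
    w = 1.0 / n                               # quadrature weight per sample
    phi = cosine_basis(x, k)
    # Hypothetical source/target descriptor functions (d = 3 each).
    src = np.stack([np.sin(2 * np.pi * x), x, x**2], axis=1)
    tgt = np.stack([np.cos(2 * np.pi * x), x**3, np.exp(-x)], axis=1)
    a = (w * phi.T) @ src                     # (k, d), approximates ∫ phi_j f dx
    b = (w * phi.T) @ tgt
    # Ridge solution C = b a^T (a a^T + lam I)^{-1}, always k x k.
    gram = a @ a.T + lam * np.eye(k)
    return np.linalg.solve(gram, a @ b.T).T
```

Calling `operator_at_resolution(512)` and `operator_at_resolution(4096)` returns two 4×4 matrices that agree to within the quadrature error, even though the underlying discretizations differ by a factor of eight.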

Evaluation

PDE Benchmarks

We report quantitative results on PDE benchmarks using the relative L2 error with respect to the ground truth (×100, ↓). The best results are in bold and the second best are underlined; “/” indicates that the method is not applicable. Our method, FuncAttn, achieves state-of-the-art results, outperforming prior methods on almost all datasets.

Quantitative results on PDE benchmarks
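For reference, the metric can be computed as below. This is a sketch under one common convention in operator learning, assumed here rather than confirmed by the source: the relative error ‖pred − target‖₂ / ‖target‖₂ is computed per sample over all spatial points and channels, averaged over the batch, and multiplied by 100. The function name `relative_l2_x100` is hypothetical.

```python
import numpy as np

def relative_l2_x100(pred, target):
    # Flatten everything except the leading batch axis.
    pred = np.asarray(pred, dtype=float).reshape(len(pred), -1)
    target = np.asarray(target, dtype=float).reshape(len(target), -1)
    # Per-sample ||pred - target||_2 / ||target||_2.
    per_sample = np.linalg.norm(pred - target, axis=1) / np.linalg.norm(target, axis=1)
    # Batch mean, scaled by 100 to match the "×100" reporting.
    return 100.0 * per_sample.mean()
```

A prediction that is uniformly 10% too large, such as `relative_l2_x100(1.1 * t, t)` for any nonzero field `t`, scores 10.0.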

RNA Point Cloud Segmentation

We evaluate on RNA point cloud segmentation, where “xyz” and “hks” indicate whether the network input is the raw xyz coordinates or the heat kernel signatures (HKS) of the points. Our method achieves the best segmentation accuracy.

RNA point cloud segmentation results
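The heat kernel signature of a point x at diffusion time t is HKS(x, t) = Σᵢ exp(−λᵢ t) φᵢ(x)², where (λᵢ, φᵢ) are Laplacian eigenpairs. The sketch below computes it on a raw point cloud, assuming a Gaussian-weighted k-nearest-neighbour graph Laplacian; the paper's actual preprocessing (for example a mesh or cotangent Laplacian) may differ, and `knn_graph_laplacian` and `heat_kernel_signature` are hypothetical names. It uses dense matrices, so it only suits small point clouds.

```python
import numpy as np

def knn_graph_laplacian(points, n_neighbors=8):
    n = len(points)
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]   # skip the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    sigma2 = np.mean(d2[rows, cols])                      # kernel bandwidth
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    W = np.maximum(W, W.T)                                # symmetrize kNN graph
    return np.diag(W.sum(axis=1)) - W                     # L = D - W (PSD)

def heat_kernel_signature(points, times, n_eigs=20, n_neighbors=8):
    L = knn_graph_laplacian(points, n_neighbors)
    evals, evecs = np.linalg.eigh(L)                      # ascending eigenvalues
    evals, evecs = evals[:n_eigs], evecs[:, :n_eigs]
    # HKS(x, t) = sum_i exp(-lambda_i t) phi_i(x)^2, shape (n, len(times)).
    return (evecs ** 2) @ np.exp(-np.outer(evals, times))
```

Each row is a per-point descriptor that depends only on intrinsic geometry, which is why it is a natural alternative to xyz coordinates as network input. With all eigenpairs and t = 0, every entry equals 1, since the eigenvectors form an orthonormal basis.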

OOD Generalization on AirfRANS

We report out-of-distribution (OOD) generalization on AirfRANS using the relative error of the lift coefficient (C_L, ↓) and Spearman's rank correlation between predicted and true lift coefficients (ρ_L, ↑), both expressed in percent (scaled by 100). Our method achieves the best generalization performance in both the OOD Reynolds-number and OOD angle-of-attack settings.

OOD generalization on AirfRANS results
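Both AirfRANS metrics can be sketched in a few lines. This assumes the relative error is the mean of |C_L,pred − C_L,true| / |C_L,true| over the test set (so every true C_L must be nonzero) and that ρ_L is the Pearson correlation of average ranks, which is the standard definition of Spearman's ρ with ties. The names `average_ranks` and `lift_metrics_x100` are hypothetical; SciPy users could use `scipy.stats.spearmanr` instead of the hand-written ranking.

```python
import numpy as np

def average_ranks(a):
    # 1-based ranks; tied values share the mean of their sorted positions.
    a = np.asarray(a)
    order = np.argsort(a, kind="mergesort")
    inv = np.empty_like(order)
    inv[order] = np.arange(len(a))
    a_sorted = a[order]
    is_new = np.r_[True, a_sorted[1:] != a_sorted[:-1]]
    group = np.cumsum(is_new)[inv]                    # 1-based tie-group id
    bounds = np.r_[np.nonzero(is_new)[0], len(a)]     # group start positions
    return 0.5 * (bounds[group] + bounds[group - 1] + 1)

def lift_metrics_x100(cl_pred, cl_true):
    cl_pred = np.asarray(cl_pred, dtype=float)
    cl_true = np.asarray(cl_true, dtype=float)
    # Mean relative error of the lift coefficient (assumes cl_true != 0).
    rel_err = np.mean(np.abs(cl_pred - cl_true) / np.abs(cl_true))
    # Spearman's rho = Pearson correlation of the ranks.
    rho = np.corrcoef(average_ranks(cl_pred), average_ranks(cl_true))[0, 1]
    return 100.0 * rel_err, 100.0 * rho
```

Spearman's ρ only checks whether the model orders the airfoils correctly by lift, so a uniformly biased predictor can reach ρ_L = 100 while still having a nonzero relative error, which is why the two metrics are reported together.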

2D Darcy Flow with a Triangular Notch Domain

We further evaluate on 2D Darcy flow with a triangular notch domain. Relative L2 error (%, ↓) is reported; † denotes our reproduction using the released code under a comparable parameter budget. Our method achieves the best performance on this singular-domain task.

2D Darcy flow with a triangular notch domain results


BibTeX

@misc{xiao2026functionalattentionpairwiseaffinities,
      title={Functional Attention: From Pairwise Affinities to Functional Correspondences}, 
      author={Jiefang Xiao and Maolin Gao and Simon Weber and Guandao Yang and Daniel Cremers},
      year={2026},
      eprint={2605.31559},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.31559}, 
}