TritonSigmoid: A fast, padding-aware sigmoid attention kernel for GPUs [R]
# Our Take: TritonSigmoid Signals a New Era for GPU-Accelerated Biological Modeling
The open-sourcing of TritonSigmoid marks a significant milestone in the intersection of high-performance computing and computational biology. Developed specifically for single-cell foundation models, this padding-aware sigmoid attention kernel addresses a fundamental limitation in how neural networks process genomic data: the rigid competitive nature of softmax attention. When modeling gene expression patterns, where a single gene can be regulated by multiple transcription factors simultaneously, the traditional softmax approach forces tokens to compete for attention rather than allowing them to contribute cooperatively. Sigmoid attention fundamentally changes this dynamic, enabling models to attend strongly to many relevant genes in parallel rather than forcing a normalized distribution that suppresses all but the highest-scoring connections.
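The contrast between competitive and cooperative attention can be sketched on a toy example. The NumPy snippet below is an illustration only, not the TritonSigmoid kernel: the score values are hypothetical, and the plain element-wise sigmoid omits any scaling or bias term a real implementation might apply.

```python
import numpy as np

def softmax_weights(scores):
    """Row-wise softmax: weights in each row sum to 1, so keys compete."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid_weights(scores):
    """Element-wise sigmoid: each weight depends only on its own score."""
    return 1.0 / (1.0 + np.exp(-scores))

# Hypothetical scores for one query gene against five key genes.
# Three keys are strongly relevant (4.0), two are not (-2.0).
scores = np.array([[4.0, 4.0, 4.0, -2.0, -2.0]])

soft = softmax_weights(scores)
sig = sigmoid_weights(scores)

print("softmax:", soft.round(3), "row sum:", soft.sum().round(3))
print("sigmoid:", sig.round(3), "row sum:", sig.sum().round(3))
```

Under softmax the three relevant keys split the row's unit budget and each receives about a third of the weight. Under sigmoid each of them receives a weight of about 0.98 regardless of how many other keys score highly.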
The reported benchmarks are striking. Reaching up to 515 TFLOPS on an H100 GPU, against 361 TFLOPS for FlashAttention-2 and 440 for FlashSigmoid, is roughly a 43 percent throughput gain over FlashAttention-2, but the story extends beyond raw speed. Cells may express anywhere from 200 to over 16,000 genes, and because the kernel handles this variable-length padding natively, compute is not wasted on empty positions. This efficiency gain is particularly valuable in single-cell genomics, where dataset heterogeneity is a persistent challenge. As we explore in "How AI Agents Will Transform Data Science Work in 2026", the broader trend toward specialized, domain-aware computational tools is reshaping how researchers approach complex biological questions.
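To make the padding point concrete, here is a minimal NumPy reference of sigmoid attention that ignores padded positions, plus a rough estimate of how much of a padded score matrix touches padding. It is a sketch under stated assumptions, not the released kernel: the function names, the `1/sqrt(head_dim)` scaling, and the example lengths are all hypothetical, and this reference still computes the padded entries before zeroing them, whereas the point of a padding-aware kernel is to avoid that work.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_sigmoid_attention(q, k, v, length):
    """Sigmoid attention over a padded sequence, ignoring positions >= length.

    q, k, v: arrays of shape (padded_len, head_dim).
    length:  number of real (non-padding) tokens.
    """
    padded_len, head_dim = q.shape
    scores = q @ k.T / np.sqrt(head_dim)   # assumed scaling, as in standard attention
    weights = sigmoid(scores)
    valid = np.arange(padded_len) < length
    weights = weights * valid[None, :]     # padded keys contribute nothing
    out = weights @ v
    out[~valid] = 0.0                      # padded queries produce zero output
    return out

def wasted_fraction(lengths, padded_len):
    """Share of the padded_len x padded_len score entries that touch padding."""
    lengths = np.asarray(lengths, dtype=np.float64)
    useful = (lengths ** 2).sum()
    total = len(lengths) * float(padded_len) ** 2
    return 1.0 - useful / total

rng = np.random.default_rng(0)
head_dim, padded_len, length = 8, 16, 5
q, k, v = (rng.standard_normal((padded_len, head_dim)) for _ in range(3))

# Masking the padded sequence should match running on the unpadded one.
padded_out = masked_sigmoid_attention(q, k, v, length)
unpadded_out = masked_sigmoid_attention(q[:length], k[:length], v[:length], length)
print(np.allclose(padded_out[:length], unpadded_out))

# Hypothetical batch of three cells padded to the longest one.
lengths = [200, 2_000, 16_000]
print(f"{wasted_fraction(lengths, 16_000):.3f}")
```

For this hypothetical batch, roughly two thirds of the score entries involve at least one padded position, which is the work a padding-aware kernel can skip.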
What makes this release particularly compelling is the empirical validation. The authors report lower validation loss than softmax attention across six held-out datasets, 25 percent better cell-type separation, and, perhaps most remarkably, stable training in settings where softmax attention catastrophically diverges. Together these results point to a real improvement in how such models learn biological patterns. For researchers working with single-cell data, cell-type classification is a cornerstone task, and any method that improves separation between cell populations is a meaningful advance. The stability finding is especially noteworthy: when softmax attention fails to train on certain configurations while sigmoid attention succeeds, it suggests the approach may be capturing something structurally important about the underlying biology rather than merely optimizing a different loss landscape.
The implications extend beyond single-cell genomics. Foundation models are rapidly becoming a dominant paradigm in biology, and the ability to efficiently process variable-length sequences with non-competitive attention opens possibilities for other domains where token interactions are many-to-many rather than one-to-many. Drug discovery, protein language models, and multi-modal biological data integration could all benefit from this architectural insight. The decision to open-source both the paper and the implementation reflects a commitment to reproducibility and community engagement that the field needs more of. As foundation models continue to scale across biological applications, it will be worth watching how sigmoid attention and similar non-competitive mechanisms reshape model architectures. The open question is whether the broader research community will adopt and build on this approach, or whether alternative formulations will emerge to address the remaining challenges in biological sequence modeling.
We are open-sourcing TritonSigmoid — a fast, padding-aware sigmoid attention kernel for GPUs.
We built this for single-cell foundation models, where every cell is represented as a sequence of genes. A single gene can be regulated by multiple transcription factors at once. Softmax forces them to compete for attention, but sigmoid lets the model attend strongly to many genes (tokens) simultaneously. Because cells express anywhere from 200 to 16,000+ genes (tokens), the kernel handles variable-length padding natively so you're not wasting compute on empty positions.
What we found during our experiments:
• Hardware: Up to 515 TFLOPS on H100 (vs. FlashAttention-2 at 361, FlashSigmoid at 440)
• Accuracy: Lower validation loss than softmax attention across 6 held-out datasets
• Representation: 25% better cell-type separation
• Stability: Stable training where softmax catastrophically diverges
We would welcome any discussion or feedback.
Links to our work:
Paper: https://arxiv.org/abs/2604.27124
Code: https://github.com/MSDLLCpapers/triton-sigmoid