
Looking for arXiv endorsement (cs.CV) to post my ViT positional embeddings paper [R]


Hi everyone,

I'm looking for someone to endorse me for an arXiv submission in cs.CV (computer vision) or cs.LG (machine learning). I have a completed paper that I'd like to upload as a preprint.

About the paper:

Title: Positional Encodings in Vision Transformers: A Geometric Account of Spatial Organization and Robustness

Summary: This paper investigates how different positional encoding schemes (learned absolute, sinusoidal, and rotary) shape the internal representations of Vision Transformers. We introduce a metric called Spatial Similarity Distance Correlation (SSDC) to quantify spatial structure in token representations. Using controlled interventions (random permutation at inference, random permutation training, and positional magnitude scaling), we show that:

  1. ViTs develop non‑trivial spatial structure even without positional embeddings, but this structure is content‑driven and collapses under token permutation.

  2. All positional encodings shift models toward index‑anchored spatial organization that persists under content disruption.

  3. Robustness to distributional shifts (JPEG compression, Gaussian blur) is primarily associated with the presence of a stable positional reference frame and correlates directly with SSDC as measured under intervention.
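To make the idea of measuring spatial structure concrete, here is a minimal sketch. The post does not give the formula for SSDC, so this is only one plausible reading of the name: the Pearson correlation between pairwise cosine similarity of tokens and pairwise Euclidean distance on the patch grid, with the sign flipped so that higher means more spatially organized. The function names, the sign convention, and the toy Gaussian-bump "tokens" are all assumptions for illustration, not the paper's implementation or real ViT activations.

```python
import math
import random


def grid_coords(height, width):
    """(row, col) coordinates for a patch grid flattened in row-major order."""
    return [(r, c) for r in range(height) for c in range(width)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ssdc_like_score(tokens, height, width):
    """Hypothetical SSDC-style score (assumed definition, not the paper's).

    Correlates token similarity with grid distance over all token pairs and
    negates the result, so nearby-tokens-are-similar yields a positive score.
    """
    coords = grid_coords(height, width)
    sims, dists = [], []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            sims.append(cosine(tokens[i], tokens[j]))
            dists.append(math.dist(coords[i], coords[j]))
    return -pearson(sims, dists)


# Toy tokens: each token is a Gaussian bump centered on its own grid cell,
# so similarity decays with spatial distance by construction.
H, W = 4, 4
cells = grid_coords(H, W)
tokens = [[math.exp(-math.dist(p, q) ** 2 / 2.0) for q in cells] for p in cells]

structured = ssdc_like_score(tokens, H, W)

# Intervention: randomly permute which token sits at which grid index.
permuted = tokens[:]
random.Random(0).shuffle(permuted)
shuffled = ssdc_like_score(permuted, H, W)

print(f"structured: {structured:.3f}  permuted: {shuffled:.3f}")
```

In this toy setup the structure lives entirely in the token content, so the score drops once the tokens are shuffled across grid positions. That mirrors the content-driven case in point 1; an index-anchored representation would instead keep its score tied to the grid index.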

The paper includes experiments on ImageNet‑100 with ViT‑S models, multiple random seeds, and full statistical reporting.
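The positional magnitude scaling intervention mentioned in the summary can be pictured with a small sketch. The post does not say how the scaling is applied, so this assumes the simplest form: an additive sinusoidal table (the standard 1D sin/cos formulation over the flattened patch index) multiplied by a scalar `alpha` before being added to the patch embeddings. `add_scaled_positions` and `alpha` are hypothetical names, and the constant patch embeddings are placeholders.

```python
import math


def sinusoidal_encoding(num_positions, dim):
    """Standard 1D sinusoidal table: sin on even channels, cos on odd channels."""
    table = []
    for pos in range(num_positions):
        row = []
        for k in range(dim):
            # Channels 2i and 2i+1 share the frequency 1 / 10000^(2i / dim).
            freq = 1.0 / (10000 ** ((k - k % 2) / dim))
            angle = pos * freq
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_scaled_positions(patch_embeddings, pos_table, alpha):
    """Assumed intervention: tokens = patches + alpha * positional encoding."""
    return [
        [x + alpha * p for x, p in zip(token, pos)]
        for token, pos in zip(patch_embeddings, pos_table)
    ]


pos_table = sinusoidal_encoding(num_positions=16, dim=8)
patches = [[0.5] * 8 for _ in range(16)]  # placeholder patch embeddings

no_position = add_scaled_positions(patches, pos_table, alpha=0.0)
full_position = add_scaled_positions(patches, pos_table, alpha=1.0)
```

Sweeping `alpha` between 0 and values above 1 would then vary how strongly the positional signal competes with patch content. Whether the paper scales at the input, per layer, or in some other way is not stated in the post.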

PDF available at: https://github.com/mahmoud-mannes/neurips-geometry-paper/blob/main/paper/main.pdf

submitted by /u/Octacinth
