1 min read · from Machine Learning

Training a number-aware embedding model + Text JEPA doesn't work too well + Text auto-encoders have a strange frequency bias [R][P]

Our take

In this blog post, I share my year-long journey of trying to predict company growth from 10-K filings, which ultimately fell short. Along the way, though, I had a lot of fun refining encoder transformers to improve their numerical capabilities: MLM-training a modified ModernBERT produced a compelling sequence embedder, even though JEPA disappointed and the auto-encoder setup showed a peculiar frequency bias. The exploration also yielded insights into the effective dimensionality of transformer outputs. For a deeper dive, see the full post linked below.

Exploring AI-driven solutions to real-world problems is often messy and full of unexpected outcomes. In the post titled Training a number-aware embedding model + Text JEPA doesn't work too well + Text auto-encoders have a strange frequency bias, a researcher shares a year-long effort to predict company growth from the text of 10-K filings. The prediction goal was not reached, but the work produced valuable insights into the capabilities and limitations of encoder transformers, particularly in how they handle numerical data. It is a useful reminder that experiments are worth running even when the results don't match expectations.

The author's work on making encoder transformers handle numerical data is the most practical takeaway. By bypassing the standard tokenizer and prediction head whenever the model encounters a number, the researcher adapted a modified version of ModernBERT and obtained promising results. The approach points toward models that can interpret financial text and figures together, and it raises the question of how far encoders can be pushed to bridge textual analysis and numerical forecasting. The same post's finding that text auto-encoders have a strange frequency bias shows how much of model behaviour only becomes visible through this kind of hands-on experimentation.

However, the author's experience with JEPA highlights a familiar challenge in AI research: outcomes are hard to predict. Despite careful effort, the JEPA approach fell short, while the auto-encoder setup performed better. Not every promising idea pans out, which is why a flexible mindset matters when experimenting with new techniques. Adding a contrastive loss term to counteract the frequency bias observed in the decoder is a good example of this in practice: iterating on a method, rather than rigidly sticking to a single approach, is often what improves model performance and reliability.

As we reflect on this journey, it becomes clear that research in AI is as much about the process as the outcome. The author's admission that the year-long effort felt like "how to waste 1,000 hours and $400" will resonate with many in the field. That honesty is valuable: it encourages others to run their own experiments, knowing that failures often lay the groundwork for later successes.

Looking forward, the question is how to turn insights from explorations like this into a culture that takes risks and learns from failure. As AI tooling evolves, the ability to adapt, experiment, and iterate will matter most for teams trying to get real value out of their data. The modified ModernBERT and the lessons from this post will feed into the next wave of work on number-aware language models and text embeddings.

Hi guys!

I've spent a year trying to predict company growth from the full text of their 10-K filings.

It completely failed.

But I've had a lot of fun playing with encoder transformers and making them good at numbers (bypassing the tokenizer/prediction head whenever the model sees a number). I MLM-trained a modified ModernBERT for this and it works really well. The model is available on HF: https://huggingface.co/edereynal/financial_bert
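The post doesn't spell out the exact mechanism here, so here is a minimal sketch of one way to do the "bypass" described above: numeric tokens skip the vocabulary lookup in favour of a value-conditioned embedding, and masked number positions are scored with a regression head instead of the softmax LM head. The class names and the signed-log encoding are assumptions for illustration, not the author's implementation.

```python
import torch
import torch.nn as nn

class NumberAwareEmbedding(nn.Module):
    """Wraps a standard token embedding; numeric tokens bypass the vocabulary
    lookup and are embedded directly from their scalar value."""

    def __init__(self, token_embedding: nn.Embedding, hidden_size: int):
        super().__init__()
        self.token_embedding = token_embedding
        # Small MLP mapping a scalar (signed log of the number) to hidden_size.
        self.value_mlp = nn.Sequential(
            nn.Linear(1, hidden_size), nn.GELU(), nn.Linear(hidden_size, hidden_size)
        )

    def forward(self, input_ids, number_mask, number_values):
        # input_ids:     (batch, seq) token ids
        # number_mask:   (batch, seq) bool, True where the token is a number
        # number_values: (batch, seq) float, parsed numeric value (0 elsewhere)
        emb = self.token_embedding(input_ids)
        signed_log = torch.sign(number_values) * torch.log1p(number_values.abs())
        num_emb = self.value_mlp(signed_log.unsqueeze(-1))
        return torch.where(number_mask.unsqueeze(-1), num_emb, emb)

class NumberRegressionHead(nn.Module):
    """Regression head for masked number positions, replacing the softmax LM head."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states, number_mask, number_values):
        pred = self.proj(hidden_states).squeeze(-1)
        target = torch.sign(number_values) * torch.log1p(number_values.abs())
        return nn.functional.mse_loss(pred[number_mask], target[number_mask])
```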

Then I turned this MLM-trained model into a nice sequence embedder.
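The post doesn't detail the pooling recipe, but a common way to turn an MLM encoder into a sequence embedder is masked mean pooling over the final hidden states. A rough sketch below, using the stock ModernBERT-base checkpoint for illustration (the author's modified model linked above may need its own loading code):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
encoder = AutoModel.from_pretrained("answerdotai/ModernBERT-base")

def embed(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state           # (batch, seq, hidden)
    mask = batch["attention_mask"].unsqueeze(-1).float()      # ignore padding tokens
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # masked mean pooling
    return torch.nn.functional.normalize(pooled, dim=-1)      # unit-norm embeddings

vectors = embed(["Revenue increased 12% year over year.", "Net loss widened."])
print(vectors.shape)  # torch.Size([2, 768])
```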

I've experimented with JEPA, but it failed.
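For context, a text JEPA setup typically predicts the latent representations of masked spans (produced by an EMA "target" encoder) from a context encoder, rather than predicting tokens. A bare-bones sketch of such a training step, with the masking strategy and hyperparameters as assumptions rather than the author's exact setup:

```python
import torch
import torch.nn as nn

def jepa_step(context_encoder, predictor, target_encoder,
              input_ids, attention_mask, span_mask, mask_token_id):
    """One JEPA-style training step: regress predicted latents onto target latents.

    span_mask: (batch, seq) bool, True on positions whose representations must be predicted.
    """
    # Targets: representations of the *unmasked* text from a frozen EMA encoder.
    with torch.no_grad():
        target = target_encoder(input_ids=input_ids,
                                attention_mask=attention_mask).last_hidden_state

    # Context: same text, but masked positions are hidden from the context encoder.
    masked_ids = input_ids.masked_fill(span_mask, mask_token_id)
    context = context_encoder(input_ids=masked_ids,
                              attention_mask=attention_mask).last_hidden_state

    # Predict target embeddings at masked positions and compare in latent space.
    pred = predictor(context)
    return nn.functional.smooth_l1_loss(pred[span_mask], target[span_mask])

@torch.no_grad()
def ema_update(target_encoder, context_encoder, momentum=0.996):
    """Keep the target encoder as an exponential moving average of the context encoder."""
    for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
        p_t.mul_(momentum).add_(p_c, alpha=1 - momentum)
```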

The auto-encoder setup worked much better. But I encountered a strange frequency bias, where the decoder only cared about high-frequency information, and I had to mitigate it by adding a Contrastive Loss term.
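One common way to add such a contrastive term is an InfoNCE loss on the bottleneck embeddings, so the encoder is rewarded for keeping information that distinguishes whole sequences rather than only the high-frequency detail the decoder latches onto. The interfaces below (`encoder`, `decoder.reconstruction_loss`, the two-view batching) are hypothetical; the post doesn't specify the exact formulation:

```python
import torch
import torch.nn.functional as F

def autoencoder_loss(encoder, decoder, batch_a, batch_b, temperature=0.05, alpha=1.0):
    """Reconstruction loss plus an InfoNCE contrastive term on sentence embeddings.

    batch_a / batch_b are assumed to be two views of the same texts; `encoder`
    maps a batch to (batch, dim) bottleneck embeddings.
    """
    z_a = encoder(batch_a)
    z_b = encoder(batch_b)
    recon_loss = decoder.reconstruction_loss(z_a, batch_a)  # hypothetical decoder API

    # Contrastive term: each embedding must identify its paired view within the batch.
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                    # (batch, batch) similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)
    contrastive_loss = F.cross_entropy(logits, labels)

    return recon_loss + alpha * contrastive_loss
```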

I also investigated the tendency of transformers to have a low effective-dimensionality output space (compared to their input embedding space).
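One standard proxy for effective dimensionality is the participation ratio of the embedding covariance spectrum; comparing it for the input token-embedding matrix and for the pooled outputs makes any collapse visible. A small sketch (my own measurement choice, not necessarily the one used in the post):

```python
import torch

def effective_dim(embeddings: torch.Tensor) -> float:
    """Participation ratio of the covariance spectrum: (sum λ_i)^2 / sum(λ_i^2).
    Equals the ambient dimension for an isotropic cloud, ~1 if one direction dominates.
    """
    x = embeddings - embeddings.mean(dim=0, keepdim=True)
    cov = x.t() @ x / (x.size(0) - 1)
    eig = torch.linalg.eigvalsh(cov).clamp(min=0)
    return (eig.sum() ** 2 / (eig ** 2).sum()).item()

# Compare, e.g.:
#   effective_dim(token_embedding_matrix)   # input embedding space
#   effective_dim(pooled_output_vectors)    # output sequence-embedding space
```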

So, here's the technical blog post, which reads a bit like "how to waste 1,000 hours and $400 trying to solve an unsolvable real-world problem, while having a lot of fun along the way":

https://www.eloidereynal.com/p/i-spent-1-year-trying-to-predict

submitted by /u/Academic_Sleep1118

Tagged with

#rows.com · #financial modeling with spreadsheets · #natural language processing for spreadsheets · #generative AI for data analysis · #Excel alternatives for data analysis · #real-time data collaboration · #real-time collaboration · #embedding model · #ModernBERT · #10-k filings · #Text JEPA · #auto-encoders · #MLM-trained model · #frequency bias · #contrastive loss · #transformers · #sequence embedder · #decoder · #effective-dimensionality · #company growth prediction