I shrank a transformer until every number fitted on the screen and made the weights editable [R]
Our take
Software engineer Daniel Gochin recently built a fully functional, albeit miniature, transformer in a spreadsheet and then turned its forward pass into a shareable web page. It is a clear piece of teaching and a good argument for hands-on learning. It's easy to get lost in the abstractions of Large Language Models (LLMs) when working only with APIs and high-level frameworks, but Gochin’s project pulls back the curtain and exposes the matrix multiplications that underpin the technology. This approach is particularly relevant given the current fervor surrounding AI, as evidenced by articles like "Why Wall Street thinks US memory maker Micron is the next Nvidia", which highlight the industry’s scramble to identify and invest in AI-adjacent opportunities. Understanding the building blocks, as Gochin demonstrates, is crucial for navigating this landscape, not just for engineers but for anyone seeking a deeper grasp of AI's inner workings. That he is asking for feedback while he builds shows a commitment to open learning and knowledge sharing.
What’s particularly compelling about Gochin’s implementation is its deliberate simplicity. By shrinking the vocabulary to just six words and using 3-dimensional embeddings, he has created a model that is entirely visible and editable on a single screen. Adjusting a weight and watching the prediction change immediately is an effective way to learn: it counters the black-box perception that surrounds complex AI systems and lets users see directly how parameters relate to output. Consider, for instance, the recent exploration of algorithmic bias detailed in "I Pitted XGBoost Against Logistic Regression on 358 Matches". Gochin’s approach, while focused on a simplified transformer, offers a similar clarity: a chance to directly observe how changes propagate through the system. It’s a counterpoint to the increasingly complex and often opaque nature of modern AI development. The absence of a build step and external libraries makes the tool even more accessible.
The project’s current state, with forward propagation complete and backward propagation planned, signals a journey of continued learning and refinement. While Gochin acknowledges he’s not an ML researcher, his initiative demonstrates that a deep understanding of core AI concepts is attainable even without formal training. This resonates with a broader trend of democratizing AI knowledge and empowering individuals to engage with this technology on a more fundamental level. The simplicity of the tool also makes it ideal for educators looking to introduce students to the principles of deep learning without requiring significant computational resources or specialized software. The fact that "randomizing" the weights immediately produces nonsense is a particularly insightful demonstration of the crucial role of training in the entire process. It highlights that the seemingly magical abilities of LLMs are the direct result of painstaking optimization, a process missing in this illustrative example.
Looking ahead, it’s intriguing to consider how this approach to visualizing and manipulating AI models could evolve. Could similar spreadsheet-based or web-based interfaces be developed for other machine learning architectures, such as convolutional neural networks or reinforcement learning agents? Backward propagation is the natural next step, but the same approach could extend to different optimization algorithms, network topologies, and datasets of varying sizes. Ultimately, Gochin's project poses a fundamental question: how can we make the inner workings of increasingly complex AI systems more transparent and accessible, fostering a deeper understanding of and greater trust in these powerful technologies?
I've been teaching myself how LLMs actually work, not at the API level, but down to the matrix multiplications. To force myself to really understand the forward pass, I first built a complete transformer by hand in a spreadsheet from embeddings through to the loss. Then I turned the forward pass into a web page so it's easier to share.
It's a full transformer (single attention head, single block) shrunk to the smallest size where every single number still fits on screen: a 6-word vocabulary, 3-dimensional embeddings. It reads four words and predicts the next one, and it walks through the whole thing top to bottom: word vectors, Q/K/V, attention scores, the causal mask, softmax, the feed-forward network, logits, and the final probabilities.
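For readers who prefer code to cells, the steps listed above can be sketched in plain Python with no libraries. This is a rough illustration, not the page's actual implementation: the vocabulary words, the feed-forward width `D_FF`, the ReLU activation, the `1/sqrt(D)` score scaling, and the helper names (`matmul`, `make_weights`, `forward`) are all assumptions. It follows only the steps named in the post, so residual connections, layer norm, and positional information are left out.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]  # hypothetical 6-word vocabulary
D = 3      # embedding dimension, as in the post
D_FF = 4   # feed-forward hidden width: an assumption, not from the post


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def make_weights(seed):
    """Untrained weights drawn at random; all shapes are illustrative."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(len(VOCAB), D),
        "Wq": mat(D, D), "Wk": mat(D, D), "Wv": mat(D, D),
        "W1": mat(D, D_FF), "W2": mat(D_FF, D),
        "Wout": mat(D, len(VOCAB)),
    }


def forward(token_ids, W):
    """Return (attention weights, next-word probabilities)."""
    x = [W["embed"][t] for t in token_ids]            # word vectors, one row per word
    q, k, v = matmul(x, W["Wq"]), matmul(x, W["Wk"]), matmul(x, W["Wv"])
    k_t = [list(col) for col in zip(*k)]              # transpose of K
    scores = [[s / math.sqrt(D) for s in row] for row in matmul(q, k_t)]
    n = len(token_ids)
    for i in range(n):                                # causal mask: position i
        for j in range(i + 1, n):                     # cannot see positions after i
            scores[i][j] = float("-inf")
    attn = [softmax(row) for row in scores]           # each row sums to 1
    ctx = matmul(attn, v)                             # weighted mix of value vectors
    hidden = [[max(0.0, u) for u in row] for row in matmul(ctx, W["W1"])]  # ReLU
    out = matmul(hidden, W["W2"])
    logits = matmul([out[-1]], W["Wout"])[0]          # last position predicts next word
    return attn, softmax(logits)


attn, probs = forward([0, 1, 2, 3], make_weights(seed=0))
for word, p in zip(VOCAB, probs):
    print(f"{word:>4}: {p:.3f}")
```

Changing any entry in the dictionary returned by `make_weights` and calling `forward` again is the code equivalent of editing a cell and watching everything downstream recompute.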
The part I found most useful for my own understanding: the weights and word vectors are editable, and everything downstream recomputes live. There's also a Randomize button that scrambles all the weights, and the prediction immediately turns to nonsense. That's the honest point of the whole thing: with random (untrained) weights the guess is meaningless, and training is the entire story this page deliberately leaves out.
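One way to put a number on "nonsense" is to compare against a uniform guess. The sketch below assumes the loss is cross-entropy (the post only says "the loss") and draws the six logits directly from a Gaussian as a stand-in for running randomized weights through the whole network; the function names are made up for illustration. A uniform guess over six words costs ln 6 ≈ 1.79 nats, and random logits should average at or above that, since they are confidently wrong as often as they are right.

```python
import math
import random

VOCAB_SIZE = 6  # as in the post


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Loss for one prediction: negative log-probability of the true word."""
    return -math.log(probs[target])


def average_random_loss(trials, seed):
    """Average loss when both the logits and the true word are random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumption: Gaussian logits stand in for an untrained network's output.
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        target = rng.randrange(VOCAB_SIZE)
        total += cross_entropy(softmax(logits), target)
    return total / trials


uniform_loss = cross_entropy([1.0 / VOCAB_SIZE] * VOCAB_SIZE, target=0)
print(f"uniform guess: {uniform_loss:.3f} nats")
print(f"random logits: {average_random_loss(2000, seed=0):.3f} nats on average")
```

Training is what pushes the loss below the uniform baseline, which is the part the page leaves out.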
It's a single self-contained HTML file, no libraries, no build step. Backward propagation (how the weights actually get good) is the next one I want to build.
Link: https://dgochin.github.io/transformer/
I'm not an ML researcher, I'm a software engineer learning this from the ground up, so if anything's wrong or could be explained better, I'd genuinely like to hear it. This was just my attempt to understand the transformer in the most basic way.