From Machine Learning

Introducing AutoMuon, a one-line drop-in for AdamW [P]

Our take

AutoMuon is a Python package that makes the Muon optimizer usable as a drop-in replacement for AdamW in PyTorch training pipelines. It scans the model at initialization and picks an optimizer for each parameter: Muon for the 2D weight matrices, AdamW for embeddings, norms, and biases. The author expects it to work best on standard architectures such as transformers and CNNs, is open to PRs, and plans further tests on time series forecasting, genomics, and language modeling.

Hey everyone, I've been working on a small Python package called AutoMuon that makes the Muon optimizer usable as a drop-in replacement for AdamW in arbitrary PyTorch training pipelines.

The core idea is relatively simple: Muon works primarily on the 2D weight matrices that act on hidden states (linear projections, conv layers), but you still need AdamW for embeddings, norms, biases, etc. AutoMuon scans your model at init and figures out the right optimizer for each parameter automatically.
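To make the scan concrete, here is a minimal sketch of how such a split could look. It is not AutoMuon's actual code: the function name `split_parameters` and the `EXCLUDED_MODULE_TYPES` set are hypothetical, and the rule "2D or higher, and not inside an excluded module" is an assumption based on the description above.

```python
# Hypothetical sketch of the parameter split, NOT AutoMuon's actual implementation.
# It relies only on the nn.Module-style methods named_modules() and
# named_parameters(recurse=False), so it needs no imports.

# Assumed exclusion list, matched by class name; a real list would likely be longer.
EXCLUDED_MODULE_TYPES = {
    "Embedding", "LayerNorm", "RMSNorm", "GroupNorm",
    "BatchNorm1d", "BatchNorm2d", "BatchNorm3d",
}


def split_parameters(model):
    """Return (muon, adamw): two lists of (qualified_name, parameter) pairs."""
    muon, adamw = [], []
    seen = set()  # ids of parameters already assigned, so tied weights are counted once
    for module_name, module in model.named_modules():
        excluded = type(module).__name__ in EXCLUDED_MODULE_TYPES
        for param_name, param in module.named_parameters(recurse=False):
            if not param.requires_grad or id(param) in seen:
                continue
            seen.add(id(param))
            full_name = f"{module_name}.{param_name}" if module_name else param_name
            # Weight matrices and conv kernels (ndim >= 2) outside excluded modules
            # go to Muon; biases, norm scales, and embeddings stay on AdamW.
            if param.ndim >= 2 and not excluded:
                muon.append((full_name, param))
            else:
                adamw.append((full_name, param))
    return muon, adamw
```

With a real PyTorch model, `[p for _, p in muon]` could be handed to a Muon implementation and `[p for _, p in adamw]` to `torch.optim.AdamW`. Muon setups also commonly keep the final output layer on AdamW, which this sketch does not handle.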

I am open to PRs, especially for expanding the module-type exclusion list if you hit edge cases in your architecture. Would love to know if anyone tries it on something other than transformers or CNNs and what they find. I feel that it would likely struggle with fully custom architectures, like flash-linear-attention for instance, so that would require some user tuning.

I am planning to add more tests for time series forecasting, genomics, language modeling, etc. I want to see how generalizable Muon really is!

https://github.com/SkyeGunasekaran/automuon

pip install git+https://github.com/SkyeGunasekaran/automuon.git

submitted by /u/Skye7821
