Trials and tribulations fine-tuning & deploying Gemma-4 [P]
Hey all,
Our ML team spent some time this week getting training and deployments working for Gemma-4, and wanted to document all the things we ran into along the way.
- PEFT doesn't recognize Gemma 4's custom layers. Google wrapped vision/audio projections in a new `ClippableLinear` class that doesn't inherit from `nn.Linear`, so PEFT refuses to attach LoRA, even for text-only fine-tuning. Fix: unwrap the wrappers after loading weights but before calling PEFT.
- SFTTrainer killed training silently. TRL hardcodes `use_cache=False`, which breaks Gemma 4's KV-sharing attention. Loss never converges and there's no error, just garbage gradients. Fixed upstream in transformers v5.5.2+.
- DeepSpeed ZeRO-3 saves half-empty adapters. Training loss looks perfect, but the saved LoRA file has zero-element tensors for half the layers. The model acts like it was never fine-tuned. Workaround: don't use DeepSpeed for LoRA on Gemma 4.
- No runtime LoRA serving anywhere. vLLM and SGLang don't support runtime LoRAs for Gemma 4's multimodal architecture yet, and that kind of support can take a while to land. Until then, you have to merge the weights and remap state dict keys manually before serving.
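
For the PEFT item, here is a rough sketch of what "unwrap the wrappers" could look like. It is not our exact code: `ClippableLinearStandIn` is a hypothetical stand-in for Gemma 4's real class, and the `linear` attribute name and the clamp in `forward` are assumptions, so check the real class before reusing this. Note that swapping in the inner layer also drops whatever the wrapper does in its forward pass.

```python
import torch
import torch.nn as nn


class ClippableLinearStandIn(nn.Module):
    """Hypothetical stand-in for Gemma 4's wrapper class.

    Assumption: the wrapper keeps a plain nn.Linear in an attribute,
    called `linear` here. The real class may use a different name.
    """

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed behavior, only here to make the stand-in do something.
        return self.linear(x).clamp(-10.0, 10.0)


def unwrap_linear_wrappers(
    module: nn.Module, wrapper_cls: type, inner_attr: str = "linear"
) -> int:
    """Replace every `wrapper_cls` child with its inner layer, in place.

    Returns the number of wrappers replaced.
    """
    replaced = 0
    # Snapshot the children so attributes can be reassigned while looping.
    for name, child in list(module.named_children()):
        if isinstance(child, wrapper_cls):
            setattr(module, name, getattr(child, inner_attr))
            replaced += 1
        else:
            replaced += unwrap_linear_wrappers(child, wrapper_cls, inner_attr)
    return replaced


class ToyProjector(nn.Module):
    """Tiny toy model: one wrapped projection, one plain projection."""

    def __init__(self) -> None:
        super().__init__()
        self.vision_proj = ClippableLinearStandIn(8, 4)
        self.text_proj = nn.Linear(8, 4)


toy = ToyProjector()
count = unwrap_linear_wrappers(toy, ClippableLinearStandIn)
print(f"unwrapped {count} layer(s)")
```

On a real model this would run after the weights are loaded and before the PEFT call, as described in the list above.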
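
Since the ZeRO-3 problem is invisible in the training loss, a quick check on the saved adapter is worth having. This is a minimal sketch that only looks at tensor shapes; the tensor names below are made up for illustration, and in practice you would collect the shapes from whatever you use to load the adapter file.

```python
from math import prod
from typing import Mapping, Sequence


def find_empty_tensors(shapes: Mapping[str, Sequence[int]]) -> list[str]:
    """Return the sorted names of tensors that contain zero elements."""
    return sorted(name for name, shape in shapes.items() if prod(shape) == 0)


# Hypothetical adapter contents: name -> shape.
saved_shapes = {
    "layers.0.q_proj.lora_A.weight": (8, 64),
    "layers.0.q_proj.lora_B.weight": (64, 8),
    "layers.1.q_proj.lora_A.weight": (0,),
    "layers.1.q_proj.lora_B.weight": (0,),
}

empty = find_empty_tensors(saved_shapes)
if empty:
    print(f"{len(empty)} of {len(saved_shapes)} adapter tensors are empty")
```

Running something like this right after saving would have flagged the half-empty adapter before we ever tried to evaluate it.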
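
The key remapping before serving is mostly prefix renaming. Below is a sketch under stated assumptions: the prefixes in `prefix_map` are hypothetical, and the real mapping depends on what your serving stack expects for Gemma 4. It works on any dict, so the values can be tensors or anything else.

```python
from typing import Any


def remap_keys(state_dict: dict[str, Any], prefix_map: dict[str, str]) -> dict[str, Any]:
    """Rename keys by replacing the first matching prefix from `prefix_map`.

    Keys that match no prefix are kept unchanged. Raises ValueError if two
    keys would collapse into the same name.
    """
    remapped: dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        if new_key in remapped:
            raise ValueError(f"key collision after remapping: {new_key}")
        remapped[new_key] = value
    return remapped


# Hypothetical example: strip a wrapper prefix left over from training.
merged = {
    "base_model.model.layers.0.q_proj.weight": "tensor-0",
    "lm_head.weight": "tensor-1",
}
served = remap_keys(merged, {"base_model.model.": "model."})
print(sorted(served))
```

Raising on collisions is deliberate: a silent overwrite here would lose weights in the same quiet way the other issues above did.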
Much more detail in the blog, but hopefully it's helpful in your Gemma-4 journey as well!