Formalizing statistical learning theory in Lean 4 [R]

I’ve been working on a Lean 4 project focused on formalizing parts of statistical learning theory:

FormalSLT repository

Current results include:

finite-class ERM bounds
Rademacher symmetrization
high-probability Rademacher bounds
Sauer–Shelah / VC-dimension bridge
finite scalar contraction
linear predictor bounds
finite PAC-Bayes bounds
algorithmic stability
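
For readers less familiar with the first item, the textbook form of a finite-class ERM bound (Hoeffding's inequality plus a union bound) is sketched below. The notation is an assumption made for illustration: a finite class H, a loss with values in [0,1], n i.i.d. samples, population risk R and empirical risk R̂_n. The statement in the repository may be phrased differently.

```latex
% Textbook finite-class bound; notation is assumed, not taken from the repository.
% \mathcal{H} is finite, the loss takes values in [0,1], the sample has n i.i.d. points.
\[
\Pr\!\left[\,\forall h \in \mathcal{H}:\
  \bigl|R(h) - \widehat{R}_n(h)\bigr|
  \le \sqrt{\frac{\ln(2|\mathcal{H}|/\delta)}{2n}}\,\right] \ge 1 - \delta .
\]
% Consequence for an empirical risk minimizer \hat{h}:
\[
R(\hat{h}) \le \min_{h \in \mathcal{H}} R(h)
  + 2\sqrt{\frac{\ln(2|\mathcal{H}|/\delta)}{2n}}
\quad \text{with probability at least } 1 - \delta .
\]
```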

The main idea is to build a readable and pedagogically structured “theorem ladder” for ML theory rather than just isolated declarations.

I’m trying to keep:

explicit assumptions
scoped theorem statements
zero sorry (no admitted or placeholder proofs)
close alignment with standard SLT presentations
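
To illustrate what explicit assumptions and a short ladder can look like, here is a toy sketch. It is not code from the repository. It uses integer-valued risks so that it needs only core Lean 4 (a recent toolchain with the omega tactic) and no Mathlib, and every name in it (ToySLT, UniformDeviation, IsERM, excess_risk_le) is hypothetical.

```lean
namespace ToySLT

/-- Rung 1 (assumed, not proved here): true and empirical risk differ by at most `ε`
    for every hypothesis. In a real development this would come from a
    concentration argument; here it is an explicit hypothesis. -/
def UniformDeviation {H : Type} (R Rhat : H → Int) (ε : Int) : Prop :=
  ∀ h : H, R h ≤ Rhat h + ε ∧ Rhat h ≤ R h + ε

/-- Rung 2: `hHat` minimizes the empirical risk over the whole class. -/
def IsERM {H : Type} (Rhat : H → Int) (hHat : H) : Prop :=
  ∀ h : H, Rhat hHat ≤ Rhat h

/-- Rung 3: the deterministic step of the ERM argument.
    Uniform deviation plus empirical optimality gives excess risk at most `2 * ε`
    against any comparator `hStar`. -/
theorem excess_risk_le {H : Type} (R Rhat : H → Int) (ε : Int) (hHat : H)
    (hDev : UniformDeviation R Rhat ε) (hErm : IsERM Rhat hHat) (hStar : H) :
    R hHat ≤ R hStar + 2 * ε := by
  -- Pull out exactly the three inequalities the argument uses.
  have h₁ : R hHat ≤ Rhat hHat + ε := (hDev hHat).1
  have h₂ : Rhat hStar ≤ R hStar + ε := (hDev hStar).2
  have h₃ : Rhat hHat ≤ Rhat hStar := hErm hStar
  -- Chain them with linear integer arithmetic.
  omega

end ToySLT
```

The point of the sketch is the shape, not the content: each rung names its assumption, and the final theorem takes those assumptions as visible hypotheses instead of burying them in the proof.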

Some existing Lean SLT efforts focus more heavily on empirical-process infrastructure and abstract probability machinery. This project currently concentrates on explicit finite-sample PAC/Rademacher/stability routes and readable end-to-end theorem chains.

I’d especially appreciate feedback on:

theorem organization
proof structure
naming/API decisions
useful next formalization targets

Thank you,
R. S

submitted by /u/trickyrex1