
NoTorch: Neural networks in pure C (2-file library, BitNet 1.58) [P]

Our take

Introducing NoTorch, a streamlined neural-network training and inference library written in pure C, designed to spare you heavy installations like PyTorch. With just two files—`notorch.h` and `notorch.c`—totaling roughly 3,300 lines of code, it offers a full autograd system and 31 verified operations. It trains comfortably on modest hardware and can handle models of up to 100 million parameters. Explore the GitHub repository for more details and for projects powered by NoTorch.

I'm tired of `pip install torch` eating 2.7 GB every time I want to train a 10M-param model, so I wrote NOTORCH: a complete neural network training/inference library in pure C. Two files (`notorch.h` + `notorch.c`, ~3,300 LOC). No Python. Enough.

Compiles in under a second:

```
cc -O2 notorch.c your_model.c -lm -o train
```

**Example:** We all know Karpathy's nanoGPT, so to prove the point in code I ported nanoGPT to NOTORCH and retrained it from scratch on a Dracula corpus instead of Shakespeare (because enough of fairy tales already).

Same architecture, same training loop, zero PyTorch. Runs, converges, produces coherent-ish output. The link:

https://github.com/ariannamethod/nanoGPT-notorch

---

Core:

- Full autograd, 31 ops with finite-difference-verified backward

- Adam / AdamW / Chuck (our variant of Adam, dedicated to Chuck Norris RIP)

- BitNet b1.58 ternary quantization — forward + STE backward + BLAS `sgemm` fast path

- SwiGLU / GQA / RoPE / MHA / GEGLU / RMSNorm / LayerNorm

- BPE tokenizer, GGUF loader (F32/F16/Q4_0/Q5_0/Q8_0/Q4_K/Q6_K)

- LR schedules, NaN guard, gradient clipping/accumulation, checkpointing

- LoRA-style parameter freezing

- DPO / GRPO / knowledge-distillation training examples

- Apple Accelerate (macOS) / OpenBLAS (Linux) / CUDA

Brutal reality stress check: two transformer trainings running concurrently on a humble **2019 Intel i5 MacBook, 8 GB RAM**, ~222 MB total for both. Not an M1. Pre-AMX Intel. Import overhead: 0 ms (it's C). Even this 2019 calculator handles it.

Limits: CPU-friendly up to ~100M params (let's be realistic); for bigger models you want a GPU. A CUDA backend exists, but CPU+BLAS is the daily driver.

GitHub repo:

https://github.com/ariannamethod/notorch

(For the list of models trained with NOTORCH and projects built on it, see the README's "Projects powered by notorch" section.)

Feedback, commits, criticism, thoughts, anything — y'all are welcome.

submitted by /u/ataeff

