From Machine Learning
[P] Unix philosophy for ML pipelines: modular, swappable stages with typed contracts
Our take
An open-source prototype applies Unix philosophy to machine learning pipelines, specifically retrieval pipelines. Each stage (PII redaction, chunking, deduplication, embeddings, and evaluation) runs independently behind a typed contract. Because any one component can be swapped without touching the others, users can measure how an individual change affects retrieval performance. The authors describe it as a prototype and invite feedback on their design assumptions. Repository: [GitHub Repo](https://github.com/mloda-ai/rag_integration).
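The stage-and-contract idea described above can be illustrated with a small sketch. It is hypothetical and is not mloda's plugin API: `Stage`, `compose`, `Text`, `Chunks` and the stage functions are invented for illustration, a regex e-mail mask stands in for a real PII redactor such as Presidio, and both chunkers are deliberately naive. Each stage declares an input and output type, and `compose` checks adjacent contracts before any data flows, much as a shell pipe only makes sense when one tool's output is what the next tool reads.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Chunks:
    items: tuple[str, ...]


@dataclass(frozen=True)
class Stage:
    """Hypothetical stage: a named function with a declared input/output contract."""
    name: str
    input_type: type
    output_type: type
    fn: Callable[[Any], Any]


def compose(stages: list[Stage]) -> Callable[[Any], Any]:
    # Check every adjacent pair once, before any data flows through.
    for left, right in zip(stages, stages[1:]):
        if left.output_type is not right.input_type:
            raise TypeError(
                f"{left.name} emits {left.output_type.__name__}, "
                f"but {right.name} expects {right.input_type.__name__}"
            )

    def run(data: Any) -> Any:
        for stage in stages:
            if not isinstance(data, stage.input_type):  # runtime guard on the contract
                raise TypeError(f"{stage.name} received {type(data).__name__}")
            data = stage.fn(data)
        return data

    return run


# Stand-in for a real redactor such as Presidio: masks e-mail addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
redact = Stage("pii_redacted", Text, Text, lambda t: Text(EMAIL.sub("[EMAIL]", t.value)))


def word_windows(t: Text, size: int = 3) -> Chunks:
    # Naive fixed-size chunker: consecutive windows of `size` words.
    words = t.value.split()
    return Chunks(tuple(" ".join(words[i:i + size]) for i in range(0, len(words), size)))


# Two interchangeable chunkers that honour the same Text -> Chunks contract.
chunk_sentence = Stage(
    "chunked", Text, Chunks,
    lambda t: Chunks(tuple(re.split(r"(?<=[.!?])\s+", t.value.strip()))),
)
chunk_words3 = Stage("chunked", Text, Chunks, word_windows)

# Order-preserving removal of exact duplicate chunks.
dedup = Stage("deduped", Chunks, Chunks, lambda c: Chunks(tuple(dict.fromkeys(c.items))))

doc = Text("Contact me at a@b.com. The cache is fast. The cache is fast.")
print(compose([redact, chunk_sentence, dedup])(doc).items)
# ('Contact me at [EMAIL].', 'The cache is fast.')
print(compose([redact, chunk_words3, dedup])(doc).items)  # only the chunker changed

try:
    compose([chunk_sentence, redact])  # Chunks cannot feed a stage that expects Text
except TypeError as err:
    print(err)  # chunked emits Chunks, but pii_redacted expects Text
```

Swapping `chunk_sentence` for `chunk_words3` replaces one stage and leaves the rest of the chain untouched, while a mis-ordered chain fails at composition time rather than silently producing bad output downstream.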
We built an open-source prototype that applies Unix philosophy to retrieval pipelines. Each stage (PII redaction, chunking, dedup, embeddings, eval) is its own plugin with a typed contract, like pipes between Unix tools.

The motivation: we swapped a chunker and retrieval got worse, but we could not tell whether the chunking itself was at fault or something had broken downstream. With each stage independently swappable, you change one option, re-run eval, and compare precision/recall directly.

```python
Feature("docs__pii_redacted__chunked__deduped__embedded__evaluated", options={
    "redaction_method": "presidio",
    "chunking_method": "sentence",
    "embedding_method": "tfidf",
})
```

Each `__` is a stage boundary. Swap any piece and the rest stays the same.

It is still a prototype, not production. We are looking for feedback on whether the design assumptions hold up.

Repo: [https://github.com/mloda-ai/rag_integration](https://github.com/mloda-ai/rag_integration)
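The `__`-delimited name and the one-option swap can be mimicked in plain Python. This is a hypothetical sketch, not mloda's API: `stage_chain`, `swap_one`, `changed_options`, `precision_recall_at_k` and the option value `"fixed_size"` are invented, and the retrieved document IDs are made-up toy inputs rather than measured results. It shows the feature name decomposing into an ordered stage chain, a variant configuration that differs from the baseline in exactly one option, and precision/recall at k computed for both runs against the same relevance labels.

```python
FEATURE_NAME = "docs__pii_redacted__chunked__deduped__embedded__evaluated"


def stage_chain(feature_name: str) -> list[str]:
    # Each "__" is a stage boundary; the first segment names the source data.
    return feature_name.split("__")


def swap_one(base: dict[str, str], key: str, value: str) -> dict[str, str]:
    """Copy `base` with exactly one existing option changed."""
    if key not in base:
        raise KeyError(f"unknown option: {key}")
    return {**base, key: value}


def changed_options(a: dict[str, str], b: dict[str, str]) -> set[str]:
    # Options whose values differ between two configurations.
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    if k <= 0 or not relevant:
        raise ValueError("k must be positive and relevant must be non-empty")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k, hits / len(relevant)


base = {"redaction_method": "presidio", "chunking_method": "sentence", "embedding_method": "tfidf"}
variant = swap_one(base, "chunking_method", "fixed_size")  # "fixed_size" is a made-up value

print(stage_chain(FEATURE_NAME))
# ['docs', 'pii_redacted', 'chunked', 'deduped', 'embedded', 'evaluated']
print(changed_options(base, variant))  # {'chunking_method'}

# Toy rankings for one query, not real results.
relevant = {"d1", "d3", "d4"}
for label, retrieved in [("sentence", ["d1", "d3", "d7"]), ("fixed_size", ["d1", "d8", "d9"])]:
    p, r = precision_recall_at_k(retrieved, relevant, k=3)
    print(f"{label}: P@3={p:.2f} R@3={r:.2f}")
```

Keeping the relevance labels and every other option fixed is what allows a change in precision or recall to be attributed to the one stage that was swapped.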