Today, I’m launching DAAF, the Data Analyst Augmentation Framework: an open-source, extensible workflow for Claude Code that allows skilled researchers to rapidly scale their expertise and accelerate data analysis by 5-10x -- *without* sacrificing scientific transparency, rigor, or reproducibility
Today, I’m launching DAAF, the Data Analyst Augmentation Framework: an open-source, extensible workflow for Claude Code that allows skilled researchers to rapidly scale their expertise and accelerate data analysis by as much as 5-10x -- without sacrificing the transparency, rigor, or reproducibility demanded by our core scientific principles. And you (yes, YOU) can install and begin using it in as little as 10 minutes from a fresh computer with a high-usage Anthropic account (crucial accessibility caveat: it’s unfortunately very expensive!).
DAAF explicitly embraces the fact that LLM-based research assistants will never be perfect and can never be trusted as a matter of course. But by providing strict guardrails, enforcing best practices, and ensuring the highest levels of auditability possible, DAAF ensures that LLM research assistants can still be immensely valuable for critically-minded researchers capable of verifying and reviewing their work. In energetic and vocal opposition to deeply misguided attempts to replace human researchers, DAAF is intended to be a force-multiplying "exoskeleton" for human researchers (i.e., firmly keeping humans in the loop).
The base framework comes ready out-of-the-box to analyze any or all of the 40+ foundational public education datasets available via the Urban Institute Education Data Portal (https://educationdata.urban.org/documentation/), and is readily extensible to new data domains and methodologies with a suite of built-in tools to ingest new data sources and craft new Skill files at will!
With DAAF, you can go from a research question to a shockingly nuanced research report with sections for key findings, data/methodology, and limitations, as well as bespoke data visualizations, with only five minutes of active engagement time, plus the necessary time to fully review and audit the results (see my 10-minute video demo walkthrough). To that crucial end of facilitating expert human validation, all projects come complete with a fully reproducible, documented analytic code pipeline and consolidated analytic notebooks for exploration. Then: request revisions, rethink measures, conduct new subanalyses, run robustness checks, and even add deliverables like interactive dashboards, policymaker-focused briefs, and more -- all with just a quick ask to Claude. And all of this can be done *in parallel* across multiple projects.
By open-sourcing DAAF under the GNU LGPLv3 license as a forever-free, open, and extensible framework, I hope to provide a foundational resource that the entire community of researchers and data scientists can use, learn from, and extend through critical conversation and collaboration. By pairing DAAF with an intensive array of educational materials, tutorials, blog deep-dives, and videos via project documentation and the DAAF Field Guide Substack (MUCH more to come!), I also hope to rapidly accelerate the readiness of the scientific community to genuinely and critically engage with AI disruption and transformation writ large.
I don't want to oversell it: DAAF is far from perfect (much more on that in the full README!). But it is already extremely useful, and my intention is that this is the worst that DAAF will ever be from now on given the rapid pace of AI progress and (hopefully) community contributions from here. What will tools like this look like by the end of next month? End of the year? In two years? Opus 4.6 and Codex 5.3 came out literally as I was writing this! The implications of this frontier, in my view, are equal parts existentially terrifying and potentially utopic. With that in mind – more than anything – I just hope all of this work can somehow be useful for my many peers and colleagues trying to "catch up" to this rapidly developing (and extremely scary) frontier.
Learn more about my vision for DAAF, what makes DAAF different from other attempts to create LLM research assistants, what DAAF currently can and cannot do as of today, how you can get involved, and how you can get started with DAAF yourself!
Never used Claude Code? No idea where you'd even start? My full installation guide walks you through every step -- but hopefully this video shows how quick a full DAAF installation can be from start to finish. Just 3 minutes!
So there it is. I am absolutely as surprised and concerned as you are, believe me. With all that in mind, I would *love* to hear what you think, what your questions are, what you’re seeing if you try testing it out, and absolutely every single critical thought you’re willing to share, so we can learn on this frontier together. Thanks for reading and engaging earnestly!