Open-source AI data analyst - tutorial to set one up in ~45 minutes
I'm one of the builders behind this, happy to answer questions or discuss better ways to approach it.

There's a lot of hype around AI data analysts right now, and honestly most of it is vague. We wanted to make something concrete: a tutorial that walks you through building one yourself using open-source tools. That way you can test it out without much commitment.

Here's how it works. You run a few terminal commands that automatically import your database schema and create local YAML files representing your tables. The commands then analyze your actual data and generate column descriptions, tags, quality checks, etc. - basically a context layer that the AI can read before it writes any SQL. You connect it to your coding agent via Bruin MCP and write an AGENTS.md with your domain-specific context, such as business terms, data caveats, and query guidelines (similar to an onboarding doc for new hires).

It's definitely not magic, and it won't revolutionize your existing workflows, since data scientists already know how to do the more complex analysis. But there's always the boring part of getting started and doing the initial analysis, so we aimed for a guide that lets you start quickly and test it.

I'm always happy to hear how you enrich your context layer and what kind of information you add.
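To make the "context layer" idea concrete, here is a minimal sketch of the schema-import and data-profiling steps, assuming an in-memory SQLite database as a stand-in for your warehouse. It is not Bruin's implementation or file format: the function names (`build_table_context`, `render_context`, `demo_connection`), the `orders` table, and the output layout are all hypothetical and only illustrate the kind of information an agent could read before writing SQL.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical example data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10, "paid"), (2, 10, "refunded"), (3, 11, None)],
    )
    return conn


def build_table_context(conn: sqlite3.Connection, table: str) -> dict:
    """Collect the schema plus a simple data profile for one table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    profiled = []
    for _cid, name, col_type, _notnull, _default, _pk in columns:
        nulls, distinct = conn.execute(
            f'SELECT SUM("{name}" IS NULL), COUNT(DISTINCT "{name}") FROM "{table}"'
        ).fetchone()
        nulls = nulls or 0  # SUM returns NULL on an empty table

        # Suggest quality checks that the current data already satisfies.
        checks = []
        if row_count and nulls == 0:
            checks.append("not_null")
        if row_count and distinct == row_count:
            checks.append("unique")

        profiled.append(
            {"name": name, "type": col_type, "nulls": nulls,
             "distinct": distinct, "checks": checks}
        )
    return {"table": table, "row_count": row_count, "columns": profiled}


def render_context(ctx: dict) -> str:
    """Render the profile as plain text an agent can read before writing SQL."""
    lines = [f"Table {ctx['table']} ({ctx['row_count']} rows)"]
    for col in ctx["columns"]:
        checks = ", ".join(col["checks"]) or "none"
        lines.append(
            f"- {col['name']} ({col['type']}): {col['nulls']} nulls, "
            f"{col['distinct']} distinct, suggested checks: {checks}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_context(build_table_context(demo_connection(), "orders")))
```

The suggested checks are derived only from the data seen at profiling time, so treat them as candidates to review rather than guarantees. Domain knowledge such as business terms and data caveats still has to be written by hand, which is the role the post assigns to AGENTS.md.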