12-year analyst feeling like a dinosaur. Need advice on moving away from massive flat files without forcing Power BI on my team.
I’ve been a Reddit user for a long time, but I’ve only recently scoured the internet for communities that could help me with this problem, and I heard this might be the place to be.
I work as an Analyst in the CPG space (NWA area) and have been doing this for over a decade now. I know my way around a spreadsheet pretty well, but my technical skills kind of froze in time around 2020. Before COVID hit, I was finally learning proper data modeling and Power Pivot from a mentor, but then we got sent home, things changed, and I’ve basically been surviving on VLOOKUPs and brute force ever since.
I actually inherited some pretty advanced VBA tools from that mentor back in the day, but they were built for the old legacy system (DSS/Retail Link). When the retailer migrated everything over to the new platforms (Luminate/Scintilla and Madrid), all that old automation effectively died. The new export formats and cell limits broke the old code, so I never really rebuilt them and I'm back to manual stitching.
I realized about 6 months ago that I am falling behind. I’m still building massive flat-file reports the hard way and it’s killing me.
The situation is basically this: I pull data from a vendor portal (think Unify style system) that has a hard export limit of like 3M cells. If I want to pull 52 weeks of history for 500+ items across hundreds of locations, I get the "cell count exceeds limit" error. So I have to pull it in chunks—Dollars, Units, PODs separately—and stitch them together manually. It is a massive pain.
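The stitching step is really a key-based merge, not a side-by-side paste. Here's a minimal Python sketch of the pattern (column and file names are hypothetical; in practice each chunk would come from `pd.read_csv("dollars_export.csv")` and so on — Power Query's Merge Queries does the same thing without code):

```python
import pandas as pd
from functools import reduce

# Hypothetical chunk exports: one metric per file, same key columns in each.
dollars = pd.DataFrame({"Item": ["A", "B"], "Store": [1, 1], "Week": [1, 1], "Dollars": [100.0, 50.0]})
units = pd.DataFrame({"Item": ["A", "B"], "Store": [1, 1], "Week": [1, 1], "Units": [10, 5]})
pods = pd.DataFrame({"Item": ["A", "B"], "Store": [1, 1], "Week": [1, 1], "PODs": [2, 1]})

keys = ["Item", "Store", "Week"]

# Outer-join each metric on the shared keys, so a row missing from one
# export doesn't silently drop the whole item/store/week record.
combined = reduce(lambda left, right: left.merge(right, on=keys, how="outer"),
                  [dollars, units, pods])
print(combined)
```

The same idea in Power Query is "Merge Queries" on Item/Store/Week with a full outer join, which scales past the 3M-cell export limit because each chunk only carries one measure.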
To make it worse, my stakeholders are super old school. They want Excel files they can touch, pivot, and scribble on. If I send them a Power BI link, they won't even open it. So I need the flexibility of Excel, but the data volume is getting too big for the standard sheets I'm building.
My goal is to build a "set it and forget it" system. I want to be able to just drop those raw data exports into a folder and have Excel just "eat" them. I know Power Query is probably the answer for the stitching, but is Power Pivot still the best way to handle the data model part? I need it to handle the heavy lifting (52 weeks, store level data) but output into standard Pivot Tables that my buyers can still play with.
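For what it's worth, the "drop files in a folder and let Excel eat them" part is exactly what Power Query's From Folder connector does: it appends every file in a folder into one table and re-reads the folder on each refresh, feeding the result into the Power Pivot data model. A rough Python equivalent, just to show the shape of the logic (folder path and columns are made up):

```python
import pandas as pd
from pathlib import Path

def load_exports(folder: str) -> pd.DataFrame:
    """Append every CSV dropped into a folder into one long table --
    the same shape a Power Query 'From Folder' query would build."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path)
        df["SourceFile"] = path.name  # keep lineage for troubleshooting
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

With that long table loaded to the data model instead of a worksheet, the sheet's row limit stops mattering, and the output stays a standard PivotTable your buyers can pivot and scribble on.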
Also, side note on AI—everyone tells me to just use ChatGPT to write python scripts, but that’s not really what I need. I’m trying to get my data structure clean enough that I can feed the final Pivot Tables into an LLM to help me write the recaps/insights. Has anyone had luck with that workflow?
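On the LLM side, the trick that seems to work is sending the model the *summarized* pivot as plain text, not the raw rows, so it's recapping a small aggregate. A hedged sketch with invented numbers (the prompt text at the end is what you'd paste into a chat or send via an API):

```python
import pandas as pd

# Hypothetical weekly sales data.
sales = pd.DataFrame({
    "Item": ["A", "A", "B", "B"],
    "Region": ["East", "West", "East", "West"],
    "Dollars": [120.0, 80.0, 60.0, 140.0],
})

# Summarize the same way the final Pivot Table would.
pivot = pd.pivot_table(sales, index="Item", columns="Region",
                       values="Dollars", aggfunc="sum")

# A compact text rendering is what the LLM actually sees, e.g. preceded by
# "Write a two-sentence recap of the regional trends in this table:"
prompt_table = pivot.to_string()
print(prompt_table)
```

The cleaner your data model, the smaller and more self-describing this text block gets, which is exactly why fixing the structure first is the right instinct.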
Any advice on where to start or a specific course to bridge the gap between "VLOOKUP guy" and "Data Model guy" would be awesome. Thanks.