Excel with a lot of data - how to solve it
Our take
In the realm of data management, Excel remains an indispensable tool for many businesses, especially when handling substantial datasets. The outlined scenario of managing two large Excel tables—one for invoices and another for credit notes—illustrates a common challenge faced by professionals: navigating complexity while striving for efficiency. This company’s approach, with its reliance on multiple sheets, pivot tables, and intricate formulas, reflects a typical but cumbersome workflow. As users deal with the intricacies of data manipulation, it's crucial to seek out streamlined solutions that maintain productivity without overwhelming the software or the user.
The extensive use of formulas, such as those detailed in the original piece, showcases a deep engagement with Excel's capabilities. However, this complexity often leads to performance issues, including software crashes, especially when handling large datasets. This situation prompts an essential question: how can we simplify these processes? The solution lies in leveraging more modern tools and methodologies that are inherently more efficient. For instance, transitioning to AI-enabled spreadsheet technologies could significantly mitigate these challenges. By automating data retrieval and analysis, users can reduce reliance on extensive manual formulas and avoid the pitfalls of crashing systems. This shift would not only enhance user experience but also empower professionals to focus on data-driven decision-making rather than wrestling with cumbersome spreadsheets.
Moreover, the need for clarity in data management cannot be overstated. Users often find themselves overwhelmed by the sheer volume of data and the complexity of their Excel files. This is where the human-centered approach becomes vital. By prioritizing user outcomes, we can create environments that foster engagement and productivity. Tools that simplify data handling while providing intuitive interfaces are increasingly becoming essential in the modern workplace. As highlighted in articles like "Why are my formulas 0?" and "Making a dynamic calendar in Excel", the demand for accessible solutions is palpable. Users are seeking not just functionality but also an experience that minimizes frustration and maximizes utility.
As we look toward the future of data management, the importance of innovation in spreadsheet technology becomes increasingly clear. The challenge presented by the original article serves as a reminder of the limitations of legacy tools in addressing modern data needs. Embracing advancements in AI and machine learning will be pivotal in transforming how we interact with data. The potential for these technologies to revolutionize data management processes is immense, making it essential for users to stay informed and ready to explore new solutions.
In conclusion, the journey from complex, error-prone spreadsheets to streamlined, intuitive data management tools is underway. As organizations continue to grapple with the challenges of large datasets, the adoption of innovative technologies will be key in fostering productivity and efficiency. The questions we must consider are: How can we further integrate AI into our data workflows? What new tools will emerge to meet these ongoing challenges? As we navigate this evolving landscape, the answers will define the future of data management and user experience.
I have 2 Excel files, each with 5 sheets (one per branch). One file is a specification of invoices and the other is a specification of credit notes. Each holds up to 500 rows and 10 columns, but I only need 6 of the columns. I receive new versions of both files every month.
These Excel files are specifications of purchased items or returned items.
In column B there is the branch name, in column C the item name, G is gender, H is the Intrastat country, J is the total price of the items, and K is the quantity.
The problem is that this company is not registered for the return of goods.
Currently, I have one large Excel file with around 19 sheets:
• 5 invoices
• 5 pivot tables created from those invoices
• 5 credit notes
• 1 instructions sheet
• 1 codes sheet
• Retail: items summed by codes and branches
• Customs: here the items are split by customs tariffs
How the file works:
• Invoice sheets: for the pivot tables to work correctly, I first divide the total price by the quantity to get the unit price (J/K). Then I add two columns with this formula: =INDEX(Šifre!A:A;MATCH(C2&H2&I2;Šifre!B:B&Šifre!E:E&Šifre!I:I;0)), which returns the item code (Šifre is the codes sheet). Then I copy everything and paste it as values, because with this much data Excel otherwise starts crashing.
• Credit note sheets: the procedure is the same as for invoices.
• Then I pull all this data into the “Retail” sheet using the formula: =SUMIF('MB-R'!$M:$M;$E6;'MB-R'!$J:$J)
• Customs sheet: roughly the same formula as above, but here I sum by tariff rather than by branch ('na drobno' is the Retail sheet). Invoices and credit notes are netted together, plus and minus: =IF(SUMIFS('na drobno'!N:N;'na drobno'!B:B;Y8;'na drobno'!C:C;X8)=0;"";SUMIFS('na drobno'!N:N;'na drobno'!B:B;Y8;'na drobno'!C:C;X8)) Next to that, I have a separate table that outputs only the negative amount when the plus and minus net to a negative value: =IF(OR(AA8<0;AB8<0;AC8<0);IF(SUMIFS('na drobno'!L:L;'na drobno'!B:B;Y8;'na drobno'!C:C;X8)=0;"";SUMIFS('na drobno'!L:L;'na drobno'!B:B;Y8;'na drobno'!C:C;X8));"") Finally, in the case where plus and minus result in a negative value, I display only the positive amount; otherwise, I display the combined plus and minus result: =IF(AND(AA9>0;AB9>0;AC9>0);IF(SUMIFS('na drobno'!P:P;'na drobno'!B:B;Y9;'na drobno'!C:C;X9)=0;"";SUMIFS('na drobno'!P:P;'na drobno'!B:B;Y9;'na drobno'!C:C;X9));IF(SUMIFS('na drobno'!H:H;'na drobno'!B:B;Y9;'na drobno'!C:C;X9)=0;"";SUMIFS('na drobno'!H:H;'na drobno'!B:B;Y9;'na drobno'!C:C;X9)))
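For readers who want to see the logic of the invoice-sheet step outside Excel, here is a minimal Python sketch of the unit-price calculation and the three-column code lookup. It is not the poster's method: the sample rows, the names `CODE_ROWS`, `CODE_BY_KEY`, `unit_price` and `item_code`, and the meaning given to the third key column are all assumptions made for illustration (the post does not say what column I holds). The point it shows is that a lookup table built once and keyed on the three values avoids re-concatenating whole columns for every row, which is what the MATCH on B:B&E:E&I:I has to do.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the Šifre (codes) sheet: one tuple per known item,
# holding the code (column A) followed by the three key columns (B, E and I).
CODE_ROWS: List[Tuple[str, str, str, str]] = [
    ("1001", "T-shirt", "DE", "cotton"),
    ("1002", "T-shirt", "IT", "cotton"),
    ("2001", "Jacket", "DE", "wool"),
]

# Build the lookup once. The key plays the role of C2&H2&I2, but as a tuple,
# so "ab" + "c" and "a" + "bc" cannot collide the way joined strings can.
CODE_BY_KEY: Dict[Tuple[str, str, str], str] = {
    (name, country, extra): code for code, name, country, extra in CODE_ROWS
}


def unit_price(total: float, quantity: float) -> Optional[float]:
    """J/K from the post; returns None instead of failing when K is 0."""
    return total / quantity if quantity else None


def item_code(name: str, country: str, extra: str) -> Optional[str]:
    """Exact-match lookup on the combined key; None when no code exists."""
    return CODE_BY_KEY.get((name, country, extra))
```

With the sample rows above, `item_code("T-shirt", "IT", "cotton")` finds code `"1002"`, and an unknown combination yields `None` rather than an error value.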
I hope I explained it as clearly as possible.
Now the question is whether this can be done in a simpler way: something as straightforward as possible that makes Excel crash as little as possible.
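As one possible way to think about the summing steps described above, the sketch below nets invoices against credit notes per pair of criteria in a single pass, then blanks out zero results. It is an illustration under stated assumptions, not a recommendation from the post: the names `net_totals` and `display`, the `(branch, code, amount)` row shape, and the sample figures are hypothetical, and credit-note amounts are assumed to reduce the total whether they arrive as positive or negative numbers. Each total is computed once, whereas the =IF(SUMIFS(...)=0;"";SUMIFS(...)) pattern evaluates the same SUMIFS twice per cell.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple, Union

# Hypothetical row shape: (branch, code, amount).
Row = Tuple[str, str, float]


def net_totals(
    invoices: Iterable[Row], credit_notes: Iterable[Row]
) -> Dict[Tuple[str, str], float]:
    """Sum invoice amounts and subtract credit notes per (branch, code)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for branch, code, amount in invoices:
        totals[(branch, code)] += amount
    for branch, code, amount in credit_notes:
        # Assumption: a credit note always reduces the total, whatever its sign.
        totals[(branch, code)] -= abs(amount)
    return dict(totals)


def display(value: float) -> Union[float, str]:
    """Mirror the IF(...=0;"";...) wrapper: show a blank instead of zero."""
    return "" if value == 0 else value
```

A short usage example with made-up figures:

```python
invoices = [("A", "1001", 100.0), ("A", "1001", 50.0), ("B", "2001", 20.0)]
credit_notes = [("A", "1001", -30.0), ("B", "2001", 20.0)]
totals = net_totals(invoices, credit_notes)
shown = {key: display(value) for key, value in totals.items()}
```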