Automate list size reduction
I'm going to try to lay this out succinctly; let's see how I do:
I am trying to build a flexible, reusable sheet to dump two reports into and get a comparison report back.
One report has a list of date-time stamps for a variable number of work orders (Dt), each with a variable number of entries. The second has date-time stamps for inspections tied to the same work orders (Mt), but it covers far more work orders, and some work orders are missing inspections entirely.
What I want out at the end is, for each produced time stamp (Dt), a report that tells me how far it is from the nearest (before or after) inspection (Mt). What I do right now is basically MIN(ABS(Dt - FILTER(Mt, Mt work order = Dt work order))).
It works, but oh em gee the computation, because every Dt entry gets compared to EVERY Mt entry.
After that I just look at a pivot table of the results to see max time and average. This tells me if people went too long saving parts (Dt time stamp) but NOT measuring parts (Mt).
Is there a way to take each of the original lists, have them pushed/pulled to a variable width series of columns (one per Work Order), then have a variable length comparator look at the first set of columns and do that "time to closest inspection" but only with the matching column from the second set of data?
I know I could do it if I make them part of a data model and just slowly go through each work order, but I'd prefer it to just spit out the results at the end so I can hand this sheet off to others and not have to manually do the adjustments every day.
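One 365-style way to get a per-work-order "time to closest inspection" without manually splitting columns is BYROW over the Dt list, filtering Mt to the matching work order inside the formula. A minimal sketch, assuming named ranges DtWO/DtTime and MtWO/MtTime that are not in the original post:

```
=BYROW(HSTACK(DtWO, DtTime), LAMBDA(r,
    LET(
        wo, INDEX(r, 1),
        ts, INDEX(r, 2),
        m,  FILTER(MtTime, MtWO = wo, NA()),
        MIN(ABS(ts - m))
    )
))
```

This still scans every Mt row for each Dt entry, but it spills the whole result from one formula, so nothing has to be adjusted day to day. If recalculation stays slow, pre-sorting Mt within each work order and looking up the neighbours with XLOOKUP (match_mode -1 for the inspection at or before, 1 for the one after) is one way to cut the per-row work.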
Let me know if this isn't clear and I can try to set up some example mini-tables or something to better explain my gibber jabber above.
ETA: Version 2603 (Build 19822.20150 Click-to-Run)
Related Articles
- How to deal with a bulky spreadsheet that is starting to hit the limits of Excel?
Hello all, I have been venturing on quite the Excel journey the past year or so. I made a corporate spreadsheet that is approaching 500k formulas and is starting to have serious speed issues. It is 2026, so I conversed with ChatGPT several times about the speed issue, but realized I am better off asking the experts here.
What is the problem: My spreadsheet imports flat databases with specific information about objects that need further analysis. The imported flat databases run from, say, column A to CC, from which I draw about 12-15 data fields for further analysis. It may be more in the future. Afterwards, said data gets 'enriched' manually with things that aren't in the database, because the data needs a human eye and cannot be automated. So far, so good.
Right now, each object gets analysed from several different angles. As it stands, my Formula Page runs from column A to NA or so. Many columns receive data from preceding columns, which are in turn the result of many (slightly complex) logical IF or IFS tests, often nested 3 or 4 deep. They frequently work in conjunction with XLOOKUP to retrieve values, because the blocks on the Formula Page are not equal: columns A to BC may analyse 150 objects, BD to DD may analyse 100 objects (from the same dataset, so narrower), and so forth. Thus a lot of XLOOKUP is required, both to pick up values produced by the first block and to retrieve values from the flat-database import page. Finally, XLOOKUP is insurance compared to FILTER, as I am not fully convinced that empty values in the flat database always contain a space (" ").
To get to the point: I use many IF, IFS, AND and, if need be, OR formulas. Think tens of thousands, probably in excess of 100k. These are compounded with XLOOKUP, or XLOOKUP gets used copiously on its own; here too, think tens of thousands. These formulas are, as much as possible, in array format, even though I find that controversial when I consider how it can create a chain of updates throughout the spreadsheet. 'Dependencies' is the name of the game, with one object receiving many possible alterations / adjustments from manual input data, for which the spreadsheet needs to provide. Right now, when I update a value, it may take up to 4 seconds to recalculate, which is already beyond the annoyance point for me.
This leads me to these (hopefully) simple questions:
Is it smart to use array formulas, knowing that each change should only impact one object line (for example, row 488) and none other? It is important to mention that object 1 does not influence object 488 or any other; any manual data field only affects the object in its own row. In my mind, array formulas do not make sense in that regard, as they can result in a cascade of updates, but apparently array formulas are 'way more efficient'.
Is a VBA library the way to go to reduce lag and make the spreadsheet feel instant again? I cannot code in VBA yet, but I am in the slow process of learning it regardless.
Alternatively: should I use LET whenever a repeated lookup is needed in the same formula?
Really looking forward to your answers! submitted by /u/EvolvedRevolution
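On the last question above: LET evaluates a lookup once and reuses the result, instead of repeating the same XLOOKUP in every branch of a nested IF. A minimal sketch; the sheet name, ranges, and thresholds here are made up for illustration, not taken from the post:

```
=LET(
    hit, XLOOKUP(A2, Import!$A$2:$A$500, Import!$F$2:$F$500, ""),
    IF(hit = "", "missing",
       IF(hit >= 100, "high", "low"))
)
```

Without LET, the same XLOOKUP would be evaluated up to three times per row in that formula; with it, once.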
- How to consolidate multiple sheets into one YTD summary report?
I have 52 sheets in my Excel workbook. They are named in a non-traditional sequence: "Jan Wk1, Jan Wk2, Jan Wk3, ... Dec Wk4, Dec Wk5", because it is easy for me to print a weekly report with a total for each category. https://preview.redd.it/rq7qb3x8ddyg1.png?width=814&format=png&auto=webp&s=9d101ae17b185451e5140ef1c17f86221cd83aa6
Log 2026 is what my spreadsheet looks like. I only included three weeks as an example. Cell H34 in the highlighted area, on the Jan Wk2 sheet, contains a VBT to retrieve data from the previous sheet. Each day, I also need the total from the previous day in C36:G37. I like my current setup because I can easily convert it into a weekly PDF report without any modification. However, I'm not sure how to create a YTD summary report from 52 sheets. A template of the YTD report is included in Log 2026. I tried to use 3D formulas, but I have to manually change sheet names and adjust cell references when crossing columns and rows; that is not very practical, and it is easy to make mistakes.
Log 2026: https://drive.google.com/file/d/1IEavZiJiDOv4GS-8c6n8dJqHUnRfT-ft/view?usp=drive_link
*****************************
Some people suggest combining a year's worth of data into a single sheet rather than 52 weekly sheets, and then using Power Query to populate the report, but I am having difficulty generating the weekly report with the preferred template.
Preferred PDF report example: https://drive.google.com/file/d/1uQW7z_XZJ_Nn5sAzcaTkXDObAu8XTi0e/view?usp=drive_link
Screenshot of combined data: https://docs.google.com/spreadsheets/d/1xIJkqTOwjucN2QMT0pK6A6wmSp8U4U8A/edit?usp=drive_link&ouid=102145557861966136866&rtpof=true&sd=true
Highlights: data is collected daily, so every day is different, and the driver data at the bottom of the spreadsheet changes based on each driver's starting date. My goal is to create a weekly and a yearly report with the preferred template (boss's request).
Any suggestion is appreciated. submitted by /u/Ok_Solid_9312
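One way to total across all 52 sheets without touching each name is a 3D reference spanning the first and last tabs, which works as long as the weekly sheets sit contiguously in the tab order. A sketch; the sheet names follow the post, but the cell addresses are assumptions:

```
=SUM('Jan Wk1:Dec Wk5'!H34)
```

In current Excel 365 builds, VSTACK also appears to accept 3D references, e.g. =VSTACK('Jan Wk1:Dec Wk5'!C36:G37), which would stack the same daily-total block from every sheet into one spill range that a summary report or PivotTable can read.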
- I need to do a complex sort involving large amounts of data.
Hello, I have posted before HERE. To reiterate, I work at a library and I essentially need to review hundreds of thousands of lines of data, compiling information about different periodical volumes into one line each. They are technically all different volumes (and there is a column for that) but can be organized under a single periodical title. The raw output data looks something like this:
Periodical A / Volume 1 / Last Loaned Date / Usage Number
Periodical A / Volume 2 / Last Loaned Date / Usage Number
Periodical A / Volume 3 / Last Loaned Date / Usage Number
Periodical A / Volume 4 / Last Loaned Date / Usage Number
Periodical B / Volume 1 / Last Loaned Date / Usage Number
Periodical B / Volume 2 / Last Loaned Date / Usage Number
etc.
What I need is some way of sorting or filtering, or some algorithm, that can first keep only the most recent Last Loaned Date among all volumes of a periodical and eliminate the others; second, sum the Usage Numbers across all volumes of that periodical into a single number; and third, combine those two things into one line for the periodical in question. So the output should look something like:
Periodical A / Last Loaned Date (most recent) / Usage Number (total among all)
Periodical B / Last Loaned Date (most recent) / Usage Number (total among all)
In this way I hope to get hundreds of thousands of lines of data down to about 50,000, representing the number of different periodicals we hold at the library. I have now moved to Excel 365 and am trying to implement the solution recommended by u/Downtown-Economics26 HERE: LINK. However, even the test case is not working for me; I get a #NAME? error instead. This is what I see: LINK. I think I am using the exact same function as he did, and I have Excel 365 so I can use the DROP and GROUPBY functions, but I am still getting the error.
Does anyone know why this might be the case? submitted by /u/-Ophidian-
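A #NAME? error means Excel does not recognize the function name; GROUPBY was rolled out gradually, so a Microsoft 365 build can still lack it even on an active subscription, and checking the update channel and build is the first thing to try. For reference, a GROUPBY sketch of the aggregation described above; the column positions are assumptions, and the pairing of the HSTACKed functions to the value columns may need adjusting:

```
=GROUPBY(A2:A100000, HSTACK(B2:B100000, C2:C100000), HSTACK(MAX, SUM))
```

This assumes Periodical in column A, Last Loaned Date in B, and Usage Number in C, and would return one row per periodical with the latest date and the usage total.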
- Issue with Excel Power Query
In my Excel workbook I have a long string of queries to get the results I want; however, I am noticing a small number of duplicates that SHOULDN'T be able to exist.
Q1: I add a new column (SelectionBucket) based on two other columns - works. Then, from SelectionBucket, I add another column (IsRequiredBucket) that is TRUE when [SelectionBucket] is one of the required values - works. I then add an index (CourseIndex) - works. Result: courses have an index, plus SelectionBucket and IsRequiredBucket.
Q2 (references Q1): I add a column (IsRequiredCandidate) where [IsRequiredBucket] = TRUE - works. I filter down to ONLY true values, then sort on (Name) ascending, (SelectionBucket) ascending, (EMark) descending - works. Result: only required buckets, sorted best to worst.
Q3 (references Q2): I group the rows by (Name) and (SelectionBucket) into [AllRows]. I add a column (TopRequired) with Table.FirstN(Table.Sort([AllRows], {{"EMark", Order.Descending}}), 1) to return the BEST value - works. I expand the [TopRequired] table, excluding Name and SelectionBucket - works. I add a column (SelectionType) = "Required". This is where I am not sure whether it is working, because it works for 99% of my data but fails for some entries. I add one more column (IsRequiredSelected) that checks (SelectionType) and is TRUE for "Required". Result should be: one selected result for each available bucket per entry, with its (IsRequiredSelected) value set to TRUE.
Q4 (references Q1): I merge Q4 (which is just Q1) with Q3, matching on (CourseIndex), and expand (SelectionType) from the merge. I rename (SelectionType) to (RequiredTag) and add a column (IsRequiredSelected) that returns TRUE when [RequiredTag] = "Required", FALSE otherwise.
Q5 (references Q4): I add a column (IsAdditionalCandidate) that checks [IsRequiredSelected] = FALSE. I filter (IsAdditionalCandidate) to only TRUE values and sort by (Name) ascending, (EMark) descending. Result: rows where (IsRequiredSelected) = TRUE are cleared out.
Q6 (references Q5): I group by (Name) into [AllRows] with the All Rows operation. I add a column (TopAdditional) coded =Table.FirstN(Table.Sort([AllRows], {{"EMark", Order.Descending}}), 7). I expand the [TopAdditional] table, excluding (Name), and add a column (SelectionType) = "Additional". Result: only records marked "Additional", keeping the best 7 results for each (Name).
Q7 is an appended query combining Q3 and Q6: it takes the Q3 results and adds the Q6 results, which should result in NO duplicates.
Issue: a query check shows some of my entries duplicated, with one record appearing as both Additional and Required. I am not sure WHY or where it is broken, other than where I think it is...
I realize I could have done this in fewer queries; however, I wanted to verify each step along the way, so that I could fix one portion instead of having to delete and rewrite everything. Please note that I CANNOT share the Excel file itself, as it contains confidential information. If I haven't explained a step clearly enough, please let me know and I will try to add further information.
submitted by /u/DLCamilla
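One way to localize where the duplicates enter is a diagnostic query that counts how often each CourseIndex appears; pointing it at Q3, Q6, and the appended Q7 in turn shows which step first produces more than one row per course. A sketch in M; the query and column names follow the post, but this is an assumed layout, not the actual workbook:

```
let
    // Count how many rows share each CourseIndex in the query under test.
    Source = Q7,
    Grouped = Table.Group(Source, {"CourseIndex"},
        {{"RowCount", each Table.RowCount(_), Int64.Type}}),
    // Keep only the indexes that appear more than once.
    Dupes = Table.SelectRows(Grouped, each [RowCount] > 1)
in
    Dupes
```

One common cause of this pattern: if the merge in Q4 finds more than one Q3 match for a CourseIndex, the expand step multiplies rows, and a record can end up tagged both ways downstream.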