Pull static data from one sheet to use in a VLookup, then output static data + returned values? Can my process be improved?
Hi all,
I have a weekly process at work that I am hoping to improve, and preferably completely automate. With the help of this subreddit I have taken this process from hundreds of steps down to a few dozen, but I think more can be done – I am hoping for suggestions (and if automation seems possible, a point in the right direction as to where to start). I will do my best to describe the process without droning on, but there are a number of nuances which I feel may be important.
Our largest client sends us a weekly census file for their retirees, which contains information (46 columns) that affects the type of materials I am responsible for sending, as well as the content of those materials, per household (retiree + any dependents). Each file can contain between 1 and 30 retirees. To complete my tasks, I use four separate Excel files:
- The census file
- The “calculator,” which generates a contribution amount based on whether the retiree (and their dependent(s), if applicable) are over or under 65, their years of service, and what division the employee worked in. It appears to be a series of IF formulas paired with a VLOOKUP.
- A “premium payment tool” which essentially subtracts the contribution from the premium amounts and is formatted for output onto the master mailing list
- A master mailing list, which includes the retiree’s demographic info, plus their coverage options and individualized rates
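Since the question is whether this is feasible beyond Excel: the column copy/paste in step 1 below is the kind of thing pandas handles in one line. A minimal sketch – the file name, headers, and target order here are placeholders standing in for your actual census layout, not the client's real spec:

```python
import pandas as pd

# Tiny stand-in for pd.read_excel("census.xlsx"); these headers are hypothetical.
census = pd.DataFrame({
    "Last Name": ["Smith"],
    "First Name": ["Pat"],
    "SSN": ["000-00-0000"],
    "Date of Birth": ["1955-03-01"],
    "Address": ["1 Main St"],
    "Phone": ["555-0100"],
})

# Map the client's headers to the company's names and order in one step,
# instead of copy/pasting one column at a time.
column_map = {
    "First Name": "First",
    "Last Name": "Last",
    "Date of Birth": "DOB",
    "SSN": "SSN",
    "Address": "Address",
    "Phone": "Phone",
}
mailing = census.rename(columns=column_map)[list(column_map.values())]
```

The same `rename`/reorder pattern scales to all 46 columns and every row at once, which removes the most error-prone part of the manual process.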
To say the process is convoluted is an understatement. Each household is one row on the census file – the client provides the ages and disability status of all household members. My current process:
1. Copy/paste the retiree’s first and last name, DOB, SSN, address, and phone number. The column order on the client’s file does not match the order my company uses, so I typically copy/paste one column at a time for all rows in the census file, then complete the following steps one row at a time.
2. Determine the ages of the retiree, their spouse, and their children (if applicable). If all household members are in the same age group (Pre- or Post-65) based on the date of retirement, only one entry on the Mailing file is required. If one household member is in a different age group than the other(s), two entries on the Mailing file are required. Additionally, if a household member has an “E” in the disability column of the file, they are treated as Post-65 regardless of age.
3. Using the Calculator, determine the employee's monthly contribution amount based on service years, group, and section type. The process is repeated if the employee has a spouse, and again if they have a child dependent (so from 1-3 times per employee).
4. Copy the monthly contribution amount for each household member into the Premium Payment tool – there are two sheets, one for Pre-65 and one for Post-65. Each sheet has cells for the Retiree, Spouse, and Child. The contribution amount is subtracted from the premium rates and formatted for output to the Mailing file. If there is no contribution amount under one household member, the output returns “N/A” under that coverage.
5. Copy the “You Pay” amount onto the Mailing file
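For what it's worth, the rules in steps 2 and 4 are mechanical enough to script. A sketch of both, under my assumptions about the semantics (the "E" disability override, age measured as of the retirement date, and "N/A" when a member has no contribution) – names and signatures are illustrative, not your tool's actual layout:

```python
from datetime import date
from typing import Optional, Union

def age_group(dob: date, as_of: date, disability_code: str = "") -> str:
    """Classify a household member as Pre-65 or Post-65.

    An "E" in the disability column forces Post-65 regardless of age,
    per the census rules described above. `as_of` is assumed to be the
    date of retirement.
    """
    if disability_code.strip().upper() == "E":
        return "Post-65"
    # Age at as_of, accounting for whether the birthday has passed yet.
    age = as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))
    return "Post-65" if age >= 65 else "Pre-65"

def you_pay(premium: float, contribution: Optional[float]) -> Union[float, str]:
    """Premium minus contribution, mirroring the Premium Payment tool:
    returns "N/A" when the member has no contribution amount."""
    if contribution is None:
        return "N/A"
    return premium - contribution
```

With the census loaded into a DataFrame, these can be applied per household member, and the one-entry-vs-two-entries decision in step 2 reduces to checking whether all members land in the same group.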
Besides the time it takes to complete this process for larger files, the sheer amount of copy/pasting required leaves a lot of room for error. What I am working with now is infinitely better than what my predecessor was doing. I was able to add enough formulas to reduce the copy/pasting by about 75%, but I suspect more can be done to automate things – I’m just not sure if it’s feasible solely in Excel. I’d love to hear what the community thinks; please let me know what additional info I can provide. Thank you all so much!