Data Prep for Software Transition
I’ve been tasked with preparing approximately 7,000 lines of membership data for import into our new system. I have a solid understanding of what needs to be done to build the master data sheet for import, but I’m also aware that I don’t have the technical depth to execute every part of this process independently.
About the Data:
- Sheet 1: Base membership data plus contact info; ~7,000 records, each with a unique System ID (primary identifier)
- Sheet 2: Member credit balances; ~50 records, each with a System ID; needs to be merged into Sheet 1 by matching System ID
- Sheet 3: Member debit balances; ~100 records, each with a System ID; also needs to be merged into Sheet 1 (after the Sheet 2 fields) by matching System ID
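For anyone following along, the structure above is a textbook left join: Sheet 1 is the base table, and the small credit/debit sheets attach to it by System ID. A minimal pandas sketch (column names and values are hypothetical stand-ins; real data would come from `pd.read_excel`):

```python
import pandas as pd

# Toy stand-ins for the three sheets described above.
members = pd.DataFrame({
    "SystemID": [101, 102, 103, 104],
    "Name": ["Ann", "Ben", "Cara", "Dev"],
})
credits = pd.DataFrame({"SystemID": [102], "CreditBalance": [25.0]})
debits = pd.DataFrame({"SystemID": [103, 104], "DebitBalance": [10.0, 5.0]})

# A left join keeps every member row; members with no credit or debit
# record simply get blank (NaN) balance columns.
merged = members.merge(credits, on="SystemID", how="left")
merged = merged.merge(debits, on="SystemID", how="left")

print(merged)
```

The same "keep all base rows, pull in matches" behaviour is what XLOOKUP or Power Query's left-outer merge would give inside Excel, so the pandas version is just a compact way to see the logic.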
Tools / Limitations
I’m currently using the Excel web application via SharePoint, and I’m noticing that some features shown in tutorials (like certain tabs or tools) aren’t available to me.
I’ve come across concepts like:
- Left / Inner / Outer joins (via Python/Pandas)
- Power Query
- Indexing / lookup methods
However, I don’t have coding experience, and I’m unsure whether my current version of Excel supports Power Query or similar functionality.
Questions:
- What is the simplest way for a beginner to merge data into corresponding columns using a shared System ID?
- Is working with around 7,000 rows (and potentially up to 20,000 if guest data is included) too large to handle using column-by-column lookup methods?
- Would you recommend cleaning and validating the data before merging the sheets, or is it better to complete the merge first and then handle data cleanup?
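On the clean-first-or-merge-first question, one concrete argument for cleaning first: the two defects that are hardest to spot after a merge are duplicate System IDs in the base sheet (each duplicate fans out into multiple rows and silently inflates the row count) and orphan IDs in the balance sheets (rows a left join drops without warning). A quick pre-merge check, sketched here with toy data and a hypothetical `SystemID` column name:

```python
import pandas as pd

# Toy stand-ins; a real run would load the sheets with pd.read_excel.
members = pd.DataFrame({"SystemID": [101, 102, 102, 103]})
credits = pd.DataFrame({"SystemID": [102, 999], "CreditBalance": [25.0, 40.0]})

# Duplicate System IDs in the base sheet (102 appears twice here).
dupes = sorted(members.loc[members["SystemID"].duplicated(keep=False),
                           "SystemID"].unique().tolist())

# Orphan IDs: balance rows whose System ID has no match in the base sheet
# (999 here) would vanish in a left join.
orphans = sorted(set(credits["SystemID"].tolist())
                 - set(members["SystemID"].tolist()))

print("duplicate IDs:", dupes)   # [102]
print("orphan IDs:", orphans)    # [999]
```

In Excel terms the same checks are a COUNTIF on the ID column (flag counts > 1) and an XLOOKUP from each balance sheet back into Sheet 1 (flag #N/A results).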
---
I'll likely have several more questions over the next few days as I work through this process. If anyone is willing to follow along and offer guidance, I would really appreciate the support.
The final data import sheet must be completed and ready to share by April 27. Thank you in advance for your help!