Power Query Help Merging Spreadsheets with Compound Headers
I've been using Power Query to combine tables from multiple workbooks. It's fairly straightforward when every worksheet has exactly the same headers, in exactly the same order, in the top row. However, I'm trying to merge many different spreadsheets that have "compound headers," meaning each column is uniquely identified by the combination of its top 4 rows. (Row 1 is the Result Type, Row 2 is the Cycle Number, Row 3 is the Analyte Name, and Row 4 is the Unit.) To complicate things further, not every spreadsheet contains the same set of headers, because we don't always test the same chemicals and we don't always report in the same units. See an example of just 4 of the worksheets I'm working with here: https://drive.google.com/drive/folders/1iZpvy7OSmltpduB0DdGKoGRTysHXHu_j?usp=sharing
I've deleted all the actual data and replaced it with a blue block, but you can see the layout. I'd like to preserve all the data in the resulting merge. In other words, I don't want to keep only the columns that exist in every spreadsheet; I want to keep every column that appears at least once. The end goal is to be able to filter the table to find all the samples named "CCV" or "RLV," for example, and trend their respective data. (We don't always include the date in the sample name in Column B, so I'd also need a column for the source file name, so that we know which date the data is for.)
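To make the compound-header layout concrete, here is a minimal sketch in plain Python (not Power Query M) of how the top 4 rows could be collapsed into one header per column. The function name `flatten_headers`, the " | " separator, the "Sample ID" column, and the sample values are all hypothetical; only the 4-row layout and the sample name in Column B come from the description above.

```python
def flatten_headers(rows, header_rows=4, sep=" | "):
    """Collapse the first `header_rows` rows into one header name per column.

    `rows` is a list of lists, one inner list per worksheet row.
    Blank header cells are skipped, so a column with only one filled
    header cell (e.g. "Sample Name") keeps that single label.
    """
    header_block = rows[:header_rows]
    data = rows[header_rows:]
    width = max(len(r) for r in rows)
    names = []
    for col in range(width):
        parts = []
        for r in header_block:
            cell = r[col] if col < len(r) else None
            if cell is not None and str(cell).strip() != "":
                parts.append(str(cell).strip())
        # Fall back to a positional name if all header cells are blank.
        names.append(sep.join(parts) if parts else f"Column{col + 1}")
    return names, data


# Hypothetical worksheet: result type, cycle number, analyte name, unit.
sheet = [
    ["", "", "Result", "Result"],
    ["", "", 1, 1],
    ["Sample ID", "Sample Name", "Arsenic", "Lead"],
    ["", "", "ug/L", "mg/L"],
    ["001", "CCV", 1.0, 2.0],
]
names, data = flatten_headers(sheet)
print(names)
# ['Sample ID', 'Sample Name', 'Result | 1 | Arsenic | ug/L', 'Result | 1 | Lead | mg/L']
```

This assumes every header cell is filled in for every data column. If the upper header rows use merged cells, the blanks would probably need to be filled from the left first, which this sketch does not do.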
Please let me know how I might do this with Power Query, or if there's a better way. We have dozens of these files, and we'd really like to avoid copying and pasting all day.
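The merge itself (keep every column that appears at least once, add a source file column, then filter by sample name) can be sketched in plain Python to pin down the logic. This is not Power Query M, and the names `combine_tables` and `filter_samples`, the file names, the flattened header strings, and the values are hypothetical. It assumes each file has already been reduced to rows keyed by a single header per column, and that sample names match exactly. In Power Query, `Table.Combine` should behave similarly when appending tables whose columns differ, filling the gaps with null.

```python
def combine_tables(tables):
    """Append tables while keeping the union of all columns.

    `tables` maps a source file name to a list of row dicts.
    Columns missing from a given file are filled with None.
    """
    columns = ["Source File"]
    for rows in tables.values():
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)  # keep first-seen column order
    combined = []
    for source, rows in tables.items():
        for row in rows:
            full = {col: None for col in columns}
            full.update(row)
            full["Source File"] = source  # records which file the row came from
            combined.append(full)
    return columns, combined


def filter_samples(rows, name_column, wanted):
    """Keep rows whose sample name is exactly one of `wanted`."""
    return [r for r in rows if r.get(name_column) in wanted]


# Hypothetical input: two files that tested different analytes and units.
tables = {
    "2023-01-05.xlsx": [
        {"Sample Name": "CCV", "Result | 1 | Arsenic | ug/L": 9.8},
        {"Sample Name": "Blank", "Result | 1 | Arsenic | ug/L": 0.1},
    ],
    "2023-02-10.xlsx": [
        {"Sample Name": "CCV", "Result | 1 | Lead | mg/L": 0.5},
        {"Sample Name": "RLV", "Result | 1 | Lead | mg/L": 0.05},
    ],
}
columns, combined = combine_tables(tables)
trend = filter_samples(combined, "Sample Name", {"CCV", "RLV"})
for row in trend:
    print(row["Source File"], row["Sample Name"])
```

Rows from a file that never measured a given analyte simply carry None in that column, so nothing is dropped when the tables are stacked.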
Thanks so much for the help.