Power Query’s "Remove Duplicates" has me questioning EVERYTHING I’ve ever done in PQ
Our take
I had an epic failure with Power Query that has me worried about every spreadsheet I’ve ever created.
A simplified version of my issue was that I had a list containing "User", "Password", and "Date" (when the user created their password):
| User | Password | Date |
|---|---|---|
| Alice | Apple1 | 01-Jan-26 |
| Bob | Blue1 | 01-Jan-26 |
| Bob | Blue2 | 04-Feb-26 |
| Carl | Cat1 | 02-Jan-26 |
| Carl | Cat2 | 01-Feb-26 |
| Carl | Cat3 | 07-Mar-26 |
I needed a list of only each user’s most recent password. In Excel, I could sort by "Dates" in descending order, remove duplicates in the "Name" column, then sort "Names" in ascending order and would get exactly what I was looking for:
| User | Password | Date |
|---|---|---|
| Alice | Apple1 | 01-Jan-26 |
| Bob | Blue2 | 04-Feb-26 |
| Carl | Cat3 | 07-Mar-26 |
But when I tried this in Power Query, it didn’t keep the first name in the sorted table - it kept the first name in the original unsorted table:
| User | Password | Date |
|---|---|---|
| Alice | Apple1 | 01-Jan-26 |
| Bob | Blue1 | 01-Jan-26 |
| Carl | Cat1 | 02-Jan-26 |
I found a convoluted workaround (create a separate query to group by name and max date, and then merge this with the original query to append the password), but why would I ever suspect that the order of removing duplicates would be different in Power Query? The worst part is that it was only a single data point that was incorrect - someone else caught the error - and it took me far too long to figure out what the problem was.
TL;DR: Now I’m concerned there are other non-obvious pitfalls out there that will bite me in the rear. What else do I need to look out for?
[link] [comments]
Read on the original site
Open the publisher's page for the full experience
Related Articles
- Power query and manual table next to itHi, I want to pull data verbatim from a spreadsheet my team uses and use data from it for my own purposes. The main goal for using power query is that the data updates on my spreadsheet. Mainly, if any new entries are added at the bottom. I also have some manual fields that I need to add that correspond with the power query data. I've added another table beside the power query data, and filtering it causes the data on both sides to adjust correctly. I'm mainly concerned that, if the entries are rearranged or sorted on the original sheet, that my tables will not align after a refresh. Also, if a refresh would break my table alignments at any point. Is my fear founded? Is there a way to combine the two features that I need into a single table? submitted by /u/Perspective-Guilty [link] [comments]
- Did I follow the best practice using Power Query?I recently needed to build tables of full moons and solar eclipses from 1948 through the present. I thought this would be an excellent opportunity to use Power Query for the first time. I found the data online and used Power Query to pull it all into Excel. In the case of the full moons, each year was on a separate web page. So, I ended up with 79 separate queries. After investigating how to do so, I was able to merge the 79 separate queries into one. Once I had the tables with the data I wanted. I hard copied and pasted them and deleted all the queries. Without a trace of the queries left behind, they now appear as if I undertook the tedious task of typing hundreds (maybe more than 1,000) of entries into the individual cells. From what I understand about Power Queries, these are rebuilt each time I open the file. Since the historical data will never change, there is little reason for that. Further, I have no way of knowing whether the websites I used will have the same URLs forever. Presumably, any change would break the query. Later this year, I'll add 2027 fulll moon and solar eclipse data. I can run new Power Queries, append them to the tables I have, hard copy and paste them and then delete the queries. It seems I can do this every year until I am too old and feeble to continue. I would appreciate critiques from experienced Power Query users about the decisions I made after considering the particular circumstances. Would there have been an advantage to keep the Power Queries intact? Did I speed up the calculation of the workbook by deleting the Power Queries? submitted by /u/Taxman1913 [link] [comments]
- Excel power users: What finally made you take the plunge into Power Query?I discovered Power Query last year when a coworker solved an issue I had with simultaneously updating multiple tables. I don’t know how I’d gone so long without having heard about Power Query, but immediately had to figure out how this sorcery worked, and now I can’t imagine using Excel without it. I’ve been spreading the PQ gospel among my coworkers, all very Excel-savvy users who prepare recurring reports with lots of manual data sorting/formatting - the ideal folks to take full advantage of what Power Query has to offer. And, as no good deed goes unpunished, I’ve been asked to give a presentation on Power Query: the basics on what it is, how it can improve the workflow, examples on how I’m using it, and point them towards some learning resources. So… I’d love to hear from people who were initially on the fence: What made you finally decide to learn Power Query? What kind of work were you doing before implementing Power Query and what does it look like now? How much time and frustration has this saved you? Bonus: Any "I can’t believe I used to did it this way" stories. Double Bonus: Any "I tried Power Query and it was a total waste of my time" stories (because I find it hard to imagine any situation where this might be true). Hoping you guys will share some of your experiences to help others get over that initial learning curve! submitted by /u/KimHoJo [link] [comments]
- I want to use Power Query to import data received from a client, where the file name changes each month. What's the easiest way to automate this?I used to use VBA for this, but that's a lot more roundabout, and I have a lot less control over the transformation. I have no issues with transforming the actual data itself. My issue lies in the fact that it's a different file each month. Using wildcard formatting, *filehere*.xls* would always pull the correct file. This file is also stored in the same place relative to my spreadsheet each time, but the location of the spreadsheet and folders itself changes each month. In VBA, I could find the relative position quite easily via ThisWorkbook.Path & "\Data\" However, I don't know how to use PQ to import automatically like this, so that I'd always import the correct data simply by refreshing links. I think I've seen people set up a somewhat hacky way, where PQ first reads a table in the workbook to retrieve values, and then uses those to find the file to query. Is that the only way? submitted by /u/space_reserved [link] [comments]