From Microsoft Excel | Help & Support with your Formula, Macro, and VBA problems | A Reddit Community
Trying to automate extracting info from PDFs into a table with PowerQuery but they're somehow not structured the same and it's messing up.
Our take
PDFs from a government agency can look uniform and still differ in their underlying structure, so PowerQuery may place the same fields in different columns from one file to the next. The poster below runs into exactly that while trying to pull three values from each file into a single list.
I thought since the PDFs looked like they were the same format (they're documents from a government agency), they would produce the same results if I ran them through PowerQuery. Somehow, they don't.
I need three pieces of data from each file. Somehow they all end up in different columns despite looking identical. I've tried my best to make it fit, but the moment I try to remove extraneous columns, the same error pops up because one of the files doesn't have a specific numbered column.
It's so frustrating. I don't even need it to look nice, I just need the info in a list for convenience. Is there anything I can do to make it work?
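To make the problem concrete, here is a minimal sketch in Python (not PowerQuery) of one way around shifting columns: look each value up by the label next to it instead of by column position. Everything in it is an assumption for illustration. The field labels, the sample rows, and the helper names `value_after_label` and `extract` are hypothetical, and it presumes each value sits somewhere to the right of a recognizable label in the same row, which may not match the real PDFs.

```python
from typing import List, Optional

Row = List[Optional[str]]

# Hypothetical rows as two "identical-looking" PDFs might be extracted:
# the labels match, but the values land in different column positions.
file_a: List[Row] = [
    ["Permit No.", "A-1001", None, None],
    ["Issued", None, "2024-01-15", None],
    ["Holder", "Jane Doe", None, None],
]
file_b: List[Row] = [
    [None, "Permit No.", None, "B-2002"],
    ["Issued", "2024-02-20", None, None],
    [None, None, "Holder", "John Roe"],
]

# Hypothetical labels for the three pieces of data wanted from each file.
WANTED = ["Permit No.", "Issued", "Holder"]


def value_after_label(rows: List[Row], label: str) -> Optional[str]:
    """Return the first non-empty cell to the right of `label`, or None."""
    for row in rows:
        if label in row:
            start = row.index(label) + 1
            for cell in row[start:]:
                if cell not in (None, ""):
                    return cell
    return None


def extract(rows: List[Row]) -> dict:
    """Build one record per file; a missing field becomes None, not an error."""
    return {label: value_after_label(rows, label) for label in WANTED}


# One record per file, ready to be written out as a simple list or table.
records = [extract(file_a), extract(file_b)]
```

Because the lookup never refers to a numbered column, a file that lacks one simply yields an empty field instead of an error. Within PowerQuery itself, the column-removal error specifically can be avoided by passing the optional `MissingField.Ignore` argument to `Table.RemoveColumns`, which skips any listed column that a given file does not have.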