From Microsoft Excel | Help & Support with your Formula, Macro, and VBA problems | A Reddit Community

Trying to automate extracting info from PDFs into a table with PowerQuery, but the files somehow aren't structured the same and it's messing everything up.

Our take

The poster is extracting three values from a batch of government-agency PDFs with PowerQuery. The files look identical, yet PowerQuery parses them differently: the needed values land in different columns from file to file, and removing the unneeded columns fails because at least one file lacks a specific numbered column. This is a common problem with PDFs that appear uniform but differ in their underlying structure.

I thought since the PDFs looked like they were the same format (they're documents from a government agency), they would produce the same results if I ran them through PowerQuery. Somehow, they don't.

I need three pieces of data from each file. Somehow they all end up in different columns despite the files looking identical. I've tried my best to make it fit, but the moment I try to remove extraneous columns, the same error pops up because one of the files doesn't have a specific numbered column.
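One way to make this kind of extraction tolerant of shifting columns is to look each value up by the label printed next to it, rather than by a fixed column name such as Column7. The sketch below shows that idea in Python rather than Power Query M, purely as an illustration. It assumes each PDF has already been flattened into rows of text cells; the three labels (`Case Number`, `Date Issued`, `Amount`) and the sample data are hypothetical, since the post doesn't say which fields are needed. A file that lacks a label yields `None` instead of an error. Within Power Query itself, `Table.RemoveColumns` and `Table.SelectColumns` accept an optional `MissingField.Ignore` argument that skips absent columns rather than failing, which targets the specific error described.

```python
from typing import Optional

# Hypothetical labels for the three values wanted from each PDF;
# the original post does not say what they are.
WANTED_LABELS = ["Case Number", "Date Issued", "Amount"]


def find_value(rows: list[dict[str, Optional[str]]], label: str) -> Optional[str]:
    """Return the first non-empty cell to the right of the cell matching `label`.

    Matching on the label text means it doesn't matter whether the value
    landed in Column3 in one file and Column7 in another.
    """
    for row in rows:
        # dicts preserve insertion order, i.e. the column order of the row
        cells = [(c or "").strip() for c in row.values()]
        for i, cell in enumerate(cells):
            if cell == label:
                # skip blank spacer columns, which are what make columns shift
                for value in cells[i + 1:]:
                    if value:
                        return value
    return None  # label absent in this file: leave a blank instead of failing


def extract(rows: list[dict[str, Optional[str]]]) -> dict[str, Optional[str]]:
    """Build one output record (label -> value) for a single file."""
    return {label: find_value(rows, label) for label in WANTED_LABELS}


# Hypothetical sample data: two "identical" files parsed into different columns.
file_a = [
    {"Column1": "Case Number", "Column2": "A-101"},
    {"Column1": "Date Issued", "Column2": "2024-01-05"},
    {"Column1": "Amount", "Column2": "250"},
]
file_b = [
    {"Column1": "", "Column2": "Case Number", "Column3": "", "Column4": "B-202"},
    {"Column1": "Date Issued", "Column2": "2024-02-09"},
]

table = [extract(f) for f in (file_a, file_b)]
for record in table:
    print(record)
```

Because every record has the same three keys regardless of the source layout, the results can be stacked into one list without any column-removal step.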

It's so frustrating. I don't even need it to look nice, I just need the info in a list for convenience. Is there anything I can do to make it work?

submitted by /u/DoctorKrakens