How to organize a sheet based on how many times a certain value in a column is duplicated, and have all other columns follow?

Our take

Organizing your spreadsheet to identify and sort duplicated values can significantly enhance your data analysis. In this guide, we'll explore how to count the frequency of each variable in your dataset and arrange the rows accordingly, ensuring all related information remains intact. By filtering out test participants and focusing on meaningful comments, you can easily navigate the dataset to uncover insights. Whether you're a complete beginner or just looking to refine your skills, this approach will empower you to manage your data more effectively.

Hope the title is descriptive enough... I feel like I always struggle to describe Excel stuff efficiently. I am a complete Excel beginner!

So I have a dataset that is 3000+ rows long. For the sake of ease (and also to not share PHI), I made a shortened 27-row example; this is what is shown in the screenshots.

The data I'm working with is downloaded from a website we use to give people questionnaires. Sometimes, people do not have very straightforward answers to each question, so we type in "comments" in those cases to help clarify the exact answer participants gave. The data I'm working with lists the ID number we gave to each participant; the "variable" AKA the name of the question (tells me exactly which question it is in our questionnaire); which "session" AKA appointment the question is from (we repeat the questionnaire multiple times per participant throughout a year, some in-person, some over the phone); which coworker left the comment (commenter); and finally the actual comment itself. We are trying to see which questions/variables were given comments most often.

This is a replica of what I have:

https://preview.redd.it/9lvlnwplfgng1.png?width=1025&format=png&auto=webp&s=4716f993d500383af6307b8fb1724fef5b8626e1

This is what i WANT it to sort of look like:

https://preview.redd.it/3d4w5a6ofgng1.png?width=980&format=png&auto=webp&s=dd32c658aceadd0afe5c427f0ec19177226d5081

In other words, I want:

A count of how many times each type of variable repeats in the Excel file.
The dataset to be organized so that the variable that appears the MOST often is at the top, and the ones that show up the LEAST are at the bottom.
To get rid of all rows that were for a "test" participant (notice the "test" rows 8, 18, and 26 in the first screenshot are gone in the second).
To be able to find an exact question from the questionnaire website based on this sheet. For example, if I wanted to look more into the context behind row 5's comment, I would know to go to our questionnaire website, go to participant 111's questionnaires from the Lab 1 appointment, and specifically look at the alcohol_amount1 question. In other words, it is important to keep the participant number and session information.

It doesn't need to look exactly like the second picture; that was just the first way to organize it that came to mind. As long as it meets the above requirements, that's all I need.
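For reference, the requirements above can be sketched outside of Excel as a small Python script. This is only a rough illustration, not a tested solution for the real file: the column names (`id`, `variable`, `session`, `commenter`, `comment`) are guesses based on the description, the sample rows are made up, and it assumes a "test" participant can be recognized by the word "test" in the ID column.

```python
from collections import Counter


def organize_by_variable_count(rows, id_key="id", variable_key="variable"):
    """Drop test participants, count each variable, and sort rows by that count."""
    # Assumption: test participants have the word "test" somewhere in their ID.
    kept = [r for r in rows if "test" not in str(r[id_key]).lower()]

    # Count how many remaining rows mention each variable.
    counts = Counter(r[variable_key] for r in kept)

    # Most frequent variable first; ties are broken by variable name so that
    # rows for the same variable stay grouped together.
    ordered = sorted(kept, key=lambda r: (-counts[r[variable_key]], r[variable_key]))

    # Every original column is kept, with the count added as an extra column.
    return [{**r, "count": counts[r[variable_key]]} for r in ordered]


# Made-up sample rows, for illustration only.
rows = [
    {"id": "111", "variable": "alcohol_amount1", "session": "Lab 1", "commenter": "AB", "comment": "said 2-3 drinks"},
    {"id": "test", "variable": "alcohol_amount1", "session": "Lab 1", "commenter": "AB", "comment": "testing"},
    {"id": "222", "variable": "sleep_hours", "session": "Phone 1", "commenter": "CD", "comment": "varies by week"},
    {"id": "333", "variable": "alcohol_amount1", "session": "Phone 1", "commenter": "CD", "comment": "unsure"},
]

for row in organize_by_variable_count(rows):
    print(row["count"], row["variable"], row["id"], row["session"], row["comment"])
```

Because each row is moved as a whole, the participant ID and session stay attached to every comment, so a comment can still be traced back to the questionnaire website.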

I was trying out pivot tables but I couldn't really get it to look in a way that made sense to me. I really don't know what else to do besides comb through all 3000+ rows one by one... Sorry if any of this doesn't fit the exact posting rules. I tried. Thanks for any help in advance🥹

submitted by /u/soupysyrup