How to organize a sheet based on how many times a certain value in a column is duplicated, and have all other columns follow?
Hope the title is descriptive enough... I feel like I always struggle to describe Excel stuff efficiently. I am a complete Excel beginner!
So I have a dataset that is 3000+ rows long. For ease (and to avoid sharing PHI), I made a shortened 27-row example; that is what's shown in the screenshots.
The data is downloaded from a website we use to give people questionnaires. Sometimes participants don't give a straightforward answer to a question, so we type in a "comment" to clarify exactly what they said. Each row of the data lists:
- the ID number we gave the participant;
- the "variable" AKA the name of the question (this tells me exactly which question in our questionnaire it is);
- the "session" AKA the appointment the question is from (we repeat the questionnaire several times per participant over a year, some in person and some over the phone);
- the coworker who left the comment (commenter);
- the comment itself.

We are trying to see which questions/variables get comments most often.
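If the website can export the data as CSV (the post doesn't say which formats it offers, so this is an assumption), the columns described above map naturally onto one record per comment. The sketch below uses Python's standard `csv` module; the header names `record_id`, `variable`, `session`, `commenter` and `comment` and the sample rows are made up for illustration, so the real export's titles may differ.

```python
import csv
import io

# Made-up export text; the header names are hypothetical stand-ins for the
# real column titles (participant ID, variable, session, commenter, comment).
SAMPLE_EXPORT = """record_id,variable,session,commenter,comment
111,alcohol_amount1,Lab 1,AB,about 3 drinks
112,sleep_hours,Phone 1,CD,unsure
test,sleep_hours,Lab 1,AB,ignore
"""


def load_comments(text: str) -> list[dict[str, str]]:
    """Parse a CSV export into one dict per comment row, keyed by header."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = load_comments(SAMPLE_EXPORT)
for row in rows:
    # Participant, session and variable together pinpoint the exact question.
    print(row["record_id"], row["session"], row["variable"])
```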
This is a replica of what I have:
This is what I WANT it to sort of look like:
In other words, I want:
- a count of how many times each variable appears in the file;
- the dataset sorted so the variable that appears MOST often is at the top and the ones that appear LEAST often are at the bottom;
- all rows for a "test" participant removed (notice the "test" rows 8, 18, and 26 in the first screenshot are gone in the second);
- to still be able to find the exact question on the questionnaire website from this sheet. For example, if I wanted more context behind row 5's comment, I would know to go to our questionnaire website, open participant 111's questionnaires from the Lab 1 appointment, and look specifically at the alcohol_amount1 question. In other words, it is important to keep the participant number and session information.
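The requirements above boil down to three steps: drop the test rows, count comments per variable, and sort whole rows by that count so the ID and session travel with each comment. Here is a minimal Python sketch of that logic, assuming test participants are identified by the text "test" in the ID column and using made-up rows with hypothetical column names:

```python
from collections import Counter

# Made-up rows in the same shape as the post's export
# (participant ID, variable, session, commenter, comment).
ROWS = [
    {"record_id": "111", "variable": "alcohol_amount1", "session": "Lab 1", "commenter": "AB", "comment": "about 3 drinks"},
    {"record_id": "112", "variable": "sleep_hours", "session": "Phone 1", "commenter": "CD", "comment": "varies"},
    {"record_id": "test", "variable": "sleep_hours", "session": "Lab 1", "commenter": "AB", "comment": "ignore"},
    {"record_id": "113", "variable": "alcohol_amount1", "session": "Phone 1", "commenter": "CD", "comment": "unsure"},
    {"record_id": "112", "variable": "alcohol_amount1", "session": "Lab 1", "commenter": "AB", "comment": "weekends only"},
    {"record_id": "114", "variable": "sleep_hours", "session": "Lab 1", "commenter": "CD", "comment": "naps"},
    {"record_id": "111", "variable": "mood_rating", "session": "Phone 1", "commenter": "AB", "comment": "declined"},
]


def sort_by_variable_frequency(rows):
    """Drop test rows, count comments per variable, and sort most-commented first.

    Each output row keeps every original column plus a 'variable_count' column,
    so the participant ID and session stay attached to each comment.
    """
    # Assumption: test participants are marked by the ID text "test".
    kept = [r for r in rows if r["record_id"].strip().lower() != "test"]
    counts = Counter(r["variable"] for r in kept)
    ordered = sorted(
        kept,
        # Highest count first; ties broken alphabetically by variable so rows
        # for the same variable stay grouped, then by ID and session.
        key=lambda r: (-counts[r["variable"]], r["variable"], r["record_id"], r["session"]),
    )
    return [{**r, "variable_count": counts[r["variable"]]} for r in ordered]


for row in sort_by_variable_frequency(ROWS):
    print(row["variable_count"], row["variable"], row["record_id"], row["session"])
```

Sorting on the count first and the variable name second keeps all rows for one variable together even when two variables have the same count. In Excel, the same idea is usually done by deleting the test rows, adding a helper column such as `=COUNTIF(B:B,B2)` (assuming the variable names are in column B) filled down, and then doing a custom sort by that column (largest to smallest) and then by the variable column.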
It doesn't need to look exactly like the second picture; that was just the first layout that came to mind. As long as it meets the requirements above, that's all I need.
I tried pivot tables, but I couldn't get them to look a way that made sense to me. I really don't know what else to do besides comb through all 3000+ rows one by one... Sorry if any of this doesn't fit the exact posting rules. I tried. Thanks for any help in advance🥹
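For reference, the summary a pivot table would give here (variable in Rows, session in Columns, a count of comments in Values) is just a count per (variable, session) pair. This hedged Python sketch builds that table from made-up pairs, with test rows assumed to be already removed, and orders variables by their total comment count:

```python
from collections import Counter

# Made-up (variable, session) pairs, one per comment; test rows already removed.
PAIRS = [
    ("alcohol_amount1", "Lab 1"),
    ("alcohol_amount1", "Phone 1"),
    ("alcohol_amount1", "Lab 1"),
    ("sleep_hours", "Phone 1"),
    ("sleep_hours", "Lab 1"),
    ("mood_rating", "Phone 1"),
]


def pivot_counts(pairs):
    """Build a variable x session count table, like a pivot table with
    variable in Rows, session in Columns and a count in Values."""
    cell = Counter(pairs)  # missing (variable, session) combinations count as 0
    totals = Counter(variable for variable, _ in pairs)
    sessions = sorted({session for _, session in pairs})
    table = []
    # Rows ordered by total comments, most-commented variable first.
    for variable, total in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        table.append([variable] + [cell[(variable, s)] for s in sessions] + [total])
    return ["variable"] + sessions + ["total"], table


header, table = pivot_counts(PAIRS)
print(header)
for line in table:
    print(line)
```

A summary like this answers "which questions get comments most often", while the sorted row list is what lets you trace each comment back to a participant and session.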