Back in March, I implemented a needs assessment survey for my local LGBT community center. Lacking the paid software I’d used in college, I had to improvise with free resources. I collected the data using Google Forms, downloaded the data as a CSV, and analyzed it using R. The survey included questions such as what activities people were most interested in and what hours they were available to visit the center. These options were displayed using checkboxes, so participants could select multiple options. Very quickly I ran into a problem deciphering this data in R.
Rather than running any complex statistical analyses, I was interested purely in descriptive statistics to get a general idea of what visitors wanted in an LGBT community center. A crosstabulation is a simple way of displaying categorical data. In R, the table function shows how many participants selected each option. If these were yes/no or one-answer questions, this would have worked fine. However, here’s a small sampling of the mess I got with this attempt:
table(NorthStar$Programs)
...
Exercise Groups, Community Service Projects
    1
Film/Movie Viewings, Art Shows, Community Service Projects
    1
Film/Movie Viewings, Art Shows, Community Service Projects, Men's Social Activities
    1
Film/Movie Viewings, Art Shows, Legal Advice/Information Sessions, Men's Social Activities
    1
Film/Movie Viewings, Art Shows, Potluck, Community Service Projects, Legal Advice/Information Sessions, Youth Social Activities, Family Social Activities
    1
...
I had used checkboxes, so every unique combination of selections counted as its own category, rather than each option being counted individually! In other words, one person had checked off “Film/Movie Viewings”, “Art Shows”, and “Community Service Projects”, another had checked off “Film/Movie Viewings”, “Art Shows”, “Community Service Projects”, and “Men’s Social Activities”, and so on.
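To see the effect on a smaller scale, here is a minimal sketch using made-up responses (the `responses` vector is hypothetical, not the survey data). It assumes, as in the output above, that all the boxes one person checked arrive as a single comma-separated string.

```r
# Hypothetical checkbox answers, one comma-separated string per participant
responses <- c(
  "Potluck, Book Club",
  "Potluck",
  "Potluck, Book Club",
  "Book Club, Art Shows"
)

# table() counts identical strings, so every distinct combination
# becomes its own category instead of each option being tallied
combo_counts <- table(responses)
print(combo_counts)

# Number of distinct combinations that table() found
print(length(combo_counts))
```

Nothing in this output says how many people chose "Potluck" overall, which is the number I actually wanted.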
Fortunately, I knew enough programming fundamentals to solve this problem. This was the first time learning to code came in handy for a real problem. Almost everything I worked on before was purely for learning and building my portfolio.
I realized I needed to iterate through every participant’s answers and count up how many times a certain option had been selected. To do this, I wrote a function that takes in an array of options and the data that was actually collected. The function loops through each option in the array. Within that loop, another loop checks whether that option appears in each participant’s answer. Every time it does, a program count variable is increased by 1. Once the inner loop finishes, the program option is added to one vector*, and the program count is added to another vector. After the loops finish, the vectors are merged to display a new, organized data set.
(*In R, a vector is a sequence of data elements; you can think of it as a row.)
I’ve read that best practices discourage for loops in R, so this may not have been the ideal solution. It was a quick solution, though, and certainly less tedious than counting up all those results by hand!
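The nested loops described above can be sketched on a toy example. The data and the names `responses`, `choices`, and `count_options` are made up for illustration; the function follows the same steps as the one in my full script.

```r
# Hypothetical data: one comma-separated string per participant
responses <- c("Potluck, Book Club", "Potluck", "Book Club, Art Shows")
choices <- c("Potluck", "Book Club", "Art Shows")

count_options <- function(option_list, answers) {
  option_names <- vector()   # options processed so far
  option_counts <- vector()  # their counts, in the same order
  for (option in option_list) {
    count <- 0
    for (answer in answers) {
      # grepl() is TRUE when the option text appears in the answer
      if (grepl(option, answer)) {
        count <- count + 1
      }
    }
    option_names <- c(option_names, option)
    option_counts <- c(option_counts, count)
  }
  # Merge the two vectors by labelling each count with its option
  names(option_counts) <- option_names
  option_counts
}

counts <- count_options(choices, responses)
print(counts)
```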
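For comparison, here is one way the same counts might be produced without explicit for loops. This is a sketch on made-up data, not what I ran on the survey; `responses` and `choices` are hypothetical names. It relies on `grepl` accepting a whole vector of answers at once, with `sapply` repeating the count for each option.

```r
# Hypothetical data: one comma-separated string per participant
responses <- c("Potluck, Book Club", "Potluck", "Book Club, Art Shows")
choices <- c("Potluck", "Book Club", "Art Shows")

# grepl() returns one TRUE/FALSE per answer, and sum() counts the TRUEs.
# sapply() applies that to every option and names the results after them.
# fixed = TRUE matches each option as literal text rather than as a regex.
counts <- sapply(choices, function(choice) {
  sum(grepl(choice, responses, fixed = TRUE))
})

print(sort(counts))
```

Whether this is actually faster matters little for a survey this size; the main gain is less bookkeeping code.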
The full R script I used to analyze the survey’s data is below.
# Load the Google Forms export and drop the timestamp column
NorthStar = read.csv("NorthStar.csv")
NorthStar$Timestamp = NULL

# Replace the long question text with short column names
names(NorthStar) <- c("County", "Programs", "OtherPrograms",
  "TransSupportGroup", "GenderqueerSupportGroup", "BisexualSupportGroup",
  "WomenSupportGroup", "MenSupportGroup", "YouthSupportGroup",
  "POCSupportGroup", "MentorMeeting", "BookClub", "FilmViewing", "Trivia",
  "ArtShow", "OpenMic", "Potluck", "ExerciseGroup",
  "CommunityServiceProject", "LegalAdvice", "Over65Activity", "YouthSocial",
  "MenSocial", "WomenSocial", "FamilySocial", "GenerallyAvailable",
  "ConvenientAvailable", "OtherComments")

table(NorthStar$County)

#Programs and Activities
# Patterns are regular expressions, so the literal * is escaped
ActivityList = c("Trans\\* Support Group", "Genderqueer Support Group",
  "Bisexual Support Group", "Women's Support Group", "Men's Support Group",
  "Youth Support Group", "People of Color Support Group", "Mentorships",
  "Book Club", "Film/Movie Viewings", "Trivia Competition", "Art Shows",
  "Music Open Mic", "Potluck", "Exercise Groups",
  "Community Service Projects", "Legal Advice/Information Sessions",
  "Over 65 Activities", "Youth Social Activities",
  "Men's Social Activities", "Women's Social Activities",
  "Family Social Activities")

# Count how many responses in ProgramVar contain each option in ProgramList
ListItems <- function(ProgramList, ProgramVar) {
  ProgramCountsVar = vector()
  ProgramCounts = vector()
  for (program in ProgramList) {
    string <- program
    programCount = 0
    for(item in ProgramVar) {
      if(grepl(string, item)) {programCount = programCount + 1}
    }
    ProgramCountsVar = c(ProgramCountsVar, program)
    ProgramCounts = c(ProgramCounts, programCount)
  }
  names(ProgramCounts) <- ProgramCountsVar
  ProgramCounts
}

sort(ListItems(ActivityList, NorthStar$Programs))

# Count respondents who would attend a program at least once a month
HighInterest = function(programVar) {
  score <- sum(programVar == "Once a month") +
    sum(programVar == "Every other week") +
    sum(programVar == "Once a week")
  print(score)
}

lapply(NorthStar[4:25], HighInterest)

#Availability Times
# Parentheses are escaped so they match literally
TimeList = c("Monday late afternoon \\(3-5pm\\)",
  "Monday early evening \\(5-7pm\\)", "Monday late evening \\(7-9pm\\)",
  "Tuesday late afternoon \\(3-5pm\\)", "Tuesday early evening \\(5-7pm\\)",
  "Tuesday late evening \\(7-9pm\\)", "Wednesday late afternoon \\(3-5pm\\)",
  "Wednesday early evening \\(5-7pm\\)", "Wednesday late evening \\(7-9pm\\)",
  "Thursday late afternoon \\(3-5pm\\)", "Thursday early evening \\(5-7pm\\)",
  "Thursday late evening \\(7-9pm\\)", "Friday late afternoon \\(3-5pm\\)",
  "Friday early evening \\(5-7pm\\)", "Friday late evening \\(7-9pm\\)",
  "Saturday morning \\(9am-noon\\)", "Saturday early afternoon \\(noon-3pm\\)",
  "Saturday late afternoon \\(3-5pm\\)", "Saturday early evening \\(5-7pm\\)",
  "Saturday late evening \\(7-9pm\\)", "Sunday morning \\(9am-noon\\)",
  "Sunday early afternoon \\(noon-3pm\\)", "Sunday late afternoon \\(3-5pm\\)",
  "Sunday early evening \\(5-7pm\\)", "Sunday late evening \\(7-9pm\\)",
  "Other")

sort(ListItems(TimeList, NorthStar$ConvenientAvailable))
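A note on the doubled backslashes in the option lists: `grepl` treats its pattern as a regular expression by default, where `*` and parentheses have special meanings, so they need escaping to match literally. The sketch below uses a made-up answer string (`answer` is hypothetical) to show the difference. It also shows `fixed = TRUE`, an alternative I did not use in the script, which matches plain text without any escaping.

```r
# Hypothetical single response containing a time slot with parentheses
answer <- "Monday late afternoon (3-5pm), Other"

# Unescaped: the parentheses form a regex group, so the pattern looks for
# "Monday late afternoon 3-5pm" with no literal parentheses
unescaped <- grepl("Monday late afternoon (3-5pm)", answer)

# Escaped, as in the script above: \\( and \\) match literal parentheses
escaped <- grepl("Monday late afternoon \\(3-5pm\\)", answer)

# fixed = TRUE treats the whole pattern as plain text
literal <- grepl("Monday late afternoon (3-5pm)", answer, fixed = TRUE)

print(c(unescaped = unescaped, escaped = escaped, literal = literal))
```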