Breaking Away from EXCEL – Find & Replace in R

by Andrea Hutson

Note: This post may be for you even if you don’t regularly use R! You’ll learn a technique that is FASTER than Find & Replace in EXCEL and is more accurate to boot.

Raise your hand high if are in the Find & Replace club – you use it in EXCEL to clean up your data before starting analyses, no matter what software you eventually use to analyze data.

If you are editing your data in EXCEL before analyzing, you’re not alone.

Here’s one way we often use EXCEL – to recode text data into numeric data. Let’s imagine we have a dataset like the one below:

a table of values that need to be recoded ("Strongly Agree", "Somewhat Agree" etc.)

We need to recode these variables so that they’re numeric and then we can analyze them.

The typical recode method in EXCEL looks something like this:

Use of Find and Replace in EXCEL - Find "Strongly Agree" and replace with "5"

It takes a while, but it works, right? Well, mostly. Have you ever done something like this?

Whoops, I forgot to do “Strongly Agree” before Agree, or to check “Find entire cells only”

In the example above, I coded “Agree” first, which replaced all of the text for “Strongly Agree”, “Somewhat Agree”, etc. with “4”. Yes, there are ways to avoid this happening,but I often forget about them until I’ve made the mistake.

Have you ever confused yourself so much you’ve had to start COMPLETELY OVER?

(I can’t be the only one!)

I’m here to tell you something exciting – your copy and paste days can be TOTALLY OVER, even if you don’t normally use R.

STEP 1: Load Your EXCEL Data

In step 1, you need to load your EXCEL data into R. This is harder than it should be, and requires you to use the “openxlsx” library or convert your file to a .csv format as there’s no native EXCEL support in R. I’ll plan to do a post all about this in the future, as this was my first major roadblock to using R – actually loading the dang data.

Here’s the first step. For simplicity in the next steps, please name your dataset “myData”.

 
setwd("~/Directory Of Your Files")   # set working directory
 
     # You can get to this in R studio by going to Session -> Set Working Directory 
 
library(openxlsx) # if this is your first time using openxlsx, you'll need to install first
 
    # Use this code first if so ->   install.packages("openxlsx") 
 
myData <- read.xlsx("Your EXCEL FILE.xlsx", sheet=1)

STEP 2: Recode

I’m going to skip explaining the ‘lapply’ function right now – just know that you don’t need to change anything here but the values for your scales. That is, you can change “Strongly Agree” to “Very Often” if that’s the scale you’re using, or change the value for Strongly Agree to a different number. You can add more values, too, just make sure that:

  • each value is separated by a semicolon (;)
  • your text values are in SINGLE quotes,
  • the final bit of code has an end double quote and a closing parenthesis.
# --> Load the library that has the recode function (car) - this one generally comes with R so you shouldn't need to install
 
   library (car)
 
# --> Now do a batch recode!  The code below works for your typical 1-5 "Strongly Disagree -> Strongly Agree" scale but you can use 
 
myData <- lapply(myData, FUN = function(x) recode(x, "'Strongly Agree' = 5; 'Agree' = 4; 
                                              'Neither Agree nor Disagree' = 3; 'Disagree' = 2;
                                              'Strongly Disagree'=1; 'not applicable' = NA")

STEP 3: Convert back to a table

Almost done. The last step to find/replace is to convert this new object you’ve created back into a data frame, which only uses one line of code.

 
# --> Change object back into data frame
 
   myData <- as.data.frame(myData)

STEP 4: Send back to EXCEL

Many of us that use R for data analysis will skip this step, and just start analyzing here. But you can absolutely just send this right back to EXCEL if you’d like.

# --&gt; Write to EXCEL
 
   write.xlsx(myData, 'Data recoded.xlsx')

Boom – you’re done! Now, I know, some of you are thinking, ‘But Andrea, in EXCEL find/replace is just one step, and you’ve just given us FOUR steps to follow. I’m going to ignore this post and keep doing things as I’ve always done.’

But here’s why this way is better:

  • It is NOT just one step in EXCEL! You’ve got to Find/Replace each individual value…and it’s more time consuming than you think.
  • Once you have the base code written down you can reuse it over and over!
  • Similarly, if you save the R file, you can recode this particular data set over and over again. If you get a survey update with 10 more students, you don’t have to spend an hour finding/replacing – you literally just use the same exact code you’ve already written.
  • You’re not going to make silly mistakes.

Try it out! I’ve got the entire code below for your copy/pasting needs. The items that you will need to change should appear in RED. You should be able to leave all of the rest of the code as is.

 
# --> Set the directory where your files are : you can get to this in R studio by going to Session -> Set Working Directory 
 
setwd("~/Directory Of Your Files")   # set working directory
 
 
library(openxlsx) # if this is your first time using openxlsx, you'll need to install first
 
    # Use this code first if so ->   install.packages("openxlsx") 
 
myData <- read.xlsx("Your EXCEL FILE.xlsx", sheet=1)
 
# --> Load the library that has the recode function (car) - this one generally comes with R so you shouldn't need to install
 
   library (car)
 
# --> Now do a batch recode!  The code below works for your typical 1-5 "Strongly Disagree -> Strongly Agree" scale but you can use 
 
myData <- lapply(myData, FUN = function(x) recode(x, "'Strongly Agree' = 5; 'Agree' = 4; 
                                              'Neither Agree nor Disagree' = 3; 'Disagree' = 2;
                                              'Strongly Disagree'=1; 'not applicable' = NA")
 
# --> Change object back into data frame
 
   myData <- as.data.frame(myData) 
 
 
# --> Write to EXCEL
 
   write.xlsx(myData, 'Data recoded.xlsx')

Try it out and let me know how it goes! I promise, after you’ve saved yourself an hour of tedious finding and replacing, you’ll be a convert!