Scenario
I write the majority of my blogs based on practical constraints or on queries that I receive during my training or consulting assignments. This blog is also based on a query raised by one of the participants during a training session. He asked,
“Sir, I am used to Excel, so if I want to find the mean of 50 columns, I enter the formula once and then just drag it across. Can we do this in R?”
The question came out of the blue, so I had to think for a moment about how to do it. Then I answered, “Yes, we can do that by using the apply family of functions.”
Let’s See the Data
We will use the same data that we used in my previous blog on How to deal with Missing Values.
The data is the survey data frame from the MASS package.
This data frame contains the responses of 237 Statistics students at the University of Adelaide to a number of questions.
library(MASS)
## Warning: package 'MASS' was built under R version 3.5.3
library(knitr)
## Warning: package 'knitr' was built under R version 3.5.3
kable(head(survey))
Sex | Wr.Hnd | NW.Hnd | W.Hnd | Fold | Pulse | Clap | Exer | Smoke | Height | M.I | Age |
---|---|---|---|---|---|---|---|---|---|---|---|
Female | 18.5 | 18.0 | Right | R on L | 92 | Left | Some | Never | 173.00 | Metric | 18.250 |
Male | 19.5 | 20.5 | Left | R on L | 104 | Left | None | Regul | 177.80 | Imperial | 17.583 |
Male | 18.0 | 13.3 | Right | L on R | 87 | Neither | None | Occas | NA | NA | 16.917 |
Male | 18.8 | 18.9 | Right | R on L | NA | Neither | None | Never | 160.00 | Metric | 20.333 |
Male | 20.0 | 20.0 | Right | Neither | 35 | Right | Some | Never | 165.00 | Metric | 23.667 |
Female | 18.0 | 17.7 | Right | L on R | 64 | Right | Some | Never | 172.72 | Imperial | 21.000 |
How to do it
Now let’s say we want to find the mean of all the numeric variables, i.e. Wr.Hnd, NW.Hnd, Pulse, Height and Age.
First, let us choose the labels that we want to give to the mean of each column. We will use the list function to build a named list of these columns and store it in the object list1.
list1 <- list(Writing_Hand = survey$Wr.Hnd, Non_writing_hand = survey$NW.Hnd, Pulse = survey$Pulse, Height = survey$Height, Age = survey$Age)
Now we will use the sapply function to find the mean of all these columns in a single command. The na.rm = TRUE argument tells mean to ignore the missing values.
sapply(list1, mean, na.rm = TRUE)
##     Writing_Hand Non_writing_hand            Pulse           Height
##         18.66907         18.58263         74.15104        172.38086
##              Age
##         20.37451
It’s amazing. We can find the mean of multiple columns/variables with only a few lines of code.
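With 50 columns, typing every name into a list by hand would get tedious. Here is a minimal sketch of one way around that, assuming we simply want every numeric column of survey: is.numeric picks out the columns, and the same sapply call does the rest. The object names numeric_cols and col_means are my own, chosen only for illustration, and the result carries the original column names rather than the custom labels used in list1.

```r
library(MASS)  # provides the survey data frame

# Logical vector marking which columns of survey are numeric
numeric_cols <- sapply(survey, is.numeric)

# Mean of every numeric column, ignoring missing values
col_means <- sapply(survey[, numeric_cols], mean, na.rm = TRUE)
print(col_means)
```

For plain means, base R's colMeans(survey[, numeric_cols], na.rm = TRUE) should give the same values. sapply is the more general choice because it accepts any function, not only the mean.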
You can also pass any other function to sapply. For example, if we want to find the standard deviation of the same columns, we can use it as follows:
sapply(list1, sd, na.rm = TRUE)
##     Writing_Hand Non_writing_hand            Pulse           Height
##         1.878981         1.967068        11.687157         9.847528
##              Age
##         6.474335
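If you want the mean and the standard deviation side by side, one option is to hand sapply a small function of your own that returns both. This is only a sketch: mean_and_sd is a hypothetical helper that I made up for this example, not something from MASS or base R, and numeric_data and summary_stats are illustrative names as well.

```r
library(MASS)  # provides the survey data frame

# Keep only the numeric columns (Wr.Hnd, NW.Hnd, Pulse, Height, Age)
numeric_data <- survey[, sapply(survey, is.numeric)]

# Hypothetical helper: returns a named vector of two statistics for one column
mean_and_sd <- function(x) {
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# Each call returns a vector of length 2, so sapply simplifies the results into a matrix
summary_stats <- sapply(numeric_data, mean_and_sd)
print(round(summary_stats, 2))
```

The result is a matrix with one row per statistic and one column per variable, which is close to what dragging two formulas across a row would give you in Excel.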
Thank you for reading the blog. Do comment with the topic you would like me to write about next.