How to Deal with Missing Values in R

Scenario

One day I was training a group of participants on statistical techniques using R. The session covered descriptive statistics, and I wanted to demonstrate how to calculate the mean using the survey data set from the MASS package.

Quick View of Data

library(MASS)   # provides the survey data set
library(knitr)  # provides kable()

kable(head(survey))

Sex	Wr.Hnd	NW.Hnd	W.Hnd	Fold	Pulse	Clap	Exer	Smoke	Height	M.I	Age
Female	18.5	18.0	Right	R on L	92	Left	Some	Never	173.00	Metric	18.250
Male	19.5	20.5	Left	R on L	104	Left	None	Regul	177.80	Imperial	17.583
Male	18.0	13.3	Right	L on R	87	Neither	None	Occas	NA	NA	16.917
Male	18.8	18.9	Right	R on L	NA	Neither	None	Never	160.00	Metric	20.333
Male	20.0	20.0	Right	Neither	35	Right	Some	Never	165.00	Metric	23.667
Female	18.0	17.7	Right	L on R	64	Right	Some	Never	172.72	Imperial	21.000

Find the Mean

The variable Wr.Hnd records the span (distance from the tip of the thumb to the tip of the little finger of the spread hand) of the writing hand, in centimetres. It is a continuous variable, so the mean is an appropriate measure of central tendency. So I tried the following command:

mean(survey$Wr.Hnd)

## [1] NA

I was surprised that it returned NA even though the variable is continuous, as you can see in the quick view of the data above.

I reloaded the MASS package and repeated the same steps, thinking there might have been an error while loading the package. But I still got the same result. Now I was feeling embarrassed.

Then I started looking at the individual values of that variable, and suddenly I found that observation no. 43 has the value NA. That was why mean() returned NA.

kable(survey[40:45, ])

	Sex	Wr.Hnd	NW.Hnd	W.Hnd	Fold	Pulse	Clap	Exer	Smoke	Height	M.I	Age
40	Male	19.0	19.0	Right	R on L	NA	Neither	Freq	Occas	171.00	Metric	19.917
41	Female	17.5	16.0	Right	L on R	NA	Right	Some	Never	169.00	Metric	17.500
42	Female	17.8	18.0	Right	R on L	72	Right	Some	Never	154.94	Imperial	17.083
43	Male	NA	NA	Right	R on L	60	NA	Some	Never	172.00	Metric	28.583
44	Female	20.1	20.2	Right	L on R	80	Right	Some	Never	176.50	Imperial	17.500
45	Female	13.0	13.0	NA	L on R	70	Left	Freq	Never	180.34	Imperial	17.417
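
Scanning rows by eye works for a small table, but the missing values can also be located programmatically. The sketch below is only an illustration: `pulse` is a hand-typed copy of the Pulse values from the first six rows shown earlier, not the real column, and the variable names are my own. It shows that a single NA makes the mean NA, and how `anyNA()` and `is.na()` report where the gaps are.

```r
# Hand-typed copy of the Pulse values from the first six rows shown earlier
# (illustrative only; the real column lives in MASS::survey).
pulse <- c(92, 104, 87, NA, 35, 64)

# Arithmetic that involves an NA yields NA, so the mean is NA as well.
mean(pulse)

# anyNA() answers "is anything missing?"
has_missing <- anyNA(pulse)

# which(is.na()) gives the positions of the missing values.
missing_at <- which(is.na(pulse))

# sum(is.na()) counts them.
n_missing <- sum(is.na(pulse))

# With the MASS package loaded, the same idea applies to the real data:
# which(is.na(survey$Wr.Hnd))
```

In this toy vector the only missing value sits at position 4, which matches the NA in the fourth row of the quick view.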

Now I knew that this was the observation that had embarrassed me. But how should I deal with it? I could remove this observation, but it still holds data for the other variables, so removing it would mean a loss of information. A better way is to skip this observation only while calculating the mean of Wr.Hnd. So I added the following argument to the command:

mean(survey$Wr.Hnd, na.rm = TRUE)

## [1] 18.66907

Woohoo! Now it gives the result without losing the other information, because it skips only the NA values.
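
To make the trade-off concrete, here is a small sketch comparing the two options: skipping NA for one statistic versus deleting every incomplete row. The data frame `rows` is a hand-typed copy of three columns from rows 40 to 45 shown earlier, and the object names are my own. It is meant as an illustration under those assumptions, not as the only way to handle missing data.

```r
# Hand-typed copy of three columns from rows 40 to 45 shown earlier
# (illustrative only; the real data is MASS::survey).
rows <- data.frame(
  Wr.Hnd = c(19.0, 17.5, 17.8, NA, 20.1, 13.0),
  Pulse  = c(NA, NA, 72, 60, 80, 70),
  Height = c(171.00, 169.00, 154.94, 172.00, 176.50, 180.34),
  row.names = 40:45
)

# Option 1: skip NA only while computing this one statistic.
# Five of the six Wr.Hnd values are used.
mean_skip <- mean(rows$Wr.Hnd, na.rm = TRUE)

# Option 2: drop every row that contains any NA, then compute.
# na.omit() removes rows 40, 41 and 43, although rows 40 and 41
# have perfectly valid Wr.Hnd values.
complete  <- na.omit(rows)
n_kept    <- nrow(complete)
mean_drop <- mean(complete$Wr.Hnd)

mean_skip
n_kept
mean_drop
```

With na.rm = TRUE only the one missing Wr.Hnd value is ignored, whereas na.omit() keeps just three of the six rows, so the second mean is based on less data than necessary.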
