suppressPackageStartupMessages(library(gapminder))
suppressPackageStartupMessages(library(dplyr))
Solutions: Advanced Data Manipulation
Materials adapted from Adrien Osakwe, Larisa M. Soto and Xiaoqi Xie.
Write one command (it can span multiple lines) using pipes that outputs a data frame containing only the columns lifeExp, country and year, for records before the year 2000 from African countries and no other continent.
# Keep African records from before 2000, then keep only the three requested columns
tidy_africa <- gapminder %>%
  dplyr::filter(continent == "Africa", year < 2000) %>%
  dplyr::select(year, country, lifeExp)
head(tidy_africa)
# A tibble: 6 × 3
   year country lifeExp
  <int> <fct>     <dbl>
1  1952 Algeria    43.1
2  1957 Algeria    45.7
3  1962 Algeria    48.3
4  1967 Algeria    51.4
5  1972 Algeria    54.5
6  1977 Algeria    58.0
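Because head() shows only the first six rows, it cannot confirm that every row meets the exercise's conditions. The sketch below is one way to check them explicitly. It rebuilds the data frame with both conditions in a single filter() call (comma-separated conditions are combined with AND) and then asserts the requirements with stopifnot(). The names africa_pre2000 and african_countries are hypothetical and used only for illustration.

```r
library(gapminder)
library(dplyr)

# Hypothetical name; both conditions are passed to one filter() call
africa_pre2000 <- gapminder %>%
  filter(continent == "Africa", year < 2000) %>%
  select(year, country, lifeExp)

# No record from the year 2000 onwards should remain
stopifnot(all(africa_pre2000$year < 2000))

# Only the three requested columns should be present
stopifnot(identical(names(africa_pre2000), c("year", "country", "lifeExp")))

# Every remaining country should be an African country in the original data
african_countries <- gapminder %>%
  filter(continent == "Africa") %>%
  distinct(country) %>%
  pull(country)
stopifnot(all(africa_pre2000$country %in% african_countries))
```

If any check fails, stopifnot() raises an error naming the failed condition, so silent mistakes in the filter are caught early.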
Calculate the average life expectancy per country. Which country has the longest average life expectancy, and which has the shortest?
gapminder %>%
  dplyr::group_by(country) %>%
  dplyr::summarize(mean_lifeExp = mean(lifeExp)) %>%
  dplyr::filter(mean_lifeExp == min(mean_lifeExp) | mean_lifeExp == max(mean_lifeExp))
# A tibble: 2 × 2
  country      mean_lifeExp
  <fct>               <dbl>
1 Iceland              76.5
2 Sierra Leone         36.8
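The same question can also be answered with dplyr's slice_max() and slice_min(), which return the rows holding the largest and smallest values of a column. This is an alternative sketch, not the approach used in the solution above. The names country_means, longest and shortest are hypothetical, and the code assumes a dplyr version that provides these functions (1.0.0 or later).

```r
library(gapminder)
library(dplyr)

# Average life expectancy per country (hypothetical object name)
country_means <- gapminder %>%
  group_by(country) %>%
  summarize(mean_lifeExp = mean(lifeExp))

# Row with the highest and row with the lowest average life expectancy
longest  <- country_means %>% slice_max(mean_lifeExp, n = 1)
shortest <- country_means %>% slice_min(mean_lifeExp, n = 1)

# Stack both rows into one small tibble, longest first
bind_rows(longest, shortest)
```

Keeping the two extremes in separate objects makes it explicit which row is the maximum and which is the minimum, whereas the combined filter() returns them in the order they appear in the data.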
In the previous hands-on, you discovered that all the entries from 2007 are actually from 2008. Write a command to edit the data accordingly using pipes. In the same command, keep only the entries from 2008 to verify the change.
gapminder %>%
  dplyr::mutate(year = ifelse(year == 2007, 2008, year)) %>%
  dplyr::filter(year == 2008) %>%
  head()
# A tibble: 6 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <dbl>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       2008    43.8 31889923      975.
2 Albania     Europe     2008    76.4  3600523     5937.
3 Algeria     Africa     2008    72.3 33333216     6223.
4 Angola      Africa     2008    42.7 12420476     4797.
5 Argentina   Americas   2008    75.3 40301927    12779.
6 Australia   Oceania    2008    81.2 20434176    34435.
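Note that year is printed as <dbl> in the output above, while in the original data it is an integer column: ifelse() with the numeric literal 2008 coerces the result to double. If keeping the integer type matters, one option is dplyr's if_else() with integer literals (the L suffix). This is a sketch of that variant, with corrected as a hypothetical object name.

```r
library(gapminder)
library(dplyr)

# Replace 2007 with 2008 while keeping `year` as an integer column
corrected <- gapminder %>%
  mutate(year = if_else(year == 2007L, 2008L, year))

# The column type is preserved and no 2007 entries remain
stopifnot(is.integer(corrected$year))
stopifnot(!any(corrected$year == 2007L))

# Same verification step as in the solution: show the 2008 entries
corrected %>%
  filter(year == 2008L) %>%
  head()
```

if_else() is stricter than ifelse() about the types of its two branches, which is why both 2008L and year are integers here.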