AE 06: Joining country populations with continents
Application exercise
library(tidyverse) # for data wrangling and visualization
library(scales) # for pretty axis breaks
Goal
Our ultimate goal in this application exercise is to create a bar plot of the total population of each continent, using two input data sets:
- Countries and populations
- Countries and continents
Data
Countries and populations
These data come from The World Bank and reflect population counts as of 2022.
population <- read_csv("https://sta199-s24.github.io/data/world-pop-2022.csv")
Let’s take a look at the data.
population
# A tibble: 217 × 3
country year population
<chr> <dbl> <dbl>
1 Afghanistan 2022 41129.
2 Albania 2022 2778.
3 Algeria 2022 44903.
4 American Samoa 2022 44.3
5 Andorra 2022 79.8
6 Angola 2022 35589.
7 Antigua and Barbuda 2022 93.8
8 Argentina 2022 46235.
9 Armenia 2022 2780.
10 Aruba 2022 106.
# ℹ 207 more rows
Continents
These data come from Our World in Data.
continents <- read_csv("https://sta199-s24.github.io/data/continents.csv")
Let’s take a look at the data.
continents
# A tibble: 285 × 4
entity code year continent
<chr> <chr> <dbl> <chr>
1 Abkhazia OWID_ABK 2015 Asia
2 Afghanistan AFG 2015 Asia
3 Akrotiri and Dhekelia OWID_AKD 2015 Asia
4 Aland Islands ALA 2015 Europe
5 Albania ALB 2015 Europe
6 Algeria DZA 2015 Africa
7 American Samoa ASM 2015 Oceania
8 Andorra AND 2015 Europe
9 Angola AGO 2015 Africa
10 Anguilla AIA 2015 North America
# ℹ 275 more rows
Exercises
Think out loud:
- Which variable(s) will we use to join the population and continents data frames?
Add response here.
- We want to create a new data frame that keeps all rows and columns from population and brings in the corresponding information from continents. Which join function should we use?
Add response here.
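To reason about the join-function question, it can help to compare two joins on a tiny example. The sketch below uses toy tibbles (a few rows copied from the printed output above, with Aruba deliberately left out of the toy continents data, which is an assumption for illustration only) and assumes country in population lines up with entity in continents. It counts the rows that left_join() and inner_join() return.

```r
library(tidyverse)

# Toy stand-ins for the real data (rows copied from the printed output above;
# Aruba is intentionally missing from continents_toy for illustration)
population_toy <- tibble(
  country    = c("Afghanistan", "Albania", "Aruba"),
  year       = 2022,
  population = c(41129, 2778, 106)
)
continents_toy <- tibble(
  entity    = c("Afghanistan", "Albania"),
  continent = c("Asia", "Europe")
)

# left_join() keeps every row of the left data frame and fills NA when no match
left_result <- left_join(population_toy, continents_toy, by = join_by(country == entity))

# inner_join() keeps only the rows that have a match in both data frames
inner_result <- inner_join(population_toy, continents_toy, by = join_by(country == entity))

nrow(left_result)   # 3: Aruba is kept, with continent = NA
nrow(inner_result)  # 2: Aruba is dropped
```

join_by() requires dplyr 1.1.0 or later; with older versions the equivalent is by = c("country" = "entity").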
- Demo: Join the two data frames and assign the joined data frame to a new data frame called population_continents.
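One way the join step might look is sketched below on small toy tibbles that have the same columns as the real data (rows copied from the printed outputs above). It assumes country in population matches entity in continents. Both data frames also contain a year column, so this sketch keeps only entity and continent from continents before joining; otherwise the result would contain year.x and year.y.

```r
library(tidyverse)

# Toy stand-ins with the same columns as the real data frames
population_toy <- tibble(
  country    = c("Afghanistan", "Albania", "Algeria"),
  year       = 2022,
  population = c(41129, 2778, 44903)
)
continents_toy <- tibble(
  entity    = c("Afghanistan", "Albania", "Algeria"),
  code      = c("AFG", "ALB", "DZA"),
  year      = 2015,
  continent = c("Asia", "Europe", "Africa")
)

# The key columns have different names, so map them with join_by().
# Dropping code and year from the continents side avoids year.x / year.y.
population_continents_toy <- population_toy |>
  left_join(
    continents_toy |> select(entity, continent),
    by = join_by(country == entity)
  )

population_continents_toy
```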
# add code here
- Demo: Take a look at the newly created population_continents data frame. There are some countries that were not in continents. First, identify which countries these are (they will have NA values for continent).
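Unmatched rows can be pulled out with filter() and is.na(). The sketch below runs on a toy joined data frame; the name "Antigua & Barbuda" is a hypothetical spelling mismatch chosen for illustration, not necessarily one of the mismatches in the real data.

```r
library(tidyverse)

# Toy joined data; "Antigua & Barbuda" stands in for a name that found no match
population_continents_toy <- tibble(
  country    = c("Afghanistan", "Albania", "Antigua & Barbuda"),
  population = c(41129, 2778, 93.8),
  continent  = c("Asia", "Europe", NA)
)

# Rows with no match in continents have NA for continent
unmatched <- population_continents_toy |>
  filter(is.na(continent))

unmatched |> pull(country)
```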
# add code here
- Demo: All of these countries are actually in the continents data frame, but under different names. So, let’s clean the data first: use a case_when() statement inside mutate() to update the country names in the population data frame so that they match the continents data frame, and then join the two data frames. At the end, check that all countries now have continent information.
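A minimal sketch of the rename, rejoin, and check steps, again on toy data with one hypothetical mismatch ("Antigua & Barbuda" versus "Antigua and Barbuda"). The real data will need one case_when() condition per mismatched name; the single condition here is only illustrative.

```r
library(tidyverse)

population_toy <- tibble(
  country    = c("Afghanistan", "Albania", "Antigua & Barbuda"),
  population = c(41129, 2778, 93.8)
)
continents_toy <- tibble(
  entity    = c("Afghanistan", "Albania", "Antigua and Barbuda"),
  continent = c("Asia", "Europe", "North America")
)

# Rename mismatched countries to the spelling used in continents_toy;
# all other countries keep their current name (.default needs dplyr >= 1.1.0)
population_toy <- population_toy |>
  mutate(
    country = case_when(
      country == "Antigua & Barbuda" ~ "Antigua and Barbuda",
      .default = country
    )
  )

population_continents_toy <- population_toy |>
  left_join(continents_toy, by = join_by(country == entity))

# Check: this count should be 0 once every country has a continent
population_continents_toy |>
  filter(is.na(continent)) |>
  nrow()
```

Fixing the names before the join, rather than patching NA continents afterwards, keeps the join itself as the single source of continent information.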
# add code here
- Think out loud: Which continent do you think has the highest population? Which do you think has the second highest? The lowest?
Add your response here.
- Demo: Create a new data frame called population_summary that contains a row for each continent and a column for the total population of that continent, in descending order of population. Note that the function for calculating totals in R is sum().
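The summary is a group_by() / summarize() / arrange() pipeline. The sketch below applies it to a toy joined data frame whose population values are rounded from the printed output above; the column name total_pop is an assumption, not a name required by the exercise.

```r
library(tidyverse)

# Toy joined data (populations rounded from the printed output above)
population_continents_toy <- tibble(
  country    = c("Afghanistan", "Albania", "Algeria", "Andorra", "Angola"),
  population = c(41129, 2778, 44903, 79.8, 35589),
  continent  = c("Asia", "Europe", "Africa", "Europe", "Africa")
)

# One row per continent, total population, largest first
population_summary_toy <- population_continents_toy |>
  group_by(continent) |>
  summarize(total_pop = sum(population)) |>
  arrange(desc(total_pop))

population_summary_toy
```

If any population values are missing, sum() returns NA for that continent unless you pass na.rm = TRUE.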
# add code here
- Your turn: Make a bar plot with total population on the y-axis and continent on the x-axis, where the height of each bar represents the total population in that continent.
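Because the totals are already computed, the bar heights come straight from a column. That makes geom_col() the natural choice, since geom_bar() would instead count rows. The sketch below uses a toy summary built from the rounded values above; the axis titles are assumptions.

```r
library(tidyverse)

# Toy summary (values from the rounded rows printed above)
population_summary_toy <- tibble(
  continent = c("Africa", "Asia", "Europe"),
  total_pop = c(80492, 41129, 2857.8)
)

# geom_col() uses total_pop directly as the bar height
p_bar <- ggplot(population_summary_toy, aes(x = continent, y = total_pop)) +
  geom_col() +
  labs(x = "Continent", y = "Total population")
```

Typing p_bar at the console draws the plot.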
# add code here
- Your turn: Recreate the following plot, which is commonly referred to as a lollipop plot. Hint: Start with the points, then try adding the segments, then add axis labels and a caption, and finally, as a stretch goal, update the x scale (which will require a function we haven’t introduced in lectures or labs yet!).
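A lollipop plot combines geom_segment() (the stick) with geom_point() (the candy). The target figure is not reproduced here, so this is only a sketch under assumptions: population on the x-axis and continent on the y-axis, axis titles of my own choosing, a caption naming the two data sources mentioned above, and scales::label_comma() as one possible x-scale update.

```r
library(tidyverse)
library(scales)

# Toy summary (values from the rounded rows printed above)
population_summary_toy <- tibble(
  continent = c("Africa", "Asia", "Europe"),
  total_pop = c(80492, 41129, 2857.8)
)

p_lollipop <- ggplot(population_summary_toy, aes(x = total_pop, y = continent)) +
  # stick: a horizontal segment from 0 to each continent's total
  geom_segment(aes(x = 0, xend = total_pop, yend = continent)) +
  # candy: a point at the end of each segment (drawn after, so it sits on top)
  geom_point(size = 3) +
  # comma-separated x-axis labels from the scales package
  scale_x_continuous(labels = label_comma()) +
  labs(
    x = "Total population",
    y = NULL,
    caption = "Data sources: The World Bank and Our World in Data"
  )
```

The hint suggests adding points first; in this sketch the segments come first in the code only so that the points are drawn on top of them.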

# add code here
- Think out loud: What additional improvements would you like to make to this plot?
Add your response here.