AE 09: Opinion articles in The Chronicle
Suggested answers
Application exercise
Answers
Important
These are suggested answers. This document should be used as a reference only; it’s not designed to be an exhaustive key.
Part 1 - Data scraping
See chronicle-scrape.R for suggested scraping code.
Part 2 - Data analysis
Let’s start by loading the packages we will need:
library(tidyverse)
- Your turn (1 minute): Load the data you saved into the data folder and name it chronicle.
chronicle <- read_csv("data/chronicle.csv")
- Your turn (3 minutes): Who are the most prolific authors of the 100 most recent opinion articles in The Chronicle?
chronicle |>
  count(author, sort = TRUE)
# A tibble: 204 × 2
   author                        n
   <chr>                     <int>
 1 Luke A. Powery               30
 2 Heidi Smith                  27
 3 Advikaa Anand                22
 4 Monday Monday                17
 5 Monika Narain                16
 6 Community Editorial Board    12
 7 Linda Cao                    12
 8 Sonia Green                  12
 9 Valerie Tan                  11
10 Nathan Luzum                 10
# ℹ 194 more rows
- Demo: Draw a line plot of the number of opinion articles published per day in The Chronicle.
chronicle |>
  count(date) |>
  ggplot(aes(x = date, y = n, group = 1)) +
  geom_line()
- Demo: What percent of the most recent 100 opinion articles in The Chronicle mention “climate” in their title?
chronicle |>
  mutate(
    title = str_to_lower(title),
    climate = if_else(str_detect(title, "climate"), "mentioned", "not mentioned")
  ) |>
  count(climate) |>
  mutate(prop = n / sum(n))
# A tibble: 2 × 3
  climate           n  prop
  <chr>         <int> <dbl>
1 mentioned        10  0.02
2 not mentioned   490  0.98
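The prop column above is a proportion, while the question asks for a percent. As a minimal sketch on hypothetical toy titles (toy_titles and pct_climate are illustrative names, not part of the exercise data), the percent can also be computed directly, since the mean of a logical vector is the share of TRUE values:

```r
library(dplyr)
library(stringr)

# Hypothetical toy titles, not the scraped Chronicle data
toy_titles <- tibble(
  title = c(
    "Climate anxiety on campus",
    "Parking woes",
    "A CLIMATE of fear",
    "Finals week"
  )
)

pct <- toy_titles |>
  summarize(
    # str_detect() returns TRUE/FALSE; mean() gives the share of TRUE values
    pct_climate = mean(str_detect(str_to_lower(title), "climate")) * 100
  )

pct
```

This assumes no title is missing; a missing title would make the mean NA unless na.rm = TRUE is supplied.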
- Your turn (5 minutes): What percent of the most recent 100 opinion articles in The Chronicle mention “climate” in their title or abstract?
chronicle |>
  mutate(
    title = str_to_lower(title),
    abstract = str_to_lower(abstract),
    climate = if_else(
      str_detect(title, "climate") | str_detect(abstract, "climate"),
      "mentioned",
      "not mentioned"
    )
  ) |>
  count(climate) |>
  mutate(prop = n / sum(n))
# A tibble: 3 × 3
  climate           n  prop
  <chr>         <int> <dbl>
1 mentioned        14 0.028
2 not mentioned   482 0.964
3 <NA>              4 0.008
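The <NA> row most likely comes from articles with a missing abstract: str_detect() returns NA for a missing string, and FALSE | NA evaluates to NA, so if_else() leaves those rows unclassified. The sketch below shows one possible way to count such articles as "not mentioned", using hypothetical toy data (toy and text are illustrative names) and assuming a missing abstract should be treated as empty text:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not the scraped Chronicle articles
toy <- tibble(
  title = c("Climate action now", "Campus dining", "Housing woes"),
  abstract = c(NA, "On the climate crisis", NA)
)

result <- toy |>
  mutate(
    # coalesce() swaps a missing abstract for an empty string before matching
    text = str_to_lower(str_c(title, " ", coalesce(abstract, ""))),
    climate = if_else(str_detect(text, "climate"), "mentioned", "not mentioned")
  ) |>
  count(climate)

result
```

Whether missing abstracts belong in the "not mentioned" group or should be reported separately is a judgment call; keeping the <NA> row, as the suggested answer does, makes the missingness visible.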
- Time permitting: Come up with another question and try to answer it using the data.
# add code here