AE 09: Opinion articles in The Chronicle

Suggested answers

Important

These are suggested answers. Use this document as a reference only; it is not designed to be an exhaustive key.

Part 1 - Data scraping

See chronicle-scrape.R for suggested scraping code.

Part 2 - Data analysis

Let’s start by loading the packages we will need:

library(tidyverse)
  • Your turn (1 minute): Load the data you saved into the data folder and name it chronicle.
chronicle <- read_csv("data/chronicle.csv")
  • Your turn (3 minutes): Who are the most prolific authors of the 500 most recent opinion articles in The Chronicle?
chronicle |>
  count(author, sort = TRUE)
# A tibble: 204 × 2
   author                        n
   <chr>                     <int>
 1 Luke A. Powery               30
 2 Heidi Smith                  27
 3 Advikaa Anand                22
 4 Monday Monday                17
 5 Monika Narain                16
 6 Community Editorial Board    12
 7 Linda Cao                    12
 8 Sonia Green                  12
 9 Valerie Tan                  11
10 Nathan Luzum                 10
# ℹ 194 more rows
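If only the top few authors are of interest, the sorted count can be trimmed with slice_max(). The following is a minimal sketch on toy data; the articles tibble and its author names are hypothetical stand-ins for the scraped data, not values from chronicle.csv.

```r
library(tidyverse)

# Toy data standing in for the scraped articles (hypothetical values).
articles <- tibble(
  author = c(
    "A. Writer", "B. Columnist", "A. Writer",
    "C. Guest", "A. Writer", "B. Columnist"
  )
)

top_authors <- articles |>
  count(author, sort = TRUE) |> # one row per author, most frequent first
  slice_max(n, n = 2)           # keep the two highest counts (ties are kept)

top_authors
```

Because slice_max() keeps ties by default, it can return more rows than requested when several authors share the same count.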
  • Demo: Draw a line plot of the number of opinion articles published per day in The Chronicle.
chronicle |>
  count(date) |>
  ggplot(aes(x = date, y = n, group = 1)) +
  geom_line()

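The group = 1 aesthetic is typically needed when the x variable is stored as text rather than as a date. As a hedged alternative, the sketch below converts the column to a Date and fills in days with no articles as zeros, so the line drops to zero instead of skipping over gaps. The toy data and the "YYYY-MM-DD" text format are assumptions; the scraped date column may be stored differently.

```r
library(tidyverse)

# Toy data; the "YYYY-MM-DD" text format is an assumption.
articles <- tibble(
  date = c("2023-10-01", "2023-10-01", "2023-10-02", "2023-10-04")
)

daily <- articles |>
  mutate(date = as.Date(date)) |> # text to Date, so the x-axis is continuous
  count(date) |>
  complete(
    date = seq(min(date), max(date), by = "day"), # every day in the range
    fill = list(n = 0L)                           # days without articles get 0
  )

# With a Date on the x-axis, group = 1 is no longer needed.
daily_plot <- ggplot(daily, aes(x = date, y = n)) +
  geom_line()

daily_plot
```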
  • Demo: What percent of the 500 most recent opinion articles in The Chronicle mention “climate” in their title?
chronicle |>
  mutate(
    title = str_to_lower(title),
    climate = if_else(str_detect(title, "climate"), "mentioned", "not mentioned")
    ) |>
  count(climate) |>
  mutate(prop = n / sum(n))
# A tibble: 2 × 3
  climate           n  prop
  <chr>         <int> <dbl>
1 mentioned        10  0.02
2 not mentioned   490  0.98
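The prop column is a proportion, so 0.02 corresponds to 2 percent. The same quantity can be computed more directly, since the mean of a logical vector is the share of TRUE values. The sketch below uses toy titles (hypothetical, not taken from the scraped data) and a case-insensitive match instead of lowercasing the column first.

```r
library(tidyverse)

# Toy titles standing in for the scraped data (hypothetical values).
titles <- tibble(
  title = c(
    "Climate action now",
    "Campus dining review",
    "On the climate of debate",
    "Basketball season preview"
  )
)

climate_share <- titles |>
  mutate(
    # TRUE when the title contains "climate" in any capitalization
    mentions = str_detect(title, regex("climate", ignore_case = TRUE))
  ) |>
  summarize(
    n_mentioned = sum(mentions),  # TRUE counts as 1
    prop = mean(mentions),        # share of TRUE values
    percent = 100 * mean(mentions)
  )

climate_share
```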
  • Your turn (5 minutes): What percent of the 500 most recent opinion articles in The Chronicle mention “climate” in their title or abstract?
chronicle |>
  mutate(
    title = str_to_lower(title),
    abstract = str_to_lower(abstract),
    climate = if_else(
      str_detect(title, "climate") | str_detect(abstract, "climate"), 
      "mentioned", 
      "not mentioned"
      )
    ) |>
  count(climate) |>
  mutate(prop = n / sum(n))
# A tibble: 3 × 3
  climate           n  prop
  <chr>         <int> <dbl>
1 mentioned        14 0.028
2 not mentioned   482 0.964
3 <NA>              4 0.008
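The <NA> row appears because str_detect() returns NA for a missing abstract, and FALSE | NA evaluates to NA in R, so articles with no abstract and no match in the title are neither "mentioned" nor "not mentioned". One possible way to handle this, sketched below on toy data, is to treat a missing abstract as empty text before searching. Whether such articles should count as "not mentioned" is a judgment call; the toy titles and abstracts are hypothetical.

```r
library(tidyverse)

# Toy data with one missing abstract (hypothetical values).
articles <- tibble(
  title = c(
    "Climate action now", "Campus dining review",
    "Parking woes", "Finals week"
  ),
  abstract = c(
    "A call to act.", "Thoughts on the climate crisis.",
    NA, "Study tips."
  )
)

climate_counts <- articles |>
  mutate(
    # Treat a missing abstract as empty text so the search never returns NA
    text = str_to_lower(paste(title, replace_na(abstract, ""))),
    climate = if_else(str_detect(text, "climate"), "mentioned", "not mentioned")
  ) |>
  count(climate) |>
  mutate(prop = n / sum(n))

climate_counts
```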
  • Time permitting: Come up with another question and try to answer it using the data.
# add code here
  • Time permitting: