AE 09: Opinion articles in The Chronicle

Part 1 - Data scraping

See chronicle-scrape.R for suggested scraping code.

Part 2 - Data analysis

Let’s start by loading the packages we will need:

library(tidyverse)

Your turn (1 minute): Load the data you saved into the data folder and name it chronicle.

chronicle <- read_csv("data/chronicle.csv")

Your turn (3 minutes): Who are the most prolific authors of the 500 most recent opinion articles in The Chronicle?

chronicle |>
  count(author, sort = TRUE)

# A tibble: 204 × 2
   author                        n
   <chr>                     <int>
 1 Luke A. Powery               30
 2 Heidi Smith                  27
 3 Advikaa Anand                22
 4 Monday Monday                17
 5 Monika Narain                16
 6 Community Editorial Board    12
 7 Linda Cao                    12
 8 Sonia Green                  12
 9 Valerie Tan                  11
10 Nathan Luzum                 10
# ℹ 194 more rows

Demo: Draw a line plot of the number of opinion articles published per day in The Chronicle.

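chronicle |>
  # one row per publication date, with n = number of articles that day
  count(date) |>
  # group = 1 connects all points with a single line, even if date is not numeric
  ggplot(aes(x = date, y = n, group = 1)) +
  geom_line()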

Demo: What percent of the 500 most recent opinion articles in The Chronicle mention “climate” in their title?

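chronicle |>
  mutate(
    # lowercase the titles so the match is case-insensitive
    title = str_to_lower(title),
    climate = if_else(str_detect(title, "climate"), "mentioned", "not mentioned")
  ) |>
  count(climate) |>
  # turn the counts into proportions of all articles
  mutate(prop = n / sum(n))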

# A tibble: 2 × 3
  climate           n  prop
  <chr>         <int> <dbl>
1 mentioned        10  0.02
2 not mentioned   490  0.98
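
The question asks for a percent, while prop is a proportion: 0.02 corresponds to 2 percent. As a minimal sketch of a more direct route, the code below takes the mean of the logical vector returned by str_detect() and multiplies by 100. The toy tibble and its titles are hypothetical stand-ins for chronicle, made up only so that the example runs on its own.

```r
library(tidyverse)

# Hypothetical toy data standing in for the scraped chronicle tibble
toy <- tibble(
  title = c(
    "Climate action starts on campus",
    "In defense of the dining hall",
    "What the climate strike taught me",
    "A letter to first-years"
  )
)

# str_detect() returns TRUE/FALSE, so sum() counts the matches
# and mean() gives the share of titles that match
climate_share <- toy |>
  summarize(
    n_mentioned = sum(str_detect(str_to_lower(title), "climate")),
    percent = 100 * mean(str_detect(str_to_lower(title), "climate"))
  )

climate_share
```

Replacing toy with chronicle should give the same count as the "mentioned" row above, as long as no title is missing.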

Your turn (5 minutes): What percent of the 500 most recent opinion articles in The Chronicle mention “climate” in their title or abstract?

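chronicle |>
  mutate(
    # lowercase both columns so the match is case-insensitive
    title = str_to_lower(title),
    abstract = str_to_lower(abstract),
    # flag an article if either its title or its abstract contains "climate"
    climate = if_else(
      str_detect(title, "climate") | str_detect(abstract, "climate"),
      "mentioned",
      "not mentioned"
    )
  ) |>
  count(climate) |>
  mutate(prop = n / sum(n))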

# A tibble: 3 × 3
  climate           n  prop
  <chr>         <int> <dbl>
1 mentioned        14 0.028
2 not mentioned   482 0.964
3 <NA>              4 0.008
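
The NA row most likely comes from articles with a missing abstract. In R, FALSE | NA evaluates to NA, so an article whose title does not mention climate and whose abstract is missing gets neither label. TRUE | NA is TRUE, so articles with climate in the title are still counted. One possible way to handle this, sketched below, is to treat a missing abstract as empty text with replace_na() before matching. The toy tibble is hypothetical, and whether a missing abstract should count as "not mentioned" is a judgment call rather than the exercise's stated approach.

```r
library(tidyverse)

# Hypothetical toy data standing in for chronicle; one abstract is missing
toy <- tibble(
  title = c(
    "Climate action starts on campus",
    "In defense of the dining hall",
    "A letter to first-years",
    "Parking, again"
  ),
  abstract = c(
    "Students organize.",
    "The climate inside is fine.",
    NA,
    "No spots left."
  )
)

toy_counts <- toy |>
  mutate(
    title = str_to_lower(title),
    # assumption: a missing abstract is treated as empty text, so it cannot match
    abstract = replace_na(str_to_lower(abstract), ""),
    climate = if_else(
      str_detect(title, "climate") | str_detect(abstract, "climate"),
      "mentioned",
      "not mentioned"
    )
  ) |>
  count(climate) |>
  mutate(prop = n / sum(n))

toy_counts
```

With this change every article falls into one of the two labels, so the proportions describe all articles rather than only those with an abstract.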

Time permitting: Come up with another question and try to answer it using the data.

# add code here
