Data types and classes

Lecture 8

Dr. Mine Çetinkaya-Rundel

Duke University
STA 199 - Spring 2024

2024-02-08

Warm up

While you wait for class to begin…

  • Go to your ae repo and click Pull to get today’s application exercise so you’re ready for later.

  • Questions from the prepare materials?

Questions from last time

Pivoting data

Suppose we have the following patient data:

patients
# A tibble: 3 × 4
  patient_id pulse_1 pulse_2 pulse_3
  <chr>        <dbl>   <dbl>   <dbl>
1 XYZ             70      85      73
2 ABC             90      95     102
3 DEF            100      80      70

And we want to know:

  • Average pulse rate for each patient.

  • Trends in pulse rates across measurements.

These require a longer format of the data where all pulse rates are in a single column and another column identifies the measurement number.

Pivoting data

patients_longer <- patients |>
  pivot_longer(
    cols = !patient_id,
    names_to = "measurement",
    values_to = "pulse_rate"
  )
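
As a self-contained sketch of the pivot above, the block below recreates the patients data from the earlier slide and adds one variation that is not in the slides: it strips the "pulse_" prefix and stores the measurement number as an integer rather than a character string. The name patients_longer_int and the use of names_prefix and names_transform are illustrative choices, assuming the tibble and tidyr packages are installed.

```r
library(tibble)
library(tidyr)

# Recreate the patients data shown on the earlier slide
patients <- tribble(
  ~patient_id, ~pulse_1, ~pulse_2, ~pulse_3,
  "XYZ",        70,  85,  73,
  "ABC",        90,  95, 102,
  "DEF",       100,  80,  70
)

# Variation (an assumption, not the slide's code): drop the "pulse_" prefix
# and convert the remaining measurement number to an integer
patients_longer_int <- patients |>
  pivot_longer(
    cols = !patient_id,
    names_to = "measurement",
    names_prefix = "pulse_",
    names_transform = list(measurement = as.integer),
    values_to = "pulse_rate"
  )

dim(patients_longer_int)   # 9 rows, 3 columns
```

Storing the measurement number as an integer is one option when the x-axis should be treated as numeric; keeping it as a character string, as in the slide, works equally well for grouping and labeling.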

Summarizing pivoted data

patients_longer |>
  group_by(patient_id) |>
  summarize(mean_pulse = mean(pulse_rate))
# A tibble: 3 × 2
  patient_id mean_pulse
  <chr>           <dbl>
1 ABC              95.7
2 DEF              83.3
3 XYZ              76  

Visualizing pivoted data

ggplot(
  patients_longer, 
  aes(x = measurement, y = pulse_rate, group = patient_id, color = patient_id)
  ) +
  geom_line()

Types and classes

  • Type is how an object is stored in memory, e.g.,

    • double: a real number stored in double-precision floating-point format.
    • integer: a whole number (positive, negative, or zero).
  • Class is metadata about the object that can determine how common functions operate on that object, e.g.,

    • factor
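
To make the distinction concrete, here is a minimal base R sketch comparing typeof() (how the object is stored) with class() (metadata that changes how functions behave). The object names x_dbl, x_int, and f are illustrative and not from the slides.

```r
# Two numbers that print alike but are stored differently
x_dbl <- 3      # numeric literals are doubles by default
x_int <- 3L     # the L suffix requests an integer

typeof(x_dbl)   # "double"
typeof(x_int)   # "integer"

# A factor is stored as integers; its class changes how functions treat it
f <- factor(c("low", "high", "low"))
typeof(f)       # "integer"
class(f)        # "factor"

# summary() behaves differently depending on class:
# counts per level for the factor, a numeric summary for the integer codes
summary(f)
summary(as.integer(f))
```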

Types of vectors

You’ll commonly encounter:

  • logical
  • integer
  • double
  • character

You’ll less commonly encounter:

  • list
  • NULL
  • complex
  • raw
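
A quick way to see each of these types is to call typeof() on a small example of each. The sketch below uses only base R; the example values are arbitrary. The last three lines show what happens when types are mixed inside c().

```r
# typeof() applied to one small example of each vector type
typeof(TRUE)          # "logical"
typeof(2L)            # "integer"
typeof(2.5)           # "double"
typeof("two")         # "character"
typeof(list(1, "a"))  # "list"
typeof(NULL)          # "NULL"
typeof(1 + 2i)        # "complex"
typeof(as.raw(2))     # "raw"

# Atomic vectors hold a single type, so mixing types coerces silently
typeof(c(TRUE, 1L))   # "integer"
typeof(c(1L, 2.5))    # "double"
typeof(c(1, "a"))     # "character"
```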

Types of functions

Yes, functions have types too, but you don’t need to worry about the differences in the context of doing data science.

typeof(mean) # regular function
[1] "closure"
typeof(`$`) # primitive function that does not evaluate its arguments first
[1] "special"
typeof(sum) # primitive function that evaluates its arguments first
[1] "builtin"

Factors

A factor is a vector that can contain only predefined values, called levels. It is used to store categorical data.

x <- factor(c("a", "b", "b", "a"))
x
[1] a b b a
Levels: a b
typeof(x)
[1] "integer"
attributes(x)
$levels
[1] "a" "b"

$class
[1] "factor"
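
The output above suggests that a factor is a set of integer codes plus a levels attribute. The following base R sketch explores two consequences of that; the objects y and z are illustrative additions, not part of the slides.

```r
x <- factor(c("a", "b", "b", "a"))

# The underlying storage is integer codes that index into the levels
as.integer(x)   # 1 2 2 1
levels(x)       # "a" "b"

# Assigning a value outside the predefined levels yields NA (with a warning)
y <- x
suppressWarnings(y[2] <- "c")
is.na(y[2])     # TRUE

# Common pitfall: numbers stored as a factor convert to codes, not values
z <- factor(c("10", "20", "10"))
as.numeric(z)                 # 1 2 1
as.numeric(as.character(z))   # 10 20 10
```

Converting through as.character() first is one common way to recover the original numbers from a factor.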

Other classes

Just a couple of examples…

Date:

today <- Sys.Date()
today
[1] "2024-02-29"
typeof(today)
[1] "double"
attributes(today)
$class
[1] "Date"

Date-time:

now <- as.POSIXct("2024-02-08 11:45", tz = "EST")
now
[1] "2024-02-08 11:45:00 EST"
typeof(now)
[1] "double"
attributes(now)
$class
[1] "POSIXct" "POSIXt" 

$tzone
[1] "EST"
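
Both classes sit on top of a plain double, which is why arithmetic on dates works. The base R sketch below uses a fixed date and the UTC time zone (both chosen here for reproducibility, not taken from the slides) to show what the underlying numbers represent.

```r
# A Date is a double counting days since 1970-01-01
d <- as.Date("2024-02-08")
typeof(d)        # "double"
as.numeric(d)    # 19761

# Because the storage is numeric, date arithmetic works directly
d + 7                                   # "2024-02-15"
as.numeric(as.Date("2024-02-29") - d)   # 21

# A POSIXct date-time is a double counting seconds
# since 1970-01-01 00:00:00 UTC
dt <- as.POSIXct("2024-02-08 11:45", tz = "UTC")
as.numeric(dt) == as.numeric(d) * 86400 + 11 * 3600 + 45 * 60   # TRUE
```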

Application exercise

ae-07-population-types

  • Go to the project navigator in RStudio (top right corner of your RStudio window) and open the project called ae.

  • If there are any uncommitted files, commit them, and then click Pull.

  • Open the file called ae-07-population-types.qmd and render it.