Lecture 9
Duke University
STA 199 - Spring 2024
2024-02-13
Questions for/about the exam?
Exam format / flow
Academic dishonesty / Duke Community Standard
Explicit type coercion: You ask R to change the type of a variable
Implicit type coercion: R changes / makes assumptions for you about the type of a variable without you asking for it
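A minimal base R sketch of the difference, using made-up values (the vectors `mixed` and `credits` are illustrative, not from the lecture data):

```r
# Implicit coercion: combining numbers and a string in one atomic vector
# makes R silently convert every element to character.
mixed <- c(4, 4.5, "I'm not sure yet")
typeof(mixed)        # "character"

# Explicit coercion: you ask for the conversion yourself.
credits <- c(4, 4.5, 2)
credits_chr <- as.character(credits)
typeof(credits_chr)  # "character"
```

In the first case nobody asked for character values; R chose the one type that can represent every element.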
A vector is a collection of values
Atomic vectors can only contain values of the same type
Lists can contain values of different types
Why do we care? Because each column of a data frame is a vector.
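As a small illustration, assuming R 4.0 or later (where `data.frame()` keeps strings as character) and using hypothetical objects `atomic_vec`, `mixed_list`, and `df`:

```r
# An atomic vector holds values of one type; a list can mix types.
atomic_vec <- c(1.5, 2, 3)
mixed_list <- list(1.5, "two", TRUE)

typeof(atomic_vec)           # "double"
sapply(mixed_list, typeof)   # "double" "character" "logical"

# Each column of a data frame is itself a vector with a single type.
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
col_types <- sapply(df, typeof)
col_types
```

Here `col_types` reports one type per column, which is why a single stray text value can change the type of an entire column.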
✅ From numeric to character
❌ From character to numeric
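One way to see why the two directions differ, sketched in base R with a hypothetical vector `y` (the slide's own example is not reproduced here):

```r
# Numeric to character always succeeds: every number has a text form.
as.character(c(1, 2.5))   # "1" "2.5"

# Character to numeric only works for strings that look like numbers.
# Anything else becomes NA, and R warns that NAs were introduced.
y <- c("1", "2.5", "three")
y_num <- suppressWarnings(as.numeric(y))
y_num                     # 1.0 2.5 NA
```

`suppressWarnings()` is used only to keep the output quiet; in practice the warning is a useful signal that some values need cleaning.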
Which of the column types were implicitly coerced?
Suppose you conduct a survey and ask students their student ID number and number of credits they’re taking this semester. What is the type of each variable?
survey <- survey_raw |>
  mutate(
    student_id = if_else(student_id == "I don't remember", NA, student_id),
    n_credits = case_when(
      n_credits == "I'm not sure yet" ~ NA,
      n_credits == "2 - underloading" ~ "2",
      .default = n_credits
    ),
    n_credits = as.numeric(n_credits)
  )
survey
# A tibble: 4 × 2
  student_id n_credits
  <chr>          <dbl>
1 273674           4
2 298765           4.5
3 287129          NA
4 <NA>             2
If variables in a data frame have multiple types of values, R will coerce them into a single type, which may or may not be what you want.
If what R does by default is not what you want, you can use explicit coercion functions like as.numeric(), as.character(), etc. to convert variables to the types you want. This generally also involves cleaning up the features of the data that caused the unwanted implicit coercion in the first place.
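The clean-then-coerce pattern can be sketched in base R. The raw vector `n_credits_raw` below is an assumption: a guess at what the survey responses might have looked like, based on the two text answers the cleaning code handles.

```r
# Hypothetical raw responses (assumed, not taken from the course data).
n_credits_raw <- c("4", "4.5", "I'm not sure yet", "2 - underloading")

# Coercing right away also loses the "2 - underloading" answer.
naive <- suppressWarnings(as.numeric(n_credits_raw))
sum(is.na(naive))   # 2

# Clean first, then coerce.
cleaned <- n_credits_raw
cleaned[cleaned == "I'm not sure yet"] <- NA
cleaned[cleaned == "2 - underloading"] <- "2"
n_credits <- as.numeric(cleaned)
n_credits           # 4.0 4.5 NA 2.0
```

Only the genuinely unknown answer ends up as NA; the recoverable one keeps its value.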
openintro::loan50
# A tibble: 50 × 3
   annual_income interest_rate homeownership
           <dbl>         <dbl> <fct>
 1         59000         10.9  rent
 2         60000          9.92 rent
 3         75000         26.3  mortgage
 4         75000          9.92 rent
 5        254000          9.43 mortgage
 6         67000          9.92 mortgage
 7         28800         17.1  rent
 8         80000          6.08 mortgage
 9         34000          7.97 rent
10         80000         12.6  mortgage
# ℹ 40 more rows
What will the following code result in?
Aesthetic mapping defined at the global level will be used by all geoms for which the aesthetic is defined.
Aesthetic mapping defined at the local level will be used only by the geoms they’re defined for.
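A sketch of the two cases with ggplot2. The data frame `loans` is typed in by hand from the first six rows of the loan50 printout, and the choice of `geom_smooth(method = "lm")` is an assumption for illustration, not necessarily the plot used in the lecture.

```r
library(ggplot2)

# First six rows of the loan50 printout, entered by hand for illustration.
loans <- data.frame(
  annual_income = c(59000, 60000, 75000, 75000, 254000, 67000),
  interest_rate = c(10.9, 9.92, 26.3, 9.92, 9.43, 9.92),
  homeownership = factor(c("rent", "rent", "mortgage", "rent", "mortgage", "mortgage"))
)

# Global: color is mapped in ggplot(), so both the points and the
# fitted lines are colored (and grouped) by homeownership.
p_global <- ggplot(loans, aes(x = annual_income, y = interest_rate, color = homeownership)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Local: color is mapped only inside geom_point(), so the points are
# colored but geom_smooth() fits a single line to all observations.
p_local <- ggplot(loans, aes(x = annual_income, y = interest_rate)) +
  geom_point(aes(color = homeownership)) +
  geom_smooth(method = "lm", se = FALSE)
```

Printing `p_global` and `p_local` side by side shows the difference: one fitted line per homeownership group versus a single overall line.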
Factors are used for categorical variables – variables that have a fixed and known set of possible values.
They are also useful when you want to display character vectors in a non-alphabetical order.
The forcats package has a bunch of functions (that start with fct_*()) for dealing with factors and their levels: https://forcats.tidyverse.org/reference/index.html
Factors and the order of their levels are relevant for displays (tables, plots) and they’ll be relevant for modeling (later in the course)
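As a small sketch of level ordering, using a hypothetical character vector `size` and `fct_relevel()` from forcats:

```r
library(forcats)

# Hypothetical categorical responses.
size <- c("low", "high", "medium", "low")

# By default, factor levels are sorted alphabetically.
f_default <- factor(size)
levels(f_default)   # "high" "low" "medium"

# fct_relevel() moves the named levels to the front, in the order given.
f_ordered <- fct_relevel(f_default, "low", "medium", "high")
levels(f_ordered)   # "low" "medium" "high"
```

Tables and plot axes follow the level order, so `f_ordered` would display as low, medium, high rather than alphabetically.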
factor is a data class
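One way to see the distinction between class and type in base R, using an illustrative factor `x`:

```r
x <- factor(c("rent", "mortgage", "rent"))

class(x)        # "factor"
typeof(x)       # "integer"

# Under the hood, a factor stores integer codes that index into its levels.
as.integer(x)   # 2 1 2
levels(x)       # "mortgage" "rent"
```

The class tells R how to treat and print the object, while the type describes how the values are stored.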