AE 08: Data import

Application exercise

Packages

We will use the following two packages in this application exercise.

  • tidyverse: For data import, wrangling, and visualization.
  • readxl: For importing data from Excel.
library(tidyverse)
library(readxl)

Part 1: Hollywood relationships

# add code here
  • Your turn (5 minutes): Split the data into three subsets: where the woman is older, where the man is older, and where they are the same age. Save these subsets as three appropriately named data frames. Remember: Use concise and evocative names. Confirm that these new objects appear in your Environment tab and that the numbers of observations in the three new data frames add up to the number of observations in the original data frame.
# add code here
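One way to approach the split is sketched below on a small made-up data frame. The object name `couples` and the columns `woman_age` and `man_age` are hypothetical stand-ins, not the names used in the exercise's dataset, so adapt them to the real data. Note that `filter()` drops rows where the comparison is `NA`, so missing ages would make the row counts fall short of the original.

```r
library(tidyverse)

# Hypothetical toy data: `woman_age` and `man_age` are assumed column names,
# not necessarily those in the exercise's dataset.
couples <- tibble(
  movie     = c("A", "B", "C", "D"),
  woman_age = c(30, 25, 40, 35),
  man_age   = c(28, 31, 40, 50)
)

# One filter per case; the three conditions are mutually exclusive.
woman_older <- couples |> filter(woman_age > man_age)
man_older   <- couples |> filter(man_age > woman_age)
same_age    <- couples |> filter(woman_age == man_age)

# Check that the three subsets account for every row of the original.
nrow(woman_older) + nrow(man_older) + nrow(same_age) == nrow(couples)
```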
  • Demo: Write out the three new datasets you created into the data folder:
# add code here
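Writing a data frame to disk can be sketched with `write_csv()` from readr. In the example below, the data frame, the file name `same-age.csv`, and the use of a temporary directory in place of the project's data folder are all assumptions made so the snippet runs anywhere; in the exercise you would point the path at your own data folder and repeat the call for each data frame.

```r
library(tidyverse)

# Stand-in data frame; in the exercise this would be one of your own subsets.
same_age <- tibble(movie = "C", woman_age = 40, man_age = 40)

# A temporary directory stands in for the project's data folder (assumption).
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical file name; choose one that matches the object it stores.
out_path <- file.path(data_dir, "same-age.csv")
write_csv(same_age, file = out_path)

# Read the file back to confirm the round trip worked.
read_csv(out_path, show_col_types = FALSE)
```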

Part 2: Sales

Sales data are stored in an Excel file that looks like the following:

  • Demo: Read in the Excel file called sales.xlsx from the data-raw/ folder such that it looks like the following.

# add code here
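The general pattern for skipping rows and supplying your own variable names with `read_excel()` is sketched below. Because sales.xlsx is not reproduced here, the example uses `datasets.xlsx`, a workbook bundled with the readxl package, and the new column names are made up for illustration. How many rows to skip and which names to assign for the sales data depend on how that file is laid out.

```r
library(readxl)

# Path to an example workbook that ships with readxl (not the sales file).
path <- readxl_example("datasets.xlsx")

flowers <- read_excel(
  path,
  sheet = "iris",
  skip = 1,  # skip the sheet's own header row
  # Illustrative replacement names, supplied in column order.
  col_names = c("sepal_length", "sepal_width", "petal_length", "petal_width", "species")
)

flowers
```

When `col_names` is a character vector, the first row read after the skipped rows is treated as data, so `skip` needs to cover any header or title rows you want to discard.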
  • Demo - Stretch goal: Manipulate the sales data so that it looks like the following.

# add code here
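The target layout is not reproduced here, so the following is only a sketch of one common kind of clean-up, under the assumption that the raw import mixes group-label rows with data rows in the same columns. The toy data, the "Brand" labels, and the column names `id`, `n`, `is_label`, and `brand` are all hypothetical. The idea is to copy each label into its own column, fill it downward, and then drop the label rows.

```r
library(tidyverse)

# Hypothetical raw import: label rows interleaved with data rows, all character.
raw <- tibble(
  id = c("Brand 1", "1234", "8721", "Brand 2", "3627"),
  n  = c("n", "8", "2", "n", "1")
)

tidy <- raw |>
  mutate(
    is_label = str_detect(id, "Brand"),             # flag the label rows
    brand    = if_else(is_label, id, NA_character_) # keep the label, else NA
  ) |>
  fill(brand) |>                                    # carry each label downward
  filter(!is_label) |>                              # drop the label rows
  mutate(id = as.numeric(id), n = as.numeric(n)) |> # convert after filtering
  select(brand, id, n)

tidy
```

Converting to numeric only after the label rows are removed avoids coercion warnings from text such as "Brand 1".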
  • Question: Why bother writing code to read the data in (skipping columns and assigning variable names) and clean it up in multiple steps, instead of opening the Excel file and editing the data there to prepare it for a clean import?

Add response here.