Grammar of graphics

Lecture 2

Dr. Mine Çetinkaya-Rundel

Duke University
STA 199 - Spring 2024

2024-01-18

Warm up

While you wait…

Questions from the prepare materials?

Announcements

  • Lab 1 on Monday – go to your assigned lab
  • Catch up with the prepare materials

From the survey

  • Almost half of the class mentioned being nervous about coding – you’re not alone! 💙

  • A good number of students mentioned being nervous about being new to stats + data science – you’re also not alone! 💜

  • Visibility of text on slides: Cranked up font size! 💛

  • About exams: No coding on the in class (time-limited) exam + there will be practice exams

  • About teams: Peer evaluations + milestones throughout semester

  • Grading criteria: Scale of 0-4 for each exercise on lab + more details in lab instructions

  • Hardware: Code running on university containers

Toolkit: Version control and collaboration

Git and GitHub

Git logo

  • Git is a version control system – like “Track Changes” features from Microsoft Word, on steroids
  • It’s not the only version control system, but it’s a very popular one

GitHub logo

  • GitHub is the home for your Git-based projects on the internet – like DropBox but much, much better

  • We will use GitHub as a platform for web hosting and collaboration (and as our course management system!)

Versioning - done badly

Versioning - done better

Versioning - done even better

with human readable messages

How will we use Git and GitHub?

How will we use Git and GitHub?

How will we use Git and GitHub?

How will we use Git and GitHub?

Git and GitHub tips

  • There are millions of git commands – ok, that’s an exaggeration, but there are a lot of them – and very few people know them all. 99% of the time you will use git to add, commit, push, and pull.
  • We will be doing Git things and interfacing with GitHub through RStudio, but if you google for help you might come across methods for doing these things in the command line – skip that and move on to the next resource unless you feel comfortable trying it out.
  • There is a great resource for working with git and R: happygitwithr.com. Some of the content in there is beyond the scope of this course, but it’s a good place to look for help.

Tour: Git + GitHub

Just one option for now:

Sit back and enjoy the show!

Data visualization

Examining data visualization

Discuss the following for the visualization.

04:00

Source: Twitter

ae-02-bechdel-dataviz

  • Go to the course GitHub org and find your ae repo (repo name will be suffixed with your GitHub name).
  • Clone the repo in your container, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline (end of this week)

Recap of AE

  • Construct plots with ggplot().
  • Layers of ggplots are separated by +s.
  • The formula is (almost) always as follows:
ggplot(DATA, aes(x = X-VAR, y = Y-VAR, ...)) +
  geom_XXX()
  • Aesthetic attributes of a geometries (color, size, transparency, etc.) can be mapped to variables in the data or set by the user, e.g. color = binary vs. color = "pink".
  • Use facet_wrap() when faceting (creating small multiples) by one variable and facet_grid() when faceting by two variables.