```
Rows: 50
Columns: 1
$ ppg <dbl> 48.00000, 40.00000, 99.00000, 13.00000, 55.00000, 75.00000, 74.00000, 169.00…
```

Lecture 20

Dr. Mine Çetinkaya-Rundel

Duke University

STA 199 - Spring 2024

2024-04-02

Any questions from prepare materials?

- Lab 7 (last lab) is due Monday 4/8 at 8 am
- Have at least one team meeting this week to review feedback and make some progress on your lab for peer review after exam on Monday 4/15

We have data on the price per guest (ppg) for a random sample of 50 Airbnb listings in 2020 for Asheville, NC. We are going to use these data to investigate what we would of expected to pay for an Airbnb in in Asheville, NC in June 2020.

We have data on the price per guest (ppg) for a random sample of 50 Airbnb listings in 2020 for Asheville, NC. We are going to use these data to investigate what we would of expected to pay for an Airbnb in in Asheville, NC in June 2020.

**Population parameter**- What we are interested in. Statistical measure that describes an entire population.**Sample statistic (point estimate)**- describes a sample. A piece of information you get from a fraction of the population.

Suppose you’re fishing in a murky lake. Are you more likely to catch a fish using a spear or a net?

- Spear \(\rightarrow\) point estimate
- Net \(\rightarrow\) interval estimate

Quantifying the variability of the sample statistics to help calculate a range of plausible values for the population parameter of interest:

Via simulation \(\rightarrow\) using bootstrapping – using a statistical procedure that re samples a single data set to create many simulated samples.

Via mathematical formulas \(\rightarrow\) using the Central Limit Theorem

The term bootstrapping comes from the phrase “pulling oneself up by one’s bootstraps”, which is a metaphor for accomplishing an impossible task without any outside help.

Impossible task: estimating a population parameter using data from only the given sample.

**Note**

Note: This notion of saying something about a population parameter using only information from an observed sample is the crux of statistical inference, it is not limited to bootstrapping.

- Resample with replacement from our data n times, where n is the sample size
- Calculate the sample statistic of interest of the new, resampled (bootstrapped) sample and record the value
- Do this entire process many many times to build the
**bootstrap distribution**

What do you expect the center of the bootstrap distribution to be? Why? Check your guess by visualizing the distribution.

Which of the following is the correct interpretation of the bootstrap interval?

There is a 95% probability the true mean price per guest for an Airbnb in Asheville is between $64.7 and $89.6.

There is a 95% probability the price per guest for an Airbnb in Asheville in this sample is between $64.7 and $89.6.

We are 95% confident the true mean price per guest for Airbnbs in Asheville is between $64.7 and $89.6.

We are 95% confident the price per guest for an Airbnb in Asheville in this sample is between $64.7 and $89.6.

Calculating the observed sample statistic:

Calculating the interval:

Visualizing the interval:

`ae-15-duke-forest-bootstrap`

- Go back to your project called
`ae`

. - If there are any uncommitted files, commit them, and push.
- Work on
`ae-15-duke-forest-bootstrap.qmd`

.