Hi everyone! I will be using this to upload the practice problem sets (taken from prior prelims, with some slightly adapted by Rohan Hore to summarize the paper and/or give more potential questions).
I highly recommend working through the problems before we go over them together, or at least reading through them and thinking about how you would approach each problem even if you don't write anything out.
The planned schedule is as follows:
| Date Time | Location | Problem Set |
|---|---|---|
| Tu 8/12 3pm | Jones 226 | Applied Analysis 2 (2020 prelim) |
| Th 8/14 3pm | Jones 226 | Applied Analysis 3 (2015 prelim) |
| Tu 8/19 3pm | Zoom | Applied Analysis 4 (2019 prelim) |
| Th 8/21 3pm | Zoom | Applied Analysis 5 (2016 prelim) |
| Tu 8/26 3pm | Zoom | Applied Analysis 6 (2021 prelim) |
| Th 8/28 3pm | Zoom | 2022 prelim |
| Tu 9/2 3pm | Jones 226 | 2023 prelim |
| Th 9/4 3pm | Jones 226 | 2024 prelim - Mock Oral |
| -- | -- | extra (2017 prelim) |
A few notes on the format and some general tips (we will talk about this more in the sessions):
- The prelim is generally structured with simpler data visualization/testing problems at the start and more complicated modeling questions after.
- Don't overcomplicate your answers or model choices, especially on early questions. I generally suggest choosing easy/standard models. The goal of the prelim is to find a reasonable solution, not the best solution, and a simpler model is typically easier to diagnose and explain. If you are confident in your approach, feel free to use something more complex, but starting simple isn't a bad idea.
- If you are asked to find statistical issues in a paper or analysis, investigate how the tests were performed. Most of the time, the problem takes the form of multiple testing, a test that is not interpretable, or some clearly unmodeled or mismodeled effects.
- The oral component of the prelim is very important. The usual questions take the form of 'Why did you do this?' or 'What else can you do here?' If you have pointed out some issues, you might be asked to elaborate on them or to propose a possible solution. Often, they also ask you to explain any observable pattern in the data (plot). With that in mind, it helps to go through your solution once before the oral exam so that you know it well.
- It is often important to rely on intuition and standard patterns/good practices. For example, if the response is count data, the first thought should be: there might be skewness, and a log transform could help! Other examples include respecting the ordinal structure of a response and questioning the linearity of a covariate (e.g., time).
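To make the count-data tip concrete, here is a minimal sketch in Python using only the standard library. The counts are made up for illustration, and the `skewness` helper is a hypothetical name, not something from a prelim. It compares the sample skewness of the raw counts with that of `log(1 + count)`.

```python
import math

def skewness(xs):
    """Sample skewness: third central moment over the 1.5 power of the variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed count response (illustration only).
counts = [0, 1, 1, 2, 2, 3, 4, 5, 8, 13, 21, 55]

# log1p(c) = log(1 + c), so zero counts are handled without error.
logged = [math.log1p(c) for c in counts]

raw_skew = skewness(counts)
log_skew = skewness(logged)

print(f"skewness of raw counts: {raw_skew:.2f}")
print(f"skewness after log1p:   {log_skew:.2f}")
```

On data like this the transformed values are much closer to symmetric. Whether a transform or a count model (e.g. a Poisson GLM) is the better choice still depends on the question being asked.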
More detailed tips/notes:
- Plotting
    - Be careful that histogram/density smoothing doesn't lie about patterns
- Modeling
    - Choose model (LM/GLM, and why?)
        - Sometimes it's stated what to use, sometimes it's open for you to decide
        - Often the type of data motivates the model (e.g. count data -> Poisson/NB/ZIP)
    - Fit the model & assess fit
        - Feature selection
        - Diagnostic plots
            - Residual plot (overdispersion/heteroskedasticity)
            - Outliers?
    - Interpret p-values & appropriate model comparison tests
    - Comment on reliability of results based on model fit and assumptions
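As a rough illustration of the overdispersion check listed under diagnostics, the sketch below computes the Pearson dispersion statistic for an intercept-only Poisson model in plain Python. This is a simplification. With covariates you would use the fitted means from your GLM in place of the sample mean. The function name and both data sets are hypothetical.

```python
def pearson_dispersion(counts):
    """Pearson chi-square divided by residual df for an intercept-only Poisson model."""
    n = len(counts)
    mu = sum(counts) / n  # Poisson MLE of the mean when there are no covariates
    if mu == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (n - 1)  # one estimated parameter, so df = n - 1

# Hypothetical data for illustration only.
overdispersed = [0, 1, 1, 2, 2, 3, 4, 5, 8, 13, 21, 55]
roughly_poisson = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 3, 2]

for name, data in [("overdispersed", overdispersed),
                   ("roughly_poisson", roughly_poisson)]:
    print(f"{name}: dispersion = {pearson_dispersion(data):.2f}")
```

A value well above 1 suggests the variance exceeds what a Poisson model allows, which is one reason to consider a negative binomial (NB) model. Many zeros beyond what the model predicts would point toward ZIP instead.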