Syllabus
Stat 210A is an introductory Ph.D.-level course in theoretical statistics. It is a fast-paced and demanding course intended to prepare students for research careers in statistics.
Course information
Instructors
- Primary Instructor Prof. Will Fithian
- Office Hours: Tuesday 3:30-4:30pm on Zoom, Thursday 9:30-10:30am in 301 Evans
- Email: wfithian@berkeley.edu
- GSI Dohyeong Ki
- Office Hours: Wednesday 9-10am, Friday 4:30-5:30pm in 444 Evans
- Email: dohyeong_ki@berkeley.edu
Course schedule
- Lectures: Tuesday and Thursday 11am-12:30pm, Evans 60
- Recitation sections: Every second F 3:30-4:30pm in 444 Evans, starting September 6
- Election week:
- Blanket 2-day extension on homework 9 (due Friday November 8, 11:59pm)
- Thanksgiving week:
- Tuesday November 26: lecture 11-12:30 on Zoom, 3:30-4:30 OH on Zoom
- No lecture Thursday November 28
- No in-person office hours
- Final exam review: (a.k.a. last recitation section) Friday, December
- Final exam: Wed December 18, 8-11am
Course communications
- Lecture videos and homework solutions at [https://bcourses.berkeley.edu bCourses]
- Email policy: You can email me or Dohyeong about administrative questions, with “[Stat 210A]” in the subject line. No math over email, please.
- Ed page for announcements and technical discussion (no homework spoilers!)
- Gradescope for turning in homework
About Stat 210A
What is the theory of statistics?
Statistics is the study of methods that use data to understand the world. Statistical methods are used throughout the natural and social sciences, in machine learning and artificial intelligence, and in engineering. Despite the ubiquitous use of statistics, its practitioners are perpetually accused of not actually understanding what they are doing. Statistics theory is, broadly speaking, the subject of what exactly we are doing when we apply statistical methods.
While there are many possible ways to analyze data, most (but certainly not all) statistical methods are based on statistical modeling: treating the data as a realization of some random data-generating process with attributes, usually called parameters, that are a priori unknown. The goal of the analyst, then, is to use the data to draw accurate inferences about these parameters and/or to make accurate predictions about future data. If the modeling has been done well (a very big “if”) then these unknown parameters will correspond well to whatever real-world questions initially motivated the analysis. Applied statistics courses like Stat 215A and B delve deeply into questions about how to ensure that the statistical modeling exercise successfully captures something interesting about reality.
In this course we will instead focus on how the analyst can use the data most effectively within the context of a given mathematical setup. We will discuss the structure of statistical models, how to evaluate the quality of a statistical method, how to design good methods for new settings, and the philosophy of Bayesian vs frequentist modeling frameworks. We will cover estimation, confidence intervals, and hypothesis testing, in parametric and nonparametric methods, in finite samples and asymptotic regimes.
Topics
Statistical decision theory (frequentist and Bayesian), exponential families, point estimation, hypothesis testing, resampling methods, estimating equations and maximum likelihood, empirical Bayes, large-sample theory, high-dimensional testing, multiple testing and selective inference.
Prerequisites
The course prerequisites are linear algebra, analysis, probability, and statistics.
Relationship of Stat 210A to other Berkeley courses
Stat 210A focuses on classical statistical contexts: either inference in finite samples, or in fixed-dimensional asymptotic regimes. Stat 210B (for which 210A is a prerequisite) is more technical and covers topics like empirical process theory and high-dimensional statistics.
Berkeley’s graduate course on Statistical Learning Theory (CS 281A / Stat 241A) is also very popular and has some overlap in its topics. Roughly speaking, it is more tilted toward “machine learning”: it spends more time on topics in predictive modeling (i.e. classification and regression, which are covered in Stat 215A), optimization, and signal processing, but spends less time on inferential questions and (I believe) does not cover topics like hypothesis testing, confidence intervals, and causal inference. Both courses cover estimation and exponential families.
References
The online notes for this course are self-contained, however it can be helpful to see a different presentation in the following supplementary texts (all links are to public websites or Springer Link):
Keener, Theoretical Statistics: Topics for a Core Course, Springer 2010. The textbook that is closest in technical level and presentation style to our course reader.
Lehmann and Casella, Theory of Point Estimation, Springer 1998. A highly-detailed reference text that covers much of the estimation material in this course.
Lehmann and Romano, Testing Statistical Hypotheses, Springer 2005. A highly-detailed reference text covering much of the material on testing and confidence estimation.
Hacking, Probability and Inductive Logic, Cambridge University Press, 2001. A beautifully written book that treats probability and statistics from a philosophical point of view.
Candes, Stats 300C Lecture notes, Stanford 2016. The course notes for a great course at Stanford that covers some of the later material in this course.
Undergrad-level review texts for prerequisites:
Grading
Your final grade is based on:
Weekly problem sets: 50%
Final exam: 50%
Lateness policy: Homework must be submitted to Gradescope at 11:59pm on Wednesday nights. Late problem sets will not be accepted, but we will drop your lowest two grades.
Collaboration policy: For homework, you are welcome to work with each other or consult articles or textbooks online, with the following caveats:
You must write up your solution by yourself.
You may NOT consult any solutions from previous iterations of this course.
No generative AI allowed for problem sets.
If you collaborate or use any resources other than course texts, you must acknowledge your collaborators and the resources you used.
Academic integrity: You are expected to abide by the Berkeley honor code. Violating the collaboration policy, or cheating in any other way, will result in a failing grade for the semester and you will be reported to the University Office of Student Conduct.
While the final exam is nominally one half of the grade, it is quite difficult and typically accounts for most of the variance in final course grades. If you take shortcuts on the homework, you will save yourself time in the short run, but you may do quite poorly on the final.
Accommodations
Students with disabilities: Please see me as soon as possible if you need particular accommodations, and we will work out the necessary arrangements.
Scheduling conflicts: Please notify me in writing by the second week of the semester about any known or potential extracurricular conflicts (such as religious observances, graduate or medical school interviews, or team activities). I will try my best to help you with making accommodations, but cannot promise them in all cases. In the event there is no mutually-workable solution, you may be dropped from the class.
Exam accommodations: If you need accommodations on the final exam due to disability, or unavoidable travel or time conflict, please fill out the exam exam accommodation form by Friday, October 4 so that I can make arrangements. To ensure exam integrity I much prefer for all students to take the exam in Berkeley at the regularly scheduled time (8am December 18th), but will try to work with you if you have a conflict.