Syllabus

Stat 210A is an introductory Ph.D.-level course in theoretical statistics. It is a fast-paced and demanding course intended to prepare students for research careers in statistics.

Course information

Instructors

Primary Instructor Will Fithian
- Office Hours: Tuesday 3:30-4:30pm on Zoom, Thursday 9:30-10:30am in Evans 301
- Email: wfithian@berkeley.edu
GSI TBD
- Office Hours: TBD
- Email: TBD

Course schedule

Lectures: Tuesday and Thursday 9:30-11:00am, Evans 60
Recitation sections: TBD, starting September 5
Veterans Day: No class on Tuesday, November 11 (quiz held Thursday, November 13)
Thanksgiving week:
- Tuesday November 25: lecture 9:30-11am on Zoom, 3:30-4:30 OH on Zoom
- Thursday November 27: no lecture or OH
- All other OH canceled
Final exam review: (last recitation section) Friday, December 12
Final exam: Tuesday December 16, 3-6pm

Course communications

Lecture videos and homework solutions at bCourses (https://bcourses.berkeley.edu)
Email policy: You can email course staff about administrative questions, with “[Stat 210A]” in the subject line. No math over email, please.
Ed page for announcements and technical discussion (no homework spoilers!)
Gradescope for turning in homework

About Stat 210A

What is the theory of statistics?

Statistics is the study of methods that use data to understand the world. Statistical methods are used throughout the natural and social sciences, in machine learning and artificial intelligence, and in engineering. Despite the ubiquitous use of statistics, its practitioners are perpetually accused of not actually understanding what they are doing. Statistical theory is, broadly speaking, the study of what exactly we are doing when we apply statistical methods.

While there are many possible ways to analyze data, most (but certainly not all) statistical methods are based on statistical modeling: treating the data as a realization of some random data-generating process with attributes, usually called parameters, that are a priori unknown. The goal of the analyst, then, is to use the data to draw accurate inferences about these parameters and/or to make accurate predictions about future data. If the modeling has been done well (a very big “if”) then these unknown parameters will correspond well to whatever real-world questions initially motivated the analysis. Applied statistics courses like Stat 215A and B delve deeply into questions about how to ensure that the statistical modeling exercise successfully captures something interesting about reality.

In this course we will instead focus on how the analyst can use the data most effectively within the context of a given mathematical setup. We will discuss the structure of statistical models, how to evaluate the quality of a statistical method, how to design good methods for new settings, and the philosophy of Bayesian vs frequentist modeling frameworks. We will cover estimation, confidence intervals, and hypothesis testing, in parametric and nonparametric methods, in finite samples and asymptotic regimes.

Topics

Statistical decision theory (frequentist and Bayesian), exponential families, point estimation, hypothesis testing, resampling methods, estimating equations and maximum likelihood, empirical Bayes, large-sample theory, high-dimensional testing, multiple testing and selective inference.

Prerequisites

The course prerequisites are linear algebra, analysis, probability, and statistics. See the course FAQ for more details if you are unsure about your level of preparation.

Relationship of Stat 210A to other Berkeley courses

Stat 210A focuses on classical statistical contexts: inference in finite samples and in fixed-dimensional asymptotic regimes. Stat 210B (for which 210A is a prerequisite) is more technical and covers topics like empirical process theory and high-dimensional statistics.

Berkeley’s graduate course on Statistical Learning Theory (CS 281A / Stat 241A) is also very popular and has some overlap in its topics. Roughly speaking, it is more tilted toward “machine learning”: it spends more time on topics in predictive modeling (i.e. classification and regression, which are covered in Stat 215A), optimization, and signal processing, but spends less time on inferential questions and (I believe) does not cover topics like hypothesis testing, confidence intervals, and causal inference. Both courses cover estimation and exponential families.

References

The online notes for this course are self-contained, but it can be helpful to see a different presentation in the following supplementary texts (all links are to public websites or Springer Link):

Keener, Theoretical Statistics: Topics for a Core Course, Springer 2010. The textbook that is closest in technical level and presentation style to our course reader.
Lehmann and Casella, Theory of Point Estimation, Springer 1998. A highly detailed reference text that covers much of the estimation material in this course.
Lehmann and Romano, Testing Statistical Hypotheses, Springer 2005. A highly detailed reference text covering much of the material on testing and confidence estimation.
Hacking, Probability and Inductive Logic, Cambridge University Press, 2001. A beautifully written book that treats probability and statistics from a philosophical point of view.
Candes, Stats 300C Lecture notes, Stanford 2016. The course notes for a great course at Stanford that covers some of the later material in this course.

Undergrad-level review texts for prerequisites:

Grading

Your final grade is based on:

Weekly problem sets: 40%
Weekly quizzes: 20%
Final exam: 40%

Drop policy: We will drop your lowest two homework grades and your lowest three quiz grades. You are meant to use drops when you have a good reason.

Lateness policy: Homework must be submitted to Gradescope by 11:59pm on Wednesday nights, and solutions will be released after the deadline. Late problem sets will not be accepted.

Tuesday Quizzes (new this semester): At the beginning of class on Tuesday, there will be a 10-minute quiz consisting of an extra part for one of the problems on the homework that was due the previous Wednesday night. Your odds of solving it will be higher if you have read and understood the solutions.

Collaboration policy: For homework, you are welcome to work with each other or consult articles or textbooks online, with the following caveats:

You must write up your solution by yourself.
You may not:
- consult any solutions from previous iterations of this course
- use generative AI to solve the problems
If you collaborate or use any resources other than course texts, you must acknowledge your collaborators and the resources you used.

Academic integrity: You are expected to abide by the Berkeley honor code. Violating the collaboration policy, or cheating in any other way, will result in a failing grade for the semester and a report to the University Office of Student Conduct.

While the final exam is nominally 40% of the grade, it typically accounts for most of the variance in final course grades since the average grade on it is commonly around 50%.
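As a rough illustration of how the weights and drop policy above combine, here is a minimal sketch in Python. It is not the official grade computation: it assumes every score is recorded as a fraction between 0 and 1 and that assignments within a category count equally, neither of which is stated in this syllabus. The function names are hypothetical.

```python
def mean_after_drops(scores, n_drop):
    """Average of the scores after discarding the n_drop lowest values."""
    kept = sorted(scores)[n_drop:]
    if not kept:
        raise ValueError("no scores left after applying drops")
    return sum(kept) / len(kept)


def course_score(homework, quizzes, final_exam):
    """Weighted course score on a 0-1 scale.

    Assumption (not stated policy): all inputs are fractions in [0, 1]
    and items within a category are weighted equally.
    """
    hw = mean_after_drops(homework, n_drop=2)  # lowest two homework grades dropped
    qz = mean_after_drops(quizzes, n_drop=3)   # lowest three quiz grades dropped
    # Weights from the syllabus: 40% problem sets, 20% quizzes, 40% final exam.
    return 0.40 * hw + 0.20 * qz + 0.40 * final_exam
```

For example, calling `course_score(homework_scores, quiz_scores, 0.5)` with lists of per-assignment fractions shows how a final exam score near the typical average interacts with the other two components.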

Accommodations

Students with disabilities: Please see me as soon as possible if you need particular accommodations, and we will work out the necessary arrangements.

Scheduling conflicts: Please notify me in writing by the second week of the semester about any known or potential extracurricular conflicts (such as religious observances, graduate or medical school interviews, or team activities). I will try my best to help you arrange accommodations, but cannot promise them in all cases. In the event there is no mutually workable solution, you may be dropped from the class.

Exam accommodations: If you need accommodations on the final exam due to disability, unavoidable travel, or a time conflict, please fill out the exam accommodation form by Friday, October 3 so that I can make arrangements. To ensure exam integrity I much prefer for all students to take the exam in Berkeley at the regularly scheduled time, but will try to work with you if you have a conflict.