
Scaling Oral Exams

Once an AI tool can produce a confident written answer in seconds, a written exam stops telling you what a student understands. Oral exams still do. The question that occupied me was practical: how do you run them in a class of more than 30 students without spending your whole semester in an exam room?

Notes from CSE 5114 and CSE 3104 · Updated June 2026

Why oral exams

An oral exam asks a student to talk through a problem out loud while you listen and probe. That single change fixes several things at once. It evaluates the student independent of whatever tools they used to study, which matters when a chatbot can write a passable essay or function on demand. It also mirrors situations students will actually face: job interviews, design reviews, standups, thesis defenses, oral boards for physicians, qualifying exams.

The format has quieter advantages too. It adapts to the person in front of you, so a strong student can be pushed and a struggling student can be met where they are. It is fast to grade because you are forming the judgment while the exam happens. And it lets you grade by the spirit of the answer rather than the letter, rewarding genuine reasoning over memorized phrasing.

What the exam looks like

I model my exams on a design interview at a technology company. Each one runs 20 to 25 minutes, with a buffer of about 5 minutes between students for notes and grading. Everyone starts from the same open-ended prompt so the exam is comparable across students; after that, the follow-up questions are semi-structured, shaped by where the conversation goes.

I record the exams, both to support grading and to give students something to review. Students get one retake if they want it, and historically about 10 to 20 percent take it. The retake is not a loophole; it is a recognition that a single 20-minute window is a noisy measurement, and that a student who comes back better prepared has learned the thing the exam was checking for.

Grading without it taking forever

Before the exam I pick several key concepts or learning goals. During the conversation I grade each one coarsely on a check-minus, check, check-plus scale, which I treat as 1, 2, 3, with a 0 or a 4 reserved for the rare exceptional case. A few sentences of notes per student are enough to jog my memory later.

The coarse scale is what keeps grading honest and quick. If I already have a clear read on one topic, I move the student to another so I am spending time where the signal is. If an answer sounds superficial, I dig deeper before deciding. Afterward I convert the marks to a numeric grade with a fixed rule, so the judgment happens live and the arithmetic happens consistently. When a TA is unsure about a category, they flag it and I review the recording.
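As an illustration of what a fixed conversion rule might look like, here is a minimal sketch. The 0 to 4 scale comes from the text above; the points table, the plain averaging, and the concept names are hypothetical placeholders, not the rule actually used in the course.

```python
from statistics import mean

# Hypothetical points per mark. The text gives the scale
# (check-minus=1, check=2, check-plus=3, with 0 and 4 for rare cases)
# but not the conversion, so these values are assumptions.
POINTS = {0: 0.0, 1: 70.0, 2: 85.0, 3: 95.0, 4: 100.0}


def numeric_grade(concept_marks):
    """Average the per-concept points into a single percentage.

    concept_marks maps a concept name to a mark on the 0..4 scale.
    """
    if not concept_marks:
        raise ValueError("at least one concept must be graded")
    for concept, mark in concept_marks.items():
        if mark not in POINTS:
            raise ValueError(f"mark for {concept!r} must be an integer 0..4, got {mark!r}")
    return mean(POINTS[mark] for mark in concept_marks.values())


# Hypothetical concept names for one student's exam.
marks = {"data model": 2, "scaling": 3, "failure handling": 1}
print(f"{numeric_grade(marks):.1f}")
```

Because the rule is a lookup plus an average, two graders who agree on the marks always arrive at the same number, which is the point of separating the live judgment from the arithmetic.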

Scaling with TAs

Up to a class of about 30, I run every exam myself. Beyond that I lean on teaching assistants, and I aim for roughly 8 students per TA. Giving exams turns out to be good practice for the TAs, who are often students themselves heading into the same interviews.

Consistency is the thing that breaks first when you add people, so I invest in it directly. For each exam I write out the full prompt, a sample solution, and a set of sample probing questions. TAs train in advance with examples and recordings. When I double-checked grades against the recordings, most TA grades were well calibrated. Two unglamorous logistics matter more than you would expect: reserve exam rooms early, and make sure every student leaves with feedback, which the TA notes and the recordings provide.

What it actually costs

The common objection is that oral exams cannot scale. The arithmetic says otherwise, because each TA's workload depends on how many students that TA examines, not on the size of the class; a larger class means more TAs, not more hours per TA. Assuming 8 students per TA and one exam per semester, each TA spends roughly 3 to 4 hours administering and grading (8 students at 25 to 30 minutes each, buffer included), plus about an hour of prep. That is on par with grading a written exam or a final project.

It helps to compare the two formats side by side:

Written exam. Testing takes students 1 to 2 hours and proctors 1 to 2 hours each; grading then takes another 10 to 30 minutes per student.
Oral exam. Testing takes about 20 minutes per student, for both the student and the TA, and grading is largely done by the time the exam ends, at roughly 5 minutes per student.

The oral exam front-loads the work into the conversation instead of leaving a stack of papers to mark afterward, which is why the totals come out similar.
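The comparison can be worked through as a short calculation. This is a rough sketch using the figures quoted above; it assumes each grader handles 8 students in both formats, that a written-exam grader also proctors one sitting, and that the hour of prep applies only to the oral format. The function names are made up for illustration.

```python
def oral_hours_per_ta(students=8, exam_min=(20, 25), grading_min=5, prep_hours=1.0):
    """Return (low, high) hours one TA spends on an oral exam."""
    low = students * (exam_min[0] + grading_min) / 60 + prep_hours
    high = students * (exam_min[1] + grading_min) / 60 + prep_hours
    return low, high


def written_hours_per_grader(students=8, proctor_hours=(1, 2), grading_min=(10, 30)):
    """Return (low, high) hours one grader spends on a written exam."""
    low = proctor_hours[0] + students * grading_min[0] / 60
    high = proctor_hours[1] + students * grading_min[1] / 60
    return low, high


oral = oral_hours_per_ta()
written = written_hours_per_grader()
print(f"oral:    {oral[0]:.1f} to {oral[1]:.1f} hours per TA")
print(f"written: {written[0]:.1f} to {written[1]:.1f} hours per grader")
```

Under these assumptions the oral exam lands at about 4.3 to 5 hours per TA and the written exam at about 2.3 to 6 hours per grader, so the oral range sits inside the written one.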

How students react

At the start of the semester students arrive with a mix of apprehension, curiosity, and cautious excitement. There is real anxiety about being put on the spot, so I work to lower the pressure: clear expectations, a shared starting prompt, the retake, and a tone that treats the exam as a conversation rather than an interrogation.

By the end, the sentiment shifts. In course evaluations, 83 percent of responding students gave a 6 or 7 on a 7-point scale to the statement that the oral midterm format, intended to mimic a data engineering design interview, was valuable for their learning. Several students, and several TAs, have told me the practice had a direct effect on landing an internship or a job. That is the outcome I care about most: an assessment that doubles as preparation for the work itself.

This piece is the written version of a talk; you can open the slides for the condensed form. Oral exams are one half of how I keep assessment meaningful when AI tools are available. The other half is the grading model in Multiplicative Grading, and the wider argument is in Teaching Computer Science in the Age of AI. Both formats live in CSE 5114.