The prompt
Design evals for an AI tutor for K-12 math. How do you measure correctness, pedagogy quality, and safety? What is the bar for general availability?
Concepts this question tests
How to practice it in PrepOS
PrepOS's adaptive queue surfaces this question when you calibrate to a matching target level and your weakest concepts overlap with what it tests. Open the practice simulator and select the round type to start a rep with this prompt.
Practice this rep →