Table 1. Common assessment tools grouped by Miller’s levels of competence

| Assessment Categories and Instruments | Description of Assessment | Advantages | Disadvantages |
| --- | --- | --- | --- |
| **Assessment of “knows” and “knows how”** | | | |
| MCQs | MCQs are a selected-response instrument consisting of a stem (a case or problem description), a lead-in question, and a list of options. Single best answer is the most common format. | Efficiently tests a broad range of knowledge and application in a single test; the easiest format with which to produce statistically reliable results; automated marking | Limited cognitive levels tested; should not be used to extrapolate what students can show or do; difficult to write without technical flaws that allow “test-smart” students to select correct responses; faculty training needed |
| Short answer | Students answer structured questions with open-ended responses, which are scored against predetermined model answers. | Items are easy to create; reasonable content coverage; easier to grade than full essays, though still effortful; often valued because the answer is not in front of the student (a constructed rather than a selected response) | Requires a large number of questions (e.g., 30–40 structured items) to match MCQs for reliability (see the reliability note after this table); usually tests the same cognitive levels as MCQs, but less efficiently |
| Essay/report | Students submit prose in response to a stimulus, scored either with a points system against a rubric or with a global rating. | Easy for faculty members to create; can assess written communication skills; allows assessment of complex topics and of the ability to make coherent arguments | Limited representation of content; usually modest reliability and interrater agreement; time consuming for faculty members to grade |
| Oral exam (viva voce) | One or more examiners question candidates face to face. Questions should be blueprinted, and answers should be recorded and graded against a predetermined rubric. | Traditional in some disciplines (notably surgery); valued as a demonstration of the candidate’s ability to synthesize under pressure, although this advantage is unverified; better used in formative situations | Low reliability; high interrater variation; testing usually limited to the knowledge level; prone to unconscious examiner biases |
| **Assessment of “shows” (demonstrations of performance in a simulated setting)** | | | |
| Laboratory practical/simulated clinical exams | Students are observed performing defined tasks within a specified time and are graded against a standardized checklist. In medical education, the most common example is the objective structured clinical examination. | Fairly authentic situations; tasks or cases can be standardized, allowing more reliable grading | Good reliability requires several stations (usually >10); labor intensive to create, grade, and set standards for; expensive to deploy; demonstrates the student’s best effort, not what is done in real practice; context specificity makes it hard to extrapolate from skills observed at a given station |
| **Assessment of “does” (conscious demonstration of performance in a real-world setting)** | | | |
| Direct observation | Students are observed in a practice situation such as a laboratory or clinic. Rating scales are needed to describe the criteria of interest (e.g., procedural or communication skills). | Provides an assessment of what learners do in real situations; easy to administer; can also provide a global rating of performance | Requires faculty training to ensure reliability; multiple encounters are needed to provide reliable data |
| Portfolio | A collection of work samples, projects, and evaluations over time that provides evidence of achievement of goals. It should be accompanied by student goal setting and frequent faculty feedback on progress. | Represents actual performance over time; powerful as a feedback and progress-monitoring device | Time consuming for students and faculty members; often low acceptance by students; hard to grade reliably and to set standards for under high-stakes conditions |
| Peer assessment | Students assess one another’s work using a rubric or criteria determined in advance by faculty. The object of assessment varies (e.g., a project, presentation, or professional behavior). | Encourages student responsibility and ownership; develops students’ skills of judgment; offers a valuable alternative feedback perspective, especially on teamwork and behavior | Grade inflation is likely, with less reliable scores; better suited to formative assessment; reluctance to give negative feedback if not anonymous; requires faculty members to brief students on how to assess and give feedback; should be supervised |
| Self-assessment | Students make judgments about their own learning, achievements, and learning outcomes, usually according to established criteria. | Encourages goal setting and responsibility; promotes the development of reflective practice | Grade inflation is likely, with less reliable scores; requires guided practice to develop self-monitoring skills |
| 360-degree (multisource feedback) | Surveys completed by several individuals within the candidate’s domain of competency, including supervisors, peers, other coworkers, and clients. These are usually targeted at observable behaviors and interpersonal skills. | Authentic assessment in a real-world setting; includes multiple perspectives; provides evidence about behavior and is therefore a powerful feedback tool | Reluctance of evaluators to provide negative feedback about workmates; a large number of raters (>10) needed for reliable data; difficult to deploy and to collect data |
| **Assessment of “is” (consistent demonstration of expected values, attitudes, and behaviors; a fully formed professional identity, e.g., independent scientist or healthcare provider)** | | | |
| Interviews | A one-on-one subject-object interview used to explore professional identity. | In-depth personal exploration | Requires a highly skilled examiner; data from the “does” level are a prerequisite |
| Standardized survey inventories | A new area of research with limited tools available. | Easy to deploy; theoretically grounded | Relies on self-report; not well validated at this time |

MCQ, multiple-choice question.
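
The item- and station-count claims in the table (30–40 short-answer items to match an MCQ paper; more than 10 OSCE stations) reflect the general psychometric relationship between test length and reliability. The sketch below uses the standard Spearman–Brown prophecy formula to illustrate that relationship; the formula is textbook psychometrics, and the worked numbers are illustrative assumptions rather than figures reported by this table’s sources.

```latex
% Spearman--Brown prophecy formula (standard psychometrics; an
% illustrative sketch, not a result reported in the table above).
%   r_1 : reliability of the original test
%   k   : factor by which the test is lengthened (more items/stations)
%   r_k : predicted reliability of the lengthened test
\[
  r_k = \frac{k \, r_1}{1 + (k - 1) \, r_1}
\]
% Worked example (assumed numbers): a 10-station OSCE with r_1 = 0.55,
% tripled to 30 stations (k = 3):
%   r_3 = (3 x 0.55) / (1 + 2 x 0.55) = 1.65 / 2.10 ~ 0.79
```

Under these assumptions, reliability rises with the number of independent items or stations sampled, which is why constructed-response and station-based formats need many short tasks to approach the reliability that a single broad MCQ paper achieves.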