Multiple-choice questions are a gold-standard tool in medical school for assessment of knowledge and are the mainstay of licensing examinations. However, multiple-choice items can be criticized for lacking the ability to test higher-order learning or integrative thinking across multiple disciplines. Our objective was to develop a novel assessment that would address understanding of pathophysiology and pharmacology, evaluate learning at the levels of application, evaluation, and synthesis, and allow students to demonstrate clinical reasoning. The rubric assesses student writeups of clinical case problems. The method is based on the physician's traditional postencounter Subjective, Objective, Assessment and Plan note. Students were required to correctly identify subjective and objective findings in authentic clinical case problems, to ascribe pathophysiological as well as pharmacological mechanisms to these findings, and to justify a list of differential diagnoses. A utility analysis was undertaken to evaluate the new assessment tool by appraising its reliability, validity, feasibility, cost effectiveness, acceptability, and educational impact using a mixed-methods approach. The Subjective, Objective, Assessment and Plan assessment tool scored highly in terms of validity and educational impact and had acceptable levels of statistical reliability but was limited in terms of acceptance, feasibility, and cost effectiveness due to high time demands on expert graders and workload concerns from students. We conclude by making suggestions for improving the tool and recommend deploying the instrument for low-stakes summative or formative assessment.
- Bloom's taxonomy
- multiple-choice questions
- curriculum integration
- physiology education
- medical education
Medical education in the United States is in a period of dynamic change that is likely to profoundly impact both the positioning of physiology in the curriculum and the methods by which it is taught and assessed. For example, one aspect of the Carnegie Foundation report Educating Physicians: a Call for Reform of Medical School and Residency is the recommendation that the learning of formal basic science knowledge should be integrated with clinical experiences (9). Most medical schools are currently either engaged in curriculum reform or are planning to be in the foreseeable future (2). Harvard Medical School's new curriculum Pathways, beginning in fall 2015, is illustrative of newer thinking in which more emphasis will be placed on professional identity formation, population and social sciences, ethics and professionalism, clinical competency, and critical inquiry; basic sciences will be reduced from the traditional 2 yr to 1 yr of instruction (15).
At the same time as the learning of medical physiology is being compressed and integrated among many other disciplines, methods of teaching are shifting toward more active and independent formats. We are seeing the widespread adoption of progressive pedagogies such as “flipped classrooms” (23) and team-based learning (TBL) (22, 24) as well as increased use of technology to scaffold independent learning by students outside of the classroom (18). While this period of disruptive change will bring about real advances in the way students are educated, it brings with it a number of challenges for faculty members and students. Khalil and Kibble (17) recently discussed the myriad challenges faculty members face when implementing a new medical curriculum. Among these is the fact that the development of new methods of assessment is lagging behind the broader thrust for education reform.
The single-best answer multiple-choice question (MCQ) is the current gold standard for knowledge testing in medical schools. MCQ exams have many advantages in terms of reliably testing broad swathes of knowledge in a manner that is time efficient and cost effective (1). However, it is too frequently the case that internally developed MCQ exams test at low cognitive levels (20, 33), are poorly constructed (14), and only test knowledge in one discipline area at a time (17). Therefore, we feel that assessing students with MCQs alone is misaligned with the goals of curriculum integration and higher-order active learning. This leaves a gap in which our assessments neither inform students about their achievement of higher-order learning tasks nor adequately inform faculty members about their success in deploying progressive pedagogies. To this end, the present study describes the development of a novel assessment instrument, based on the traditional doctor's Subjective, Objective, Assessment and Plan (SOAP) note (30). SOAP notes are used to provide a consistent and systematic framework for producing medical records, facilitating malady-specific diagnosis, and providing a consistent format for communication between physicians (31). Here, we describe a modification for second-year medical students in which the “P” simultaneously stands for Physiology/Pathophysiology/Pathology/Pharmacology to reflect that our second-year medical students are not formulating a treatment plan for their patient but are learning basic sciences through clinical case problems. We evaluated a rubric to assess how well students can solve clinical case problems: identifying subjective and objective findings, invoking physiological mechanisms to explain the patient's problems, deriving a differential diagnosis, and explaining relevant pharmacological mechanisms in the case.
Van der Vleuten's notion of assessment “utility” was used as the conceptual framework to evaluate our innovation (29). The components of a test's utility are reliability, validity, feasibility, cost effectiveness, acceptability, and educational impact. Our goal was to characterize the modified SOAP note as an authentic assessment of higher-order learning in the context of an integrated curriculum. A further goal of this study was to demonstrate the importance of applying an objective analysis to make decisions about the quality of assessments.
Participants and educational context.
The University of Central Florida College of Medicine offers an integrated 4-yr MD program. During the first year, foundational basic sciences are taught in Human Body modules that leverage traditional synergies between disciplines (e.g., physiology is taught together with anatomy, microbiology is taught with immunology, etc.). In the second year, we use a body systems-based approach (S modules), which focuses on the study of disease process and culminates in students completing United States Medical Licensing Examination Step 1. The last phase of the curriculum is translation of knowledge and skills into practice and is represented by clerkship and elective rotations in the third and fourth academic years. The cardiopulmonary module in the second year (Fig. 1) was the focus of this study. This 6-wk module includes ∼3 wk of instruction on the cardiovascular system, 2 wk of instruction on the pulmonary system, and 1 wk to prepare for the final examination. The course is highly integrated with the concurrent Practice of Medicine course such that students are learning the cardiothoracic examination at the same time as the basic sciences. Pedagogy includes a wide variety of session types such as lectures, web-based self-learning modules, case-based learning, TBL, high-fidelity simulations, ultrasound sessions, and use of standardized patients. Participants for this study were a single cohort of second-year medical students. The college has a holistic admissions policy resulting in a class that is diverse in prior education, race, ethnicity, age, sex, etc. The class consisted of 80 second-year medical students (40 female students and 40 male students) with an average age at matriculation of 24.8 yr; the age range of the participants was between 22 and 36 yr. The study was reviewed and exempted by the Institutional Review Board of the University of Central Florida; faculty members and students gave informed consent for interviews and focus groups.
The modified SOAP note assessment tool.
The SOAP assessment tool was used with case-based learning sessions. The case topics were as follows: 1) hypertension, 2) heart failure, and 3) chronic obstructive pulmonary disease. Each case had a lead faculty author (A. Asmar, B. Gros, and T. Howard, respectively) who presented the draft case and answer key to the whole course faculty team for editing and final approval. The format of the case-based learning discussion as well as the details of the SOAP assessment rubric (see below) were described verbally in class and documented in writing via the online learning management system. Students were given an opportunity to ask questions in class as well as via online discussion boards to ensure faculty expectations were clear. Students were first exposed to the cases in their small groups of six to eight students using an online delivery system (LabTutor, AD Instruments); the same small groups were used for every case. Cases followed a consistent presentation that included the patient's chief complaint, history of present illness, past medical history, allergies, medication, review of systems, physical exam, and laboratory and imaging studies. After a group discussion of the case, each student was then expected to complete their own SOAP note assessment at home and post their individual answers to an online dropbox.
The SOAP note template was a simple table provided in Microsoft Word (Table 1) that directed students to identify and organize subjective and objective findings in each case. Students completed the second column with concise physiological explanations of each finding and provided a narrative about the mechanism of action of each drug mentioned as well as whether adverse effects were part of the presentation. The form culminated in students providing a differential diagnosis requiring both positive and negative findings to support each potential diagnosis.
Before submission, students were provided with the rubric that faculty used for grading (Table 2). The rubric was arbitrarily scored out of 30 points. The distribution of points per section varied slightly by case, according to the relative emphasis on pathophysiology or pharmacology in the particular case. Each student's individual SOAP note was double marked by faculty members in a blinded fashion. SOAP notes were deployed to the students on a Friday and had to be submitted over the weekend. Students were directed to work alone and reminded that any plagiarism would be regarded as a professionalism violation. During the week after submission, the faculty members provided a 15- to 30-min debrief in the classroom where they discussed the case to provide feedback on the expected answers. Each note was worth 5% of the final grade (the remaining grade was made up of 3% from TBL exercises and 82% from a final written exam that consisted of 140 MCQs).
Assessment of test reliability and interrater reliability.
Reliability is the degree to which an assessment tool produces stable and consistent results (i.e., would a retest produce the same answer?). The most commonly used statistic for reliability is the Cronbach α (10). In this study, we used an extension of this approach known as generalizability theory (6). Generalizability theory allows for the calculation of an overall reliability coefficient measured on a scale of −1 to +1. Gold-standard reliability (e.g., that of national licensing exams) exceeds 0.9; acceptable reliability for classroom tests is often regarded as >0.7 (13). Generalizability theory allows for sources of variance to be explored further in what is called the followup D-study. This method was used to explore the effects of increasing the number of cases on test reliability. Generalizability theory may also be applied to assess interrater reliability when there is a crossed design such as a group of faculty raters who each provide a score for every student. The grading workload in this course was distributed among 8 faculty members such that 6 unique rater pairs were used and each pair was assigned a set of 40 student SOAP writeups. This approach prevented the use of generalizability theory to estimate interrater effects. As an alternative, an intraclass correlation coefficient was reported for each rater pair to give an assessment of how consistently different faculty members were able to apply the grading rubric (21).
Assessment of validity.
Overall construct validity is the extent to which a test measures what it is purported to measure. There is no single measurement for validity; rather, a judgment requires several lines of validity evidence. Predictive validity describes the extent to which assessment scores can be used as a guide to future performance, and concurrent validity describes whether a test produces similar results to other tests students are taking at the same time. To explore these aspects, SOAP scores were correlated with scores from the final summative exam, which used multiple-choice items. Content validity is perhaps the most fundamental aspect of construct validity and refers to how well the measure represents all the intended facets. To assess content validity of SOAP notes, qualitative interviews were conducted with faculty members involved in the conception and evaluation of the completed SOAP notes and focus groups were conducted with students (see below).
Assessment of feasibility, cost effectiveness, acceptability, and educational impact.
Feasibility questions whether SOAP notes could be produced by the faculty members, completed by the students, and effectively graded. Cost effectiveness considers whether the faculty time and effort allocated to the creation and grading of the SOAP notes are worth the benefits gained by the students; from the student perspective, it examines the ratio of educational benefits to costs in time and effort. Acceptability assesses whether students and faculty members regard the creation and completion of SOAP notes as an adequate and fair form of assessment in relation to the effort required. Educational impact investigates the role SOAP notes play in the progression of medical student learning: gaining knowledge and understanding of the material and developing clinical reasoning abilities.
These aspects are all subjective and require qualitative data to inform judgments about them. Given the small group of eight faculty authors/graders who created the SOAP notes, we decided to conduct one-on-one interviews. The student investigator (N. Cramer) contacted the faculty members by e-mail to set up an appointment. The faculty interview questions are shown in Table 3. The interview sessions were recorded and transcribed verbatim for data analysis (12).
It was expected that the SOAP note intervention would create some short-term discomfort for students, which was confirmed through informal feedback at the time. To better assess educational impact, we wanted to gauge student reflections later in their development after some experience of their clinical M3 year, ∼6 mo later. To accommodate the students' busy clinical schedules, we elected to run two focus groups (4) rather than conduct individual student interviews. While this introduced a methodological difference in gathering student and faculty perceptions, we faced the practical constraint of recruiting students who were distributed across many hospital sites and who returned to the medical school only every 6 wk for a single day at the end of each clinical rotation. The student investigator (N. Cramer) recruited students to focus groups by making class announcements and sending an e-mail explaining the goals of the project, remuneration ($20), and focus group process. The focus group questions are shown in Table 3. Discussions were recorded and transcribed verbatim for data analysis.
Quantitative data analysis.
Student SOAP notes and final examination scores are reported as percentages and expressed as means ± SD. ANOVA was used to compare percent SOAP scores and the final examination score; Student's t-tests with a Bonferroni correction were used as the post hoc test for pairwise comparison between groups. As indicated above, SOAP test reliability was determined using generalizability theory, which is based on ANOVA. Interrater reliability for each faculty rater pairing was determined by calculating an intraclass correlation coefficient. Predictive validity was determined by calculating a Pearson product-moment correlation coefficient for the relationship between SOAP scores and final exam scores. Significance was determined at the 5% level. Statistical analyses were conducted using SPSS 21.0 (IBM, Chicago, IL) except for generalizability theory calculations, which used the program G-String (6).
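To illustrate the group comparison described above, the one-way ANOVA F statistic can be computed directly from lists of group scores. This is a minimal sketch with made-up numbers, not the study's data; the actual analysis was run in SPSS.

```python
def oneway_f(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand = sum(sum(g) for g in groups) / n  # grand mean
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Ratio of the mean squares
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

When all group means are equal, F is 0; large values indicate that between-group variance dominates within-group variance, after which post hoc pairwise tests (here, Bonferroni-corrected t-tests) locate the differing groups.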
Coding of qualitative data.
Qualitative analysis of interviews allows indepth understanding of the views of participants. A coding analysis (32) was applied in which the transcripts were read and reread and the data were first disassembled by assigning codes. The primary coding of each transcript was performed by N. Cramer, who then assembled all the coded faculty interviews together to extract broader categories. J. D. Kibble also read the transcripts, checked samples of primary coding, and met with N. Cramer to reach a consensus that the categories shown in Table 4 represented the major themes that emerged from the texts.
Eighty second-year students were each assigned the same SOAP cases; the first case was on hypertension, the second on heart failure, and the third on chronic obstructive pulmonary disease. The SOAP scores were as follows: SOAP 1, 91.6 ± 8.2% (n = 80 students); SOAP 2, 86.4 ± 7.1% (n = 79 students); and SOAP 3, 86.1 ± 7.1% (n = 72 students). Note that in SOAP 2 and SOAP 3 a few students did not hand in a SOAP note; these students were given permission by the course director to hand in an alternative SOAP assignment later in the course. The final exam score was 86.7 ± 7.6% (n = 80 students). ANOVA revealed significant differences between group scores [F(3,307) = 8.4, P < 0.05]. Followup pairwise comparisons indicated that the SOAP 1 score was significantly different from all other scores but there were no other significant differences between SOAP 2, SOAP 3, and the final exam. Thus, as a benchmark, the overall level of difficulty of the SOAP note exercises closely reflected the relative standards set by the final examination.
From the standpoint of using SOAP scores as a summative measure of student performance, the overall reliability coefficient of the pilot sample of three cases was 0.39. To project the number of cases necessary to produce a reliable measure, a followup D-study was applied. The D-study projected that 10 cases would be necessary to produce a reliability coefficient of 0.68 and that 20 cases would provide a reliability coefficient of 0.81. For comparison, reliability of around 0.80–0.85 is typical for a high-quality internally developed summative MCQ exam in our medical school (16).
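For a simple person × case design, a D-study projection of this kind is equivalent to stepping the observed coefficient up with the Spearman-Brown prophecy formula. The sketch below is an illustration under that assumption (the actual calculations used the G-String program); it closely reproduces the reported projections.

```python
def project_reliability(g_obs, n_obs, n_new):
    """Project the reliability of a test with n_new cases, given a
    generalizability coefficient g_obs observed with n_obs cases."""
    # Back out the implied single-case reliability (inverse Spearman-Brown)
    r1 = g_obs / (n_obs - (n_obs - 1) * g_obs)
    # Project forward to the new number of cases
    return n_new * r1 / (1 + (n_new - 1) * r1)
```

Starting from the pilot coefficient of 0.39 with 3 cases, this projection yields ~0.68 for 10 cases and ~0.81 for 20 cases, matching the D-study figures above.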
Considering reliability from the perspective of whether different graders can consistently apply the grading rubric, an intraclass correlation coefficient was calculated for each of six unique rater pairs. There were only eight individual faculty members available for grading; the six unique pairs were assigned by the course coordinator as follows: faculty members 1 and 2, faculty members 3 and 4, faculty members 1 and 6, faculty members 2 and 8, faculty members 3 and 7, and faculty members 4 and 5. The intraclass correlation coefficients ranged from 0.29 to 0.74 and were significantly different from zero in every case. Using the 95% confidence interval for each intraclass correlation coefficient and applying Student's unpaired t-tests, no significant differences were observed between intraclass correlation coefficients when comparing any of the rater pairs.
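The report does not state which ICC form was applied; as one common choice for a pair of raters, the one-way random-effects, single-score ICC(1,1) can be computed as in this illustrative sketch (not the SPSS output used in the study).

```python
def icc_oneway(ratings):
    """One-way random-effects, single-score ICC(1,1).
    ratings is a table: one row per student, one column per rater."""
    n = len(ratings)       # subjects (students)
    k = len(ratings[0])    # raters per subject
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    # Partition variance between and within subjects
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Perfect agreement between a rater pair yields 1.0; a constant offset between raters (one rater systematically stricter) lowers the coefficient, which is one way observed values in the 0.29–0.74 range can arise even from well-correlated raters.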
The predictive validity of the SOAP tool was examined by assessing the correlation between each student's mean SOAP score and their grade on the final exam. Note that 9 of 240 student SOAP scores in this analysis were derived from different cases since a few students did not submit a SOAP note for SOAP 2 and SOAP 3 but were allowed an option to submit alternate SOAP assignments later in the course. These exercises behaved similarly to SOAPs 1–3, and these nine data points were included in the analysis of predictive validity. There was a modest but significant correlation between mean SOAP score and final exam score (r = 0.41, n = 80, P < 0.05).
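The predictive validity statistic above is the standard Pearson product-moment correlation; a minimal sketch of the computation (the study's analysis was run in SPSS):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance term over the product of the standard-deviation terms
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Values run from −1 (perfect inverse relationship) through 0 (no linear relationship) to +1 (perfect direct relationship); the observed r = 0.41 is thus a moderate positive association between mean SOAP score and final exam score.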
Six faculty members provided indepth individual interviews based on the questions shown in Table 3. A summary of themes emerging from a detailed reading of the interview transcripts is shown in Table 4. All six of the faculty members interviewed felt that the SOAP assessment cases involved high-yield topics for student learning and that the elements in the grading rubric were valid. Four of six faculty members acknowledged the importance of faculty coordination in establishing the cases and expected answers. Faculty members were enthusiastic and unanimous that the SOAP assessment rehearsed authentic knowledge and skills needed to solve clinical cases, and five of six faculty members noted that they felt the student narratives provided good insights into the level of student understanding. Half or more of the faculty members pointed to the rehearsal of clinical reasoning skills and need for students to understand pathophysiology. While these areas related to educational impact and achievement of the faculty's original goal in creating SOAP note assessment cases, faculty members were equally critical of aspects related to feasibility and acceptability. In particular, all faculty members commented on the excessive amount of time required for initial creation of cases and for the subsequent grading of the SOAP notes.
Student focus groups.
A total of 17 students contributed to 2 focus groups. The script shown in Table 3 was used to structure the conversation with participants who were encouraged to amplify and explain their responses. A summary of the findings is shown in Table 4 alongside findings from the faculty interviews. The majority of students agreed that the SOAP cases had focused on high-yield topics, confirming the content validity of the exercises. Most students agreed in hindsight that SOAP notes were a valuable exercise to develop differential diagnosis and clinical reasoning. However, students were critical of the rubric on the basis that faculty members did not provide enough information about the expected depth of answers and, as a result, had invested “entire weekends” constructing answers. Given the high-stakes nature of the SOAP exercises (5% course grade per note), many students could not budget time for other studies and judged the cost:benefit ratio to be unacceptable.
The interviews and focus groups both provided comments about how to improve the prototype SOAP note exercise to preserve positive learning effects while lowering the amount of time commitment needed. These are discussed below.
Before discussing study findings it is appropriate to acknowledge the limitations of our study design. This study was a pilot study undertaken in a single institution. Sample sizes were limited by the course context and could include only 8 faculty members deploying 3 SOAP cases to a cohort of 80 students. Another limitation of our study design was the use of different qualitative data collection methods for faculty members (interviews) and students (focus groups). There is a risk that a single dominant student influences a whole group, and attribution of group consensus is dependent on the interpretation of the researchers. To mitigate these limitations, all recordings were checked by two researchers, who agreed on the findings. We have aimed to present a balanced account of positive and negative views from participants.
The idea of modified SOAP notes was originally described by Kibble et al. (19) as an organizing tool to help students prepare for a small-group case discussion. The modified SOAP note was found to enhance the quality of in-class discussion and was also used as a basis to discuss aspects of professionalism such as being well prepared as a team member. In this previous study, a rudimentary rubric was used to score student SOAP notes on a 6-point scale. SOAP scores were found to correlate with the final MCQ exam, suggesting the potential for SOAP notes to be developed as a novel method of assessment. The goal of the faculty members in the present study was to explore how to better assess student performance in case-based learning activities using the modified SOAP note. Case-based sessions are included in the curriculum to provide a more authentic form of contextual learning for medical students. The purpose is to challenge students at a higher cognitive level and to require application, evaluation, and synthesis of information to arrive at a solution to a real clinical problem. In the long run, our hope is to promote better diagnostic decision-making skills and thereby improve patient care. It is worth noting that use of authentic cases allows for transdisciplinary integration of topics and could also allow the incorporation of other competencies such as patient safety, ethics, cultural competency, and others. Recognizing that student learning is often driven by assessment (8), the faculty members wanted an assessment tool capable not only of assessing these higher-order attributes but also of making explicit our high expectations of learning during case-based sessions.
The consideration of validity and educational impact in the utility analysis indicated that the faculty members were largely successful in achieving their educational goals. It was clear from the data that both faculty members and students felt the substance of the SOAP cases was “high yield.” The activity was also associated with helping students to develop clinical reasoning skills, application of pathophysiology, and formulation of differential diagnoses. It is generally accepted that good alignment between course goals and the level of assessment leads to better student outcomes (11, 25). The SOAP tool is a good example where the rubric makes explicit the high cognitive levels expected (students must explain, differentiate, give rationale, etc) and they are accountable for performing at this level through the linkage to summative assessment.
The utility analysis provided insights into other strengths and weaknesses of the SOAP tool. Considering the question of whether SOAP notes could be a major tool to determine summative grades, the first requirement is high reliability. Without good test-retest reliability, grade measurements are not defensible because the signal-to-noise ratio in the measurement is too low to be confident about a student's true score. With a pilot sample of only three assessments it was to be expected that reliability would be low (0.39), since the number of test items is a major determinant of reliability (13). By undertaking a generalizability study, we were able to project that 10–20 SOAP cases would provide reasonable reliability for a high stakes assessment. It is conceivable that 10–20 SOAP cases could be presented to students over the course of a whole academic year, perhaps gathered in a learning portfolio, with displacement of other assessments to provide students more time. Our data indicate that decisions based on such a score would be reliable. However, for reasons discussed below, this seems an unlikely scenario given concerns for the time commitment needed from faculty members and students.
The newly developed rubric (Table 2) displayed excellent interrater reliability with all rater pairs exhibiting significant correlation. The faculty raters included clinicians and basic scientists with varying levels of teaching experience, so it would seem that the rubric is quite robust as a scoring tool. As shown in Table 2, the scores for meet/exceed expectation were clustered at the top of the score range. This was a constraint of our ABCF grading system in which 70–79% is a grade C, 80–89% is a grade B, and 90–100% is a grade A, although it could also be argued this is appropriate for a highly selected student population who almost always perform in the good to excellent range. The scores produced by the rubric were found to closely match standards set by the final examination. One disadvantage of clustered scores is that the rubric does not discriminate well between shades of excellence; clustered scores also predispose to lower correlations with other metrics. Despite this caveat, individual student SOAP scores were found to correlate moderately but significantly (r = 0.41, n = 80) with the final examination. This level of predictive validity is comparable with other validity studies, such as how well the Medical College Admissions Test predicts medical school performance (7).
Students were more critical of the rubric. They recognized the authenticity and high-yield nature of the task but, despite in-class orientation, were frustrated by the lack of guidance on the depth of answers that would translate into maximum points. Faculty instruction to limit the whole exercise to two pages was almost universally ignored as students produced some remarkably detailed accounts. The scoring scheme appeared to have the effect of driving students to do more work than faculty members intended, which did not add value in most cases but led to excessive time being spent on the exercise. Certainly the students indicated that better orientation to the expectations was needed. The rubric could perhaps be altered to make it slightly easier to achieve a maximum score, or the grade per SOAP note could be reduced, to help alleviate the tendency of students to overwork.
Students were also frustrated about the quality of feedback. Our approach was to provide limited comments on individual student submissions but to spend time going over the case in a lecture to highlight all the important points. Faculty members were reluctant to circulate model written answers for a summative assessment that might be reused later, and time prevented them in most cases from giving detailed feedback narratives on individual student submissions. While it is universally accepted that feedback is an essential part of the learning process (3, 5), it is less clear what type of feedback will lead to increased performance (26). What seems clear in our case is that students did not value the feedback provided and felt a significant degree of discomfort. As faculty members, we need to tread a fine line between presenting the desirable difficulty of high levels of challenge and accountability as a source of external motivation and the need to provide adequate support; there is a need to be responsive to student concerns without being overly reactive. The primary problem here seems to stem from the high-stakes nature of the assessment and student fixation on not losing any valuable course points. Interestingly, the outcomes showed that students performed just as well on the SOAP assessment as on other assessments in the course, and it may be that a broader expectation of “soft points” from continuous assessment in other courses contributed to the anxiety provoked by SOAP notes.
The greatest concern shared by faculty members and students was the amount of time both parties invested in the SOAP notes, leading to a poor rating for acceptance in the utility analysis. From the student side, we have already discussed that they were significantly overworking in relation to faculty expectations, so this problem may be fixed with better orientation and feedback as well as some adjustments to the rubric. For the faculty members, our context is that of a new medical school with a small faculty, many of whom are busy clinicians coming to teach a focused part of the curriculum. In other learning environments, it is probably more normal for faculty members to spend time grading papers or to have teaching assistants available to help with grading. It should be noted that any attempt to assess students at these higher levels is likely to be more time intensive, demand deeper reflection by faculty graders, require time spent in direct observation of students, and entail more detailed feedback. Another option offering significant potential to help with SOAP notes in the near future is automated computer grading (27). Stakeholders made several suggestions for changes to increase the acceptability of SOAP notes. For example, students and faculty members agreed that the exercise could be done within a scheduled class session and that SOAP would lend itself to a collaborative group assessment. This is an attractive idea in that it would leverage social aspects of learning and may have the effect of allowing stronger students to lift the performance of weaker students in a group (22). In this setting, verbal faculty feedback about the case could be deployed at the end of the session and may be better appreciated for its immediacy.
Conclusions and recommendations.
Application of a formal utility analysis was found to be a powerful means to properly evaluate our novel assessment instrument. The SOAP note tool had a mixture of strengths and weaknesses that give it a moderate overall utility in its first iteration. The SOAP tool scores well with regard to the educational impact of making explicit to students the high levels of learning and integration expected and, at the same time, provided faculty members with insights that learners were attaining these goals. The tool showed high levels of construct validity and reasonable predictive and concurrent validity. Test reliability was also reasonable, and generalizability analysis showed the SOAP tool could be used more expansively in a curriculum to give defensible measurement of student performance for high-stakes decisions. However, the feasibility, cost effectiveness, and acceptance ratings were low, primarily due to the time investment needed on all sides. At present, we recommend that the SOAP tool be incorporated as a low-stakes assessment instrument as a means to promote higher-order learning but that adopters focus on making expectations clear and on providing the best feedback local resources will allow.
No conflicts of interest, financial or otherwise, are declared by the author(s).
N.C., A.A., L.G., B.G., D.M.H., M.H., and J.D.K. conception and design of research; N.C. performed experiments; N.C. and J.D.K. analyzed data; N.C., A.A., L.G., B.G., D.M.H., T.H., M.H., S.A.S., and J.D.K. interpreted results of experiments; N.C. and J.D.K. prepared figures; N.C. and J.D.K. drafted manuscript; N.C., D.M.H., and J.D.K. edited and revised manuscript; N.C., A.A., L.G., B.G., D.M.H., T.H., M.H., S.A.S., and J.D.K. approved final version of manuscript.
The authors thank Philip Bellew, academic coordinator, and Mary Beth Soborowicz, assistant director of assessment, for collating and deidentifying data.
- Copyright © 2016 The American Physiological Society