Advan. Physiol. Edu. 29: 83-93, 2005; doi:10.1152/advan.00039.2004
1043-4046/05 $8.00
© 2005 American Physiological Society

HOW WE TEACH

Valid and reliable authentic assessment of culminating student performance in the biomedical sciences

Deborah M. Oh1, Joshua M. Kim2, Raymond E. Garcia3 and Beverly L. Krilowicz4

1 Educational Foundations and Interdivisional Studies, California State University, Los Angeles, California
2 Department of Mathematics, California State University, Los Angeles, California
3 Department of Chemistry and Biochemistry, California State University, Los Angeles, California
4 Department of Biological Sciences, California State University, Los Angeles, California

Address for reprint requests and other correspondence: B. L. Krilowicz; Dept. of Biological Sciences, California State Univ., 5151 State Univ. Dr., Los Angeles, CA 90032 (E-mail: bkrilow{at}calstatela.edu)


    Abstract
 
There is increasing pressure, both from institutions central to the national scientific mission and from regional and national accrediting agencies, on natural sciences faculty to move beyond course examinations as measures of student performance and to instead develop and use reliable and valid authentic assessment measures for both individual courses and for degree-granting programs. We report here on a capstone course developed by two natural sciences departments, Biological Sciences and Chemistry/Biochemistry, which engages students in an important culminating experience, requiring synthesis of skills and knowledge developed throughout the program while providing the departments with important assessment information for use in program improvement. The student work products produced in the course, a written grant proposal, and an oral summary of the proposal, provide a rich source of data regarding student performance on an authentic assessment task. The validity and reliability of the instruments and the resulting student performance data were demonstrated by collaborative review by content experts and a variety of statistical measures of interrater reliability, including percentage agreement, intraclass correlations, and generalizability coefficients. The high interrater reliability reported when the assessment instruments were used for the first time by a group of external evaluators suggests that the assessment process and instruments reported here will be easily adopted by other natural science faculty.

Key words: interrater reliability; validity; higher education sciences; capstone course


    Introduction
 
There is increasing pressure, both from institutions central to the national scientific mission and from regional and national accrediting agencies, on natural sciences faculty to move beyond course examinations as measures of student performance and to instead develop and use reliable and valid authentic assessment measures for both individual courses and for degree-granting programs (1, 2, 5, 15, 19, 23, 26). Although some progress has been made by biomedical sciences faculty in developing authentic assessment measures for individual undergraduate courses (6, 21, 22, 24, 27, 28), little progress has been made in developing such measures for undergraduate degree-granting programs (21). Most practicing biomedical scientists support their research initiatives with external grants from federal organizations such as the National Institutes of Health and the National Science Foundation. Consequently, senior undergraduate students should benefit significantly from exposure to this important authentic aspect of professional scientific practice before graduation and entry into the workforce, particularly when departments have development of scientific research skills as a major outcome for their students. Specific objectives that might be evaluated using such an approach include, among others, 1) applying the scientific process, including designing experiments and testing of hypotheses; 2) using mathematics and statistics to evaluate scientific evidence; and 3) reading, understanding, and critically reviewing scientific papers and presentations.

Although no assessment of students’ work is completely reliable and free of error, two key issues in performance assessment are consistency across measures and evidence that instruments evaluate the student skills and knowledge for which they were designed (18). A high interrater reliability score, indicating the extent to which error arising from variability among raters has been eliminated from the assessment process, demonstrates consistency across measures, or reliability in assessment (18, 25). Content validity, the extent to which an instrument uses criteria central to the outcome being assessed, can be demonstrated by collaborative review, critique, and revision of instruments by content experts (3, 7, 14).

We report here on a capstone course developed by two natural sciences departments, Biological Sciences and Chemistry/Biochemistry, which engages students in an important culminating experience, requiring synthesis of skills and knowledge developed throughout the program, while providing the departments with important assessment information for use in program improvement. This course qualifies as a capstone experience because it requires advanced students to demonstrate comprehensive learning in their major (5). The student work products produced in the course, a written grant proposal and an oral summary of the proposal, provide a rich source of data regarding student performance on an authentic assessment task. The validity and reliability of the instruments and the resulting student performance data were demonstrated by collaborative review by content experts and a variety of statistical measures of interrater reliability.


    EXPERIMENTAL PROCEDURES
 
Student Project Protocol

Assignment of students to teams. The course instructors assigned students to teams of 4–5 students. Factors such as senior vs. graduate standing, academic major, and grades in prerequisite courses were used to assign students to teams. Team assignments were finalized at the beginning of the second week of the quarter, once final class enrollment was established.

General project description. Students received a detailed description of the project requirements on the first day of the class. This material was supplied as a part of the course syllabus. Briefly, teams developed a National Institutes of Health-style grant proposal on a topic of current interest related to the disease state chosen that year for study in a course entitled "The Biochemical and Physiological Basis of Disease Process." Students engaged in a variety of activities and completed a series of assignments throughout the quarter that 1) aided in team development, 2) ensured timely completion of final course projects, and 3) helped students, by providing formative assessment, to produce and then refine their final projects (Tables 1 and 2). The formative assessment data produced by this model were used both to help award final student course grades and for summative assessment of the undergraduate programs. Final student work products presented at the end of the quarter were a 9–12 page written grant proposal and a 35-min oral presentation of the proposed research. Assessments of the final student work products were used both to help award final student grades in the course and to obtain data for summative assessment of the undergraduate programs.


Table 1. Preparation of the written grant proposal in the capstone course

 

Table 2. Preparation of the oral presentations in the capstone course

 
Preparation of the written grant proposal. Preparation of the written grant proposal (Table 1) began with a literature research phase. The literature research phase began during the first week of the quarter with an orientation to library research and completion of a companion library research exercise, due during week 3 and designed to ensure that all students understood how to use the online scientific bibliographic databases. During weeks 3–4, following an extensive literature review, teams identified the specific subtopic that would be the focus of their research proposal. Weeks 4–7 were used to produce an annotated bibliography for the research proposal based on 10 primary articles identified during the library research exercise. The process for production of the annotated bibliography consisted of 1) a minilecture delivered by the course instructors, including presentation of the specific grading criteria for the annotated bibliography, which provided students with a background to the specific elements required of an entry in an annotated bibliography; 2) student preparation of a single entry for the annotated bibliography; 3) formative assessment of the single annotation; 4) submission by teams of the 10 primary articles, based on the grading criteria provided by the course instructors, to be used in production of the course project; 5) instructor feedback in writing and in individual team meetings regarding the appropriateness of the submitted articles; 6) production by teams of the complete annotated bibliography; and 7) instructor feedback in writing and in individual team meetings regarding the annotated bibliography.

Production of the written grant proposal continued during weeks 6–11 with a process of drafting and revising the sections of a grant proposal. This process began with a minilecture delivered by the course instructors, including use of an actual proposal to demonstrate the specific grading criteria for production of the specific aims and the background and significance sections of the grant proposal. The minilecture was followed by an exercise in which students identified, using an actual instructor-generated proposal, how an author responded to the writing prompts required for production of these sections of a grant proposal. Teams then drafted their specific aims and background and significance sections and received feedback both in writing and during individual team meetings about their drafts. The experimental design section of the proposal was then produced by the same series of steps. A second, revised draft of the complete proposal, including specific aims, background and significance section, experimental design, and literature cited, was due during week 10. Individual team meetings to provide feedback both in writing and orally were conducted at the end of week 10. The final written grant proposals, edited based on formative feedback, were submitted during week 11, final exam week.

There were few changes in the process for production by students of the written grant proposals between the first and second years of the course. The instructor assessment of student work products, the external evaluators’ report, and the interrater reliability data (see Establishment of interrater reliability) suggested that no changes in the process for student production of this work product were required. A slight modification in the time line, indicated in bold, was necessitated by modifications in the production of the oral presentations discussed below.

Preparation of the oral presentation. Preparation of the oral presentation (Table 2) began during the first year of the course with a survey of recent literature reviews of the disease under discussion that year. This phase began during the second week of the quarter with an exercise designed to introduce the teams to the efficient reading of review articles, followed later in the week by the presentation, according to predetermined assessment criteria, of a section of an instructor-chosen review article. The oral presentation preparation continued during week 3 with the presentation, according to preestablished assessment criteria, of a complete review article. The final phase of preparation of the oral presentation occurred during week 8 when a minilecture was delivered by the course instructors, including presentation of the specific grading criteria for the final oral presentation, which provided students with a background to the specific elements required in the final oral presentation. The final oral presentation occurred in week 11, during the final examination period for the course.

The process for production of the final oral presentations was modified in the second year of the course due to poor interrater reliability and suggestions made in the external evaluators’ report from the previous year (see Establishment of interrater reliability). In 2000, we added, indicated in bold in Tables 1 and 2, several meetings with the teams to provide oral formative feedback on the review article presentations and, during week 9, a practice "final" oral presentation, followed by individual team meetings. These latter team meetings were used to provide formative feedback, both in writing and orally, regarding student performance on the practice final oral student work products.

Protocol for faculty assessment of student performance. Student achievement on each assignment was evaluated based on direct assessment of student performance according to detailed quantitative grading criteria. Although the course instructors developed quantitative grading criteria for every course assignment, only those used for assessment of the final written grant proposal and oral presentation are included in this paper (see Appendixes A and B). As indicated above, all grading criteria were discussed with and distributed to the students before production by the teams of the required work product.

Assessment of student performance by course instructors was a two-step process. First, both instructors individually assessed all student work products according to the predetermined assessment criteria. These data were used for the interrater reliability measures discussed below. Second, the course instructors then engaged in a dialogue about our ratings and the rationale for those ratings. This dialogue served as an important "norming process" that allowed us to establish internal validity (7) for our assessment process and resulted in the final scores that teams received on their work products (Tables 3 and 4). These final negotiated scores were used to award final course grades for each student.


Table 3. Average performance scores ± SD (based on percentage of points) over two years on the written grant proposal by eight student teams

 

Table 4. Average performance scores ± SD (based on percentage of possible points) on oral presentations by student teams

 
Development and Establishment of Validity of Assessment Instruments

Development and establishment of content validity of assessment instruments. The relative prevalence of selected assessment criteria on other instruments designed to measure our desired target outcomes was not available to us. Consequently, we used a previous study that identified assessment criteria central to our target outcomes as a starting point for the development of our quantitative grading criteria (7). The modified grading criteria were then sent for collaborative review to a panel of three science education/assessment experts. The final revised versions of our quantitative grading criteria were then distributed to the student teams for use in production of their course work products and used by the course instructors for assessment purposes.

Establishment of interrater reliability. The repeatability or consistency of our assessment process was established in two ways. First, course instructor interrater reliability was established based on the total percent points awarded in common between the two instructors for each of the five sections of the written grant proposals (specific aims, background and significance section, research design and methods, literature cited, and general criteria) or the four sections of the oral presentations (introduction, background and significance section, research design, and general criteria).

Second, to determine whether the course instructors’ evaluation of student performance was consistent with that of external experts, we convened during 1999 a panel of three experts who provided content validity expertise, as described above. They also used our assessment instruments and evaluated independently of the course instructors the final student work products. The panel consisted of an on-campus chemistry/pedagogy expert, an off-campus biology/pedagogy expert, and an on-campus assessment expert.

Interrater reliability among the external evaluators and the course instructors was calculated in the following manner. First, the mean of the five evaluations for each section of the written grant proposals (specific aims, background and significance section, research design and methods, literature cited, and general criteria) and for each section of the final oral presentations (introduction, background and significance section, research design, and general criteria) was calculated. Interrater reliability was then calculated as the mean percent points awarded in common among the five evaluators.
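
The text does not give a formula for "percent points awarded in common." One plausible reading, sketched here purely as an assumption, takes the points two raters have in common on a section to be the smaller of their two scores, expresses that as a percentage of the larger score, and averages the result over every pair of raters. The function names and the sample scores are hypothetical and are not taken from the study.

```python
from itertools import combinations


def pair_agreement(a, b):
    """Percent of points two raters awarded in common (assumed: min / max)."""
    if a == b:
        return 100.0  # also covers the case where both raters award 0 points
    return 100.0 * min(a, b) / max(a, b)


def mean_agreement(scores):
    """Average pairwise agreement over all raters for one rubric section."""
    pairs = list(combinations(scores, 2))
    return sum(pair_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical section scores (out of 31 points) from the two instructors
two_raters = [28, 31]
# Hypothetical section scores from two instructors plus three external raters
five_raters = [28, 31, 31, 28, 31]

print(round(mean_agreement(two_raters), 1))
print(round(mean_agreement(five_raters), 1))
```

With two raters the average reduces to a single pairwise figure; with five raters it is the mean over all ten pairs.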

Statistical Analysis

Internal consistency was measured for different sections of the written and oral requirements (9). The two faculty raters were then compared in their average percentage agreement for first, second, and final drafts of the written proposal and for the final oral presentation. Percentage agreement is different from reliability estimates conceptually (10). It measures the similarity in ratings between raters. In the case of the present study, the percents do not produce overestimates due to the high number of categories (17). An intraclass correlation, confidence interval, and standard error of measurement (SEM) were used to assess the interrater reliability of the instructors’ ratings (12). These measures were calculated using one-way ANOVA that also determined the significance of the difference between raters in each section (16). To account for the error associated with assessments performed on different occasions, a generalizability coefficient was used to determine the reliability between course instructors through time (8, 10, 13). ANOVA was used to determine statistical difference in oral and written assignment total mean scores from the five raters and differences between the instructors’ ratings and external evaluators’ ratings (16). In all cases, results were considered statistically significant at P < 0.05.


    RESULTS
 
Internal Consistency

The internal consistency of sections of the instrument used to assess the written grant proposal of the eight research teams (4 in 1999 and 4 in 2000) was measured using the Cronbach’s coefficient alpha (11). Coefficients ranged from 0.58 to 0.72 (specific aims, 0.65; background and significance section, 0.72; research design and methods, 0.58). A coefficient was not calculated for the literature-cited section because these items were format requirements met by all research teams by the second assignment and were not required in the first assignment. Similarly, a coefficient was not calculated for general items because these were mechanical, not content requirements, met by all research teams within the required time line. Coefficients were calculated in the same way for final oral presentations. Coefficients ranged from 0.50 to 0.65 (introduction, 0.50; background to research, 0.62; research design, 0.65). A coefficient was not calculated for the conclusions because all requirements were completed by all teams without variation. Similarly, a coefficient was not calculated for general items because these were mechanical, not content, requirements of the assignment. It should be noted that drafts of the written proposals were used for the internal consistency analysis because the course instructors had not yet had their "norming sessions," and so the calculated values should differ the most for those sets of ratings, yielding a very conservative estimate for this measure. In contrast, the final oral presentation was used for the internal consistency analysis because in 1999 there was no practice final oral presentation. Although some ratings were below 0.70, the total scores were calculated for each section and used for further analysis. This methodological decision was based on prior establishment of content validity for the instruments. 
For example, as previously described, the course instructors and the external evaluators examined all items and determined whether or not they expressed legitimate aspects of performance. In addition, the instruments were subsequently pilot-tested. Consequently, the instructors were confident that the performance criteria identified in each section of both instruments were valid. Therefore, sections of the work products rather than individual criteria were used to measure interrater reliability between the two instructor raters.
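
For readers who want to reproduce an internal consistency estimate of this kind, the following is a minimal sketch of Cronbach's coefficient alpha in its standard form, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The layout of the data (one list of scores per rubric item, one score per team) and the numbers themselves are hypothetical; the study's raw item scores are not reported here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one rubric section.

    items: one inner list per rubric item, each holding that item's
    score for every team, with teams in the same order in every list.
    """
    k = len(items)
    # Total section score for each team, summed across the items
    totals = [sum(team_scores) for team_scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for three rubric items across four teams
section_items = [
    [6, 8, 9, 8],
    [6, 8, 8, 9],
    [8, 9, 9, 9],
]
alpha = cronbach_alpha(section_items)
print(round(alpha, 3))
```

Sample variances (n - 1 denominator) are used for both the items and the totals, so the ratio is unaffected by that choice.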

Course Instructor Interrater Reliability for Written Grant Proposals

Before the final written grant proposal, students were assessed on two preliminary drafts. Instructors assessed these drafts, and feedback was given to students in tutorials to assist in preparing them for the final version of the written grant proposal. The agreement between instructor raters was examined for consistency in assessment outcomes for two consecutive years. Table 5 gives percentage agreement in 1999, 2000, and combined (averaged) between instructor ratings for the two drafts and the final version of the written proposals. Percentage agreements were high, ranging from 88.0 to 100.


Table 5. Percentage agreement through time between course instructors on first, second, and final versions of the written grant proposals in 1999, 2000, and combined

 
Means among raters were compared in each section of the final version of the written proposal for the eight teams, and internal consistency between raters was measured by intraclass correlations (Table 6). There were no statistically significant mean differences among raters at the P < 0.05 level. Intraclass correlation measures internal consistency between items by taking raters as items. The correlation is the ratio of true score variance to observed score variance (12). Therefore, the correlations can be interpreted as a measure of interrater reliability and are accurate when equal variances can be assumed, as in the present study. The table shows high intraclass correlations and low SEMs, which indicate high interrater reliability between the raters.
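
As an illustration of how such values can be obtained from a one-way ANOVA, the sketch below computes a single-rater, one-way random-effects intraclass correlation, (MS_between - MS_within) / (MS_between + (k - 1) × MS_within), and a standard error of measurement as SD × sqrt(1 - reliability). Which intraclass form and which SD the authors used is not stated, so both choices are assumptions, and the ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def icc_one_way(ratings):
    """Single-rater, one-way random-effects intraclass correlation.

    ratings: one row per team, one column per rater.
    """
    n = len(ratings)        # number of teams
    k = len(ratings[0])     # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Mean squares from a one-way ANOVA with teams as the grouping factor
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def standard_error_of_measurement(ratings, icc):
    """SEM = SD of all observed scores * sqrt(1 - reliability)."""
    sd = stdev(x for row in ratings for x in row)
    return sd * sqrt(1 - icc)


# Hypothetical percentage scores: four teams rated by two instructors
ratings = [[80, 82], [90, 88], [70, 72], [85, 85]]
icc = icc_one_way(ratings)
sem = standard_error_of_measurement(ratings, icc)
print(round(icc, 3), round(sem, 3))
```

When the raters differ little within each team relative to the spread between teams, the within-team mean square is small, the correlation approaches 1, and the SEM approaches 0.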


Table 6. Mean ratings, intraclass correlation, 95% confidence interval (CI) and standard error of measurement (SEM) by the two course instructors for eight teams on the final written grant proposals

 
Using ANOVA, generalizability coefficients were obtained for ratings of each section of the written grant proposal to measure interrater reliability for assessment of student performance across time. With specific scoring guidelines, sufficient training for the raters, and rating under similar conditions, as in the present study, the generalizability coefficient serves as an accurate index of interrater agreement (12). The generalizability coefficient also accounts for measurement errors due to the 24 different occasions of rating across the two-year period. Table 7 shows high percentage agreements between raters and high generalizability coefficients, ranging from 0.973 to 0.997. A percentage agreement of 100% between raters would indicate complete interrater agreement, and 0% would indicate the lowest possible agreement.
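
The study's generalizability analysis includes rating occasions as a source of error, and its exact design is not reproduced here. As a simplified illustration only, the sketch below estimates a relative generalizability coefficient for a one-facet crossed design (teams × raters) from two-way ANOVA mean squares: the team variance component is (MS_teams - MS_residual) / k, and the coefficient is that component divided by itself plus the residual variance over k raters. The data are hypothetical.

```python
from statistics import mean


def g_coefficient(ratings):
    """Relative G coefficient for a teams x raters design (one facet).

    ratings: one row per team, one column per rater.
    """
    n = len(ratings)        # number of teams
    k = len(ratings[0])     # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    # Sums of squares for a two-way ANOVA without replication
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_teams = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_resid = ss_total - ss_teams - ss_raters

    ms_teams = ss_teams / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))

    # Estimated variance components
    var_teams = (ms_teams - ms_resid) / k
    var_resid = ms_resid
    return var_teams / (var_teams + var_resid / k)


# Hypothetical percentage scores: four teams rated by two instructors
ratings = [[80, 82], [90, 88], [70, 72], [85, 85]]
print(round(g_coefficient(ratings), 3))
```

Because a constant difference in severity between raters is absorbed by the rater term rather than the residual, this relative coefficient can be higher than a one-way intraclass correlation computed from the same scores. A fuller analysis along the lines described above would add occasion as a second facet.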


Table 7. Average interrater reliability for items in the written grant proposal

 
Course Instructor Interrater Reliability for Oral Presentations

Oral presentations were based on the students’ written grant proposals. The presentations were graded based on introduction, background and significance section, research design, conclusions, and general sections. The introduction, background and significance section, research design, and general sections of the oral presentation were tested for internal consistency using the final presentation scores of all eight research teams, combined for both years. Only final oral presentation data were used for this analysis because no comparable practice oral presentation was required in 1999. Table 8 shows no significant mean difference between raters on any of the oral presentation sections. Table 8 also shows high intraclass correlations and low SEMs, which indicate high interrater reliability between the raters.


Table 8. Mean ratings by the two course instructors, intraclass correlation, 95% CI, and SEM for all eight teams on the final oral presentations

 
Generalizability coefficients indicated better interrater reliability than did the intraclass correlations. This analysis used data from the final oral presentations from both years and the practice final oral presentation added to the course in 2000. The generalizability coefficients and percentage agreements on final presentations and combined found in Table 9 suggest consistency between raters. However, the 2000 practice oral presentation shows lower percentage agreements, ranging from 67.1 to 98.0% with the total average agreement of 91.7%. Percentage agreement improved for the final oral presentation with a total average agreement of 98.3%.


Table 9. Average interrater reliability between the two course instructors on oral presentations

 
In the year 2000, the students were given a chance to practice their final oral presentation. Consequently, the oral presentation section and total scores were compared between 1999 and 2000. Table 10 demonstrates that although the final scores generally improved in 2000, there was no significant effect of the added practice oral presentation on student performance. However, the standard deviations are much lower for most sections, indicating greater consistency in the scores. In total scores, there was a 7.06-point increase, almost two-thirds of a standard deviation, but still not statistically significant.


Table 10. Comparisons of oral presentation section means ± SD between 1999 and 2000

 
Course Instructor and External Expert Interrater Reliability

Tables 11 and 12 show average percentage agreement among five raters (two instructor raters and three external raters) in 1999 for the final written proposals and oral presentations. Ratings for all four teams for all sections of the written proposals were high in agreement (range: 89.3–100%). The intraclass correlation was 0.998 and the generalizability coefficient was 0.997, suggesting high interrater reliability. There appear to be more, but not unacceptable, discrepancies in the evaluation of the oral presentations. The percentage agreement was lower than for the written proposals, ranging from 86.5 to 97.4%, as were the intraclass (0.996) and generalizability (0.965) coefficients.


Table 11. Average percentage agreement on the final written proposals among five raters (2 course instructors and 3 external raters) for four research teams in 1999

 

Table 12. Average percentage agreement among five raters (2 course instructors and 3 external raters) on final oral presentations for four research teams in 1999

 
Table 13 shows the total scores given to all four teams by each of five raters for both oral presentations and written proposals. The results of the ANOVAs performed on data for oral presentations and written proposals show no significant mean difference between the scores awarded by the two course instructors (written proposal, P = 0.789; oral presentation, P = 0.323), among the three external raters (written proposal, P = 0.155; oral presentation, P = 0.528), or among all five raters (written proposal, P = 0.171; oral presentation, P = 0.090).
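
A comparison of this kind can be sketched as a one-way ANOVA with raters as the groups and team total scores as the observations. The function below returns only the F statistic and its degrees of freedom; converting F to a P value requires the F distribution, which is available in statistical packages (for example, `scipy.stats.f_oneway` returns both values). The scores are hypothetical, and treating raters as independent groups is an assumption about how the analysis was set up.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups: one list of team total scores per rater.
    Returns (F, df_between, df_within).
    """
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical total scores for four teams from each of five raters
scores_by_rater = [
    [212, 230, 198, 221],   # instructor 1
    [215, 228, 201, 219],   # instructor 2
    [208, 233, 195, 224],   # external rater 1
    [218, 226, 204, 217],   # external rater 2
    [210, 231, 199, 222],   # external rater 3
]
f_stat, df_b, df_w = one_way_anova_f(scores_by_rater)
print(round(f_stat, 3), df_b, df_w)
```

A small F relative to its degrees of freedom corresponds to a large P value, that is, no detectable difference in mean scores among raters.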


Table 13. Comparison of means for total oral presentation and written proposal scores by five raters

 

    DISCUSSION
 
To our knowledge, the authentic culminating assessment process and validated rubrics described in our study are unique in higher education natural science. Although rubrics and processes for performance-based assessment of individual assignments or courses in the natural sciences have been described previously (2, 5, 6, 21, 22, 24, 27, 28), within the higher education community, few attempts have been made to address summative assessment of natural science programs (21). Finally, although the previous studies mentioned above reported potential natural science assessment strategies, validation of the associated instruments was not addressed.

The high interrater reliability between course instructors reported for our instruments, estimated by a variety of statistical measures including percentage agreement between raters, mean differences between raters on absolute scores awarded, intraclass correlations, and generalizability coefficients (Ref. 12; Tables 5–9), suggests that these rubrics produce both reliable and valid assessments of the performance criteria for which they were designed. The establishment of content validity by review, critique, and revision of instruments by content experts before their use (3, 7, 14) and the norming process engaged in by the instructors throughout the course, which was used to establish internal validity for the assessment process (7), likely contributed to this high interrater reliability.

The course instructor interrater reliability for the oral presentations may be lower than for the written proposals (Tables 6–9). Intraclass correlations were routinely lower for assessment of the oral presentations than for the written proposals (Tables 6 and 8), suggesting less agreement between course instructor raters using the oral presentation rubric. However, there were no significant mean differences in the scores awarded by the course instructors for any sections of either the oral or written projects (Tables 6 and 8), variances measured as SEM overlapped (written proposal 0.366–4.92 vs. oral presentation 0.316–4.43; Tables 6 and 8), and generalizability coefficients were similar (Tables 7 and 9), suggesting no differences in the reliabilities of the two instruments. A possible explanation for the differences in intraclass correlations is the shorter time that was allotted for the evaluation of the oral presentation (30 min) relative to the written proposal (theoretically an unlimited amount of time). The time pressure associated with evaluating oral presentations might have led to greater variability in instructors’ assessments, estimated as intraclass correlations. In addition, there were fewer norming sessions associated with use of the oral presentation rubric due to the course design in year 1. These sessions allowed the course instructors to reach agreement regarding performance criteria, so fewer sessions would presumably have led to greater discrepancy in ratings. This explanation is supported by the greater variances associated with scores awarded on oral presentations in 1999 vs. 2000, when additional oral presentation assignments and thus instructor norming sessions were added to the course (Table 10).

The high interrater reliabilities reported between three external raters using the assessment instruments for the first time and the two course instructors suggest that the evaluation process and rubrics reported in this study should be easily adopted by others and yield valid and reliable student performance data. By a variety of statistical estimates, including percentage agreement, intraclass correlations, and generalizability coefficients (Tables 11 and 12), the interrater reliability among the five raters is extremely high. In addition, there were no statistically significant differences in the total mean scores awarded by the five raters (Table 13). This result might be due in part to the involvement of the external raters in the process used to establish the content validity of our instruments.
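For readers who want to see how an intraclass correlation among several raters can be computed, the sketch below implements the one-way random-effects, single-rater form, ICC = (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB and MSW are the between-project and within-project mean squares and k is the number of raters. The choice of this particular ICC form is an assumption made for illustration, since this section does not state which form was used, and the function name and scores are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    `ratings` is a list of rows, one per rated project; each row holds the
    scores awarded to that project by the k raters.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 projects, each scored by the same k >= 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-project and within-project mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; the ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical total scores (out of 250) for four projects from five raters.
hypothetical_totals = [
    [231, 228, 235, 230, 226],
    [198, 205, 201, 196, 203],
    [215, 212, 219, 214, 210],
    [242, 240, 245, 239, 243],
]
print(round(icc_oneway(hypothetical_totals), 3))
```

Values near 1 indicate that most of the score variance lies between projects rather than between raters scoring the same project.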


    APPENDIX A: GRADING CRITERIA FOR FINAL VERSION OF GRANT PROPOSAL (250 PT. TOTAL)
 
Specific Aims

  1. The broad, long-term objective(s) of the project is/are identified.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 6 pt. 8 pt. 9 pt.


  2. The specific research to be carried out and what it is intended to accomplish are clearly indicated.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 6 pt. 8 pt. 9 pt.


  3. The hypothesis/hypotheses to be tested is/are clearly indicated.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 6 pt. 8 pt. 9 pt.


  4. The section is no more than one page in length.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 2 pt. 3 pt. 4 pt.
    Section Total (out of 31 pt.)___________


Background and Significance

  1. The background to the proposal is summarized using the annotations generated earlier in the quarter.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


  2. There is a synthesis of information from the different papers instead of 10 separate article summaries, i.e., results that can be used to address the same points are grouped together.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


  3. References are indicated by number in the text in parentheses.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.


  4. Author critically evaluates existing knowledge. The evaluation takes the form of the topic sentence for each paragraph, the concluding sentence of the paragraph, or a separate paragraph that ends the subsection.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


  5. Author identifies gaps in current knowledge.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


  6. Author states the importance and health relevance of the research project.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  7. Author relates the specific aims of the proposal to the broad, long-term objectives of the proposal.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 2 pt. 3 pt. 4 pt.


  8. This section is at most three pages in length.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 2 pt. 3 pt. 4 pt.
    Section Total (out of 76 pt.)___________


Research Design and Methods

  1. Two to three experiments are included that address the specific aims of the grant proposal.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  2. Each experiment includes a detailed description of how data will be collected. The description includes such things as sample sizes, species of animal, or source of tissues and cell lines, equipment that you plan to use, concentrations/volumes/etc. for chemicals, details of surgical procedures, handling of animals, processing of tissues, etc., and the time frame for experiments. A flow chart, not counted in the page limit, is included for each experiment.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


  3. Each experiment includes a description of how data will be analyzed.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 2 pt. 3 pt. 4 1/2 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 2 pt. 3 pt. 4 1/2 pt.


  4. Each experiment includes a description of how data will be interpreted.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.


  5. If appropriate, advantages of new methodologies are contrasted with those of existing methodologies.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  6. Potential difficulties with proposed experimental designs are discussed.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.


  7. Alternate approaches to achieve the specific aims are discussed.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.


  8. A time line is included.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 4 pt. 5 pt. 6 pt.


  9. This section is at most six pages in length.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.
    Section Total (out of 76 pt.)___________


Literature Cited

  1. Includes at least 10 primary articles.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  2. Citations are in the following format. Complete author list; authors are listed as follows: last name, initials of first and middle names; commas separate authors’ names. Year the article was published is in parentheses and follows the list of authors. The article title follows the year. The first word of the title begins with a capital letter and the title entry ends with a period. The name of the journal follows the title of the article, is capitalized, and is in italics. The volume number follows the name of the journal, is in bold, and is followed by a comma. The pages of the article, inclusive, follow the volume number.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 14 pt. 17 pt. 20 pt.


  3. List is alphabetized and numbered.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 1 pt. 2 pt. 3 pt.
    Section Total (out of 28 pt.)___________


General

  1. The proposal is typed single-spaced in 12-point font.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  2. The margins are 1 inch on all sides.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  3. Readability (wording, sentence and paragraph structure).
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 9 pt. 10 pt. 12 pt.


  4. Mechanics (usage, punctuation, spelling).
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 9 pt. 10 pt. 12 pt.


  5. Paraphrases instead of using direct quotes.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.
    Section Total (out of 39 pt.)___________

    Final Total (out of 250 pt.)_____________________

    Team__________________________
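
The section maxima listed in Appendix A can be checked against the stated 250-pt. total with a short script. The sketch below is illustrative only: the dictionary and function names and the sample team scores are hypothetical, while the section names and maxima are copied from the rubric above.

```python
# Section maxima as stated in Appendix A.
WRITTEN_RUBRIC_MAX = {
    "Specific Aims": 31,
    "Background and Significance": 76,
    "Research Design and Methods": 76,
    "Literature Cited": 28,
    "General": 39,
}


def final_total(section_scores, section_max=WRITTEN_RUBRIC_MAX):
    """Validate each section score against its maximum and return the final total."""
    missing = set(section_max) - set(section_scores)
    if missing:
        raise ValueError(f"missing section scores: {sorted(missing)}")
    for name, score in section_scores.items():
        if name not in section_max:
            raise KeyError(f"unknown rubric section: {name}")
        if not 0 <= score <= section_max[name]:
            raise ValueError(f"{name}: {score} is outside 0-{section_max[name]} pt.")
    return sum(section_scores.values())


# The section maxima add up to the stated 250-pt. total.
assert sum(WRITTEN_RUBRIC_MAX.values()) == 250

# Hypothetical section scores for one team (not study data).
team_scores = {
    "Specific Aims": 28,
    "Background and Significance": 70,
    "Research Design and Methods": 65.5,
    "Literature Cited": 25,
    "General": 36,
}
print(final_total(team_scores))
```

Half-point values are allowed because two Research Design and Methods criteria award 4 1/2 pt. for an Excellent rating.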



    APPENDIX B: GRADING CRITERIA FOR FINAL ORAL PRESENTATION (250 PT. TOTAL)
 
Introduction, Significance, and Specific Aims of the Grant Proposal

  1. The speaker provides a brief general introduction to Alzheimer’s disease, which includes the importance and health relevance of the research project.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  2. The speaker presents the broad, long-term objective(s) of the project.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  3. The speaker briefly describes what the specific research proposed in the grant will accomplish and relates it to the broad, long-term objective of the project.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  4. The speaker ends this section with a clear overview of the presentation to follow.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  5. The section is an appropriate length (about 5 min).
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.
    Section Total (out of 25 pt.)___________


Background to the Proposed Research

  1. The speaker discusses relevant literature that provides a background to the research proposed in the grant.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 14 pt. 16 pt. 20 pt.


  2. The speaker critically evaluates and summarizes the background literature.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 14 pt. 16 pt. 20 pt.


  3. The speaker identifies gaps in current knowledge that the proposed research will fill by identifying the hypothesis/hypotheses to be tested by the proposed research.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 14 pt. 16 pt. 20 pt.


  4. The speaker uses appropriate techniques to reference sources (i.e., "Smith found....", or, "It has been shown by both the Smith and Garcia groups that....").
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 7 pt. 8 pt. 10 pt.


  5. This section is an appropriate length (about 10 min).
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.
    Section Total (out of 75 pt.)___________


Research Design and Methods

  1. The speaker describes 2–3 experiments that address the specific aims of the grant proposal.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  2. For each experiment, the speaker describes how data will be collected.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


  3. For each experiment, the speaker describes how data will be analyzed.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 2 pt. 3 pt. 4 1/2 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 2 pt. 3 pt. 4 1/2 pt.


  4. For each experiment, the speaker describes the predicted outcomes for the experiments and how results will be interpreted relative to the hypothesis being tested.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 6 pt. 7 pt. 9 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 6 pt. 7 pt. 9 pt.


  5. If appropriate, the speaker discusses the advantages of any new methodologies described in the proposal.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  6. The speaker discusses potential difficulties with proposed experimental designs and suggests alternate approaches to achieve the specific aims.
    Experiment 1
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 6 pt. 7 pt. 9 pt.


    Experiment 2
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 6 pt. 7 pt. 9 pt.


  7. A time line for the proposed research is presented and explained.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  8. The speaker ends with a brief summary of the proposed experiments and their relationship to the broad, long-term objectives of the proposal.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  9. This section is an appropriate length (about 15 min).
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.
    Section Total (out of 100 pt.)___________


General

  1. The speaker paraphrases instead of using direct quotes.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  2. There is a clear organizational structure to the presentation, i.e., introduction, significance and specific aims, background, and research design and methods concluding with a brief summary.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  3. The speakers have an effective and engaging presentation style.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 6 pt. 7 pt. 8 pt.


  4. There are good transitions between sections and between topics within a section.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.


  5. The speech adheres to the established time limit (30 min for the presentation and 5 min for audience questions).
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 5 pt. 6 pt. 7 pt.


  6. Appropriate visual aids are used and are referred to during the presentation.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 10 pt. 12 pt. 15 pt.


  7. Each member of the group speaks for at least 5 min.
    Not Acceptable______ Acceptable______ Good______ Excellent______
    0 pt. 3 pt. 4 pt. 5 pt.
    Section Total (out of 50 pt.)___________

    Final Total (out of 250 pt.)_____________________

    Team__________________________



    Acknowledgments
 
We thank our students, without whom this research would have been impossible, and the California State University, Los Angeles, Innovative Instruction Award Program for financial support for the development and reliability testing of the assessment instruments. We also thank Dr. K. Goodhue-McWilliams and Dr. D. Paulson for service as external evaluators for this project.

Received for publication August 31, 2004. Accepted for publication January 15, 2005.


    REFERENCES
 

  1. Advisory Committee to the National Science Foundation Directorate for Education and Human Resources. Shaping the Future: New Expectations for Undergraduate Education in Science, Mathematics, Engineering, and Technology. Washington, DC: National Science Foundation, 1996.
  2. Angelo TA and Cross KP. Classroom Assessment Techniques: A Handbook for College Teachers (2nd ed.). San Francisco, CA: Jossey-Bass, 1993.
  3. Baker EL. The Role of Domain Specifications in Improving the Technical Quality of Performance Assessment. Los Angeles, CA: University of California at Los Angeles, Center for Research on Evaluation, Standards, and Student Testing, 1992.
  4. Baker KH. Item validity by the analysis of variance: an outline of method. Psychol Rec 3: 242–248, 1939.
  5. Banta TW. Building a Scholarship of Assessment. San Francisco, CA: Jossey-Bass, 2002.
  6. Barrow DA. The use of portfolios to assess student learning: a Florida college’s experiment in general chemistry. J Coll Sci Teach 22: 148–153, 1992–1993.
  7. Berg BL. Qualitative Research Methods for the Social Sciences (4th ed.). Needham Heights, MA: Allyn and Bacon, 2001.
  8. Brennan RL and Johnson EG. Generalizability of performance assessments. Educ Meas Issues Practice 14: 9–12, 1995.
  9. Carmines EG and Zeller RA. Reliability and Validity Assessment. Thousand Oaks, CA: Sage, 1987.
  10. Crocker L and Algina J. Introduction to Classical and Modern Test Theory. New York: Holt, Rinehart, and Winston, 1986.
  11. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 16: 297–334, 1951.
  12. Cronbach LJ, Gleser GC, and Rajaratnam N. Theory of generalizability: a liberalization of reliability theory. Br J Stat Psychol 16: 137–163, 1963.
  13. Cronbach LJ, Gleser GC, Nanda H, and Rajaratnam N. The Dependability of Behavioral Measurements. New York: John Wiley, 1972.
  14. Cronbach LJ. Test validation. In: Educational Measurement, edited by Thorndike RL. Washington, DC: American Council on Education, 1971.
  15. Doherty AT, Riordan T, and Roth J. Student Learning: A Central Focus for Institutions of Higher Education. Milwaukee, WI: Alverno College Institute, 2002.
  16. Fisher RA. The Design of Experiments. London: Oliver and Boyd, 1935.
  17. Frick T and Semmel MI. Observer agreement and reliabilities of classroom observational measures. Rev Educ Res 48: 157–184, 1978.
  18. Goldberg S. Thinking Methodologically. New York: HarperCollins, 1992.
  19. Halpern DF and Hakel MD. Learning that lasts a lifetime: teaching for long-term retention and transfer. New Dir Teach Learn 89: 3–7, 2002.
  20. Jackson RWB. Reliability of mental tests. Br J Psychol 29: 267–287, 1939.
  21. Krilowicz BL and Downs T. Use of course-embedded projects for program assessment. Adv Physiol Educ 276: S39–S54, 1999.
  22. Manyon AT, Feeley TH, Panzarella KJ, and Servoss TJ. Development of an assessment tool measuring medical students’ integration of scientific knowledge and clinical communications skills. Assessment Update 15: 1–2, 14–15, 2003.
  23. McIntosh WJ. Assessment in higher education: establishing continuous feedback between students and instructors. J Coll Sci Teach 26: 52–53, 1996.
  24. Mehler AH. Integration of examinations and education. Biochem Educ 20: 10–14, 1992.
  25. Stanley JC. Reliability. In: Educational Measurement, edited by Thorndike RL. Washington, DC: American Council on Education, 1971.
  26. Western Association of Schools and Colleges. WASC 2001 Handbook of Accreditation [Online]. Accrediting Commission for Senior Colleges and Universities. http://www.wascweb.org/senior/handbook.pdf [5 January 2005].
  27. Walvoord BE and Anderson VJ. Effective Grading. San Francisco, CA: Jossey-Bass, 1998.
  28. Wright JC. Authentic learning environment in analytical chemistry using cooperative methods and open-ended laboratories in large lecture courses. J Chem Educ 73: 827–832, 1996.



