Optimizing the usability of e-learning materials is necessary to maximize their potential educational impact, but this is often neglected when time and other resources are limited, leading to the release of materials that cannot deliver the desired learning outcomes. As clinician-teachers in a resource-constrained environment, we investigated whether heuristic evaluation of our multimedia e-learning resource by a panel of experts would be an effective and efficient alternative to testing with end users. We engaged six inspectors, whose expertise included usability, e-learning, instructional design, medical informatics, and the content area of nephrology. They applied a set of commonly used heuristics to identify usability problems, assigning severity scores to each problem. The identification of serious problems was compared with problems previously found by user testing. The panel completed their evaluations within 1 wk and identified a total of 22 distinct usability problems, 11 of which were considered serious. The problems violated the heuristics of visibility of system status, user control and freedom, match with the real world, intuitive visual layout, consistency and conformity to standards, aesthetic and minimalist design, error prevention and tolerance, and help and documentation. Compared with user testing, heuristic evaluation found most, but not all, of the serious problems. Combining heuristic evaluation and user testing, with each involving a small number of participants, may be an effective and efficient way of improving the usability of e-learning materials. Heuristic evaluation should ideally be used first to identify the most obvious problems and, once these are fixed, should be followed by testing with typical end users.
- iterative design
- user-centered design
- interface design
The development of engaging e-learning materials for students and professionals in the health sciences is often resource intensive. It therefore becomes critical to evaluate and optimize these materials to maximize their educational impact. The usability of user interfaces is an important element that needs to be considered when designing e-learning resources. This is an underappreciated factor that, if ignored, may have a major impact on learning (30, 34). Usability describes the ease with which a technology interface can be used and has been defined as the “Extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” (1). A poorly designed user interface imposes an additional, extraneous, cognitive load and impedes learning as users struggle with the interface as well as with the challenges of the content presented.
More recently, this traditional view of usability is being extended and affective dimensions such as aesthetics, fun, and flow are receiving increased attention as designers seek to enhance user motivation and ensure pleasurable user experiences (11, 16, 34). For example, a study by Miller (19) reported that students working in an online environment with enhanced aesthetic design had reduced cognitive load, increased motivation, and increased performance compared with those working with a low-aesthetic interface. It also seems that users' perception of the aesthetics of an interface may be negatively affected by poor usability (32).
Design approaches that routinely include usability evaluation are well established in the software development industry (3, 10, 13, 18, 21, 24, 31), but this is seldom the case in the development of e-learning resources, especially in the area of medical education (30). The aim of usability evaluation is to improve a system or application by identifying usability problems and then prioritizing fixing them based on their impact. A usability problem can be defined as any aspect of a design that, if changed, could result in an improvement in usability. There may be several iterations of design, testing, and redesign before an application is released.
Usability can be evaluated by empirical user testing, where typical end users are observed using an application in laboratory or field settings. Think-aloud protocols, largely based on the work of Ericsson and Simon (9), are often used. Users are encouraged to speak their thoughts aloud while working with the application or immediately afterward (8). This increases the number of problems identified compared with simply observing users. Formal modeling is an approach that can be applied early in the development cycle and aims to determine and model the knowledge a user needs and/or the actions a user should perform to accomplish specified tasks. By considering users' mental models, designers attempt to predict and therefore prevent potential usability problems (6).
Another approach is to use usability inspection by experts. This approach, which is the focus of this report, relies on the considered judgment of expert inspectors and includes methods such as heuristic evaluation, cognitive walkthroughs, guideline review, and consistency inspection (26). Heuristic evaluation and cognitive walkthroughs are two commonly used methods. Heuristic evaluation involves experts evaluating an interface against a set of generally accepted principles for good design (the heuristics) (25), whereas cognitive walkthrough is based on a theory of learning by exploration and involves a group of inspectors walking through the interface and doing a step-by-step analysis of a hypothetical user's potential actions and mental processes while performing particular tasks (17, 33).
Each approach has its own cost and time requirements and examines a particular aspect of usability. With user testing, end users may be expensive or difficult to recruit, and the recording and analysis of testing sessions may be expensive and time consuming. Cost and time pressures are common in many environments and may lead to the evaluation of new resources being neglected. Inspection methods offer appealing options in such resource-constrained situations, since skilled experts could evaluate the application quickly, without the need to involve end users.
It should be noted that the average problem detection rate of individual inspectors is generally low (22), and, therefore, using small groups of inspectors is recommended. A review of 11 usability studies (12) found that inspectors evaluating an interface detected different sets of problems, with the average agreement between any 2 inspectors ranging from 5% to 65%. This “evaluator effect” appears to exist for both novice and experienced inspectors and for both the detection of usability problems as well as the assessment of problem severity. The authors of this review also recommend that this unavoidable effect be dealt with by involving multiple inspectors.
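The pairwise (“any-two”) agreement figure quoted above is commonly computed as the size of the intersection of two inspectors' problem sets divided by the size of their union, averaged over every pair of inspectors. A minimal sketch, using hypothetical problem sets rather than data from any study:

```python
from itertools import combinations

def any_two_agreement(problem_sets):
    """Average |Pi ∩ Pj| / |Pi ∪ Pj| over every pair of inspectors."""
    pairs = list(combinations(problem_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Hypothetical problem sets reported by three inspectors
inspectors = [{"P1", "P2", "P5"}, {"P2", "P3"}, {"P1", "P2", "P4"}]
print(round(any_two_agreement(inspectors), 2))  # → 0.33
```

Low values of this statistic, as in the 5–65% range cited above, illustrate why relying on any single inspector is risky.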
Heuristic evaluation is the most commonly used of the inspection methods. Each inspector evaluates the application independently, usually working through it at least twice. On the first pass, the overall flow of the application is evaluated, and on the second pass, each interface element is examined in detail. Inspectors may be asked to categorize the problems found with respect to their severity and the heuristic(s) violated, and they may also suggest solutions to the problems identified. Compared with other inspection methods, heuristic evaluation appears to be a better predictor of problems that are encountered by end users and also identifies more severe usability problems (14, 22). The ideal inspectors would be “double experts” at usability and the domain of the application being evaluated (22), but such individuals are likely to be hard to find and may be expensive to employ.
While heuristic evaluation is the most commonly used approach among practitioners in the field of human-computer interaction, its impact on influencing software design is often rated by these usability professionals as being well below that of tests conducted with real users (18, 29). Software developers and project managers appear less willing to make design changes based on expert reviews, which they believe may include many “false alarms” that may not necessarily affect real users, than when end users have been observed first hand encountering problems with the interface (8). The comparative usability evaluation study of Molich and Dumas (20), however, found no significant differences between the results of usability testing and expert reviews. They consider reviews by expert practitioners comparable to usability testing and point out that usability testing should not be seen as a “gold standard,” since, like any other method, it overlooks some usability problems.
We (5) have developed a Web-based multimedia application to help medical students and practicing colleagues acquire expertise in the diagnosis and treatment of electrolyte and acid-base disorders. This e-learning resource is available at http://www.learnphysiology.org/sim1/. It provides instruction and hands-on practice via an interactive treatment simulation. We (4, 5) have previously described the development of our “Electrolyte Workshop” and the results of user testing with 15 residents and fellows in internal medicine and its subdisciplines. Briefly, the usability software tool Morae was used to facilitate the recording and analysis of the interaction of participants with the application. Measures of effectiveness (task completion rates and usability problem counts) and measures of efficiency (time on task and mouse activity) were studied. This evaluation revealed several serious problems that rendered the application unusable for a large proportion of study participants. An interactive treatment simulation, for example, was successfully completed by only 20% of participants.
While the evaluation with typical end users was extremely valuable, it was very resource intensive, especially in regard to recruiting suitable participants and the time required to log and analyze the recordings of the testing sessions. The study took several months to complete. We therefore followed it up by exploring whether usability inspection by experts might provide an equally effective but more efficient alternative.
This report details the heuristic evaluation of our Electrolyte Workshop conducted by a panel of experts. The findings were also compared with those previously obtained by user testing to identify the most efficient method for improving our e-learning resources.
Ethics approval for the project was granted by the Committee for Human Research of the Faculty of Medicine and Health Sciences of Stellenbosch University (project no. N08/05/158).
The e-learning resource: our Electrolyte Workshop.
This Adobe Flash application (http://www.learnphysiology.org/sim1/) consists of case-based tutorials and can be accessed over the internet by any Web browser. Each case consists of a series of slides, with the navigation and therefore the pace of the tutorial controlled by the user. There are two main sections to the Electrolyte Workshop: the first uses a “look and learn” approach and is called the WalkThru section. A clinical problem is presented, followed by a demonstration of how an expert would analyze the data and embark on treatment. Animation is used to illustrate changes in body fluid compartment sizes, brain cell size, and plasma Na+ concentrations. The “look and learn” concept is analogous to the use of worked-out examples in disciplines such as mathematics and physics and allows students to appreciate underlying principles rather than being focused on finding solutions to the problem presented (28).
The second section, called the HandsOn section, is more interactive. Each case includes a simulation that provides the opportunity for deliberate practice of the treatment of patients with electrolyte disorders and, in particular, the accurate prescription of intravenous fluid therapy. HandsOn cases begin with a series of “lead-in” slides containing the clinical and laboratory data, which set the scene for the treatment simulation. Within the simulation, users select from a menu of therapies and receive immediate feedback on the treatment applied via on-screen text messages and animations. Upon completion of the simulation, a final summary slide displays several “take-home messages.”
At present, there is one case in each section. The WalkThru case is that of a young woman with acute hyponatremia related to the use of the drug Ecstasy, and the HandsOn case is that of chronic hyponatremia due to Addison's disease. There is also a glossary that can be accessed via text hyperlinks on the slides or via a tab in the main navigation menu.
Heuristic evaluation procedures.
In this study, a panel of six experts conducted a heuristic evaluation. The panel consisted of a usability expert, two e-learning experts with expertise in instructional design, an internist with an additional qualification in medical informatics, and two experienced nephrologists as the subject matter experts. Inspectors were supplied with a website link to the application and worked independently. Written instructions included information about the purpose of the application and stated that their participation was aimed at improving the application and formed part of a research project. They were required to work through the different sections of the application and evaluate it according to a set of commonly used heuristics (Table 1) based on those of Nielsen (25) and as used by Karat et al. (15). A template for recording and grading the usability problems detected was provided. Inspectors were asked to indicate the heuristic(s) relevant to each problem and to assign severity scores based on its frequency, persistence, and impact. The severity rating scale of Nielsen (23) was used as follows: 1 = cosmetic problem only, need not necessarily be fixed; 2 = minor usability problem, fixing this should be a low priority; 3 = major usability problem, fixing this should be a high priority; and 4 = usability catastrophe, may cause task failure and must be fixed before releasing the application. Each inspector submitted a written report based on the template provided.
All problems found were cataloged and categorized according to severity, the interface element involved, and the heuristic(s) involved. Problems with severity scores of 3 and 4 were grouped together as serious problems and were then compared with the serious problems previously found by user testing.
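The triage step described here amounts to grouping the cataloged problems under the heuristic(s) they violate and flagging those with severity scores of 3 or 4 as serious. A minimal sketch of that bookkeeping, using hypothetical catalog entries rather than the study's actual data:

```python
from collections import defaultdict

# Hypothetical catalog entries: (problem, severity 1-4, heuristic violated)
catalog = [
    ("no progress indicator while loading", 3, "visibility of system status"),
    ("font sizes too small", 3, "consistency and conformity to standards"),
    ("missing period after glossary entries", 1, "aesthetic and minimalist design"),
    ("panel change silently deselects treatment", 4, "error prevention and tolerance"),
]

# Group problems under the heuristic(s) they violate
by_heuristic = defaultdict(list)
for problem, severity, heuristic in catalog:
    by_heuristic[heuristic].append((problem, severity))

# Severity scores of 3 and 4 are grouped together as "serious"
serious = [problem for problem, severity, _ in catalog if severity >= 3]
print(len(serious))  # → 3
```

Sorting or filtering by severity in this way makes it straightforward to prioritize fixes before a redesign iteration.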
The evaluation was completed within 1 wk of supplying the inspectors with their documentation and the link to the application. Their overall impression of the application was uniformly positive, with comments such as “easy to use,” “good visuals,” and “an excellent application.”
A total of 22 distinct usability problems were identified. Examples of these, with the interface element involved, the heuristics violated, and potential solutions, are shown in Table 2. There were 11 problems categorized as serious; each of these was detected by a median of 2 inspectors (range: 1–4). Each inspector detected a median of 4 of the 11 serious problems (range: 3–7).
Several usability problems were identified that related to the heuristic of ensuring the visibility of system status and providing appropriate user feedback. Two inspectors were concerned about the long loading time of the application over slower internet connections; one inspector suggested adding a progress indicator to keep users informed during the loading process. Inappropriate or unhelpful feedback and error messages in the interactive simulation were highlighted by four inspectors. It was also suggested that treatments previously applied by users be displayed to them, accompanied by useful feedback.
Problems related to the heuristic of user control and freedom included the inability to navigate back to the lead-in slides after entering the treatment simulation. This was identified by four inspectors. It was suggested by three inspectors that the take-home message summary slide after the completion of the simulation be displayed to all users and not only to those who had completed the simulation successfully. In the WalkThru case, inspectors recommended adding the functionality to allow users to replay animations on the slides rather than requiring them to navigate away from the slide and then back again to have the animation replayed.
The heuristic of ensuring a match with the real world was violated by the use of the same character, Suzie, in both the WalkThru and HandsOn cases (and with different diagnoses). This was highlighted as potentially confusing. In the WalkThru case, clinical and laboratory parameters on the patient data panel were not updated appropriately after the successful treatment of the patient, also violating this heuristic.
Two problems were identified that resulted from violations of the heuristic of providing an intuitive visual layout. With regard to the lead-in slides of the HandsOn case, two inspectors pointed out that a laboratory data panel displaying the patient's blood and urine chemistry results could easily be missed by users and suggested that their attention be drawn to it in some way. This panel slides open when its tab at the side of the screen is clicked (Fig. 1). The problem of the open panel obscuring other on-screen information was also identified.
The heuristic of consistency and conformity to standards was violated by the use of too-small font sizes for the text on the slides. This was highlighted by two inspectors.
Inspectors also recommended reducing the word count and eliminating unnecessary animation on certain slides to conform to the heuristic of aesthetic and minimalist design.
The heuristic of error prevention and tolerance was violated in the design of the selection and application of treatments in the simulation. This was identified by four inspectors as a serious usability problem. The simulation was designed to permit treatments to be applied sequentially, and not simultaneously, so that appropriate feedback could be given after each step. Treatment options are grouped and displayed in separate panels (Fluid, Salt Treatment, and Drug Treatment) with only one panel open at a time (Fig. 2). Moving from one panel to the next causes the previous panel to be closed and any selected option in that panel to be deselected. Because the first panel closes, users might not realize that the first treatment option is no longer selected and may unsuccessfully attempt to select and apply multiple treatments simultaneously.
Inspectors also made suggestions relating to cosmetic changes and relatively minor usability problems. Examples of these included suggestions for font changes, adding a period after each glossary entry, and using the singular “Select your character” and not “characters” to indicate that only a single case scenario was presently available in each section of the application. There were also new feature requests that did not address an identified usability problem. An example of this was the suggestion that users have the ability to print summary notes of the cases upon completion.
A comparison of the detection of the most important usability problems by heuristic evaluation versus user testing is shown in Table 3. Among the problems identified by heuristic evaluation but not user testing were the need for a progress indicator while loading the application, text with too-small font sizes, unnecessary words and animation, the need to be able to replay the animations, and the problem with navigation. The most important problems identified by user testing but missed by the heuristic evaluation were the difficulties with using the slider control to select dosages in the treatment simulation (Figs. 2 and 3). User testing also highlighted the underutilization of the glossary: no participants accessed it from text hyperlinks as they worked through the slides. Those who viewed the glossary did so via the main navigation tab and at the end of the session, most likely only because this was required by the written instructions.
Heuristic evaluation of our Electrolyte Workshop by a panel of experts proved to be an efficient approach to improving usability. The evaluation was completed in a short space of time and detected most of the serious usability problems found by previous user testing as well as serious additional problems not identified by user testing. Heuristic evaluation thus presents an appealing option when time and financial resources are limited, as is often the case when developing e-learning materials. An additional advantage of using heuristic evaluation is that expert inspectors may often suggest solutions to problems found and may also highlight the strengths of a design.
A team of four to five experts can be expected to identify ∼70% of usability problems (27). However, more problems will be missed when inspectors are inexperienced or lack domain expertise. Nielsen (22) found that novice inspectors uncovered 22% of problems, general usability professionals discovered 41%, and “double experts,” specialists in both usability and the particular domain of the interface being tested, performed best, finding 60% of the problems. It is therefore important to have a good mix of experience and expertise when assembling a panel of inspectors.
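The relationship between panel size and problem yield is often modeled in the usability literature with a simple cumulative-detection formula, 1 − (1 − λ)^n, where λ is the average per-inspector detection rate. A minimal sketch, with an illustrative λ of 0.21 (an assumption chosen to match the ∼70% figure, not a value measured in this study):

```python
def proportion_found(detection_rate: float, n_inspectors: int) -> float:
    """Expected fraction of all problems found by n independent inspectors."""
    return 1 - (1 - detection_rate) ** n_inspectors

# With an assumed per-inspector detection rate of 0.21 (illustrative only),
# a panel of five inspectors is expected to find roughly 70% of the problems
print(round(proportion_found(0.21, 5), 2))  # → 0.69
```

The diminishing returns built into this curve are one reason small panels of inspectors are considered cost effective.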
Observing typical end users interacting with the application remains important, however, as they may expose problems that experts, with their advanced computer skills, would not encounter (Table 3). The problem with the slider control is a case in point: none of our expert panel had any difficulty dragging the slider to indicate the treatment dose in the simulation, whereas several participants in our earlier user testing study (4) could not work out how to use it at all, rendering the simulation unusable for these individuals. Another potential drawback of relying only on heuristic evaluation is that problems identified by inspection methods do not seem to have the same credibility with software developers and managers as those identified through testing “real” users (7).
User testing with the collection of subjective data by validated questionnaires is another attractive option when resources are limited. The System Usability Scale (2), for example, is freely available and easy to administer and yields a score of overall usability, which is useful for comparison with other applications and with different iterations of the same application. However, it does not generate a list of usability problems to fix and, on its own, would be of limited use when the aim is improving the application.
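Brooke's SUS scoring procedure is simple enough to sketch: each of the 10 items is answered on a 1–5 scale, odd-numbered (positively worded) items contribute their score minus 1, even-numbered (negatively worded) items contribute 5 minus their score, and the summed contributions are multiplied by 2.5 to yield a 0–100 score. The responses below are hypothetical:

```python
def sus_score(responses):
    """Score a 10-item System Usability Scale questionnaire (answers 1-5).

    Odd-numbered items contribute (score - 1); even-numbered items
    contribute (5 - score); the sum is scaled by 2.5 to give 0-100.
    """
    assert len(responses) == 10
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # 0-based index i: even i = odd item
        for i, r in enumerate(responses)
    )
    return total * 2.5

# Hypothetical responses from one participant
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # → 80.0
```

A single number of this kind is convenient for tracking iterations of the same application, but, as noted above, it does not by itself point to the problems that need fixing.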
Heuristic evaluation and user testing each appear to identify important usability problems overlooked by the other method. It has therefore been suggested that the two methods be used to supplement each other, with heuristic evaluation used first to identify and correct the more obvious problems and, after the subsequent redesign, user testing used to uncover the remaining problems (8, 23).
Heuristic evaluation is an efficient way of improving the design of e-learning materials in resource-constrained environments, considerably reducing the cost and time of evaluating usability. In terms of effectiveness, it compares well with user testing where typical end users are directly observed while using the application. In our study, heuristic evaluation detected several serious usability problems with our Electrolyte Workshop, each of which could have resulted in a substantial loss of educational impact. However, at least one serious problem was missed by heuristic evaluation, and we therefore support the recommendation that a combination of methods be used whenever possible, to increase the likelihood that most of the serious usability problems are detected and addressed. Ideally, heuristic evaluation should be used first and at an early stage in the development cycle. Combining heuristic evaluation with user testing, and involving a small number of participants with each cycle of testing, should provide valuable and rapid feedback to guide the development of usable e-learning materials for our health sciences programs.
This work was supported by grants from the South African Universities Health Sciences IT Consortium and Stellenbosch University's Fund for Innovation and Research into Learning and Teaching.
No conflicts of interest, financial or otherwise, are declared by the author(s).
Author contributions: M.R.D., U.M.E.C., and M.L.H. conception and design of research; M.R.D. analyzed data; M.R.D., U.M.E.C., and M.L.H. interpreted results of experiments; M.R.D. prepared figures; M.R.D. drafted manuscript; M.R.D. and U.M.E.C. edited and revised manuscript; M.R.D., U.M.E.C., and M.L.H. approved final version of manuscript.
- Copyright © 2013 the American Physiological Society