This article was written as an assignment in EME6606 at the University of Florida during Fall 2014. It was later published as:
Wilson, M., Sahay, S., & Calhoun, C.D. (2014). ADDIE Explained: Evaluation. Retrieved from http://www.aritzhaupt.com/addie_explained/evaluation/.
The final stage of the ADDIE model has arrived. Evaluation has been reached. Yet, is it the end? Evaluation is an integral part of the ADDIE model, but it is also an integrated part. In this chapter, evaluation itself will be evaluated, and by the end of this chapter you will be able to:
- Summarize the theoretical foundations of evaluation and its application.
- Define evaluation.
- Categorize tasks and roles within the three types of evaluation.
- Explain the process of evaluation and its role within instructional design.
- Recognize leaders in the domain of evaluation.
- Develop a plan for ongoing evaluation in an instructional design project.
Why do we evaluate?
Evaluation helps us to determine whether our instructional implementation was effective in meeting our goals. As you can see in Figure 1, evaluation sits at the center of the ADDIE model, and it provides feedback to all stages of the process to continually improve our instructional design. Evaluation can answer questions such as: Have the learners obtained the knowledge and skills that are needed? Are our instructional goals effective for the requirements of the instructional program? Are our learners able to transfer their learning into the desired contextual setting? Do our lesson plans, instructional materials, media, assessments, etc. meet the learning needs? Does the implementation provide effective instruction and carry out the intended lesson plan and instructional objectives? Do we need to make any changes to our design to improve the effectiveness of and overall satisfaction with the instruction? These questions help shape the instruction, confirm what and to what extent the learner is learning, and validate the learning over time to support the choices made in the instructional design, as well as how the program holds up over time.
What Is Evaluation?
To get started with evaluation, it is crucial to understand the overall picture of the process. The use of varied and multiple forms of evaluation throughout the design cycle is one of the most important processes for an instructional designer to employ. To that end, this first section of the evaluation chapter attempts to explain what evaluation is in terms of the varying types of evaluation; evaluation’s overarching relationship throughout the ADDIE model; the need for both validity and reliability; how to develop standards of measurement; and evaluation’s application in both education and training.
What Does Evaluation Look Like?
The first, most important step to understanding evaluation is to develop a knowledge base about what the process of evaluation entails. In other words, a designer must comprehend evaluation in terms of its three components: formative, summative, and confirmative. Each of these forms of evaluation is examined in detail here, both through the definition of the form itself and an explanation of some of the key tools within each.
Historically speaking, formative evaluation was not the first of the evaluation processes to have been developed, but it is addressed first in this chapter because of its role within the design process. Yet, it is important to place the development of the theory behind formative evaluation in context. Reiser (2001) summarizes the history of formative evaluation by explaining that the training materials developed by the U.S. government in response to Sputnik were implemented without verifying their effectiveness. These training programs were then later demonstrated to be lacking by Michael Scriven, who developed a procedure for testing and revision that became known as formative evaluation (Reiser, 2001).
Formative evaluation is the process of ongoing evaluation throughout the design process for the betterment of design and procedure within each stage. One way to think about this is to liken it to a chef tasting his food before he sends it out to the customer. Morrison, Ross, Kalman, and Kemp (2013) explain that the formative evaluation process utilizes data from media, instruction, and learner engagement to formulate a picture of learning from which the designer can make changes to the product before the final implementation. Boston (2002, p. 2) states the purpose of formative evaluation as “all activities that teachers and students undertake to get information that can be used diagnostically to alter teaching and learning.” Regardless of whether instructional designers or classroom practitioners conduct the practice, formative evaluation results in the improvement of instructional processes for the betterment of the learner.
To effectively conduct formative evaluation, instructional designers must consider a variety of data sources to create a full picture of the effectiveness of their design. Morrison et al. (2013) propose that connoisseur-based, decision-oriented, objective-based, public relations, and constructivist evaluations are each appropriate data points within the formative process. As such, an examination of each format in turn will provide a framework for moving forward with formative evaluation.
Subject-matter experts, or SMEs, and design experts are the primary resources in connoisseur-based evaluations. These experts review the instructional analyses, performance objectives, instruction, tests, and other assessments to verify objectives, instructional analysis, context accuracy, material appropriateness, test item validity, and sequencing. Each of these points allows the designer to improve the organization and flow of instruction, the accuracy of content, the readability of materials, the instructional practices, and total effectiveness (Morrison et al., 2013). In short, SMEs analyze the instruction and make suggestions for improvement.
Instructional designers must often make choices within the program of study being developed that require reflective thought and consideration. Morrison et al. (2013) describe this type of formative evaluation as decision-oriented. The questions asked during decision-oriented evaluations may develop out of the professional knowledge of an instructional designer or design team. These questions subsequently require the designer to develop further tools to assess the question, and as such should be completed at a time when change is still an option and financially prudent (Morrison et al., 2013).
If a program of study is not delivering the desired results, a provision for possible change should be considered. Through an examination of the goals of a course of instruction, the success of a learner’s performance may be analyzed. This is the primary focus of objective-based evaluation. While formative changes are best made during earlier stages of the ADDIE cycle, these changes may come later if the situation dictates it. Objective-based evaluations may generate such results. According to Morrison et al. (2013), when summative and confirmative evaluations demonstrate undesirable effects, the results may be used as a formative evaluation tool to make improvements. Morrison et al. (2013) recommend combining the results of objective-based evaluations with connoisseur-based evaluations, because the pre-test/post-test format that objective-based assessment typically employs offers limited data from which to make changes. However, Dimitrov and Rumrill (2003) suggest that analysis of variance and covariance statistical tests can be used to improve test design. The application of statistical analyses improves the validity and reliability of the design. This suggests that similar comparisons may also be useful in improving overall instruction.
Occasionally the formative process for a program may call for showing off the value of the project as it is being developed. Morrison et al. (2013, p. 325) refer to this form of formative data as “public-relations-inspired studies.” Borrowing from components of the other formats discussed above, this type of evaluation is a complementary process that combines data from various sources to generate funding and support for the program (Morrison et al., 2013). However, it should be noted that this process should happen during the later stages of development, because the presentation of underdeveloped programs may do more harm than good in the development process (e.g., the cancellation of pilot programs due to underwhelming results).
Some models of evaluation are described as being behavior driven and biased. In response to those methods, multiple educational theorists have proposed the use of open-ended assessments that allow for multiple perspectives that the learner can defend (Richey, Klein, & Tracey, 2011). Such assessments pull deeply from constructivist learning theory. Duffy and Cunningham (1996) make the analogy that “an intelligence test measures intelligence but is not itself intelligence; an achievement tests measures a sample of a learned domain but is not itself that domain. Like micrometers and rulers, intelligence and achievement tests are tools (metrics) applied to the variables but somehow distinct from them” (p. 17). How does this impact the formative nature of assessment? Constructivist methods are applicable within the development of instruction because learner feedback shapes the nature of learning and how it is evaluated.
Dick et al. (2009) claim the ultimate summative evaluation question is “Did it solve the problem?” (p. 320). That is the essence of summative evaluation. Continuing with the chef analogy from above, one asks, “Did the customer enjoy the food?” The parties involved in the evaluation take the data and draw a conclusion about the effectiveness of the designed instruction. However, over time summative evaluation has developed into a process that is more complex than the initial question may let on. In modern instructional design, practitioners investigate multiple questions through testing to assess the learning that ideally happens. This differs from the formative evaluation above in that summative assessments are not typically used to assess the program, but the learner. However, summative evaluations can also be used to assess the effectiveness of learning, learning efficiency and cost effectiveness, and attitudes and reactions to learning (Morrison et al., 2013).
Just as the overall process of summative evaluation is summarized above with one simple question, so can its effectiveness be. How well did the student learn? Perhaps even, did we teach the learner the right thing? “Measurement of effectiveness can be ascertained from test scores, ratings of projects and performance, and records of observations of learners’ behavior” (Morrison et al., 2013, p. 328). However, maybe the single question is not enough. Dick et al. (2009) outline a comprehensive plan for summative evaluation throughout the design process, including collecting feedback data from SMEs and during field trials. This shifts the focus from the learner to the final form of the instruction. Either way, the data collected tests the success of the instruction and learning.
Learning Efficiency and Cost Effectiveness
While learning efficiency and cost effectiveness of the instruction are certainly distinct constructs, the success of the former certainly impacts the latter. Learning efficiency is a matter of resources (e.g., time, instructors, facilities, etc.), and how those resources are used within the instruction to reach the goal of successful instruction (Morrison et al., 2013). Dick et al. (2009) recommend comparing the materials against an organization’s needs, target group, and resources. The end result is the analysis of the data to make a final conclusion about the cost effectiveness based on any number of prescribed formulas. Morrison et al. (2013) acknowledge the relationship between this form of summative evaluation and confirmative evaluation, and set the difference at the time it takes to implement the evaluation.
Attitudes & Reactions to Learning
The attitudes and reactions to the learning, while integral to formative evaluation, can be summatively evaluated, as well. Morrison et al. (2013) explain there are two uses for attitudinal evaluation: evaluating the instruction and evaluating outcomes within the learning. While a majority of objectives within learning are cognitive, psychomotor and affective objectives may also be goals of learning. Summative evaluations often center on measuring achievement of objectives. As a result, there is a natural connection between attitudes and the assessment of affective objectives. Conversely, designers may utilize summative assessments that collect data on the final versions of their learning product. This summative assessment measures the reactions to the learning.
The customer ate the food and enjoyed it. But, did they come back? The ongoing value of learning is the driving question behind confirmative evaluation. Confirmative evaluation methods may not differ much from formative and summative outside of the element of time. Confirmative evaluation seeks to answer questions about the learner and the context for learning. Moseley and Solomon (1997) describe confirmative evaluation as falling on a continuum between a customer’s or learner’s expectation and assessments.
Evaluation’s Relationship within the ID Process
Whenever examining the premises of evaluation, one must look at the connections to other areas of the instructional design process. Each form of evaluation (formative, summative, and confirmative) can make a significant difference to the quality of instruction when applied throughout the stages of the ADDIE model. Each of these is examined in depth later in this chapter, so they are only briefly summarized here. During analysis, evaluation tools can be developed to assist in the breakdown of content into task, content, and tasks. Formative and summative assessments of the instructional design conducted by the SMEs involved in the project shape and finalize these lists prior to the design stage. Alternatively, confirmative analysis may be used as a new form of learner analysis from which redesign may develop. Design decisions, like the sequencing, strategies used, and the instructional message, are once again areas to use formative assessment. Feedback from SMEs and focus groups frames the decisions made by designers in this area. Also, design looks at the objectives and the assessment of those objectives as part of the consideration before moving forward to develop instructional materials. Development of materials by designers requires examining the best practices within research to create materials. Furthermore, designers examine the overall program cost. As referenced above, both of these are key factors gauged with summative, and more likely confirmative, evaluations. As instruction is implemented, the picture of evaluation is painted with all three brushes. Learners are assessed using all three methods to shape learning and pacing, get the measure of performance, and make long-term determinations about the change in student practice. Clearly, evaluation is a deep and ongoing process throughout the design of instruction.
Validity & Reliability
Who evaluates the evaluators? This is the question of validity and reliability. Morrison et al. (2013) define validity as whether the evaluation measures the intended learning, and reliability as whether the evaluation measures it consistently. In Figure 2, Trochim (2006) illustrates a common analogy for validity and reliability.
Figure 2 Validity vs. Reliability Scatterplots
As can be seen with the third portion of the figure, validity and reliability do not always go together. Each must be considered separately to achieve both within the design process.
Validity and reliability must themselves be evaluated before an evaluation can be trusted to assess the instructional design and the learning that is taking place. Shepard (1993) states the four types of validity are content validity, construct validity, concurrent validity, and predictive validity, with the first three representing the primary methodology. Each of these validity forms has a preferred methodology unique to the type. For example, Morrison et al. (2013) suggest one method for achieving content validity is the use of a performance-context matrix to verify test question content. Instructional designers can evaluate reliability through a number of methods directly related to quantitative research methods. Possible approaches include analyzing test-retest results, computing split-half correlations within a single test administration, or examining reliability coefficients (Morrison et al., 2013).
When and who does it?
When a new program has been endorsed and accepted by an institution or an organization, the need for an evaluation emerges. An instructional designer is approached to undertake this task of evaluation. It is important for the implementing agency to verify that the achievements of the new program are aligned with the goals and purposes of the program. For this evaluation, an instructional designer conducts the three types of evaluation mentioned above in turn: formative, summative, and confirmative.
When a teacher or a designer develops a lesson plan, one must keep in mind that what seems plausible as an idea or in the initial stages might not work out the same way when put into a full-blown functional program. Here, formative assessment takes a significant role in the instructional design process of the lesson plan. As defined above, formative evaluation, the first phase of evaluation, is conducted from the inception of the idea and the concept stage through the development and tryout of the instructional materials and lesson plans. This assessment is crucial because the instructional designer may discover that the design is not working as expected before investing, and wasting, a lot of valuable time and resources (Morrison, Ross, Kalman, & Kemp, 2011).
When the initial concepts and design of the program have been tested and judged sound, the program is implemented with the target audience over a period of time. The instructional designer then prepares a summative evaluation that aims to test the success of the program at the end of the training. The different tools used for evaluation will be explained in the next section of this chapter. The summative evaluation again plays a significant role in assessing the success or failure of the training program. Usually the designer of the program conducts posttests and a final examination to test where the learners stand in gaining the newly taught knowledge and to assess how closely the training adhered to the objectives of the program.
The third type of evaluation conducted by program evaluators is confirmative evaluation, which was originally introduced by Misanchuk (1978) based on the logic that evaluation needs to move beyond summative evaluation in order to arrive at an improved, large-scale, replicable training model.
It has been said that effective teaching requires frequent feedback from the learners to check the learning progress and to monitor the efficacy of the pedagogical process selected for teaching (Heritage, 2007). An instructional designer can evaluate both the teacher’s and the learner’s initial reaction to a new pedagogical instruction. Liz Hollingworth (2012) maintains that formative assessments are metacognitive tools designed to analyze the initial teaching and learning profile for both teachers and students in order to track their reaction and progress over a period of time.
Evaluating reactions is the first step towards forming a sound assessment base. It is beneficial for an evaluator or an instructional designer to evaluate the initial reactions towards the newly introduced training program. Once it is believed that the learners are not resisting the new program, one may assume that learners will not drop out of the program for reasons such as non-acceptance or an inability to understand the ideas and concepts of the training in the first place. It also helps the evaluator to control the pace of the program as one moves ahead in the training phase. It leaves less frustration and vagueness in the evaluator’s mind if one knows that all the learners are positively oriented towards undertaking the training.
An evaluator or the instructional designer of the program continues with the process of evaluation as the teaching and learning unfold with the implementation of the training program. Several studies in the field of educational measurement have suggested that assessments and evaluations lead to higher quality learning. Popham (2008) calls this new aspect of assessment in the evaluation process “Transformative Assessment,” in which an evaluator identifies the learning progression of the learners by analyzing the sequence of skills learned over the period of the study program. This also helps the evaluator or the instructional designer to devise ways to assess how much of the learning material the learners have mastered, which leads back to formative assessments.
Evaluating learning is an ongoing process in the phase of instructional assessment. It is important to evaluate whether the evaluator has remained focused on the original problem and whether the training materials developed solved the problems that were identified. These objectives can be tested by evaluating the knowledge gained by the learners through ongoing formative evaluation as well as summative evaluation techniques (discussed in the next section). This is one of the most important aspects of the success of the program. When a trainee masters the content of the training or exhibits proper learning through test outcomes, one can assume the effectiveness of the program; if the learning outcomes show adverse results, one can also determine what did not work.
Attitudes and behavior are important indicators of the acceptance and success of a training program. Dick et al. (2009) mention that an evaluator needs to write directions to guide the learner’s activities and construct a rubric (a checklist, a rating scale, etc.) in order to evaluate and measure performance, products, and attitudes. A learner develops several intellectual and behavioral skills, and an evaluation can suggest what changes have been brought about in the attitude and behavior of the learners. Testing behavior falls under the novel approaches that evaluators are using these days alongside traditional tests for evaluating results.
With every training program, evaluating results is the evaluator’s most significant task, because it determines how closely one has been able to achieve success in the implementation of the program. An evaluator conducts a summative evaluation in order to test the effectiveness of the learners’ learning. The evaluator also measures several other factors while evaluating the result of the program. As mentioned by Morrison et al. (2011), an evaluator will also measure the efficiency of learning in terms of materials mastered and time taken; cost of program development; continuing expenses; reactions towards the program; and long-term benefits of the program. Apart from the summative evaluation at this stage, an evaluator might also conduct a confirmative evaluation of the results, which is a step towards confirming what worked and what did not work in the program for its large-scale replication, if needed.
How do we do it?
Formative evaluation occurs during instructional design and is the process of evaluating instruction and instructional materials to obtain feedback that in turn drives revisions to make instruction more efficient and effective. It is an iterative process that includes at least three phases (Figure 3). Begin with one-to-one evaluation, then small group evaluation, and finally a field trial. Results from each phase of evaluation are fed back to the instructional designers to be used in the process of improving design.
After each phase, the instructional designer (ID) should consider the results of the evaluation and meet with project stakeholders to make decisions about instructional changes or whether to move to the next stage of evaluation. Data and information are collected and summarized in the formative evaluation and used to make decisions about whether or not the instruction is meeting its intended goals. Instructional elements can then be revised to improve instruction. The development and implementation of formative evaluation requires the involvement of instructional designers, subject matter experts, target learners, and target instructors.
The purpose of the one-to-one evaluation is to identify and remove the most obvious errors and to obtain initial feedback on the effectiveness of the instruction. During this evaluation IDs should be looking for clarity, impact and feasibility (Dick, Carey, & Carey, 2009, p. 262). Results from one-to-one evaluation can be used to improve instructional components and materials before a pilot implementation.
Select a few learners who are representative of the target learners. Choose learners that represent the variety of learners that will participate in instruction. Don’t choose learners that represent the extremes of the population. It would be good to have one average, one above average, and one below average learner. This will ensure the instructional materials are accessible to a variety of learners.
The one-to-one evaluation is much like a usability study. Assure the learner that what is being evaluated is the instruction and instructional materials, not the learner. The learner should be presented with the instructional materials that will be provided during the instruction. Encourage the learner to discuss what they see, write on materials as appropriate, note any errors, etc. The ID can engage the learner in dialog to solicit feedback on the materials and clarity of instruction.
There are many technological tools that can facilitate a one-to-one evaluation. In Don’t Make Me Think (Krug, 2014), Steve Krug describes a process of performing a usability study for web site development. The steps he provides are a good guide for performing a one-to-one evaluation. Krug recommends video recording the session for later analysis. If instruction is computer based, there are also tools available that can record the learner’s interaction as well as video record the learner’s responses. Morae from TechSmith is a tool that allows you to record user interactions and efficiently analyze the results.
Small group evaluation is used to determine the effectiveness of changes made to the instruction following the one-to-one evaluation and to identify any additional problems learners may be experiencing. Additionally, the question of whether or not learners can use the instruction without interaction from the instructor is evaluated here.
In the small group evaluation, the instructor administers the instruction and materials in the manner in which they are designed. The small-group participants complete the lesson(s) as described. The instructional designer observes but does not intervene. After the instructional lesson is complete, participants should be asked to complete a post-assessment designed to provide feedback about the instruction.
After the recommendations from the small group evaluation have been implemented, it is time for a field trial. A field trial is conducted exactly as you would conduct instruction. The selected instruction should be delivered as closely as possible to the way it is designed to be implemented in the final instructional setting, and instruction should occur in a setting as close to the targeted setting as possible. Learners should be selected that closely match the characteristics of the intended learners. All instructional materials for the selected instructional section, including the instructor manual, should be complete and ready to use.
Data should be gathered on learner performance and attitudes. In addition, data should be gathered about the time required to use the materials in the instructional context and the effectiveness of the instructional management plan. During the field trial the ID does not participate in delivery of instruction. The ID and the review team will observe the process and record data about their observations.
The purpose of a summative evaluation is to evaluate instruction and/or instructional materials after they are finalized. It is conducted during or immediately after implementation. This evaluation can be used to document the strengths and weaknesses in instruction or instructional materials to make a decision about whether or not to continue instruction. Or it could be used to determine whether or not to adopt instruction. External evaluators for decision makers usually conduct summative evaluation. Subject matter experts may be needed to ensure integrity of the instruction and/or instructional materials.
The summative evaluation has two phases: the expert judgment and the field trial. The expert judgment tries to answer the question of whether or not the developed instruction meets the organizational needs. The purpose of the field trial is to answer the question of whether or not the developed instruction is successful in producing the intended learning gains. Table 2 gives examples of the questions that are being asked in each of the two phases of the summative evaluation.
|Expert Judgment Phase|Field Trial Phase|
|---|---|
|Do the materials have the potential for meeting this organization’s needs?|Are the materials effective with target learners in the prescribed setting?|
|Congruence Analysis: Are the needs and goals of the organization congruent with those in the instruction? Content Analysis: Are the materials complete, accurate, and current? Design Analysis: Are the principles of learning, instruction, and motivation clearly evident in the materials? Feasibility Analysis: Are the materials convenient, durable, cost-effective, and satisfactory for current users?|Outcomes Analysis: Impact on Learners: Are the achievement and motivation levels of learners satisfactory following instruction? Impact on the Job: Are learners able to transfer the information, skills, and attitudes from the instructional setting to the job setting or to subsequent units of related instruction? Impact on Organization: Are learners’ changed behaviors (performance, attitudes) making positive differences in the achievement of the organization’s mission and goals (e.g., reduced dropouts, resignations, improved attendance, achievement, increased productivity, grades)? Management Analysis: Are instructor and manager attitudes satisfactory? Are recommended implementation procedures feasible? Are costs related to time, personnel, equipment, and resources reasonable?|
Table 2: (Dick, Carey, & Carey, 2009, p. 321)
Expert judgment (Congruence Analysis)
The expert judgment phase consists of a congruence analysis, content analysis, design analysis, utility and feasibility analysis, and current user analysis (Figure 4). You will need copies of the organizational goals and instructional materials for the evaluation. See Table 2 for a list of questions that you are trying to answer during this phase. As each stage is conducted, a go/no-go decision is made. If the decision is no-go, the materials are sent back with feedback for further design and development. This phase is conducted with the instructional designer, the subject matter experts, and often an external reviewer. Target learners are not involved in this stage of evaluation.
Figure 4 Sequence of the Stages of Expert Judgment
Quality Matters Peer Review
One example of an expert judgment summative review is the Quality Matters (QM) peer review process. The Maryland Online (MOL) consortium, which was established in 1999 to leverage the efforts of and increase the collaboration among institutions involved in online learning, initiated the QM review process. The QM Rubric was developed under a grant from the U.S. Department of Education’s Fund for the Improvement of Postsecondary Education (FIPSE). The goal of the QM project is to improve student learning, engagement, and satisfaction in online courses through better design. The QM peer review is a faculty-driven initiative in which online learning experts develop standards of quality course design and online faculty carry out a peer review in conjunction with the course designer. Rubrics are now available for Higher Education, K-12, and Continuing and Professional Development (Introduction to the Quality Matters Program, 2013).
The QM Rubric is based on eight general standards of quality course design. While this rubric was developed for online environments, it is easy to see how it could be adapted for use in evaluating face-to-face instruction, or even other delivery methods. The general standard categories are:
- The Course Overview and Introduction
- Learning Objectives and Competencies
- Assessment and Measurement
- Instructional Materials
- Learner Interaction and Engagement
- Course Technology
- Learner Support
- Accessibility
During the peer review process, a team of three reviewers reviews the course. The peer review team may consist of a team leader, a subject matter expert, and at least one external member who is not associated with the subject or area of instruction. This allows for a balanced view of the instructional design and materials. The ID will prepare a course overview sheet for the review team that includes information about supplementary materials, such as the course textbook, and any other information that the team may need in performing the review.
As the review team reviews the course, they evaluate the instructional materials and design against the standards in the rubric. Each standard includes annotations that give examples of effective design elements. Reviewers decide whether a standard is Met or Not Met, and then give thoughtful, constructive feedback to the ID that will help them improve the course. A standard passes if the majority (2 out of 3) of the reviewers agree that it passes. The purpose of the review is to improve the course design. The goal is that all courses will pass the review by meeting the 85% standard. The review process is iterative, allowing the ID to make changes as necessary to ensure the course passes the review (see figure 4).
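The tallying rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chapter does not say how the 85% figure is calculated, so the sketch treats it as the unweighted share of standards marked Met, and the actual rubric may weight standards differently. The function names and the sample votes are hypothetical.

```python
from typing import Dict, List, Tuple


def standard_met(votes: List[bool]) -> bool:
    """Return True when a strict majority of reviewers marked the standard Met."""
    return sum(votes) > len(votes) / 2


def review_outcome(
    votes_by_standard: Dict[str, List[bool]], threshold: float = 0.85
) -> Tuple[Dict[str, bool], float, bool]:
    """Tally each standard, then compare the share of Met standards to the threshold.

    Assumption: the 85% figure is an unweighted share of standards met.
    """
    results = {name: standard_met(votes) for name, votes in votes_by_standard.items()}
    share_met = sum(results.values()) / len(results)
    return results, share_met, share_met >= threshold


# Hypothetical votes from three reviewers (True = Met, False = Not Met).
sample_votes = {
    "Course Overview and Introduction": [True, True, True],
    "Learning Objectives and Competencies": [True, True, False],
    "Assessment and Measurement": [True, True, True],
    "Instructional Materials": [True, False, True],
    "Learner Interaction and Engagement": [True, True, True],
    "Course Technology": [True, False, False],
    "Learner Support": [True, True, True],
}

results, share_met, passed = review_outcome(sample_votes)
for name, met in results.items():
    print(f"{name}: {'Met' if met else 'Not Met'}")
print(f"Share of standards met: {share_met:.1%}; review passed: {passed}")
```

In this made-up example only Course Technology fails the majority vote, so six of seven standards are met and the course clears the assumed threshold. In practice the reviewers’ written feedback matters as much as the tally, because the review is meant to improve the design and not only to score it.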
Once you have completed the expert judgment and determined that the instructional materials meet the goals of the instruction and are appropriate for the learners, you are ready to begin planning the implementation of a field trial. The field trial helps to answer the question of whether or not the instruction is effective with the target learners in their normal context. It also looks at whether or not the implementation plan is feasible and whether or not time, costs, and resource allocations are on target.
The field trial should be held in a context that closely matches the intended context for instruction. Learners who closely match the intended learner population should be selected. The number of learners selected should also closely match that of the intended implementation. At this phase, the field trial could cover a select section of the instruction, or it could be a pilot implementation of the full instructional design.
The purpose of a confirmative evaluation is to determine if instruction is effective and if it met the organization’s defined instructional needs. In effect, did it solve the problem? Confirmative evaluation goes beyond the scope of formative and summative evaluation and looks at whether or not the long-term effects of instruction are what we were hoping to achieve. Is instruction affecting behavior or providing learners with the skills needed, as determined by the original goals of the instruction?
Confirmative evaluation should be conducted on a regular basis. The interval of evaluation should be based on the needs of the organization and the instructional context. The focus of confirmative evaluation should be on the transfer of knowledge or skill into a long-term context. To conduct a confirmative evaluation, you may want to use observations with verification by expert review. You may also develop or use checklists, interviews, observations, rating scales, assessments, and a review of organizational productivity data.
Who are the leaders?
An early and popular planning model was first developed by Tyler in his famous book, Basic Principles of Curriculum and Instruction (1949). It has been claimed that the strength of this model and the taxonomies lies in their emphasis on accountability. May (1986) explains that Tyler’s model considers three primary sources of curriculum (students, society, and subject matter) in formulating tentative general objectives of the program that reflect the philosophy of education and the psychology of learning. It is a linear model in which the sequence is followed strictly. A unit plan is prepared on precise instructional objectives that lead to the selection and organization of content and learning experiences for the learners. The model stresses an in-depth evaluation of learners’ performance.
Stufflebeam (1971) describes the CIPP model as a sound framework for both proactive evaluation to serve decision-making and retroactive evaluation to serve accountability. The model defines evaluation as the process of delineating, obtaining, and providing useful information for judging decision alternatives. It includes three steps in the evaluation process: delineating, obtaining, and providing. It also includes four kinds of evaluation: context, input, process, and product (the first letters of these four kinds of evaluation give the acronym CIPP). The model shows how the steps in the evaluation process interact with these different kinds of evaluation.
In 1969, Stake created an evaluation framework to assist an evaluator in collecting, organizing, and interpreting data for the two major operations (or countenances) of any evaluation: a) complete description and b) judgment of the program. Popham (1993) explains that Stake’s scheme draws attention to the differences between descriptive and judgmental acts according to their phase in an educational program; these phases can be antecedent, transaction, and outcome. This is a comprehensive model that helps an evaluator think completely through the procedures of an evaluation.
Scriven provides a transdisciplinary model of evaluation that draws on an objectivist view of evaluation. Scriven defines three characteristics of this model: epistemological, political, and disciplinary. Some of the important features of Scriven’s goal-free evaluation stress validity, reliability, objectivity/credibility, importance/timeliness, relevance, scope, and efficiency in the whole process of teaching and learning.
Kirkpatrick’s model of training evaluation proposes four levels of evaluation criteria: reactions, learning, behavior, and results. Alliger and Janak (1989) identified three problematic assumptions of the model: that the levels are arranged in ascending order from reactions to results, that the levels are causally linked, and that the levels are positively inter-correlated. Even so, Kirkpatrick’s model of training evaluation has provided a significant taxonomy of evaluation to the field of instructional design.
Evaluation is the process of determining whether or not the designed instruction meets its intended goals. In addition, evaluation helps us to determine whether or not learners are able to transfer the skills and knowledge learned into long-term changes in behavior and the skills required for the target context. Evaluation also provides the opportunity for instructional designers to ensure all stakeholders are in agreement that the developed instruction is meeting the organizational goals.
In this chapter we reviewed what evaluation looks like and its relationship within the ADDIE ID process. We looked at several models of evaluation, including Kirkpatrick’s Model and the four levels of evaluation: Evaluating Reaction, Evaluating Learning, Evaluating Behavior, and Evaluating Results. We also looked at the three phases of evaluation: formative evaluation, summative evaluation, and confirmative evaluation. Finally, we reviewed the leaders in the field and their primary contributions to evaluation, including Tyler’s Model, the CIPP Model, Stake’s Model, Scriven’s Model, and Kirkpatrick’s Model.
Based on the chapter’s reading, discuss the following questions:
- Where does evaluation stand in the ADDIE model? How will your flow chart look when you describe evaluation in relation to the other stages of the ADDIE model?
- Describe the three stages of evaluation. Give an example to explain how an instructional designer will use these three stages in any particular situation.
- Which are the five types of formative evaluation methods mentioned in the chapter that assist in collecting data points for the initial evaluation? Which two of these methods will be your preferred choice for your formative evaluation and why?
- What parameters will you use to evaluate the success of the instructional training?
- Validity and reliability are important concepts to be kept in mind while conducting evaluation. How can an instructional designer ensure that these are well covered in one’s evaluation process?
- How would you arrange reaction, learning, behavior, and results in a pyramid for evaluating the success or failure of the instructional training? Explain the reasons for your choice.
- There are different ways of assessment discussed in the chapter. Name all of them and discuss any two in detail.
- What are some of the techniques to conduct formative and summative evaluation?
- Several models of evaluation have been discussed in the chapter. Discuss any two of these models in detail and explain how you will apply them in your evaluation process.
Chapter Quiz Items (15 items)
1. An analogy for the formative stage of evaluation may say formative evaluation is like when…
- The chef asks, “Did the customer enjoy the food?”
- The chef tastes the food before presenting it.
- The chef recreates the recipe.
- The chef checks to see if a customer keeps coming back for the food.
2. Which of these is not typically tested by summative analysis?
- learning effectiveness
- learning efficiency and cost effectiveness
- diagnostic information shaping ID
- attitudes & reactions to learning
3. An instructional designer finds that an evaluation does not consistently measure the learning, but distinctly measures the learning in a program. This evaluation is…
- reliable, but not valid
- valid, but not reliable
- neither valid nor reliable
- both valid and reliable
4. Formative evaluation takes place in a cycle that starts with design and then follows which of the following patterns?
- small group, field trial, one-to-one
- one-to-one, small group, field trial
- one-to-one, field trial, small group
- field trial, one-to-one, small group
5. Another term for expert judgment in the summative phase of evaluation is…
- congruence analysis
- subject-matter expert analysis
- decision-oriented analysis
6. Which of the following behaviors are not best evaluated with essay questions?
7. Which of the following is a specific decision within the expert judgment phase of summative analysis?
- feasibility analysis
- learner analysis
- job impact
- organizational behavior change
8. Using a “quality matters peer review,” when the cycle deviates from its standard path, whose role is it to make changes?
- institutions
- faculty reviewers
- instructional designers
- faculty course reviewers
9. Which educational theory base attempts to address the biased and behavior-driven nature of evaluation?
10. The application of formative and summative evaluations by subject-matter experts typically occurs during which phase of the ADDIE model?
11. Target learners are not usually involved in which analyses?
12. To evaluate behaviors resulting from a program of learning, instructional designers should look at which population?
- subject-matter experts
13. When conducting small group analyses, the learner will…
- examine first drafts of the instruction to give feedback
- examine completed materials ready for implementation
- work with instructional designers to evaluate materials
- as realistically as possible, but not in the materials’ final form
14. Whose model of evaluation provides a transdisciplinary look that draws on an objectivist view of evaluation?
15. Why is evaluation best situated in a mental model at the middle of the design process?
- Because there is no logical place to put evaluation linearly.
- Because ID is cyclical around evaluation.
- Because evaluation is used to provide feedback at all levels of the ADDIE model.
Assignment Exercises (5 exercises)
For the following exercises, you may use an instructional module that you are familiar with from early childhood, K-12, Higher Ed, Career and Technical, Corporate or other implementation where instructional design is needed. Be creative and use something from an educational setting that you are interested in. Be sure to describe your selected instructional module as it relates to each of these exercises. You may need to do some additional online research to answer these questions. Be sure to include your references in your response.
- Describe how you would conduct the three phases of the formative evaluation. Give an example of the type of evaluation you would conduct in each of the three phases. How will the results guide you in improving the instruction?
- Draw a diagram of the iterative formative evaluation process. What items are considered at each stage of the process? How is this information used to improve the design of instruction?
- Describe the context and learner selection process you would use for setting up a formative evaluation field trial.
- What materials should the designer include in a field trial?
- You have been asked to serve as an external evaluator on a summative evaluation of a training model designed by one of your colleagues. What phases of the summative evaluation use an external reviewer? What might a rubric look like that you could use during an external review?
As part of a group of instructional designers, project managers, teachers, students, administrators, and clients of the project, conduct an evaluation study to understand how well an instructional training has achieved the goals of the training program. Keep in mind the group project conducted in the previous development and implementation chapters as you assess how well the goals and objectives of the training program were met. To assist with the evaluation study, you may use the case scenario below for the group assignment:
A private educational firm approaches an instructional design training center to create a training package for undergraduate students on how to use a new video production software and evaluate how well it could have been implemented in the teaching and learning process across five different programs of a university over a period of one semester.
How would each of the stakeholders of the program evaluate the success of the training program (Keep in mind the three stages of evaluation while preparing your answer from each stakeholder’s viewpoint)?
Key Vocabulary Terms
formative evaluation – The process of ongoing evaluation throughout the design process for the betterment of design and procedure within each stage.
summative evaluation – The process of evaluation used to assess the effectiveness of learning, efficiency and cost effectiveness, and attitudes and reactions to learning.
confirmative evaluation – The process of evaluation employed to corroborate the validity and reliability of an instructional design component or program.
validity – The confirmation that an evaluation distinctly measures the learning of a program.
reliability – The confirmation that an evaluation consistently measures the learning of a program.
Table of Figures:
Figure 1: ADDIE Model of Design (Fav203, 2012)
Figure 2: Validity vs. Reliability Scatterplots.
Figure 3: The cycle of formative evaluation.
Figure 4: QM Review Cycle.
Alliger, G. M., & Janak, E. A. (1989). Kirkpatrick’s levels of training criteria: Thirty years later. Personnel Psychology, 42(2), 331-342.
Boston, C. (2002). The concept of formative assessment. ERIC Digest.
Dick, W., Carey, L., & Carey, J. O. (2009). The systematic design of instruction. Upper Saddle River, New Jersey: Pearson.
Dimitrov, D. M., & Rumrill, P. D., Jr. (2003). Pretest-posttest designs and measurement of change. Work: A Journal of Prevention, Assessment and Rehabilitation, 20(2), 159-165.
Duffy, T., & Cunningham, D. (1996). Constructivism: Implications for the design and delivery of instruction. Handbook of Research for Educational Communications and Technology, 170-198.
Fav203. (2012). ADDIE_Model_of_Design.jpg. Retrieved from http://commons.wikimedia.org/wiki/File:ADDIE_Model_of_Design.jpg#filelinks
Heritage, M. (2007). Formative assessment: What do teachers need to know and do? Phi Delta Kappan, 89(2), 140-145.
Hollingworth, L. (2012). Why leadership matters: Empowering teachers to implement formative assessment. Journal of Educational Administration, 50(3), 365-379. Retrieved from http://search.proquest.com/docview/1018479196?accountid=10920
Introduction to the Quality Matters Program. (2013). Quality Matters Program.
Krug, S. (2014). Don’t Make Me Think. New Riders.
May, W. (1986). Teaching students how to plan: The dominant model and alternatives. Journal of Teacher Education, 37(6), 6-12. Retrieved from http://search.proquest.com/docview/63247118?accountid=10920
Misanchuk, E. (1978). Beyond the formative-summative distinction. Journal of Instructional Development (2), 15-19.
Morrison, G. R., Ross, S. M., Kalman, H. K., & Kemp, J. E. (2013). Designing Effective Instruction (7th ed.). Hoboken, NJ: John Wiley & Sons.
Moseley, J., & Solomon, D. (1997). Confirmative evaluation: A new paradigm for continuous improvement. Performance Improvement, 36(5), 12-16.
Popham, W. (1993). Educational Evaluation (3rd ed.). Boston, MA: Allyn and Bacon.
Popham, W. (2008). Transformative Assessment. Alexandria, VA: Association for Supervision and Curriculum Development.
Reiser, R. (2001). A history of instructional design and technology: Part II: A history of instructional design. Educational Technology Research and Development, 49(2), 57-67. doi:10.1007/BF02504928
Reiser, R. A. (2012). Trends and issues in instructional design and technology. Boston, MA: Pearson.
Richey, R. C., Klein, J. D., & Tracey, M. W. (2011). The Instructional Design Knowledge Base. New York and London: Taylor & Francis.
Shepard, L. (1993). Evaluating test validity. Review of Research in Education, 405-450.
Stufflebeam, D. (1971). The relevance of the CIPP evaluation model for educational accountability. Retrieved from http://search.proquest.com/docview/64252742?accountid=10920
Trochim, W. (n.d.). Reliability & Validity. Retrieved from http://www.socialresearchmethods.net/kb/relandval.php
Wood, B. (2001). Stake’s countenance model: Evaluating an environmental education professional development course. Journal of Environmental Education, 32(2), 18-27. Retrieved from http://search.proquest.com/docview/62371885?accountid=10920