The article I selected for this critique is Benefits of Emotional Design in Multimedia Instruction by Richard E. Mayer and Gabriel Estrella. In this study, Mayer and Estrella (2014) attempt to determine whether emotional design in multimedia instruction has an effect on student learning. The research lists two hypotheses.

Hypothesis 1. The major prediction in this study is that students who receive multimedia lessons with graphics based on emotional design principles (enhanced group) should perform better on a learning outcome test (i.e., a comprehension test in which they write explanations) than students who receive the same lesson without emotional design (control group).

Hypothesis 2. A secondary prediction is that the enhanced group will report more effort and less difficulty than the control group, although the use of subjective self-report items to measure these variables provides only a limited and preliminary test.

Participants and Design: The study included two experiments. In Experiment 1, students received an 8-slide multimedia lesson on how viruses cause a cold and were limited to 5 minutes for viewing the lesson. The study uses a two-group design with an enhanced group and a control group. The enhanced group received instruction using a lesson whose graphics were designed according to emotional design principles. The control group was given the same set of slides, with the same time limit; however, their graphics were not enhanced with emotional design principles. In Experiment 2, students received the same lesson, with the same slides and the same instruments, but they were not restricted by time.

Participants for Experiment 1 were recruited from a paid subject pool and paid $10 for their participation. The mean age was 19.5 years, the mean knowledge score based on the participant questionnaire was 3.9 out of 12, the average class standing was sophomore, and the percentage of women was 76%. Thirty students served in the enhanced group, and 34 served in the control group. Participants for Experiment 2 were recruited from the Psychology Subject Pool and received class credit for their participation. The mean age was 18.6 years, the mean knowledge score based on the participant questionnaire was 4.9, the average class standing was freshman, and the percentage of women was 68%. Twenty-three students served in the enhanced group, and 24 served in the control group.

Instruments: The study uses three distinct instruments: a participant questionnaire, a posttest, and a post-questionnaire. The participant questionnaire is designed to gather basic demographic information as well as establish a baseline prior-knowledge score for each participant. The posttest is designed to determine what the student learned from the multimedia lesson. The post-questionnaire was designed to gather information about the students' perceived experience of participating in the lesson. The article includes very specific detail about each instrument, including instrument design, exact questions, and scoring methods.

Results: Hypothesis 1 addressed the question of whether adding emotional design features to a lesson could improve learning gains. The study shows that the enhanced group performed significantly better than the control group on the learning test in Experiment 1 and Experiment 2, but not on the transfer portion of the test. In both experiments, the groups did not differ significantly in their mean ratings of the lesson's appeal or of their desire for more similar lessons.

Hypothesis 2 addressed the question of whether emotional design had an effect on learners' effort and difficulty in completing the lesson. The results show that the two groups in Experiment 1 and Experiment 2 did not differ significantly in how they rated their experiences of affect, effort, or difficulty.

Reliability and Validity

Reliability addresses the quality of measurement and the consistency of results. Validity addresses the question of how well the test measures what you intended to measure. Reliability is a requirement, but it does not necessarily imply validity.

Have acceptable operational definitions been provided for all important terms in research hypotheses or questions? Tell why or why not.

Three terms are defined at the beginning of the document: emotional design, personification, and visual appeal. Each of these terms is adequately defined so that the reader will understand how it is used in the description of the research and results. The most important of these is emotional design, since it is the main focus of the study; knowing how the authors define emotional design is essential to judging whether the study adequately addresses it. I do not see any other terms in the hypotheses that would need further definition.

Have the formats of the questions and responses of all instruments been adequately described?

There are three instruments used in this research experiment. These are paper-based materials consisting of a participant questionnaire, six test sheets, and a post-questionnaire, each printed on 8.5 × 11-inch sheets of paper.

The participant questionnaire solicited information concerning the participant's age, gender, and year in school. In addition, students were asked to rate their knowledge, using a 5-point scale, on a list of 19 knowledge indicators. A knowledge score was computed by tallying the number of responses and ratings.

The posttest consisted of six sheets containing one retention essay question and five transfer essay items. The sheets were rated based on 19 idea units. The total test score was computed by adding the retention and transfer sheet scores. The post-questionnaire contained five items that students rated on a 5-point scale.

Mayer and Estrella were quite specific in their descriptions of each of the instruments. Their descriptions included the format of the instruments and the exact wording of all questions, rating scales, and scoring methods. It would be very easy to recreate this study from the descriptions in the article.

Have the appropriate data for essential psychometric characteristics, such as validity and reliability, been cited?

In asking whether the groups were equivalent on basic characteristics, Mayer and Estrella reported using appropriate statistical tests (t-test or chi-square with p < .05); however, they did not include these measures in the article. For inter-rater reliability, they reported reliability scores of 0.92 and 0.82 and described the process by which they resolved differences (consensus).

With the participant questionnaire, Mayer and Estrella noted that the knowledge questionnaire was similar to questionnaires used in previous research as a way to gauge the learner's level of prior knowledge without asking specific questions about the content of the lessons. They also explained their rationale for not including a pretest: to prevent a testing effect.

Mayer and Estrella do question the reliability of the post-questionnaire with regard to its ability to accurately measure motivation, effort, and difficulty. They based their concerns on whether students can effectively self-rate these aspects, and they recommended this as an area for additional study.


Kline, R. B. (2009). Becoming a Behavioral Science Researcher. New York, NY: Guilford.

Mayer, R. E., & Estrella, G. (2014). Benefits of emotional design in multimedia instruction. Learning and Instruction, 33, 12-18.

Trochim, W. M. (2009). Research Methods Knowledge Base. Retrieved September 15, 2014, from