The article I selected for this critique is Benefits of emotional design in multimedia instruction by Richard E. Mayer and Gabriel Estrella. In this study, Mayer and Estrella (2014) attempt to determine whether emotional design in multimedia instruction has an effect on student learning. The article lists two hypotheses.

Hypothesis 1. The major prediction in this study is that students who receive multimedia lessons with graphics based on emotional design principles (enhanced group) should perform better on a learning outcome test (i.e., a comprehension test in which they write explanations) than students who receive the same lesson without emotional design (control group).

Hypothesis 2. A secondary prediction is that the enhanced group will report more effort and less difficulty than the control group, although the use of subjective self-report items to measure these variables provides only a limited and preliminary test.

Participants and Design: The study included two experiments. In Experiment 1, students received an 8-slide multimedia lesson on how viruses cause a cold and were limited to 5 minutes for viewing it. The study uses a multiple group design with an enhanced group and a control group. The enhanced group received a lesson whose graphics were designed according to emotional design principles. The control group was given the same set of slides with the same time limit, but their graphics were not enhanced with emotional design principles. In Experiment 2, students received the same slides and the same instruments, but their viewing time was not restricted.

Instruments: The study uses three distinct instruments: a participant questionnaire, a post test, and a post questionnaire. The participant questionnaire is designed to gather basic demographic information and to set a baseline prior knowledge score for each participant. The post test is designed to determine what the student learned from the multimedia lesson. The post questionnaire is designed to gather information about the students’ perceived experience of the lesson. The article describes each instrument in detail, including its design, exact questions, and scoring methods.

Results: Hypothesis 1 (H1) addressed the question of whether adding emotional design features in a lesson could improve learning gains after the lesson. This study shows that the enhanced group performed significantly better than the control group on the learning test in Experiment 1 and Experiment 2, but they did not perform significantly better than the control group on the transfer part of the test. In Experiment 1 and in Experiment 2, the groups did not differ significantly on their mean rating for appeal of the lesson, or for their desire for more similar lessons.

Hypothesis 2 (H2) addressed the question of whether emotional design had an effect on learners’ effort and difficulty in completing the lesson. The groups did not differ significantly in their ratings of affect or difficulty in either experiment. For effort, the enhanced group reported significantly higher effort in Experiment 1 (p = .01), but the groups did not differ in Experiment 2 (p = .76).

Describe the statistical analyses that were used to address the hypotheses or questions. To what extent did the researcher(s) provide sufficient rationale for the selection of specific statistics?

The descriptive statistics provided include tables of means and standard deviations for retention, transfer, and total scores for both the enhanced (treatment) group and the control group.

The inferential statistics provided include the t statistic, p-value, and Cohen’s d for the retention, transfer, and total scores. The t-test is an independent two-sample test with 62 degrees of freedom in Experiment 1 and 45 degrees of freedom in Experiment 2. Cohen’s d is used to show the effect size for the difference in means. I do not see where Mayer and Estrella discuss their rationale for the selection of statistics, other than that these are commonly used statistics for evaluating a difference in means between two independent groups.

The results for Hypothesis 1 (H1) show that the enhanced (treatment) group performed significantly better than the control group. The statistical analysis is as follows:

| Measure (α = 0.05; power [1]) | Experiment 1 | Experiment 2 |
| --- | --- | --- |
| Total | t(62) = 2.67, p = .01, d = 0.69; reject H0; power = 0.86 | t(45) = 2.20, p = .03, d = 0.65; reject H0; power = 0.70 |
| Retention | t(62) = 2.56, p = .01, d = 0.69; reject H0; power = 0.86 | t(45) = 2.50, p = .02, d = 0.73; reject H0; power = 0.79 |
| Transfer | t(62) = 1.16, p = .25, d = 0.29 | t(45) = 0.83, p = .41 |

The results for Hypothesis 2 (H2) show that students in the treatment group did not rate appeal or desire for similar lessons any higher than the control group did (α = 0.05). They did, however, indicate a higher level of effort in Experiment 1, but the groups did not differ significantly on difficulty ratings. The statistical analysis for both experiments is as follows:

| Rating (α = 0.05; power [2]) | Experiment 1 | Experiment 2 |
| --- | --- | --- |
| Appeal of the lesson | p = .57; Control: M = 2.24, SD = 0.89; Treatment: M = 2.10, SD = 0.99 | p = .30; Control: M = 2.71, SD = 1.08; Treatment: M = 2.43, SD = 0.66 |
| Enjoyment of the lesson | p = .28; Control: M = 2.71, SD = 0.84; Treatment: M = 2.93, SD = 0.83 | Control: M = 2.47, SD = 0.88; Treatment: M = 2.70, SD = 0.92 |
| Desire for more similar lessons | p = .91; Control: M = 2.79, SD = 0.88; Treatment: M = 2.77, SD = 0.97 | p = .08; Control: M = 2.31, SD = 1.02; Treatment: M = 2.70, SD = 0.82 |
| Higher rating of effort | t(62) = 2.57, p = .01, d = 0.65; Control: M = 2.76, SD = 1.17; Treatment: M = 3.47, SD = 1.02; reject H0; power = 0.82 | t(45) = 0.31, p = .76; Control: M = 3.13, SD = 0.85; Treatment: M = 3.04, SD = 0.98 |
| Lower rating of difficulty | p = .33, d = 0.25; mean and standard deviation not reported | t(45) = 1.87, p = .07, d = 0.55; Control: M = 2.83, SD = 0.92; Treatment: M = 2.3, SD = 1.02 |
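
As a rough check on the Experiment 1 effort row, the t statistic and Cohen’s d can be recomputed from the reported means and standard deviations. The sketch below is my own illustration, not a calculation from the article. It assumes a pooled-variance (equal variances) t-test and the Experiment 1 group sizes of n = 30 (treatment) and n = 34 (control); the function name `pooled_t_and_d` is hypothetical.

```python
from math import sqrt


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Independent two-sample t (equal variances assumed) and Cohen's d.

    Group 1 is the treatment group, group 2 the control group.
    """
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = sqrt(pooled_var)
    d = (m1 - m2) / pooled_sd          # Cohen's d using the pooled SD
    t = d / sqrt(1 / n1 + 1 / n2)      # t follows from d and the group sizes
    return t, d, df


# Effort ratings, Experiment 1 (values taken from the table above).
t, d, df = pooled_t_and_d(3.47, 1.02, 30, 2.76, 1.17, 34)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Under these assumptions the recomputed values land close to the reported t(62) = 2.57 and d = 0.65. Small differences are expected because the tabled means and standard deviations are rounded.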

Do the researchers indicate a desired level of power for the study? Do the researchers discuss any measures taken to increase power or any threats that could result in a reduced level of power?

I do not see where they discuss the level of power. They do, however, report Cohen’s d for all t-tests where they reject H0.

Is the basis for the sample size indicated? Is evidence given that prior knowledge or pilot studies (power studies) were used to ensure the needed sensitivity?

I don’t see where they discussed the basis for sample size. For experiment 1, the sample size is n=30 for the treatment group and n=34 for the control group. For experiment 2, the sample size is n=23 for the treatment group and n=24 for the control group.

Using G*Power, I was able to compute post hoc achieved power from α, sample size, and effect size (Cohen’s d). See the tables above for the computed achieved power values.
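
For readers without G*Power, the post hoc power values can be approximated by hand. The sketch below is only an approximation under stated assumptions: it uses the normal distribution in place of the noncentral t distribution that G*Power uses, so its results will differ slightly from the tabled values. It also defaults to a one-tailed test, which is my assumption because the critique does not state the number of tails; pass `tails=2` for a two-tailed test. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n1, n2, alpha=0.05, tails=1):
    """Normal-approximation power for an independent two-sample t-test."""
    z = NormalDist()  # standard normal distribution
    # Noncentrality parameter implied by the effect size and group sizes.
    delta = d * sqrt(n1 * n2 / (n1 + n2))
    if tails == 1:
        return z.cdf(delta - z.inv_cdf(1 - alpha))
    crit = z.inv_cdf(1 - alpha / 2)
    # Two-tailed power adds the (usually tiny) rejection area in the far tail.
    return z.cdf(delta - crit) + z.cdf(-delta - crit)


# Total score: Experiment 1 (d = 0.69, n = 30 and 34), Experiment 2 (d = 0.65, n = 23 and 24).
power_exp1 = approx_power(0.69, 30, 34)
power_exp2 = approx_power(0.65, 23, 24)
power_exp1_two_tailed = approx_power(0.69, 30, 34, tails=2)
print(round(power_exp1, 2), round(power_exp2, 2), round(power_exp1_two_tailed, 2))
```

The two-tailed value is noticeably lower than the one-tailed value for the same inputs, so the choice of tails matters when interpreting the achieved power.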

There is reference to prior studies. This study is essentially a replication of the studies conducted by Um et al. (2012) and Plass et al. (2014). However, Mayer and Estrella do not include any reference to sample size or power in their discussion of these studies.

Is there any reason to suspect a threat due to the inflation of the Type 1 error rate? If so, please state the reason.

No. This is an experimental design with a sample size of over 30 in each experiment. Both experiments produced similar results. Mayer and Estrella report Cohen’s d for their t-tests. In all instances where they reject the null hypothesis, Cohen’s d is at least 0.65, which indicates moderate to high practical significance.


Mayer, R. E., & Estrella, G. (2014). Benefits of emotional design in multimedia instruction. Learning and Instruction, 12-18.

Plass, J. L., Heidig, S., Hayward, E. O., Homer, B. D., & Um, E. (2014). Emotional design in multimedia learning: Effects of shape and color on affect and learning. Learning and Instruction, 29, 128-140.

Um, E. R., Plass, J. L., Hayward, E. O., & Homer, B. D. (2012). Emotional design in multimedia learning. Journal of Educational Psychology, 104(2), 485-498. doi: 10.1037/a0026609

[1] Power calculations were computed using G*Power and α=0.05, sample size, and the reported Cohen’s d. These calculations are not in the original article.

[2] Power calculations were computed using G*Power and α=0.05, sample size, and the reported Cohen’s d. These calculations are not in the original article.