The Importance of Replications in Strategic Message Design

Part of my research program focuses on strategic health message design to promote both persuasion and interpersonal information sharing. Strategic message design is viewed as an important element in addressing many health challenges. For example, effectively designed persuasive messages can promote regular exercise, fruit and vegetable consumption, sunscreen use, regular sleeping habits, safer sex practices, helmet use, and so forth. Likewise, specific message features can be included in health messages so that message recipients are more likely to share that health information with others, including their family members and friends. Health message designers should not, however, have to “grope around hoping to stumble across some way of making their messages more persuasive” (O’Keefe, 2015, p. 106). Instead, their choices for designing a health message that is both persuasive and diffusive should be informed by communication theory that is supported with empirical evidence.

Theoretical frameworks are useful in guiding health message design. For example, the Extended Parallel Process Model (Witte, 1992) posits that arousing fear in a message recipient is a strategic message choice that promotes health behavior adoption, but only when also accompanied by content that is designed to increase perceptions of response-efficacy (i.e., beliefs that the recommended action is effective at averting the threat) and self-efficacy (i.e., beliefs about whether the individual feels able to perform the recommended behavior). Additional frameworks can also be used to guide strategic health message design, including the Theory of Planned Behavior (Ajzen, 1988, 1991, 2011a, 2011b), the Integrative Model (Fishbein, 2000), Social Cognitive Theory (Bandura, 1986, 1995, 1997), and so forth.

These theoretical frameworks, however, are only likely to be useful if they are informed by empirical evidence about the persuasiveness of alternative message variations (O’Keefe, 2015). To continue with the above example, the Extended Parallel Process Model (Witte, 1992) posits that strong fear appeals include both strong threat and efficacy components; weak fear appeals do not include both of these elements (e.g., a weak fear appeal may include a strong threat component but no efficacy component or vice versa). If strong fear appeals are generally more persuasive (i.e., they promote positive health behavior changes) than weak fear appeals empirically, then health message designers can guide their choices based on this theory and its supporting evidence. This parallels evidence-based medicine, such that if drug A is generally more effective than drug B in treating a particular condition, medical professionals can guide treatment choices accordingly.

O’Keefe (2015) recently published an essay that articulates the evidentiary requirements for generalizations that support evidence-based persuasive message design. The first proposition offered by O’Keefe (2015) is that evidence should take the form of replicated randomized trials in which message features are varied. Little debate exists over the importance of randomized trials and variation in message features. Indeed, randomized trials are a reliable method for uncovering causal relationships, and message design research is often focused on experimentally varying specific message features to examine the subsequent effects on persuasive outcomes. The key idea in this proposition, therefore, is the importance of replication.

Replication is critically important in social scientific research. There is a long tradition of promoting replication broadly in the social sciences (see, e.g., Asendorpf et al., 2013; Campbell & Jackson, 1979; Collins, 1985; Kline, 2009; Koole & Lakens, 2012; Rosenthal, 1976, 1991; Schmidt, 2009), as well as specifically within communication (see, e.g., Benoit & Holbert, 2008; Boster, 2002; Kelly, Chase, & Tucker, 1979), and, even more specifically, within message design (see, e.g., Jackson & Jacobs, 1983; O’Keefe, 2015). Replications are important because they allow more definitive statements about the nature of the relationships studied in social sciences by enhancing the certainty (i.e., reliability) of the results. However, replication studies are not often published in communication, which undermines progress (Boster, 2002).

Many different scholars have offered typologies of replications; I focus here on two in particular: Asendorpf et al. (2013) and Schmidt (2009). Asendorpf et al. (2013) distinguish between data reproducibility, replicability, and generalizability. Reproducibility of results refers to the ability of researcher B to obtain the same results that researcher A reports in the original study, using the same raw data, codebook, and analyses. Reproducibility is a necessary, but not sufficient, requirement for replicability. In other words, an outside researcher must be able to obtain the same results with the data from an original study, but in itself that does not mean that the results have been replicated.

Replicability is the idea that the same finding can be found in other random samples drawn from a multidimensional space that contains important facets of the original research design. Indeed, Asendorpf et al. (2013) note that “replication is obtained if differences between the finding in the original study A and analogous findings in replication studies B are insubstantial and due to unsystematic error, particularly sampling error, but not to systematic error, particularly differences in the facets of the design” (p. 109). The underlying idea is that studies do not only sample participants, but also sample situations, operationalizations, and time points that can also be affected by sampling error. Replicability is therefore a necessary, but not sufficient, requirement for generalizability.

A research finding is generalizable if it does not depend on an originally unmeasured variable that has a systematic effect (e.g., a moderator). For example, undergraduate student samples in the social sciences often contain a high proportion of young adults, leaving unclear the extent to which the results obtained with these samples generalize to a population of varying ages (i.e., whether age is a moderating variable). If a research finding is shown across participants, situations, operationalizations, time points, and so forth (i.e., beyond a potential moderating effect), then it is generalizable.

Schmidt (2009) also classifies replications by differentiating between two types, direct replications and conceptual replications, which are analogous to Asendorpf et al.’s (2013) concepts of replicability and generalizability, respectively. A direct replication occurs when a new research study repeats all of the relevant aspects of an original study. In other words, a direct replication occurs when everything in the original study is exactly the same, except that a new, independent sample of the same size is taken from the same population (Cumming, 2008). A conceptual replication tests the hypotheses of an original study using a different study design (Schmidt, 2009). Direct replications therefore align with the conceptual goal of replicability offered above, whereas conceptual replications align more closely with a goal of generalizability.

Direct replications are important for several reasons. First, the chance that a given original empirical finding will occur again in a direct, independent, and explicitly replicated subsequent study is inherently unpredictable (Miller, 2009; Miller & Schwarz, 2011). For example, a given study is prone to idiosyncrasies in both procedural details and the means through which the experimenter interacts with participants, suggesting that different researchers might obtain different results due to some idiosyncratic detail associated with the original study (Koole & Lakens, 2012). Direct replications can therefore confirm or disconfirm an original set of findings (LeBel & Peters, 2011), whereas conceptual replications cannot. In this manner, direct replications “provide the strongest tests of the robustness of an original finding” (Koole & Lakens, 2012, p. 609). Rotello, Heit, and Dubé (2015), however, note the importance of recognizing incompatibilities among dependent measures, analysis tools, and the properties of data that can lead to fundamental interpretive errors in direct and conceptual replication attempts.
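One way to see why the chance of a successful direct replication is hard to predict is that it depends on the true effect size, which is unknown. The sketch below is only an illustration and is not drawn from the cited work: it assumes a two-group design, a two-sided alpha of .05, and a normal approximation to the t test, and it counts only significant results in the original direction. The function name `replication_power`, the sample size of 50 per group, and the candidate effect sizes are hypothetical choices.

```python
import math
from statistics import NormalDist


def replication_power(true_d, n_per_group, alpha=0.05):
    """Approximate chance that a two-group replication reaches significance.

    Uses a normal approximation to the two-sample t test and counts only
    results in the same direction as the true standardized effect (true_d).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the true effect.
    expected_z = true_d * math.sqrt(n_per_group / 2)
    return 1 - NormalDist().cdf(z_crit - expected_z)


# Hypothetical scenario: the same sample size, three plausible true effects.
for d in (0.1, 0.3, 0.5):
    print(f"true d = {d}: approximate replication probability = "
          f"{replication_power(d, n_per_group=50):.2f}")
```

Under these assumptions the probability ranges from well under one half to well over it across effect sizes that are all plausible for message effects, so a single original finding says little about how likely an exact repetition is to succeed.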

In message design research, an observed message effect suggesting that a particular variation in message features is more persuasive than another should be directly replicated to establish a robust, reliable, and empirically supported message effect. Additionally, message design researchers often explore potential moderators of theoretical message effects. However, as O’Keefe and Nan (2012) note, the only convincing evidence that a given variable moderates a message effect, whether in general or under a specified condition, involves replicated moderating effects of that variable. For example, in the gain-loss framing literature, several potential moderators have been proposed, including the effectiveness of the recommended action (Bartels et al., 2010), the amount of effort required to perform the recommended action (Gerend et al., 2008), and the ease of imagining disease symptoms (Broemer, 2004). However, each of these moderators has only a single supporting study (O’Keefe & Nan, 2012). Direct replications of these original studies would provide evidence that the original finding of a moderating variable is robust and directly replicable.

Direct replications, while important, are not enough to establish generalizability of message variation effects or moderating effects of message design principles. Even if a message effect or moderating effect is seen in one study and directly replicated, there is no guarantee that it will replicate with other messages, other situations, and so forth. Instead, conceptual replications, which refer to replications of an original study using a unique study design (Schmidt, 2009), promote generalizability. Indeed, “a study that keeps some features of the original and varies others can give a converging perspective, ideally both increasing confidence in the original finding and starting to explore variables that influence it” (Cumming, 2013, p. 4). I talk here about one type of conceptual replication of particular interest to strategic message design: message replications.

A single-message design does not enable generalizability to a category of messages, as there will always be some uncontrolled (i.e., unsystematic) variation among different examples of what may appear to be a single message category (Jackson & Jacobs, 1983). For example, consider the range of messages about outdoor exercise as a way to prevent and/or reverse overweight and obesity. There are some known, systematic sources of variation, including types (e.g., academic article, newspaper article, non-profit advocacy message, nutrition brochure) and function (e.g., inform, educate, persuade), but beyond systematic types, functions, and so forth exists variation that is unattributable to known systematic sources.

Some researchers in message design attempt to control all aspects of a message except for the variable of interest to account for this variation. For example, studies that examine Language Expectancy Theory (LET; Burgoon, 1990; Burgoon, Denning, & Roberts, 2002; Burgoon, Jones, & Stewart, 1975; Burgoon & Miller, 1985) often use a single message, and the only variation among message versions is the replacement of specific words with those deemed intense (in the intense conditions) or neutral (in the neutral conditions; see, e.g., Burgoon, Jones, & Stewart, 1975; Burgoon & King, 1974). However, as Jackson and Jacobs (1983) note, “What is needed is not control, but simultaneous assessment of many cases of each level of [a message feature], so that the comparison between two levels will reflect more than the difference between two individual – and in themselves, uninteresting – examples” (p. 173). Message replications are therefore a key part of conceptual replications to determine if a given message effect or moderating effect holds true across different cases (i.e., messages). Indeed, experimental examination of multiple members of a category (i.e., message replications) is necessary to generalize about the category of messages (Jackson & Jacobs, 1983; O’Keefe, 2015). Scholars have more recently started to include message replications within primary research, which promotes generalizability of the results beyond the specific message used (see, e.g., within LET, Worthington, Nussbaum, & Parrott, 2015).
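The logic of message replications can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from any cited study: it assumes that each message pair has its own true effect scattered around a category-level effect, that outcomes have unit variance, and that a fixed pool of participants is split evenly across messages and conditions. The function names and all numeric values (a category effect of 0.2, a between-message standard deviation of 0.3, 400 participants) are hypothetical.

```python
import math
import random
import statistics


def study_estimate(rng, n_messages, total_n, category_effect=0.2, message_sd=0.3):
    """Simulate one study's estimate of the category-level message effect."""
    n_per_cell = total_n // (2 * n_messages)
    # Standard error of a mean difference between two cells, unit-variance outcomes.
    se = math.sqrt(2.0 / n_per_cell)
    diffs = []
    for _ in range(n_messages):
        # Each message pair has its own true effect around the category effect.
        true_diff = rng.gauss(category_effect, message_sd)
        # The observed difference adds participant sampling error.
        diffs.append(rng.gauss(true_diff, se))
    return statistics.mean(diffs)


def spread_of_estimates(n_messages, total_n=400, n_studies=2000, seed=1):
    """Standard deviation of the estimate across many simulated studies."""
    rng = random.Random(seed)
    estimates = [study_estimate(rng, n_messages, total_n) for _ in range(n_studies)]
    return statistics.stdev(estimates)


single = spread_of_estimates(n_messages=1)
multiple = spread_of_estimates(n_messages=10)
print(f"spread with 1 message:   {single:.3f}")
print(f"spread with 10 messages: {multiple:.3f}")
```

With the same total number of participants, the estimate based on ten messages varies less from study to study than the single-message estimate in this setup, because message-specific idiosyncrasies average out instead of being mistaken for the category effect.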

In addition to message replications, conceptual replications in message design research should include conceptual replications of both message variation effects and moderating effects across different participant characteristics, cultures (see Dutta, 2007), general physical setting, control agent (i.e., the experimenter), specific task variables, and so forth (Schmidt, 2009). In this manner, conceptual replications serve as “proof that the experiment reflects knowledge that can be separated from the specific circumstance (such as time, place, or persons) under which it was gained” (Schmidt, 2009, p. 90).

The importance of replicated findings is magnified in studies with small sample sizes (O’Keefe & Nan, 2012). With a small sample, a statistically significant result requires a relatively large effect size, and, given the bias of publication in favor of statistically significant results, sample size and effect size are generally negatively correlated in the published research literature (Levine, Asada, & Carpenter, 2009). This suggests that a statistically significant result that is obtained in a study with a small sample size may exaggerate the true effect (O’Keefe & Nan, 2012). Indeed, Ioannidis (2008) notes, “When true discovery is claimed based on crossing a threshold of statistical significance and the discovery study is underpowered, the observed effects are expected to be inflated” (p. 640).
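The inflation of significant small-sample effects can be sketched with a short simulation. The example below is an illustration under assumptions of my own choosing rather than a reanalysis of any cited study: it assumes a true standardized effect of 0.3, normally distributed outcomes, and a normal approximation to the significance threshold (z > 1.96 in the predicted direction). The function name `significant_effects` and the sample sizes are hypothetical.

```python
import math
import random
import statistics


def significant_effects(true_d, n_per_group, n_studies=1000, seed=0):
    """Return the observed effect sizes of simulated studies that reach significance."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        pooled_sd = math.sqrt(
            (statistics.variance(control) + statistics.variance(treated)) / 2
        )
        d_obs = (statistics.mean(treated) - statistics.mean(control)) / pooled_sd
        # Test statistic for two equal-sized groups; 1.96 is a normal
        # approximation to the critical value, in the predicted direction only.
        if d_obs * math.sqrt(n_per_group / 2) > 1.96:
            kept.append(d_obs)
    return kept


small = significant_effects(true_d=0.3, n_per_group=20)
large = significant_effects(true_d=0.3, n_per_group=200)
print(f"n = 20 per group:  mean significant d = {statistics.mean(small):.2f}")
print(f"n = 200 per group: mean significant d = {statistics.mean(large):.2f}")
```

In the small-sample condition, only studies that happen to observe an effect roughly twice the true value can cross the threshold, so the average significant effect is necessarily inflated; in the large-sample condition the significant studies stay close to the true effect.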

How to Encourage Replications

In light of the importance of replication across social scientific fields, recent discussion has turned to why social scientists disregard replications, both direct and conceptual, and, more importantly, how best to encourage replications (e.g., Asendorpf et al., 2013; Boster, 2002; Koole & Lakens, 2012). First, social scientists may disregard replications because they are not incentivized to conduct them. For example, Koole and Lakens (2012) note that in psychology:

The current incentive structure… gives rise to a social dilemma in which researchers’ collective interests run in the opposite direction as the interests of individual researchers. The research community collectively benefits if researchers take the trouble to replicate each other’s work, because this improves the quality and reputation of the field by showing which observations are reliable. Yet, individually, researchers are better off by conducting only original research, because this will typically yield more publications and citations and thus ultimately greater rewards in terms of better jobs and more grant money. (p. 610)

Researchers in communication science are similarly incentivized, which limits individual message design researchers’ interest in conducting replications, even though message design research as a whole would benefit from such studies. Indeed, it is very difficult to publish replication studies in the social sciences (although the situation is different in the natural sciences; Boster, 2002; Schmidt, 2009).

Several recommendations have been made to address the concern that replications are disincentivized, including: one, replications should be published in high-impact journals in special sections or online editions; two, replications should be cited along with the original research through a co-citation process; and three, replications should be used as a teaching device in graduate seminars in experimental methods (Koole & Lakens, 2012; for an extended discussion of these recommended changes to the incentive structure, see the original article).

Second, social scientists may disregard replications due to well-known processes that distort the research results that appear in published form in favor of statistically significant results. Replications that confirm or fail to confirm an original set of findings are important from a scientific perspective; however, the general tendency across social scientific disciplines is to publish only confirmatory results (Fanelli, 2012; Levine et al., 2009; Sterling, Rosenbaum, & Weinkam, 1995). Asendorpf and colleagues (2013) note that in order to encourage replication studies, reviewers and editors should publish well-designed studies, regardless of whether or not they include null findings or results that run counter to the prespecified hypotheses.

An additional pragmatic concern relates to researchers’ inability to reproduce original findings and replicate an original study based on a lack of available methodological information. To determine if results are reproducible, a researcher must have access to the raw data; the codebook, including variable names and labels, value labels, and codes for missing data; and knowledge about the analyses that were performed (e.g., a syntax file of a statistics program; Asendorpf et al., 2013). Direct and conceptual replications require, at minimum, access to the codebook and knowledge about the analyses. In order to increase replications, additional information about the data and methods used in a study should therefore be made readily available on the Internet (Asendorpf et al., 2013; Evanschitzky et al., 2007). This can be in the form of an open repository (i.e., an open-access online data bank; see www.opendoar.org; Asendorpf et al., 2013), and individual researchers can also make such information available on their personal academic websites. As an additional benefit, some research suggests that papers that offered data in any form were cited twice as often as comparable papers without such availability of data, when controlling for type of article, co-authorship, age of paper, length of the paper, and characteristics of the author (Gleditsch, Metelits, & Strand, 2003).

In sum, “any assertion that cannot be demonstrated in a replication is not regarded as a scientific statement” (Schmidt, 2009, p. 92). Indeed, multiple replications are necessary, because a single confirmatory finding (or one that fails to confirm) provides little certainty that a true effect exists or not (Cumming, 2008). Our aim in strategic health message design is to establish empirically robust theoretical frameworks that are practically useful in effectively designing persuasive health messages (and, more recently, messages that are diffusive). To achieve this goal, we must make it a priority to replicate our findings, both directly and conceptually. Indeed, Kmetz (2002) wrote that:

After publishing nearly 50 years worth of work…the three terms most commonly seen in the literature are “tentative,” “preliminary,” and “suggest.” As a default, “more research is needed.” After all these years and studies, we see nothing of “closure,” “preponderance of evidence,” “replicable,” or “definitive.” (p. 62)

Instead of falling into the same sequence of events with message design research, let us instead work towards a cumulative knowledge base that provides “definitive” conclusions from replicated research.

References

Ajzen, I. (1988). Attitudes, personality and behaviour. Milton Keynes, UK: Open University Press.

Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior & Human Decision Processes, 50, 179-211.

Ajzen, I. (2011a). The theory of planned behavior. In P. A. M. van Lange, A. W. Kruglanski, & E. T. Higgins (Eds.), Handbook of theories of social psychology. London, UK: Sage.

Ajzen, I. (2011b). The theory of planned behavior: Reactions and reflections. Psychology and Health, 26(9), 1113-1127.

Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J. A., Fiedler, K.,…Wicherts, J. M. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27, 108-119.

Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice Hall.

Bandura, A. (1995). Exercise of personal and collective efficacy in changing societies. In A. Bandura (Ed.), Self-efficacy in changing societies (pp. 1-45). New York, NY: Cambridge University Press.

Bandura, A. (1997). Self-efficacy: The exercise of control. New York, NY: W. H. Freeman.

Bartels, R. D., Kelly, K. M., & Rothman, A. J. (2010). Moving beyond the function of the health behavior: The effect of the message frame on behavioural decision-making. Psychology and Health, 25, 821-838.

Benoit, W. L., & Holbert, R. L. (2008). Empirical intersections in communication research: Replication, multiple quantitative methods, and bridging the quantitative-qualitative divide. Journal of Communication, 58, 615-628.

Boster, F. J. (2002). On making progress in communication science. Human Communication Research, 28(4), 473-490.

Broemer, P. (2004). Ease of imagination moderates reactions to differently framed health messages. European Journal of Social Psychology, 34, 103-119.

Burgoon, M. (1990). Social influence. In H. Giles & P. Robinson (Eds.), Handbook of language and social psychology (pp. 51-72). London: Wiley.

Burgoon, M., Denning, V. P., & Roberts, L. (2002). Language expectancy theory. In J. P. Dillard & M. Pfau (Eds.), The persuasion handbook: Developments in theory and practice (pp. 117-136). London: Sage.

Burgoon, M., Jones, S. B., & Stewart, D. (1975). Toward a message-centered theory of persuasion: Three empirical investigations of language intensity. Human Communication Research, 1(3), 240-256.

Burgoon, M., & Miller, G. R. (1985). An expectancy interpretation of language and persuasion. In H. Giles & R. St. Clair (Eds.), Recent advances in language, communication and social psychology (pp. 199-229). London: Lawrence Erlbaum.

Burgoon, M., & King, L. B. (1974). The mediation of resistance to persuasion strategies by language variables and active-passive participation. Human Communication Research, 1, 30-41.

Campbell, K. E., & Jackson, T. T. (1979). The role and need for replication research in social psychology. Replications in Social Psychology, 1, 3-14.

Collins, H. M. (1985). Changing order: Replication and induction in scientific practice. Beverly Hills, CA: Sage.

Cumming, G. (2008). Replication and p interval: P values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 3, 286-300.

Cumming, G. (2013). The new statistics: Why and how. Psychological Science, 25, 7-29.

Dutta, M. J. (2007). Communicating about culture and health: Theorizing culture-centered and cultural sensitivity approaches. Communication Theory, 17(3), 304-328.

Evanschitzky, H., Baumgarth, C., Hubbard, R., & Armstrong, J. S. (2007). Replication research’s disturbing trend. Journal of Business Research, 60, 411-415.

Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891-904.

Fishbein, M. (2000). The role of theory in HIV prevention. AIDS Care, 12(3), 273-278.

Gerend, M. A., Shepherd, J. E., & Monday, K. A. (2008). Behavioral frequency moderates the effects of message framing on HPV vaccine acceptability. Annals of Behavioral Medicine, 35, 221-229.

Gleditsch, N. P., Metelits, C., & Strand, H. (2003). Posting your data: Will you be scooped or will you be famous? International Studies Perspectives, 4, 89-97.

Ioannidis, J. P. A. (2008). Why most discovered true associations are inflated. Epidemiology, 19, 640-648.

Jackson, S., & Jacobs, S. (1983). Generalizing about messages: Suggestions for design and analysis of experiments. Human Communication Research, 9, 169-181.

Kelly, C. W., Chase, L. J., & Tucker, R. K. (1979). Replication in experimental communication research: An analysis. Human Communication Research, 5(4), 338-342.

Kline, R. B. (2009). Becoming a behavioral science researcher: A guide to producing research that matters. New York, NY: The Guilford Press.

Koole, S. L., & Lakens, D. (2012). Rewarding replications: A sure and simple way to improve psychological science. Perspectives on Psychological Science, 7, 608-614.

Kmetz, J. L. (2002). The skeptic’s handbook: Consumer guidelines and a critical assessment of business and management research (rev. ed.). Retrieved April 16, 2015, from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=334180

LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem’s (2011) evidence of psi as a case study of defiance in modal research practice. Review of General Psychology, 15, 371-379.

Levine, T., Asada, K. J., & Carpenter, C. (2009). Sample sizes and effect sizes are negatively correlated in meta-analyses: Evidence and implications of a publication bias against non-significant findings. Communication Monographs, 76, 286-302.

Miller, J. (2009). What is the probability of replicating a statistically significant effect? Psychonomic Bulletin & Review, 16, 617-640.

Miller, J., & Schwarz, W. (2011). Aggregate and individual replication probability within an explicit model of research process. Psychological Methods, 16, 337-360.

O’Keefe, D. J. (2015). Message generalizations that support evidence-based persuasive message design: Specifying the evidentiary requirements. Health Communication, 30, 106-113.

O’Keefe, D. J., & Nan, X. (2012). The relative persuasiveness of gain- and loss-framed messages for promoting vaccination: A meta-analytic review. Health Communication, 27, 776-783.

Rosenthal, R. (1976). Experimenter effects in behavioral research. New York, NY: Irvington.

Rosenthal, R. (1991). Replication in behavioral research. In J. W. Neuliep (Ed.), Replication in the social sciences (pp. 1-30). Newbury Park, CA: Sage.

Rotello, C. M., Heit, E., & Dubé, C. (2015). When more data steer us wrong: Replications with the wrong dependent measure perpetuate erroneous conclusions. Psychonomic Bulletin & Review, 22(4), 944-954.

Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90-100.

Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical significance on the decision to publish and vice versa. American Statistician, 49, 108-112.

Witte, K. (1992). Putting the fear back into fear appeals: The extended parallel process model. Communication Monographs, 59, 329-349.

Worthington, A. K., Nussbaum, J. N., & Parrott, R. P. (in press). Organizational credibility: The role of issue involvement, value-relevant involvement, elaboration, author credibility, message quality, and message effectiveness in persuasive messages from organizations. Communication Research Reports.