To provide evidence-based therapy, dental clinicians need to stay abreast of current information that can impact their therapeutic decisions. Articles published in journals provide the most common source of recent data. Any scientific study, however, must be critically appraised and therapists need to decide which facts are important. For instance, when outcomes of clinical trials are published, practitioners are often left to ponder the clinical relevance of statistically significant results. Therefore, it is imperative that practitioners determine whether or not treatments provide clinically meaningful benefits with respect to efficacy, time, and safety. This article addresses what content should be included in each section of a clinical trial and discusses how data can be evaluated so its merits and/or shortcomings are recognized, correctly interpreted, and appropriately applied to patients.
Hierarchy of Evidence
Dental research articles provide different levels of evidence based on their design, validity, and applicability to patient care (Table 1).1 Data from a systematic review of a topic, which contains a meta-analysis of all relevant randomized clinical trials (RCTs) or three or more RCTs of good quality with similar results, furnishes the highest level of proof upon which to base therapy (level I) (Table 1).1 For an individual study, a carefully planned RCT supplies a high level of evidence (level II), and a well-planned controlled trial minus randomization is considered level III (Table 1). Topics covered in this article pertain to levels of evidence presented in Table 1.
A research article consists of four main sections: introduction, methods and materials, results, and discussion. Also, an abstract appears below the article title, and the author(s) may choose to include a conclusion section. The introduction explains why the research was performed, and the methods and materials section delineates steps taken to perform the study. Generated data are listed in the results segment, while the discussion section explains the meaning of outcomes and potential limitations of the presented data. Understanding the value and deficiencies of each article subdivision helps the reader decide if the article has clinical utility.
The introduction of an article should explain the reason for doing the study. With respect to the topic of interest, it ought to provide a chronology of previous evaluations and clarify why additional information is needed. Typically, primary and secondary study objectives are presented in the last sentence of the introduction.
Methods and Materials Segment
This section discusses how the clinical trial was performed and describes the process of patient enrollment. Studies need to clearly define the study population, specifying inclusion and exclusion criteria used to recruit individuals and the methodology employed to enroll subjects. Inadequate definition of eligible patients can result in data that may not be applicable to a general population. For instance, a study that excludes patients who smoke or are diabetic may not demonstrate relevant results for these specific populations.
Treatment methods, if applicable, should be defined in detail and include how and when outcome evaluations were recorded for experimental groups. Where measurements were made (eg, line angle of tooth) needs to be denoted so that assessments can be repeated.
This section should include a description of clinical measurements and of radiographic errors that can occur. If study measurements were performed by more than one individual, examiners should be calibrated and inter-examiner reliability should be assessed to minimize variations caused by different evaluators. If the study is a long-term trial, intra-examiner reliability should be appraised to ensure assessments are standardized over time. The protocol followed after therapy should also be addressed.
Errors in the design and study methods can cause unintended bias and create inaccurate results that are difficult to interpret. Subjective evaluations are impossible to quantify and cannot be statistically assessed. Clinical trials provide a high level of evidence, and appropriate test and control groups must be identified. In RCTs, therapies are randomized: participants are assigned by chance to separate groups that are given different treatments. Neither the researcher nor the participant should be able to choose which therapy patients receive. In addition, blinding of examiners is necessary to avoid evaluator bias. The methods and materials section is critically important, because if the reader does not agree with the premises of a study, he or she should not accept its conclusions.
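As a minimal sketch of assignment by chance, the following Python snippet shuffles a list of participants and deals them into a test group and a control group. The function name `randomize`, the group labels, and the fixed seed are illustrative assumptions; it is not meant to represent the allocation procedure of any particular trial.

```python
import random

def randomize(participant_ids, groups=("test", "control"), seed=None):
    """Assign each participant to a group by chance, keeping group sizes balanced."""
    rng = random.Random(seed)        # the seed only makes this illustration reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)            # chance, not the researcher or patient, sets the order
    # Deal the shuffled participants out to the groups in turn.
    return {pid: groups[i % len(groups)] for i, pid in enumerate(shuffled)}

# Hypothetical example: 20 enrolled participants numbered 1 to 20.
allocation = randomize(range(1, 21), seed=42)
group_sizes = {g: list(allocation.values()).count(g) for g in ("test", "control")}
print(group_sizes)
```

Because allocation depends only on the shuffle, neither the researcher nor the participant can influence which therapy is received.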
It should also be noted that, in accordance with US Food and Drug Administration regulations, research involving human subjects requires Institutional Review Board approval, and this consent should be stated in the methods and materials section.
Statistical procedures used to evaluate study results are typically presented in the methods and materials section. In a controlled clinical trial, these statistical tests are used to determine if a difference exists between experimental groups. Every study has both a research hypothesis and a null hypothesis.2-4 The research hypothesis considers if a treatment provides a clinically meaningful improvement, whereas the null hypothesis is a statistical assessment and assumes there is no difference between a test group and a control group. Therefore, when a study is done and there is a statistically significant difference between experimental groups, the null hypothesis is rejected.
Numerous statistical techniques are available to assess data from clinical trials; however, such a discussion is beyond the scope of this article.4 Nonetheless, several frequently used tests are mentioned. Student's t-test is often, but not solely, employed to determine if there is a difference between two experimental groups.4 When comparisons are done among more than two groups or time periods, an analysis of variance (ANOVA) test is conducted to determine if any variations exist among the groups, and then post-hoc tests are employed to assess if dissimilarity occurs between two individual groups within the analysis.5 In some articles the terms "parametric" and "non-parametric" tests are used. Parametric assessments (eg, t-test, ANOVA) assume that the population data are normally distributed, whereas non-parametric tests (eg, Mann-Whitney, Kruskal-Wallis) do not.6 When the data are presented as frequencies of events, such as implant failures, a chi-square analysis or survival analysis can be performed.7 A chi-square test delineates whether two variables are independent of one another, while a survival analysis is utilized to evaluate the expected length of time until one event occurs, such as death in biological organisms or failure in mechanical systems.7 Ultimately, the type of statistical test employed to determine if a difference exists between groups is based on the kind of variable being studied, data distribution, and how many groups are being studied.
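The selection logic described above can be summarized, in simplified form, as a small lookup. The function `suggest_test` and its argument names are hypothetical, and the sketch only covers the tests named in this section; it is a reading aid, not a substitute for statistical advice.

```python
def suggest_test(data_kind, n_groups, normally_distributed=True):
    """Map the kind of variable, data distribution, and number of groups to a test.

    data_kind: "continuous" for measurements, "frequency" for counts of events
    (eg, implant failures). Only the tests named in the text are covered.
    """
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")
    if data_kind == "frequency":
        return "chi-square test or survival analysis"
    if data_kind != "continuous":
        raise ValueError("data_kind must be 'continuous' or 'frequency'")
    if n_groups == 2:
        # Parametric test if normality is assumed, non-parametric otherwise.
        return "Student's t-test" if normally_distributed else "Mann-Whitney test"
    # More than two groups or time periods.
    return "ANOVA with post-hoc tests" if normally_distributed else "Kruskal-Wallis test"

print(suggest_test("continuous", n_groups=3, normally_distributed=False))
```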
In statistical testing, it is desirable to minimize the probability of two types of error: type I (false positive) and type II (false negative); every clinical trial should state its acceptable boundaries for these mistakes.3,4 A type I error occurs when the null hypothesis is rejected even though it is true. In other words, a difference is found between the experimental groups, but none really exists. The largest probability of a type I error that is usually tolerated is denoted as the significance level (also called the alpha level or level of statistical significance); the P value (probability value) calculated from the study data is compared against this threshold. Usually, this level is set at .05 (5%) and represents the error that is acceptable when different therapies are compared, assuming fundamental assumptions in the study are true.3,4 A "statistically significant" result (P < .05) implies that a difference at least this large would occur less than 5% of the time if the null hypothesis were true. In other words, if there were truly no difference between the therapies and the experiment were done 100 times, about five of those experiments would be expected to show a statistically significant difference by chance alone.
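A short simulation can make the meaning of a 5% type I error rate concrete. In this sketch, both groups are drawn from the same population, so any "significant" difference is a false positive. The function name, the group size of 30, the number of simulated trials, and the use of a simple z-test with a known standard deviation of 1 are all assumptions chosen to keep the example short.

```python
import random
from statistics import NormalDist, mean

def false_positive_rate(n_trials=2000, n_per_group=30, alpha=0.05, seed=1):
    """Simulate trials in which the null hypothesis is true and count 'significant' results."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided cutoff, about 1.96 for alpha = .05
    se = (2 / n_per_group) ** 0.5                  # standard error of the difference (SD = 1 in both groups)
    hits = 0
    for _ in range(n_trials):
        # Test and control values come from the same population: no real difference exists.
        test = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(test) - mean(control)) / se
        if abs(z) > z_crit:
            hits += 1                              # null hypothesis rejected although it is true
    return hits / n_trials

rate = false_positive_rate()
print(f"Proportion of false positives: {rate:.3f}")   # expected to be close to 0.05
```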
It is interesting to note that the P < .05 (5%) significance level, which is typically used in scientific studies, was arbitrarily adopted.8 It initially was recommended by Fisher, who noted that measures of 1.96 standard deviations on either side of the mean of a Gaussian curve included 95% of the data.8 He concluded that the outermost 5% of the data were unusual and significant in their divergence. Subsequently, this significance level was adopted because it was expedient, and it became ingrained as scientific dogma. Fisher later stated that this fixed level of significance was "absurdly academic" and that it should be flexible based on the evidence.8 For instance, if a large clinical effect is found that has a P value of .06 or .07 (not statistically significant because it is >.05), it may be imprudent to disregard a possibly clinically significant finding. Note that while the threshold value of P < .05 is arbitrary and commonly used in studies, the significance level is sometimes set at other values, such as .10 or .01. Selecting an alternate threshold can make the "significance test" more stringent by setting it at .01 (1%) or less stringent by selecting .10 (10%). A significance level of 1% (.01) usually requires a larger sample size to show statistical significance.
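The relationship between 1.96 standard deviations and the 5% level can be checked directly with the standard normal distribution. The snippet below is an illustrative sketch using Python's standard library; the variable names are assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a Gaussian curve lying within 1.96 standard deviations of the mean.
coverage = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(round(coverage, 3))   # 0.95

# Two-sided cutoffs (in standard deviations) for the other thresholds mentioned.
cutoffs = {alpha: std_normal.inv_cdf(1 - alpha / 2) for alpha in (0.10, 0.05, 0.01)}
for alpha, cutoff in cutoffs.items():
    print(f"alpha = {alpha:.2f}: +/- {cutoff:.2f} standard deviations")
```

A more stringent threshold pushes the cutoff farther from the mean, which is one way to see why a 1% level generally demands more data than a 5% level.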
It also should be underscored that the finding of no statistically significant difference between two therapies does not "prove" the therapies are equivalent.9 To be able to reach that conclusion, specific types of statistical tests termed "equivalency tests" must be used; these tests have different null hypotheses, criteria for a power analysis, and statistical techniques.9 Failure by clinicians to discriminate between clinical trials that assess statistical differences or equivalences may support inappropriate therapy in clinical practice.9
Clinicians need to recognize that a statistically significant result does not imply the result is clinically significant.10 Studies often focus on reporting a statistically significant difference between a test and control group instead of discussing if there is a clinically significant dissimilarity between the groups. The finding of a statistically significant difference only means that the likelihood that the difference occurred by chance is small.3 It does not denote the difference is large or important.3,10 On the other hand, clinical significance is determined by clinicians looking at the magnitude of the problem being treated and the expected outcome. An important query with respect to clinical significance is, how small an improvement can still be clinically important? There is no exact answer to this question in biomedical science regarding outcome variables, because the answer will be tailored for each situation. A clinically meaningful result is one that alters the way a patient is treated. Clinical significance may be interpreted differently depending on one's perspective: researcher, clinician, patient, industry, or regulatory agency.10 It also should be noted that authors of studies may manipulate conclusions. For example, it may be reported that a treatment decreased the failure rate by 50%. Yet, if the failure rate was two out of 1,000 and treatment reduced failure to one out of 1,000, then the author's statement likely lacks clinical significance despite being mathematically correct.
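The arithmetic behind the 50% example can be worked through as follows. This is a minimal sketch; the function `risk_reduction` and its field names are hypothetical, and the figures are the two-in-1,000 and one-in-1,000 failure rates used above.

```python
def risk_reduction(control_failures, control_n, treated_failures, treated_n):
    """Return the relative and absolute reduction in failure rate."""
    control_rate = control_failures / control_n
    treated_rate = treated_failures / treated_n
    absolute = control_rate - treated_rate     # difference in failure rates
    relative = absolute / control_rate         # the same difference as a share of the control rate
    return {"relative": relative, "absolute": absolute}

result = risk_reduction(2, 1000, 1, 1000)
print(f"Relative reduction: {result['relative']:.0%}")
print(f"Absolute reduction: {result['absolute'] * 1000:.0f} failure per 1,000 patients")
```

Both figures describe the same data: the relative reduction is one half, while the absolute reduction is a single failure avoided per 1,000 patients treated.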
Ideally, a clinically significant result should also be a statistically significant finding, meaning that the difference has a low probability of occurring by chance.10,11 Some researchers have contended, however, that this concept may not be true in all situations when other factors are considered, such as length of therapy, safety, cost, and the fact that interpreting clinical significance is a subjective evaluation, not a mathematical one.12 Also, others have suggested that clinical significance can be found despite the absence of statistical significance.13,14 For example, if confidence intervals pertaining to primary or secondary variables manifest values that mirror effect sizes that are meaningful to patients, then clinical significance of the results could be considered.13
The second type of error, type II (false negative error), occurs when the null hypothesis is not rejected, but an actual difference in results exists.3,4 The most common cause of type II error is an inadequate number of subjects within the study to ascertain whether a difference exists.15 To avoid this dilemma, a power analysis should be done before initiating a study to determine how many patients need to be enrolled to limit the probability of a type II error.15 There is no fixed convention as to what is an acceptable type II error for a study. Many articles use a 20% probability for type II error when performing a power analysis.15,16 In other words, the statistical power is often set at 80% (the probability that the test will correctly lead to the rejection of a null hypothesis when it is false, or the probability of a type II error not occurring). A power analysis consists of several steps.15,16 First, the desired size of the difference in an outcome variable is selected, along with its standard deviation. Then a significance level (eg, .05) and the statistical power (eg, 0.8) are picked. Finally, an online sample size/power calculator can be consulted, which will indicate the number of individuals that need to be enrolled for the selected outcome, statistical power, and significance level to demonstrate a statistically significant difference, if it exists.15
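To illustrate what such a calculator does, the sketch below applies a common normal-approximation formula for comparing the means of two equal-sized groups: n per group = 2 × ((z for the significance level + z for the power) × standard deviation / difference)². The choice of this formula, the function name, and the example values (a 1.0 mm difference with a 2.0 mm standard deviation) are assumptions for illustration; dedicated calculators may use more exact methods and give slightly different numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a given difference in means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = z.inv_cdf(power)           # power = 1 - probability of a type II error
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)                  # round up to whole subjects

# Hypothetical outcome: detect a 1.0 mm difference when the standard deviation is 2.0 mm.
print(sample_size_per_group(difference=1.0, sd=2.0))   # 63
```

Halving the difference to be detected roughly quadruples the required enrollment, which is why small expected effects call for large trials.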
Study results can be delineated in the text, with or without graphs and/or tables. Statistically significant differences with respect to assessed variables between experimental groups should be easily identifiable.
Deficiencies in reporting results can be misleading and can occur for various reasons. If a study has only a few subjects, it is called a pilot or proof-of-principle study.17 These types of assessments are preliminary evaluations to determine if a treatment method has validity without committing to a large or expensive clinical trial. If the pilot study demonstrates effective therapy, then a larger study needs to be performed and the study size should be defined by a power analysis before treatment data are extrapolated to patients.
Readers need to assess if there is consistency among the text, tables, and graphs in the results section. Researchers need to account for all patients who started the study. A high dropout rate in the test or control groups can invalidate results. A general rule is that <5% attrition leads to little bias, while >20% attrition poses serious threats to study validity.18 Inappropriate statistical methods also can prejudice a clinical trial. For instance, when assessing results, any questions with respect to the unit for statistical analysis (eg, tooth sites versus patients) need to be addressed. In general, the patient is the preferred unit of analysis, because sites do not account for the biology of patients.19 For example, a patient who is a poor healer and loses multiple dental implants in a study can skew results. If sites are used as the unit of analysis, statistical techniques (eg, generalized estimating equation) can be utilized to adjust for multiple treatment sites being included for an individual.20 Another technique that may be used if sites are employed is to increase the sample size to dilute the effects of individual patients. Table 2 describes additional issues of concern with respect to interpreting data presented in scientific articles.3,10,21-30 Euphemisms for statistical non-significance that have been used in the dental literature are presented in Table 3.21
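The attrition rule of thumb mentioned above (<5% and >20%) can be expressed as a simple check. The function name and the wording of the labels are hypothetical, and the label for the range between 5% and 20% is an assumption, because the rule as stated does not characterize that range.

```python
def attrition_check(enrolled, completed):
    """Classify the dropout rate of a study group using the <5% / >20% rule of thumb."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("completed must be between 0 and the number enrolled")
    rate = (enrolled - completed) / enrolled
    if rate < 0.05:
        label = "little bias expected"
    elif rate > 0.20:
        label = "serious threat to validity"
    else:
        label = "intermediate, judge in context"   # assumed label; not defined by the rule
    return rate, label

# Hypothetical group: 100 patients enrolled, 75 completed the study.
rate, label = attrition_check(enrolled=100, completed=75)
print(f"{rate:.0%} attrition: {label}")
```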
In the discussion section, authors should interpret their findings and address the data's statistical significance and clinical relevance. To understand the scope of new information, researchers should compare their findings to earlier works and discuss similarities and differences. Appropriate conclusions should be drawn. Authors can editorialize in the discussion section; therefore, readers need to differentiate fact from opinion. The reader must decide if deductions in the discussion section reflect author bias and assess whether the data support the article's conclusions. No new data should be introduced in the discussion section that was not delineated in the results portion. If appropriate, authors should suggest next steps for future research. Ultimately, the reader needs to decide if the results are meaningful and whether they could affect patient treatment.
This article underscores the need for clinicians to carefully analyze published clinical trials. Interpretation of results should focus on the clinical importance of findings, not on statistical assessments that merely indicate differences were unlikely to have occurred by chance. Ultimately, only therapies that provide clinically significant improvements should be applied to patients. The clinician plays a critical role in defining what is a clinically meaningful result and should relate various monitored clinical parameters to the goals of therapy. The clinician also might consider the size of the effects, time needed for therapy, ease of implementation, cost, side effects, duration of results, and consumer acceptability.
About the Authors
Gary Greenstein, DDS, MS
Former Clinical Professor, Department of Periodontology, College of Dental Medicine, Columbia University, New York, New York; Private Practice, Freehold, New Jersey
John Grbic, DMD, MS, MMSc
Professor of Dental Medicine, Senior Associate Dean for Faculty Affairs, and Director, Division of Foundational Sciences, College of Dental Medicine, Columbia University, New York, New York
Queries to the author regarding this course may be submitted to firstname.lastname@example.org.
1. Evidence Based Practice Toolkit. Winona State University website. https://libguides.winona.edu/ebptoolkit/Levels-Evidence. Accessed July 6, 2023.
2. Kramer MS. Clinical Epidemiology and Biostatistics. New York, NY: Springer-Verlag; 1988:137-145.
3. Riegelman RK, Hirsch RP. Studying a Study and Testing a Test: How to Read the Medical Literature. 2nd ed. Boston, MA: Little, Brown and Co.; 1989:25-46.
4. Marshall E. The Statistics Tutor's Quick Guide to Commonly Used Statistical Tests. https://www.statstutor.ac.uk/resources/uploaded/tutorsquickguidetostatistics.pdf. Accessed July 6, 2023.
5. Park E, Cho M, Ki CS. Correct use of repeated measures analysis of variance. Korean J Lab Med. 2009;29(1):1-9.
6. Kitchen CM. Nonparametric vs parametric tests of location in biomedical research. Am J Ophthalmol. 2009;147(4):571-572.
7. Bagdonavičius V, Levuliene R, Nikulin MS, Tran QX. On chi-squared type tests and their applications in survival analysis and reliability. J Math Sci. 2014;199:88-99.
8. Feinstein AR. Clinical biostatistics. LI. Quantitative significance and statistical indexes for a contrast of two groups. Clin Pharmacol Ther. 1980;27(4):567-578.
9. Greene WL, Concato J, Feinstein AR. Claims of equivalence in medical research: are they supported by the evidence? Ann Intern Med. 2000;132(9):715-722.
10. Greenstein G. Clinical versus statistical significance as they relate to the efficacy of periodontal therapy. J Am Dent Assoc. 2003;134(5):583-591.
11. Kingman A. Statistical vs clinical significance in product testing: can they be designed to satisfy equivalence? J Public Health Dent. 1992;52(6):353-360.
12. Sharma H. Statistical significance or clinical significance? A researcher's dilemma for appropriate interpretation of research results. Saudi J Anaesth. 2021;15(4):431-434.
13. Page P. Beyond statistical significance: clinical interpretation of rehabilitation research literature. Int J Sports Phys Ther. 2014;9(5):726-736.
14. Hawkins AT, Samuels LR. Use of confidence intervals in interpreting nonstatistically significant results. JAMA. 2021;326(20):2068-2069.
15. Sullivan GM, Feinn R. Using effect size-or why the P value is not enough. J Grad Med Educ. 2012;4(3):279-282.
16. Godwin M. Hypothesis: the research page. Part 3: power, sample size, and clinical significance. Can Fam Physician. 2001;47:1441-1453.
17. Lancaster GA, Dodd S, Williamson PR. Design and analysis of pilot studies: recommendations for good practice. J Eval Clin Pract. 2004;10(2):307-312.
18. Dettori JR. Loss to follow-up. Evid Based Spine Care J. 2011;2(1):7-10.
19. Gunsolley JC, Chinchilli VN, Savitt ED, et al. Analysis of site specific periodontal bacteria sampling schemes. J Periodontol. 1992;63(6):507-514.
20. Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44(4):1049-1060.
21. Minitab Blog Editor. What can you say when your P-value is greater than 0.05? December 3, 2015. Minitab website. https://blog.minitab.com/en/understanding-statistics/what-can-you-say-when-your-p-value-is-greater-than-005. Accessed July 6, 2023.
22. Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019;8:e48175.
23. Kamangar F. Confounding variables in epidemiologic studies: basics and beyond. Arch Iran Med. 2012;15(8):508-516.
24. Abt E, Gold J, Frantsve-Hawley J. Issues of bias and confounding in clinical studies. Pocket Dentistry website. https://pocketdentistry.com/issues-of-bias-and-confounding-in-clinical-studies/. Accessed July 6, 2023.
25. Armstrong RA, Hilton AC. Post hoc tests. In: Statistical Analysis in Microbiology: Statnotes. John Wiley & Sons, Inc.; 2010:34-36. https://onlinelibrary.wiley.com/doi/10.1002/9780470905173.ch7. Accessed July 6, 2023.
26. Collapsing Data. LearnR4DS website. 2019. https://learnr4ds.com/html/collapsing-data.html. Accessed July 6, 2023.
27. Schober P, Boer C, Schwarte LA. Correlation coefficients: appropriate use and interpretation. Anesth Analg. 2018;126(5):1763-1768.
29. Greenstein G, Lamster I. Understanding diagnostic testing for periodontal diseases. J Periodontol. 1995;66(8):659-666.
30. Liu JX, Werner J, Kirsch T, et al. Cytotoxicity evaluation of chlorhexidine gluconate on human fibroblasts, myoblasts, and osteoblasts. J Bone Jt Infect. 2018;3(4):165-172.