A systematic review of comparisons of effect sizes derived from randomised and non-randomised studies.

Title: A systematic review of comparisons of effect sizes derived from randomised and non-randomised studies.
Publication Type: Journal Article
Year of Publication: 2000
Authors: MacLehose RR, Reeves BC, Harvey IM, Sheldon TA, Russell IT, Black AM
Journal: Health technology assessment (Winchester, England)
Date Published: 2000
Keywords: Breast Neoplasms; Clinical Trials as Topic; Decision Making; Female; Folic Acid; Great Britain; Humans; Male; Mammography; Mass Screening; Neural Tube Defects; Outcome and Process Assessment (Health Care); Pregnancy; Quality Control; Randomized Controlled Trials as Topic; Research Design; Sensitivity and Specificity; Technology Assessment, Biomedical

BACKGROUND: There is controversy about the value of evidence on the effectiveness of healthcare interventions from non-randomised study designs. Advocates for quasi-experimental and observational (QEO) studies argue that evidence from randomised controlled trials (RCTs) is often difficult or impossible to obtain, or is inadequate to answer the question of interest. Advocates for RCTs point out that QEO studies are more susceptible to bias and refer to published comparisons that suggest QEO estimates tend to show a greater benefit than RCT estimates. However, comparisons from the literature are often cited selectively, may be unsystematic and may have failed to distinguish between different explanations for any discrepancies observed.

OBJECTIVES: The aim was to investigate the association between methodological quality and the magnitude of estimates of effectiveness by systematically comparing estimates of effectiveness derived from RCTs and QEO studies. Quantifying any such association should help healthcare decision-makers to judge the strength of evidence from non-randomised studies. Two strategies were used to minimise the influence of differences in external validity between RCTs and QEO studies: (1) a comparison of the RCT and QEO study estimates of effectiveness of any intervention, where both estimates were reported in a single paper; and (2) a comparison of the RCT and QEO study estimates of effectiveness for specified interventions, where the estimates were reported in different papers. The authors also sought to identify study designs that have been proposed to address one or more of the problems often found with conventional RCTs.

METHODS: DATA SOURCES: Relevant literature was identified from the Cochrane Library, MEDLINE, EMBASE, DARE and the Science Citation Index; from the references of relevant papers already identified; and from experts. Electronic searches were very difficult to design and yielded few papers for the first strategy and when identifying study designs.
CHOICE OF INTERVENTIONS TO REVIEW FOR STRATEGIES 1 AND 2: For strategy 1, any intervention was eligible. For strategy 2, interventions for which the population, intervention and outcome investigated were anticipated to be homogeneous across studies were selected for review: (1) mammographic screening of women to reduce mortality from breast cancer (MSBC); and (2) folic acid supplementation (FAS) in women trying to conceive, to prevent neural tube defects.

DATA EXTRACTION AND QUALITY ASSESSMENT: Data were extracted by the first author and checked by the second author. Disagreements were resolved by discussion with reference to the paper concerned. For strategy 1, study quality was scored using a checklist to assess whether the RCT and QEO study estimates were derived from the same populations, whether the assessment of outcomes was 'blinded', and the extent to which the QEO study estimate took account of possible confounding. For strategy 2, a more detailed instrument was used to assess study quality on four dimensions: the quality of reporting, the generalisability of the results, and the extent to which estimates of effectiveness may have been subject to bias or to confounding. All quality assessments were carried out by three people.

DATA SYNTHESIS AND ANALYSIS: For strategy 1, pairs of comparisons between RCT and QEO study estimates were classified as high or low quality. Seven indices of the size of discrepancies between estimates of effect size and outcome frequency were calculated, where possible, for each comparison. Distributions of the size and direction of discrepancies were compared for high- and low-quality comparisons. For strategy 2, three analyses were carried out. Attributes of the instrument were described by κ statistics, percentage agreement and Cronbach's α values. Regression analyses were used to investigate variations in study quality. (ABSTRACT TRUNCATED)
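
The three instrument statistics named above (percentage agreement, κ and Cronbach's α) can be illustrated with a minimal sketch. The abstract does not say which κ variant was used or how the three assessors were paired, so the sketch assumes Cohen's κ for a single pair of raters and the standard sample-variance form of Cronbach's α. All function names and the example ratings are hypothetical and are not taken from the review.

```python
from statistics import variance


def percentage_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters (assumed variant): chance-corrected agreement."""
    n = len(rater_a)
    observed = percentage_agreement(rater_a, rater_b) / 100.0
    categories = set(rater_a) | set(rater_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of scores per item,
    each with one score per study assessed."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(scores) for scores in zip(*item_scores)]
    sum_item_variances = sum(variance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1.0 - sum_item_variances / variance(totals))


# Hypothetical ratings of four studies by two assessors (1 = criterion met).
assessor_1 = [1, 1, 0, 0]
assessor_2 = [1, 0, 0, 0]
agreement = percentage_agreement(assessor_1, assessor_2)
kappa = cohens_kappa(assessor_1, assessor_2)
```

In this made-up example the assessors agree on three of four studies, but half of that agreement would be expected by chance given their marginal ratings, which is why κ is lower than the raw agreement figure suggests.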

Alternate Journal: Health Technol Assess