Interrater reliability in assessing quality of diagnostic accuracy studies using the QUADAS tool. A preliminary assessment.

Title: Interrater reliability in assessing quality of diagnostic accuracy studies using the QUADAS tool. A preliminary assessment.
Publication Type: Journal Article
Year of Publication: 2006
Authors: Hollingworth W, Medina SL, Lenkinski RE, Shibata DK, Bernal B, Zurakowski D, Comstock B, Jarvik JG
Journal: Academic radiology
Volume: 13
Issue: 7
Pagination: 803-10
Date Published: 2006 Jul
ISSN: 1076-6332
Keywords: Brain Neoplasms; Consensus; Diagnostic Services; Evidence-Based Medicine; Humans; Magnetic Resonance Spectroscopy; Observer Variation; Peer Review, Research; Quality Control; Questionnaires; Reproducibility of Results; Review Literature as Topic
Abstract

RATIONALE AND OBJECTIVES: Quality Assessment of Diagnostic Accuracy Studies (QUADAS) is a new tool to measure the methodological quality of diagnostic accuracy studies in systematic reviews. We used data from a systematic review of magnetic resonance spectroscopy (MRS) in the characterization of suspected brain tumors to provide a preliminary evaluation of the inter-rater reliability of QUADAS.

MATERIALS AND METHODS: A structured literature search identified 19 diagnostic accuracy studies. These publications were distributed randomly to primary and secondary reviewers for dual independent assessment. Reviewers recorded methodological quality by using QUADAS on a custom-designed spreadsheet. We calculated correlation, percentage of agreement, and kappa statistic to assess inter-rater reliability.
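
As an illustration of the agreement statistics named above, the following is a minimal sketch of percentage of agreement and Cohen's kappa for two reviewers scoring one QUADAS question across several studies. It is not the authors' analysis code: the function names and the example ratings are hypothetical, and it assumes each question is scored "yes", "no", or "unclear" and that the unweighted two-rater form of kappa applies.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when both raters use one category
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for one QUADAS question across ten studies (not study data).
reviewer_1 = ["yes", "yes", "no", "unclear", "yes", "no", "yes", "no", "yes", "yes"]
reviewer_2 = ["yes", "no", "no", "yes", "yes", "no", "unclear", "no", "yes", "yes"]

print(percent_agreement(reviewer_1, reviewer_2))
print(cohen_kappa(reviewer_1, reviewer_2))
```

Kappa discounts the agreement expected by chance from each reviewer's marginal rating frequencies, so it can be much lower than the raw percentage of agreement when one rating dominates.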

RESULTS: Most studies in our review were judged to have used an accurate reference standard. Conversely, the MRS literature frequently failed to specify the length of time between index and reference tests or to state that clinicians were unaware of the index test findings when reporting the reference standard. There was good correlation (rho = 0.78) between reviewers in assessment of the overall number of quality criteria met. However, mean agreement for individual QUADAS questions was only fair (kappa = 0.22) and ranged from no agreement beyond chance (kappa < 0) to moderate agreement (kappa = 0.58).
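
The verbal labels attached to the kappa values above ("fair", "moderate") match the commonly used Landis and Koch bands. The abstract does not name the scale, so treating it as the one used here is an assumption, and the function name below is hypothetical. A minimal sketch of that mapping:

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) descriptive band.

    Assumption: this is the scale behind the abstract's wording; the
    abstract itself does not say which scale was used.
    """
    if kappa < 0:
        return "poor"  # agreement no better than chance
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values reported in the abstract: mean and maximum per-question agreement.
for value in (0.22, 0.58):
    print(value, landis_koch_label(value))
```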

CONCLUSION: Inter-rater reliability in our study was relatively low. Nevertheless, we believe that QUADAS is potentially a useful tool for highlighting the strengths and weaknesses of existing diagnostic accuracy studies. Low reliability suggests that different reviewers will reach different conclusions if QUADAS is used to exclude "low-quality" articles from meta-analyses. We discuss methods for improving the validity and reliability of QUADAS.

DOI: 10.1016/j.acra.2006.03.008
Alternate Journal: Acad Radiol