
METHODOLOGICAL APPROACHES

SEARCH FOR EVIDENCE (GCP - HTA)

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

This chapter describes the methods of a literature review for the KCE. It provides guidance for reviewers on the various steps of the search, appraisal and presentation of the results.

New evidence may change some of the recommendations made; researchers should therefore consider this a ‘living document’ for which yearly updates will be required.

This document is mainly based on the following sources of information:

  1. KCE Process Documents and Notes (KCE and Deloitte, 2003)
  2. The Cochrane Collaboration Handbook (Higgins and Green, 2011)
  3. SIGN 50 (SIGN, 2008)
  4. CRD’s guidance for undertaking reviews in health care (Centre for Reviews and Dissemination (CRD), 2009)
  5. The QUOROM statement (Moher et al., 1999)
  6. GRADE (Grade org)
  7. The KCE Process Notes GCP (Van den Bruel et al., 2007), HSR (Van de Voorde and Léonard, 2007), HTA (Cleemput et al., 2007).

An evidence report consists of the following steps:

1. Introduction

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

A protocol for carrying out a review is equivalent to, and as important as, a protocol for a primary research study. A review is less likely to be biased if the questions are well developed beforehand, and the methods that will be used to answer them are decided on before gathering the necessary data and drawing inferences. In the absence of a protocol, it is possible that study selection and analysis will be unduly driven by (a presumption of) the findings.

A search strategy consists of several aspects. The research question (in a structured format, see Building a search question) should be used as a guide to direct the search strategy. For electronic searches, it is important to list the databases in which studies will be sought. Other sources can be consulted in order to identify all relevant studies. These include reference lists from relevant primary and review articles, journals, grey literature and conference proceedings, research registers, researchers and manufacturers, and the internet.

In practice, it is uncommon for a single search to cover all the questions being addressed within a review. Different questions may be best answered by different databases, or may rely on different study types. Authors are encouraged to take an iterative approach to the search, carrying out a search for high-level evidence first. After evaluating the results of this first search, the questions may need to be redefined and subsequent searches may need to be focused on more appropriate sources and study types.

In some cases, directly relevant good-quality evidence syntheses (secondary sources), such as good-quality systematic reviews or Health Technology Assessments (HTA), will be available on some of the issues that fall within the remit of the review. In these circumstances reference will be made to the existing evidence rather than repeating work that already has been done. All HTA reports or systematic reviews that are identified must be evaluated on their quality and must be shown to have followed an acceptable methodology before they can be considered for use in this way.

In other cases existing evidence may not be directly relevant or may be found to have methodological weaknesses. In these cases, existing evidence cannot be used in the review. Nevertheless, excluded systematic reviews or HTA reports still can be a useful source of references that might be used later on in the review.

In conclusion, literature searches for the KCE should follow an iterative approach, searching for evidence syntheses first and subsequently complementing this search by searching for original studies. Various resources are listed in the following paragraph.

2. Building a search question

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

Constructing an effective combination of search terms for searching electronic databases requires a structured approach. One approach involves breaking down the review question into ‘facets’. Several generic templates exist, e.g. PICOS (Population, Intervention, Comparator, Outcome and Study design), PIRT (Population, Index test, Reference test, Target disorder), SPICE, ECLIPSE, SPIDER, etc. (See Appendices).

The next stage is to identify the search terms in each ‘facet’ which best capture the subject. The group of search terms covering each facet of the review question should include a range of text words (free text to be searched in the title or abstract of studies). Text words and their variants can be identified by reading relevant reviews and primary studies identified during earlier searches or a pre-assessment of the literature. Information on the subject indexing used by databases can be found by consulting the relevant indexing manuals and by noting the manner in which key retrieved articles have been indexed by a given database.

The final search strategy will be developed by an iterative process in which groups of terms are used, perhaps in several permutations, to identify the combination of terms that seems most sensitive in identifying relevant studies. This requires skilled adaptation of search strategies based on knowledge of the subject area, the subject headings and the combination of ‘facets’ which best capture the topic.

3. Searching electronic sources

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

The choice of sources depends on the research question. The three electronic bibliographic databases generally considered to be the richest sources of primary studies - MEDLINE, EMBASE, and CENTRAL - are essential in any literature review for the KCE. However, many other electronic bibliographic databases exist.

Systematic reviews can be found in the Cochrane Database for Systematic Reviews, in DARE or in Medline. Search strategies have been developed to enhance the identification of these types of publications (Kastner, 2009; Montori, 2005).

HTA reports can be found in the HTA database of CRD or at individual agencies’ sites (see HTAi vortal under "HTA Agencies and Networks").

Specifically for drug and technology reviews, data from the US Food and Drug Administration (FDA) or the European Medicines Agency (EMA) can be helpful.

Providing an exhaustive list of all potential sources is not possible here. The KCE library catalogue provides a list of such sources.

 


 

Access to electronic resources is provided through the following digital libraries:

  • VDIC: more than 10,000 e-journals and 8,700 eBooks (IP recognition)
  • CDLH: access to databases, journals and eBooks via CEBAM DLH (login required)

3.1 Sources of biomedical literature

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

Core databases

  1. MEDLINE: records from 5600 journals (39 languages) in the biomedical field, from 1946 onwards (Access for KCE | Free access through PubMed).
  2. EMBASE: records from 7600 journals (70 countries, 2000 not covered by Medline) in the biomedical field, from 1974 onwards (Access for KCE).
  3. CENTRAL - The Cochrane Controlled Trials Register, part of the Cochrane Library: Records of randomised controlled trials and controlled clinical trials in healthcare identified through the work of the Cochrane Collaboration including large numbers of records from MEDLINE and EMBASE as well as much material not covered by these databases (Dickersin, 2002). (Access for KCE through CDLH | Free access to abstracts)

Databases for systematic reviews

  1. CRD Database of reviews of effectiveness (DARE) contains structured abstracts, including critical appraisal, of systematic reviews identified by regular searching of bibliographic databases and handsearching of key journals. [Updating of CRD DARE ceased in March 2015.]
  2. Cochrane Database of Systematic Reviews (CDSR, part of the Cochrane Library) lists the results of systematic reviews (full text) conducted by Cochrane groups, but also ongoing projects (Access for KCE through CDLH | Free access to abstracts)
  3. Special queries exist for Medline or Embase to limit the identified records to articles identified as Systematic reviews. See appendix.

Databases for HTA reports

  1. The CRD HTA database is a bibliographic database of published HTA reports; it also lists ongoing HTA projects. Members of INAHTA are regularly invited to update their information on the HTA database. Access is free; records of the HTA database are also searchable via the Cochrane Library.
  2. HTA reports can also be found on individual agencies’ sites. The HTAi vortal lists HTA organisations and provides a custom Web search engine that limits Google results to pages published on the websites of the HTA organisations it lists.

Databases for specific topics

  • Nursing: CINAHL (Cumulative Index to Nursing and Allied Health Literature), British Nursing Index (BNI) (Access for KCE through CDLH)
  • Physiotherapy: PEDro (contains records of RCTs, systematic reviews and evidence-based clinical practice guidelines in physiotherapy, from 1929 onwards; most trials in the database have been rated for quality to quickly discriminate between trials that are likely to be valid and interpretable and those that are not; free access)
  • Psychology and Psychiatry: PsycInfo (Access for KCE)
  • More bibliographic databases are listed on the KCE library catalogue (e.g. CAM, ageing, ...)

3.2 Sources of economic literature

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

Core database

  • NHS Economic Evaluation Database (NHS EED) contains over 7000 abstracts of quality assessed economic evaluations. The database aims to assist decision-makers by systematically identifying and describing economic evaluations, appraising their quality and highlighting their relative strengths and weaknesses. [Updating of CRD NHS EED ceased in March 2015.]
  • Some of the search filters for Medline or Embase limit the records to articles related to Costs, Economic evaluations, Economics

Complementary databases

  • EconLit: database of economics publications including peer-reviewed journal articles, working papers from leading universities, PhD dissertations, books, collective volume articles, conference proceedings, and book reviews (Access for KCE)

3.3 Sources of clinical practice guidelines

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

Often, specific guidelines can only be retrieved through local websites of scientific associations or government agencies. It is therefore recommended to combine a Medline search (with specific filters for guidelines) with a search of the following:

3.4 Sources of ongoing clinical trials

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

Ongoing trials may have limited use as a means of identifying studies relevant to systematic reviews, but may be important so that when a review is later updated, these studies can be assessed for possible inclusion. Several initiatives have been taken recently to register ongoing trials:

3.5. Sources of grey literature

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

More and more electronic sources describe "grey literature" (results of scientific research not published in scientific journals, e.g. reports, working papers, theses, conference papers, ...).

Institutional repositories

3.6 Building a search strategy

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

For each database, the search terms defined in the preparation phase will be mapped to the thesaurus terms of the database (when available). Mapping can be done using the built-in functionality of the search interface, or manually by looking at the indexing of previously identified relevant articles. Attention must be paid to the explosion tool, which is sometimes selected by default (as in PubMed) and sometimes not (as in OVID Medline).

The most important synonyms of the thesaurus terms identified for each facet will also be added to the search strategy as text words. Advanced functionalities of the search interfaces will be used (see below: truncation, wildcard, proximity operators).

The terms within a specific facet will be combined with the Boolean operator ‘OR’ in order to group all articles dealing with this facet. For some concepts, special queries (also called search filters) have been developed (see below). The resulting groups of articles will then be combined using the Boolean operator ‘AND’.
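The combination logic described above can be sketched in a few lines of Python. This is only an illustration: the facet names, the example terms and the PubMed-style field tags (`[Mesh]`, `[tiab]`) are assumptions chosen for the example, not a KCE-prescribed strategy.

```python
def or_group(terms):
    """Combine the terms of one facet with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(facets):
    """Combine the OR-groups of all facets with AND."""
    return " AND ".join(or_group(terms) for terms in facets.values())


# Hypothetical facets: one thesaurus term plus one text-word synonym each.
facets = {
    "population": ['"Diabetes Mellitus, Type 2"[Mesh]', "type 2 diabet*[tiab]"],
    "intervention": ['"Metformin"[Mesh]', "metformin[tiab]"],
}

query = build_query(facets)
print(query)
```

Keeping each facet as a separate list makes it easy to test permutations of terms during the iterative development of the strategy, and to document the final strategy facet by facet.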

It is recommended to validate each search strategy by a second reviewer.

3.6.1 Search tools

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

Boolean and proximity operators

In the context of database searching, Boolean logic refers to the logical relationships among search terms. Classical Boolean operators are ‘AND’, ‘OR’ and ‘NOT’, which can be used in most databases. Importantly, in some databases, such as PubMed, these operators need to be entered in uppercase letters. Other operators, the so-called proximity operators, are ‘NEAR’, ‘NEXT’ and ‘ADJ’. A more detailed overview of Boolean and proximity operators is provided in the appendix.

Truncation & wildcards

Truncation can be used when all terms that begin with a given text string are to be found. Different databases use different characters for truncation, with different functionalities. For example, in PubMed, OVID and EMBASE ‘unlimited’ truncation is represented by the asterisk ‘*’, but OVID Medline also uses ‘$’.

In OVID Medline the ‘optional’ wildcard character ‘?’ can be used within or at the end of a search term to substitute for 1 or 0 characters. In contrast, in EMBASE a question mark indicates exactly one character.

A more detailed overview is provided in appendix.
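The difference between the two readings of ‘?’ can be made concrete by translating a search term into a regular expression. The sketch below only mimics the matching behaviour described above on single words; the function names are hypothetical and no database interface is involved.

```python
import re


def to_regex(term, optional_wildcard):
    """Translate a term containing '*' and '?' into a regular expression.

    optional_wildcard=True  : '?' stands for 0 or 1 character (as in OVID Medline)
    optional_wildcard=False : '?' stands for exactly 1 character (as in EMBASE)
    """
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")  # unlimited truncation
        elif ch == "?":
            parts.append(r"\w?" if optional_wildcard else r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term, word, optional_wildcard):
    """Return True when the whole word is matched by the search term."""
    return to_regex(term, optional_wildcard).fullmatch(word) is not None


truncated = matches("random*", "randomised", True)
optional_color = matches("colo?r", "color", True)
exact_color = matches("colo?r", "color", False)
exact_colour = matches("colo?r", "colour", False)
print(truncated, optional_color, exact_color, exact_colour)
```

In this example ‘colo?r’ retrieves ‘color’ only under the optional reading, which is why a strategy cannot be copied from one interface to another without checking its wildcard conventions.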

3.6.2 Search limits

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

When the number of hits is too high to be managed within the available timeframe / resources, search limits may be applied.

First, tools related to the Thesaurus should be considered:

  • Focus / Major Heading: limits the results to articles that have been indexed with the term as Major Heading. This helps to reduce the number of results (by up to 40%) while keeping good relevance thanks to the human indexing of the full article (in the case of Medline and Embase).
  • Subheading: these are also added to the description of an article by the indexers, but should be used with more caution (they can render the search strategy too restrictive).

Several search interfaces provide search limits that can also be applied to narrow the search. Classical examples are date and language limits, but some databases also provide limits according to age, gender, publication type etc. Before applying search limits, the risk of an overly specific (i.e. too narrow) search should be considered.

3.6.3. Search filters

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

In systematic reviews, if time and resources allow, specificity is often sacrificed in favour of sensitivity, to maximize the yield of relevant articles. Therefore, it is not unusual to retrieve large numbers (possibly thousands) of bibliographic references for consideration for inclusion in an extensive systematic review. This means that reviewers may have to spend a lot of time scanning references to identify perhaps a limited number of relevant studies.

Search filters are available to focus the search according to the type of study that is sought, for example to focus on randomized controlled trials, diagnostic accuracy studies, prognostic studies or systematic reviews (see example in appendix). Specific search filters also exist for well-circumscribed clinical problems/populations, e.g. child health (Boluyt, 2008), palliative care (Sladek, 2007), or nephrology (Garg, 2009).

Sources of filters include:

During the selection of an appropriate search filter, aspects of testing and validation should play an important role. Specific appraisal tools are available to evaluate the methodological quality of search filters (Bak, 2009; Glanville, 2009).
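When a filter is tested against a set of records already known to be relevant, the trade-off between sensitivity and the screening burden can be quantified. The following is a minimal sketch; the reference identifiers and the idea of a small "gold standard" set are assumptions made for illustration.

```python
def filter_performance(retrieved, relevant):
    """Return (sensitivity, precision) of a search against a set of known relevant records."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved or not relevant:
        raise ValueError("both sets must contain at least one record")
    hits = retrieved & relevant
    sensitivity = len(hits) / len(relevant)   # share of relevant records that were found
    precision = len(hits) / len(retrieved)    # share of retrieved records that are relevant
    return sensitivity, precision


# Hypothetical data: 4 known relevant records, 10 records retrieved by the filtered search.
gold_standard = {"ref01", "ref02", "ref03", "ref04"}
retrieved = {"ref01", "ref02", "ref03", "ref10", "ref11",
             "ref12", "ref13", "ref14", "ref15", "ref16"}

sensitivity, precision = filter_performance(retrieved, gold_standard)
print(f"sensitivity = {sensitivity:.2f}, precision = {precision:.2f}")
```

Here the filter finds 3 of the 4 relevant records (sensitivity 0.75), while 3 of the 10 retrieved records are relevant (precision 0.30); a more sensitive filter would typically lower the precision and increase the number of references to screen.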

For diagnostic studies, it is recommended not to use a search filter.

3.7 Documenting a search strategy

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

The search strategy for electronic databases should be described in sufficient detail to allow that

  • the process could be replicated
  • an explanation could be provided regarding any study not included in the final report (identified by electronic sources search or not)

The template required by KCE to describe a search strategy is provided in attachment.

All identified references must be exported, preferably in a text file to be imported in a Reference Management Software (see appendix for technical description).

Attachments:

  • process_04_template_-_search_strategy_1.doc (38 KB)
  • process_04_template_-_search_strategy_1.odt (10.76 KB)

4. Searching supplementary sources

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

Checking reference lists

  • Authors should check the reference lists of articles obtained (including those from previously published systematic reviews) to identify relevant reports. The process of following up references from one article to another is generally an efficient means of identifying studies for possible inclusion in a review.
  • Because investigators may selectively cite studies with positive results (Gotzsche 1987; Ravnskov 1992), reference lists should never be used as a sole approach to identifying reports for a review, but rather as an adjunct to other approaches.

Using related citation tools

  • Several electronic sources provide a "Find related" functionality. This functionality is often based on a poorly documented algorithm (using thesaurus terms, keywords, ...) that is difficult to describe and reproduce. We therefore recommend listing the identified supplemental references under "Related citations".
  • Several electronic sources provide a "find citing articles" functionality. This functionality depends on the quality of the references provided by the authors and is thus not always accurate. We therefore recommend listing the identified supplemental references under "Citing articles".

Other supplementary sources

  • Websites
  • Handsearching of journals
  • Experts in the field
  • Etc.

5. Searching for evidence on adverse effects

Author(s): 
Joan.Vlayen
Author(s): 
Patrice.Chalon

The first sources to investigate for information on adverse effects are reports from trials or other studies included in the systematic review. Excluded reports might also provide some useful information.

There are a number of specific sources of information on adverse effects of drugs, including:

In Belgium, there is currently no public database on adverse drug events. Regulatory authorities (such as the websites of FDA and EMA) and the drug manufacturer may be able to provide some information. Information on adverse effects should also be sought from other types of studies than those considered appropriate for the systematic review (e.g. cohort and case-control studies, uncontrolled [phase I and II] trials, case series and case reports). However, all such studies and reports are subject to bias to a greater extent than randomized trials, and findings must be interpreted with caution.

6. Selecting studies

Study selection is a multi-stage process. The process by which studies will be selected for inclusion in a review should be described in the review protocol.

6.1. Inclusion and exclusion criteria

The final inclusion/exclusion decisions should be made after retrieving the full texts of all potentially relevant citations. Reviewers should assess the information contained in these reports to see whether the criteria have been met or not. Many of the citations initially included may be excluded at this stage.
The criteria used to select studies for inclusion in the review must be clearly stated:

6.1.1. Types of participants

The diseases or conditions of interest should be described here, including any restrictions on diagnoses, age groups and settings. Subgroup analyses should not be listed here.

6.1.2. Type of interventions

Experimental and control interventions should be defined here, making it clear which comparisons are of interest. Restrictions on dose, frequency, intensity or duration should be stated. Subgroup analyses should not be listed here.

6.1.3. Types of outcome measures

Note that outcome measures do not always form part of the criteria for including studies in a review. If they do not, then this should be made clear. Outcome measures of interest should be listed in this section whether or not they form part of the inclusion criteria.

For most reviews it will be worthwhile to pilot test the inclusion criteria on a sample of articles (say ten to twelve papers, including ones that are thought to be definitely eligible, definitely not eligible and questionable). The pilot test can be used to refine and clarify the inclusion criteria, train the people who will be applying them and ensure that the criteria can be applied consistently by more than one person.

Even when explicit inclusion criteria have been specified, decisions concerning the inclusion of individual studies remain relatively subjective. There is evidence that using at least two authors has an important effect on reducing the possibility that relevant reports will be discarded (Edwards et al. 2002). Agreement between assessors may be formally assessed mathematically using Cohen's Kappa (a measure of chance-corrected agreement). Many disagreements may be simple oversights, whilst others may be matters of interpretation. These disagreements should be discussed, and where possible resolved by consensus after referring to the protocol. If disagreement is due to lack of information, the authors may have to be contacted for clarification. Any disagreements and their resolution should be recorded.
The influence of uncertainty about study selection may be investigated in a sensitivity analysis.
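As an illustration of the chance-corrected agreement mentioned above, Cohen's Kappa for two assessors can be computed as (observed agreement - expected agreement) / (1 - expected agreement). The sketch below uses hypothetical include/exclude decisions on ten citations.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two assessors rating the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both assessors must rate the same, non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: proportion of items with identical decisions.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each assessor's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both assessors used a single, identical category
    return (observed - expected) / (1 - expected)


# Hypothetical decisions: "I" = include, "E" = exclude.
assessor_1 = ["I", "I", "I", "I", "E", "E", "E", "E", "E", "E"]
assessor_2 = ["I", "I", "I", "E", "E", "E", "E", "E", "E", "I"]

kappa = cohens_kappa(assessor_1, assessor_2)
print(round(kappa, 2))
```

In this example the assessors agree on 8 of 10 citations (observed agreement 0.80), but with an expected chance agreement of 0.52 the kappa is about 0.58.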

It is useful to construct a list of excluded studies at this point, detailing the reason for each exclusion. This list may be included in the report of the review as an appendix. The final report of the review should also include a flow chart or a table detailing the studies included and excluded from the review. In appendix a flow chart is provided for documenting study selection. If resources and time allow, the lists of included and excluded studies may be discussed with the expert panel. It may be useful to have a mixture of subject experts and methodological experts assessing inclusion.

6.1.4. Types of studies

Eligible study designs should be stated here, along with any thresholds for inclusion based on the conduct or quality of the studies. For example, ‘All randomised controlled comparisons’ or ‘All randomised controlled trials with blind assessment of outcome’. Exclusion of particular types of randomised studies (for example, cross-over trials) should be justified.

It is generally for authors to decide which study design(s) to include in their review. Some reviews are more restrictive, and include only randomized trials, while others are less restrictive, and include other study designs as well, particularly when few randomized trials addressing the topic of the review are identified. For example, many of the reviews from the Cochrane Effective Practice and Organization of Care (EPOC) Collaborative Review Group include before-and-after studies and interrupted time series in addition to randomized and quasi-randomized trials.

6.2. Selection process

Before any papers are acquired for evaluation, sifting of the search output is carried out to eliminate irrelevant material.

  • Papers that are clearly not relevant to the key questions are eliminated based on their title.
  • Abstracts of remaining papers are then examined and any that are clearly not appropriate study designs, or that fail to meet specific methodological criteria, will be also eliminated at this stage.
  • All reports of studies that are identified as potentially eligible must then be assessed in full text to see whether they meet the inclusion criteria for the review.

The reproducibility of this process should be tested in the initial stages of the review, and if reproducibility is shown to be poor more explicit criteria may have to be developed to improve it.
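The numbers needed for a flow chart of study selection can be derived from a simple screening log that records, for each citation, the stage at which it was excluded. The log structure, stage labels and reasons below are hypothetical; they are not the fields of the KCE template.

```python
from collections import Counter

STAGES = ["title", "abstract", "full text"]

# Hypothetical screening log; excluded_at is None for included studies.
screening_log = [
    {"id": "ref01", "excluded_at": "title", "reason": "not relevant to key question"},
    {"id": "ref02", "excluded_at": "abstract", "reason": "inappropriate study design"},
    {"id": "ref03", "excluded_at": "full text", "reason": "wrong population"},
    {"id": "ref04", "excluded_at": None, "reason": None},
    {"id": "ref05", "excluded_at": "title", "reason": "not relevant to key question"},
    {"id": "ref06", "excluded_at": None, "reason": None},
]


def selection_flow(log):
    """Return the number of citations remaining after each screening stage."""
    excluded = Counter(entry["excluded_at"] for entry in log)
    remaining = len(log)
    flow = [("identified", remaining)]
    for stage in STAGES:
        remaining -= excluded[stage]
        flow.append((f"remaining after {stage} screening", remaining))
    return flow


flow = selection_flow(screening_log)
for label, count in flow:
    print(f"{label}: {count}")
```

Because the reason for exclusion is stored with each record, the same log can also produce the list of excluded studies with the reason for each exclusion.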

Authors must decide whether more than one author will assess the relevance of each report. Whatever the case, the number of people assessing the relevance of each report should be stated in the Methods section of the review. Some authors may decide that assessments of relevance should be made by people who are blind or masked to the journal from which the article comes, the authors, the institution, and the magnitude and direction of the results by editing copies of the articles (Berlin 1997; Berlin, Miles, and Crigliano 1997). However, this takes much time, and may not be warranted given the resources required and the uncertain benefit in terms of protecting against bias (Berlin 1997).

Attachments:

  • Process_06_Template_StudiesSelection_20161206.xls (214.5 KB)

7. Quality assessment of studies

Author(s): 
Joan.Vlayen
Author(s): 
Pascale.Jonckheer

Critical appraisal of articles is a crucial part of a literature search. It aims at identifying methodological weaknesses and assessing the quality in a coherent way. The methodological assessment is based on a number of key questions that focus on those aspects of the study design that have a significant influence on the validity of the results reported and conclusions drawn. These key questions differ according to the study type, and a range of checklists can be used to bring a degree of consistency to the assessment process. The checklists for systematic reviews, randomized controlled trials, cohort studies and case-control studies discussed below were selected during several internal workshops at the KCE. Other checklists (for instance for diagnostic studies) are also discussed.

The process of critical appraisal consists of an evaluation by two independent reviewers who compare their results and discuss them with a third reviewer in case of disagreement. However, for reasons of feasibility it may be acceptable that one reviewer does the quality appraisal and that a second reviewer checks the other’s work.

If necessary, the authors of the evaluated study should be contacted for additional information.

The results of the critical appraisal should be reported in a transparent way.

Co-author(s): 
Jo.Robays

7.1. Critical appraisal of systematic reviews

Author(s): 
Nadia.Benahmed
Author(s): 
Pascale.Jonckheer

Of the several instruments available to assess the methodological quality of reviews (1), KCE recommends the use of AMSTAR 2 (2), which takes into account RCTs as well as non-randomised studies.

An alternative is the ROBIS tool, which is more comprehensive for non-randomised studies (3).

 


References

(1) See among other overviews

(2) Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.

(3) Whiting P, Savovic J, Higgins JP, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225-34.

 


Updates

[Update 20180126] AMSTAR 2 replaces AMSTAR in the toolbox

AMSTAR 2 aims to address the criticisms of AMSTAR, among others the fact that AMSTAR does not cover non-RCT studies.

 

[Update] Dutch Cochrane checklist removed from the toolbox

KCE experts initially selected 2 checklists for quality appraisal: AMSTAR and the Dutch Cochrane checklist. However, the Dutch Cochrane tool is not used anymore by its authors and was never formally validated. It has thus been removed from the toolbox.

Attachments:

  • KCEProcessbook_Amstar2-checklist.docx (47.91 KB)

7.2. Critical appraisal of randomized controlled trials for interventions

Author(s): 
Joan.Vlayen
Author(s): 
Pascale.Jonckheer

For the quality appraisal of randomized controlled trials for interventions, the Cochrane Collaboration’s Risk of Bias Tool is recommended [1]. This checklist contains hints on how to interpret and score the individual items, and is summarised in the attachment "Cochrane Collaboration's Risk of Bias Tool". It is also extensively explained in chapter 8 of the Cochrane Handbook (http://www.cochrane-handbook.org/). Each item can be scored as low, unclear or high risk of bias. Importantly, performance bias (blinding) and attrition bias (incomplete outcome data) should be assessed for each critical and important outcome as selected according to GRADE. If insufficient detail is reported of what happened in the study, the judgement will usually be unclear risk of bias.

The recommended level at which to summarize the risk of bias in a study is for an outcome within a study, because some risks of bias may be different for different outcomes. A summary assessment of the risk of bias for an outcome should include all of the entries relevant to that outcome: i.e. both study-level entries, such as allocation sequence concealment, and outcome specific entries, such as blinding.
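A summary assessment per outcome can be sketched as follows. The combination rule used here (any high risk gives "high", otherwise any unclear risk gives "unclear", otherwise "low") is an assumption made for illustration; reviewers should apply the rule defined in their protocol. The item names and judgements are hypothetical.

```python
ALLOWED = {"low", "unclear", "high"}


def summarise_risk_of_bias(judgements):
    """Summarise item-level judgements with a 'worst judgement wins' rule (assumed)."""
    values = set(judgements.values())
    if not values or not values <= ALLOWED:
        raise ValueError("judgements must be 'low', 'unclear' or 'high'")
    if "high" in values:
        return "high"
    if "unclear" in values:
        return "unclear"
    return "low"


# Study-level entries apply to every outcome of the study.
study_level = {
    "random sequence generation": "low",
    "allocation concealment": "low",
}

# Outcome-specific entries are assessed separately for each outcome.
outcome_level = {
    "mortality": {"blinding of outcome assessment": "low",
                  "incomplete outcome data": "low"},
    "pain": {"blinding of outcome assessment": "high",
             "incomplete outcome data": "unclear"},
}

summary = {
    outcome: summarise_risk_of_bias({**study_level, **entries})
    for outcome, entries in outcome_level.items()
}
print(summary)
```

The example shows why the summary is made per outcome: the same trial is at low risk of bias for mortality but at high risk for a subjective outcome such as pain when outcome assessment is not blinded.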

Some methodological issues, such as the correctness of the statistical analysis, power, etc. are not specifically addressed in this tool, and should be assessed separately.

The scores can be filled in using the template in attachment.

 


[1] KCE experts initially selected 2 checklists for quality appraisal: the Risk of Bias Tool and the Dutch Cochrane checklist. However, the Dutch Cochrane tool is not used anymore by its authors and was never formally validated.

Attachments:

  • Cochrane Collaboration's Risk of Bias Tool.doc (74.5 KB)
  • Template Risk of Bias tool.doc (41.5 KB)

7.3. Critical appraisal of diagnostic accuracy studies

Author(s): 
Joan.Vlayen
Author(s): 
Pascale.Jonckheer

For the quality appraisal of diagnostic accuracy studies, the QUADAS 2 instrument is recommended (Whiting, 2003). The tool is structured so that 4 key domains are each rated in terms of the risk of bias and the concern regarding applicability to the research question. Each key domain has a set of signalling questions to help reach the judgments regarding bias and applicability. A background document on QUADAS 2 can be found on the website: http://www.bris.ac.uk/quadas/quadas-2.

In order to correctly appraise a diagnostic accuracy study, basic knowledge about key concepts is essential. An overview of these concepts is provided in the following table:

Three phases can be distinguished in the QUADAS tool:

  • Phase 1: State the review question using the PIRT format (Patients, Index test(s), Reference standard, Target condition)
  • Phase 2: Draw a flow diagram for the primary study, showing the process of recruiting, inclusion, exclusion and verification
  • Phase 3: Risk of bias and applicability judgments.

 The score can be filled in using the template in attachment.
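Among the key concepts needed to appraise a diagnostic accuracy study are, typically, the accuracy measures derived from the 2x2 table that cross-classifies index test results against the reference standard. The sketch below computes four common measures; the counts are hypothetical and the selection of measures is an assumption, not a reproduction of the KCE overview.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute common accuracy measures from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased correctly identified
        "specificity": tn / (tn + fp),  # non-diseased correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical study: 100 patients with and 200 without the target condition.
measures = diagnostic_accuracy(tp=90, fp=30, fn=10, tn=170)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```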

Attachments:

  • Template QUADAS 2 tool.doc (59 KB)

7.4. Critical appraisal of observational studies

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen
Author(s): 
Pascale.Jonckheer

Unlike for systematic reviews, randomized controlled trials, diagnostic studies and guidelines, there is less agreement within the methodological research community on which items to use for the quality appraisal of cohort studies, case-control studies and other types of observational evidence. The Dutch Cochrane Centre has a few checklists available (http://dcc.cochrane.org/beoordelingsformulieren-en-andere-downloads), but these are written in Dutch and were not formally validated. For the evaluation of prospective, non-randomized, controlled trials, the Cochrane Collaboration’s Risk of Bias Tool can be used. Other checklists can be found at: http://www.unisa.edu.au/Research/Sansom-Institute-for-Health-Research/Research-at-the-Sansom/Research-Concentrations/Allied-Health-Evidence/Resources/CAT/. GRADE also offers a number of criteria that can be used to judge the methodological quality of observational studies. These are further explained in the chapter on GRADE.

Mainly based on the checklists of SIGN and NICE, the KCE elaborated two new checklists for cohort studies and case-control studies (see attachment).

Attachments:
Cohort studies_template.docx (49.9 KB)
Case-control studies_template.docx (48.3 KB)

7.5. Critical appraisal of guidelines

Author(s): 
Joan.Vlayen
Author(s): 
Pascale.Jonckheer

For the quality appraisal of clinical practice guidelines, the AGREE II instrument (www.agreetrust.org) is recommended. AGREE II comprises 23 items organized into 6 quality domains: i) scope and purpose; ii) stakeholder involvement; iii) rigour of development; iv) clarity of presentation; v) applicability; and vi) editorial independence. Each of the 23 items targets various aspects of practice guideline quality and can be scored on a scale from 1 (strongly disagree) to 7 (strongly agree). Two global rating items allow an overall assessment of the guideline’s quality. Detailed scoring information is provided in the instrument in attachment.

Ideally, the quality appraisal of a guideline is done by 4 reviewers, but for reasons of feasibility 2 reviewers can be considered acceptable.
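
As an illustration of how the 1 to 7 ratings of several reviewers can be combined, the sketch below computes a scaled domain score using the formula described in the AGREE II user's manual: (obtained score − minimum possible score) / (maximum possible score − minimum possible score), expressed as a percentage. The function name and the example ratings are hypothetical and serve only as an illustration; the instrument in attachment remains the reference for scoring.

```python
def scaled_domain_score(ratings_by_reviewer):
    """Return the scaled AGREE II score (in %) for one domain.

    ratings_by_reviewer: one list per reviewer, holding that reviewer's
    1-7 ratings for every item of the domain.
    """
    n_reviewers = len(ratings_by_reviewer)
    n_items = len(ratings_by_reviewer[0])
    for ratings in ratings_by_reviewer:
        if len(ratings) != n_items:
            raise ValueError("every reviewer must rate the same items")
        if any(not 1 <= rating <= 7 for rating in ratings):
            raise ValueError("ratings must lie between 1 and 7")

    obtained = sum(sum(ratings) for ratings in ratings_by_reviewer)
    minimum = 1 * n_items * n_reviewers  # all items rated 1 by everyone
    maximum = 7 * n_items * n_reviewers  # all items rated 7 by everyone
    return 100 * (obtained - minimum) / (maximum - minimum)


# Hypothetical example: two reviewers rate the three items of one domain
example = [[5, 6, 6], [4, 5, 7]]
print(scaled_domain_score(example))  # 75.0
```

With more reviewers, a single outlying rating has less influence on the scaled score, which is one reason why 4 reviewers are preferred when feasible.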

AGREE II serves 3 purposes:

1. to assess the quality of guidelines;

2. to provide a methodological strategy for the development of guidelines; and

3. to inform what information and how information ought to be reported in guidelines.

Attachment: AGREEII.pdf (392.52 KB)

8. Data extraction

Author(s): 
Joan.Vlayen

Data extraction is the process of extracting from the selected studies the information that will ultimately be reported. To make data extraction efficient, the process should be detailed in the protocol before the literature search is started. Key components of the data extraction include:

  • information about study reference(s) and author(s);
  • verification of study eligibility;
  • study characteristics:
    • study methods
    • participants
    • interventions
    • outcome measures and results
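
The key components listed above can be captured in a simple extraction record that is defined before the search starts. The sketch below is only one possible layout: the class name, the field names and the example values are hypothetical and are not part of any KCE template.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ExtractionRecord:
    """One data extraction record per included study (hypothetical layout)."""
    reference: str                  # study reference
    authors: List[str]              # study author(s)
    eligible: bool                  # result of the eligibility verification
    study_methods: str = ""
    participants: str = ""
    interventions: str = ""
    outcome_measures: List[str] = field(default_factory=list)
    results: str = ""

    def missing_fields(self):
        """Return the names of the fields that are still empty."""
        return [name for name, value in asdict(self).items()
                if value in ("", [])]


# Invented example values, for illustration only
record = ExtractionRecord(
    reference="Hypothetical trial, 2010",
    authors=["A. Author"],
    eligible=True,
    participants="120 adults",
)
print(record.missing_fields())
```

A check such as `missing_fields` can help the second reviewer see at a glance which parts of a record still need to be completed.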
Co-author(s): 
Anja.Desomer
Co-author(s): 
Jo.Robays

Evidence tables

Author(s): 
Joan.Vlayen

All validated studies identified from the systematic literature review relating to each key search question are summarized into evidence tables. The content of the evidence tables is determined by the entire project group. Completion for all retained articles is done by one member of the project group and checked by another member. A KCE template for evidence tables was developed using the CoCanCPG evidence tables (www.cocancpg.eu/) and the GIN evidence tables (http://g-i-n.net/activities/etwg/progresses-of-the-etwg) as a basis, and can be found in attachment. A template is available for systematic reviews, intervention studies, diagnostic accuracy studies and prognostic studies.

Attachment: evidence tables_final.docx (66.61 KB)
Co-author(s): 
Anja.Desomer
Co-author(s): 
Jo.Robays

GRADE evidence profiles

Author(s): 
Joan.Vlayen

To provide an overview of the body of evidence for each comparison relevant to the research question, GRADE evidence profiles are created and added to the appendix of the report. These evidence profiles can serve as a basis for the content discussions during the expert meetings. To create these evidence profiles it is highly recommended to use the GRADEpro software, which can be downloaded for free (http://ims.cochrane.org/revman/other-resources/gradepro/download).

When a meta-analysis is possible, it is recommended to extract the necessary information to Review Manager (RevMan) first, and subsequently to import this information from RevMan into GRADEpro (using the button ‘Import from RevMan’). More information on the use of RevMan can be found here: http://ims.cochrane.org/revman.

Once all information is extracted in GRADEpro, evidence profiles can be created by clicking the ‘Preview SoF table’ button, selecting the format ‘GRADE evidence profile’ and exporting them to a Word Document.

Co-author(s): 
Anja.Desomer
Co-author(s): 
Jo.Robays

9. Analysing and interpreting results

Once the eligible studies are selected and quality appraised, the magnitude of the intervention effect should be estimated. The best way to do this is by performing a meta-analysis (i.e. the statistical combination of results from two or more separate studies), although this is not always feasible. An interesting tool for doing a limited meta-analysis is the free Review Manager software of the Cochrane Collaboration.

The starting point of the analysis and interpretation of the study results involves the identification of the data type for the outcome measurements. Five different types of outcome data can be considered:

  • dichotomous data: two possible categorical responses;
  • continuous data
  • ordinal data: several ordered categories;
  • counts and rates calculated from counting the numbers of events that each individual experiences;
  • time-to-event data

Only dichotomous data will be addressed here. Dichotomous outcome data arise when the outcome for every study participant is one of two possibilities, for example, dead or alive. These data can be summarised in a 2x2 table:

             | Outcome YES | Outcome NO | Total
Intervention | a           | b          | a + b
Control      | c           | d          | c + d
Total        | a + c       | b + d      |

The most commonly encountered effect measures used in clinical trials with dichotomous data are:

  • Relative risk (RR): the ratio of the risk (i.e. the probability with which the outcome will occur) of the outcome in the two groups, or [a/(a+b)]/[c/(c+d)]. For example, a RR of 3 implies that the outcome with treatment is three times more likely to occur than without treatment;
  • Absolute risk reduction (ARR): the absolute difference of the risk of the outcome in the two groups, or [a/(a+b)]-[c/(c+d)];
  • Number needed to treat (NNT): the number of persons that need to be treated with the intervention in order to prevent one additional outcome, or 1/ARR.

For diagnostic accuracy studies, the results will be expressed as:

  • Sensitivity: the proportion of true positives correctly identified by the test: Sens = a/(a+c);
  • Specificity: the proportion of true negatives correctly identified by the test: Spec = d/(b+d);
  • Positive predictive value: the proportion of patients with a positive test result correctly diagnosed: PPV = a/(a+b);
  • Negative predictive value: the proportion of patients with a negative test result correctly diagnosed: NPV = d/(c+d);
  • Likelihood ratio: the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without the target disorder: LR+ = [a/(a+c)]/[b/(b+d)]; LR- = [c/(a+c)]/[d/(b+d)];
  • Diagnostic odds ratio: the ratio of the odds of having a positive index test result in a patient with the target condition over the odds of having this test result in a patient without the target condition: OR = ad/bc.

These formulas refer to the cells of the following 2x2 table:

                    | Target condition positive | Target condition negative
Index test positive | a                         | b
Index test negative | c                         | d
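
The measures defined above can be computed directly from the four cells of each 2x2 table. The following is a minimal sketch: the function names and the counts are hypothetical, all denominators are assumed to be non-zero, and the NNT is taken from the absolute value of the risk difference so that it is reported as a positive number (an assumption made here, not a rule stated in this document).

```python
def effect_measures(a, b, c, d):
    """RR, ARR and NNT from a 2x2 table (intervention/control by outcome yes/no)."""
    risk_intervention = a / (a + b)  # risk of the outcome with the intervention
    risk_control = c / (c + d)       # risk of the outcome in the control group
    rr = risk_intervention / risk_control
    arr = risk_intervention - risk_control
    # Assumption: NNT is reported as a positive number; undefined when ARR is 0
    nnt = 1 / abs(arr) if arr != 0 else None
    return {"RR": rr, "ARR": arr, "NNT": nnt}


def diagnostic_measures(a, b, c, d):
    """Accuracy measures from a 2x2 table (index test result by target condition)."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        "LR+": sensitivity / (b / (b + d)),
        "LR-": (c / (a + c)) / specificity,
        "DOR": (a * d) / (b * c),
    }


# Hypothetical counts, chosen only for illustration
trial = effect_measures(a=10, b=90, c=20, d=80)
test_accuracy = diagnostic_measures(a=90, b=20, c=10, d=180)
for name, value in {**trial, **test_accuracy}.items():
    print(f"{name}: {value:.2f}")
```

In the hypothetical trial the risk is 10% with the intervention and 20% in the control group, so the RR is 0.5 and ten persons need to be treated to prevent one additional outcome.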

As discussed above, data types other than dichotomous data are possible, each with their own outcome measures and statistics. It is beyond the scope of this document to describe and discuss all these types. Interested readers are referred to textbooks such as Practical statistics for medical research (Altman 1991), Modern Epidemiology (Rothman and Greenland 1998) and Clinical epidemiology: a basic science for clinical medicine (Sackett 1991).

10. Reporting of the literature review

A literature search should be reproducible and therefore explicitly documented. The report of a literature search should contain the following items:

1. Description of the search methodology:

a. Search protocol

i. Search question

ii. Searched databases

iii. Search terms, their combinations and the restrictions used (e.g. language, date)

iv. In- and exclusion criteria for the selection of the studies

b. Quality appraisal methodology

c. Data extraction methodology

2. Description of the search results:

a. Number of retrieved articles, in- and excluded studies, and reasons for exclusion; use of flow chart

b. Results of quality appraisal

c. Evidence tables for each search question

LITERATURE REVIEW AND INTERNATIONAL COMPARISON (HSR)

See the attached document

Table of contents

  1. Introduction
  2. Literature review
    1. Introduction
    2. Literature review
    3. Meta review: review of reviews
  3. International comparison
    1. Rationales for international comparisons in HSR research: does an international comparison serve your problem?
    2. Adapt the “set-up” of the international comparison to the problem you want to address
Attachment: 2015-72-(Method)_HSR Process Notes Literature review and international comparison_Report_0.pdf (430.75 KB)

DATA - COLLECTION AND ANALYSIS

<This chapter will be published in September 2013>

QUALITATIVE DATA (GCP - HSR - HTA)

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

Although there is no unified definition of qualitative research, most authors agree about its main characteristics. Creswell formulated it like this: “Writers agree that one undertakes qualitative research in a natural setting where the researcher is an instrument of data collection who gathers words or pictures, analyzes them inductively, focuses on the meaning of participants, and describes a process that is expressive and persuasive in language” (Creswell, 1998). The gathering of qualitative data takes many forms, but interviewing and observing are among the most frequently used, no matter the theoretical tradition of the researcher. 

1. How to choose a qualitative method?

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

We have so far identified four types of qualitative research methods (QRM) that are suitable for KCE research projects and worth describing in a first report: individual interviews, focus groups, observation, and structured discussion among experts through a Delphi survey. Others should be developed in the future.

Before turning to the practical aspects of each method, we briefly describe them to help choose the most appropriate one.

  1. The semi-structured individual interview aims at searching for data through questioning the respondent using conversational techniques, “…being shaped partly by the interviewer’s pre-existing topic guide and partly by concerns that are emergent in the interview.” (Bloor and Wood, 2006, p. 104). “It gives the opportunity to the respondents to tell their own stories in their own words” (Bowling, 1997, p. 336). The use of such a method in the KCE context is appropriate when the aim is to identify the different points of view, beliefs, attitudes and experiences of people such as patients, practitioners, stakeholders, etc., and when no interaction between the respondents is required or appropriate (depending on the topic, for example). It could also be chosen for practical reasons, e.g. when participants cannot easily travel or lack time.
  2. A focus group is a form of semi-structured interview. It consists of “a series of group discussions held with differently composed groups of individuals and facilitated by a researcher, where the aim is to provide data (via the capture of intra-group interaction) on group beliefs and group norms in respect of a particular topic or set of issues” (Bloor and Wood, 2006, p. 88). This is useful “where we need interactivity to enhance brainstorming among the participants, gain insights and generate ideas in order to pursue a topic in greater depth” (Bowling, 1997, p. 352). Focus groups “worked well and provide the richest data in relation to public’s view of priorities for health services and (…) were less inhibiting for respondents than one-to-one interviews” (Bowling, 1997, p. 354).
  3. Observation is useful to understand more than what people say about (complex) situations (Bowling, 1997). In the KCE context, it will be useful for site visits, when preparing a report on a hospital or a health service, a procedure, etc.
  4. The Delphi survey aims to achieve consensus or define positions among expert panelists, through iterations of anonymous opinions and of proposed compromise statements from the group moderator (Bloor and Wood, 2006). For KCE reports, this method could be useful for setting priorities or clarifying the acceptability of a new technology, system or innovation.
Figure 1 – Decision tree to choose between the methods proposed in this report

2. How to set up?

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

In the focus group discussions, not every KCE researcher expressed a need to use or understand QRM. Nevertheless, for those interested in QRM, we try to respond to the different researchers’ needs through this report and the notes that will be published in the KCE process book based on the present report[1].


[1]           For further reading: Silverman (2011) 

2.1 Why opt for a qualitative approach?

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

“The goal of qualitative research is the development of concepts which help us to understand social phenomena in natural (rather than experimental) settings, giving due emphasis to the meanings, experiences, and views of all the participants” (Mays, 1995, p. 43). This quotation gives a nice summary of the specificities of qualitative research methods, which are discussed below.

2.1.1       Specificities of qualitative research methods

First, qualitative research encompasses all forms of field research performed with qualitative data. “Qualitative” refers to data in non-numeric form, such as words and narratives. There are different sources for qualitative data, such as observations, document analysis, interviews, pictures or videos, etc. Each of these data-gathering techniques has its particular strengths and weaknesses that have to be reflected upon when choosing a qualitative research technique. In the social sciences, the use of qualitative data is also closely related to different paradigms trying to develop insight into social reality. Elaboration on these paradigms is however outside the scope of this report[1].

Second, the aim of qualitative research is developing a “thick description[2]” and “grounded or in-depth understanding” of the focus of inquiry. The benefits of well-developed qualitative data collection are precisely richness of data and deeper insight into the problem studied. Qualitative data not only describe a phenomenon but also help to explain it more meaningfully. They are also useful in generating hypotheses (Sofaer, 1999). Types of research questions typically answered by qualitative research are “What is going on? What are the dimensions of the concept? What variations exist? Why is this happening?” (Huston, 1998). Qualitative research techniques are primarily used to trace “meanings that people give to social phenomena” and “interaction processes”, including the interpretation of these interactions (Pope, 1995). “They allow people to speak in their own voice, rather than conforming to categories and terms imposed on them by others.” (Sofaer, 1999, p. 1105). This kind of research is also appropriate to investigate social phenomena related to health (Huston, 1998).

Third, one of the key strengths of qualitative research is that it studies people in their natural settings rather than in artificial or experimental ones. Since health-related experiences and beliefs are closely linked to daily life situations, it is less meaningful to research them in an artificial context such as an experiment. Therefore data is collected by interacting with people in their own language and observing them in their own territory (Kirk, 1986) or a place of their own choice. This is also referred to as naturalism. Therefore the term naturalistic methods is sometimes used to denote some, but not all, qualitative research (Pope, 2006). However, this characteristic is not always relevant to the use of QRM at the KCE. For example, focus group interviews are usually not performed in the natural setting of the participants, but rather in a meeting room.

A fourth feature of qualitative research in health care is that it often employs several different qualitative methods to answer one and the same research question (Pope, 2006). This relates partly to what is called triangulation (see part 2.2).

Finally, qualitative research is always iterative, starting with assumptions, hypotheses, mindsets or general theories which change and develop throughout the successive steps of the research process. It is desirable to make these initial assumptions explicit at the beginning of the process and to document the new insights or knowledge acquired at each step.

2.1.2       Qualitative versus quantitative approaches

Although it is meaningful to do qualitative research in itself, qualitative research is often defined by reference to quantitative research. Often it is assumed that because qualitative research does not seek to quantify or enumerate, it does not ‘measure’. Qualitative research generally deals with words or discourses rather than numbers, and measurement in qualitative research is usually concerned with taxonomies or classifications. “Qualitative research answers questions such as, ‘what is X, and how does X vary in different circumstances, and why’, rather than ‘how big is X or how many X’s are there?’” (Pope, 2006, p. 3).

Emphasizing the differences presents the qualitative and quantitative approaches as opposites. However, the two approaches are complementary and are often integrated in one and the same research project. For example, in mixed methods research the strengths of quantitative and qualitative research are combined for the purpose of obtaining a richer and deeper understanding (Zang, 2012). Qualitative data can also be analyzed in a quantitative way, for example by counting the occurrence of certain words.
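
Counting the occurrence of words in qualitative material can be sketched in a few lines. The example below is only an illustration: the function name, the interview fragments and the list of ignored words are invented, and real transcripts would usually need more careful preparation (e.g. handling of word variants) than this simple split into words.

```python
import re
from collections import Counter


def word_frequencies(transcripts, stop_words=()):
    """Count how often each word occurs across a list of transcript texts."""
    ignored = {word.lower() for word in stop_words}
    counts = Counter()
    for text in transcripts:
        # Lower-case the text and split it into runs of word characters
        words = re.findall(r"\w+", text.lower())
        counts.update(word for word in words if word not in ignored)
    return counts


# Hypothetical interview fragments, invented for illustration
fragments = [
    "The waiting time was too long.",
    "Waiting made me anxious, but the nurse was kind.",
]
frequencies = word_frequencies(
    fragments, stop_words={"the", "was", "me", "too", "but"}
)
print(frequencies.most_common(3))
```

Such counts only indicate how often a term is used; the meaning participants attach to it still has to be interpreted from the context.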

Often health services researchers draw on multiple sources of data and multiple strategies of inquiry in order to explore the complex processes, structures and outcomes of health care. It is common that quantitative and qualitative methods answer different questions to provide a well-integrated picture of the situation under study (Patton, 1999). Especially in the field of health services research, qualitative and quantitative methods are increasingly being used together in mixed method approaches. QRM can be used on their own or in combination with quantitative methods, in the following ways:

 

  • Qualitative research only:
    • To know the variation in experiences related to health or illness.
    • To build typologies regarding health services use, patient attitudes, health beliefs, etc.
  • Qualitative research preliminary to quantitative research:
    • To explore new areas, new concepts, new behaviour, etc. (Pope, 1995) before starting with measurement.
    • To build quantitative data collection tools (questionnaires): choosing appropriate wording (Pope, 1995) and the variables to include, developing reliable and valid survey instruments (Sofaer, 1999), etc.
    • To pre-test survey instruments (Sofaer, 1999).
  • As a supplement to quantitative work:
    • As part of a triangulation process that consists of confronting results coming from several data sources (Pope, 1995).
    • To reach a different level of knowledge (Pope, 1995): “If we focus research only on what we already know how to quantify, indeed only on that which can ultimately be reliably quantified, we risk ignoring factors that are more significant in explaining important realities and relationships.” (Sofaer, 1999, p. 1102).
  • As a complement to quantitative work, by exploring complex phenomena or areas that are not reachable with quantitative approaches (Pope, 1995).

Sofaer (1999) provides the insight that in many cases, inquiry can move from being unstructured and largely qualitative in nature to being structured and largely quantitative in nature. This is how she describes the continuum: “(…) there is uncertainty not only about answers, but about what the right questions might be; about how they should be framed to get meaningful answers; and about where and to whom questions should be addressed. As understanding increases, some of the right questions emerge, but uncertainty remains about whether all of the right questions have been identified. Further along, confidence grows that almost all of the important questions have been identified and perhaps framed in more specific terms, but uncertainty still exists about the range of possible answers to those questions. Eventually, a high level of certainty is reached about the range of almost all of the possible answers.” (p. 1103).

In sum, investigations in a given area start, over time, with qualitative research to explore the field, find the right questions, prepare for more focused questions and discover theories and hypotheses. Next, quantitative research is used to test hypotheses. Finally, qualitative research can be used to deepen the findings or to search for explanations that quantitative research techniques cannot provide.

 

 


[1]           For those interested we refer to Denzin and Lincoln, 2008 a, Denzin and Lincoln, 2008 b, Bourgeault et al., 2012 or in Dutch, Mortelmans, 2009

[2]           A “thick description” of a human practice or behavior includes not only the focus of the study but also its context, so that it becomes meaningful to an outsider. The term was introduced in the social science literature by the anthropologist C. Geertz in an essay published in 1973

2.2 How to evaluate QRM?

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

In this section we address quality criteria for the use and evaluation of qualitative research. On the one hand, this section should guide those who want to apply QRM in their research project(s); on the other hand, KCE researchers asked for criteria that allow them to evaluate existing qualitative studies or publications resulting from qualitative studies, for example for the purposes of a systematic review.

2.2.1       Usefulness of quality criteria to evaluate qualitative research

“Whatever the method, it needs to be well-defined, well-argued, and well-executed” (Snijders, 2007)

The increasing demand for qualitative research within health and health services research has emerged alongside an increasing demand for the demonstration of methodological rigor and justification of research findings (Reynolds, 2011). Not only is qualitative research challenged by the current evidence-based practice (EBP) movement in healthcare, but the emergence of meta-analyses (e.g. meta-synthesis) of qualitative research findings also calls for quality criteria. Although widely recognized guidelines exist in quantitative health sciences research, no comparable standardized guidelines exist for qualitative research. This can be explained by a lack of consensus on how best to evaluate “rigor” in qualitative research (Nelson, 2008). Every qualitative paradigm has its own implications regarding the definition of good quality research. First, we briefly introduce the reader to the debate about quality criteria; second, we present the framework of Walsh and Downe (Walsh, 2006) as the most complete and comprehensible list of quality criteria to appraise qualitative research studies, and the framework of Côté and Turgeon as a shorter and practical alternative. For other checklists we refer to Appendix 1.

Among qualitative researchers there is an ongoing debate between those demanding explicit criteria, for example to serve systematic reviewing and evidence-based practice, and those who argue that such criteria are neither necessary nor desirable (Hammersley, 2007). The quest for quality criteria assumes that qualitative research is a unified field, but this image does not fit reality. In fact, apart from a variety of other positions (e.g. symbolic interactionism, hermeneutics, phenomenology, ethnography), three main paradigms can be discerned in relation to this discussion:

  • The interpretativist paradigm assumes that social realities are multiple, fluid and constructed. This framework values research that illuminates subjective meanings and multiple ways of seeing a phenomenon. These researchers question the need for and the utility of quality criteria for qualitative research or apply specific criteria for qualitative research, such as clear delineation of the research process, evidence of immersion and self-reflection, demonstration of the researcher’s way of knowing (e.g. tacit knowledge) (Cohen, 2008).
  • The positivist approach stands at the other end of the continuum and assumes that there is a single objective reality that is knowable. Positivists apply traditional quantitative criteria, such as validity and reliability to qualitative work.
  • The realist perspective is positioned in between. It maintains a belief in an objective reality, but holds that knowledge of reality is always imperfect (Cohen, 2008). Realists use techniques such as triangulation, member validation of findings, peer review of findings, deviant or negative case analysis and multiple coders of data to verify findings. The realist perspective adopts a philosophy of science that is in line with positivism, while at the same time embracing the complexity of social life and recognizing the importance of social meanings. “By maintaining a belief in an objective reality and positing truth as an ideal qualitative researchers should strive for, realists have succeeded at positioning the qualitative research enterprise as one that can produce research which is valid, reliable, and generalizable, and therefore, of value and import equal to quantitative biomedical research” (Cohen, 2008, p. 336).

The position one takes in the debate about quality criteria is heavily influenced by the paradigm one feels most attracted to, or identifies with.

2.2.2       General quality criteria

Most of the quality criteria are applicable to all research, both quantitative and qualitative. For example, in 2008, Cohen and Crabtree (Cohen, 2008) reviewed and synthesized published criteria for good qualitative research. They identified the following general evaluative criteria: 1) ethical research, 2) importance of the research, 3) clarity and coherence of the research report, 4) use of appropriate and rigorous methods, 5) importance of reflexivity or attending to researcher bias, 6) importance of establishing validity or credibility, 7) importance of verification or reliability. Researcher bias, validity, and reliability are most heavily influenced by quantitative approaches. Table 6 bridges quantitative and qualitative research by illustrating the parallels between criteria for conventional quantitative inquiries and qualitative research.

Table 6 – Lincoln and Guba’s translation of terms

Quantitative research | Qualitative research | Methods to ensure quality
Internal validity | Credibility: Are the findings credible? | Member checks[a]; prolonged engagement in the field; data triangulation
External validity | Transferability: Are the findings applicable in other contexts? | Thick description[b] of setting and/or participants
Reliability | Dependability: Are the findings consistent and could they be repeated? | Audit – researcher’s documentation of data, methods and decisions; researcher triangulation
Objectivity | Confirmability: To what extent are the findings shaped by the respondents and not researcher bias, motivation or interests? | Audit and reflexivity – e.g. awareness of position as a researcher and its influence on the data and findings

Source: Adapted from Finley, 2006

In what follows we pay attention to some keywords appearing in Table 6.

Reflexivity

“Reflexivity is an awareness of the self in the situation of action and of the role of the self in constructing that situation.” (Bloor and Wood, 2006, p. 145)

Because the researcher cannot be ‘blinded’ in qualitative research, he/she has to take subjectivity into account in an explicit way. To demonstrate this reflexive awareness during the research process, the following ‘good practices’ can be used (Green, 2009, p. 195):

  • Methodological openness: report steps taken in data production and analysis, the decisions made, and the alternatives not pursued.
  • Theoretical openness: theoretical starting points and assumptions should be addressed.
  • Awareness of the social setting of the research itself: be aware of the interactivity between the researcher and the researched.
  • Awareness of the wider social context, including historical and policy contexts and social values.

Triangulation

“Qualitative research is inherently multimethod in focus (Flick, 2002, p.226-227). However, the use of multiple methods, or triangulation, reflects an attempt to secure an in-depth understanding of the phenomenon in question. Objective reality can never be captured. We know a thing only through its representations. Triangulation is not a tool or a strategy of validation, but an alternative to validation (Flick, 2002, p. 227). The combination of multiple methodological practices, empirical materials, perspectives, and observers in a single study is best understood, then, as a strategy that adds rigor, breadth, complexity, richness, and depth to any inquiry (See Flick, 2002, p. 229)” (Denzin and Lincoln, 2008, p. 7).

Triangulation is the use of several scientific methods, both qualitative and quantitative, to answer the same research question (Bloor, 2006). Often triangulation is understood as producing the same results by means of several methods, sources or analysts. However, different methods or types of inquiry are sensitive to different nuances, so that they may lead to somewhat different results. In fact, triangulation is more about finding inconsistencies to gain deeper insight into the relationship between the inquiry approach and the subject under study. Thus, finding inconsistencies does not weaken the credibility of the results, but rather strengthens it (Patton, 1999).

Five kinds of triangulation can contribute to the quality and consistency of qualitative data analysis:

  1. Methods triangulation: Information obtained through several methods is compared. These methods can be qualitative, quantitative or both. Often qualitative and quantitative data can be fruitfully combined as they mostly elucidate complementary aspects of the same phenomenon (Patton, 1999).
  2. Triangulation of sources: Information derived at different times and by different means is compared, e.g. comparing observational data with interview data, but also comparing what people say in public with what they say in private (Patton, 1999).
  3. Analyst triangulation: Several observers, interviewers, researchers or analysts are used. In this way the potential bias that comes from a single person doing all the data collection and/or data analysis is reduced. In addition to using several researchers or data analysts, analyst triangulation may also involve having those who were studied review the findings (Patton, 1999).
  4. Theory/perspective triangulation: It involves the use of different theoretical perspectives to look at the same data. Also, for example, data can be examined from the perspective of various stakeholder positions (Patton, 1999).
  5. Member validation: It is a popular kind of triangulation that consists of “checking the accuracy of early findings with research respondents” (Bloor and Wood, 2006, p. 170).

These kinds of triangulation protect the researcher against the accusation that findings are an artifact of a single method, or source or investigator’s biases (Patton, 1999).

Transferability

Earlier in this report we argued that qualitative research is context sensitive and is not aimed at making generalizations to the wider population. This may appear to contradict the notion of transferability, which concerns precisely the extent to which findings of one study can be applied to other situations (external validity) (Merriam, 1998).

Transferability refers to the responsibility of the researcher to provide sufficient contextual information about the fieldwork to enable the reader to determine how far he can be confident in transferring the findings to other situations (Firestone, 1993). However, the situation might be complicated by the possibility that factors considered by the researcher to be unimportant, and consequently unaddressed in the research report, may be critical in the eyes of a reader (Firestone, 1993).


2.2.3       Checklists

We have found four papers (Reynolds, 2011; Walsh, 2006; Cohen, 2008; Côté and Turgeon, 2005) reviewing the literature on quality criteria or guidelines for qualitative research. One of them (Walsh, 2006) provides a synthesis of eight existing checklists and summary frameworks (see Table 7). This checklist is quite detailed and is designed for meta-synthesis, which is a kind of systematic review of qualitative research papers.

The list of criteria was built in order to appraise studies rigorously before submitting them to the meta-synthesis technique. Agreement on criteria to judge rigor was necessary in order to decide which studies to include in the meta-synthesis. Walsh and Downe (Walsh, 2006) tabulated the characteristics mentioned in each of the papers in their review. Then they mapped together the characteristics given in all the included papers, sorting them by the number of checklists in which they appeared. In the next step both authors independently attempted a synthesis before coming together to discuss. Redundant criteria were excluded if both authors agreed that the exclusion would not change the final judgment on the meaningfulness and applicability of a piece of qualitative research. Finally the table below was constructed, structured into three columns, namely stages, essential criteria and specific prompts. Although some criteria may seem self-evident, others are less obviously fundamental (Walsh, 2006). This list of criteria is very detailed. In some studies, especially those with a short time frame, a shorter and more pragmatic hands-on list could be practical. Therefore we also added the grid of Côté and Turgeon [c] (Table 8), which is shorter, adapted to the specific context of health care and easier to use for researchers who are less familiar with qualitative research. Other checklists are described in Appendix 1.

The use of a checklist may improve qualitative research; however, checklists should be used critically: not every criterion is appropriate to every research context (Barbour, 2001). For example the list of Côté and Turgeon mentions interpretation of results in an innovative way as a quality criterion (point 10, Table 8), while innovation is not necessarily a mark of quality. Most important is a systematic approach during the research process. For example the credibility of data analysis could encompass the use of software (Table 7), triangulation and/or member checking (point 7, Table 8), whereas a systematic approach with a detailed description of each step in the research process could have been sufficient.

 

Table 7 – Summary criteria for appraising qualitative research studies

Stages

Essential criteria

Specific prompts

Scope and purpose

Clear statement of, and rationale for, research question / aims / purposes

  • Clarity of focus demonstrated
  • Explicit purpose given, such as descriptive/explanatory intent, theory building, hypothesis testing
  • Link between research and existing knowledge demonstrated

 

Study thoroughly contextualized by existing literature

  • Evidence of systematic approach to literature review, location of literature to contextualise the findings, or both

Design

Method/design apparent, and consistent with research intent

  • Rationale given for use of qualitative design
  • Discussion of epistemological/ontological grounding
  • Rationale explored for specific qualitative method (e.g. ethnography, grounded theory, phenomenology)
  • Discussion of why particular method chosen is most appropriate/sensitive/relevant for research question/aims
  • Setting appropriate

 

Data collection strategy apparent and appropriate

  • Were data collection methods appropriate for type of data required and for specific qualitative method?
  • Were they likely to capture the complexity/diversity of experience and illuminate context in sufficient detail?
  • Was triangulation of data sources used if appropriate?

Sampling strategy

Sample and sampling method appropriate

  • Selection criteria detailed, and description of how sampling was undertaken
  • Justification for sampling strategy given
  • Thickness of description likely to be achieved from sampling 
  • Any disparity between planned and actual sample explained 

Analysis

Analytic approach appropriate

  • Approach made explicit (e.g. thematic distillation, constant comparative method, grounded theory)
  • Was it appropriate for the qualitative method chosen?
  • Was data managed by software package or by hand and why?
  • Discussion of how coding systems/conceptual frameworks evolved
  • How was context of data retained during analysis
  • Evidence that the subjective meanings of participants were portrayed
  • Evidence of more than one researcher involved in stages if appropriate to epistemological/theoretical stance
  • Did research participants have any involvement in analysis (e.g. member checking)
  • Evidence provided that data reached saturation or discussion/rationale if it did not
  • Evidence that deviant data was sought, or discussion/rationale if it was not

Interpretation

Context described and taken account of in interpretation

  • Description of social/physical and interpersonal contexts of data collection
  • Evidence that researcher spent time ‘dwelling with the data’, interrogating it for competing/alternative explanations of phenomena

 

Clear audit trail given

  • Sufficient discussion of research processes such that others can follow ‘decision trail’

 

Data used to support interpretation

  • Extensive use of field notes entries/verbatim interview quotes in discussion of findings
  • Clear exposition of how interpretation led to conclusions

Reflexivity

Researcher reflexivity demonstrated

  • Discussion of relationship between researcher and participants during fieldwork
  • Demonstration of researcher’s influence on stages of research process
  • Evidence of self-awareness/insight
  • Documentation of effects of the research on researcher
  • Evidence of how problems/complications met were dealt with

Ethical dimensions

Demonstration of sensitivity to ethical concerns

  • Ethical committee approval granted
  • Clear commitment to integrity, honesty, transparency, equality and mutual respect in relationships with participants
  • Evidence of fair dealing with all research participants
  • Recording of dilemmas met and how resolved in relation to ethical issues
  • Documentation of how autonomy, consent, confidentiality, anonymity were managed

Relevance and transferability

Relevance and transferability evident

  • Sufficient evidence for typicality specificity to be assessed
  • Analysis interwoven with existing theories and other relevant explanatory literature drawn from similar settings and studies
  • Discussion of how explanatory propositions/emergent theory may fit other contexts
  • Limitations/weaknesses of study clearly outlined
  • Clearly resonates with other knowledge and experience
  • Results/conclusions obviously supported by evidence
  • Interpretation plausible and ‘makes sense’
  • Provides new insights and increases understanding
  • Significance for current policy and practice outlined
  • Assessment of value/empowerment for participants
  • Outlines further directions for investigation
  • Comment on whether aims/purposes of research were achieved

Source: Walsh and Downe, 2006

Table 8 – Grid for the critical appraisal of qualitative research articles in medicine and medical education

| Item | Yes | +/- | No |
|---|---|---|---|
| Introduction | | | |
| 1. The issue is described clearly and corresponds to the current state of knowledge. | | | |
| 2. The research question and objectives are clearly stated and are relevant to qualitative research (e.g. the process of clinical or pedagogical decision-making). | | | |
| Methods | | | |
| 3. The context of the study and the researchers’ roles are clearly described (e.g. setting in which the study takes place, bias). | | | |
| 4. The method is appropriate for the research question (e.g. phenomenology, grounded theory, ethnography). | | | |
| 5. The selection of participants is appropriate to the research question and to the method selected (e.g. key participants, deviant cases). | | | |
| 6. The process for collecting data is clear and relevant (e.g. interview, focus group, data saturation). | | | |
| 7. Data analysis is credible (e.g. triangulation, member checking). | | | |
| Results | | | |
| 8. The main results are presented clearly. | | | |
| 9. The quotations make it easier to understand the results. | | | |
| Discussion | | | |
| 10. The results are interpreted in credible and innovative ways. | | | |
| 11. The limitations of the study are presented (e.g. transferability). | | | |
| Conclusion | | | |
| 12. The conclusion presents a synthesis of the study and proposes avenues for further research. | | | |

Source: Côté and Turgeon, 2005


2.2.4       Conclusion

To conclude this chapter on quality criteria we wish to warn against a rigid use of checklists and quality criteria in qualitative research and to argue instead for flexible use. Moreover this also applies to quantitative research.

Barbour criticizes the widespread use and description of assumed quality indicators like theoretical sampling, grounded theory, multiple coding, and triangulation in scientific articles, as an unequivocal guarantee of robustness. These dimensions of qualitative research should be embedded within a broader understanding of the qualitative research design and not “stuck on as a badge of merit” (Barbour, 2001, p. 1115).

We agree with Walsh and Downe (Walsh, 2006) that a checklist is indicative of good quality research, but not a guarantee.

Key messages

  • Although widely recognised guidelines exist in quantitative health sciences research, no comparable standardised guidelines exist for qualitative research.
  • Among qualitative researchers there is an ongoing debate between those demanding explicit criteria, for example in order to serve systematic reviewing and Evidence-Based Practice, and those who argue that such criteria are neither necessary nor desirable.
  • The framework of Walsh and Downe is a comprehensible example of a quality criteria checklist to appraise qualitative research studies. The grid of Côté and Turgeon is simpler and could be recommended as a tool for evaluation in KCE reports.

 


[a]           Informants may be asked to read transcripts of dialogues in which they have participated to check whether their words match with what they actually intended (Shenton 2004), or they may be asked to check the accuracy of early findings (Bloor 2006).

[b]           Thick description refers to rich qualitative data allowing not only the description of social behaviour, but also to connect it to the broader context in which it occurred (Mortelmans 2009).

[c]           A French-language version is also available (Côté 2002).

3. How to collect?

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

<This chapter will be published in December 2013>

3.1 Interviewing (individuals, groups)

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

There are many ways to interview people, e.g. individually or in focus groups. However, they share some general principles and techniques. Therefore in what follows we address the general principles. After that we present a chapter on individual semi-structured interviews and a chapter on focus groups. 

3.1.1 General principles

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

General principles addressed in this chapter are (1) planning, (2) sampling issues, (3) the development of a topic list or interview guide, (4) running the data collection, (5) preparation of the data, (6) data analysis, (7) the validation of findings, (8) how to report and (9) common pitfalls.

3.1.1.1        How to plan the research design?

As with any data collection, interviewing (individually or in focus groups) has to be planned within the overall research approach taking into account the particular aims of the qualitative data collection.

The planning of data collection has to be prepared early in the process of the overall research. Qualitative research is time consuming, on the level of data-collection, data-analysis and reporting. All the steps are presented in the next figure.


Figure 2 – Flowchart: interviewing people


 

3.1.1.2        Sampling issues in qualitative research: who and how many?

Selection of participants

In qualitative research we select people who are likely to provide the most relevant information (Huston 1998). In order to design the sample and cover all variability around the research issue, the researchers must have an idea about the different perspectives that should be represented in the sample. This is called “field mapping” of the key players who have a certain interest in the problem under study. The role of this explicit “field mapping” is often underestimated but essential in order to build a purposive sample. It is possible that this “field map” evolves during the data collection. The notion of “representativeness” here is not understood in the statistical way. The idea of representation is seen as a “representation of perspectives, meanings, opinions and ideas” of different stakeholders in relation to the problem researched and their interest. In order to select the participants for interviews or focus groups, one should ask “do we expect that this person can talk about (represent) the perspectives (meanings given to the situation) of this stakeholder group”. The aim is to maximize the opportunity of producing enough data to answer the research question (Green 2004).

Ideally there should be a mixture of different “population characteristics” to ensure that arguments and ideas of the participants represent the opinions and attitudes of the relevant population. Also the unit of analysis should be taken into account. This could be for example “individuals for their personal opinions/experience/expertise” or “individuals because they represent organizational perspectives”.

Moreover, if comparisons within and between types of participants are intended, the sample design should already take this into account. In Table 9, two criteria for comparison, for example age and socio-economic status, are already included to allow comparative analysis between age or status groups.


Sampling approaches

There is a wide range of sampling approaches (e.g. Miles and Huberman 1994, Patton 2002, Strauss and Corbin 2008). It is not uncommon in qualitative research that the research team continues to make sampling decisions during the process of collecting and analysing data. However, a clear documentation of the sampling criteria is needed when doing qualitative research. These criteria should cover all relevant aspects of the research topic. The researcher should identify the central criteria and translate them into observable sample criteria. In addition, the chosen criteria should leave enough variation to explore the research topic (Mortelmans, 2009). For example, in a research project on factors influencing the decision to have or refrain from having refractive eye surgery in the last two years, the sampling criteria were:

  1. To have experienced or to have considered refractive surgery. We want to explore both the pros and cons.
  2. To be older than 20 and younger than 70. Refractive eye surgery is not an option for those younger than 20 or older than 70.

In what follows we describe a number of sampling strategies. All the sampling strategies are non-probabilistic. A randomized sample is not useful in qualitative research, since generalizability to the general population is not the aim. Moreover with a random sample the researcher would run the risk of selecting people who have no link with the research subject and thus nothing to tell about it (Mortelmans, 2009). In purposive sampling the point of departure is the set of sampling criteria described above. There are different forms of purposive sampling:

  • Stratified purposive sampling (Patton, 2002):
    Purposive samples can be stratified (or nested) by selecting particular persons that vary according to a key dimension/characteristic (e.g. a sample of people from large hospitals, and a different sample with people from small hospitals), so that the selection ideally represents the different positions within the ‘system’ or phenomenon under investigation. The stratification criteria are the equivalent of independent variables in quantitative research. The researcher should think ahead about independent variables which could provide new information regarding the research topic. For example, in the research project on refractive eye surgery we expected that reasons to choose or refrain from choosing refractive eye surgery vary with age and with financial resources, and can be different in the Dutch- and French-speaking parts of the country. Therefore we added age, socio-economic status and region as criteria introducing heterogeneity. This results in the matrix shown in Table 9.
  • Homogeneous sampling:
    In the case of homogeneous sampling, variation between respondents is minimised. Participants are chosen because they are alike, in order to focus on one particular process or situation they have in common (Mortelmans, 2009). However, the homogeneous character does not exclude comparisons between types of participants, because for example unanticipated dimensions might emerge from the data. It is also useful to take hierarchy into account, hence not to put for example nurses and specialists working in the same hospital together in a focus group, as this might create bias in the responses. This sampling strategy is used when the goal of the research is to develop an in-depth understanding and description of a particular group with similar characteristics or of people on an equal footing. For example for the KCE research project on alternative medicines [48-50] only regular users were sampled.

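Table 9 – Example of stratified purposive sample

Cells give the number of respondents per combination of criteria.

| Surgery status | Age | Socio-economic status a | Socio-economic status b | Socio-economic status c |
|---|---|---|---|---|
| Already had eye surgery or surgery planned | 20-30 | 2 | 2 | 2 |
| Already had eye surgery or surgery planned | 31-40 | 2 | 2 | 2 |
| Already had eye surgery or surgery planned | >40 | 2 | 2 | 2 |
| Considered eye surgery but refrained from having it | 20-30 | 2 | 2 | 2 |
| Considered eye surgery but refrained from having it | 31-40 | 2 | 2 | 2 |
| Considered eye surgery but refrained from having it | >40 | 2 | 2 | 2 |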

  • Heterogeneous or maximum variation sampling:
    In the case of heterogeneous sampling, variation between respondents is maximised along the dimensions relevant to the research question.
  • Extreme or deviant cases sampling:
    For some purposes it can be useful to search for outliers or highly unusual persons or representatives of opinions. A selection of persons that, emerging from an analysis, appear to be the 'exception to the rule' could be considered to get a better understanding of these outliers or “negative cases”. The process of identifying extreme or deviant cases occurs after data collection and analysis have been partially completed. Therefore it is a sampling strategy which is always conducted as a complement to other sampling strategies.
  • Typical case:
    Cases are selected that are expected to provide information about a typical situation. This strategy is used in the case of a new research area. If knowledge about a research topic is completely absent, a typical case can provide the basic knowledge necessary to construct theoretical explanations, preparatory to the search for more variation in cases. The typical case is one that occurs frequently (Mortelmans, 2009).
  • Critical case sampling:
    This sampling is especially used in case studies, a research strategy “to understand social phenomena within a single or small number of naturally occurring settings. The purpose may be to provide description through a detailed example (…)” (Bloor and Wood, 2006, p. 27). It can be used when time or resource constraints limit the possibilities to recruit participants. A small number of important cases is selected to "yield the most information and have the greatest impact on the development of knowledge" (Patton, 2002, p. 236). It is crucial that the research team identifies the dimensions that make the participants “critical”. Snowball sampling can be used to identify critical informants who can provide a great deal of information about a phenomenon.
  • Theory-based or theoretical sampling:
    Theoretical sampling refers to the process of selecting "incidents, slices of life, time periods, or people on the basis of their potential manifestation or representation of important theoretical constructs" (Patton, 1999, p. 238).
  • Confirming and disconfirming cases:
    Identification of confirming and disconfirming cases occurs after data collection and analysis have partially been completed. Cases are sought to lend further support to an initial analysis or theory (confirming cases), or to disconfirm the theory and provide rival explanations (disconfirming cases). Researchers seek out confirming and disconfirming cases in order to develop a richer, more in-depth understanding of a phenomenon and to lend credibility to their research account.
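
As an illustration of the stratified purposive design in Table 9, the following minimal Python sketch builds the sampling matrix by crossing the stratification criteria and tracks how many respondents are still needed per cell. The function and variable names are hypothetical, and region is left out because it does not appear in Table 9; this is a planning aid under those assumptions, not a prescribed KCE tool.

```python
from itertools import product

# Strata taken from Table 9; "a", "b", "c" are the placeholder labels
# the table itself uses for socio-economic status.
SURGERY_STATUS = ["had or planned surgery", "considered but refrained"]
AGE_GROUPS = ["20-30", "31-40", ">40"]
SES_LEVELS = ["a", "b", "c"]
RESPONDENTS_PER_CELL = 2  # quota per cell, as in Table 9


def build_sampling_matrix(quota=RESPONDENTS_PER_CELL):
    """Return one target quota per combination of the stratification criteria."""
    return {
        cell: quota
        for cell in product(SURGERY_STATUS, AGE_GROUPS, SES_LEVELS)
    }


def remaining_quota(matrix, recruited):
    """Subtract already recruited respondents (a list of cells) from the quotas."""
    remaining = dict(matrix)
    for cell in recruited:
        if cell not in remaining:
            raise KeyError(f"unknown cell: {cell}")
        remaining[cell] = max(0, remaining[cell] - 1)
    return remaining


matrix = build_sampling_matrix()
total_target = sum(matrix.values())
```

Crossing two surgery statuses, three age groups and three socio-economic levels gives 18 cells, so a quota of 2 per cell corresponds to 36 planned respondents. Adding a further criterion such as region multiplies the number of cells, which is worth checking against the available time and budget.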

Recruitment strategies

In order to achieve the expected sample, several ways to find and recruit participants could be suggested:

  • Convenience sampling:      
    It is a pragmatic solution, i.e. selecting respondents based on ease, speed, and low cost, without any in-depth considerations on the selection of the participants. This strategy should ideally be avoided [33], but in some cases it is the only feasible option due to practical reasons (such as time, costs, etc.). A good description of the sample is especially important with convenience sampling, so that the reader can know how the results came about.
  • Snowball sampling:
    This strategy is especially used when the researcher has no clear idea about where to search for respondents or who could provide him with the information he envisions. Hence the researcher searches for one or a few respondents and asks them who else they know who could provide information. These individuals are contacted and in their turn asked whether they know other potential respondents. Once only the same names are mentioned, the sample has reached its maximum size (Mortelmans, 2009).
  • At random, but still purposive:
    As already mentioned above a random selection as such is not useful in qualitative research. However, there is one exception: random selection can be used when the researcher by using one of the sampling strategies mentioned above, gets more cases than he can interview or observe with his available time and resources. In that situation randomness can be an additional selection criterion (Mortelmans, 2009).
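
The snowball procedure described above can be sketched in a few lines of Python. The sketch assumes a hypothetical callback `ask_for_referrals` that returns the names a respondent suggests; the referral data are invented for illustration. Recruitment stops when no new names are mentioned, or earlier if an optional `max_size` (also an assumption, standing in for time and budget limits) is reached.

```python
from collections import deque


def snowball_sample(seeds, ask_for_referrals, max_size=None):
    """Follow referrals from the seed respondents until no new names come up.

    ask_for_referrals is a hypothetical callback: given a respondent, it
    returns the names that respondent suggests.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        if max_size is not None and len(sample) >= max_size:
            break
        person = queue.popleft()
        sample.append(person)
        for name in ask_for_referrals(person):
            if name not in seen:  # only new names extend the sample
                seen.add(name)
                queue.append(name)
    return sample


# Invented referral data, for illustration only.
referrals = {
    "R1": ["R2", "R3"],
    "R2": ["R1", "R3"],
    "R3": ["R4"],
    "R4": ["R2"],
}
sample = snowball_sample(["R1"], lambda p: referrals.get(p, []))
```

In practice the referrals come from the interviews themselves, and the researcher still has to check each suggested person against the sampling criteria before inviting them.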

Sample size

“Determining adequate sample size in qualitative research is ultimately a matter of judgement and experience in evaluating the quality of the information collected against the uses to which it will be put, the particular research method and purposeful sampling strategy employed, and the research product intended” (Sandelowski, 1995, p. 199).

Typically, in qualitative research one should continue sampling until saturation is reached: this is the point at which no new information or themes are emerging from the data (Bloor and Wood, 2006). Therefore sampling goes hand in hand with data analysis and cannot be planned totally in advance. In reality, in every research institution the sample size is also determined by the budget, the time and the human resources available. This often means that practical aspects of the research project constrain the size of the sample before theoretical saturation is reached. This is also true for KCE working practice, since budgets and time schedules are limited and fixed.

Beware that saturation can be reached prematurely if one's sampling frame is too narrow, if one's analytical perspective is biased or limited, if the data collection method is not resulting in rich, in-depth information, or if the researcher is unable to get beyond the surface.
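
One way to make the saturation criterion explicit is to record, after each interview, which themes were coded and to flag the point at which several consecutive interviews add nothing new. The Python sketch below does this under an assumed stopping rule (`window` consecutive interviews without a new theme); the window size and the example themes are illustrative assumptions, not values prescribed by the sources cited here.

```python
def interviews_until_saturation(themes_per_interview, window=3):
    """Return the 1-based index of the interview at which saturation is flagged.

    Saturation is flagged once `window` consecutive interviews add no theme
    that was not already coded. Returns None if that never happens.
    """
    seen = set()
    without_new = 0
    for index, themes in enumerate(themes_per_interview, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        without_new = 0 if new_themes else without_new + 1
        if without_new >= window:
            return index
    return None


# Invented coding results, one set of themes per interview.
coded = [
    {"cost", "fear"},
    {"cost", "convenience"},
    {"fear"},
    {"cost"},
    {"convenience", "fear"},
]
saturation_point = interviews_until_saturation(coded, window=3)
```

Such a count only signals that no new codes appeared; it cannot detect the premature saturation described above, so it supports rather than replaces the researcher's judgement.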

First contact with a respondent

  • The first contact with a respondent is often made by telephone. It is very important as it sets the tone for the rest of the interviewing. During this telephone conversation the researcher must convince the respondent of the importance of the research and of his participation. To do so, the researcher could look for arguments that are important in the eyes of the respondent, rather than arguments based on the importance of the research. Not too much information should be provided during this first contact. Additional information can be provided by means of an information letter. Box 1 presents information that can be provided during the invitation to participate.

Box 1: Information to be given during first contact

Background information

  • Goal of the interview
  • Person responsible for the research

  • Reason why the respondent is invited to participate
  • How the respondent was recruited
  • Reason why the respondent is called at that specific moment in time
  • The recording of the interview

Arguments pro participation

  • How the results of the research will be reported, including (non)anonymity issues (e.g. in quotations)
  • Influence of the results on policy making

Costs of the participation

  • The kind of information the respondent is expected to provide
  • The duration of the interview
  • (Non) anonymity of the provided information
  • How the recording of the interview will be treated after the interview

Source: Adapted from Emans, 1986, cited by Mortelmans, 2009

 

It is important that people understand that participation in interviews or focus groups is completely voluntary, and that they may choose to leave at any time during the discussion. In addition, it is imperative that participants are aware that they will receive no tangible benefit for participation. That is why the question of offering incentives is often rather contentious [37]. Nevertheless their travel costs could be reimbursed, or they could receive a small compensation or a small gift.

Also it is recommended to leave the choice of place (where the interview will take place) up to the respondent, in order to facilitate his participation. The context in which the interview takes place determines partly the interactions during the interview. For example a patient at home or in the waiting room of a hospital will disclose other kinds of information, not only because he/she feels more or less comfortable, but also because the setting triggers other associations and thoughts. The interviewer/researcher should be well aware of and anticipate the impact the interview location is likely to have on the data generated.

The same applies to the characteristics of the interviewer. In the qualitative interview the researcher empathizes with his or her respondents and views their situation from their own points of view [53]. In general this empathic stance, as well as gaining trust from the respondent, is facilitated if the interviewer resembles the respondent in terms of race or other characteristics relevant to the research topic. Gender however is an exception to this rule. There is a debate in the literature about whether same sex or opposite sex is preferable in order to achieve rapport during interviews. Some argue that men are more comfortable talking with women (especially about intimate topics) than they are with other men [53].


3.1.1.3        How to develop an interview guide?

An interview guide should be adapted to the language and vocabulary of the participant(s) and is generally built out of three components:

  1. A reminder of the goal of the research.
  2. The main topics or questions, the interviewer wants to address during the interview.
  3. Relaunching questions. They are an essential part of the interview. It may happen that the interviewee does not give an answer to the question or gives an unexpected answer. In that case the interviewer can probe in order to delve deeper. In case a respondent does not mention an aspect you thought of in advance or are particularly interested in, you can ask the question again, focused on that specific issue. For example the initial question could be: “Which difficulties did you experience after your surgery?”. The respondent mentions all kinds of worries and inconveniences, but you are particularly interested in the organization of after care. Hence you could ask: “How did you experience the organization of after care?”.

How to construct a topic list or semi-structured questionnaire?

A topic list covers all the topics the interviewer should ask about during the interview. It enables the interviewer to guide the interview while allowing the discussion to flow naturally. The sequence of topics generally moves from the general to the specific. The sequencing of topics can be handled in a flexible way, and within a general framework of topics, the focus of the discussion can be reset. A topic list is also used in preparation of the semi-structured questionnaire.

In a semi-structured questionnaire, questions are formulated in spoken language and are posed as such during the interview. The same questions with the same formulation, sometimes in the same sequence, are posed in each interview. The disadvantage however is that this can threaten the natural flow of the conversation.

Both for the topic list and the semi-structured questionnaire, questions/topics should be selected according to the research objectives. An open-ended formulation of the questions is important in order to enable the interviewee to talk freely without predispositions of the interviewer influencing the narrative. For example, rather than asking “Did you worry about the surgery?”, one could ask “How did you feel about the surgery?”.

A topic list or questionnaire may be adapted or improved in the course of the research, in line with the iterative nature of QRM (see 1.1.1). The more interviews you have done, the more you know and the more specific or detailed your questions can be (Mortelmans, 2009). However, continuity should be guarded. The topics of the first interview should also be represented in the following interviews, although the latter can also contain much more detailed questions.

For an example of a topic list and a semi-structured questionnaire, see Appendix 6 and Appendix 7 respectively.

What types of questions can be posed?[a]

The interview starts with an easy opening question which is mostly to set the interviewee at ease, break the ice and get to know each other. With this question the researcher does not expect to get a lot of useful information, the main function is to start up the conversation.

After that the conversation is started with a first general and easy to answer question addressing the content of the research. It can be an attitude question to enable the respondents to roll into the conversation. An example could be: “If you hear breast cancer screening, what are your first thoughts?”.

Next, transition questions involve the respondents in the research subject, for example through asking questions about personal experiences or specific behavior regarding the topic. Attitudinal questions are more difficult to answer and should therefore be addressed later in the interview. An example is “How did you experience your eye surgery?”.

Subsequently the key questions are addressed. These questions are the reason why the interview is done. The interviewer can make clear that the interviewee can take some time to answer these questions. An interview can count up to five key questions, each taking up to fifteen minutes to answer.

Finally, the interview is terminated by means of a concluding question and thanking the interviewee for his participation. Three types of concluding questions can be distinguished:

  1. Summary questions provide the interviewee with a summary of what he has told the interviewer,
  2. Final questions can address elements that have not been mentioned during the interview, for example: “Do you want to add something to this interview?”. Make sure you allow enough time for the concluding questions.

It is useful to conduct a pilot (focus group) interview in order to test, assess and validate the format and the appropriateness of the topic guide or questionnaire.

3.1.1.4        How to run the data collection?

Preparations for the interview

Preparations for the interview encompass the recruitment of participants and the making of appointments, becoming knowledgeable about the research topic (including learning the interview guide by heart), anticipating participants’ questions about the research project, arranging access to a physical space where the interviews can take place, and preparing the recording equipment54. Proper functioning of the recorders is crucial, so batteries, tapes and microphones should be carefully checked. It can be practical to bring a second recorder as back-up. Finally, a notebook, a pen and of course the topic list or interview guide prepared for the interview should not be forgotten.

Box 2: What to take to the interview?

Equipment

  • Digital tape recorder (plus 1 extra, if available)
  • Spare batteries
  • Field notebook and pens

Interview packet

  • 1 interview guide (in the appropriate language)
  • Informed consent forms (2 per participant: 1 for the interviewer, 1 for the participant, in the appropriate language)
  • Participant reimbursement (if applicable)

Source: Adapted from Mack, 200554

Running the interview

Informed consent should be obtained from each participant before starting the interview. Permission to record the interview should also be asked, and it should be explained how the tapes will be used and stored.

The research aims should be briefly repeated. The research aims were probably already explained during the first contact with the respondent in order to convince him to participate. Next, all the topics or questions on the checklist or questionnaire need to be addressed. Participants are probed for elaboration of their responses in order to learn everything they want to share about the research topic54. Mobile phones should be switched off during the interview so as not to imply that the participant’s testimony is of secondary importance.

During the interview back-up notes can be taken. The interviewee’s behaviors and contextual aspects of the interview should be observed and documented as part of the field notes. Field notes are expanded as soon as possible after each interview, preferably within 24 hours, while the memory is still fresh54.

To get deeper or redirect the discussion, probing techniques can be used:

  • Repeat the question but in a different wording.
  • Summarise the relevant aspects of the interviewee’s answer, in an interrogative way. For example: “In sum, you say that…?”
  • Probe explicitly, for example: “What do you mean?” or “Could you give me a second example?”
  • Purposive probing, for example: “Why was it that you?” or “What happened then?”
  • Repeat the last couple of words in an interrogative way. For example: R: “(…) I think it is dangerous and I don’t trust doctors.” I: “You don’t trust doctors?”
  • Introduce a short silence.
  • Verbalise emotions, for example: “I can see that thinking of that discussion makes you very angry.”

The interview is closed by thanking the participant(s).

3.1.1.5        How to prepare the data for analysis?

Transcribing is the procedure for producing a written version of the interview. Ideally, the information recorded during the interview will need to be transcribed in order to enable accurate data analysis. A transcript is a full written literal text of the interview. It often produces a lot of written text.

Good quality transcribing is not simply transferring words from the tape to the page. The wording communicates only a small proportion of the message. A lot of additional information is to be found in the way people speak. Tone and inflection, timing of reactions are important indicators too. With experienced observers and note-takers, a thematic analysis of the notes taken during the interviews could be used as a basis for analysis of the “non-verbal” aspects.

Transcribing is a time-consuming and costly part of the study. The research team should consider in advance the question “who should do the transcribing?”. Resources may be needed to pay an audio typist, a strategy usually more cost-effective than having a researcher transcribe. Be aware that typists are often unfamiliar with the terminology or language used in the interviews, which can lead to mistakes and/or prolong the transcribing time.

It may not be essential to transcribe every interview. It is possible to use a technique known as tape and notebook analysis, which means taking notes from a playback of the tape-recorded interview and triangulating them with the notes taken by the observers and note-takers. However, bias can occur if inexperienced qualitative researchers attempt tape and notebook analysis. It is certainly preferable to produce full transcripts of the first few interviews. Once the researcher becomes familiar with the key messages emerging from the data, tape analysis may be possible. Transcripts are especially valuable when several researchers work with the same data.

3.1.1.6        How to analyse the data?

As in any research method, analysing collected data is a necessary step in order to draw conclusions. Analysing qualitative data is neither a simple nor a quick task. Done properly, it is systematic and rigorous, and therefore labor-intensive and time-consuming. “[…] good qualitative analysis is able to document its claim to reflect some of the truth of a phenomenon by reference to systematically gathered data”55; in contrast, “poor qualitative analysis is anecdotal, unreflective, descriptive without being focused on a coherent line of inquiry”55 (Pope, 200056, p. 116). Qualitative analysis is a matter of deconstructing the data, in order to construct an analysis or theory (Mortelmans, 2009).

The ways and techniques to analyse qualitative data are not easy to describe as it requires a lot of “fingerspitzengefühl” and it is unrealistic to expect a kind of recipe book which can be followed in order to produce a good analysis. Therefore what we present here is a number of hands-on guidelines, which have proven useful to others.

The difficulty of qualitative analysis lies in the lack of standardization and the absence of a universal set of clear-cut procedures which fit every type of data and could be applied almost automatically. There are also several approaches for taking the analysis forward: for example thematic analysis, the general inductive approach, grounded theory or the framework approach. These approaches range from inductive to more deductive, but in practice the researcher often moves back and forth between the data and the emerging interpretations. Hence induction and deduction are often used in the same analysis. Elements from different approaches may also be combined in one analysis3.

The approach chosen depends largely on the design and the aims of the research. Some designs and/or research questions require an inductive, others a deductive approach. Inductive means that themes emerge from the data, while deductive implies a pre-existing theory or framework which is applied to the data. Different aims may also require differing depths of analysis. The analysis of an interview-based study will probably be more detailed than the analysis of a small number of interviews carried out as an exploratory part of a mixed method study encompassing several components. “The analysis may seek simply to describe people’s views or behaviors, or move beyond this to provide explanation that can take the form of classifications, typologies, patterns, models and theories” (Pope and Mays, 20063, p. 67). The two levels of analysis can be described as follows:

  • The basic level is a descriptive account of what was said (by whom) related to particular topics and questions. Some texts refer to this as the “manifest level” or type of analysis.
  • The higher level of analysis is interpretative: this is the level of identifying the “meanings” of the responses. It is sometimes called the latent level of analysis. This second level of analysis can to a large degree be inspired by theories.

The steps in the analysis

  1. Independent of the approach of analysis, a qualitative analysis always starts with the preparation of the gathered data and a first reading of the interview transcripts and/or field notes to get familiar with the data.
  2. The following step is very laborious (especially with large amounts of data) and consists of reading and re-reading the data in order to develop a profound knowledge of the data. In this way an initial set of themes or categories is identified. Pieces of text are coded, that is, given a label or a name. In effect, meanings are isolated with a view to answering the research question. One piece of text may belong to more than one category or label, so there is likely to be overlap between categories. Non-relevant data is omitted. Opposing or non-consensual views related to the themes or items discussed have to be documented and reported. Particular attention should be paid when “rival explanations” or interpretations of the data can be made.
  3. In a third step the categories are further refined and reduced by being grouped together. Word processors or software for qualitative data analysis will prove to be very helpful at this stage. During the analysis the researcher might constantly compare the constructed categories with new data, and the new categories with already analysed data. This results in a kind of inductive cycle of constant comparison to fine-tune categories and concepts arising from the data. New data collection could also be necessary to verify new points of view or insights emerging from the analysis.
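
Steps 2 and 3 can be illustrated with a minimal sketch in Python. It is not a prescribed KCE procedure or the behaviour of any particular software package: the interview fragments, the codes and the category names are all invented for illustration, and the grouping of codes into categories remains the analyst’s judgement.

```python
from collections import defaultdict

# Hypothetical coded fragments: (interview id, text, codes assigned by the analyst).
segments = [
    ("I01", "I was afraid of the anaesthesia.", {"fear", "anaesthesia"}),
    ("I01", "Nobody told me how long recovery would take.", {"information gap"}),
    ("I02", "The waiting made me anxious.", {"fear", "waiting time"}),
]

# Step 2: index fragments by code. One fragment may carry several codes,
# so the same text can appear under more than one label.
by_code = defaultdict(list)
for interview, text, codes in segments:
    for code in codes:
        by_code[code].append((interview, text))

# Step 3: group codes into broader categories (an illustrative grouping only).
categories = {
    "emotions": {"fear"},
    "organisation of care": {"waiting time", "information gap", "anaesthesia"},
}

def fragments_for(category):
    """Return every fragment carrying at least one code of the category, in input order."""
    wanted = categories[category]
    return [(i, t) for i, t, codes in segments if codes & wanted]
```

Calling fragments_for("emotions") retrieves the two fragments coded as “fear”, while the first fragment is also retrieved under “organisation of care”, which shows the overlap between categories mentioned in step 2.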

This general a-theoretic procedure reflects what in the literature is called the general inductive approach for analysing qualitative data. It does not aim at the construction of theories, but the mere description of emerging themes. It provides a simple, straightforward approach for deriving findings in the context of focused research questions without having to learn an underlying philosophy or technical language associated with other qualitative analysis approaches57.

Figure 3 – Conceptual representation of iterative process of qualitative analysis with an inductive approach


Data analysis is generally expected to be an iterative process. Constant comparative analysis is emphasized especially in the grounded theory approach. This means that, overall, data collection and data analysis are not organized in a strictly sequential way. Constant comparative analysis is a process whereby data collection and data analysis occur on an ongoing basis. The interview is transcribed and analysed as soon as possible, preferably before the next interview takes place. Any interesting finding is documented and incorporated into the next interview. The process is repeated with each interview until saturation is reached. As a result, the initial interviews in a research project may differ a lot from the later interviews, as the interview schedule is continuously adapted and revised. For this reason researchers have to clarify and document how structured or unstructured their data collection method is, and keep memos of the process. Notes and observations made at the time of the interview are re-examined, challenged, amended, and/or confirmed using transcribed audio or video tapes. All members of the research team are expected to participate in a review of the final interpretation, in which data and analysis are again re-examined, analysed, evaluated, and confirmed. The use of more than one analyst can improve the consistency or reliability of analyses.
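
The idea of repeating the cycle “until saturation is reached” can be sketched as a simple bookkeeping routine. The Python example below is only an illustration under stated assumptions: the stopping rule (no new code in the last two interviews) and the codes themselves are hypothetical, and the text above does not define saturation numerically. In practice the judgement remains with the research team.

```python
def interviews_until_saturation(codes_per_interview, window=2):
    """Return how many interviews had been analysed when the last `window`
    interviews produced no new code, or None if that point is never reached.
    The `window` threshold is an illustrative assumption, not a standard."""
    seen = set()
    without_new = 0
    for n, codes in enumerate(codes_per_interview, start=1):
        new_codes = set(codes) - seen
        seen |= new_codes
        # Reset the counter whenever an interview contributes something new.
        without_new = 0 if new_codes else without_new + 1
        if without_new >= window:
            return n
    return None

# Hypothetical codes identified in five successive interviews.
example = [
    {"fear", "cost"},
    {"fear", "trust"},
    {"cost"},
    {"trust", "fear"},
    {"access"},
]
```

With these invented data, interviews_until_saturation(example) stops at the fourth interview. With window=3 no stopping point is found, because the fifth interview introduces a new code; this illustrates why an early impression of saturation should be treated with caution.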

Theoretical approaches

The process described above is inductive and is in line with the principles of e.g. grounded theory and the general inductive approach.

Grounded theory is more than a way to analyse data; it is a complete methodology, a way of conceptualizing a qualitative research project. Key to grounded theory is the emphasis on theory as the final output of research58. Other approaches may stop at the level of description or interpretation of the data. “The appeal of grounded theory analysis is the structured and detailed procedures for the generation of theory from data”59. The steps in the procedure are:

 

  • Open coding, or the identification of an initial set of themes or categories (called codes) (see step 2 in the analysis)
  • Axial coding, or the identification of relationships between themes or categories. This is a way of refining the initial codes (see step 3 in the analysis)
  • Selective coding, or the movement towards “the development of analytical categories by incorporating more abstract and theoretically based elements” (Pope and Mays, 20063, p. 71).

The term ‘grounded theory’ has often been used in research papers, without any mention of the elements described above3. The grounded theory approach is popular, although this kind of analysis is unpredictable to some extent: “it is seldom possible to specify precisely the dimensions or direction of the research at the outset” (Pope and Mays, 200660, p. 71). This characteristic is often problematic for research funders, program managers and ethical committees in general, and at the KCE in particular.

Within the KCE context in particular and applied and policy research in general a more deductive approach could be useful. Often several forms of data gathering and analysis are used within one research project, hence qualitative research is often only one component within a broader whole. The other parts often provide input and may lead to the development of a preliminary framework which may guide the qualitative data analysis.

Framework analysis has been developed specifically for applied or policy relevant qualitative research, and is used in deductive research strategies. In a framework analysis the objectives of the investigation are set in advance. The thematic framework for the content analysis is identified before the research or the qualitative research part in the project commences.

The decision to use frameworks when analysing data is closely related to the purpose for which the qualitative material will be used in the overall research strategy. “Frameworks” are generally deduced from hypotheses or theoretical frameworks: e.g. if the aim of a focus group is to get a picture of stakeholders’ interests and potentially conflicting perspectives on a health care issue, and the focus group tries to grasp how stakeholders develop power plays or influence strategies to set agendas, a conceptual framework of decision-making processes and power play will serve as a useful tool to orient data collection and data analysis.

Applying framework analysis concretely means that the themes emerging from the data are placed in the framework defined a priori. The framework is systematically applied to all the data. Although an analytical framework can be very useful, it is not suited to research that aims to discover new ideas, since a framework or grid could be blinding61.
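
As a rough illustration of placing coded material in an a priori framework, the following Python sketch charts coded fragments into a case-by-theme matrix. The themes, codes, stakeholder names and quotes are all hypothetical, and the function chart is an invented helper, not part of any qualitative analysis package.

```python
# Hypothetical a priori framework: theme -> codes that belong to it.
framework = {
    "agenda setting": {"lobbying", "media use"},
    "conflicting interests": {"budget", "professional autonomy"},
}

# Hypothetical coded fragments: (case, code, text).
coded = [
    ("Stakeholder A", "lobbying", "We contacted the cabinet directly."),
    ("Stakeholder A", "budget", "The fee cut hits us hardest."),
    ("Stakeholder B", "media use", "We went to the press first."),
    ("Stakeholder B", "waiting lists", "Patients wait for months."),
]

def chart(coded, framework):
    """Place each fragment in the cell (case, theme) of the framework.
    Fragments whose code fits no theme are returned separately."""
    code_to_theme = {c: t for t, codes in framework.items() for c in codes}
    cases = sorted({case for case, _, _ in coded})
    matrix = {case: {theme: [] for theme in framework} for case in cases}
    unplaced = []
    for case, code, text in coded:
        theme = code_to_theme.get(code)
        if theme is None:
            unplaced.append((case, code, text))
        else:
            matrix[case][theme].append(text)
    return matrix, unplaced

matrix, unplaced = chart(coded, framework)
```

Keeping the unplaced fragments visible, rather than discarding them, is a deliberate choice in this sketch: it offers one way to notice material that the predefined grid would otherwise hide.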

Software to analyse qualitative data

Analysis may either be done manually or by using qualitative analysis software, for example Nvivo©[2], Atlas ti©[3], Maxqda©[4], etc.

These Computer-Assisted Qualitative Data Analysis Software (CAQDAS) packages support the analyst with the storage, coding and systematic retrieval of qualitative data35. They are able to manage different types of qualitative materials, such as transcripts, texts, videos, images, etc. Their utility for the analysis depends on the size of the corpus of analysis (number of interviews, plurality of the data sources), and their use need not be automatic. They can also be useful for collaborative purposes when several researchers are analysing the same data. They do not guarantee the scientific nature of the results62. Indeed, the quality of the results does not depend on the tool used, but on the scientific rigor and the systematic analysis of the data.

3.1.1.7        How to validate qualitative research findings?

External validation, also called transferability of the analysis, refers to whether you can apply the findings to other settings. It is largely determined by the sampling strategy and the setting. (see also 1.2.2 General quality criteria in part 2).

Quality criteria and validation techniques for qualitative research methods are described more extensively in section 1.2.1 (part 2).


3.1.1.8        How to report qualitative research findings?

Interviews can be presented in a number of ways; there is no specific format to follow. However, as with other research methods, the justification and methodology of the study should be provided. The research process should be fully transparent so that any researcher can reproduce it. In addition, it should be comprehensible to the reader.

 

A possible structure could be:

1. Introduction and Justification

2. Methodology

2.1 How were respondents recruited?

2.2 Description of the sample

2.3 Description of selection biases if any

2.4 What instruments were used to collect the data?

    You may want to include the topic list or questionnaire in an appendix

2.5 Over which period of time was the data collected?

3. Results : What are the key findings?

4. Discussion

4.1 What were the strengths and limitations of the information?

4.2 Are the results similar or dissimilar to other findings

     (if other studies have been done)?

5. Conclusion and Recommendations

6. Appendices (including the interview guide(s)/ topic guide)


 

 When writing up findings qualitative researchers often use quotes from respondents. Quotes are useful in order to63:

  • Illustrate the themes emerging from the analysis.
  • Provide evidence for interpretations, comparable to the use of tables of statistical data appearing in reports based on quantitative findings.
  • Strengthen credibility of the findings (although critics argue that researchers can always find at least one quote to support any point they might wish to make).
  • Deepen understanding. The actual words of a respondent could sometimes be a better representation of the depth of feeling.
  • Enable voice to research participants. This enables participants to speak for themselves and is especially relevant in a participatory paradigm.
  • Enhance readability by providing some vividness and sometimes humour. Breaking up long passages of text by inserting spoken words could help to keep the reader focused, but there could be a danger in moving too far towards a journalistic approach.

Ideally, quotes are anonymous and are accompanied by a pseudonym or description of the respondents. For example, in a research about normal birth, this could be: (Midwife, 36 years). There are however exceptions to the rule of anonymity, e.g. stakeholder interviews, in which the identity of the respondent is important for the interpretation of the findings. In that case the respondent should of course be informed, and his agreement is needed in order to proceed.

Also in terms of layout, quotations should be distinguished from the rest of the text, for example by using indents, italic font or quotation marks. Quotes are used to strengthen the argument, but should be used sparingly and in line with the findings. Try to choose quotations so that all respondents are represented. Be aware that readers might give more weight to themes illustrated with a quotation.

When the research is conducted in a language other than the language of the report in which the findings are presented, quotes are most often translated. “As translation is also an interpretive act, meaning may get lost in the translation process” (Van Nes et al, 201064, p. 313). It is recommended to stay in the original language as long and as much as possible and delay the use of translations to the stage of writing up the findings64.

KCE practice is to translate quotes only for publications in international scientific journals, but not for KCE reports. Although KCE reports are written in English, inserted quotes are in Dutch or French to stay close to the original meaning. The authors should pay attention to the readability of the text and make sure that the text without quotes is comprehensible to English-speaking readers.

3.1.1.9        What are the common pitfalls?

In the following paragraph we mention a number of common pitfalls typical for interviews. They are based on the work of Mortelmans (Mortelmans, 2009) and the Qualitative Research Guidelines Project33.

  • The methodology needs to be transparent. Each step of the sampling, data collection and analysis should be described in sufficient detail to enable other researchers to replicate the study.
  • The sample should be well constructed and described.
  • Avoid dichotomous questions which elicit a yes or a no. In an interview we are especially interested in rich descriptions and we want the interviewee to talk a lot and elaborate on the topic of the question.
  • Avoid double questions, for example: “Once you decided to have a screening, what was the next step? How did you proceed? How did it change the way you thought about potential risks?” The interviewee can not respond to all the questions at once and thus picks out one. This means the other questions are lost.
  • Avoid the expression of value judgements or your own opinion, for example: “What do you think about the endless waiting times?” The word “endless” suggests irritation.
  • Avoid being suggestive, for instance by giving examples: “Which kind of difficulties did you encounter, like long waiting times, full waiting rooms etc?” Examples of this kind provide the interviewee with a frame which he may not transcend. In this way you lose what he would have answered spontaneously.
  • Avoid a reverse of roles. The interviewee should not be asking you questions. An example could be: I: “What does it mean to you to be a patient?”, R: “I don’t know. What does it mean to you?”. If this happens you can say that you are willing to answer that question after the interview, but that you can not answer it during the interview in order not to influence the answers of the interviewee. A reverse of roles can be avoided if the interviewer introduces himself in a neutral way, for example as a researcher, but not as, for example a physician or an expert in an issue related to the topic/goal of the interview, in order for the respondent not to ask you too many questions on a particular condition or issue.
  • Avoid letting the interviewee deviate too far from the topic or elaborate on irrelevant matters; return to the question posed.
  • Avoid jargon; use familiar terminology which does not need explanations or definitions.
  • The analysis should not be superficial but really in-depth. However it may not transcend the data. The data must always support the results.

 


[a] based on Mortelmans, 2009 (Mortelmans, 2009)

[2]           http://www.qsrinternational.com/products_nvivo.aspx

[3]           http://www.atlasti.com/index.html

[4]           http://www.maxqda.com/


3.1.2 Individual interviews

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

3.1.2.1        What are individual semi-structured interviews?

Interviews are used in many contexts (journalism, human resource managers, etc.) and for many purposes (entertainment, recruitment of personnel, etc.), hence scientific data collection is only one very specific application, which should not be confused with other applications. The interview is easily trivialized as it is common practice in the media landscape which surrounds us. Fontana and Frey even speak about “the interview society”65 according to Atkinson and Silverman66. Practicing health professionals routinely interview patients during their clinical work, and they may wonder whether simply talking to people constitutes a legitimate form of research67. In qualitative research, however, interviewing is a well established research technique and two types can be distinguished: semi-structured and unstructured. Structured interviews are out of scope here, because they consist of administering structured questionnaires producing quantitative data.

Unstructured interviews are more or less equivalent to guided conversations67. Originally they were part of ethnographers’ field work, consisting of participant observation and interviewing key informants on an ongoing basis to elicit information about the meaning of observed behaviors, interactions, or artifacts67. There is no list of questions, nor an interview guide, the questions asked are based on the responses of the interviewee, as in the natural flow of a conversation68.

Semi-structured interviews are often the sole data source in a qualitative research project. A set of predetermined open-ended questions is used to guide the interview, but other questions emerging from the dialogue can be added68. Also the iterative nature of the research process in which preliminary data analysis coincides with data collection, results in altering questions as the research process proceeds. Even so, questions that are not effective in eliciting the necessary information can be dropped or replaced by new ones68.

Essentially an interview consists of someone who asks questions (interviewer), someone who answers these questions (interviewee) and the registration of those answers in some way10.

The interview as a qualitative research method differs from other forms of interviewing used in varied domains. Mortelmans10 draws attention to four characteristics:

  • Flexibility; both internal and external flexibility are meant. External flexibility refers to the iterative use of interviewing and data analysis: the structure and content of the subsequent interview may be changed according to the analysis of the previous one. Internal flexibility points to the fact that the sequence of the prepared interview questions and themes should be adapted to the interviewee in order to guard the natural flow of the conversation.
  • The interviewee leads so to speak the conversation. The interviewer only guards the scope of the conversation and makes sure that all the topics are covered.
  • Non-directiveness; the interviewee steers the interview and the interviewer only makes sure that the conversation does not stray too far by means of non-directive interview techniques.
  • Direct face-to-face contact is important to build trust and get in-depth information, but this depends on the topic and should be considered case by case.

3.1.2.2        When to use individual semi-structured interviews?

Individual semi-structured interviews are useful to:

  • Collect data on individuals’ personal histories, perspectives, and experiences, particularly when sensitive topics are being explored54.
  • Elicit a vivid picture of the participant’s perspective54.
  • Provide context to other data, offering a more complete picture69.
  • Learn about the perspectives of individuals, as opposed to, for example, group norms of a community, for which focus groups are more appropriate54.
  • Get people to talk about their personal feelings, opinions, and experiences54.
  • Gain insight into how people interpret and order the world on the research topic54.
  • Address sensitive topics that people might be reluctant to discuss in a group setting54.
  • Elicit information from key informants1.
  • Examine people’s experiences, attitudes and beliefs24.

3.1.2.3        Strengths and weaknesses of the method

Strengths:

  • They provide much more detailed information than what is available through other data collection methods, such as surveys69.
  • Questions can be prepared ahead of time. This allows the interviewer to be prepared and appear competent during the interview33.
  • Semi-structured interviews also allow informants the freedom to express their views in their own terms33.
  • Semi-structured interviews can provide reliable, comparable qualitative data33.

Weaknesses:

  • Interviews can be time-intensive because of the time it takes to recruit participants, conduct interviews, transcribe them, and analyse the results. In planning your data collection effort, care must be taken to include time for transcription and analysis of this detailed data69.
  • Interviewers must be appropriately trained in interviewing techniques. To provide the most detailed and rich data from an interviewee, the interviewer must make that person comfortable and appear interested in what they are saying. They must also be sure to use effective interview techniques, such as avoiding yes/no and leading questions, using appropriate body language, and keeping their personal opinions in check69.
  • Data from individual semi-structured interviews are not generalizable in a statistical way, because small samples are chosen and no random sampling methods are used, but they are theoretically transferable. Individual semi-structured interviews nevertheless provide valuable information, particularly when supplementing other methods of data collection. It should be noted that the general rule on sample size for interviews is that when the same stories, themes, issues, and topics are emerging from the interviewees, a sufficient sample size has been reached69.

3.1.2.4        How to plan the research design?

See part 2, 2.2.1.1 “How to plan the research design?”

3.1.2.5        Modalities of data collection

Individual semi-structured interviews are usually conducted face-to-face and involve one interviewer and one participant. Phone conversations and interviews with more than one participant also qualify as semi-structured interviews, but, in this chapter, we focus on individual, face-to-face interviews54.

3.1.2.6        Data collection tools

The data collection tools to carry out interviews are topic lists, questionnaires and field notes. Topic lists and questionnaires are described in part 2, section 2.2.1.3.

Researchers use field notes to record observations and fragments of speech. Field notes should be written up as soon as possible after the events to which they refer. If possible, short “aide-mémoire” or pocket dictaphones may be used in fieldwork settings, to facilitate later expansion of the notes into proper fieldnotes35. In the chapter on observational techniques field notes are addressed in more detail (section 2.3.6.2 in part 2).

3.1.2.7        Sampling

For general issues on sampling, see 2.2.1.2 “Sampling issues in qualitative research: who and how many?” in part 2.

3.1.2.8        Human resources necessary

In the ideal scenario researchers plan, organize, carry out and transcribe the interviews themselves, to be completely immersed in the data, but in practice the interviews are often carried out by subcontractors and the transcriptions are often done by professional typists.

3.1.2.9        Practical aspects

Preparations for the interview: see 2.2.1.4 “How to run the data collection” in part 2.

Physical organisation of an interview. Take the following rules into account:

  1. Interviewee and interviewer should not sit opposite each other, but rather at an angle of 90° or less.
  2. The interview should take place in a quiet place where the interviewee feels at ease.
  3. Avoid the presence of third parties.

3.1.2.10      Analysis and reporting of findings

See 2.2.1.6: “How to analyse the data?” and 2.2.1.8 “How to report qualitative research findings?” in part 2.

3.1.2.11      Examples of KCE reports using the method

  • Home monitoring of infants in prevention of sudden infant death syndrome70
  • Making general practice attractive: encouraging GP attraction and retention71
  • Osteopathy and chiropractic: state of affairs in Belgium49
  • Acupuncture: state of affairs in Belgium50
  • Homeopathy: state of affairs in Belgium48
  • Burnout among general practitioners: prevention and management72
  • Evaluation of a fixed personal fee on the use of emergency services73


[1]           We propose an example of a ‘standard introductory text’ in the appendix.

3.1.3 Focus groups

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

3.1.3.1        What are focus groups?

A focus group is a particular technique in qualitative research. To conduct a focus group interview, a group of individuals is gathered on the basis of their specific profile or characteristics to explore a limited number of “focused questions”1. Groups are generally homogeneous on one or several criteria relevant to the focus of the discussion.

In essence, a focus group is a small (usually 6-12 people) group brought together to discuss a particular issue (..) under the direction of a facilitator who has a list of topics to discuss (Green and Thorogood, 200937, p. 111).

Focus groups are semi-structured group interviews used to collect information on a specific subject or area of concern: for exploration and discovery, for in-depth understanding of a problem as it is experienced in context, and to assess needs, preferences, attitudes and interests related (in the context of KCE research) to health and health care issues.

A focus group differs from an individual semi-structured interview in that the interaction component is used to bring out insights and understandings in ways that questionnaire items or individual questions may not be able to. The interaction between the moderator and the group, as well as the interaction between group members, may yield more in-depth information and elicit differing perspectives on carefully designed questions. Focus groups should therefore not be considered a pragmatic, time-saving substitute for individual semi-structured interviews (e.g. if for any reason the planning does not allow for individual interviews), as the methodological foundations of the two techniques differ.

A focus group is not synonymous with a ‘group interview’. For a focus group, people are recruited specifically to participate in a research protocol, using a certain method. It is a group interview in the sense that it gathers data simultaneously from different participants37. However, it differs from a group interview in the importance that is attached to the interaction among participants. Participants might change their perspective during the focus group interview because of this interaction. In a group interview the interaction between participants is limited, and interaction occurs mainly between interviewer and interviewees.

Figure 4 – Interaction patterns in a group interview versus focus group interview

Depending on sampling strategy and aims, group interviews can take several forms, e.g. consensus panel, focus group, natural group or community interview (Coreil 2005 cited by Green and Thorogood, 200937).

Focus groups can be used as a single research strategy, as well as in combination with other methods in a multi-method research strategy.

3.1.3.2        Specific questions suitable for the method

Specific questions suitable for interviews in general were elaborated in section 1.1. (part 2).

The principal feature of focus group interviews is interaction between participants. Kitzinger (Kitzinger, 200674, p. 22) highlights that this particularity could be used to:

  • “Highlight the respondents’ attitudes, priorities, language and framework of understanding.
  • Encourage participants to generate and explore their own questions, and to develop their own analysis of common experiences.
  • Encourage a variety of communication from participants – tapping into a wide range and different forms of discourse.
  • Help to identify group norms/cultural values.
  • Provide insight into the operation of group social processes in the articulation of knowledge (e.g. through the examination of what information is sensitive within the group).
  • Encourage open conversation about embarrassing subjects and to permit the expression of criticism.
  • Facilitate the expression of ideas and experiences that might be left underdeveloped in an interview, and to illuminate the research participants’ perspectives through the debate with the group.”
  • Allow topics which participants have given little thought in advance to emerge from the discussion75.


3.1.3.3        Strengths and weaknesses of the method

The highlighted benefits of focus groups are:

  • Interaction between participants37.
  • Ability to produce a large amount of data on a topic in a short time14.
  • Access to topics that might be otherwise unobservable14.
  • Opportunity to explore sensitive topics, such as dissatisfaction with a service: it can be easier for an interviewee if negative ideas are reported as coming from a group rather than from one single person37.
  • Ability to ensure that the data directly target the researcher's topic14.
  • Access to comparisons that focus group participants make between their experiences. This can be very valuable and provide access to consensus/diversity of experiences on a topic33.

The limitations of focus groups are related to the limitations of group interviews:

  • Inappropriate to uncover marginal or deviant opinions37.
  • Importance of social norms: participants are influencing each other, creating a certain kind of implicit norm76, or consensus.
  • In addition, group dynamics may contribute to the crystallization of opinions.
  • Not easy to organize: several selected people have to be gathered in the same place for a couple of hours.

3.1.3.4        How to plan the research design?

Since focus group interviews are a collective data collection technique requiring direct person-to-person contact (several people have to come together at the same moment and in the same place), careful planning of all activities and related tasks is necessary.

3.1.3.5        Modalities of data collection

Data collection by focus group can vary according to33:

  • The level of standardization of the questions (see part 2, 2.2.3.6).
  • The number of focus groups (see part 2, 2.2.3.7).
  • The number of participants in each group (see part 2, 2.2.3.7).
  • The level of involvement of the moderator (see part 2, 2.2.3.8).

3.1.3.6        Data collection tools

During the preparation of the focus group interviews a set of topics or questions is developed and takes the form of a topic list or questionnaire. For the general principles, see part 2, 2.2.1.3.

A focus group interview is in most cases a group process structured by means of an agenda, to keep the group focused and on track. A focus group should be experienced as free-flowing and relatively unstructured, but in reality the moderator must follow a pre-planned script of specific issues and set goals for the type of information to be gathered. An introduction of up to 15 minutes should be carefully planned, as well as a good opening question. Since several people are going to participate and answer the questions, it is important to set a maximum duration for each question in order to keep to the time schedule.

A well-designed guide helps to compare information from one group to another, since more than one focus group is expected for a given topic.

Examples of topic lists used in focus groups can be found in the appendices or in KCE reports in which this technique has been used (see part 2, 2.2.3.13).


3.1.3.7        Sampling

For general issues on sampling, see part 2, 2.2.1.2 “Sampling issues in qualitative research: who and how many?”

Identification of units of analysis

The starting point for selecting participants for focus groups is to identify the unit of analysis. Is the unit of analysis “individuals for their personal opinions/experience/expertise”, or is it “individuals because they represent organizational perspectives”? It has a major impact on the people invited to the focus group interview and therefore it should be clearly described.

The sample of focus groups will consist of groups of people, instead of individuals. People who are invited to take part need to have an interest in the subject.

Composition of the groups

Ideally groups have to be internally homogenous on criteria relevant to the topic but externally heterogeneous between groups. Homogeneity in the group capitalizes on people’s shared experiences74.

It is best to select people who do not know one another, but have similar relationships with the topic being investigated (although it could in practice be difficult for particular topics). Selecting participants who are similar may help them to share ideas more freely and develop an in-depth analysis of a topic (homogeneous groups).

Sometimes, heterogeneous groups can be used after the primary analysis of homogeneous focus groups has started. Heterogeneous groups are used to “confront” diverging opinions. In general terms, heterogeneous groups are composed of representatives of all relevant stakeholders.
In this case, the researcher has to pay attention to potential power differences or inequalities between participants. These may prevent some people from talking freely during the discussion and consequently prevent the collection of rich data74.

In the Belgian context, focus group interviews can be carried out with French-speaking, Dutch-speaking and even German-speaking participants. It is advisable to conduct unilingual groups: it is easier and richer for facilitators and participants. For heterogeneous groups, such as stakeholder samples, it may be difficult to separate people into groups according to their mother tongue. In this particular case, it is important that participants can express themselves in their mother tongue and that every participant understands the other language. The moderator thus has to be perfectly bilingual.

Number of participants per group

A group of six to twelve people is sufficient for a focus group. The ideal size for a focus group is eight to ten respondents. In general, the smaller the group, the more manageable it is. From experience, a group of 6‑8 participants allows enough time for discussion and is easier to manage. Where the purpose is to generate in-depth expression from participants, a smaller group size may be preferable in combination with carrying out more focus groups to attain saturation.

In order to make sure that a group has enough participants, it is advisable to recruit 25% more people than required37. If too few participants turn up, an additional focus group should be planned to compensate for the low attendance.
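
As an illustration of the 25% over-recruitment rule, the number of people to recruit can be computed by rounding up. This is only a minimal sketch: the function name `invitations_needed` and the choice to always round up are assumptions made for the example, not part of the cited guidance.

```python
import math


def invitations_needed(target_size: int, over_recruitment: float = 0.25) -> int:
    """Return how many people to recruit so that about `target_size` attend.

    `over_recruitment` is the extra share of people to recruit (0.25 = 25% more).
    Rounding up is an assumption of this sketch: any fraction of a person
    is counted as one extra recruit.
    """
    if target_size <= 0:
        raise ValueError("target_size must be a positive number of participants")
    return math.ceil(target_size * (1 + over_recruitment))


if __name__ == "__main__":
    # Group sizes within the 6 to 12 range mentioned in this section.
    for size in (6, 8, 10, 12):
        print(size, "participants wanted ->", invitations_needed(size), "to recruit")
```

For a target group of eight participants, for example, this gives ten people to recruit.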

Number of groups

The number of focus group interviews needed depends on the aims and available resources. It is almost impossible to give clear standardized guidelines on the number of focus groups needed.

It is methodologically important for both approaches to conduct at least two focus groups per ‘type of people’. Using only one focus group to arrive at conclusions is risky, since the opinions expressed may have had more to do with the group dynamics (e.g. the persuasive skills of one or two members) than with a true sampling of the opinions of the population that the group represents. Even the preset number of two focus groups is generally too limited to make in-depth analyses, especially if the topics discussed are rather “broad” or general (see also the paragraph on analysis, continuous comparative method). Having two homogeneous groups that provide different results suggests that more information is necessary (data saturation is not reached). One rule of thumb is to conduct focus groups until they no longer provide any new information on the topic discussed.
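
The rule of thumb on stopping when no new information emerges can be sketched as a simple check. The sketch assumes that each focus group has already been analysed and summarised as a set of theme labels; the function name, the theme labels and the way "new information" is reduced to "new theme labels" are illustrative assumptions. In practice the judgement on saturation remains a qualitative one.

```python
from typing import Iterable, Optional, Set


def groups_until_saturation(
    themes_per_group: Iterable[Set[str]], min_groups: int = 2
) -> Optional[int]:
    """Return the number of the first focus group that adds no new theme.

    `themes_per_group` lists, in the order the groups were held, the set of
    themes identified in each group (hypothetical labels). `min_groups`
    reflects the advice to hold at least two groups per type of people.
    Returns None if every group so far still brought something new.
    """
    seen: Set[str] = set()
    for number, themes in enumerate(themes_per_group, start=1):
        new_themes = set(themes) - seen
        seen |= set(themes)
        if number >= min_groups and not new_themes:
            return number
    return None


if __name__ == "__main__":
    # Hypothetical theme labels for three successive focus groups.
    groups = [{"cost", "access"}, {"access", "trust"}, {"cost", "trust"}]
    print(groups_until_saturation(groups))
```

In this hypothetical example the third group repeats only themes already heard, so the check points to the third group; a result of `None` would suggest planning further groups.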

3.1.3.8        Human resources necessary

Three people (from the research team) could chair the focus group interview:

 

  1. The moderator (also called ‘facilitator’) plays a crucial role in the success of a focus group interview and can have a major impact on the outcomes of the data collection. He/she should lay down some ‘rules’, explain the duration of the focus group interview, plan a break, make everybody welcome beforehand, and do the paperwork (e.g. informed consent) before actually starting the interview. Before the opening question, it is important to ask everybody to introduce themselves briefly. He/she has “to establish a relaxed atmosphere, enable participants to tell their stories, and listen actively” (Green, 200937, p. 126). Facilitating or moderating focus group interviews requires particular competencies: interpersonal skills (including non-verbal communication skills) are needed, as well as a non-biased attitude towards the issues discussed. A focus group moderator should be able to keep the discussion on track and make sure every participant is heard. He/she has to be able to summarize what has been said and to structure the discussion. However, he/she should not take a position, and should avoid making quick assumptions or conclusions, developing answers for the participants or giving advice. Focus groups are intended to make in-depth studies of the perceptions, attitudes and opinions of the participants, not of the research team (or moderator). The moderator makes it socially acceptable for participants to have another point of view. If participants get off track or get ahead of the issue being discussed, the moderator must pull the group back together. He/she does not need to be an expert in the domain of the research. The moderator needs to use “probing techniques” when necessary: probing is essentially a means of further investigating a topic that has already been introduced. Probing can be used to clarify, to obtain more detail and to ensure completeness. For this purpose, see also part 2, 2.2.1.4.
In the particular case of focus group interviews, the moderator could use disagreements in the group to force participants to develop and elucidate their point of view. An experienced interviewer could decide whether to follow the lead of the interview or to return to the sequence of the interview guide1. In the particular case of bilingual groups, the moderator has to master both languages.
  2. The note-taker will take notes during the discussion while the moderator is introducing questions. The note-taker could sit next to the moderator. However, if he/she types directly on a laptop, make sure that the sound of the keyboard is not disturbing. Moderator and note-taker can take turns in asking questions and taking notes (this requires a well-functioning team that clearly understands its roles and can adapt to the situation). It should be discussed and reported whether different or the same persons facilitate the respective focus group interviews.
  3. The observer is a third facilitator who can be useful to observe the focus group participants (non-verbal language) and to help the moderator in identifying less talkative participants and in keeping time.

 

As focus groups have to be transcribed afterwards, it is also useful to engage the services of an audio typist.


3.1.3.9        Running of data collection

For general principles see part 2, 2.2.1.4 “How to run the data collection?”.

In the case of focus groups, once the group of respondents is gathered for the discussion, the moderator should give a brief introduction to set everybody at ease[1]. More concretely, the moderator should:

  • Explain the purpose of the discussion, how the information collected will be used and reported.
  • Introduce note-taker and observer who will remain in the room during the discussion.
  • Explain that the discussion is for scientific purposes and that information will solely be used within the context of the research.
  • Assure participants that the rules of confidentiality apply to everyone in the room, including the note-taker and the observer.
  • Explain how names will be used (real names or pseudonyms).
  • Explain the group rules (speak one at a time, avoid interrupting or monopolizing, etc.).
  • If the discussion is to be tape- or video-recorded, obtain permission from the respondents first, and explain how the tapes will be used, stored and eventually destroyed. Tip to increase the quality of the recording: use two recorders, preferably with stereo recording, one at each side of the table. This makes it easier to understand everybody and prevents the loss of data if a recorder malfunctions.

The moderator will then begin the focus group interview by asking an ‘icebreaker question’ to facilitate the discussion in the group. Afterwards, he/she will come to the focus of the discussion.

Immediately after the focus group, a debriefing with the moderators/facilitators has to be planned. The debriefing is an essential step for the analysis. It is best supported by a template of dimensions, upon which the moderator/facilitator team needs to comment (example in Appendix 9).

The facilitators should review the notes taken during the focus group and make a first assessment of clarity and understanding.

They should discuss, compare and record observations or impressions about the group not readily apparent from the notes.

They should also discuss and record any insights or ideas emerging during the interviews while they are still fresh in the mind.

3.1.3.10      Practical aspects

Preparations for the interview

See also part 2, 2.2.1.4 “How to run the data collection?”.

Location & timing

  • The location where the focus groups will be held should be carefully selected.
  • Accessibility and transport issues (and mobility needs of participants) should be considered.
  • Avoid noisy areas where it will be difficult for participants and the moderator to hear each other.
  • The setting should be comfortable, non-threatening for the respondents. Refreshments should be provided.
  • The focus group table can be organized beforehand, which allows the researcher to place name tags as he/she wishes.
  • Seating should be arranged to encourage participation and interaction, preferably in a circle, with or without name tags. It can be discussed whether tables are needed. Moderators/facilitators (and note takers) should be integrated as much as possible within the discussion setting.
  • The timing of the focus group interview needs to be acceptable for all potential respondents in order to avoid selective “non-response” as much as possible (take into account the socio-demographic profiles of the targeted participants such as working times, daily activities, family life, etc.).


Duration

The length of the focus group should be between 1 and 3 hours.

Allow sufficient time at the beginning to welcome participants, give them an introduction and let them introduce themselves. This part should not take excessive time (about 10 minutes).

Material

Data are collected through different sources: audio or video-taping can be considered. When focus group interviews are recorded, the equipment should be of good quality and easy to use (check batteries and microphone). For larger groups, it may be necessary to use two tape recorders or multi-channel equipment, strategically placed to maximize the probability of recording contributions from all participants.

“Field notes” are an essential part of data collection. They capture all of the essential “non-verbal” information during the focus group interview.

Information has to be collected in an unbiased manner (avoid filtering out information by pre-interpreting it as unimportant, especially in the first focus groups).

The context of statements made during focus groups should be documented (important for giving meaning to the statements in the phase of analysis).

Try to capture nonverbal behavior of group participants (nonverbal reactions of other participants after a participant statement may indicate consensus or disagreement).

3.1.3.11      Analysis and reporting of findings

For general issues on analysis, see part 2, 2.2.1.6 “How to analyse the data?”.

In the particular case of focus groups, separate analyses have to be performed on the data gathered “within focus groups”, and these are continuously compared “between focus groups”. This is also an iterative process.

It is important that statements be understood in the context in which they were made. Nonverbal communication observed during the interview can also be very informative.

For reporting, see part 2, 2.2.1.8 “How to report qualitative research findings”.

Note that findings are reported by focus group as unit of analysis and not by person.

3.1.3.12      Quality criteria

See part 2, 2.2.1.7 “How to validate qualitative research findings?”

Vermeire et al. propose a checklist specifically designed to critically appraise the quality of focus groups in research articles in primary health care77.

3.1.3.13      Examples of KCE reports using the method

  • Evaluation of the Belgian reference reimbursement system78.
  • Evidence-based content of the written information provided by the pharmaceutical industry to the general practitioner79.
  • Quality development in general practice in Belgium: status quo or quo vadis ?80.
  • Mental health care reforms: evaluation research of ‘therapeutic projects’81.
  • Emergency psychiatric care for children and adolescents82.

3.2 Observation

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

“The purpose of participant observation is partly to confirm what you already know (or think you know) but is mostly to discover unanticipated truths. It is an exercise of discovery” (Mack, 200554, p. 23)

In this chapter we explicitly try to focus on direct observation, rather than participant observation. However, two remarks are in order. First, there is nearly always some participation involved in observing, unless the researcher is hidden behind, for example, a one-way mirror. In all other cases the researcher is present in a setting and hence inevitably becomes part of it. Second, in the KCE context participant observation is unlikely to be applied because it is very time-consuming and intensive, and hence not compatible with KCE working procedures. However, that does not mean that observational techniques are irrelevant to a KCE researcher. They can be very useful, for example in the case of site visits. Although participating is not the main goal in what follows, it often enters the reasoning and the quotes used.

3.2.1 What is (naturalistic) observation?

Observing is more than looking around; it is actively registering information along a number of dimensions, namely places (physical place or setting), persons (the actors involved) and activities (a series of acts)83. Observing means paying attention to (1) the detail of the observation, (2) visual as well as auditory information, (3) the time dimension, (4) the interaction between people, and (5) making links with mental categories10.

Observing includes roughly three steps:

  1. A descriptive step; the researcher enters the research setting and gets a general overview of the social setting.
  2. A focused step; more focused observations are a step closer to the research question. The aim is to search for relationships or connections between several elements in the research question, for example X is a characteristic of Y, or X is the result of Y. More concretely, suppose a researcher wants to study the way emergency care is organized in Belgium. He would first do some descriptive observations in the emergency departments of hospitals to get an idea of the general structures and processes characteristic of emergency care. In a next step he turns to his research question, which is about how cost-effectiveness of emergency care could be attained. Hence the focus of his observation will relate to all possible costs and to which of them could be avoided.
  3. A selective step83; in this last phase, after the researcher has analysed his data (field notes), he may have identified a lack of information on one specific category of costs, e.g. cleaning and housekeeping costs, and may therefore decide to do extra observations focused on this specific aspect.

3.2.2 When to use observations?

  • To collect data on naturally occurring behaviors in their usual contexts54. Observation also captures the whole social setting in which people function by recording the context in which they live84.
  • Unstructured observation illustrates the whole picture, captures context/process and informs about the influence of the physical environment84.
  • To check whether what people say they do is the same as what they actually do84. Both what people perceive that they do and what they actually do are however valid in their own right and just represent different perspectives on the data84.
  • Observation is also an ongoing dynamic activity that is more likely than interviews to provide evidence for processes, things that are continually moving and evolving84.
  • To study the working of organisations and people’s roles and functioning within organisations20.
  • To uncover behaviours or routines of which the observed themselves are not aware20. What the researcher considers an important finding may belong to the self-evident nature of daily life from the participants’ point of view.
  • To understand data collected through other methods (e.g. interviews) and also to design the right questions for those methods54.

3.2.3 What are the strengths and weaknesses of observations?

3.2.3.1        Strengths

A number of strengths have already been described under “When to use observations?”. We could add that:

  • Observation has the advantage of capturing data in more natural circumstances84.
  • The Hawthorne effect[1] is an obvious drawback, but once the initial stages of entering the field are past, most professionals are too busy to maintain behaviour that is radically different from normal84.

3.2.3.2        Weaknesses

  • It can be very difficult to get access to the setting: an observer is often experienced as a threat, especially if the setting is not asking for the research to take place. Observation (and especially participant observation) might lead to knowledge of informal procedures or rules which people do not want to be uncovered. The researcher can also be experienced or perceived as a barrier to the normal daily routine in the setting10. In direct observation, the researcher does not participate in the setting, hence is known as a stranger and only gets access to the public or formal layer of the social reality. He does not become an insider and will miss inside information because he is too distant from the actors he is observing10. “Access, then, is not a straightforward process of speaking to the person in charge and obtaining the approval of the ethics committee. It usually involves considerable time and effort and a constant endeavour to strive for ‘cultural acceptability’ with the gatekeepers and participants in research sites” (p. 310)84.
  • Once inside the setting there is the problem of avoiding “going native”: This means “becoming so immersed in the group culture that the research agenda is lost or that it becomes extremely difficult or emotionally draining to exit the field and conclude the data collection” (p. 183)20.
  • Observational data are, more than interview data, subject to interpretation by the researcher. Observers have a great degree of freedom and autonomy regarding what they choose to observe and how they filter the information84.
  • Observations are time-consuming and hard work at every possible hour of the day.
  • An observer can get emotionally involved in what he observes, and consequently lose his neutrality.
  • It is impossible to write down everything that is important while observing (and participating). The researcher must rely on his memory and have the discipline to write down and expand the field notes as soon and as completely as possible54.

[1]           The Hawthorne effect is the process where human subjects of an experiment change their behavior, simply because they are being studied http://www.experiment-resources.com/hawthorne-effect.html.

3.2.4 How to plan the research design?

Often observations are carried out at the beginning of the data collection phase, but the method can also be used later on during the research process to address questions suggested by data collected through other methods54. Before starting the observations, the researcher should try to find out as much as possible about the site where he will be observing.

At the KCE, site visits are common to allow the researchers to become familiar with the research topic and setting. This is often combined with interviews or less formalized talks to key persons on the site. After a number of site visits the scope of the research project is determined and precise research questions are formulated.

3.2.5 Modalities of data collection

3.2.5.1        Participant versus direct observation

The role to adopt during observation and the extent to which participants are fully informed are somewhat intertwined84. Typically researchers refer to Gold’s typology of research roles85:

  • The complete observer, who maintains some distance, does not interact and whose role is concealed;
  • The observer as participant, who undertakes intermittent observation alongside interviewing, but whose role is known;
  • The participant as observer, who undertakes prolonged observation, is involved in all the central activities of the organization and whose role is known;
  • The complete participant, who interacts within the social situation, but again whose role is concealed.

Mack et al.54 describe observing as remaining an “outsider” and simply observing and documenting events or behaviors being studied, while participating is taking part in the activity while also documenting it. Pure observing, without participating, is a situation that in fact seldom occurs, because once you are present, you are visible, you influence the activities around you and you participate to some degree. There are two reasons for this participation: either to better understand the local perspective, or in order not to call attention to yourself54.

3.2.5.2        Structured versus unstructured observation

  • Structured observations are associated with the positivist paradigm and aim at recording physical and verbal behavior by means of a list of predetermined behaviours84.
  • Unstructured observations are not ‘unstructured’ in the sense of unsystematic or messy, “instead, observers using unstructured methods usually enter ‘the field’ with no predetermined notions as to the discrete behaviours that they might observe. They may have some ideas as to what to observe, but these may change over time as they gather data and gain experience in the particular setting. Moreover, in unstructured observation the researcher may adopt a number of roles from complete participant to complete observer, whereas in structured observation the intention is always to ‘stand apart’ from that which is being observed” (p307)84.

3.2.5.3        Overt versus covert observation

Covert observation corresponds to two roles in Gold’s typology85, i.e. complete observer and complete participant (see above). Most authors agree that covert observation is only legitimate in very specific circumstances and should be avoided. Mack et al.54 formulate the following ethical guideline regarding observations: “When conducting participant observation, you should be discreet enough about who you are and what you are doing that you do not disrupt normal activity, yet open enough that the people you observe and interact with do not feel that your presence compromises their privacy.” (p. 16) As with all qualitative research methods, researchers must also protect the identities of the people they observe or with whom they interact, even if informally. “Maintaining confidentiality means ensuring that particular individuals can never be linked to the data they provide”54.

3.2.6 Data collection tools

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

3.2.6.1        Checklists

Before you enter the setting and start observing, it might be a good idea to have some questions in mind. It may be helpful to carry a checklist in your pocket to help you remember what you are meant to observe54.

3.2.6.2        Fieldnotes

“Fieldnotes are used by researchers to record observations and fragments of remembered speech. Although researchers may use other means of recording (such as video) and other forms of data (such as interview transcripts), fieldnotes remain one of the primary analytic materials used in ethnography.” (p. 82) 35.

Depending on the research questions, the researcher is interested in different aspects of social reality. Mulhall’s schema84 includes the following types of field notes, each covering an aspect of social reality:

  • Structural and organizational features – what the actual buildings and environment look like and how they are used
  • People – how they behave, interact, dress, move.
  • The daily process of activities.
  • Special events – in a hospital ward this might be the consultant’s round or the multidisciplinary team meeting.
  • Dialogue.
  • An everyday diary of events as they occur chronologically – both in the field and before entering the field.
  • A personal/reflective diary – this includes both my thoughts about going into the field and being there, and reflections on my own life experiences that might influence the way in which I filter what I observe.

It is particularly important to detail any contradictory or negative cases. Unusual things often reveal most about the setting or situation20.

Documenting observations consists of the following steps54, 86:

  • The researcher takes quick notes during the observation.
  • Once the researcher has left the setting, he expands his notes into fieldnotes. This means he reads them through and adds other things he can remember but has not yet written down. Note taking in the setting is not self-evident and it is impossible to write down everything you see. Therefore good note taking should trigger the memory by means of key words, symbols, drawings, etc.
  • After expansion, the researcher “translates” his shorthand into sentences.
  • Together with the translation phase, a descriptive narrative can be composed. The researcher writes down a description of what happened and what he has learned about the setting. In this step the researcher should distinguish between describing what happened and interpreting.

The researcher should be well aware of the difference between describing what he observes and interpreting it. Reporting interpretations rather than an objective account of the observations should be avoided54. For example, an interpretive description of a patient could be “he was in terrible pain”. An objective description would be “he was screaming and his face turned pale while grimacing”. “To interpret is to impose your own judgment on what you see” (Mack, 200554, p23). The danger is that interpretations can turn out to be wrong. Therefore the researcher should ask her/himself “what is my evidence for this claim?”54. One way of separating descriptions and interpretations is by separating them visually on paper or screen.

3.2.6.3        Maps

Draw a map of the setting or settings you observe. Maps might support your memory and are a tool to reconstruct interactions and movements of people in a room.

3.2.6.4        Audio or video

Audio or video recordings of observations are generally not permissible unless all ethical requirements are fulfilled and informed consent has been obtained.

3.2.7 Sampling

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

As outlined in the general principles of the chapter on interviewing (part 2, 2.2.20), sampling in qualitative research is seldom statistically based. Samples of settings or groups to observe are also purposive.

Specifically for observation, the sampling units are places, locations and blocks of time, but usually not individuals. The aim is to select ‘information-rich’ cases, but in practice site selection is often a pragmatic decision based on existing networks and accessibility. Ideally, however, sites are chosen because they typify some larger population of sites (such as clinics) or perhaps because they are exceptional in some way. Observation methods may be used across multiple sites, and one could select sites representing a range of typical settings37.

3.2.8 Human resources necessary

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

Observations can be the work of one researcher, a pair of researchers, or a whole team. Which arrangement is most appropriate depends on the research questions and the features of the setting. Members of a team can also disperse to different locations individually, or in pairs or groups, in order to construct a more complete picture of the issues being studied.

One of the advantages of team work is that field notes can be compared and that team members can question each other about assertions being made. “Taking another perspective on validity Graneheim et al. (2001) used multiple data collectors with different perspectives (insider or outsider) to observe the same situation. This may not accord with the idea that every researcher may produce a unique account of a situation that is valid in its own right. But with extensive mutual reflection, as undertaken by Graneheim and colleagues, these combined observations may have consensual validity. However, from a practical standpoint few projects are afforded the luxury of multiple data collectors.” (Mulhall, 200384, p. 309).

3.2.9 Practical aspects

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens
  • Try to be “invisible” as an observer. Adapt to the setting in which you will do the observations, in terms of dress code, the way of behaving, and what is expected from you by the other actors in the setting.
  • Start with short observations to explore the field and to get yourself used to your role as observer.
  • First you should get an idea of “the normal” way of life in a setting, before you are able to identify unusual or abnormal situations.
  • Circumstances may make it difficult or unacceptable to make fieldnotes, hence the researcher has to write down his observations afterwards. This can lead to a memory bias.
  • Field notes should not contain interpretations, but merely descriptions.
  • There is also the practical problem of how, especially in large and busy social settings, like an emergency department, to inform and obtain consent from everyone who might ‘enter’ the field of observation84.
  • Note that once inside the setting it might be difficult to get out again: ending the fieldwork should not happen abruptly. The researcher must take time to “ease out”. In the “easing out” phase the researcher is increasingly absent from the setting, which leaves more time to analyse the data. When present in the setting, the researcher can confront his preliminary analysis with new observations10. In the literature the advice is to keep in contact with the setting until the final report is written87.

3.2.10 Analysis

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

Field notes contain a lot of detail and are highly descriptive. In order to find explanations or answers to the research questions, the researcher should develop categories and test them against hypotheses, and refine them. This is an iterative process that starts during the data collection phase. 

3.2.11 Reporting of findings

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

As with other qualitative research methods, it is important that evidence from the data is presented to support the conclusions of the researcher, by means of examples or quotations. The main principles have already been mentioned (see part 2, 2.2.1.8 “How to report qualitative research findings”).

3.2.12 Quality criteria

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

The quality of observational studies depends largely on the quality of the descriptions of data collection and analysis provided by the researcher. Details about how the research was conducted are crucial and should be well documented. For example, how much time was spent in the field, how typical were the events recorded, description of the attempts to verify the observations made, etc.

The general criteria to assess the quality of qualitative research are described in part 2, section 1.2.1 and also apply to observational methods.

3.2.13 Examples of KCE reports using the method

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

So far no observational studies have been carried out at the KCE.

3.3 Delphi Technique

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

Consensus reaching methods generally used in health care are the Delphi panel, the nominal group and the consensus conference. They are useful to organize “qualitative judgments and, which is concerned to understand the meanings that people use when making decisions about health care.” (Black, 200688, page 132). They are not qualitative methods as such, because they may use quantitative data collection tools (questionnaires, scales) and quantitative elements in the analysis (statistics).

All the consensus methods cited here are characterized by the provision of information prior to the discussion, privacy (participants express their opinion in private), opportunity for participants to change their view and explicit and transparent derivation of the group decision, based on (statistical) analysis88.

3.3.1 Description of the method

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

The Delphi method (named after the Oracle of Delphi) was initiated by the RAND Corporation, a nonprofit institution that helps improve policy and decision making through research and analysis[a]. The original definition given in the 1950s was that it “entails a group of experts who anonymously reply to questionnaires and subsequently receive feedback in the form of a statistical representation of the "group response," after which the process repeats itself. The goal is to reduce the range of responses and arrive at something closer to expert consensus.”89 Today, the method has evolved and Delphi surveys can aim at different goals or follow several designs[b]. It could be defined more as “a method for structuring a group communication process” than as a method to produce consensus90. The method could also be defined as “a systematic collection and aggregation tool of informed judgment from a group of experts on specific questions and issues” (Hasson, 201191, p. 1696).

Delphi surveys are used in several domains (politics, psychology, agriculture, etc.) and can vary in different ways. Several types of Delphi often used in health research are presented in Table 10 (the list is not exhaustive).

 

Table 10 – Types of Delphi designs

| Design type | Aim | Target panellists | Administration | Number of rounds | Round 1 design |
|---|---|---|---|---|---|
| Classical | To elicit opinion and gain consensus | Experts selected based on aims of research | Traditionally postal | Employs three or more rounds[3] | Open qualitative first round, to allow panelists to record responses |
| Modified | Aim varies according to project design, from predicting future events to achieving consensus | Experts selected based on aims of research | Varies: postal, online, etc. | May employ fewer than 3 rounds | Panelists provided with pre-selected items, drawn from various sources, within which they are asked to consider their responses |
| Decision | To structure decision-making and create the future in reality rather than predicting it | Decision makers, selected according to hierarchical position and level of expertise | Varies | Varies | Can adopt similar process to classical Delphi |
| Policy | To generate opposing views on policy and potential resolutions | Policy makers selected to obtain divergent opinions | Can adopt a number of formats, including bringing participants together in a group meeting | Varies: theoretically needs 5 rounds but could be done in 3 or 4 rounds | Can adopt similar process to classical Delphi, or (1) preformulating the obvious issues by the research team; (2) seeding the list with an initial range of options but allowing the respondents to add to the lists92, 93 |
| Real time/consensus conference | To elicit opinion and gain consensus in real time | Experts selected based on aims of research | Use of computer technology that panelists use in the same room to achieve consensus in real time rather than by post or via Internet94 | Varies | Can adopt similar process |

Adapted from Hasson, 201191, p. 1697 and Keeney, 201195


[a]           http://www.rand.org

[b]           See special issue 78 of the journal “Technological Forecasting & Social Change” (2011), available at http://www.journals.elsevier.com/technological-forecasting-and-social-ch....

[3]           Note that the number of rounds should ideally be based on the saturation of the responses and is difficult to fix in advance

3.3.2 Specific questions suitable for the method

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

A consensus reaching method such as the Delphi panel could be used:

  • To help the decision making process.
  • When personal contact is not necessary96.
  • To choose the most appropriate method or tool (e.g. data collection technique, scales, questionnaires, etc.).
  • To identify the best choice of treatment (when no other evidence is available or to complete it).
  • To identify the form of a programme.
  • To clarify professional roles97.
  • To develop clinical guidelines98.

3.3.3 Strengths and weaknesses of the method

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

3.3.3.1        Strengths

  • Lower production cost99.
  • Relatively rapid results99.
  • Participants can express their opinion anonymously96, without external (perceived) pressure, while the process still captures the view of the entire group96.
  • Avoids domination by individuals or professional interests97.

3.3.3.2        Weaknesses

  • Success depends on the qualities of the participants.
  • Reliability increases with the number of participants (and the number of rounds). In addition, it is difficult to keep everybody in successive rounds96.
  • Coordination is difficult96.
  • The existence of a consensus does not necessarily mean that it reflects an appropriate or “correct” answer97.

3.3.4 How to plan the research design?

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

A Delphi survey takes several weeks, even if the number of participants is small.

It has to be planned in the beginning of the project or, if the necessity to conduct such a study appears late in the course of the project, it is important to realize that the whole process takes several weeks, depending on the number of rounds needed. The next figure illustrates the whole process and the time needed.

Figure 5 – The Delphi process

 Adapted from Slocum et al.93

3.3.5 Modalities of data collection

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

A Delphi survey can be administered ‘paper-and-pencil’, by mail or by e-mail.

Online Delphi surveys are increasingly common. Software is available to support the data collection and the analysis (Delphi_Survey_Web (DSW)100, Mesydel©101).

The number of rounds is not necessarily defined a priori, although it is often fixed in advance because of budgetary, time or human resources limitations. Ideally, data collection stops when saturation or consensus is reached.

3.3.6 Data collection tools

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

The Delphi method uses iterative (e-)mailed questionnaires in successive rounds. Because there is no interaction between the respondent and the researcher, the formulation of the questions has to be clear, and definitions should be given where necessary.

The questionnaire of the first round encompasses open-ended questions, to identify items to include in the second round.

Subsequent rounds can be exclusively qualitative, composed of closed questions with scales (from totally agree to totally disagree, e.g. from 1 to 9), or a combination of qualitative and quantitative questions. They present a synthesis of the results of the previous round.

In the case of closed questions, agreement is usually summarized by using the median and consensus assessed by presenting interquartile ranges for continuous numerical scales97. Graphical presentations of the results are welcomed.
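As an illustration of the summary described above, the following minimal Python sketch computes the median and the interquartile range for each item of a closed-question round. The item names, the ratings and the 1 to 9 agreement scale are hypothetical, and the "inclusive" quartile method is only one of several conventions; it is not a KCE procedure.

```python
from statistics import median, quantiles


def summarise_item(ratings):
    """Return the median and interquartile range of one item's ratings."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return {"median": median(ratings), "q1": q1, "q3": q3, "iqr": q3 - q1}


# Hypothetical second-round ratings on a 1-9 agreement scale.
round_2 = {
    "Item A": [7, 8, 8, 9, 7, 8, 9],
    "Item B": [2, 5, 9, 4, 7, 3, 8],
}

summaries = {item: summarise_item(ratings) for item, ratings in round_2.items()}

for item, summary in summaries.items():
    print(item, summary)
```

A narrow interquartile range (Item A) suggests that panelists largely agree with each other, while a wide one (Item B) signals dispersed opinions that may need another round.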

In KCE reports the questionnaires used in each round are presented in appendices.

3.3.7 Sampling

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

Participants have to be carefully chosen because of their expertise, experience or knowledge in the field of the research question. In addition, the variety of positions in the field or opinions regarding the subject, should be covered. In that way, lay people could be added to increase the variety of viewpoints102.

They could be identified through publicly available bibliographic information102. Snowballing recruitment could be useful to secure easy agreement to panelist invitation and strengthen panelist retention102.

There is no practical limit to the number of participants in a Delphi survey89.

3.3.8 Human resources necessary

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

The administrator of the survey develops the questionnaires, identifies, mobilizes and recruits participants, analyses findings and reports them. He/she is responsible for keeping the attrition rate low and for ensuring coherence between the different steps of the method.

Administrative support could be needed to (e-)mail the questionnaires and manage reminders and answers.

3.3.9 Practical aspects

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens
  • It is important to clearly explain the goal of the questionnaire and the way it will be analysed. The drafting of the invitation/introduction letter is thus crucial. “Stressing the practical policy application of the Delphi yield to experts panelists to aid their retention” (Rowe, 2011102, p. 1489).
  • The research team should have the management skills to follow up the mailings and the returned questionnaires.
  • Online tools can be very useful both for the research team (rapid results) and for the participants.
  • While anonymity in the process of the Delphi is required, “using social rewards for recognition in participation, such as subsequently publishing panel membership listings” (Rowe, 2011102, p. 1489) could improve panelist recruitment and retention.

3.3.10 Analysis

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

Each step of the Delphi requires a specific analysis.

In a classical Delphi, open-ended questions from round 1 should be content analysed ‘in order to group statements generated by the experts panel into similar areas’95.

Rounds that use closed questions should be analysed statistically. Summary statistics are used to decide whether or not consensus is reached. The level of consensus has to be defined in advance (e.g. 70% agreement).

There is no agreement on the threshold indicating a consensus, nor how to choose this threshold95. Each researcher has to reflect on it, case by case.

The proposals that have reached consensus should be eliminated from the next round.
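The decision rule described in this section can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the statements and ratings are invented, "agreement" is taken to mean a rating of 7 or more on a 1 to 9 scale, and the 70% threshold is only the example value mentioned above. A real study would also have to decide how to treat consensus on disagreement.

```python
def agreement_share(ratings, agree_from=7):
    """Share of panelists whose rating counts as agreement (assumed: >= 7 on a 1-9 scale)."""
    return sum(rating >= agree_from for rating in ratings) / len(ratings)


def split_by_consensus(round_ratings, threshold=0.70):
    """Separate items that reached the predefined consensus level from those that did not."""
    settled, carried_forward = [], []
    for item, ratings in round_ratings.items():
        if agreement_share(ratings) >= threshold:
            settled.append(item)          # consensus reached: drop from the next round
        else:
            carried_forward.append(item)  # no consensus yet: resubmit in the next round
    return settled, carried_forward


# Hypothetical ratings from ten panelists.
round_ratings = {
    "Statement 1": [8, 9, 7, 7, 6, 8, 9, 7, 8, 5],
    "Statement 2": [3, 7, 5, 8, 4, 6, 9, 2, 5, 7],
}

settled, carried_forward = split_by_consensus(round_ratings)
print("Consensus reached:", settled)
print("Next round:", carried_forward)
```

Fixing `threshold` and `agree_from` before data collection, and reporting them, keeps the derivation of the group decision explicit and transparent.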

3.3.11 Reporting of findings

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

Intermediary results are reported directly in the successive questionnaires.

All the consensus and dissensus items are listed and discussed at the end of the process. 

3.3.12 Quality criteria

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

It seems that no consensus exists with regard to the standard of methodological rigor to apply, and that “no definitive evidence exists which demonstrates the reliability or validity of the technique” (Keeney, 201195, p. 104). This is partly due to the variety of Delphi surveys and the constant evolution of this field91.

We have not identified any checklists to assess the quality of a Delphi survey.

However, the following aspects of the survey could be assessed (adapted from Jillson103 and Hasson91):

  • Applicability of the method to the specific research problem
  • The quality of the composition of the Delphi panel. Participants have to be carefully chosen according to their expertise and position in the group.
  • Design and administration of the questionnaire
  • Feedback

A Delphi survey should be reviewed in terms of reliability, validity and trustworthiness to judge its worth91.

3.3.13 Examples of KCE reports using the method

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens
  • Impact of academic detailing on primary care physicians104
  • Burnout among general practitioners: prevention and management72
  • Methods for including public preference values in reimbursement decision making processes for health interventions. Exploration of the feasibility of different models in Belgium (ongoing project, publication foreseen end 2012)

3.3.14 Basic references

Author(s): 
Laurence.Kohn
Author(s): 
Wendy.Christiaens

For practical tips, see the report of the King Baudouin Foundation, available in French, Dutch and English93.

WEBSURVEY

Author(s): 
Laurence.Kohn

 

Draft version attached

Attachments:

  • limesurvey_manual.pdf (6.43 MB)
  • Plugin Delphi for Limesurvey manual.pdf (1.64 MB)
  • procedure_websurvey.pdf (393.71 KB)

ECONOMIC EVALUATION AND BUDGET IMPACT ANALYSIS (HTA)

Author(s): 
Irina.Cleemput

KCE has developed guidelines for economic evaluation and budget impact analysis for Belgium. For economic evaluation, there are guidelines for

 

  1. the literature review,
  2. the perspective of the evaluation,
  3. the target population,
  4. the comparators,
  5. the analytic technique,
  6. the study design,
  7. the calculation of costs,
  8. the estimation and valuation of outcomes,
  9. the time horizon,
  10. modelling,
  11. handling uncertainty and testing the robustness of the results,
  12. the discount rate.   

The guidelines for budget impact analysis cover the specificities of the target population and the comparator, and refer to the guidelines for economic evaluation, which should also be respected in the budget impact analysis.

 

Co-author(s): 
Mattias.Neyt
Co-author(s): 
Nancy.Thiry
Co-author(s): 
Stefaan.VandeSande

FORMULATION OF CLINICAL RECOMMENDATIONS (GCP)

Author(s): 
Pascale.Jonckheer

Writing recommendations is one of the most important steps in developing a clinical guideline (NICE, 2009). According to the GRADE system, a recommendation depends on several factors: not only the level of evidence, but also the balance between harm and benefit, the patients’ values and preferences, and the cost of the intervention. These factors determine the strength of the recommendation, which has to be reflected in its wording.

In specific situations, the available literature provides no evidence, or such conflicting or poor evidence that it is not possible to draw clear conclusions. In these cases, several solutions exist: no recommendation, a recommendation without grading, a recommendation with low strength or an “only in research” recommendation (see Figure 1). The choice between these solutions is not easy.

This part of the process note aims to provide guidance and tips to formulate clinical recommendations consistently. It is based on documents from other guideline developers such as NICE, SIGN and IKNL, on the GRADE literature and on a discussion between KCE experts involved in GCP or in data analysis. It is intended to be used by all experts (internal and external) involved in the development of clinical guidelines.

Figure 1 – Situations after the literature search

Co-author(s): 
Joan.Vlayen
Co-author(s): 
Leen.Verleye

WHICH TYPE OF RECOMMENDATIONS?

Author(s): 
Pascale.Jonckheer

Formulating a recommendation (even if the level of evidence is low) should always be the aim. The other options (not formulating a recommendation, formulating an “only in research” recommendation or formulating a recommendation without grading) should be exceptions.

Recommendation with grading

The panellists should not be afraid to formulate recommendations even if the evidence is poor. Absence of a statistically significant effect is no proof that an intervention does not work. An intervention is only shown not to work if the confidence interval around the effect estimate excludes a minimally important difference or decision threshold. Even when confidence in the effect estimate is low and/or desirable and undesirable consequences are closely balanced, GRADE encourages panels to make recommendations (inevitably weak) to avoid clinicians’ frustration with the lack of guidance (Andrews et al., 2013). As the US Preventive Services Task Force states: “Even though evidence is insufficient, the clinician must still provide advice, patient must make choices, and policy makers must establish policies” (Petitti et al., 2009).
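The reasoning about confidence intervals can be made concrete with a small Python sketch. It assumes, purely for illustration, that the effect is expressed as a benefit (larger values are better), that the minimally important difference is a positive number on the same scale, and that the function name and labels are hypothetical rather than GRADE terminology.

```python
def interpret_effect(ci_low, ci_high, minimally_important_difference):
    """Classify a confidence interval for a benefit against a decision threshold."""
    if ci_high < minimally_important_difference:
        # The whole interval lies below the threshold: an important benefit is ruled out.
        return "important benefit excluded"
    if ci_low >= minimally_important_difference:
        # The whole interval lies at or above the threshold.
        return "important benefit shown"
    # The interval still contains values at or above the threshold.
    return "inconclusive"


# Not statistically significant (interval includes 0), yet an important benefit
# of 2.0 units cannot be excluded: this is no proof that the intervention fails.
print(interpret_effect(-0.5, 3.0, 2.0))

# Here even the upper limit stays below the threshold.
print(interpret_effect(-0.5, 1.0, 2.0))
```

The first case is the one in which a (weak) recommendation can still be formulated despite the absence of a statistically significant effect.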

No recommendation

Deciding NOT to formulate a recommendation could be proposed2:

  • When the confidence in effect estimates is so low that the panellists feel a recommendation is too speculative.
  • When “although the confidence in effect estimates is moderate or even high, the trade-offs are so closely balanced, and the values and preferences and resource implications not known or too variable, that the panel has great difficulty deciding on the direction of the recommendation”.

But as said above, choosing not to make a recommendation should remain an exception. If the panel chooses to make no recommendation, the reason (low confidence in the effect estimate or close balance between harm and benefit) should be specified (Andrews et al., 2013).

“Only in research” recommendation

An “only in research” recommendation is appropriate when three conditions are met (Andrews et al., 2013):

  • There is insufficient evidence supporting an intervention for a panel to recommend its use;
  • Further research has a large potential for reducing uncertainty about the effects of the intervention;
  • Further research is deemed good value for the anticipated costs.

Recommendation without grading

In some cases, grading a recommendation can be superfluous, but the eligibility criteria to choose this option still have to be determined by the KCE and are currently under discussion.

Co-author(s): 
Joan.Vlayen
Co-author(s): 
Leen.Verleye

WORDING OF RECOMMENDATIONS - TIPS AND TRICKS

Author(s): 
Pascale.Jonckheer

Recommendations must be decidable and executable. Do not use assertions of fact as recommendations.(Hussain et al, 2009)

Example of assertion: Suppressive therapy is effective for preventing recurrent infections.(strength of recommendation A-1)

Focus on the action (NICE, 2009)
Recommendation should begin with what needs to be done and should be as specific as possible about the exact intervention being recommended.

Prefer the active voice for clarity: choose a verb such as “offer”, “measure”, “advise” or “discuss”.

Example:

  • Instead of “an intervention is recommended”, say “offer the intervention”
  • Instead of “an intervention may be offered”, say “consider offering the intervention”

Start with the verb

Exception: If recommendations differ for slightly different circumstances, it can be clearer to start with details of the patient group or other information: e.g. “If surgery is being considered, offer to…”

Use direct instructions assuming you are talking to the healthcare professional (HCP) who is working with the patient at that time.

Example: “Record the person’s blood pressure every 6 months”

Exception:

  • Recommendations about service organization or for a target group other than HCPs: “Care should be provided by a multidisciplinary team”
  • Recommendations concerning a specific type of HCP: “An occupational therapist should assess the patient”
  • Recommendations that use “must” or “must not” because of legal aspects (see below)

Present recommendations in favour of a particular management approach rather than against an approach

Except if a useless or harmful therapy is in wide use (Andrews, 2013)

Clearly distinguish between the distinct concepts of quality of evidence and strength of recommendation

Choose the verb according to the strength of recommendation

  • For strong recommendation, “Offer”, “Avoid”,…
  • For weak recommendations, we can add “Consider” before the verb

Add a level of evidence immediately after each recommendation and write it in words

  • From “Very low level of evidence” to “High level of evidence”

Use no symbols to translate the strength of the recommendation.

Be concise, unambiguous and easy to translate into clinical practice(NICE, 2009)

Consider only one action by recommendation, or by bullet point in each recommendation.

To combine brevity and accuracy, word the recommendation in one sentence and put a short text alongside it (e.g. accessible by clicking) with the rationale supporting the recommendation. This rationale could encompass the level of evidence and the other considerations.

For example, the reason why the recommendation is not strong: “The addition of a systemic anti-inflammatory drug can be considered, but apart from case series, there is currently no convincing evidence that it accelerates the healing process. Moreover the balance benefit/harm is not clear”

In other words, when we are not sure, the clinician has the right to know on what basis.

Include what readers need to know.
Even if recommendations should be clear and concise, they should contain enough information to be understood without reference to supporting material (NICE, 2009). Recommendations too vague to be implemented (e.g. “take a comprehensive history” or “a detailed physical examination”) are unhelpful (Guyatt, 2011). “An ideal recommendation should explicitly or implicitly answer the questions: WHO should do WHAT to WHOM, UNDER WHAT CIRCUMSTANCES, HOW, and WHY?” (Hussain, 2009)

Define any specialized terminology or abbreviation

Define the target population unless it is obvious from the context

Specify the comparator unless it is obvious. Sometimes, the setting can also be important.

Include cross-references to other recommendations if necessary to avoid the need to repeat information such as treatment regimens or definitions of terms.

Emphasise the patient’s involvement

Use “offer” and “discuss” rather than “prescribe” or “give”.(NICE, 2009)

Use words as “people” or “patients” rather than “individuals”, “cases” or “subjects”.(NICE, 2009)

For people with mental health problems, prefer “services users” or “people” instead of “patients”

For people with chronic condition, use “people” rather than “patients”

For healthy pregnant women, do not use “patients”

In the text accompanying the recommendation, include statements on values and preferences for recommendations that are particularly sensitive to patient preferences, and for those for which values and preferences are less certain.(Andrews, 2013)

Example:

“This recommendation places relatively more weight on this x outcome despite the increased risk of this xx adverse event”.

“Patients who prefer to avoid surgery and the high rates of gastro-oesophageal reflux disease seen after surgery, and who are willing to accept a higher initial failure rate and long-term recurrence rate, can reasonably choose pneumatic dilatation”

Formulate each “research recommendation” as an answerable question or a set of closely related questions.(NICE, 2009)

Use the PICO framework.

Example: “Is benzoyl peroxide or adapalene more clinically and cost effective at reducing the number of non-inflammatory lesions in the treatment of acne vulgaris in adolescents?”

Group the recommendations together in a summary section to facilitate their identification
Provide an explanation of the tips at the beginning of each report to limit the risk of misunderstanding.

Co-author(s): 
Joan.Vlayen
Co-author(s): 
Leen.Verleye

GUIDELINE DEVELOPMENT: PRINCIPLES (GCP)

Author(s): 
Joan.Vlayen

When clinical practice guidelines are developed at KCE, the principles of AGREE are followed. It is therefore strongly recommended to use the AGREE II instrument as a checklist during all phases of the guideline development.

Since guideline development is a time- and resource-consuming process, strict project management is needed. A generic protocol for guideline development is available on the KCE intranet (for internal use only).

Co-author(s): 
Leen.Verleye

ADAPTE (GCP)

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

The ADAPTE Collaboration is an international collaboration of researchers, guideline developers, and guideline implementers who aim to promote the development and use of clinical practice guidelines through the adaptation of existing guidelines. The group's main endeavour is to develop and validate a generic adaptation process that will foster valid and high-quality adapted guidelines as well as the users' sense of ownership towards the adapted guideline. Following the finalization of the ADAPTE Manual and Resource Toolkit and their evaluation, the ADAPTE Collaboration dissolved and transferred the ADAPTE process and its resources to the Guidelines International Network (G-I-N) to facilitate its dissemination.


G-I-N (www.g-i-n.net) made this version of the ADAPTE Manual and Resource Toolkit (version 2.0) available for free on its website. G-I-N established an Adaptation Working Group to support groups undertaking or planning to undertake guideline adaptation and to handle further developments and refinements of the ADAPTE Manual and Resource Toolkit.


The current ADAPTE methodology and resources are based on the results of an evaluation conducted on a draft manual and toolkit: upon requesting the ADAPTE resources, potential users were sent a survey asking their impressions about the resources and the proposed process.

ADAPTE: To use or not to use?

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

The ADAPTE method remains controversial (also within KCE). Some of the discussion points are summarized below:



  • One of the main arguments in favour of ADAPTE is that it would be more efficient. However, this is not proven yet, and the survey mentioned above indicated that savings in time are probably fairly modest in a lot of cases. One of the main reasons for this is the fact that the underlying evidence for each recommendation needs to be verified anyway.

  • ADAPTE and GRADE: if the source guideline did not use GRADE, an appraisal and structuring of the underlying evidence is needed, which is time-consuming. On the other hand, the same is true for systematic reviews that did not use GRADE, which are often used as a starting point at the KCE.

  • ADAPTE critically depends on the availability of recent high-quality guidelines that can be sufficiently trusted. These are not always available, and even high-quality guidelines do not always have exactly the same scope as the guideline to be developed. Moreover, high quality (as measured by the AGREE II instrument) does not guarantee correct content.

  • ADAPTE may be useful because clinical recommendations do not automatically and mechanically follow from the available evidence. Interpretation by clinicians is necessary and a good recommendation can highlight the pitfalls or the acceptability issues raised by clinicians. Ideally, these considerations are already available in the selected source guidelines. Furthermore, identifying published high-quality guidelines makes it possible to compare our recommendations with those from other countries, institutions, etc.

These issues were discussed during two internal consensus meetings at the KCE. The following conclusions were reached:



  • ADAPTE can only be used when high-quality, recent guidelines are available that are in line with the defined PICO. This implies that a GCP project always starts with a search for guidelines. The following criteria will need to be taken into account when assessing the relevance of a guideline:

    • All identified guidelines will need an assessment with the AGREE II instrument by two independent reviewers. Although the domain scores of AGREE II are useful for comparing guidelines and will inform whether a guideline should be recommended for use, the AGREE Consortium has not set minimum domain scores or patterns of scores across domains to differentiate between high-quality and poor-quality guidelines. These decisions should be made in consensus by the reviewers and guided by the context in which AGREE II is being used. Quantified cut-offs, while easy to use and enhancing reproducibility, are not recommended, because they have serious validity problems. The most important domain to be taken into account is ‘Rigour of development’.

    • A criterion that could be taken into account as well is the degree of detail provided by the guideline on the evidence that was used for developing the recommendations. In order to apply GRADE correctly a fair amount of detail is needed: in case it is necessary to retrieve all the primary studies, the gain of adapting a guideline becomes limited.

    • Updating a guideline with a search date that is too old may not be efficient, although it is difficult to recommend a general rule. Two years could be used as a rule of thumb, although this is very context- and topic-specific.

  • Each research team can decide to use or not to use ADAPTE based on written arguments. This decision should be made when the research protocol is written. In case of subcontracting, the choice of method will have to be discussed with the subcontractor. Transparent and documented judgement is key here, not the blind application of a set of rules.

  • If it is decided to use ADAPTE, the ADAPTE Manual and Resource Toolkit should be carefully read. The protocol should contain a clear description of how ADAPTE will be used (e.g. only used for some research questions, update of source guidelines with new evidence, etc). 
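
The AGREE II domain scores mentioned in the criteria above are scaled scores: the item ratings (1 to 7) of all appraisers are summed and expressed as a percentage of the possible range. The sketch below shows this calculation for one domain. The function name and the ratings of the two reviewers are hypothetical and serve only as an illustration; no cut-off is applied, in line with the conclusion that quantified cut-offs are not recommended.

```python
def scaled_domain_score(ratings_by_appraiser):
    """Return the scaled AGREE II score (0-100) for one domain.

    ratings_by_appraiser: one list of item ratings (1 to 7) per appraiser.
    """
    n_appraisers = len(ratings_by_appraiser)
    n_items = len(ratings_by_appraiser[0])
    if any(len(ratings) != n_items for ratings in ratings_by_appraiser):
        raise ValueError("every appraiser must rate the same items")
    if any(not 1 <= r <= 7 for ratings in ratings_by_appraiser for r in ratings):
        raise ValueError("AGREE II items are rated from 1 to 7")

    obtained = sum(sum(ratings) for ratings in ratings_by_appraiser)
    minimum = 1 * n_items * n_appraisers  # every item rated 1 by everyone
    maximum = 7 * n_items * n_appraisers  # every item rated 7 by everyone
    return 100 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings for the 8 items of 'Rigour of development'
reviewer_a = [6, 5, 6, 4, 5, 6, 3, 5]
reviewer_b = [5, 5, 6, 5, 4, 6, 4, 4]
score = scaled_domain_score([reviewer_a, reviewer_b])
print(f"Rigour of development: {score:.1f}%")
```

The resulting percentage is only an input for the consensus discussion between the reviewers; it does not replace their judgement.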

The ADAPTE methodology

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

The ADAPTE methodology is presented exhaustively in the ADAPTE manual, which is accompanied by a resource toolkit [1]. The methods aim to suit the needs of a broad range of stakeholders (from novices to those experienced with guideline development and groups with lesser or greater resources). The key aspects are summarized below.


The adaptation process basically consists of three main phases, each with a set of modules (see Figure on next page):



  • Set-up Phase: Outlines the necessary tasks to be completed prior to beginning the adaptation process (e.g., identifying necessary skills and resources).

  • Adaptation Phase: Assists users through the process of selecting a topic to identifying specific health questions; searching for and retrieving guidelines; assessing the consistency of the evidence and the guideline quality, currency, content, and applicability; decision making around adaptation; and preparing the draft adapted guideline.

  • Final Phase: Guides the user through the process of obtaining feedback on the document from stakeholders impacted by the guideline, consulting with the developers of source guidelines used in the adaptation process, establishing a process for the review and updating of the adapted guideline, and creating a final document.

The ADAPTE process is supported by resources to facilitate its application. Each module of the resource toolkit provides a detailed description of the steps, the products and deliverables, and the skills and organizational requirements.


At the KCE, a summary of the evidence on which the recommendations are based is usually provided in tables, and until now all selected guidelines were updated with more recent evidence. This may not always be necessary, and under time constraints experts in the field could be consulted to see if there are recent developments, provided that the search date of the guideline is not too old.


 


[1] Can be downloaded from the G-I-N website at http://www.g-i-n.net/activities/adaptation


 



GRADE SYSTEM (GCP, HTA)

1. Introduction

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

1.1. What is GRADE?

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

GRADE (Grading of Recommendations, Assessment, Development and Evaluation) offers a system for rating quality of evidence in systematic reviews and guidelines and grading strength of recommendations in guidelines. The system is designed for reviews and guidelines that examine alternative management strategies or interventions, which may include no intervention or current best management. It tries to offer a transparent and structured process for developing and presenting evidence summaries for systematic reviews and guidelines in health care and for carrying out the steps involved in developing recommendations.

The GRADE approach is based on a sequential assessment of the quality of evidence, followed by a judgment about the balance between desirable and undesirable effects, and subsequent decision about the strength of a recommendation. Separating the judgments regarding the quality of evidence from judgments about the strength of recommendations is a critical and defining feature of the GRADE system. Therefore, unlike many other grading systems, the GRADE system emphasizes that weak recommendations in the face of high quality evidence are common because of factors other than the quality of evidence influencing the strength of a recommendation, such as balance between desirable and undesirable effects, patient values and preferences and use of resources. For the same reason it allows for strong recommendations based on the evidence from observational studies.

Grade example

Although the GRADE system makes judgments about quality of evidence and strength of recommendations in a systematic and transparent manner, it does not eliminate disagreements in interpreting evidence nor the inevitable need for judgments in deciding on the best among alternative courses of action. This is a very important aspect and the developers of GRADE repeatedly warn against a too mechanistic approach and stress the fact that different experts can use GRADE and come to different conclusions. What matters in the first place is to ensure a maximum transparency about judgments and to explicitly acknowledge the values and preferences that underlie the recommendation.

An introduction to GRADE is given based on the guidance provided by the help function of the GRADEpro [1] software and the series of articles that appeared in the Journal of Clinical Epidemiology. This series is not yet complete and additional articles will be published in the near future, going deeper into several aspects and problems that one may encounter when trying to implement GRADE. It is, however, important to realize that most problems that are encountered when applying GRADE are not in the first place due to GRADE itself, but are general dilemmas that a scientist is confronted with when trying to use scientific evidence supporting recommendations.

Mostly, the only thing GRADE does is to confront you systematically with these problems in a way that makes it difficult to simply ignore them, by forcing you to think about and make explicit judgments on these aspects and report the results of this effort, be it on issues of heterogeneity, precision, indirectness, valid end points or publication bias.

Therefore, GRADE should not be seen as a tool for standardization in the first place, but as a tool for quality assurance and transparency.


[1] http://ims.cochrane.org/gradepro

1.2. Limitations of GRADE

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

GRADE is used for systematic reviews, health technology assessments and clinical guidelines. The approach differs somewhat for systematic reviews, but this difference is not really relevant for KCE, as KCE does not, in principle, produce guidelines that do not go together with recommendations.

GRADE is not designed for the evaluation of public health interventions, and it is not suitable for Health System Research. Although some principles used in GRADE can be useful, there are too many unresolved questions. GRADE is designed in the first place for the evaluation of interventions such as drugs, surgical interventions or radiation therapy. Although GRADE also deals with diagnostics, the approach is far from mature in this area and its usefulness is limited; the problems are explained in more detail at the end of the document.

Recommendations that may be helpful but do not need grading are typically those in which it is sufficiently obvious that desirable effects outweigh undesirable effects that no direct evidence is available because no one would be foolish enough to conduct a study addressing the implicit clinical question. Typically, such recommendations are supported by a great deal of indirect evidence, but teasing out the nature of the indirect evidence would be challenging and a waste of time and energy. One way of recognizing such questions is that if one made the alternative explicit, it would be bizarre or laughable.

1.3. Steps in the process

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

GRADE includes the following steps:

  • Ask a specific healthcare question to be answered by a recommendation;
  • Identify all important outcomes for this healthcare question;
  • Judge the relative importance of outcomes;
  • Summarize all relevant evidence;
  • Grade the quality of evidence for each outcome and for each comparison;
  • Decide on the overall quality of evidence across outcomes;
  • Include judgments about the underlying values and preferences related to the management options and outcomes;
  • Decide on the balance of desirable and undesirable effects;
  • Decide on the balance of net benefits and cost;
  • Grade the strength of recommendation;
  • Formulate a recommendation;
  • Implement and evaluate.

The steps are visualized in the following flowchart:

2. Framing the question

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

GRADE requires a clear specification of the relevant setting, population, intervention, comparator(s), and outcomes (see Building a search question). A guideline question often involves an additional specification, i.e. the setting in which the guideline will be implemented. For instance, guidelines intended for resource-rich environments will often be inapplicable in resource-poor environments. Furthermore, in some cases it may be necessary to specify if the guideline needs to be implemented in an inpatient or an outpatient setting.

Questions must be sufficiently specific: across the range of populations, interventions, and outcomes, a more or less similar effect must be plausible. Because the relative risk associated with an intervention vs. a specific comparator is usually similar across a wide variety of baseline risks, it is usually appropriate for systematic reviews to generate single pooled estimates of relative effects across a wide range of patient subgroups. Recommendations, however, may differ across subgroups of patients at different baseline risk of an outcome, despite there being a single relative risk that applies to all of them. Absolute risk reductions are greater in higher-risk patients, warranting taking a higher risk of side effects and enduring inconvenience. Evidence quality may also differ across subgroups, and this may mandate differing recommendations (higher likelihood of recommending an intervention, or making a stronger recommendation when evidence is of higher quality). Thus, guideline panels must often define separate questions (and produce separate evidence summaries) for high- and low-risk patients, and patients in whom quality of evidence differs.

Example

The case for warfarin therapy, which is associated with both inconvenience and a higher risk of serious bleeding, is much stronger in atrial fibrillation patients at substantial vs. minimal risk of stroke.
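
The reasoning behind separate questions for high- and low-risk patients can be made concrete with a small calculation: with a constant relative risk (RR), the absolute risk reduction equals the baseline risk multiplied by (1 − RR). The sketch below applies this to two subgroups. All figures and names are hypothetical and chosen only for illustration; they are not taken from any trial.

```python
def absolute_risk_reduction(baseline_risk, relative_risk):
    """Absolute risk reduction when a constant relative risk is applied."""
    return baseline_risk * (1 - relative_risk)


# Hypothetical figures: one relative risk, two baseline risks
relative_risk = 0.5
baseline_risks = {"low-risk patients": 0.01, "high-risk patients": 0.10}

for group, risk in baseline_risks.items():
    arr = absolute_risk_reduction(risk, relative_risk)
    print(f"{group}: {arr * 1000:.0f} fewer events per 1000 patients treated")
```

With the same relative effect, the absolute benefit in the high-risk group is ten times larger, while the inconvenience and the risk of side effects are the same in both groups. This is why the recommendation may differ between the two.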

3. Deciding on important outcomes

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

3.1. General approach

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

Outcomes may include survival (mortality), clinical events (e.g. stroke or myocardial infarction), patient-reported outcomes (e.g. specific symptoms, quality of life), adverse events, burdens (e.g. demands on caregivers, frequency of tests, restrictions on lifestyle) and economic outcomes (e.g. cost and resource use). It is critical to identify both outcomes related to adverse effects/harm as well as outcomes related to effectiveness.

Review authors should consider how outcomes should be measured, both in terms of the type of scale likely to be used and the timing of measurement. Outcomes may be measured objectively (e.g. blood pressure, number of strokes) or subjectively as rated by a clinician, patient or carer (e.g. disability scales). It may be important to specify whether measurement scales have been published or validated.

GRADE specifies three categories of outcomes according to their importance. Guideline developers must, and authors of systematic reviews are strongly encouraged to, specify all potential patient-important outcomes as the first step in their endeavour. The guideline development group should classify outcomes as:

  • Critical;
  • Important, but not critical;
  • Of limited importance.

The first two classes of outcomes will bear on guideline recommendations; the third may or may not. Ranking outcomes by their relative importance can help to focus attention on those outcomes that are considered most important, and help to resolve or clarify disagreements. GRADE recommends focusing on a maximum of 7 critical and/or important outcomes.

Guideline developers should first consider whether particular desirable or undesirable consequences of a therapy are important to the decision regarding the optimal management strategy, or whether they are of limited importance. If the guideline panel thinks that a particular outcome is important, then it should consider whether the outcome is critical to the decision, or only important, but not critical. To facilitate ranking of outcomes according to their importance, guideline developers as well as authors of systematic reviews may choose to rate outcomes numerically on a 1 to 9 scale (7 to 9 – critical; 4 to 6 – important; 1 to 3 – of limited importance) to distinguish between importance categories.
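
The mapping from the 1 to 9 scale to the three importance categories can be written down as a minimal sketch. The function name, the outcome names and their ratings are hypothetical; in practice the ratings come from the guideline development group.

```python
def importance_category(rating):
    """Map a rating on the 1 to 9 scale to a GRADE importance category."""
    if not isinstance(rating, int) or not 1 <= rating <= 9:
        raise ValueError("rating must be an integer from 1 to 9")
    if rating >= 7:
        return "critical"
    if rating >= 4:
        return "important, but not critical"
    return "of limited importance"


# Hypothetical ratings agreed on by a guideline development group
ratings = {"mortality": 9, "stroke": 8, "quality of life": 6, "rash": 2}
categories = {outcome: importance_category(r) for outcome, r in ratings.items()}

for outcome, category in categories.items():
    print(f"{outcome}: {category}")
```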

For each recommendation, GRADE proposes to limit the number of outcomes to a maximum of 7.

3.2. Perspective of outcomes

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

Different audiences are likely to have different perspectives on the importance of outcomes.

The importance of outcomes is likely to vary within and across cultures or when considered from the perspective of patients, clinicians or policy-makers. It is essential to take cultural diversity into account when deciding on relative importance of outcomes, particularly when developing recommendations for an international audience. Guideline panels should also decide what perspective they are taking. Guideline panels may also choose to take the perspective of the society as a whole (e.g. a guideline panel developing recommendations about pharmacological management of bacterial sinusitis may take the patient perspective when considering health outcomes, but also a society perspective when considering antimicrobial resistance to specific drugs).

3.3. Before and after literature review

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

For a guideline, an initial rating of the importance of outcomes should precede the review of the evidence, and this rating should be confirmed or revised following the evidence review.

One should aim to decide which outcomes are important during protocol development and before one undertakes a systematic review or guideline project.

However, rating importance of an outcome prior to evidence review is preliminary: when evidence becomes available a reassessment of importance is necessary.

Grade example 2

Guideline panels should be aware that in some instances the importance of an outcome may only become known after the protocol is written, evidence is reviewed or the analyses are carried out, and should take appropriate actions to include these in the evidence tables.

Outcomes that are critical to decision making should be included in an evidence table whether or not information about them is available.

3.4. Implications of the classification

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

Only outcomes considered critical (rated 7 to 9) or important (rated 4 to 6) should be included in the evidence profile.

Only outcomes considered critical (rated 7 to 9) are the primary factors influencing a recommendation and should be used to determine the overall quality of evidence supporting this recommendation.

When determining which outcomes are critical, it is important to bear in mind that absence of evidence on a critical outcome automatically leads to a downgrading of the evidence.

3.5. Expert involvement

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

Experts and stakeholders should be involved when determining the research questions and important outcomes. At KCE this usually consists of inviting a number of experts in the field to an expert meeting. While interactions between experts often are useful, there is a real danger that unprepared meetings lead to ‘suboptimal’ decisions. The following may make this process easier:



  • Try to make them focus on the really important questions: there are usually lots of interesting questions, but the scope needs to be limited.

  • Explain beforehand the implications of the term ‘critical outcome’. It is useful to ask beforehand: is the outcome so critical that one is prepared to downgrade the level of evidence if insufficient evidence is found for this particular outcome?

  • Make a proposal beforehand: expert meetings are often too short for the invited experts to construct a complete framework of questions with the relevant outcomes from scratch.

  • It may be useful to ask experts beforehand to provide ratings for the different outcomes (e.g. in an Excel sheet) and ask them to put their justification in writing.

  • Try to give an introduction on GRADE so that everybody has an understanding of what it is and what the implications are.

3.6. Use of surrogates

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

Guideline developers should consider surrogate outcomes only when high-quality evidence regarding important outcomes is lacking. When such evidence is lacking, guideline developers may be tempted to list the surrogates as their measures of outcome. This is not the approach GRADE recommends. Rather, they should specify the important outcomes and the associated surrogates they must use as substitutes. The necessity to substitute with the surrogate may ultimately lead to rating down the quality of the evidence because of indirectness.

3.7. Clinical decision threshold and minimally important difference

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

When deciding on outcomes, it is also important to consider what effect size would be minimally clinically important. This is best decided before the evidence is collected and summarized, in order to avoid subjective and ad hoc decisions influenced by the available evidence.

GRADE uses the term Clinical Decision Threshold, i.e. the threshold that would change the decision whether or not to adopt a clinical action.

For binary outcomes this usually implies a risk reduction. The threshold is likely to differ according to the outcome, e.g. a mortality reduction of 10% will be more important than a 10% reduction in the number of patients developing a rash. For continuous outcomes, the minimally important difference is used, i.e. the smallest difference in outcome of interest that informed patients or proxies perceive to be important, either beneficial or harmful, and that would lead the patient or clinician to consider a change in management.

Notes

  • A minimally important difference is measured at the individual level.
  • The effect on a continuous outcome can be expressed as a mean difference, but also as the proportion of patients having a benefit that is above the minimally important difference.
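
The two ways of expressing an effect on a continuous outcome mentioned in the notes above can be illustrated with a short sketch. The data are hypothetical individual improvements on a pain scale and the minimally important difference (MID) of 2 points is an assumption made for the example; counting patients who improve by at least the MID is also an assumption of this sketch.

```python
from statistics import mean


def summarize_against_mid(intervention_changes, control_changes, mid):
    """Summarize a continuous outcome as a mean difference and as the
    proportion of patients in each group improving by at least the MID."""

    def proportion_improved(changes):
        return sum(change >= mid for change in changes) / len(changes)

    return {
        "mean_difference": mean(intervention_changes) - mean(control_changes),
        "proportion_intervention": proportion_improved(intervention_changes),
        "proportion_control": proportion_improved(control_changes),
    }


# Hypothetical individual improvements (points on a pain scale)
intervention = [3, 1, 4, 0, 2, 5, 1, 3]
control = [1, 0, 2, 0, 1, 3, 0, 1]
result = summarize_against_mid(intervention, control, mid=2)
print(result)
```

In this hypothetical example the mean difference is smaller than the MID, yet the proportion of patients with an improvement of at least the MID clearly differs between the groups. This is why the MID is measured at the individual level.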

 GRADE example 3

Determining this threshold is often difficult. Expert opinion is often essential.

For a few outcomes validated thresholds exist based on evidence from surveys amongst patients, e.g. the Cochrane back pain group determined a threshold for back and neck pain. Doing a specific literature search on this topic is probably too labour-intensive and moreover, there are no universally accepted and agreed validated methods for doing so. Some rules of thumb are provided by the GRADE working group, such as an increase/decrease of 25%, but one should be cautious to apply these without a critical reflection on the context.

3.8. Adverse effects

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

Any intervention may be associated with adverse effects that are not initially apparent. Thus, one might consider ‘‘as-yet-undiscovered toxicity’’ as an important adverse consequence of any new drug. Such toxicity becomes critical only when sufficient evidence of its existence emerges.



The tricky part of this judgment is how frequently the adverse event must occur and how plausible the association with the intervention must be before it becomes a critical outcome. For instance, an observational study found a previously unsuspected association between sulfonylurea use and cancer-related mortality. Should cancer deaths now be an important, or even a critical, endpoint when considering sulfonylurea use in patients with type 2 diabetes? As is repeatedly the case, we cannot offer hard and fast rules for these judgments.

4. Summarizing the evidence

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

Evidence must be summarized with results ideally coming from optimally conducted systematic reviews for each patient-important outcome. For each comparison of alternative management strategies, all outcomes should be presented together. It is likely that not all studies relevant to a healthcare question will provide evidence regarding every outcome. The GRADE working group has developed specific approaches to present the quality of the available evidence, the judgments that bear on the quality rating, and the effects of alternative management strategies on the outcomes of interest: the GRADE evidence profile (EP) and the Summary of Findings (SoF) table. An evidence profile is more detailed than a summary of findings table. The main difference is that it provides the detailed judgments and reasons for down- or upgrading per category (see below). As such, an evidence profile provides the information for other guideline developers and validators that allows them to understand how the judgment about the level of evidence was reached.

The GRADEpro software facilitates the process of developing both EPs and SoF tables.

The SoF table consists of 7 elements:

  1. A list of all important outcomes, both desirable and undesirable;
  2. A measure of the typical burden of these outcomes (e.g. control group, estimated risk);
  3. A measure of the risk in the intervention group or, alternatively or additionally, a measure of the difference between the risks with and without intervention;
  4. The relative magnitude of effect;
  5. Numbers of participants and studies addressing these outcomes;
  6. A rating of the overall confidence in effect estimates for each outcome (which may vary by outcome);
  7. Comments.

For binary outcomes, relative risks (RRs) are the preferred measure of relative effect and, in most instances, are applied to the baseline or control group risks to generate absolute risks. Ideally, the baseline risks come from observational studies including representative patients and identifying easily measured prognostic factors that define groups at differing risk. In the absence of such studies, relevant randomized trials provide estimates of baseline risk. When confidence intervals (CI) around the relative effect include no difference, one may simply state in the absolute risk column that results fail to show a difference, omit the point estimate and report only the CIs, or add a comment emphasizing the uncertainty associated with the point estimate.
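
How a relative risk and its confidence interval are applied to a baseline risk to obtain the absolute risks of a SoF table can be sketched as follows. The function name and all figures are hypothetical; the expression per 1000 patients is a presentation choice made for this example.

```python
def absolute_effects_per_1000(baseline_risk, rr, rr_low, rr_high):
    """Translate a relative risk and its CI into absolute risks per 1000."""

    def per_1000(relative_risk):
        return round(baseline_risk * relative_risk * 1000)

    control = round(baseline_risk * 1000)
    return {
        "control_per_1000": control,
        "intervention_per_1000": per_1000(rr),
        "intervention_ci_per_1000": (per_1000(rr_low), per_1000(rr_high)),
        "difference_per_1000": per_1000(rr) - control,
        # True when the CI around the relative effect includes RR = 1
        "ci_includes_no_difference": rr_low <= 1 <= rr_high,
    }


# Hypothetical outcome: baseline risk 20%, RR 0.75 (95% CI 0.60 to 0.94)
row = absolute_effects_per_1000(baseline_risk=0.20, rr=0.75,
                                rr_low=0.60, rr_high=0.94)
print(row)
```

The flag `ci_includes_no_difference` identifies the situation described above, in which one may state that results fail to show a difference, report only the CIs, or add a comment on the uncertainty of the point estimate.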

In addition, GRADE provides a number of supplementary recommendations:

  • A SoF table should present the seven (or fewer) most important outcomes. These outcomes must always be patient-important outcomes and never be surrogates, although surrogates can be used to estimate effects on patient-important outcomes;
  • A SoF table should present the highest quality evidence;
  • When quality of two bodies of evidence (e.g. randomized trials and observational studies) is similar, a SoF table may include summaries from both.

5. Rating the quality of evidence

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

5.1. Introduction

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

GRADE specifies four quality categories (high, moderate, low, and very low) that are applied to a body of evidence, but not to individual studies. In the context of a systematic review, quality reflects our confidence that the effect estimates are correct. In the context of recommendations, quality reflects our confidence that the effect estimates are adequate to support a particular recommendation.


Guideline panels have to determine the overall quality of evidence across all the critical outcomes essential to a recommendation they make. Guideline panels usually provide a single grade of quality of evidence for every recommendation, but the strength of a recommendation usually depends on evidence regarding not just one, but a number of patient-important outcomes and on the quality of evidence for each of these outcomes.


When determining the overall quality of evidence across outcomes:



  • Consider only those outcomes that are deemed critical;

  • If the quality of evidence differs across critical outcomes and outcomes point in different directions — towards benefit and towards harm — the lowest quality of evidence for any of the critical outcomes determines the overall quality of evidence;

  • If all outcomes point in the same direction — towards either benefit or harm — the highest quality of evidence for a critical outcome, that by itself would suffice to recommend an intervention, determines the overall quality of evidence. However, if the balance of the benefits and harms is uncertain, the grade of the critical outcome with the lowest quality grading should be assigned.
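
The three rules above can be sketched as a small function. This is an illustration under stated assumptions, not a substitute for the panel's judgment: the inputs `same_direction`, `balance_uncertain` and `sufficient_alone` are themselves judgments that have to be supplied by the panel, the outcome data are hypothetical, and falling back to the lowest grade when no single outcome would suffice on its own is an assumption of this sketch.

```python
from typing import NamedTuple

LEVELS = ["very low", "low", "moderate", "high"]


class Outcome(NamedTuple):
    name: str
    quality: str                    # one of LEVELS
    critical: bool
    sufficient_alone: bool = False  # would suffice by itself to recommend


def overall_quality(outcomes, same_direction, balance_uncertain=False):
    """Overall quality of evidence across the critical outcomes."""
    critical = [o for o in outcomes if o.critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")

    lowest = min((o.quality for o in critical), key=LEVELS.index)
    if not same_direction or balance_uncertain:
        return lowest

    sufficient = [o.quality for o in critical if o.sufficient_alone]
    if not sufficient:
        return lowest  # assumption of this sketch, see lead-in
    return max(sufficient, key=LEVELS.index)


# Hypothetical outcomes for one recommendation
outcomes = [
    Outcome("mortality", "high", critical=True, sufficient_alone=True),
    Outcome("stroke", "moderate", critical=True),
    Outcome("rash", "low", critical=False),
]
print(overall_quality(outcomes, same_direction=True))
print(overall_quality(outcomes, same_direction=False))
```

Note that the non-critical outcome (rash) does not influence the result in either case.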

5.1.1. Four levels of evidence

Randomized trials start as high-quality evidence, observational studies as low quality (see table). “Quality” as used in GRADE means more than risk of bias and may also be compromised by imprecision, inconsistency, indirectness of study results, and publication bias. In addition, several factors can increase our confidence in an estimate of effect. This general approach is summarized in the table below.




In the following chapters these factors will be discussed in depth. However, it is important to emphasize again that GRADE warns against applying this upgrading and downgrading too mechanistically and recommends leaving room for judgment.


Although GRADE suggests the initial separate consideration of five categories for rating down the quality of evidence and three categories for rating up, with a yes/no decision in each case, the final rating of overall evidence quality occurs in a continuum of confidence in the validity, precision, consistency, and applicability of the estimates. Fundamentally, the assessment of evidence quality remains a subjective process, and GRADE should not be seen as obviating the need for or minimizing the importance of judgment. As repeatedly stressed, the use of GRADE will not guarantee consistency in assessment, whether it is of the quality of evidence or of the strength of recommendation. There will be cases in which competent reviewers will have honest and legitimate disagreement about the interpretation of evidence. In such cases, the merit of GRADE is that it provides a framework that guides one through the critical components of this assessment and an approach to analysis and communication that encourages transparency and an explicit accounting of the judgments involved.

5.1.2. Overall quality of evidence

Guideline panels have to determine the overall quality of evidence across all the critical outcomes essential to a recommendation they make. Guideline panels usually provide a single grade of quality of evidence for every recommendation, but the strength of a recommendation usually depends on evidence regarding not just one, but a number of patient-important outcomes and on the quality of evidence for each of these outcomes.


When determining the overall quality of evidence across outcomes:



  • Consider only those outcomes that are deemed critical;

  • If the quality of evidence differs across critical outcomes and outcomes point in different directions — towards benefit and towards harm — the lowest quality of evidence for any of the critical outcomes determines the overall quality of evidence;

  • If all outcomes point in the same direction (towards either benefit or harm), the highest quality of evidence for a critical outcome that by itself would suffice to recommend an intervention determines the overall quality of evidence. However, if the balance of the benefits and downsides is uncertain, then the grade of the critical outcome with the lowest quality grading should be assigned.
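
The rules above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name, the dictionary keys and the fallback used when no single outcome would suffice on its own are hypothetical and not part of GRADE. The inputs themselves (which outcomes are critical, in which direction they point, whether the balance is uncertain) remain panel judgments.

```python
# Ordered GRADE quality categories, lowest first.
LEVELS = ["very low", "low", "moderate", "high"]


def overall_quality(outcomes, balance_uncertain=False):
    """Return the overall quality of evidence across critical outcomes.

    Each outcome is a dict with hypothetical keys:
      critical   - bool, outcome judged critical by the panel
      quality    - one of LEVELS
      direction  - "benefit" or "harm"
      sufficient - bool, outcome alone would suffice to recommend
    """
    # Rule 1: consider only the critical outcomes.
    critical = [o for o in outcomes if o["critical"]]
    if not critical:
        raise ValueError("at least one critical outcome is required")

    ranks = [LEVELS.index(o["quality"]) for o in critical]
    directions = {o["direction"] for o in critical}

    # Rule 2: outcomes point in different directions, or the balance of
    # benefits and downsides is uncertain -> lowest quality determines.
    if len(directions) > 1 or balance_uncertain:
        return LEVELS[min(ranks)]

    # Rule 3: same direction -> highest quality among the outcomes that
    # would by themselves suffice to recommend the intervention.
    sufficient = [LEVELS.index(o["quality"]) for o in critical if o["sufficient"]]
    # Assumption: if no outcome suffices on its own, fall back to the lowest.
    return LEVELS[max(sufficient)] if sufficient else LEVELS[min(ranks)]


example = [
    {"critical": True, "quality": "high", "direction": "benefit", "sufficient": True},
    {"critical": True, "quality": "low", "direction": "benefit", "sufficient": False},
    {"critical": False, "quality": "very low", "direction": "harm", "sufficient": False},
]
print(overall_quality(example))
print(overall_quality(example, balance_uncertain=True))
```

The helper only records the outcome of the panel's judgments; it does not replace them.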

5.1.3. GRADE and meta-analysis

GRADE relies on the judgment about our confidence in a (beneficial or adverse) effect of an intervention and therefore it is impossible to apply GRADE correctly if a meta-analysis is not at least considered and the necessary judgments are made on (statistical, methodological and clinical) heterogeneity. It is possible that no pooled effect can or should be calculated if there is evidence of heterogeneity, be it clinical, methodological or merely statistical, but meta-analysis should always be attempted. Otherwise, it is impossible to gather sufficient elements to make the necessary GRADE judgments. Note that heterogeneity is in most cases a reason to downgrade the body of evidence, with some exceptions that will be explained later.


In order to apply GRADE (but actually in order to make a sound judgment on evidence in general) it is essential that at least one person involved in the development of the guideline understands this guidance and is able to apply it.


GRADE remains rather vague about what to do if only one study is available. We recommend downgrading the evidence by at least one level, except when the single study is a multicentre study in which the sample size in the individual centres is sufficient to demonstrate heterogeneity, if there is any. Any decision not to downgrade must be explained and justified.


If the primary studies do not allow the calculation of a confidence interval, consider downgrading, as judging precision and heterogeneity becomes difficult. There are some rare exceptions, when the confidence interval is not needed because all studies point clearly in the same direction. In some cases non-parametric tests are used because the assumption of normality is violated. In these cases, the non-parametric measure of uncertainty should be used (most of the time an interquartile range) and interpreted. Decisions taken around these issues should be justified.

5.2. Study limitations, risk of bias

5.2.1. Risk of bias assessment of individual studies should be done using a validated assessment tool

For the quality assessment of individual studies, specific tools were selected by the KCE.


There is no agreed KCE tool for observational studies. At this moment, we limit ourselves to a number of elements that need to be verified when looking at observational studies. There are a large number of assessment tools, but in the scientific community there is considerable disagreement on what items really matter. Moreover, observational studies are far more diverse than RCTs.


Study limitations in observational studies as identified by GRADE are:



  • Failure to develop and apply appropriate eligibility criteria (inclusion of control population):

      ◦ Under- or overmatching in case-control studies;

      ◦ Selection of exposed and unexposed in cohort studies from different populations;

  • Flawed measurement of both exposure and outcome:

      ◦ Differences in measurement of exposure (e.g., recall bias in case-control studies);

      ◦ Differential surveillance for outcome in exposed and unexposed in cohort studies;

  • Failure to adequately control confounding:

      ◦ Failure of accurate measurement of all known prognostic factors;

      ◦ Failure to match for prognostic factors and/or lack of adjustment in statistical analysis;

  • Incomplete follow-up.

5.2.2. Moving from individual risk of bias to a judgment about rating down for risk of bias across a body of evidence

Moving from risk of bias criteria for each individual study to a judgment about rating down for risk of bias across a group of studies addressing a particular outcome presents challenges. GRADE suggests the following principles:



  • First, in deciding on the overall quality of evidence, one does not average across studies (for instance if some studies have no serious limitations, some serious limitations, and some very serious limitations, one does not automatically rate quality down by one level because of an average rating of serious limitations). Rather, judicious consideration of the contribution of each study, with a general guide to focus on the high-quality studies, is warranted.

  • Second, this judicious consideration requires evaluating the extent to which each trial contributes toward the estimate of magnitude of effect. This contribution will usually reflect study sample size and number of outcome events: larger trials with many events will contribute more, much larger trials with many more events will contribute much more.

  • Third, one should be conservative in the judgment of rating down. That is, one should be confident that there is substantial risk of bias across most of the body of available evidence before one rates down for risk of bias.

  • Fourth, the risk of bias should be considered in the context of other limitations. If, for instance, reviewers find themselves in a close-call situation with respect to two quality issues (risk of bias and, e.g. precision), we suggest rating down for at least one of the two.

  • Fifth, notwithstanding the first four principles, reviewers will face close-call situations. They should both acknowledge that they are in such a situation, make it explicit why they think this is the case, and make the reasons for their ultimate judgment apparent.

This approach is summarized in the table below.


5.3. Inconsistency

Widely differing estimates of the treatment effect (i.e. heterogeneity or variability in results) across studies suggest true differences in underlying treatment effect. When heterogeneity exists, but investigators fail to identify a plausible explanation, the quality of evidence should be downgraded by one or two levels, depending on the magnitude of the inconsistency in the results.


Inconsistency may arise from differences in:



  • Populations (e.g. drugs may have larger relative effects in sicker populations);

  • Interventions (e.g. larger effects with higher drug doses);

  • Outcomes (e.g. diminishing treatment effect with time).

Guideline panels or authors of systematic reviews should also consider the extent to which they are uncertain about the underlying effect due to the inconsistency in results and they may downgrade the quality rating by one or even two levels.


GRADE suggests rating down the quality of evidence if large inconsistency (heterogeneity) in study results remains after exploration of a priori hypotheses that might explain heterogeneity.

5.3.1. Heterogeneity and inconsistency

GRADE uses inconsistency and heterogeneity rather interchangeably. However, there are some important nuances:



  • Heterogeneity in effect, where it can be assumed that it is randomly distributed, may be due to random variation in the effect amongst studies. To properly address this, the pooled effect should be calculated using random-effects modelling (RevMan uses the DerSimonian and Laird random-effects model, but other techniques, such as Bayesian and maximum likelihood methods, are often used as well). An important condition for the use of these techniques is that it must be plausible that the heterogeneous effect is randomly distributed, which is not always easy to verify. The DerSimonian and Laird and maximum likelihood methods make the additional assumption that the effect is normally distributed, while with Bayesian techniques another distribution can be used as well. The studies in this case cannot be considered inconsistent, and the heterogeneity is accounted for by the wider confidence interval, so no downgrading is needed. Note that if the heterogeneity statistic Q is less than or equal to its degrees of freedom (so if I² = 0), the DerSimonian and Laird method gives results that are numerically identical to the (non-random-effects) inverse variance method.

  • If heterogeneity is important for one reason or another, but all estimates point in the same direction, e.g. a strong or very strong effect of the intervention, then one should not necessarily downgrade for inconsistency but make a judgement on the plausibility of the study results.
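
To make the first point concrete, the following is a minimal sketch of inverse variance (fixed-effect) pooling next to DerSimonian and Laird random-effects pooling. It is not the RevMan implementation; the function name and the example data (log relative risks with their variances) are invented for illustration.

```python
import math


def pool(effects, variances):
    """Pool study effects (e.g. log relative risks) given their variances.

    Returns fixed-effect and DerSimonian-Laird random-effects estimates
    with their standard errors, plus Q and the between-study variance tau2.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # zero whenever Q <= df
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    random_effects = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sum(w)),
        "random": random_effects,
        "se_random": math.sqrt(1.0 / sum(w_re)),
        "q": q,
        "tau2": tau2,
    }


# Hypothetical data: three similar studies, then three widely differing ones.
homogeneous = pool([-0.30, -0.28, -0.32], [0.04, 0.05, 0.04])
heterogeneous = pool([-0.80, -0.10, 0.30], [0.02, 0.02, 0.02])
print(homogeneous)
print(heterogeneous)
```

In the homogeneous example Q is below its degrees of freedom, so both methods coincide. In the heterogeneous example the random-effects standard error is larger, which is how the heterogeneity ends up in a wider confidence interval.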

5.3.2. Judging heterogeneity and inconsistency

Exploring and judging heterogeneity is probably the most difficult part in performing and judging a meta-analysis. A number of rules are presented, but a full explanation can be found in the Cochrane Handbook (chapters 9.5 and 9.6). A KCE expert exploring and judging heterogeneity should at least have a good understanding of and ability to apply this Cochrane guidance. If not, he/she should ask for help from somebody who does.


GRADE identifies four criteria for assessing inconsistency in results, and reviewers should consider rating down for inconsistency when:



  1. Point estimates vary widely across studies;

  2. Confidence intervals (CIs) show minimal or no overlap;

  3. The statistical test for heterogeneity which tests the null hypothesis that all studies in a meta-analysis have the same underlying magnitude of effect shows a low p-value;

  4. The I², which quantifies the proportion of the variation in point estimates due to between-study differences, is large.

In the past, rigid criteria were used to judge heterogeneity, e.g. an I² of 50% used to be a common threshold. This improves the consistency of judgments, but one risks being consistently wrong. All statistical approaches have their limitations, and their results should be seen in the context of a subjective examination of the variability in point estimates and the overlap in CIs. So again, transparent judgments are essential here.


What is a large I²? One set of criteria would say that an I² of less than 40% is low, 30 to 60% may be moderate, 50 to 90% may be substantial, and 75 to 100% is considerable. Note the overlapping ranges and the equivocation (“may be”): an implicit acknowledgment that the thresholds are both arbitrary and uncertain. When individual study sample sizes are small, point estimates may vary substantially, but because variation may be explained by chance, I² may be low. Conversely, when study sample size is large, a relatively small difference in point estimates can yield a large I².
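
The dependence of I² on study size can be illustrated with a short sketch. It uses the usual definition I² = max(0, (Q − df) / Q); the function name and the numbers are hypothetical. The same three point estimates are analysed once with large variances (small studies) and once with small variances (large studies).

```python
def i_squared(effects, variances):
    """Return I² (in percent) from study effects and their variances."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


# Identical point estimates (log relative risks, invented for illustration).
estimates = [-0.5, -0.1, 0.2]
i2_small_studies = i_squared(estimates, [0.25, 0.25, 0.25])  # imprecise studies
i2_large_studies = i_squared(estimates, [0.01, 0.01, 0.01])  # precise studies
print(i2_small_studies, i2_large_studies)
```

With the imprecise studies the variation is compatible with chance and I² is zero; with the precise studies the same estimates give an I² in the range labelled considerable.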


5.3.3. Other considerations


  • Risk differences (i.e. absolute risk reductions) in subpopulations tend to vary widely. Relative risk (RR) reductions, on the other hand, tend to be similar across subgroups, even if subgroups have substantial differences in baseline risk. GRADE considers the issue of difference in absolute effect in subgroups of patients, much more common than differences in relative effect, as a separate issue. When easily identifiable patient characteristics confidently permit classifying patients into subpopulations at appreciably different risk, absolute differences in outcome between intervention and control groups will differ substantially between these subpopulations. This may well warrant differences in recommendations across subpopulations.

  • Rate down for inconsistency, not up for consistency.

  • Even when there is heterogeneity in effect, one must evaluate whether the heterogeneity affects the judgment on clinical effectiveness, e.g. when there are large differences in effect size but the estimates all point in the same direction (all beneficial or all harmful).

  • Reviewers should combine results only if – across the range of patients, interventions, and outcomes considered – it is plausible that the underlying magnitude of treatment effect is similar. This decision is a matter of judgment. Magnitude of intervention effects may differ across studies, due to the population (e.g. disease severity), the interventions (e.g. doses, co-interventions, comparison of interventions), the outcomes (e.g. duration of follow-up), or the study methods (e.g. randomized trials with higher and lower risk of bias). If one of the first three categories provides the explanation, review authors should offer different estimates across patient groups, interventions, or outcomes. Guideline panelists are then likely to offer different recommendations for different patient groups and interventions. If study methods provide a compelling explanation for differences in results between studies, then authors should consider focusing on effect estimates from studies with a lower risk of bias.

Beware of subgroup analyses. The warning below originates from the Cochrane Handbook (chapter 9.6). When confronted with subgroup analyses, seek at least a second opinion from a knowledgeable person.


Subgroup analyses involve splitting all the participant data into subgroups, often so as to make comparisons between them. Subgroup analyses may be done for subsets of participants (such as males and females), or for subsets of studies (such as different geographical locations). Subgroup analyses may be done as a means of investigating heterogeneous results, or to answer specific questions about particular patient groups, types of intervention or types of study. Findings from multiple subgroup analyses may be misleading. Subgroup analyses are observational by nature and are not based on randomized comparisons (an exception is when randomisation is stratified within these subgroups). False negative and false positive significance tests increase in likelihood rapidly as more subgroup analyses are performed (this is due to the multiple testing problem: if you perform a significance test frequently enough, you are likely to find by chance a statistically significant result). If findings are presented as definitive conclusions, there is clearly a risk of patients being denied an effective intervention or treated with an ineffective (or even harmful) intervention. Subgroup analyses can also generate misleading recommendations about directions for future research that, if followed, would waste scarce resources.
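
The multiple testing problem mentioned above can be quantified with a short sketch. It assumes independent tests, which real subgroup analyses rarely are, so the figures are only indicative; the function name is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests
    when no true subgroup effect exists."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 10, 20):
    print(n, round(chance_of_false_positive(n), 2))
```

Under this assumption, ten subgroup tests at the 5% level already carry roughly a 40% chance of at least one spurious significant finding.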

5.4. Indirectness

Direct evidence comes from research that directly compares the interventions in which we are interested when applied to the populations in which we are interested and measures outcomes important to patients. Evidence can be indirect in one of four ways:



  • First, patients may differ from those of interest (the term applicability is often used for this form of indirectness).

  • Secondly, the intervention tested may differ from the intervention of interest. Decisions regarding indirectness of patients and interventions depend on an understanding of whether biological or social factors are sufficiently different that one might expect substantial differences in the magnitude of effect.

  • Thirdly, outcomes may differ from those of primary interest, for instance, surrogate outcomes that are not themselves important, but measured in the presumption that changes in the surrogate reflect changes in an outcome important to patients.

  • A fourth type of indirectness, conceptually different from the first three, occurs when clinicians must choose between interventions that have not been tested in head-to-head comparisons. Making comparisons between treatments under these circumstances requires specific statistical methods, and the resulting evidence will be rated down in quality by one or two levels depending on the extent of differences between the patient populations, co-interventions, measurements of the outcome, and the methods of the trials of the candidate interventions.

5.5. Imprecision

GRADE suggests that examination of 95% confidence intervals (CIs) provides the optimal primary approach to decisions regarding imprecision. Results are considered imprecise when studies include relatively few patients and few events and thus have wide confidence intervals around the estimate of the effect. In this case a guideline panel will judge the quality of the evidence lower than it otherwise would because of resulting uncertainty in the results.


As a general principle, GRADE recommends considering rating down for imprecision if a recommendation or clinical course of action would differ depending on whether the upper or the lower boundary of the CI represented the truth. In order to judge this, the clinical decision threshold needs to be defined. We explain the situation for both categorical and continuous outcomes.

5.5.1. Clinical decision threshold and minimally important difference

5.5.1.1. Categorical outcomes

As explained before, it is good to discuss the clinical decision threshold and to try to determine it beforehand. However, one must be careful not to apply it too mechanically, as the downsides of the intervention, such as side effects or morbidity induced by the intervention, also need to be taken into account, and in many cases information on these is only available after collecting the evidence. If it is not possible to determine a threshold, then GRADE suggests using a relative risk reduction (RRR) or relative risk increase (RRI) greater than 25%. This 25% threshold is a good and pragmatic starting point, but other factors should be taken into account as well, such as the absolute risk reduction.

5.5.1.2. Continuous outcomes

Downgrade if the upper or lower boundary of the 95% CI crosses the minimally important difference (MID), either for benefit or harm.


Note: if the MID is not known or the use of different outcome measures requires calculation of an effect size, we suggest downgrading if the upper or lower boundary of the CI crosses an effect size of 0.5 in either direction.


Effect size (ES):


A generic term for the estimate of effect of treatment for a study. Sometimes, the term is used to refer to the standardized mean difference, the difference between two estimated means divided by an estimate of the standard deviation.


To facilitate the understanding we suggest an interpretation of the effect size offered by Cohen, also known as ‘Cohen’s d’[1]. According to this interpretation, an effect size or SMD of around:



  • 0.2 is considered a small effect;

  • 0.5 is considered a moderate effect;

  • 0.8 or higher is considered a large effect.

There are also other methods to standardise effect sizes, such as Glass’s Δ or Hedges’ d, depending on the choice of the estimator of the standard deviation.
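
As an illustration of how the choice of standard deviation changes the standardised effect, the sketch below computes Cohen's d (pooled standard deviation) and Glass's Δ (control group standard deviation). The function names and summary data are hypothetical, and the hard cut-offs in the labelling helper are an assumption made for the example; Cohen's values are meant as rough orientation ("around"), not as strict boundaries.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd


def glass_delta(mean_t, mean_c, sd_c):
    """Standardised mean difference using the control group SD only."""
    return (mean_t - mean_c) / sd_c


def cohen_label(effect_size):
    """Rough label following Cohen's orientation values (cut-offs assumed)."""
    size = abs(effect_size)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical trial: treatment mean 24 (SD 8, n 50), control mean 20 (SD 8, n 50).
d = cohens_d(24, 8, 50, 20, 8, 50)
delta = glass_delta(24, 20, 10)  # same means, but a control SD of 10
print(d, cohen_label(d))
print(delta, cohen_label(delta))
```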



[1] (Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed; 1988).

5.5.2. Application and examples

The concrete application of these rules differs according to the situation. Note that imprecision is judged on all outcomes that are judged critical, so the precision of the estimates of the effects on both harms and benefits needs to be evaluated.



  • Studies indicate a benefit compared to the alternative so you consider recommending the intervention based on the fact that it is beneficial in a clinically significant way.

In this case you must prove not only that the pooled or best estimate of the effect is different from no effect but that it is better than the clinical decision threshold.  So the confidence interval must exclude this threshold.



  • Studies indicate a harm compared to the alternative so you consider not recommending the intervention based on the fact that it is harmful in a clinically significant way.

In this case you must prove not only that the pooled or best estimate of the effect is different from no effect but that it is more harmful than the clinical decision threshold. So the confidence interval must exclude this threshold.



  • Studies indicate that it is not better than the alternative so you consider not recommending the intervention based on the fact that it is not better in a clinically significant way.

In this case you must prove that the pooled or best estimate of the effect is lower than the clinical decision threshold. So the confidence interval must exclude this threshold.



  • Studies indicate that it is not more harmful than the alternative so you consider recommending the intervention based on the fact that it is not more harmful in a clinically significant way.

In this case you must prove that the pooled or best estimate of the effect is lower than the clinical decision threshold. So the confidence interval must exclude this threshold.


The last two situations amount to proving non-superiority or non-inferiority. If the line of no effect is crossed, this does not play a major role in the decision.
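
All four situations come down to the same check: does the confidence interval include the clinical decision threshold? A minimal sketch on the relative risk scale follows. The function name, the threshold (RR 0.75, i.e. an RRR of 25%) and the intervals are invented for illustration.

```python
def ci_includes_threshold(ci_low, ci_high, threshold):
    """True when the confidence interval includes the clinical decision
    threshold, i.e. when rating down for imprecision should be considered."""
    return ci_low <= threshold <= ci_high


THRESHOLD_RR = 0.75  # hypothetical decision threshold: RRR of 25%

# Benefit claimed: the whole CI lies beyond the threshold, precision is adequate.
print(ci_includes_threshold(0.55, 0.70, THRESHOLD_RR))
# Benefit claimed but the CI includes the threshold: consider rating down.
print(ci_includes_threshold(0.60, 0.90, THRESHOLD_RR))
# "Not better" claimed: the CI crosses RR = 1 but excludes the threshold.
print(ci_includes_threshold(0.85, 1.10, THRESHOLD_RR))
```

The third call shows why crossing the line of no effect is not decisive when the claim is that the intervention is not better in a clinically significant way.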



5.5.3. Relative versus absolute reductions

The recommended output of a meta-analysis is a relative risk or odds ratio, as these are more stable across different baseline risks, but absolute reductions are more useful for clinical decisions. The valid way of estimating the absolute risk reduction is to apply the relative risk reduction from the meta-analysis to the most plausible estimate or range of estimates of the baseline risk, i.e. the risk in the assumed control group, often called the assumed control risk.


Assumed control risk (ACR):


As relative risks are more stable, absolute risk reductions are best calculated applying a relative risk reduction to an assumed control risk. Determining the assumed control risk is not always easy and straightforward. The best estimates usually come from Belgian data, e.g. for cancer the data from the cancer registry are a valuable tool. When these are not available, one can use expert opinion or take the baseline risk from studies in the literature. When different studies exist with varying baseline risk, sometimes the median control risk is taken (this is what GRADEpro does by default), but it may be better to report a range of plausible values.
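
The calculation described above is ARR = ACR × (1 − RR), applied to the point estimate and to both confidence limits of the relative risk. The sketch below does this for a range of assumed control risks; the function name and all numbers are hypothetical.

```python
def absolute_effects(rr, rr_low, rr_high, assumed_control_risks):
    """Absolute risk reduction per 1000 patients for each assumed control risk.

    rr, rr_low and rr_high are the pooled relative risk and its CI limits.
    """
    rows = []
    for acr in assumed_control_risks:
        rows.append({
            "acr": acr,
            "arr_per_1000": 1000 * acr * (1 - rr),
            # The upper RR limit gives the smallest reduction and vice versa.
            "ci_per_1000": (1000 * acr * (1 - rr_high), 1000 * acr * (1 - rr_low)),
        })
    return rows


# Hypothetical pooled RR of 0.75 (95% CI 0.60 to 0.94) and three baseline risks.
rows = absolute_effects(0.75, 0.60, 0.94, [0.05, 0.10, 0.20])
for row in rows:
    print(row)
```

Reporting the rows side by side makes visible how strongly the absolute benefit depends on the assumed control risk even though the relative risk is the same.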

5.5.4. Dealing with fragility: Optimal Information Size (OIS)

The clinical decision threshold criterion is not completely sufficient to deal with issues of precision. The reason is that confidence intervals may appear narrow, but small numbers of events may render the results fragile.



The reasoning above suggests the need for, in addition to CIs, another criterion for adequate precision. GRADE suggests the following: if the total number of patients included in a systematic review is less than the number of patients generated by a conventional sample size calculation for a single adequately powered trial, consider rating down for imprecision. Authors have referred to this threshold as the “optimal information size” (OIS). Many online calculators for sample size calculation are available; a simple one can be found at http://www.stat.ubc.ca/~rollin/stats/ssize/b2.html.


As an alternative to calculating the OIS, review and guideline authors can also consult a figure to determine the OIS. The figure presents the required sample size (assuming α of 0.05, and β of 0.2) for RRR of 20%, 25%, and 30% across varying control event rates. For example, if the best estimate of control event rate was 0.2 and one specifies an RRR of 25%, the OIS is approximately 2 000 patients.


The choice of RRR is a matter of judgment. The GRADE handbook suggested using RRRs of 20% to 30% for calculating the OIS, but there may be instances in which compelling prior information would suggest choosing a larger value for the RRR for the OIS calculation.


Beware, however, not to base your sample size on the RRR of minimal clinical importance, a practice that is suitable for sample size calculations when you set up studies, but not for judging fragility, because it leads to paradoxes: if expected effects are considerably larger than what is clinically important because the clinically important effect is small (e.g. a small effect on mortality in children would be considered important), you would risk downgrading without good reason because the required sample size would be too large. Note that the OIS helps to judge the stability of the CIs, and not whether the study was large enough to detect a difference.



Power is, however, more closely related to number of events than to sample size. The figure presents the same relationships using total number of events across all studies in both treatment and control groups instead of total number of patients. Using the same choices as in the prior paragraph (control event rate 0.2 and RRR 25%), one requires approximately 325 events to meet OIS criteria.
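
For readers who prefer to calculate rather than read the figure, the sketch below applies one conventional sample size formula for comparing two proportions (normal approximation, two-sided α, no continuity correction). The choice of formula and the function name are assumptions; other calculators use slightly different formulas and will give somewhat different numbers.

```python
import math
from statistics import NormalDist


def ois_binary(control_rate, rrr, alpha=0.05, power=0.80):
    """Return (total patients, expected total events) for the OIS."""
    p_control = control_rate
    p_treated = control_rate * (1 - rrr)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n_per_group = math.ceil(
        (z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_treated) ** 2
    )
    total_patients = 2 * n_per_group
    expected_events = n_per_group * (p_control + p_treated)
    return total_patients, expected_events


# Same choices as in the text: control event rate 0.2 and RRR 25%.
total_patients, expected_events = ois_binary(0.2, 0.25)
print(total_patients, round(expected_events))
```

With these inputs the formula returns a little over 1 800 patients and somewhat more than 300 events, the same order of magnitude as the approximate values read from the figures.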



Calculating the OIS for continuous variables requires specifying:



  • probability of detecting a false effect – type I error (α; usually 0.05)

  • probability of detecting a true effect – power (usually 80% [power = 1 – type II error; β; usually 0.20])

  • realistic difference in means (Δ)

  • appropriate standard deviation (SD) from one of the relevant studies (we suggest the median of the available trials or the rate from a dominating trial, if it exists).

For continuous variables we should downgrade when total population size is less than 400 (a threshold rule-of-thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect). In general, a number of events of more than 400 guarantees the stability of a confidence interval.
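
The four ingredients listed above can be combined with the standard normal approximation formula for comparing two means, n per group = 2 × ((z₁₋α/₂ + z₁₋β) × SD / Δ)². The sketch below is one conventional choice, not a KCE or GRADE tool; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def ois_continuous_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants needed per group to detect a difference in means `delta`
    given standard deviation `sd` (two-sided alpha, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Small effect of 0.2 SD with the usual alpha and power.
n_small_effect = ois_continuous_per_group(delta=0.2, sd=1.0)
# Moderate effect of 0.5 SD for comparison.
n_moderate_effect = ois_continuous_per_group(delta=0.5, sd=1.0)
print(n_small_effect, n_moderate_effect)
```

For an effect size of 0.2 SD this formula gives just under 400 participants per group, which is the order of magnitude of the rule-of-thumb value of 400.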

5.5.5. Low event rates with large sample size: an exception to the need for OIS

In the criteria we have offered so far, our focus has been on relative effects. When event rates are very low, CIs around relative effects may be wide, but if sample sizes are sufficiently large, it is likely that prognostic balance has indeed been achieved, and rating down for imprecision becomes inappropriate.


The decision regarding the magnitude of effect that would be important is a matter of judgment. When control rates are sufficiently low, CIs around relative effects can appear very wide, but CIs around absolute effects will nevertheless be narrow. Thus, although one would intuitively rate down for imprecision considering only the CI around the relative effect, consideration of the CI around the absolute effect may lead to an appropriate conclusion that precision is adequate. Note that the inference of unimportance requires a low incidence of events over the desirable duration of follow-up; short follow-up will generate a low incidence of events that may be misleading.
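
The contrast between relative and absolute confidence intervals at low event rates can be shown with a short sketch. It uses simple Wald-type intervals (log scale for the relative risk), which are crude when events are few and serve only as an illustration; the function name and the trial data are invented.

```python
import math


def relative_and_absolute_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Return RR with its CI and the risk difference with its CI."""
    risk_t, risk_c = events_t / n_t, events_c / n_c
    rr = risk_t / risk_c
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    rr_ci = (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr))
    rd = risk_t - risk_c
    se_rd = math.sqrt(risk_t * (1 - risk_t) / n_t + risk_c * (1 - risk_c) / n_c)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)
    return rr, rr_ci, rd, rd_ci


# Hypothetical large trial with very few events: 4/10 000 versus 8/10 000.
rr, rr_ci, rd, rd_ci = relative_and_absolute_ci(4, 10_000, 8, 10_000)
print("RR", rr, rr_ci)
print("RD per 1000", 1000 * rd, tuple(1000 * x for x in rd_ci))
```

In this example the interval around the relative risk spans more than a tenfold range, while the interval around the risk difference covers less than two events per 1000 patients.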

5.6. Publication bias

Publication bias is a systematic underestimate or an overestimate of the underlying beneficial or harmful effect due to the selective publication of studies. Publication bias arises when investigators fail to report studies they have undertaken (typically those that show no effect). Authors should suspect publication bias when available evidence comes from a number of small studies, most of which have been commercially funded.


A number of approaches based on the examination of the pattern of data are available to help assess publication bias. The most popular of these is the funnel plot. All, however, have substantial limitations and authors of reviews and guideline panels must often guess about the likelihood of publication bias. Again, there is no substitute for judgment.


Note that selective reporting of outcomes should be dealt with in the assessment of the individual studies.


Guideline panels or authors of systematic reviews should consider the extent to which they are uncertain about the magnitude of the effect due to selective publication of studies and they may downgrade the quality of evidence by one or even two levels. As there are no validated decision rules to do so it is important to provide a narrative justification of the final decision on this issue after consultation of the experts.


Trial registries are in principle compulsory now and can play a major role in detecting selective reporting.

5.7. Reasons to upgrade studies

Observational studies are by default considered low level of evidence. However, the level of evidence can be upgraded for a number of reasons. An important general remark on this issue: only studies with no threats to validity (not downgraded for any reason) can be upgraded. RCT evidence can in principle be upgraded, but GRADE considers this as rare and exceptional.

5.7.1. Large magnitude of effect

When methodologically strong observational studies yield large or very large and consistent estimates of the magnitude of a treatment or exposure effect, we may be confident about the results. In these situations, the weak study design is unlikely to explain all of the apparent benefit or harm, even though observational studies are likely to provide an overestimate of the true effect.


The larger the magnitude of effect, the stronger the evidence becomes. As a rule of thumb, GRADE proposed the following criteria:



  • Large, i.e. RR >2 or <0.5 (based on consistent evidence from at least 2 studies, with no plausible confounders): upgrade 1 level

  • Very large, i.e. RR >5 or <0.2 (based on direct evidence with no major threats to validity): upgrade 2 levels
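The rule of thumb above can be sketched as a small function. This is only an illustration: the function name and the `criteria_met` flag are hypothetical, and the flag stands in for the qualitative conditions in brackets (consistent evidence, no plausible confounders, no major threats to validity), which remain a matter of judgment.

```python
def upgrade_levels_for_effect(rr, criteria_met=True):
    """Suggest how many levels to upgrade for magnitude of effect.

    rr: pooled relative risk (must be positive).
    criteria_met: assumption flag; True only if the reviewers judge that
        the qualitative conditions attached to the rule are fulfilled.
    """
    if rr <= 0:
        raise ValueError("a relative risk must be positive")
    if not criteria_met:
        return 0
    if rr > 5 or rr < 0.2:   # very large effect
        return 2
    if rr > 2 or rr < 0.5:   # large effect
        return 1
    return 0


# Hypothetical relative risks
levels = [upgrade_levels_for_effect(rr) for rr in (1.5, 3.0, 0.4, 6.0, 0.1)]
```

The thresholds are strict inequalities, as in the list above, so a relative risk of exactly 2 or 5 falls in the lower category.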

5.7.2. All plausible confounders

On occasion, all plausible confounding in observational studies or randomized trials may be working to reduce the demonstrated effect, or to increase the effect if no effect was observed.


For example, if only sicker patients receive an experimental intervention or exposure, yet they still fare better, it is likely that the actual intervention or exposure effect in less sick patients is larger than the data suggest.

5.7.3. Dose-response gradient

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

The presence of a dose-response gradient may increase our confidence in the findings of observational studies and thereby increase the quality of evidence.

6. Recommendations

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

The strength of a recommendation reflects the extent to which a guideline panel is confident that desirable effects of an intervention outweigh undesirable effects, or vice versa, across the range of patients for whom the recommendation is intended. GRADE specifies only two categories of the strength of a recommendation. While GRADE suggests using the terms strong and weak recommendations, those making recommendations may choose different wording to characterize the two categories of strength.


For a guideline panel or others making recommendations to offer a strong recommendation, they have to be certain about the various factors that influence the strength of a recommendation. The panel should also have the relevant information at hand that supports a clear balance towards either the desirable effects of an intervention (to recommend an action) or the undesirable effects (to recommend against an action). A strong recommendation is one for which the guideline panel is confident that the desirable effects of an intervention outweigh its undesirable effects (strong recommendation for an intervention) or that the undesirable effects of an intervention outweigh its desirable effects (strong recommendation against an intervention). A strong recommendation implies that most or all individuals will be best served by the recommended course of action.


When a guideline panel is uncertain whether the balance is clear, or when the relevant information about the various factors that influence the strength of a recommendation is not available, the panel should be more cautious and will in most instances opt to make a weak recommendation. A weak recommendation is one for which the desirable effects probably outweigh the undesirable effects (weak recommendation for an intervention) or the undesirable effects probably outweigh the desirable effects (weak recommendation against an intervention), but appreciable uncertainty exists. A weak recommendation implies that not all individuals will be best served by the recommended course of action. The individual patient’s circumstances, preferences, and values need to be considered more carefully than usual.

6.1. Four key factors influence the strength of a recommendation

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen


  • Balance between desirable and undesirable effects (not considering costs)

The larger the difference between the desirable and undesirable consequences, the more likely a strong recommendation is warranted. The smaller the net benefit and the lower the certainty for that benefit, the more likely a weak recommendation is warranted.



  • Quality of the evidence

The higher the quality of evidence, the more likely a strong recommendation is warranted.



  • Values and preferences

 The greater the variability in values and preferences, or uncertainty in values and preferences, the more likely a weak recommendation is warranted.



  • Costs (resource utilization)

 The higher the costs of an intervention – that is, the more resources are consumed – the less likely a strong recommendation is warranted.

6.2. Wording of a recommendation

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen

Wording of a recommendation should offer clinicians as many indicators as possible for understanding and interpreting the strength of recommendations:



  • For strong recommendations, the GRADE working group has suggested adopting terminology, such as "we recommend..." or "clinicians should...".

  • For weak recommendations, the GRADE working group has suggested less definitive wording, such as "we suggest..." or "clinicians might...".

Whatever terminology guideline panels use to communicate the dichotomous nature of a recommendation, it is essential that they inform their users what the terms imply. Guideline panels should describe the patients or populations (characterized by the disease and other identifying factors) for whom the recommendation is intended, and the recommended intervention, as specifically and in as much detail as possible.


Wording strong and weak recommendations is particularly important when guidelines are developed by international organizations and/or are intended for patients and clinicians in different regions, cultures, traditions, and usage of language. It is also crucial to explicitly and precisely consider wording when translating recommendations into different languages.


It is important to adapt the wording of the recommendation to the available evidence. Absence of a statistically significant effect is no proof that an intervention does not work. It is only proven that an intervention does not work if the confidence interval around the effect estimate excludes a minimally important difference or decision threshold.
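The distinction between "not statistically significant" and "shown not to work" can be illustrated with a minimal sketch. It assumes, purely for illustration, an effect on a difference scale where 0 means no effect, positive values mean benefit, and the minimally important difference (`mid`) is a positive number; the function name and the returned labels are hypothetical.

```python
def interpret_against_threshold(ci_low, ci_high, mid):
    """Compare a confidence interval with a minimally important difference.

    Assumes a difference scale: 0 = no effect, positive = benefit, mid > 0.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if mid <= 0:
        raise ValueError("mid must be positive on this scale")
    if ci_high < mid:
        # The whole interval lies below the threshold.
        return "important benefit excluded"
    if ci_low >= mid:
        # The whole interval lies at or above the threshold.
        return "important benefit shown"
    # The interval contains the threshold.
    return "inconclusive"


# Hypothetical intervals, both crossing 0 (not statistically significant)
wide = interpret_against_threshold(-1.0, 6.0, mid=5.0)
narrow = interpret_against_threshold(-1.0, 3.0, mid=5.0)
```

Both hypothetical intervals include zero, yet only the narrower one supports wording that the intervention has no important benefit. The wider one still contains the threshold, so the wording of the recommendation should reflect that uncertainty.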

7. GRADE and diagnostic testing

Author(s): 
Jo.Robays
Author(s): 
Joan.Vlayen
Author(s): 
Leen.Verleye

The GRADE system can be used to grade the quality of evidence and the strength of recommendations for diagnostic tests or strategies. A number of limitations and problems have not yet been entirely solved. However, the GRADE working group has produced some informative publications on this topic to guide authors of systematic reviews and guideline developers who use GRADE to assess the quality of a body of evidence from diagnostic test accuracy (DTA) studies (Schünemann 2008, Brozek 2009, Hsu 2011, Schünemann 2016). Members of the GRADE working group continue to publish on GRADE for diagnostic testing; see for example Journal of Clinical Epidemiology Volume 92 (December 2017).

Attachment: GRADE_for_Diagnosis_KCE_guidance_final2_20171220.pdf (742.06 KB)

RAPID REVIEWS

See the attached document.

Table of Contents

  1. WHAT IS A RAPID REVIEW?
  2. WHY ARE RAPID REVIEWS NEEDED?
  3. DO WE HAVE EVIDENCE THAT RAPID REVIEWS ARE VALID?
  4. HOW TO PRODUCE RAPID REVIEWS AT KCE?
    1. SCOPE
    2. COMPREHENSIVENESS
      1. Limit the search strategy
      2. Using a published systematic review as the core document
      3. Limit textual analysis
    3. QUALITY CONTROL
      1. Only one reviewer for title/abstract screening and data extraction
      2. Limit or eliminate internal or external review of final product (e.g. peer review)
    4. TRANSPARENT REPORTING
  5. CONCLUSIONS
Attachment: Rapid_Review_2.pdf (214.97 KB)