
Review

20 October 2021

Real-world evidence: a practical toolbox for collecting health state utilities

Authors: Veronique Lambert-Obry (https://orcid.org/0000-0002-2499-1377), Jean-Philippe Lafrance, Michelle Savoie and Jean Lachaine

Publication: Journal of Comparative Effectiveness Research

Volume 11, Number 1

https://doi.org/10.2217/cer-2021-0121


Abstract

Health state utilities (HSU) data collected in real-world evidence studies are at risk of bias. Although numerous guidance documents are available, practical advice to avoid bias in HSU studies is limited. Thus, the objective of this article was to develop a concise toolbox intended for investigators seeking to collect HSU in a real-world setting. The proposed toolbox builds on existing guidance and provides practical steps to help investigators perform good-quality research. The toolbox aims to increase the credibility of HSU data for future reimbursement decision making.

Health technology assessment (HTA) is a multidisciplinary process that encompasses diverse dimensions such as clinical, economic, organizational, social, cultural and ethical considerations [1]. Cost-utility analyses (CUA), which present results in terms of cost per quality-adjusted life-year, can be used to inform decision making [2]. Utility data are required to calculate quality-adjusted life-years. Health state utilities (HSU) describe the value of a health state on a scale where 1 represents full health, 0 represents dead and negative values represent states worse than death [2]. Well-designed randomized controlled trials (RCTs) can be a reliable source of utility data. However, RCTs often provide estimates by treatment arm rather than HSU, which are estimates for the specific health states of an economic model. Moreover, RCTs often have low external validity and may not capture all health states relevant to economic modeling. Therefore, HSU may have to be collected in real-world evidence (RWE) studies (herein defined as studies that are not conventional RCTs [3]), which are at higher risk of bias than RCTs. Regulatory and reimbursement agencies have shown growing interest in RWE to demonstrate the effectiveness of new therapies. Threats to the internal validity of RWE studies have led to the publication of several guidance documents providing good procedural practices for comparative effectiveness research. Like clinical outcomes, HSU estimates can also be biased. Several publications have shown that pharmacoeconomic data collected in RWE studies are at risk of bias, which can directly affect the incremental cost–effectiveness ratio (ICER) and lead to wrong conclusions about cost–effectiveness [4–15]. Furthermore, sensitivity analyses of economic evaluations often show that the ICER is sensitive not only to clinical inputs but also to utility values [16,17].
Considering the risk of bias and its potentially significant impact on the ICER, RWE studies collecting HSU should also follow good procedural practices.
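To make the ICER's dependence on utility values concrete, the sketch below uses entirely hypothetical costs, durations and utilities (none are taken from the article or its references). It assumes QALYs are utility multiplied by time spent in a state and the ICER is incremental cost divided by incremental QALYs; the function names are illustrative only.

```python
"""Hypothetical illustration: how a utility input moves the ICER."""


def qalys(utility: float, years: float) -> float:
    """QALYs accrued while in one health state at a constant utility."""
    return utility * years


def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost per QALY gained (new therapy vs comparator)."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("No QALY difference: ICER is undefined")
    return (cost_new - cost_old) / delta_qaly


def icer_for_utility(u_progression_free: float, u_progressed: float = 0.50) -> float:
    # Hypothetical model: the new therapy adds one progression-free year
    # (3 vs 2 years); both arms then spend 1 year in the progressed state.
    q_new = qalys(u_progression_free, 3.0) + qalys(u_progressed, 1.0)
    q_old = qalys(u_progression_free, 2.0) + qalys(u_progressed, 1.0)
    return icer(cost_new=60_000.0, cost_old=30_000.0, qaly_new=q_new, qaly_old=q_old)


if __name__ == "__main__":
    for u in (0.80, 0.70, 0.60):
        print(f"progression-free utility {u:.2f}: ICER = {icer_for_utility(u):,.0f} per QALY")
```

In this toy model the incremental QALY gain equals the progression-free utility, so a biased utility of 0.60 instead of 0.80 raises the ICER from 37,500 to 50,000 per QALY while the clinical inputs stay unchanged.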

According to stakeholder interviews, one of the challenges when using RWE for coverage decisions is the lack of investigator expertise in interpreting data and adjusting for bias [18]. Recommendations adapted to the collection of HSU in real-life practice could help produce high-quality inputs for reimbursement decision making. To our knowledge, none of the existing guidelines focuses specifically on methodologies for generating HSU in real-life practice. Thus, the objective of this article is to review current recommendations for HSU generation and incorporate methodological standards into a single toolbox intended for investigators seeking to collect HSU in a real-world setting.

Methods

To identify existing guidance on the collection of HSU, guidelines and methodology reports from countries with a long tradition of HTA [19] were screened: Australia [20], Canada [21], England and Wales [22], France [23], Germany [24], the Netherlands [25], Scotland [26], Spain (Catalonia) [27] and Sweden [28]. Guidelines published by international HTA organizations were also screened, including the European Network for Health Technology Assessment (EUnetHTA) [29], Health Technology Assessment International (HTAi) [30], the International Health Economics Association (iHEA) [31], the International Network of Agencies for Health Technology Assessment (INAHTA) [32] and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) [33]. The search was conducted up to April 2021. Using the keywords ‘utility’ and ‘utilities,’ 58 potentially relevant publications were identified by titles and abstracts (when available). Full-text publications were then reviewed for eligibility, with any disagreements resolved through discussion. Selection was limited to publications providing guidance on how to collect HSU. Following full-text screening, 49 publications did not specifically discuss HSU collection and were therefore excluded for the following reasons: focus on economic modeling/CUA methodologies (n = 29), focus on mapping (n = 13), specific to a certain population (n = 4) and specific to certain disease areas (n = 3). Of the nine remaining publications, two were duplicates (journal articles based on previously published Technical Support Documents), leaving seven original publications: five from NICE [34–38] and two from ISPOR [2,17]. Data extraction included key methodological elements affecting internal validity: definition of health state, choice of instrument, mode of administration, features of assessment (e.g., timing and frequency), choice of respondents, sampling, method of recruitment, response rate, missing data, study design and analytical techniques.

None of the seven publications provides a framework for investigators seeking to estimate HSU in a real-world setting (data extraction is summarized in Supplementary Table 1). Three documents discuss the NICE reference case and alternative methods for measuring HSU, indicating preferred choices of instruments and respondents [34,35,38]. Two reports focus on secondary research, providing a systematic methodology to incorporate HSU obtained from the literature into economic models. These two reports provide a framework for the identification, review and synthesis of HSU, noting that original sources should be reviewed for data quality and relevance; response rate and missing data are presented as part of the minimum quality checks [2,37]. A review by the NICE Decision Support Unit shows that the reporting of basic information on regression models used in decision analytic models is poor, and provides recommendations for improvement [36]. The only publication providing an exhaustive framework for primary research on HSU is the ISPOR Task Force Report on the collection of HSU for economic models in clinical trials [17]. However, as the report mainly focuses on clinical trials, some aspects, such as design and analytical considerations for RWE studies, are only briefly discussed. Overall, only a few methodological elements are addressed in depth: the health state definition for utility estimation should be representative of the economic model health state, the preferred method for HSU generation is generally a generic preference-based measure (PBM), and respondents should usually be patients themselves and be representative of the economic model population. Practical advice to avoid bias in HSU generation, particularly selection bias and confounding, is limited.
Therefore, a second search of the HTA publications was conducted to identify good practices for observational research (regardless of outcome type) to complement recommendations for HSU collection. This step was necessary to gather practical guidance on study conduct in a real-world setting. Seven publications providing recommendations to avoid bias were identified: one from NICE [39] and six from ISPOR [40–45].

As there is no single guideline for investigators seeking guidance on the methodological steps to estimate HSU in a real-world setting, recommendations for HSU generation and for observational research were combined to develop a concise toolbox. The toolbox recommendations aim to help investigators design and conduct a study and avoid methodological flaws. The toolbox is organized into four sections: the first discusses confounding, the second and third address selection bias (study sample and missing data, respectively) and the last discusses information bias. Existing recommendations were summarized in each section of the toolbox and then synthesized into practical steps.

Results

The toolbox summarizing the practical steps for HSU generation in a real-world setting is presented in Box 1.

Box 1. Toolbox to generate health state utilities in a real-world setting.

Confounding: study design and analytical techniques

• Justify the choice of study design
• Explain how confounding variables are identified
• Describe how confounding variables are measured and the risk of residual confounding
• Control for confounding by design or through analytical methods
• Assess basic statistical assumptions and conduct sensitivity analyses
• Explore the risk of unmeasured confounding

Selection bias: study sample

• Determine sample to ensure representativeness and relevance
• Select participants by probability sampling or in a systematic manner
• Implement similar enrollment procedures at all study sites and for each exposure group
• Document participation rates to discuss risk of selection bias

Selection bias: missing data

• Take steps to minimize missing data
• Conduct a descriptive analysis of missing data
• Implement a method to handle missing data

Information bias: exposure and outcome measurement

• Justify the choice of instrument
• Implement standardized procedures for exposure and outcome measurement
• Select features of assessment to ensure efficiency and suitability

Confounding: study design & analytical techniques

Justify the choice of study design

An important step is to choose between a cross-sectional design and a longitudinal design (e.g., before-after study and prospective study) [40]. Trade-offs between study designs (e.g., speed, cost, quality and relevance) should be discussed to ensure the study provides useful evidence and is fit for purpose [40,45]. A well-known limitation of longitudinal studies is nonrandom missing data [17]. Conversely, cross-sectional data may provide biased estimates due to underlying heterogeneity across patients; indeed, there is evidence that a cross-sectional design may overestimate the impact of a health state on utility [17]. However, it is often impractical to wait for patients to progress through different health states [17]. The nature of the condition and health state (e.g., acute/episodic vs chronic disease, severity within a single health state and speed of progression) will influence the applicability of the different designs.

Explain how confounding variables are identified & measured, & describe the risk of residual confounding

At the design stage, potential confounding domains should be identified based on existing literature and/or expert opinion [39,42,44]. Adjusting for intermediate (mediator) and collider variables is inappropriate and introduces bias [39,44]. Directed acyclic graphs (causal diagrams in which directed edges [arrows] link nodes [variables] along paths) can help clarify the structural relationships among variables (confounders, mediators and colliders) [43,44]. To limit residual confounding, the measures of confounding variables should also be valid and reliable.
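How a directed acyclic graph supports this design-stage step can be sketched in a few lines. The graph below is hypothetical (its variables are illustrative, not from the article), and the rule applied is a simplified common-cause heuristic: variables that cause both exposure and outcome are flagged as adjustment candidates, while descendants of the exposure (mediators and colliders) are flagged as variables not to adjust for. This is not the full back-door criterion that dedicated causal-graph tools apply.

```python
"""Hypothetical DAG screen using a simplified common-cause heuristic
(not the full back-door criterion)."""
from typing import Dict, Set, Tuple

# Edges point from cause to effect. Variable names are illustrative only.
DAG: Dict[str, Set[str]] = {
    "age": {"treatment", "utility"},
    "comorbidity": {"treatment", "utility"},
    "treatment": {"symptom_control", "hospital_visit"},
    "symptom_control": {"utility"},   # mediator on the treatment -> utility path
    "utility": {"hospital_visit"},    # hospital_visit is a collider (two arrows in)
    "hospital_visit": set(),
}


def descendants(dag: Dict[str, Set[str]], node: str) -> Set[str]:
    """All nodes reachable from `node` by following arrows."""
    seen: Set[str] = set()
    stack = list(dag.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, ()))
    return seen


def ancestors(dag: Dict[str, Set[str]], node: str) -> Set[str]:
    """All nodes from which `node` can be reached."""
    return {other for other in dag if node in descendants(dag, other)}


def screen(dag: Dict[str, Set[str]], exposure: str, outcome: str) -> Tuple[Set[str], Set[str]]:
    """Return (adjustment candidates, variables not to adjust for)."""
    downstream = descendants(dag, exposure)
    common_causes = ancestors(dag, exposure) & ancestors(dag, outcome)
    candidates = common_causes - downstream
    do_not_adjust = downstream - {outcome}
    return candidates, do_not_adjust


if __name__ == "__main__":
    adjust, avoid = screen(DAG, "treatment", "utility")
    print("adjust for:", sorted(adjust))
    print("do not adjust for:", sorted(avoid))
```

Here age and comorbidity come out as candidates, while symptom control (a mediator) and hospital visits (a collider) are flagged, mirroring the warning about intermediate and collider variables.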

Control for confounding by design or through analytical methods

Without the advantage of randomization, all observational study designs are vulnerable to time-invariant and/or time-varying confounding [39–45]. Study designs using patients as their own controls (i.e., exposure varies within individuals) protect against time-invariant confounding [40,41,44]. Any study without a comparison group is vulnerable to confounding by secular trends or natural history (historical bias) [40,41,45]. The most common methods of controlling for confounding include matching, restriction, stratification, multivariate regression analysis and propensity score analysis [39,40,42–45]. Methods for dealing with time-varying confounding (i.e., confounders affected by prior exposure) include inverse probability weighting of marginal structural models and g-estimation [40,43,44].
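As a minimal sketch of one listed method, propensity score weighting, the code below uses made-up records with a single categorical confounder (baseline severity). With only one categorical confounder, the propensity score can be estimated as the proportion exposed within each stratum (a saturated model); a real study would typically fit a logistic model over several confounders. The estimator shown is a normalized inverse probability of treatment weighted difference in mean utility, and all names and values are hypothetical.

```python
"""Hypothetical example: inverse probability of treatment weighting (IPTW)
with a saturated propensity model for one categorical confounder."""
from collections import defaultdict
from typing import Dict, List, Tuple

# (severity stratum, exposed?, utility). Made-up records for illustration only.
Record = Tuple[str, bool, float]
RECORDS: List[Record] = [
    ("mild", True, 0.90), ("mild", True, 0.90), ("mild", True, 0.90), ("mild", False, 0.80),
    ("severe", True, 0.60), ("severe", False, 0.50), ("severe", False, 0.50), ("severe", False, 0.50),
]


def naive_difference(data: List[Record]) -> float:
    """Unadjusted difference in mean utility (exposed minus unexposed)."""
    exposed = [u for _, e, u in data if e]
    unexposed = [u for _, e, u in data if not e]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def stratum_propensity(data: List[Record]) -> Dict[str, float]:
    """Propensity score = share of exposed patients within each confounder stratum."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [n_exposed, n_total]
    for stratum, exposed, _ in data:
        counts[stratum][0] += int(exposed)
        counts[stratum][1] += 1
    return {s: n_exp / n for s, (n_exp, n) in counts.items()}


def iptw_difference(data: List[Record]) -> float:
    """Weighted mean difference with weights 1/e (exposed) and 1/(1 - e) (unexposed)."""
    ps = stratum_propensity(data)
    if any(p in (0.0, 1.0) for p in ps.values()):
        raise ValueError("Positivity violated: a stratum lacks exposed or unexposed patients")
    num1 = den1 = num0 = den0 = 0.0
    for stratum, exposed, utility in data:
        if exposed:
            weight = 1.0 / ps[stratum]
            num1 += weight * utility
            den1 += weight
        else:
            weight = 1.0 / (1.0 - ps[stratum])
            num0 += weight * utility
            den0 += weight
    return num1 / den1 - num0 / den0


if __name__ == "__main__":
    print(f"naive: {naive_difference(RECORDS):.3f}  IPTW: {iptw_difference(RECORDS):.3f}")
```

Because exposure is concentrated among mild patients in this toy data, the naive difference (0.25) mixes the exposure effect with severity, whereas weighting recovers the within-stratum difference of 0.10.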

Assess basic statistical assumptions & conduct sensitivity analyses

Although regression analysis is common practice in observational studies, it can lead to biased estimates if improperly performed [39,44,45]. The functional form of the dependent variable drives the choice of regression model (along with the number of measures and/or clustering) [39,44,45]. However, this choice may not be easy given the particular nature of HSU (left-skewed, censored and subject to ceiling effects) [17]. One option is to transform HSU to the decrement scale (i.e., 1 minus the utility, which is right-skewed); because the back-transformation to the original scale is linear, it does not affect standard errors [17]. In any case, the original HSU distribution should be displayed in a histogram as a starting point for model selection [36]. The widely used ordinary least squares model may not be the best option, and different model types might need to be tested [36,39,44,45]. For longitudinal studies, a common choice is generalized estimating equations [17,44]. As for independent variables, their inclusion in the model should be justified [36,39,44]. When possible, controlling for baseline utility is recommended [17]. It is also recommended to present the full regression model, not only the adjusted effect [42,44]. Coefficients should be presented along with measures of uncertainty (e.g., standard errors and confidence intervals) [36,42]. The statistical assumptions underlying the regression analysis should be tested (e.g., functional form, normality, heteroscedasticity, multicollinearity and overlap) [36,39,42,44,45]. Model performance and goodness of fit should be reported [36,42,44,45]. Key assumptions should be tested in sensitivity analyses, such as the distribution of covariates (e.g., creation of categories, cutoffs), functional form (e.g., assumption of nonlinearity) and outliers (e.g., inclusion or exclusion) [36,39,42,44,45].
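The statement that the decrement transformation leaves standard errors unchanged can be checked numerically. The sketch below assumes the decrement is defined as 1 minus the utility and fits a simple ordinary least squares regression (one hypothetical binary covariate, made-up utilities) on both scales. It only illustrates the linear back-transformation; it is not a recommended model for skewed HSU data.

```python
"""Check: OLS on utilities vs on decrements (1 - utility) gives mirrored
coefficients with identical standard errors. Data are hypothetical."""
import math
from typing import List, Sequence, Tuple

# 0 = mild health state, 1 = severe health state (illustrative only)
STATE = [0, 0, 0, 0, 1, 1, 1, 1]
UTILITY = [0.95, 1.00, 0.85, 0.90, 0.60, 0.55, 0.70, 0.40]


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Simple linear regression: returns (intercept, slope, standard error of slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals: List[float] = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return intercept, slope, math.sqrt(sigma2 / sxx)


if __name__ == "__main__":
    decrement = [1.0 - u for u in UTILITY]
    b0_u, b1_u, se_u = ols(STATE, UTILITY)
    b0_d, b1_d, se_d = ols(STATE, decrement)
    print(f"utility scale:   intercept {b0_u:.3f}, slope {b1_u:.3f}, SE {se_u:.4f}")
    print(f"decrement scale: intercept {b0_d:.3f}, slope {b1_d:.3f}, SE {se_d:.4f}")
    # Back-transforming the decrement model (utility = 1 - decrement) flips
    # the sign of the slope but leaves its standard error unchanged.
```

Because every residual on the decrement scale is the negative of the corresponding residual on the utility scale, the residual variance, and hence the standard error, is identical.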

Explore the risk of unmeasured confounding

Unmeasured confounding may remain after adjustment, mainly due to known confounders that were not measured and to unknown confounders. High-dimensional propensity scores, additional adjustment (e.g., two-stage sampling and external adjustment) and instrumental variable analysis are proposed approaches to address unmeasured confounding [44]. Negative controls (outcomes not affected by the intervention) help explore the presence of unmeasured confounding [40,42,43]. Rather than relying on qualitative discussion of unmeasured confounding alone, sensitivity analyses should be performed [42,44]. Simple options include the array approach (which explores how the observed association changes when the distribution of the unmeasured confounder and the strength of its associations are varied) and the rule-out approach (which determines the strength of unmeasured confounding needed to fully explain away the observed association) [44].
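A minimal sketch of the array and rule-out ideas for a difference in mean utility follows. It assumes a single binary unmeasured confounder U that shifts utility by a constant amount gamma in both exposure groups (no interaction with exposure), with prevalence p1 among exposed and p0 among unexposed patients; under these assumptions the bias in the observed difference is gamma * (p1 - p0). This simple bias formula is an assumption made for illustration, not necessarily the formulation used in the cited guidance, and all inputs are hypothetical.

```python
"""Hypothetical sensitivity analysis for an unmeasured binary confounder U,
assuming U shifts utility by `gamma` equally in both exposure groups."""
import math
from typing import List, Sequence, Tuple


def adjusted_difference(observed: float, gamma: float, p1: float, p0: float) -> float:
    """Observed mean utility difference minus the assumed bias gamma * (p1 - p0)."""
    return observed - gamma * (p1 - p0)


def array_approach(observed: float, gammas: Sequence[float],
                   prevalences: Sequence[Tuple[float, float]]) -> List[Tuple[float, float, float, float]]:
    """Grid of bias-adjusted differences over assumed confounder strengths and prevalences."""
    return [(g, p1, p0, adjusted_difference(observed, g, p1, p0))
            for g in gammas for p1, p0 in prevalences]


def gamma_to_explain_away(observed: float, p1: float, p0: float) -> float:
    """Rule-out idea: effect of U on utility needed to reduce the difference to zero."""
    if p1 == p0:
        return math.inf  # U equally common in both groups cannot create bias here
    return observed / (p1 - p0)


if __name__ == "__main__":
    observed = 0.05  # hypothetical exposed-minus-unexposed utility difference
    grid = array_approach(observed, [0.02, 0.05, 0.10], [(0.4, 0.3), (0.6, 0.3)])
    for g, p1, p0, adj in grid:
        print(f"gamma={g:.2f} p1={p1:.1f} p0={p0:.1f} -> adjusted {adj:+.3f}")
    print(f"gamma needed (p1=0.6, p0=0.3): {gamma_to_explain_away(observed, 0.6, 0.3):.3f}")
```

Reading the grid, investigators can judge whether a confounder as strong and as unevenly distributed as required to explain away the result is plausible for the condition studied.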

Selection bias: study sample

Determine sample to ensure representativeness & relevance

Except for special patient populations (e.g., young children and cognitively impaired patients), patients should rate their own health [2,17,34,35,37,38]. At the design stage, the choice of eligibility criteria (e.g., disease severity, comorbidities and therapies) and study sites (e.g., primary, secondary and tertiary care) will directly affect generalizability and relevance for economic modeling [2,17,34–38]. To ensure representativeness of the economic model population, sampling should reflect individual variability (i.e., patient heterogeneity) and variability over time (e.g., disease progression) within each health state [17]. The method of recruitment will also affect representativeness: an online sample (e.g., recruited through social media, recruitment panels or patient associations) may differ (e.g., in sociodemographic and health profiles) from a sample of patients recruited through clinical sites [17]. The need to access detailed medical information will also influence the choice of sample [17].

Select participants systematically, implement standardized enrollment procedures, & document participation rates to discuss risk of selection bias

For HSU generation, a significant threat to internal validity is self-selection bias (e.g., volunteer bias and nonresponse bias). Most often, healthier patients are more likely to participate, which may bias estimates for severe health states [17]. To limit the impact of self-selection bias, patients (exposed and unexposed) should be selected from the same population. Moreover, although information on nonparticipants is often limited, participation rates have to be recorded to document the risk of selection bias [2,17,37,42]. Several sampling techniques are available to select the study sample. Although probability sampling is the gold standard to ensure sample representativeness, convenience samples are common practice for HSU generation in a real-world setting. To minimize selection bias, systematic participant recruitment such as consecutive sampling should be used [40].
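Consecutive sampling and the documentation of participation rates can be sketched as follows. The visit log, eligibility rule (age 18 or older) and consent list are hypothetical; the point is that every eligible patient is approached in order of presentation, and participation is tallied separately per exposure group so that differential participation becomes visible.

```python
"""Hypothetical sketch: consecutive sampling and participation rates by exposure group."""
from typing import Callable, Dict, List, Set

Patient = Dict[str, object]


def consecutive_sample(visits: List[Patient], is_eligible: Callable[[Patient], bool],
                       target: int) -> List[Patient]:
    """Approach every eligible patient in order of presentation until `target` is reached."""
    approached: List[Patient] = []
    for patient in visits:
        if is_eligible(patient):
            approached.append(patient)
            if len(approached) == target:
                break
    return approached


def participation_rates(approached: List[Patient], consented: Set[int]) -> Dict[str, float]:
    """Share of approached patients who consented, per exposure group."""
    totals: Dict[str, int] = {}
    agreed: Dict[str, int] = {}
    for patient in approached:
        group = str(patient["group"])
        totals[group] = totals.get(group, 0) + 1
        if patient["id"] in consented:
            agreed[group] = agreed.get(group, 0) + 1
    return {g: agreed.get(g, 0) / n for g, n in totals.items()}


# Made-up visit log, in order of presentation.
VISITS: List[Patient] = [
    {"id": 1, "group": "exposed", "age": 54},
    {"id": 2, "group": "unexposed", "age": 17},
    {"id": 3, "group": "unexposed", "age": 63},
    {"id": 4, "group": "exposed", "age": 71},
    {"id": 5, "group": "unexposed", "age": 45},
    {"id": 6, "group": "exposed", "age": 38},
]

if __name__ == "__main__":
    sample = consecutive_sample(VISITS, lambda p: int(p["age"]) >= 18, target=4)
    print([p["id"] for p in sample])
    print(participation_rates(sample, consented={1, 3, 5}))
```

A gap such as the one in this toy example (50% of approached exposed patients vs 100% of unexposed patients consenting) would be reported and discussed as a potential source of selection bias.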

Selection bias: missing data

Take steps to minimize missing data, & conduct a descriptive analysis of missing data

Taking preventive steps to minimize missing data (e.g., reminders, incentives and multiple attempts) is the best way to limit bias [17,40]. The extent of missing data should be reported by outcome, exposure group and timepoint when applicable [2,17,36,37,42,44]. For longitudinal studies, the missing data pattern should be described as monotonic (no return after dropout) or non-monotonic (intermittent missing data) [17]. Possible reasons for missing data should be discussed and/or baseline predictors of missingness should be explored [2,17,37,42].
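A descriptive summary of missing data in a longitudinal HSU study can be produced with a few lines of code. In this sketch (hypothetical patients, three timepoints, None marking a missing questionnaire), each patient's pattern is classified as complete, monotonic or non-monotonic, and the proportion missing is reported at each timepoint; the same tally would be repeated per exposure group.

```python
"""Hypothetical descriptive analysis of missing utility data."""
from collections import Counter
from typing import Dict, List, Optional

Series = List[Optional[float]]

# Utility at baseline, month 6 and month 12; None = missing. Made-up values.
PATIENTS: Dict[str, Series] = {
    "P1": [0.82, 0.80, 0.78],
    "P2": [0.75, None, None],   # drops out and never returns
    "P3": [0.68, None, 0.70],   # intermittent missing value
    "P4": [0.90, 0.88, None],
}


def missing_pattern(series: Series) -> str:
    """'complete', 'monotonic' (no return after dropout) or 'non-monotonic' (intermittent)."""
    first_missing = next((i for i, v in enumerate(series) if v is None), None)
    if first_missing is None:
        return "complete"
    if all(v is None for v in series[first_missing:]):
        return "monotonic"
    return "non-monotonic"


def missing_by_timepoint(data: Dict[str, Series]) -> List[float]:
    """Proportion of patients with a missing value at each timepoint."""
    n_times = len(next(iter(data.values())))
    return [sum(s[t] is None for s in data.values()) / len(data) for t in range(n_times)]


if __name__ == "__main__":
    print(Counter(missing_pattern(s) for s in PATIENTS.values()))
    print(missing_by_timepoint(PATIENTS))
```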

Implement a method to handle missing data

How missing data are handled should be reported explicitly [2,17,36,37,42,44]. Several techniques are available, including complete case analysis (CCA), which restricts the analysis to patients without missing data. The drawbacks of CCA are a reduced sample size and biased estimates if data are not missing completely at random (MCAR) [44]. When data are not MCAR, restricting the analysis to complete cases is likely to overestimate HSU, because complete cases represent a healthier subsample of the target population [17]. The appropriate analytical approach to missing data (e.g., inverse probability weighting, multiple imputation and Bayesian methods) depends on the missing data pattern and the assumed missingness mechanism (i.e., MCAR, missing at random or missing not at random). Simple imputation methods such as last observation carried forward are not recommended for HSU that are not MCAR [17].
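The direction of bias described above can be reproduced with a toy cohort. The sketch assumes follow-up utilities are missing at random given baseline severity (an assumption the observed data cannot confirm), with severe patients responding less often. Complete case analysis then overstates mean utility, whereas inverse probability weighting, with response probabilities estimated within severity strata, estimates the full-cohort mean under that assumption. All values are hypothetical.

```python
"""Hypothetical comparison: complete case analysis vs inverse probability
weighting for follow-up utilities assumed missing at random given severity."""
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (baseline severity, follow-up utility or None if missing). Made-up cohort.
Cohort = List[Tuple[str, Optional[float]]]
COHORT: Cohort = [("mild", 0.80)] * 4 + [("severe", 0.40)] + [("severe", None)] * 3


def complete_case_mean(data: Cohort) -> float:
    """Mean utility among patients with observed follow-up."""
    observed = [u for _, u in data if u is not None]
    return sum(observed) / len(observed)


def ipw_mean(data: Cohort) -> float:
    """Weight each responder by 1 / P(response | severity stratum)."""
    n: Dict[str, int] = defaultdict(int)
    responded: Dict[str, int] = defaultdict(int)
    for stratum, utility in data:
        n[stratum] += 1
        responded[stratum] += utility is not None
    num = den = 0.0
    for stratum, utility in data:
        if utility is not None:
            weight = n[stratum] / responded[stratum]  # inverse of stratum response probability
            num += weight * utility
            den += weight
    return num / den


if __name__ == "__main__":
    print(f"complete cases: {complete_case_mean(COHORT):.2f}  IPW: {ipw_mean(COHORT):.2f}")
```

In this toy cohort, complete cases give a mean utility of 0.72, while weighting gives 0.60 because the single severe responder stands in for the three severe non-responders.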

Information bias: exposure & outcome measurement

Justify the choice of instrument

The use of a validated instrument is recommended [2,17,34,35,37,38]. Generic PBM are preferred over condition-specific PBM (and over direct valuation), unless there is good evidence that generic measures are not able to measure changes in health [2,17,34,35,37,38]. When selecting a generic PBM, one should consider the psychometric assessment in the particular health condition and requirements from HTA agencies [2,17,34,35,37,38].

Implement standardized procedures for exposure & outcome measurement, & select features of assessment to ensure efficiency & suitability

The health state for utility estimation should be clearly defined and representative of the economic model health state [2,17,34,35,37,38]. It is not uncommon for exposure to be self-reported (rather than confirmed with medical records), which may affect categorization into health states depending on the complexity of the definitions (e.g., when progression through health states is defined by a laboratory or radiological marker) [17]. Although blinding is the optimal way to protect against information bias, patients are usually aware of their health condition at the time of utility measurement. When applicable, one option is to schedule the assessment before the patient is informed of disease evolution (e.g., at the beginning of a medical visit, before receiving progression results); standardizing the timing is essential if other information (e.g., test results) can affect HSU [17]. The frequency of assessments should also be carefully considered and adapted to disease progression, changes, events and duration [17]. Investigators should anticipate the needs of a future economic model and optimize measurement accordingly (e.g., capture changes within and between health states and capture transient effects) [17]. The recall period of a questionnaire should also be considered when planning assessments, particularly when health states fluctuate over a short period of time (e.g., acute events) [2,17]. Moreover, investigators should choose the most suitable mode of administration (e.g., paper questionnaire, electronic questionnaire or interview), as each mode has different advantages and disadvantages and may suit different populations and conditions [17]. To minimize the risk of bias, data collection, including the mode of administration, should be standardized [17].
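Why assessment frequency matters for transient effects can be illustrated with the trapezoidal rule, a common way to turn repeated utility measurements into an area under the curve (which, once time is expressed in years, corresponds to QALYs). The trajectory below is hypothetical: a short acute event at week 2 is captured by weekly assessments but missed entirely when utility is measured only at weeks 0 and 4.

```python
"""Hypothetical example: area under the utility curve (trapezoidal rule)
under two assessment schedules."""
from typing import Sequence


def utility_auc(times: Sequence[float], utilities: Sequence[float]) -> float:
    """Trapezoidal area under the utility curve, in utility x time units."""
    if len(times) != len(utilities) or len(times) < 2:
        raise ValueError("Need at least two paired (time, utility) assessments")
    return sum((t1 - t0) * (u0 + u1) / 2
               for t0, t1, u0, u1 in zip(times, times[1:], utilities, utilities[1:]))


WEEKLY_TIMES = [0, 1, 2, 3, 4]
WEEKLY_UTILITIES = [0.80, 0.80, 0.40, 0.80, 0.80]  # acute event at week 2
SPARSE_TIMES = [0, 4]
SPARSE_UTILITIES = [0.80, 0.80]                     # same patient, event missed

if __name__ == "__main__":
    weekly = utility_auc(WEEKLY_TIMES, WEEKLY_UTILITIES)
    sparse = utility_auc(SPARSE_TIMES, SPARSE_UTILITIES)
    print(f"weekly: {weekly:.2f} utility-weeks, sparse: {sparse:.2f} utility-weeks")
```

With these made-up values the sparse schedule yields 3.2 utility-weeks instead of 2.8, so the disutility of the acute event would be absent from any economic model built on it.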

Discussion

A review of guidelines published by HTA organizations was conducted to identify standards for HSU generation. Existing recommendations mainly discuss the choice of instruments and respondents. Practical advice on designing and conducting an RWE study collecting HSU is limited. Currently, methodological steps to increase internal validity are provided only by guidance documents for observational research, which are tailored to comparative effectiveness research. Without specific recommendations, RWE studies may generate poor-quality HSU data, which can affect the accuracy of cost–effectiveness assessments. The recent ISPOR report on the identification, review and use of HSU in CUA highlights that bias may be introduced into CUA at different stages: searching, selecting, synthesizing and use in the model [2]. Given the numerous steps that can lead to bias, a good starting point to ensure the validity of CUA is the quality of the original sources (i.e., studies generating HSU estimates). Although it is recognized that the quality of HSU inputs can ultimately affect reimbursement decision making, none of the HTA organizations has published recommendations for investigators collecting HSU in a real-world setting. Existing guidelines focus either on HSU generation or on RWE, but a single framework is lacking. Therefore, the purpose of this work was to develop a toolbox combining the different steps to help investigators avoid bias. The toolbox recommendations build on existing guidance and address important aspects such as confounding, selection bias and information bias. A limitation of the toolbox is that it may not cover all potential issues arising from HSU studies, as it is designed to be concise and practical for investigators in general. Moreover, the toolbox is neither a protocol checklist nor a critical appraisal tool.
The toolbox has a narrow scope and does not cover important topics such as the conduct of economic evaluations, modeling techniques and decision-making processes. For in-depth discussion of a broad range of issues in HTA, further reading is available [33]. Although the toolbox recommendations are supported by authoritative references, a next step would be for an international organization such as ISPOR to publish an official framework.

Conclusion

The proposed toolbox provides practical steps to help investigators perform good-quality research. It is specifically designed for HSU generation in a real-world setting and is complementary to existing good practices. The toolbox aims to increase the credibility of HSU data for future reimbursement decision making.

Future perspective

In a resource-constrained healthcare system, CUA is essential to support decision making. Although clinical inputs for CUA are generally collected in RCTs, HTA agencies and regulatory bodies have shown growing interest in real-world comparative effectiveness research. Methodological principles for RWE studies have gained prominence in guidelines and are highly topical in the era of ‘big data.’ Given the opportunities created by RWE studies, their uptake is expected to rise. The pitfalls of RWE studies are well known, yet guidelines provide limited advice for pharmacoeconomic outcomes such as HSU. Practical recommendations for investigators would help generate high-quality estimates for CUA. Standardization of RWE studies collecting HSU has yet to be achieved.

Executive summary

•

Health state utilities (HSU) data collected in real-world evidence (RWE) studies are at risk of bias. This can directly affect the incremental cost–effectiveness ratio and lead to wrong conclusions about cost–effectiveness.

•

A review of the guidelines published by health technology assessment organizations was conducted. Data extraction included key methodological elements affecting internal validity: definition of health state, choice of instrument, mode of administration, features of assessment, choice of respondents, sampling, method of recruitment, response rate, missing data, study design and analytical techniques.

•

None of the existing guidelines focuses on methodologies for HSU generation in real-life practice. Therefore, current recommendations from the different publications were combined into a single toolbox.

•

The toolbox was developed to help investigators design and conduct a study as well as limit confounding, selection bias and information bias.

•

The toolbox aims to increase the credibility of HSU data for future reimbursement decision making.

Author contributions

V Lambert-Obry contributed to conceptualization and writing (original draft preparation). J-P Lafrance, M Savoie and J Lachaine contributed to conceptualization, writing (review and editing) and supervision.

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

Supplementary Material

File (supplementary table 1.docx)


References

Papers of special note have been highlighted as: •• of considerable interest

O'Rourke B, Oortwijn W, Schuller T; International Joint Task Group. The new definition of health technology assessment: a milestone in international collaboration. Int. J. Technol. Assess. Health Care 36(3), 187–190 (2020).