Visualizing the target estimand in comparative effectiveness studies with multiple treatments
Publication: Journal of Comparative Effectiveness Research
Abstract
Aim: Comparative effectiveness research using real-world data often involves pairwise propensity score matching to adjust for confounding bias. We show that corresponding treatment effect estimates may have limited external validity, and propose two visualization tools to clarify the target estimand.
Materials & methods: We conduct a simulation study to demonstrate, with bivariate ellipses and joy plots, that differences in covariate distributions across treatment groups may affect the external validity of treatment effect estimates. We showcase how these visualization tools can facilitate the interpretation of target estimands in a case study comparing the effectiveness of teriflunomide (TERI), dimethyl fumarate (DMF) and natalizumab (NAT) on manual dexterity in patients with multiple sclerosis.
Results: In the simulation study, estimates of the treatment effect greatly differed depending on the target population. For example, when comparing treatment B with C, the estimated treatment effect (and respective standard error) varied from -0.27 (0.03) to -0.37 (0.04) in the type of patients initially receiving treatment B and C, respectively. Visualization of the matched samples revealed that covariate distributions vary for each comparison and cannot be used to target one common treatment effect for the three treatment comparisons. In the case study, the bivariate distribution of age and disease duration varied across the population of patients receiving TERI, DMF or NAT. Although results suggest that DMF and NAT improve manual dexterity at 1 year compared with TERI, the effectiveness of DMF versus NAT differs depending on which target estimand is used.
Conclusion: Visualization tools may help to clarify the target population in comparative effectiveness studies and resolve ambiguity about the interpretation of estimated treatment effects.
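To make the dependence on the target population concrete, the following is a minimal sketch and not the simulation design used in the article. It assumes a single hypothetical confounder x, an assumed outcome model in which the effect of B versus C is -0.3 - 0.1 * x, and 1:1 nearest-neighbour matching with replacement directly on x as a stand-in for propensity score matching. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical confounder x, distributed differently in each treatment group.
x_b = rng.normal(loc=0.0, scale=1.0, size=n)  # patients who received B
x_c = rng.normal(loc=1.0, scale=1.0, size=n)  # patients who received C


def outcome(x, treated_with_b, rng):
    """Assumed outcome model: the B-versus-C effect is -0.3 - 0.1 * x."""
    noise = rng.normal(scale=0.2, size=x.shape)
    effect = (-0.3 - 0.1 * x) if treated_with_b else 0.0
    return 0.5 * x + effect + noise


y_b = outcome(x_b, True, rng)
y_c = outcome(x_c, False, rng)


def matched_outcomes(x_target, x_pool, y_pool):
    """Outcome of the nearest pool patient (on x) for each target patient,
    i.e. 1:1 nearest-neighbour matching with replacement."""
    order = np.argsort(x_pool)
    xs, ys = x_pool[order], y_pool[order]
    right = np.clip(np.searchsorted(xs, x_target), 1, len(xs) - 1)
    left = right - 1
    use_left = (x_target - xs[left]) <= (xs[right] - x_target)
    return ys[np.where(use_left, left, right)]


# Effect of B versus C in the type of patients who received B ...
att_in_b_type = np.mean(y_b - matched_outcomes(x_b, x_c, y_c))
# ... and in the type of patients who received C.
att_in_c_type = np.mean(matched_outcomes(x_c, x_b, y_b) - y_c)

print(round(att_in_b_type, 2), round(att_in_c_type, 2))
```

Under these assumptions the two estimates should fall close to -0.3 and -0.4, respectively, because the mean of x is 0 among B patients and 1 among C patients. The same pair of treatments therefore yields two different answers, one per target population.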
Tweetable abstract
#Visualization of the target #Estimand helps to interpret the generalizability of comparative effectiveness results in real-world evidence studies. Read our paper to find out more!
Plain language summary: An accessible way to visualize to whom study results apply when the benefits of multiple treatments are compared
What is this article about?
A patient with a chronic disease such as multiple sclerosis often faces several treatment options, which is why studies comparing more than two treatments are frequent. Such studies are, however, harder to conduct. When only two treatments are compared at a time and conclusions are then drawn about all treatment options, there is a risk of comparing apples with oranges: a comparison of treatments A and B may apply to one group of patients, while a comparison of treatments A and C applies to another group with different characteristics, such as age or clinical values. In this article, we first help readers understand the impact of this problem by using simple visualizations. We then use the same visualization tools to show how to handle this situation and to clarify which patients the results of a given study apply to.
What were the results?
First, we create artificial data using established statistical techniques and use two visualizations to showcase how the group of patients to whom study results apply changes according to which treatments (A and B, or A and C) are compared. This first part is what we call a toy example, because we create data simply to help the reader understand the problem and explain how to use the visualizations. Second, we use the same visualizations to tackle a real research problem: how do teriflunomide (TERI), dimethyl fumarate (DMF) and natalizumab (NAT) affect manual dexterity in patients with multiple sclerosis? We find that, depending on whether TERI and DMF, TERI and NAT, or DMF and NAT are compared, the results in terms of manual dexterity apply to patients of different ages and disease durations. In particular, we find that DMF and NAT improve manual dexterity compared with TERI overall, but that conclusions about DMF versus NAT differ depending on which group of patients is considered in the analysis.
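As an illustration of one of the two visualizations, the sketch below computes the parameters of a bivariate ellipse for two covariates such as age and disease duration. It is not the authors' code: it assumes a normal-theory ellipse built from the sample mean and covariance, and the group means, covariances and sample sizes are made-up values used only to produce example data.

```python
import numpy as np


def bivariate_ellipse(points, coverage=0.95):
    """Centre, half-axis lengths (major first) and rotation in radians of a
    normal-theory ellipse expected to contain `coverage` of the points."""
    points = np.asarray(points, dtype=float)
    centre = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # With 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
    radius2 = -2.0 * np.log(1.0 - coverage)
    half_axes = np.sqrt(radius2 * eigvals[::-1])
    major = eigvecs[:, -1]  # direction of the largest spread
    angle = np.arctan2(major[1], major[0])
    return centre, half_axes, angle


rng = np.random.default_rng(1)
# Hypothetical (age, disease duration) samples for two treatment groups.
group_a = rng.multivariate_normal([40.0, 8.0], [[90.0, 30.0], [30.0, 25.0]], size=2000)
group_b = rng.multivariate_normal([48.0, 14.0], [[80.0, 35.0], [35.0, 40.0]], size=2000)

centre_a, axes_a, angle_a = bivariate_ellipse(group_a)
centre_b, axes_b, angle_b = bivariate_ellipse(group_b)

print("group A centre:", centre_a.round(1), "half-axes:", axes_a.round(1))
print("group B centre:", centre_b.round(1), "half-axes:", axes_b.round(1))
```

Drawing one such ellipse per treatment group, before and after matching, on the same axes gives a quick view of where the covariate distributions overlap and which patients a matched comparison actually describes.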
What do the results of the study mean?
The results of this article provide value to researchers and patients in two ways: (1) they help readers understand a difficult and often imperceptible problem in interpreting study findings when multiple treatments are compared and (2) they show how simple visualizations can be used to clarify to whom results about the benefit of different treatment options apply. If all researchers were to use these visualizations in their own studies, results about the comparison of different treatment options in the medical literature would be easier to interpret and to connect with other studies, ultimately helping patients and clinicians better treat diseases.
Supplementary Material
File (appendix.docx)
Copyright
© 2024 The Authors. This work is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 Unported License
History
Received: 1 June 2023
Accepted: 3 January 2024
Published online: 23 January 2024
How to Cite
Visualizing the target estimand in comparative effectiveness studies with multiple treatments. (2024) Journal of Comparative Effectiveness Research. DOI: 10.57264/cer-2023-0089
Citing Literature
- Peter C. Austin, David E. Austin, The performance of different propensity score methods for estimating the effects of multiple treatments or exposures: a neutral comparison study, BMC Medical Research Methodology, 10.1186/s12874-026-02831-2, (2026).
- John G. Rizk, Giuseppe Lippi, Carl J. Lavie, Beautiful weights, misinterpreted effects: the use and misuse of overlap weighting in major medical journals, 2020–2025, Journal of Clinical Epidemiology, 10.1016/j.jclinepi.2025.112113, 191, (112113), (2026).
