Real-world evidence: state-of-the-art and future perspectives
Publication: Journal of Comparative Effectiveness Research
Abstract
Recent developments in digital infrastructure, advanced analytical approaches, and regulatory settings have facilitated the broader use of real-world evidence (RWE) in population health management and the evaluation of novel health technologies. RWE has uniquely contributed to improving human health by addressing unmet clinical needs, from assessing the external validity of clinical trial data to the discovery of new disease phenotypes. In this perspective, we present exemplars across various health areas that have been impacted by real-world data and RWE, and we provide insights into further opportunities afforded by RWE. By deploying robust methodologies and transparently reporting caveats and limitations, real-world data accessed via secure data environments can support proactive healthcare management and accelerate access to novel interventions in England.
Plain language summary
What is this article about?
Research involving real-world data (RWD) from patients, which most commonly comprises routine electronic health records, differs from clinical trials but should be held to equally high standards. When conducted properly, RWD research produces real-world evidence (RWE), which can greatly improve understanding of health areas, including chronic conditions such as neurological and metabolic disorders. This article provides an overview of the impacts of, and opportunities offered by, RWD sources in England by describing exemplars across different health areas. Further, the authors evaluated recent technology appraisals submitted to the National Institute for Health and Care Excellence to understand how RWE has been used. The article also provides recommendations to minimize bias and ensure robust transportability of outcomes to the target patient population.
What were the results?
We provide numerous examples of how RWE has increased our knowledge of infectious diseases, neurological disorders and metabolic disorders, as well as mental health conditions. Our evaluation of technology appraisals revealed that RWD was used more often to inform the cost–effectiveness of innovative technologies than their clinical effectiveness.
What do the results of the study mean?
This paper explains how better-linked, higher-quality data sources, when interrogated using best practices and robust study designs, give researchers tools with enormous potential to deliver outcomes that benefit patients and strengthen the healthcare system.
The last decade has unleashed ‘big data’ as a valuable tool for informing and optimizing healthcare [1]. Increasingly, research using RWD is being carried out through case–control and longitudinal retrospective studies, as well as prospective follow-up and observational studies, to produce descriptive findings, inferential statistics, and models that predict outcomes of interest or forecast demand and capacity utilization for healthcare services. RWE studies can also be designed to capture data in parallel with randomized clinical trials (RCTs), to validate clinical data or to serve as late-stage evaluations for post-market surveillance.
The aim of this perspective is to summarize the current state of the art of RWE in England and to inspire further interest in, and access to, broad sources of RWD by describing case studies in different health and disease areas, selected from the National Institute for Health and Care Excellence (NICE) clinical knowledge summary specialties [2]: infections and infestations, endocrine and metabolic, neurological, cancer, mental health and preventative medicine (Figure 1, left). We also discuss the limited extent to which routine RWD has been used to support access to new treatments through the NICE Technology Appraisal (TA) program, emphasize the challenges of establishing causal inference and the inherent limitations of RWD, and reflect upon compelling opportunities presented by RWE.

Figure 1. (left) Real-world evidence may inform on a spectrum of health conditions and diseases.
(right) Key changes in the technology appraisal process to support faster access to promising health technologies in England.
Historical context of RWD & RWE
Although RCTs remain the gold standard for evidence of clinical effectiveness for the National Health Service (NHS) in England, RWE studies are essential for understanding effectiveness for patients in uncontrolled, routine-care settings and can overcome several drawbacks of RCTs. RWE studies can avoid exorbitant costs (covering clinical and administrative staff, clinical trial site(s), supplies, participant support, monitoring charges and insurance) and time-limited follow-up, and can offer alternatives to clinical trials that may lack patient representativeness or heterogeneity [3]. RWE may also be used instead of RCTs to demonstrate the effect of an intervention where randomization is unethical or infeasible.
Since 2010, regulatory bodies such as the Medicines and Healthcare products Regulatory Agency (MHRA), NICE and the US FDA have encouraged the wider use of RWE (Figure 1) in supporting the adoption and regulatory approval of novel interventions [4]. In 2022, the Innovative Medicines Fund was launched to support early access to noncancer medicines while further RWD is collected to resolve remaining uncertainty about clinical or cost–effectiveness [5].
Despite the growing prominence of RWD, guidance on RWD research remains fragmented, definitions of RWE vary among regulatory bodies such as the FDA and NICE, and RWE may be generated from a wide range of sources and settings, which can lead to discrepancies and inconsistencies (Box 1).
Globally, definitions of real-world data (RWD) and evidence (RWE) have varied widely. Inconsistent use of these terms can cause confusion and impede evidence synthesis. We provide working definitions below, based on the NICE RWE Framework, and we further distinguish between routinely versus prospectively collected data and experimental versus nonexperimental settings.
| Term | Working definition |
|---|---|
| Routine real-world data (RWD) | Data relating to patient health status and/or the delivery of healthcare routinely collected from nonexperimental settings (e.g., electronic healthcare records; medical notes; audit and service evaluation; administrative and claims data) |
| Prospective real-world data (RWD) | Data relating to patient health status and/or delivery of healthcare that has been prospectively collected from nonexperimental settings (e.g., health surveys or interviews; patient registries) |
| Data from real-world settings | Data relating to health status and/or the delivery of healthcare that has been routinely or prospectively collected in a noninterventional/noncontrolled experimental setting (e.g., observational cohort; single-arm trials) |
| Real-world evidence (RWE) | Evidence generated from the analysis of real-world data or data collected in real-world settings |
As of 2024, the landscape of RWE sources has matured into a complex matrix. Access to routine RWD sources for research purposes is now supported in England through a growing set of national and subnational datasets linked across health and social care settings [8]. From 2024, the UK’s NHS is federating data across subregional secure data environments (SDEs) to enable access to this valuable resource in accordance with the NHS Value Sharing Framework [9]. Analysis of RWD, often in the form of de-identified electronic health records (EHRs) held within safe repositories and large data warehouses, can use descriptive or inferential statistical approaches, while more sophisticated analytical tools are available to researchers to predict outcomes [10]. For example, propensity score models can be used to match cohorts with similar distributions of baseline characteristics, approximating the balance that randomization achieves in clinical trials, although only for measured covariates [11]. Analyses of robustly matched cohorts can support estimation of risk ratios, odds ratios and rates of adverse outcomes. This perspective reviews exemplars of such analyses of RWD in the specialty clinical areas most represented in NICE TA submissions since 2022. In addition, we elaborate on the substantive challenge for RWD researchers of designing a study capable of supporting specific causal inferences using RWD in the section ‘Causal Inference’.
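The propensity score matching idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recommended implementation: the cohort, its two covariates (age in decades and a comorbidity flag) and the caliper are hypothetical; the propensity model is a plain logistic regression fitted by gradient descent using only the standard library; and matching is greedy 1:1 nearest-neighbour without replacement. Real studies would typically use an established statistics package and check covariate balance after matching.

```python
import math
from typing import List, Sequence, Tuple


def propensity(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of treatment given covariates x (logistic model)."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit a logistic regression by batch gradient descent; returns [intercept, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi  # derivative of log-loss w.r.t. the linear predictor
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def match_nearest(scores: Sequence[float], treated: Sequence[int],
                  caliper: float = 0.05) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbour matching without replacement, within a caliper."""
    available = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not available:
            continue
        j = min(available, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # each control is used at most once
    return pairs


# Hypothetical cohort: covariates are (age in decades, comorbidity flag)
X = [[5.0, 0], [6.2, 1], [7.1, 1], [4.8, 0], [6.0, 1], [5.5, 0], [7.0, 1], [4.5, 0]]
treated = [0, 1, 1, 0, 1, 0, 0, 0]
w = fit_logistic(X, treated)
scores = [propensity(w, x) for x in X]
pairs = match_nearest(scores, treated, caliper=0.1)
print(pairs)
```

Treated patients without a control inside the caliper are left unmatched, which trades sample size for comparability; that trade-off, and the resulting loss of generalizability, should be reported.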
Infections & infestations
RWE was prominently used to protect global health during the COVID-19 pandemic. Following the publication of the genome of SARS-CoV-2, the causative agent of COVID-19, on 10 January 2020, industrial and academic vaccine developers rapidly progressed laboratory and clinical testing to identify the best-performing vaccine candidates [12]. These efforts were justified by global RWD trackers of COVID-19 infections, established in early 2020. Phase III trials for COVID-19 vaccine candidates from BioNTech/Pfizer, Moderna, AstraZeneca and Johnson & Johnson began from April 2020 onward (29 April 2020: BioNTech/Pfizer; 27 July 2020: Moderna; 28 August 2020: AstraZeneca; and 7 September 2020: Johnson & Johnson), and the first approval for use outside of clinical trials was granted in the UK in early December 2020 [13]. As an example of how RWE can inform on rare events, RWD analysis from March 2021 identified rare cases of cerebral venous sinus thrombosis in combination with thrombocytopenia following ChAdOx1 nCoV-19 vaccination [14]. At rates ranging from one case per 26,000 to one per 127,000 [15], monitoring for this was infeasible in clinical trials with relatively small sample sizes (i.e., 21,635 participants received the AstraZeneca vaccine AZD1222 in the phase III trial), and RWD was invaluable for safety monitoring. Within the UK, these findings prompted changes to the recommended use of adenoviral vector-based COVID-19 vaccines in younger populations as early as 7 April 2021 [16].
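The arithmetic behind why a trial of that size could not monitor such events can be sketched with a Poisson approximation to the binomial. The sketch assumes, for illustration only, that the quoted rates apply per vaccinated participant and that cases occur independently; it uses the trial size and the two rate bounds given in the text.

```python
import math


def expected_cases(n: int, rate_denominator: int) -> float:
    """Expected number of events among n people at a rate of 1 per rate_denominator."""
    return n / rate_denominator


def prob_at_least(k: int, n: int, rate_denominator: int) -> float:
    """P(X >= k) for X ~ Poisson(n / rate_denominator)."""
    lam = expected_cases(n, rate_denominator)
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


n_trial = 21_635  # AZD1222 recipients in the phase III trial (from the text)
for denom in (26_000, 127_000):  # bounds of the reported rate range
    print(denom,
          round(expected_cases(n_trial, denom), 2),   # expected cases in the trial
          round(prob_at_least(1, n_trial, denom), 3),  # chance of seeing even one case
          round(prob_at_least(3, n_trial, denom), 3))  # chance of seeing three or more
```

Even at the highest rate, fewer than one case would be expected in the trial, with roughly a 56% chance of observing a single case and about a 5% chance of observing three or more, far too few to distinguish a signal from background incidence.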
Clinical trials may not be feasible in some scenarios, such as during the rapid evolution of a mutating pathogen affecting large populations. In such circumstances, RWD can be collected to generate insights into the effectiveness of new technologies, such as vaccines designed for specific SARS-CoV-2 variants. RWE on technologies targeting the BA.2 Omicron variant (predominant globally from January to July 2022) has proved invaluable for understanding vaccine effectiveness in infection-naive populations [17], as well as for comparing various antiviral treatments [18].
Despite the successes made possible by RWD and RWE studies during the COVID-19 pandemic, the full power of RWE can only be leveraged through a streamlined mechanism that provides access to data while maintaining data security and privacy. However, the rapid expansion of RWE research has not yet resolved several challenges, including lack of standardization in data capture, geographical differences in the availability of data for research use, the cost and availability of technology infrastructure, the absence of unique patient identifiers (leading to duplicated records or restricted data linkage), difficulties in data sharing [19], and the acceptability of data transportability methods, which apply results from one population to estimate the corresponding results for another population by adjusting for relevant differences in demographic, clinical and/or other factors between the two populations [20]. Coordinated development of data infrastructures, such as adopting standardized data formats (e.g., the Observational Medical Outcomes Partnership Common Data Model), will expand and strengthen the possibilities for gathering insights from epidemiological data.
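One simple form of the transportability adjustment described above is direct standardization: stratum-specific results estimated in a source population are reweighted by the stratum distribution of the target population. The sketch below assumes a single categorical effect modifier (hypothetical age bands) and made-up risks and population counts; published transportability methods often use inverse odds of sampling weights or outcome models instead.

```python
from typing import Dict


def standardize(rates: Dict[str, float], target_counts: Dict[str, int]) -> float:
    """Weight source stratum-specific rates by the target population's stratum shares."""
    missing = set(target_counts) - set(rates)
    if missing:
        # The source must contain every stratum present in the target; otherwise
        # the transported estimate would rest on extrapolation.
        raise ValueError(f"No source estimate for strata: {sorted(missing)}")
    total = sum(target_counts.values())
    return sum(rates[s] * n / total for s, n in target_counts.items())


# Hypothetical stratum-specific event risks estimated in a source cohort
source_risk = {"18-49": 0.02, "50-69": 0.05, "70+": 0.12}
# Hypothetical age structure of the target population
target_pop = {"18-49": 5000, "50-69": 3000, "70+": 2000}
print(f"Transported risk: {standardize(source_risk, target_pop):.4f}")
```

The adjustment is only as good as the assumption that the chosen strata capture all relevant differences between the two populations, which is why acceptability of such methods remains a challenge.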
From April to May 2022, 528 cases of human mpox virus infection were reported from 16 countries outside of Africa, where this virus is endemic [21]. In July 2022, the WHO declared mpox a public health emergency. Networked or socialized healthcare systems, such as those in Israel, were able to perform rapid analyses to direct public health measures. By July 2022, RWD from Clalit Health Services in Israel provided an opportunity to evaluate the effectiveness of one dose of JYNNEOS (Modified Vaccinia Ankara–Bavarian Nordic) in high-risk groups: univariable and multivariable Cox proportional hazards regression models with time-dependent covariates estimated 86% effectiveness against infection, adjusted for sociodemographic (i.e., living in the Tel Aviv district) and clinical risk factors (e.g., HIV/AIDS history, HIV pre-exposure prophylaxis (PrEP) use, syphilis infection and chlamydia infection) [22].
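Vaccine effectiveness from a Cox model is conventionally reported as one minus the adjusted hazard ratio (HR) comparing vaccinated with unvaccinated people. The sketch below converts an HR and its confidence interval into effectiveness; the HR of 0.14 is simply the value implied by 86% effectiveness, and the interval bounds are hypothetical, not figures from the cited study.

```python
from typing import Tuple


def vaccine_effectiveness(hr: float) -> float:
    """VE (%) = (1 - HR) x 100, for an HR comparing vaccinated with unvaccinated people."""
    return (1.0 - hr) * 100.0


def ve_interval(hr_lower: float, hr_upper: float) -> Tuple[float, float]:
    """Map an HR confidence interval to a VE interval; the bounds swap order."""
    return vaccine_effectiveness(hr_upper), vaccine_effectiveness(hr_lower)


print(round(vaccine_effectiveness(0.14), 1))  # 86% effectiveness corresponds to HR = 0.14
print(ve_interval(0.10, 0.30))                # hypothetical HR interval
```

Note that the upper HR bound becomes the lower effectiveness bound, a detail that is easy to get backwards when reporting.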
RWD surveillance detected a resurgence of measles cases from early 2023 through early 2024 [23,24]. These RWD supported rapid public health messaging to promote MMR vaccination in young children; in England, coverage of two MMR doses at age 5 years is around 85%, well below the WHO target of 95% vaccine coverage. The year before, poliovirus was detected in sewage samples in London from February to July 2022 [25], and the UK Health Security Agency declared a national enhanced incident response. Inactivated polio vaccine booster campaigns were launched in August 2022, with catch-up programs carried out in 2023. This case highlights the importance of routine environmental sampling for mounting a public health response [26], which could become more powerful if linked to de-identified health records.
As of 2022, an estimated 62,600 adults aged over 16 years (95% credible interval: 48,900–77,800) were chronically infected with hepatitis C [27], corresponding to a prevalence of 0.14% (95% credible interval: 0.11–0.17%). Although curable, chronic hepatitis C can cause liver cancer or liver failure when left untreated. Using the Discover-NOW dataset, which holds health data for more than 2.8 million patients in North West London (NWL) in a safe, secure and trusted research environment, a retrospective study estimated a prevalence of 0.3% across the population cohort over a study period from 31 December 1989 to 1 December 2023. Such results can be used to evaluate risk factors and propensity for infection, support triage, raise awareness, and help primary care networks or GP practices provide treatment more efficiently. Such efforts support the WHO’s and UK Health Security Agency’s goals of eliminating viral hepatitis by 2030.
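Prevalence estimates like these are proportions, so a point estimate is most useful when reported with an interval. The sketch below computes a Wilson score 95% confidence interval for a crude prevalence. Note that this is a frequentist interval, not the Bayesian credible interval reported for the national estimate, and that the counts in the example are hypothetical (chosen to give a 0.3% prevalence).

```python
import math
from typing import Tuple


def wilson_interval(cases: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = cases / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical: 3,000 recorded cases in a registered population of 1,000,000
cases, population = 3_000, 1_000_000
lo, hi = wilson_interval(cases, population)
print(f"Prevalence {cases / population:.2%} (95% CI {lo:.3%} to {hi:.3%})")
```

The Wilson interval is preferred here over the simple normal approximation because it behaves sensibly for rare conditions and small counts, where the normal interval can extend below zero.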
Though less common, retrospective analysis of bacterial infections is also possible [28], for example to evaluate community or nosocomial routes of infection. More studies of primary or secondary bacterial infections and their risk factors, as well as of antibiotic prescription histories, could inform improved recommendations on the use of antibiotics and other treatments. RWE studies that elucidate resistance to antibiotics such as carbapenems [29], methicillin [30] or ceftriaxone [31,32] are needed to shape national and international efforts to suppress antimicrobial resistance.
These examples highlight the rapid acceleration of access to RWE facilitated by secure access to routine EHRs for population health management and research. However, potential underreporting, even when reporting is legally mandated as for measles [33], remains a limiting factor in RWE studies, particularly for infectious diseases, where clinical diagnostic reliability declines during periods of low incidence. Likewise, vaccine coverage may be underestimated, for example when patients who have left the area or registry are still counted in the denominator, or when vaccination histories are incompletely recorded for those moving into the area or registry. Sensitivity analysis can be used to characterize the uncertainty and support more robust interpretation of data [34], but transparent reporting of the limitations and caveats of RWE studies is crucially important for interpretability and is strongly recommended.
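A simple quantitative sensitivity analysis for underreporting divides the observed case count by an assumed reporting completeness and reports the corrected estimate across a plausible range of completeness values. The sketch below assumes underreporting is non-differential and uses hypothetical counts and completeness values; it illustrates the idea rather than a validated bias-correction method.

```python
def corrected_incidence(observed_cases: int, person_years: float,
                        completeness: float, per: float = 100_000) -> float:
    """Incidence per `per` person-years after dividing observed cases by reporting completeness."""
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return observed_cases / completeness / person_years * per


observed, py = 120, 2_000_000  # hypothetical notified cases and person-years at risk
for completeness in (1.0, 0.8, 0.6, 0.4):
    rate = corrected_incidence(observed, py, completeness)
    print(f"completeness {completeness:.0%}: {rate:.1f} per 100,000 person-years")
```

Presenting the full range, rather than a single corrected value, makes explicit how strongly conclusions depend on the completeness assumption.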
Endocrine & metabolic diseases
Analyses of RWD can provide insights into multifactorial health spectra and multimorbid patients. Multimorbidity has become globally prevalent, with rates increasing over the last 2 decades [35]. Between 2022 and 2023, 64% of adults in England were estimated to be overweight or living with obesity [36]. Large-scale RWE studies have illuminated relationships between morbidities, indicating areas for potential early intervention and early care options for health providers. In one study, body mass index values for over 2.9 million patients in the UK were grouped into categories and the strength of association with 12 obesity-related comorbidities was measured [37]. Using a similar approach, Booth et al. found that hypertension prevalence was twice as high in patients with morbid obesity, suggesting that hypertension control in morbid obesity is a key target for future intervention [38].
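Grouping patients by body mass index (BMI) category and comparing the prevalence of a comorbidity across categories can be sketched as below. The WHO adult BMI cut-offs are standard; the record format, the choice of normal weight as the reference category, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bmi_category(bmi: float) -> str:
    """WHO adult BMI categories (kg/m^2); 'obese III' (BMI >= 40) is often called morbid obesity."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    if bmi < 35:
        return "obese I"
    if bmi < 40:
        return "obese II"
    return "obese III"


def prevalence_by_category(records: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """records: (bmi, has_condition) pairs -> {category: prevalence of the condition}."""
    counts = defaultdict(lambda: [0, 0])  # category -> [cases, total]
    for bmi, has_condition in records:
        c = counts[bmi_category(bmi)]
        c[0] += int(has_condition)
        c[1] += 1
    return {cat: cases / total for cat, (cases, total) in counts.items()}


# Toy data: (BMI, hypertension recorded)
records = [(22.0, False), (23.5, True), (24.0, False), (21.0, False),
           (41.0, True), (43.5, False), (45.0, True), (40.2, False)]
prev = prevalence_by_category(records)
ratio = prev["obese III"] / prev["normal"]  # prevalence ratio vs. the reference category
print(prev, round(ratio, 2))
```

A crude prevalence ratio like this ignores confounding by age, sex and other factors; large-scale studies would adjust for these, for example with regression models.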
RWD datasets can also provide large cohorts for analysis. For example, in a sample of more than 150,000 patients with type 2 diabetes mellitus (T2DM), medication patterns were evaluated to provide insight into treatment optimization [39]. In 2020, a risk prediction model that accounts for variables such as sex, ethnicity and lifestyle factors was validated using primary care data for over 100,000 patients, with the aim of identifying the risk of developing T2DM in patients with hyperglycemia [40]. Incidence of T2DM per 1000 person-years was highest in patients aged between 70 and 79 years, higher in males than in females, and greater in patients with a family history of T2DM. Scalable risk prediction models for multimorbid patients, with features based on demographics, lifestyle, socioeconomic factors and medical histories, embedded within healthcare ecosystems, can become guideposts for clinicians to determine the most effective care package for target patient groups.
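Incidence per 1000 person-years divides the number of new cases by the total follow-up time contributed, so patients followed for longer contribute more to the denominator. A minimal sketch with hypothetical follow-up records, where each patient contributes time until diagnosis or censoring:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def incidence_per_1000py(rows: Iterable[Tuple[str, float, bool]]) -> Dict[str, float]:
    """rows: (group, follow_up_years, developed_t2dm) -> {group: events per 1000 person-years}."""
    events = defaultdict(int)
    person_years = defaultdict(float)
    for group, years, event in rows:
        events[group] += int(event)
        person_years[group] += years  # time at risk until diagnosis or censoring
    return {g: 1000 * events[g] / person_years[g] for g in person_years if person_years[g] > 0}


# Hypothetical patients: (age group, years of follow-up, developed T2DM)
rows = [
    ("50-59", 4.0, False), ("50-59", 6.0, True), ("50-59", 10.0, False),
    ("70-79", 2.5, True), ("70-79", 5.0, False), ("70-79", 2.5, True),
]
print(incidence_per_1000py(rows))
```

Using person-time rather than a simple count of patients avoids underestimating incidence in groups with shorter follow-up, such as older patients with higher mortality.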
Neurological diseases
Digital health technologies (DHTs) are increasingly being used to capture RWD, with examples in neurological diseases. Before DHTs such as wearable electronic devices supporting epilepsy monitoring [41] can be used by patients, they require regulatory review through the MHRA’s software as a medical device regulatory route, whereas the NICE Evidence Standards Framework applies to DHTs with system benefits, DHTs that help patients understand healthy living and illnesses or prevent and manage diseases, and DHTs with measurable user benefits.
RWD/RWE research can be challenging in disease areas with delayed onset, such as dementia and Alzheimer’s disease (AD), where long-term clinical follow-up has delivered invaluable information [42]. Innovative methods are being explored to exploit RWE from routinely collected data to supplement long-term and resource-intensive clinical trial units [43]. In a study of severe adverse events (SAEs) following onset of AD, Chen et al. used simulated RWD trials and observed SAE rates similar to those of the clinical trial (8.3% in the trial vs. 8.9% in simulations) when proportional sampling was used to control demographic variables; events were mapped from the trial and considered SAEs if they met the criteria for grade 3/4 (resulting in hospitalization) or grade 5 (death). Notably, the authors found higher SAE rates in simulated RWD trials that used random sampling, with 1000 bootstrap samples. A retrospective study of 18,116 patients with AD dementia, covering January 2010 to December 2019, quantified healthcare resource use; patients were subgrouped by rapid-progressor classification and by mean total healthcare resource use cost per patient per year across primary care, secondary care and prescriptions [44]. The factors most strongly associated with healthcare costs were dying during follow-up, frailty, stroke, heart failure, T2DM and cancer. The highest cost quintile comprised 1423 patients (median age 84 years, 55% female) and used healthcare resources costing £13,665 per year, with the highest costs associated with rapidly progressing AD dementia and care home admission.
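The bootstrap resampling mentioned above can be sketched as follows: resample patients with replacement, recompute the SAE proportion in each resample, and take percentiles of the resulting distribution. The cohort size, SAE count and random seed below are hypothetical, and the percentile interval is only one of several bootstrap interval methods.

```python
import random
from typing import List, Tuple


def bootstrap_proportion_ci(outcomes: List[int], n_boot: int = 1000,
                            alpha: float = 0.05, seed: int = 42) -> Tuple[float, Tuple[float, float]]:
    """Percentile bootstrap CI for the proportion of 1s in a list of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible
    n = len(outcomes)
    estimates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return sum(outcomes) / n, (lower, upper)


# Hypothetical simulated trial arm: 45 SAEs (grade 3-5) among 500 patients
outcomes = [1] * 45 + [0] * 455
point, (lo, hi) = bootstrap_proportion_ci(outcomes)
print(f"SAE rate {point:.1%} (95% bootstrap CI {lo:.1%} to {hi:.1%})")
```

Resampling whole patients keeps each patient's outcome intact; a stratified (proportional) variant would instead resample within demographic strata, in the spirit of the proportional sampling described above.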
Cancer
Within England, cancer outcomes are reported regularly to the National Cancer Registration and Analysis Service, and these data are curated into the cancer registration tumor and treatment dataset, the Systemic Anti-Cancer Therapy dataset (SACT) and the National Radiotherapy Dataset (RTDS). These data are invaluable for informing on real-world outcomes for patients with cancer, and use of SACT is mandated for drugs with managed access agreements as part of the Cancer Drugs Fund [6]. As an example of a setting in which a clinical trial was deemed infeasible, a multinational prospective analysis of RWD from 3003 patients with metastatic castration-resistant prostate cancer revealed that the effectiveness of three major treatments (abiraterone, enzalutamide and docetaxel) was similar [45]. In 2024, these data were included as part of a NICE recommendation for treatment.
However, as of 2024, there are significant barriers to researchers readily accessing cancer RWD. Lags of 12–18 months are reported for the review and approval of access to either the SACT or RTDS datasets. Streamlining access timelines for such potentially impactful data would enable RWD researchers to use recent data feeds for research queries involving overlaps of population health management with cancer incidence and treatment, multimorbid profiles and propensity for cancer incidence and progression [46,47], or risk analyses for patients affected by different health inequalities [48]. For any health research involving dataset linkage [49], quality control of accuracy and completeness between datasets is recommended to minimize potential bias in data reporting [50], while fully linked data could reduce access lags as well as bias [51].
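Quality control of a linkage can start with two simple checks: the proportion of records in one dataset that find a match in the other, and field completeness among the linked records. The sketch below assumes both datasets share a pseudonymised patient key and hold one record per key in the second dataset; the field names and records are hypothetical and do not reflect the actual SACT or RTDS schemas.

```python
from typing import Dict, List, Sequence, Tuple


def linkage_report(left: List[dict], right: List[dict], key: str = "pseudo_id",
                   fields: Sequence[str] = ("stage", "regimen")) -> Tuple[float, Dict[str, float]]:
    """Match rate of `left` records in `right`, and per-field completeness after linkage."""
    right_index = {r[key]: r for r in right}  # assumes one record per key in `right`
    linked = [{**row, **right_index[row[key]]} for row in left if row[key] in right_index]
    match_rate = len(linked) / len(left) if left else 0.0
    completeness = {
        f: sum(1 for rec in linked if rec.get(f) not in (None, "")) / len(linked) if linked else 0.0
        for f in fields
    }
    return match_rate, completeness


registry = [{"pseudo_id": "a1", "stage": "IV"}, {"pseudo_id": "b2", "stage": None},
            {"pseudo_id": "c3", "stage": "III"}, {"pseudo_id": "d4", "stage": "II"}]
treatments = [{"pseudo_id": "a1", "regimen": "docetaxel"}, {"pseudo_id": "c3", "regimen": ""},
              {"pseudo_id": "z9", "regimen": "enzalutamide"}]
rate, comp = linkage_report(registry, treatments)
print(rate, comp)
```

Comparing the characteristics of matched and unmatched records is a natural next step, since systematic differences between them are a direct source of linkage bias.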
Mental health
Mental health comprises a broad range of conditions, including depression, anxiety, abnormal behavior, abnormal movements, obsessive compulsive disorder, eating disorders, attention deficit hyperactivity disorder, autism spectrum disorder, schizophrenia, bipolar disorder, self-harm and suicidal thoughts. Mental health has long been recognized as an important factor in health. In the NHS, severe mental illness is one of the five key clinical areas for adults in the Core20PLUS5 approach to reducing health inequalities, and mental health is one of the five key clinical areas for children. The NHS Long Term Plan also targets ‘significant improvements in mental health’, embedded within the NHS Mental Health Implementation Plan [52]. Mental health is also a contributing factor to health inequities, with associations observed between demand for mental health services and more deprived communities [53]. With 75% of all mental disorders emerging before the age of 25 [54], using data to support innovative approaches to improving care is a worthwhile effort. In researching mental health in children, a major issue is the lack of standardized reporting of mental health data, which leaves services incompletely informed about patient needs and unmet needs.
Within England, national registries that report on the prevalence and healthcare outcomes of children with mental health issues are based on semi-quantitative surveys with a small sample size (i.e., 1905 children aged 7–16 years in 2022) and categorical answers of ‘unlikely to have a disorder’, ‘possible disorder’ or ‘probable disorder’ to assess mental health prevalence (Table 1.1 raw data [55]). While lacking granularity on specific mental health conditions with clinical confirmation, survey studies can inform on likely unmet need among patients not accessing health services or support for mental health issues.
| | Target trial framework | Estimands framework | Causal roadmap |
|---|---|---|---|
| Features | This study design emphasizes seven elements: analysis plan, eligibility criteria, treatment strategies, assignment procedure, causal effect of interest, follow-up period, outcomes | This study design is based on five elements: population, treatment, outcome variable, population-level summary, handling of intercurrent events | This study design proposes an iterative approach toward establishing: the causal question/estimand alongside a causal model, observed data, identifiability, statistical estimand, statistical model and estimator, sensitivity analysis, and comparison of design and analysis plans |
| Highlights | Flexible framework that constitutes the most applicable basis for any RWE study, which may be iterated and strengthened by combining it with other frameworks | Emphasizes the need to construct a robust and specific estimand that addresses each variable and factor of interest regarding the study’s primary objective and accounts for confounders and bias | Emphasizes the need to address bias from the study concept onward and iteratively addresses mathematical variance and gaps when inferring causality |
Granular, de-identified patient-level data from population studies can inform on inequalities in service use among children accessing mental health services [56]. Comparing geographical disparities and differences in mental healthcare utilization indicates impacts on equity based on socioeconomic deprivation and inequalities across populations. Using the Discover-NOW dataset based in NWL, Lazzarino et al. reported significant differences in odds ratios for children of different ethnicities across the general population, with lower-than-expected mental health service utilization among the most deprived children in Ealing (NWL’s most populous borough), Brent and Hillingdon, suggesting that the mental health needs of the most disadvantaged groups in these boroughs are not being met. RWD has the potential to pinpoint opportunities to improve or strengthen healthcare policies and services by resolving queries on patient groups with sufficient specificity.
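Odds ratios in studies of this kind usually come from regression models adjusted for several covariates, but the underlying quantity can be shown from a single 2×2 table. The sketch computes a crude odds ratio with a Woolf (log-based) 95% confidence interval; the counts are hypothetical and not taken from the cited study.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """2x2 table: a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without. Returns (OR, lower, upper)."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; consider a continuity correction or exact methods")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical: service use among the most deprived children (exposed) vs. all others
print(odds_ratio(a=40, b=960, c=150, d=2850))
```

In this made-up example the point estimate suggests lower service use among the most deprived children, but the interval includes 1, illustrating why sample size and adjustment matter before drawing conclusions about unmet need.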
Preventative medicine
In the coming decades, RWD can help researchers study complications associated with our aging population. People are living longer but with a greater number of health problems, with 67.8% of people in England aged over 65 years predicted to have multimorbidity by 2035 [57]. RWE has already provided powerful insights into healthcare inequalities based on age, as well as gender, deprivation and ethnicity. For example, a cross-sectional analysis of 32,905 patients with diabetic kidney disease, published in 2024, revealed that people aged over 81 years were less likely to have had measurements of albumin-to-creatinine ratio, blood pressure or hemoglobin A1c (which reflects average blood sugar levels over the past 2–3 months); women were less likely to have their hemoglobin A1c measured; patients from the most deprived areas were less likely to have their blood pressure measured; and Black patients were less likely to be prescribed statins than White patients [58].
Beyond enabling targeted approaches to improve care for patient groups, RWE can also provide insights into the people who provide care. Interrogation of linked healthcare data on unpaid carers in NWL revealed that unpaid carers were, on average, older females from deprived areas who experienced a higher prevalence of long-term conditions [59]. RWD has the potential to inform opportunities for supporting care options for patients and carers alike, based on reliable inputs such as socioeconomic status, demographics and geographical location, which can further promote improvements to population health management.
Use of RWD in health-technology assessments
NICE outlines the methodology for analyzing cost–effectiveness via either cost-utility or cost-comparison models. Cost-utility models comprise a full economic analysis of the innovation relative to comparator(s), with health outcomes measured in quality-adjusted life years (QALYs), normally derived using the EQ-5D instrument. A cost-comparison model is an alternative economic model for innovations that can offer similar outcomes to patients at a similar or lower cost than comparator(s). Cost-comparison analyses do not include comparisons of health outcomes, which are captured in clinical-effectiveness evidence, but rather comprise cost data consistent with published NICE guidance for the comparator(s) [60]. RWD is particularly informative because routinely recorded data often cover a longer time horizon than data captured in clinical trials, increasing the probability of capturing important differences in costs or outcomes.
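The core arithmetic of a cost-utility comparison is the incremental cost–effectiveness ratio (ICER): the difference in costs divided by the difference in QALYs between the technology and its comparator, with QALYs accumulated as utility multiplied by time. The sketch below discounts both costs and QALYs at 3.5% per year, the rate used in the NICE reference case; the utilities, costs and 3-year horizon are hypothetical, and real submissions use far more detailed models (e.g., Markov or partitioned survival models).

```python
from typing import Sequence


def discounted_total(values_per_year: Sequence[float], rate: float = 0.035) -> float:
    """Present value of yearly values; year 0 is undiscounted."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values_per_year))


def icer(cost_new: float, qaly_new: float, cost_comp: float, qaly_comp: float) -> float:
    """Incremental cost per QALY gained for the new technology vs. its comparator."""
    d_qaly = qaly_new - qaly_comp
    if d_qaly == 0:
        raise ValueError("no QALY difference; a cost-comparison analysis may be more appropriate")
    return (cost_new - cost_comp) / d_qaly


# Hypothetical 3-year horizon: yearly utility (e.g., EQ-5D-derived) and yearly costs (GBP)
qaly_new = discounted_total([0.80, 0.78, 0.76])
qaly_comp = discounted_total([0.70, 0.66, 0.62])
cost_new = discounted_total([12_000, 4_000, 4_000])
cost_comp = discounted_total([3_000, 3_000, 3_000])
print(f"ICER: £{icer(cost_new, qaly_new, cost_comp, qaly_comp):,.0f} per QALY gained")
```

This hypothetical result (about £31,400 per QALY) would sit just above NICE's usual £20,000–£30,000 per QALY range. It also shows how RWD can shift a decision: any real-world input that changes yearly costs, utilities or survival feeds directly into the numerator or denominator.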
We investigated how routine RWD has been incorporated into the NICE single technology appraisal (STA) process since NICE health technology evaluations: the manual and the NICE Real-World Evidence Framework were published (Supplementary Material). We found that RWD was more frequently used to evaluate cost–effectiveness than clinical effectiveness (Figure 2). RWD was often used to validate assumptions of economic models: for example, for TA880 (tezepelumab for treating severe asthma, AstraZeneca), the company conducted a UK-based real-world study using Clinical Practice Research Datalink and Office for National Statistics data [61]. This highlighted that all-cause mortality in the target patient cohort was considerably higher than assumed in previous modeling, which increased the potential benefit of tezepelumab to patients in health economic models; ultimately, this technology was recommended for use. In contrast, for TA777 (solriamfetol for treating excessive daytime sleepiness caused by obstructive sleep apnea, Jazz Pharmaceuticals), Hospital Episode Statistics data were used for comparison with the rate of adverse events (AEs), including one serious AE (stroke), observed in the original double-blind RCT and subsequent single-arm follow-up trial. This highlighted that AEs were relatively rare in the real world (compared with the control arm of the pivotal trial), which increased the company’s base case incremental cost–effectiveness ratio; the technology was not recommended by NICE [62].

Figure 2. Use of routine real-world data in evidence as part of 12 cancer and 63 noncancer National Institute for Health and Care Excellence single technology appraisal submissions.
(A) Number of submissions reviewed for use of RWD, by specialty, including nine for rare diseases (affecting fewer than 1 in 2000 people); (B) Proportion of submissions that used RWD to demonstrate clinical effectiveness; (C) Proportion of submissions that used RWD to demonstrate cost–effectiveness.
RWD: Real-world data; TA: Technology appraisal.
These examples highlight how RWE can inform future policy and NICE guidance. Strengthened population health management can create a positive feedback loop: because QALYs are weighted equally regardless of individual patient characteristics, progress toward health equity will lead to better-informed health economic modeling of innovative technologies.
Causal inference
Drawing causal inferences from RWE studies can be possible when several assumptions are accounted for and transparently reported, and when bias is addressed. Understanding data integrity and provenance, planning robust methodology to minimize bias, completing feasibility assessments and quantifying data quality are paramount for conducting a robust research study and delivering reliable results. When inferring causality, the NICE RWE Framework promotes a Target Trial Approach to estimating intervention effects using quasi-experimental and nonrandomized designs (Box 1) [3]. This approach outlines an RWD study based on seven features (Table 1). Another approach, the Estimands Framework, focuses on establishing a specific question, the estimand, that accurately and precisely defines what is measured and how the primary objective will be answered [63]. Use of either approach, separately or in combination, is recommended to avoid immortal time bias, which arises from improper definition of the index event (time zero), and to account for intercurrent events, which compete with and alter the analysis of outcomes of interest. A third approach, the Causal Roadmap, follows a similar design framework but emphasizes accounting for confounding factors and causal gaps throughout the study design process [64]. We recommend that each approach be reviewed for each RWE study, to iteratively improve its design to minimize bias and maximize robustness and quality.
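Immortal time bias can be illustrated with a small worked example. Suppose follow-up starts at cohort entry and some patients start treatment later. A naive analysis classifies these patients as treated from entry, crediting the pre-treatment period (which they must have survived in order to be treated) to the treated group. Splitting each treated patient's person-time at the treatment start date, consistent with aligning time zero as a target trial emulation requires, removes that artifact. The patient records below are hypothetical.

```python
from typing import List, Optional, Tuple

# (years from cohort entry to treatment start, or None if never treated; follow-up years; died 0/1)
Patient = Tuple[Optional[float], float, int]


def event_rates(patients: List[Patient], split_at_treatment: bool) -> Tuple[float, float]:
    """Return (treated, untreated) event rates per person-year."""
    t_events = t_time = u_events = u_time = 0.0
    for start, follow_up, died in patients:
        if start is None:                 # never treated during follow-up
            u_time += follow_up
            u_events += died
        elif split_at_treatment:
            u_time += start               # pre-treatment time is unexposed (and event-free by definition)
            t_time += follow_up - start   # exposed time begins at treatment initiation
            t_events += died              # treated patients survived to initiation, so any death is after it
        else:
            t_time += follow_up           # naive: whole follow-up, including immortal time, counted as treated
            t_events += died
    return t_events / t_time, u_events / u_time


patients = [
    (None, 1.0, 1), (None, 2.0, 1), (None, 5.0, 0), (None, 3.0, 1),
    (2.0, 4.0, 1), (3.0, 6.0, 1), (1.0, 5.0, 0), (2.5, 3.5, 1),
]
naive = event_rates(patients, split_at_treatment=False)
split = event_rates(patients, split_at_treatment=True)
print(f"naive: treated {naive[0]:.3f} vs untreated {naive[1]:.3f} per person-year")
print(f"split: treated {split[0]:.3f} vs untreated {split[1]:.3f} per person-year")
```

In this toy cohort the naive analysis makes treatment look protective (0.162 vs. 0.273 deaths per person-year), whereas correctly splitting person-time reverses the comparison (0.300 vs. 0.154), showing how an improperly defined index event alone can change the direction of an estimated effect.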
Future perspective
Health researchers have an enormous opportunity to impact a wide array of health areas, given the current state of RWD and SDEs. RWE has already had an impact on common health areas as well as rare diseases, through standalone research studies, in conjunction with clinical trials, and as supporting evidence during post-marketing surveillance. However, conducting RWE studies still involves several limitations, such as issues of data quality and completeness. Raw imaging data and free-text GP notes are often not captured in standardized data warehouses, and such gaps in data must be accounted for when carrying out RWE studies. Careful documentation of quality issues during data preprocessing is strongly recommended. Another important aspect is developing study plans that identify biases and address inequalities, to minimize the likelihood of class imbalance and maximize the possibility of elucidating population health inequalities [65]. When developing a machine learning model to estimate treatment effects, researchers should assess the plausibility of the assumptions required for identifiability, in particular the stable unit treatment value assumption (SUTVA), positivity and unconfoundedness [66,67].
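Positivity requires that, within every combination of covariates used for adjustment, patients have a non-zero probability of receiving each treatment being compared. A crude first check, sketched below with pandas, assumes a hypothetical cohort table with a binary `treated` column and an `age_band` stratifier; it tabulates the proportion treated per stratum and flags strata in which almost no one, or almost everyone, was treated. The function name, column names and the 5% cut-off are illustrative assumptions; in practice, propensity score overlap is usually inspected as well.

```python
import pandas as pd


def positivity_table(df: pd.DataFrame, treatment: str, strata: list[str],
                     min_prop: float = 0.05) -> pd.DataFrame:
    """Proportion treated per covariate stratum, flagging near-violations of positivity."""
    summary = (
        df.groupby(strata, observed=True)[treatment]
        .agg(n="size", prop_treated="mean")
        .reset_index()
    )
    # A stratum with (almost) no treated or (almost) no untreated patients
    # offers no within-stratum comparison for this treatment contrast.
    summary["violation"] = (summary["prop_treated"] < min_prop) | (
        summary["prop_treated"] > 1 - min_prop
    )
    return summary


# Hypothetical toy cohort: nobody aged 18-39 and everybody aged 65+ was treated
cohort = pd.DataFrame({
    "age_band": ["18-39"] * 4 + ["40-64"] * 4 + ["65+"] * 4,
    "treated": [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1],
})
print(positivity_table(cohort, "treated", ["age_band"]))
```

Flagged strata indicate where a causal model would have to extrapolate rather than compare like with like, which is worth documenting alongside other data quality issues.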
When looking toward future opportunities, public engagement and input are essential. Health Data Research UK (HDR UK) established the concept of the Five Safes Framework, which aims to ensure Safe Data, Safe People, Safe Projects (recently updated to Safe Research), Safe Setting and Safe Output [68]. These endeavors were carried out with patient safety in mind, but also to support carer and system needs (e.g., reducing administrative burdens). National and subregional SDEs are being developed across England with consideration of public input and feedback. Public deliberation on access to SDEs, such as OneLondon’s events with members of the public since 2020, has aided in establishing guidelines for carrying out research with sufficient breadth and depth of benefit [69].
An obvious opportunity in pursuing RWE studies is the availability of advanced analytical tools [43,70,71]. Deep learning techniques are now being successfully applied not only to convert unstructured sources (clinical notes and images) into structured data, but also to predictive tasks [72]. Generative artificial intelligence is also being considered for automating various summarization tasks. The advantage of deep learning-based methods is that they can handle high-dimensional and unstructured data, time-series data, or data with a high level of missing values. Another important area is data augmentation and synthetic data generation, which should significantly enhance collected RWD and improve the performance of predictive models. An important consideration is sustainability: models must be reviewed and updated following deployment, as data drift is a recurring issue facing healthcare predictive models.
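Monitoring for data drift after deployment can start with simple distributional comparisons. The sketch below computes the population stability index (PSI) for a single input variable, comparing a reference sample (for example, the data a model was trained on) with newly collected data. PSI is one common drift metric swapped in here for illustration, not a method proposed in this article; the data are simulated, bins are taken from the reference quantiles, and the often-quoted rule of thumb (below 0.1 stable, above 0.25 substantial shift) is an industry convention rather than a clinical standard.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI = sum over bins of (p_cur - p_ref) * ln(p_cur / p_ref)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior cut points at reference quantiles; the outer bins are open-ended
    cuts = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_counts = np.bincount(np.searchsorted(cuts, reference, side="right"),
                             minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(cuts, current, side="right"),
                             minlength=n_bins)
    # A small floor avoids log(0) when a bin is empty in one of the samples
    p_ref = np.clip(ref_counts / ref_counts.sum(), eps, None)
    p_cur = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((p_cur - p_ref) * np.log(p_cur / p_ref)))


rng = np.random.default_rng(7)
training = rng.normal(loc=0.0, scale=1.0, size=10_000)  # hypothetical variable at training time
stable = rng.normal(loc=0.0, scale=1.0, size=10_000)    # later data, same distribution
shifted = rng.normal(loc=1.0, scale=1.0, size=10_000)   # later data, mean shifted by 1 SD

print(f"PSI, no drift:   {population_stability_index(training, stable):.3f}")
print(f"PSI, mean shift: {population_stability_index(training, shifted):.3f}")
```

A per-variable check like this only detects shifts in inputs; changes in the relationship between inputs and outcomes would also require monitoring model performance against observed outcomes.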
Further opportunities will arise as datasets become increasingly standardized and linked. Within England, datasets are being safely federated to provide insightful information on health issues, and the clinically effective or cost-effective innovations this unveils may benefit patients. Through careful planning of research methodology and by tapping into the sources available within SDEs, researchers are poised to guide policies that return improved and equitable care packages directly to patients. We strongly recommend that researchers analyzing RWD develop protocols with clearly defined statistical approaches to minimize bias [3]. We also recommend that researchers who use RWD report statistical outcomes transparently, appropriately and accurately, as misinterpreted statistical approaches can lead to inaccurately reported outcomes [73,74].
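Confusion between relative and absolute effect measures, and between odds ratios and risk ratios, is a frequent source of misreported outcomes. The sketch below computes these measures from a hypothetical 2×2 table (invented counts) in which treatment halves the risk of a common outcome, from 40% to 20%. Here the odds ratio (0.375) suggests a larger effect than the risk ratio (0.5), while the absolute risk reduction of 20 percentage points corresponds to a number needed to treat of 5. The class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a hypothetical two-arm comparison with a binary outcome."""
    treated_events: int
    treated_no_events: int
    control_events: int
    control_no_events: int

    @property
    def risk_treated(self) -> float:
        return self.treated_events / (self.treated_events + self.treated_no_events)

    @property
    def risk_control(self) -> float:
        return self.control_events / (self.control_events + self.control_no_events)

    def risk_ratio(self) -> float:
        return self.risk_treated / self.risk_control

    def odds_ratio(self) -> float:
        # Cross-product ratio; diverges from the risk ratio when the outcome is common
        return (self.treated_events * self.control_no_events) / (
            self.treated_no_events * self.control_events
        )

    def absolute_risk_reduction(self) -> float:
        return self.risk_control - self.risk_treated

    def number_needed_to_treat(self) -> float:
        arr = self.absolute_risk_reduction()
        if arr <= 0:
            raise ValueError("No risk reduction: NNT is undefined")
        return 1 / arr


table = TwoByTwo(treated_events=20, treated_no_events=80,
                 control_events=40, control_no_events=60)
print(f"Risk ratio:              {table.risk_ratio():.3f}")
print(f"Odds ratio:              {table.odds_ratio():.3f}")
print(f"Absolute risk reduction: {table.absolute_risk_reduction():.3f}")
print(f"Number needed to treat:  {table.number_needed_to_treat():.1f}")
```

Reporting absolute risks alongside any relative measure lets readers judge both the direction and the practical size of an effect.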
Executive summary
• Real-world data (RWD) and real-world evidence (RWE) have historically been defined differently by various global health regulatory bodies.
• Growing consensus on, and acceptance of, RWD/RWE has recently seen its use rise to prominence in research and in regulatory submissions for innovative technologies.
• Numerous health areas, such as infectious diseases, neurological diseases, metabolic diseases, mental health conditions, and population health management, have greatly benefited from RWE research outcomes.
• The standards of RWD research must match those of clinical trial research: minimizing bias, transparently reporting data provenance and quality, and using robust study design and statistical interpretation.
• Causal inference can be studied using RWD, though great care must be taken when designing and conducting the research study, with caveats and limitations routinely reported alongside the research outcomes.
• Since 2022, technology appraisal submissions to NICE have included several examples of RWE, with more focus on cost–effectiveness than on clinical effectiveness.
• The deployment of increasingly available, powerful analytical tools, such as machine learning and artificial intelligence, must be grounded in the specific research question and refined during study design to minimize bias and confounding.
• While study design and statistical planning are paramount, alongside data integrity and quality, patient input and feedback are critically important for helping researchers address patient-centered issues that provide a real, value-based return.
• The networking of secure data environments within England will support further data linkage and enable deeper enquiries into relevant patient cohorts.
Financial disclosure
The authors have no financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
Competing interests disclosure
The authors are employed by Imperial College Health Partners and provide services by performing research using the Discover-NOW dataset. The authors have no other competing interests or relevant affiliations with any organization or entity with the subject matter or materials discussed in the manuscript apart from those disclosed.
Writing disclosure
No writing assistance was utilized in the production of this manuscript.
Open access
This work is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 Unported License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Supplementary Material
File (supplementary material.docx)
References
Papers of special note have been highlighted as: • of interest; •• of considerable interest
1.
Alberto IRI, Alberto NRI, Ghosh AK et al. The impact of commercial health datasets on medical research and health-care algorithms. Lancet Digit. Health 5, e288–e294 (2023).
2.
NICE. Clinical Knowledge Summaries (CKS): specialities. (2024). https://cks.nice.org.uk/specialities/
3.
Westreich D, Edwards JK, Lesko CR et al. Target validity and the hierarchy of study designs. Am. J. Epidemiol. 188(2), 438–443 (2019).
• Included in this study as a strong reference for considering internal and external validity together when designing studies and interpreting results.
4.
NICE. NICE real-world evidence framework (ECD9). (2022). https://www.nice.org.uk/corporate/ecd9/resources/nice-realworld-evidence-framework-pdf-1124020816837
•• The authors recommend this framework as a best practice reference when designing and conducting studies involving real-world data (RWD) or real-world evidence (RWE).
5.
NHS England. (2024). https://www.england.nhs.uk/medicines-2/innovative-medicines-fund/
6.
Kang J, Cairns J. “Don't think twice, it's all right”: using additional data to reduce uncertainty regarding oncologic drugs provided through managed access agreements in England. Pharmacoecon. Open 7, 77–91 (2023).
7.
Makady A, De Boer A, Hillege H et al. What is real-world data? A review of definitions based on literature and stakeholder interviews. Value Health 20, 858–865 (2017).
8.
Zhang J, Morley J, Gallifant J, Oddy C, Teo JT, Ashrafian H et al. Mapping and evaluating national data flows: transparency, privacy, and guiding infrastructural transformation. Lancet Digit. Health 5, e737–e748 (2023).
• Included as a reference as this study provides a map of data flows and linkage within England, as of 2023, and can serve as a baseline for estimating impact of regional and national secure data environment networks.
9.
NHS England. (2024). Date accessed: 31 May 2024 https://www.england.nhs.uk/digitaltechnology/nhs-federated-data-platform/
10.
Patorno E, Najafzadeh M, Pawar A et al. The EMPagliflozin compaRative effectIveness and SafEty (EMPRISE) study programme: design and exposure accrual for an evaluation of empagliflozin in routine clinical care. Endocrinol. Diabetes Metab. 3(1), e00103 (2019).
11.
Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat. Med. 28, 3083–3107 (2009).
12.
Tregoning JS, Flight KE, Higham SL, Wang Z, Pierce BF. Progress of the COVID-19 vaccine effort: viruses, vaccines and variants versus efficacy, effectiveness and escape. Nat. Rev. Immunol. 21, 626–636 (2021).
13.
GOV.UK. UK medicines regulator gives approval for first UK COVID-19 vaccine. (2020). Date accessed: 31 May 2024 https://www.gov.uk/government/news/uk-medicines-regulator-gives-approval-for-first-uk-covid-19-vaccine
14.
Schultz NH, Sørvoll IH, Michelsen AE et al. Thrombosis and thrombocytopenia after ChAdOx1 nCoV-19 vaccination. N. Engl. J. Med. 384, 2124–2130 (2021).
• Included here as an exemplar of how RWD can elucidate rare diseases, in some cases not feasibly achieved using randomized clinical trials.
15.
Sunder A, Saha S, Kamath S, Kumar M. Vaccine-induced thrombosis and thrombocytopenia (VITT); exploring the unknown. J. Family Med. Prim. Care 11(5), 2231–2233 (2022).
16.
GOV.UK. JCVI statement on use of the AstraZeneca COVID-19 vaccine. (2021). Date accessed: 31 May 2024 https://www.gov.uk/government/publications/use-of-the-astrazeneca-covid-19-vaccine-jcvi-statement/jcvi-statement-on-use-of-the-astrazeneca-covid-19-vaccine-7-april-2021
17.
Lau JJ, Cheng SMS, Leung K et al. Real-world COVID-19 vaccine effectiveness against the Omicron BA.2 variant in a SARS-CoV-2 infection-naive population. Nat. Med. 29, 348–357 (2023).
18.
Drysdale M, Galimov ER, Yarwood MJ et al. Comparative effectiveness of sotrovimab versus no treatment in non-hospitalised high-risk COVID-19 patients in north west London: a retrospective cohort study. BMJ Open Respir. Res. 11, e002238 (2024).
19.
Dron L, Kalatharan V, Gupta A et al. Data capture and sharing in the COVID-19 pandemic: a cause for concern. Lancet Digit. Health 4, e748–e756 (2022).
20.
Levy NS, Arena PJ, Jemielita T et al. Use of transportability methods for real-world evidence generation: a review of current applications. J. Comp. Eff. Res. 13(11), e240064 (2024).
21.
Thornhill JP, Barkati S, Walmsley S et al. Monkeypox virus infection in humans across 16 countries — April–June 2022. N. Engl. J. Med. 387, 679–691 (2022).
22.
Wolff Sagy Y, Zucker R, Hammerman A et al. Real-world effectiveness of a single dose of mpox vaccine in males. Nat. Med. 29, 748–752 (2023).
23.
GOV.UK. London at risk of measles outbreaks with modelling estimating tens of thousands of cases. (2023). Date accessed: 31 May 2024 https://www.gov.uk/government/news/london-at-risk-of-measles-outbreaks-with-modelling-estimating-tens-of-thousands-of-cases
24.
GOV.UK. Confirmed cases of measles in England by month, age and region: 2023. (2023). Date accessed: 31 May 2024 https://www.gov.uk/government/publications/measles-epidemiology-2023/confirmed-cases-of-measles-in-england-by-month-age-and-region-2023
25.
Klapsa D, Wilton T, Zealand A et al. Sustained detection of type 2 poliovirus in London sewage between February and July, 2022, by enhanced environmental surveillance. Lancet 400, 1531–1538 (2022).
26.
Hill M, Pollard AJ. Detection of poliovirus in London highlights the value of sewage surveillance. Lancet 400, 1491–1492 (2022).
27.
GOV.UK. Hepatitis C in England. (2023). Date accessed: 31 May 2024 https://www.gov.uk/government/publications/hepatitis-c-in-the-uk/hepatitis-c-in-england-2023
28.
Hughes S, Troise O, Donaldson H et al. Bacterial and fungal coinfection among hospitalized patients with COVID-19: a retrospective cohort study in a UK secondary-care setting. Clin. Microbiol. Infect. 26, 1395–1399 (2020).
29.
Aiesh BM, Natsheh M, Amar M et al. Epidemiology and clinical characteristics of patients with healthcare-acquired multidrug-resistant Gram-negative bacilli: a retrospective study from a tertiary care hospital. Sci. Rep. 14, 3022 (2024).
30.
Wu H, Jia C, Wang X et al. The impact of methicillin resistance on clinical outcome among patients with Staphylococcus aureus osteomyelitis: a retrospective cohort study of 482 cases. Sci. Rep. 13, 7990 (2023).
31.
GOV.UK. GRASP protocol. (2023). Date accessed: 31 May 2024 https://www.gov.uk/government/publications/gonococcal-resistance-to-antimicrobials-surveillance-programme-grasp-report/grasp-report-data-to-june-2023
32.
GOV.UK. Managing incidents of ceftriaxone-resistant Neisseria gonorrhoeae in England. (2022). Date accessed: 31 May 2024 https://www.gov.uk/government/publications/ceftriaxone-resistant-neisseria-gonorrhoeae-incident-management/managing-incidents-of-ceftriaxone-resistant-neisseria-gonorrhoeae-in-england
33.
Mette A, Reuss AM, Feig M et al. Under-reporting of measles: an evaluation based on data from North Rhine-Westphalia. Dtsch. Arztebl. Int. 108, 191–196 (2011).
34.
Choi YH, Gay N, Fraer G, Ramsay M. The potential for measles transmission in England. BMC Public Health 8, 338 (2008).
35.
Chowdhury SR, Das DC, Sunna TC et al. Global and regional prevalence of multimorbidity in the adult population in community settings: a systematic review and meta-analysis. eClinicalMedicine 57, 101860 (2023).
• A meta-analysis of global prevalence of patients with multimorbid clinical profiles, included here as an exemplar of national and regional RWD registries.
36.
GOV.UK. Obesity Profile: short statistical commentary May 2024. (2024). https://www.gov.uk/government/statistics/update-to-the-obesity-profile-on-fingertips/obesity-profile-short-statistical-commentary-may-2024
37.
Haase CL, Eriksen KT, Lopes S et al. Body mass index and risk of obesity-related conditions in a cohort of 2.9 million people: evidence from a UK primary care database. Obes. Sci. Pract. 7, 137–147 (2021).
38.
Booth HP, Prevost AT, Gulliford MC. Severity of obesity and management of hypertension, hypercholesterolaemia and smoking in primary care: population-based cohort study. J. Hum. Hypertens. 30, 40–45 (2016).
39.
Farmer RE, Beard I, Raza SI et al. Prescribing in Type 2 diabetes patients with and without cardiovascular disease history: a descriptive analysis in the UK CPRD. Clin. Ther. 43, 320–335 (2021).
40.
Coles B, Khunti K, Booth S et al. Prediction of Type 2 diabetes risk in people with non-diabetic hyperglycaemia: model derivation and validation using UK primary care data. BMJ Open 10, e037937 (2020).
41.
Donner E, Devinsky O, Friedman D. Wearable digital health technology for epilepsy. N. Engl. J. Med. 390, 736–745 (2024).
• Included in this paper as an exemplar of how digital technology can provide RWE and RWD to benefit patients with neurological disorders.
42.
Jia J, Ning Y, Chen M et al. Biomarker changes during 20 years preceding Alzheimer's disease. N. Engl. J. Med. 390, 712–722 (2024).
43.
Chen Z, Zhang H, Guo Y et al. Exploring the feasibility of using real-world data from a large clinical data research network to simulate clinical trials of Alzheimer's disease. NPJ Digit. Med. 4, 84 (2021).
44.
Edwards S, Trepel D, Ritchie C, Hahn-Pedersen JH et al. Real world outcomes, healthcare utilisation and costs of Alzheimer's disease in England. Aging Health Res. 4, 100180 (2024).
45.
Chowdhury S, Bjartell A, Lumen N et al. Real-world outcomes in first-line treatment of metastatic castration-resistant prostate cancer: The Prostate Cancer Registry. Target Oncol. 15, 301–315 (2020).
46.
Strongman H, Gadd S, Matthews A et al. Medium and long-term risks of specific cardiovascular diseases in survivors of 20 adult cancers: a population-based cohort study using multiple linked UK electronic health records databases. Lancet 394, 1041–1054 (2019).
47.
Conroy MC, Reeves GK, Allen NE. Multi-morbidity and its association with common cancer diagnoses: a UK Biobank Prospective Study. BMC Public Health 23, 1300 (2023).
48.
Martins T, Abel G, Ukoumunne OC et al. Ethnic inequalities in routes to diagnosis of cancer: A Population-Based UK Cohort Study. Br. J. Cancer 127, 863–871 (2022).
49.
Shiekh SI, Harley M, Ghosh RE et al. Completeness, agreement, and representativeness of ethnicity recording in the United Kingdom's Clinical Practice Research Datalink (CPRD) and linked Hospital Episode Statistics (HES). Popul. Health Metr. 21, 3 (2023).
50.
Arhi CS, Bottle A, Burns EM et al. Comparison of cancer diagnosis recording between the clinical practice research datalink, cancer registry and hospital episodes statistics. Cancer Epidemiol. 57, 148–157 (2018).
• Exemplar of how data provenance is paramount, where missingness and quality of cancer outcome data can vary amongst data sources.
51.
Hagberg KW, Vasilakis-Scaramozza C, Persson R et al. Quality and completeness of malignant cancer recording in United Kingdom Clinical Practice Research Datalink Aurum compared to Hospital Episode Statistics. Ann. Cancer Epidemiol. 6, 1–15 (2022).
52.
NHS England. NHS long term plan. (2020). Date accessed: 31 May 2024 https://www.longtermplan.nhs.uk/
53.
The King's Fund. What are health inequalities? (2022). Date accessed: 31 May 2024 https://www.kingsfund.org.uk/insight-and-analysis/long-reads/what-are-health-inequalities
54.
Public Health England. Measuring mental wellbeing in children and young people. (2015). Date accessed: 31 May 2024 https://www.kingsfund.org.uk/insight-and-analysis/long-reads/what-are-health-inequalities
55.
NHS England. Mental Health of Children and Young People in England 2022 - wave 3 follow up to the 2017 survey: data tables. (2022). Date accessed: 31 May 2024 https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-of-children-and-young-people-in-england/2022-follow-up-to-the-2017-survey
56.
Lazzarino AI, Salkind JA, Amati F, Robinson T et al. Inequalities in mental health service utilisation by children and young people: a population survey using linked electronic health records from Northwest London, UK. J. Epidemiol. Community Health 78, 191–198 (2023).
57.
Kingston A, Robinson L, Booth H et al. Projections of multi-morbidity in the older population in England to 2035: estimates from the Population Ageing and Care Simulation (PACSim) model. Age Ageing 47, 374–380 (2018).
58.
Phillips K, Hazlehurst JM, Sheppard C et al. Inequalities in the management of diabetic kidney disease in UK primary care: a cross-sectional analysis of a large primary care database. Diabet. Med. 41, e15153 (2024).
59.
Lawrence-Jones A, Chan J, Galimov E et al. Involving underrepresented groups: how unpaid carers influenced our data analysis. Int. J. Popul. Data Sci. 8, 2253 (2023).
60.
NICE. User guide for the cost comparison company evidence submission template. (2017). Date accessed: 31 May 2024 https://www.nice.org.uk/process/pmg32/resources/user-guide-for-the-cost-comparison-company-evidence-submission-template-pdf-72286772526277
61.
NICE. Tezepelumab for treating severe asthma. (2023). Date accessed: 31 May 2024 https://www.nice.org.uk/guidance/ta880
62.
NICE. Solriamfetol for treating excessive daytime sleepiness caused by obstructive sleep apnoea. (2022). Date accessed: 31 May 2024 https://www.nice.org.uk/guidance/ta777
63.
Kahan BC, Hindley J, Edwards M et al. The estimands framework: a primer on the ICH E9(R1) addendum. BMJ 384, e076316 (2024).
64.
Dang LE, Gruber S, Lee H et al. A causal roadmap for generating high-quality real-world evidence. J. Clin. Transl. Sci. 7, e212 (2023).
65.
Rudolph JE, Zhong Y, Duggal P et al. Defining representativeness of study samples in medical and population health research. BMJ Med. 2, e000399 (2023).
66.
Zivich PN, Edwards JK, Lofgren ET et al. Transportability without positivity: a synthesis of statistical and simulation modeling. Epidemiology 35, 23–31 (2024).
67.
Feuerriegel S, Frauen D, Melnychuk V et al. Causal machine learning for predicting treatment outcomes. Nat. Med. 30, 958–968 (2024).
68.
NHS England Digital, Five Safes Framework. (2024). Date accessed: 31 May 2024 https://digital.nhs.uk/services/secure-data-environment-service/introduction/five-safes-framework
69.
OneLondon. Get involved. (2024). Date accessed: 31 October 2024
•• Patient awareness, input and feedback are paramount wherever patients’ data are considered for research. Public deliberations provide invaluable input to ensure that data are handled and researched safely and towards patient-centered goals.
70.
Wang SV, Schneeweiss S. A framework for visualizing study designs and data observability in electronic health record data. Clin. Epidemiol. 14, 601–608 (2022).
71.
Gatto NM, Wang SV, Murk W et al. Visualizations throughout pharmacoepidemiology study planning, implementation, and reporting. Pharmacoepidemiol. Drug Saf. 31, 1140–1152 (2022).
72.
Kraljevic Z, Bean D, Shek A et al. Foresight – a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. Lancet Digit. Health 6, e281–e290 (2024).
73.
Sackett DL, Deeks JJ, Altman DG. Down with odds ratios! Evidence Based Med. 1, 164 (1996).
74.
Monaghan TF, Rahman SN, Agudelo CW et al. Foundational statistical principles in medical research: a tutorial on odds ratios, relative risk, absolute risk, and number needed to treat. Int. J. Environ. Res. Public Health 18, 5669 (2021).
Copyright
© 2025 The authors. This work is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 Unported License
History
Received: 1 August 2024
Accepted: 31 January 2025
Published online: 7 March 2025
How to Cite
Real-world evidence: state-of-the-art and future perspectives. (2025) Journal of Comparative Effectiveness Research. DOI: 10.57264/cer-2024-0130
