The Evidence Base Post

Choosing the right data for the right insight: comparing US real-world data sources in the context of atrial fibrillation


As real-world data (RWD) sources continue to expand, selecting the right dataset is critical not only for accurately measuring treatment patterns and medication use, but also for minimizing bias and ensuring that findings reflect real clinical practice. While open and closed claims data may appear to offer similar insights, important differences between them can affect study outcomes. A new study titled “Concomitant Usage of Contraindicated Medications in Patients with Atrial Fibrillation: A Comparison of Real-World Data Sources in the United States,” presented at ISPOR 2025, explores these distinctions in the context of atrial fibrillation treatment in the US. Specifically, it examines how claims type affects the identification of contraindicated medication use.

Here we speak with lead author Mike Sicilia (Forian, USA) to learn more about the methodology, key findings, and implications for future use of RWD in clinical research. 

Mike Sicilia with co-author Wouter van der Pluijm at ISPOR 2025

Thanks for speaking with The Evidence Base. Can you explain the key differences between open and closed claims as RWD sources and why it was important to analyze them independently in this study?

Healthcare claims data is a crucial source of RWD, providing insight into patients’ interactions with and within the healthcare system. Claims data is created every time a patient receives care or fills a prescription, capturing key details such as service type, date, and cost.

Closed claims are healthcare claims that have been fully adjudicated and paid by the insurance carrier or payer. These data are typically sourced directly from the payer organization and contain the complete picture of a patient’s journey with that payer, along with information about the patient’s enrollment in the health plan.

Open claims are healthcare claims sourced from practice management systems, clearinghouses, or pharmacy benefit managers. Because open claims are not directly tied to the patient’s health plan enrollment, their capture is less affected by patients switching plans, dual enrollment, and out-of-pocket spending.

We looked at open and closed claims separately because we wanted to better understand the built-in biases of each and see how those differences might shape the results of a health economics and outcomes research (HEOR) or real-world evidence (RWE) analysis.


Why did you choose to focus specifically on contraindicated concomitant medication use and treatment patterns in patients with atrial fibrillation? 

We chose this topic to study the differences between the sources because it has wide-reaching implications. In an RWE analysis, a logical next step would be to compare outcomes between patients who did and did not receive a contraindicated medication. However, our analysis shows that, depending on the data source, those exposure labels could be wrong from the outset because of how the dataset captures claims, leading to a skewed and potentially inaccurate analysis.


Can you walk us through how the comparison between open and closed claims was conducted? 

In each dataset, we identified patients with at least two claims for atrial fibrillation, separated by at least three months, and who were treated with dofetilide after the atrial fibrillation diagnosis. For the closed claims, we required that each patient had at least 12 months of continuous enrollment after dofetilide. For the open claims, we required that each patient had at least 12 months of longitudinal history after dofetilide – what we call ‘continuous activity’. We then analyzed the contraindicated concomitancy rates within 12 months of dofetilide.
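
The closed-claims selection steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study’s code: the record layouts (`DxClaim`, `RxFill`, `EnrollmentSpan`), the approximation of “three months” as 90 days and “12 months” as 365 days, indexing on the first dofetilide fill on or after the first qualifying AF claim, and counting any contraindicated fill in the 12-month window as concomitant use are all assumptions. `AF_CODES` and `CONTRAINDICATED` are short illustrative placeholders, not the study’s code or drug lists.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical per-patient record layouts (real claims tables carry many more fields).
@dataclass(frozen=True)
class DxClaim:
    service_date: date
    icd10: str

@dataclass(frozen=True)
class RxFill:
    fill_date: date
    drug: str

@dataclass(frozen=True)
class EnrollmentSpan:
    start: date
    end: date  # inclusive

# Illustrative placeholders only; these are NOT the study's code or drug definitions.
AF_CODES = {"I48.0", "I48.91"}  # ICD-10-CM: paroxysmal / unspecified atrial fibrillation
CONTRAINDICATED = {"verapamil", "cimetidine", "ketoconazole", "trimethoprim"}
MIN_DX_GAP = timedelta(days=90)   # "at least three months", approximated as 90 days
FOLLOW_UP = timedelta(days=365)   # "12 months", approximated as 365 days


def af_diagnosis_date(dx):
    """First AF claim date, if the earliest and latest AF claims are >= 90 days apart."""
    dates = sorted(c.service_date for c in dx if c.icd10 in AF_CODES)
    if len(dates) >= 2 and dates[-1] - dates[0] >= MIN_DX_GAP:
        return dates[0]
    return None


def dofetilide_index(fills, af_date):
    """First dofetilide fill on or after the AF diagnosis date (assumed index date)."""
    dates = sorted(f.fill_date for f in fills
                   if f.drug == "dofetilide" and f.fill_date >= af_date)
    return dates[0] if dates else None


def continuously_enrolled(spans, start, end):
    """True if enrollment spans cover every day from start through end."""
    covered_until = start - timedelta(days=1)
    for s in sorted(spans, key=lambda s: s.start):
        if s.start > covered_until + timedelta(days=1):
            break  # coverage gap: later spans cannot extend the covered run
        covered_until = max(covered_until, s.end)
    return covered_until >= end


def classify_closed_claims_patient(dx, fills, spans):
    """None if the patient is excluded; otherwise whether a contraindicated drug
    was filled within 12 months after the dofetilide index date."""
    af_date = af_diagnosis_date(dx)
    if af_date is None:
        return None
    index = dofetilide_index(fills, af_date)
    if index is None:
        return None
    end = index + FOLLOW_UP
    if not continuously_enrolled(spans, index, end):
        return None
    return any(f.drug in CONTRAINDICATED and index <= f.fill_date <= end for f in fills)


# Toy patient: two AF claims 112 days apart, dofetilide, then a verapamil fill.
d = date
dx = [DxClaim(d(2022, 1, 10), "I48.91"), DxClaim(d(2022, 5, 2), "I48.91")]
fills = [RxFill(d(2022, 6, 1), "dofetilide"), RxFill(d(2022, 9, 15), "verapamil")]
spans = [EnrollmentSpan(d(2021, 1, 1), d(2023, 12, 31))]
print(classify_closed_claims_patient(dx, fills, spans))
```

For the open-claims cohort, the enrollment check would be swapped for a continuous-activity rule, since open claims carry no plan enrollment spans. Returning `None` for excluded patients keeps “not in the cohort” distinct from “in the cohort, no contraindicated use”.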


What were the main findings of the study, and were there any results that surprised you? 

One surprising finding was the notable variation in demographic breakdowns – such as race, net worth, and the gender ratio – across the different data sources. For example, the male-to-female ratio was 1.42 in the open claims but substantially higher, at 3.05, in the closed claims.
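
To put those two ratios on a more familiar scale, a male-to-female ratio r corresponds to a male share of r/(1 + r), assuming every record in the cohort is coded as either male or female. Applied to the reported ratios, that is roughly 59% male in the open claims versus 75% in the closed claims:

```latex
\text{male share} = \frac{r}{1 + r}, \qquad
\frac{1.42}{2.42} \approx 0.587 \ (\text{open claims}), \qquad
\frac{3.05}{4.05} \approx 0.753 \ (\text{closed claims})
```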

As for the main finding, we observed a statistically significant difference in the rate of contraindicated concomitant medication use within 12 months of dofetilide treatment between the open and closed claims datasets (p < 0.001).
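
The interview does not say which statistical test produced p < 0.001. A common way to compare two rates like these is a chi-square test on a 2×2 table, or equivalently a two-proportion z-test. The sketch below shows the z-test using only the Python standard library; the function name is hypothetical, and the counts are invented purely for illustration, not the study’s data.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two independent proportions.

    x1/n1 and x2/n2 are patients-with-event / patients in each cohort.
    Uses the pooled standard error under the null hypothesis of equal rates.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p1, p2, z, p_value


# Hypothetical counts for illustration only; they are NOT the study's data.
p_open, p_closed, z, p = two_proportion_z_test(x1=300, n1=2000, x2=150, n2=1500)
print(f"open={p_open:.3f}  closed={p_closed:.3f}  z={z:.2f}  p={p:.1e}")
```

For a 2×2 table, z² equals the chi-square statistic without continuity correction. Because the two cohorts also differ demographically, a crude comparison like this does not by itself separate data-capture effects from differences in who is in each dataset.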


How might limitations in either open or closed claims data affect the reliability or interpretation of findings in studies like this? How can researchers or clinicians mitigate those limitations? 

Both open and closed claims come with inherent limitations that can affect the reliability and interpretation of study findings. With open claims, data capture is not as standardized as it is with closed claims, which creates a risk of incomplete data. This inconsistency can leave gaps in a patient’s medical history, potentially skewing outcomes. To mitigate this, researchers can require that patients demonstrate sufficient longitudinal history (e.g., at least one claim every 6–12 months) to ensure a more complete and representative dataset for the study period.
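
One way to operationalize the “at least one claim every 6–12 months” idea is as a maximum-gap rule over the follow-up window. The sketch below is an assumption about how such a rule could look, not Forian’s actual definition: the function name `has_continuous_activity` is hypothetical, `max_gap_days=180` takes the stricter end of the 6–12 month range, and the gaps from the index date to the first claim and from the last claim to the end of follow-up both count.

```python
from datetime import date, timedelta


def has_continuous_activity(claim_dates, index, follow_up_days=365, max_gap_days=180):
    """Hypothetical 'continuous activity' rule for open claims.

    Passes if, within [index, index + follow_up_days], no stretch longer than
    max_gap_days goes without a claim. The stretch from the index date to the
    first claim and from the last claim to the end of follow-up both count.
    """
    end = index + timedelta(days=follow_up_days)
    in_window = sorted(d for d in claim_dates if index <= d <= end)
    checkpoints = [index, *in_window, end]
    return all((b - a).days <= max_gap_days
               for a, b in zip(checkpoints, checkpoints[1:]))


index = date(2022, 6, 1)  # e.g., the dofetilide index fill
claims = [date(2022, 6, 1), date(2022, 9, 1), date(2023, 1, 15), date(2023, 5, 20)]
print(has_continuous_activity(claims, index))      # every gap is <= 180 days
print(has_continuous_activity(claims[:2], index))  # 273-day gap before follow-up ends
```

Loosening `max_gap_days` toward 365 keeps more patients but accepts longer stretches in which care could go unobserved, which is the trade-off the 6–12 month range describes.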

Closed claims data, on the other hand, is limited to the scope of a specific health plan enrollment. As a result, any healthcare use outside that plan, such as out-of-pocket services, out-of-network visits, or care received during dual enrollment periods, will not be captured, so the data may not reflect the full picture of a patient’s care or health outcomes.

“One way to mitigate this limitation is by layering in open claims data to create a hybrid dataset, which allows for a more comprehensive view of the patient journey by filling in gaps not captured by closed claims alone.”
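
A hybrid build can be pictured as a per-patient union of the two sources that records where each event was observed. The sketch below is a simplified illustration under loud assumptions: it presumes both sources share a hypothetical de-identified `patient_token`, and it treats two records as the same event only when token, date, and code match exactly. Real hybrid datasets need more careful linkage and de-duplication than this.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    patient_token: str  # hypothetical de-identified token shared by both sources
    service_date: date
    code: str           # diagnosis, procedure, or drug identifier


def build_hybrid(closed, open_):
    """Union closed and open claims, keeping one row per
    (patient_token, service_date, code) plus the set of sources that saw it."""
    sources = {}
    for label, claims in (("closed", closed), ("open", open_)):
        for c in claims:
            sources.setdefault(c, set()).add(label)  # frozen dataclass is hashable
    return sorted(sources.items(),
                  key=lambda kv: (kv[0].patient_token, kv[0].service_date, kv[0].code))


closed = [Claim("t1", date(2022, 6, 1), "dofetilide")]
open_ = [Claim("t1", date(2022, 6, 1), "dofetilide"),
         Claim("t1", date(2022, 9, 15), "verapamil")]  # e.g., a fill outside the plan
for claim, seen_in in build_hybrid(closed, open_):
    print(claim.service_date, claim.code, sorted(seen_in))
```

Keeping the source labels lets an analyst see which events the closed claims alone would have missed. Here the open-only verapamil fill is the kind of record that could flip a patient’s contraindicated-use label.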


The difference in prevalence between open and closed claims was statistically significant. What does this tell us about the importance of selecting the right data source for a given analysis?

The results of our study indicate that selecting the right data source for a given analysis is essential for accurate, interpretable analyses that apply to as many patients as possible. Using the wrong source could lead to biased results or missed insights, limiting the reliability and applicability of the analysis.


How should stakeholders, whether researchers, regulators, or industry, approach dataset selection, especially as more integrated or hybrid sources become available?

Stakeholders should place the greatest emphasis on data quality and whether the data is truly fit-for-purpose. Claims research has long been plagued by missing claims, therapy areas without ICD codes, non-specific ICD codes, limited insights into non-clinical patient factors such as income, and the absence of lab and vital sign values – among many other limitations that can halt research or require the use of complex proxies.

“It is essential for RWE stakeholders to ensure the highest-quality research by selecting a dataset that best addresses the known limitations of a claims-based study. This means accepting the role of a hybrid claims data ecosystem in place of a traditional closed source, and incorporating additional data sources, such as electronic health records and social determinants of health, where applicable.”


Looking ahead, with the growing use of hybrid datasets, how do you see these enriched sources advancing our understanding of treatment patterns and medication safety in real-world settings? What other insights can they provide?

Hybrid datasets give us an opportunity to capture the best elements of closed and open claims data. I see hybrid claims adding significant volumes of patients and longitudinality to closed claims, and bolstering the findings of traditional RWE/HEOR studies. Hybrid claims reveal what happens outside patients’ health plans with the addition of open claims, while still preserving valuable enrollment data from the closed claims.

“Over time, hybrid claims will help paint a more complete picture of patient journeys. Paired with other essential datasets like electronic health records and social determinants of health, hybrid claims data will uncover previously unknown patterns in the US healthcare system.”


Interviewee

Michael Sicilia
RWE Data Scientist, Forian

As an RWE Data Scientist at Forian, Inc., Mike designs and conducts real-world evidence studies. Before joining Forian in June 2024, he was at EVERSANA, where he implemented complex outcome, economic, burden-of-illness, and machine learning analyses in rare disease patient populations. In total, he has over 5 years of experience in the real-world data/evidence space and holds a bachelor’s degree in Computer Science.


Disclaimer

The opinions expressed in this feature are those of the interviewee/author and do not necessarily reflect the views of The Evidence Base® or Becaris Publishing Ltd.


Sponsorship for this Peek Behind the Poster was provided by Forian.