The Evidence Base Post 28 May 2025

Using patient-level SDOH to strengthen outcomes research and address health disparities


Efforts to address social drivers of health (SDOH) are gaining momentum across healthcare, but accurately capturing SDOH data remains a persistent challenge. While neighborhood-level proxies, such as census or ZIP code-based measures, are commonly used, they often do not provide a complete picture of the circumstances of individuals. Presented at ISPOR 2025, the study “Incorporating Social Drivers of Health Information Into Health Economics and Outcomes Research: Neighborhood-Level Proxies Versus Individual-Level Data” directly compares aggregate and individual-level SDOH data to understand the extent to which neighborhood proxies reflect true patient-level characteristics.

In this interview, lead author Karl Kilgore (Director, Research Science and Advanced Analytics, Inovalon) discusses the findings and explores the implications for improving health equity, tailoring interventions, and enhancing the accuracy of outcomes research using more precise, patient-level information.

To begin, could you explain what motivated this research and why comparing neighborhood-level proxies to individual-level SDOH data is important for healthcare decision-making?

Although there is no debate about the importance of SDOH as factors in healthcare economics and outcomes, the commonly used data sources for this type of research, such as claims databases and electronic health records, traditionally have included very little information related to SDOH. For many decades, a work-around for this lack of available data on social risk factors has been the use of neighborhood-level data from sources such as the US Census. Researchers can obtain aggregate data on the characteristics of the people who live in a certain neighborhood, such as a 5-digit ZIP Code area, and then impute those data down to the individuals of interest in their study using their home address. The current study was motivated by a single question: How good are these neighborhood-level aggregates as proxies for true individual-level measures of the same SDOH factors?

Can you tell us more about the datasets used in your analysis and how they enabled you to link patient-level claims data with both individual and neighborhood-level SDOH characteristics?

The patients in our study were a large sample drawn from the Inovalon MORE² registry. These patients were matched to a set of SDOH variables from multiple comprehensive individual and household databases sourced from Acxiom, Inc. These data were also aggregated at two different neighborhood sizes: 9-digit and 5-digit ZIP Code areas. To perform the match, using a carefully controlled internal method that assured the privacy of all information, patients' personally identifiable information was tokenized, or encoded, by an authorized third-party system. These tokens were then transmitted via secure transfer methods to Acxiom via the third party, during which the Inovalon tokens were translated into the corresponding Acxiom tokens. This process ensures that neither Inovalon nor Acxiom can re-identify the other's private information, while at the same time allowing the matching of the Inovalon patients to their individual-level SDOH maintained by Acxiom. The individually matched patient records were then securely transmitted back to Inovalon, resulting in a dataset of patients with measures on a set of identical SDOH variables at three different levels of aggregation: individual, 9-digit ZIP Code, and 5-digit ZIP Code.
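For readers who want a concrete picture of token-based linkage, here is a minimal sketch of the general idea. It is not the actual Inovalon, Acxiom, or third-party implementation, which the interview does not describe in detail. The keyed-hash tokenization, the `ThirdParty` class, the keys, and all sample records are assumptions made up for illustration.

```python
import hashlib
import hmac


def tokenize(pii: str, key: bytes) -> str:
    """Hypothetical tokenizer: keyed hash of normalized identifying text."""
    normalized = " ".join(pii.lower().split())
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class ThirdParty:
    """Hypothetical intermediary that holds both keys and the token crosswalk."""

    def __init__(self, key_a: bytes, key_b: bytes):
        self._key_a = key_a
        self._key_b = key_b
        self._crosswalk = {}

    def register(self, pii: str):
        # Issue one token per data holder for the same person.
        token_a = tokenize(pii, self._key_a)
        token_b = tokenize(pii, self._key_b)
        self._crosswalk[token_a] = token_b
        return token_a, token_b

    def translate(self, token_a: str):
        # Translate a holder-A token into the matching holder-B token.
        return self._crosswalk.get(token_a)


# Made-up keys and a made-up person; nothing here is real data.
third_party = ThirdParty(key_a=b"demo-key-a", key_b=b"demo-key-b")
pii = "Jane  Example 1970-01-01"
token_a, token_b = third_party.register(pii)

claims_by_token_a = {token_a: {"diagnosis": "hypothetical"}}
sdoh_by_token_b = {token_b: {"household_size": 1}}

# Link claims to SDOH using tokens only; raw identifiers never change hands.
linked = {}
for tok_a, claim in claims_by_token_a.items():
    tok_b = third_party.translate(tok_a)
    if tok_b in sdoh_by_token_b:
        linked[tok_a] = {**claim, **sdoh_by_token_b[tok_b]}
```

In this sketch each data holder sees only its own tokens, and the linked record is keyed by a token rather than by any identifying field.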

Your study analyzed multiple levels of geographic aggregation. Could you walk us through the methodological approach, particularly how you measured and compared the variance between individual and neighborhood SDOH?

An important advantage of the method described above is that the three levels of aggregation all came from the same set of survey items. In other words, in calculating the neighborhood-level characteristics, Acxiom used the exact same survey items and the exact same patients as it provided at the patient level, just at different levels of aggregation. Thus, the variables at the three levels of aggregation were all strictly comparable to each other, varying only in the number of people used to calculate them: either 1 (for the individual level), or the respective populations in the 9-digit and 5-digit ZIP Code neighborhoods.

It is important to point out now that the process of ‘rolling up’ the individual values of each variable to the neighborhood level involves averaging across multiple individuals. Within a neighborhood, values of a specific variable (e.g., household annual income) may vary a lot or a little person-to-person.

A neighborhood with a population that is fairly homogeneous in terms of a given variable like income can end up with the same aggregate score as a very diverse neighborhood where half the population is very low income and the other half is high income. The risk of this loss of precision of measurement increases as the neighborhood size increases.
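A small numeric illustration of this point, using made-up income figures rather than study data: the two hypothetical neighborhoods below receive the same aggregate score even though one is uniform and the other is split between low and high incomes.

```python
from statistics import mean, pvariance

# Hypothetical household incomes (USD) for two invented neighborhoods.
homogeneous = [60_000, 60_000, 60_000, 60_000]
polarized = [20_000, 20_000, 100_000, 100_000]


def aggregate_proxy(incomes):
    """Neighborhood-level proxy: the mean of the individual values."""
    return mean(incomes)


def within_variance(incomes):
    """Spread of individual values around the neighborhood mean."""
    return pvariance(incomes)


proxy_h = aggregate_proxy(homogeneous)
proxy_p = aggregate_proxy(polarized)

# Largest gap between any resident's true income and the imputed proxy.
max_error_h = max(abs(x - proxy_h) for x in homogeneous)
max_error_p = max(abs(x - proxy_p) for x in polarized)
```

Both proxies come out equal, but imputing the proxy to each resident is exact in the first neighborhood and off by a wide margin for every resident in the second.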

In statistics, the name for the amount of variation around a central score such as the mean is the ‘variance’. You can test how similar two variables are to each other by using statistical methods to see how much the variance in one variable predicts the variance in the second variable, and that is what we did in this study. We compared how much of the variance in individual-level scores was consistent with, or explained by, the variance of the aggregate proxies. If that shared variance is high, then aggregate proxies are good estimates of individual scores; if the shared variance is low, then the aggregate scores are less good substitutes for the individual measures.
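One simple way to express "shared variance" is the proportion of variance in individual values that is explained when each person is assigned their neighborhood mean. The sketch below computes that proportion on invented data. The interview does not specify the statistical model used in the study, so this is an illustration of the concept under that assumption, not a reproduction of the analysis. The record layout and the `shared_variance` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (9-digit area id, 5-digit area id, income in $1000s).
records = [
    ("A1", "A", 30), ("A1", "A", 34),
    ("A2", "A", 90), ("A2", "A", 94),
    ("B1", "B", 40), ("B1", "B", 44),
    ("B2", "B", 60), ("B2", "B", 64),
]


def shared_variance(records, key_index):
    """Share of individual-level variance explained by the area mean."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    # The aggregate proxy for each area is the mean of its residents.
    proxy = {area: sum(vals) / len(vals) for area, vals in groups.items()}

    values = [rec[2] for rec in records]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_resid = sum((rec[2] - proxy[rec[key_index]]) ** 2 for rec in records)
    return 1 - ss_resid / ss_total


r2_zip9 = shared_variance(records, 0)  # smaller neighborhoods
r2_zip5 = shared_variance(records, 1)  # larger neighborhoods
```

In this toy data the smaller areas explain nearly all of the individual variation while the larger areas explain very little, because each larger area averages over two quite different smaller ones.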

What were the study’s key findings, and did any outcomes stand out as particularly noteworthy?

In summary, the accuracy of aggregate neighborhood characteristics as proxies for individual characteristics varied significantly by which SDOH characteristic we looked at and by the size of the neighborhood used to calculate the aggregate proxy. Across all variables, the proportion of variance in individual measures accounted for by 5-digit ZIP code aggregates was about half that for 9-digit ZIP code.

Which SDOH variables showed the greatest discrepancies between individual and proxy measures, and what might this mean for health equity efforts?

Household net worth and income, as well as dwelling type (single-family versus multi-family dwellings), showed the highest accuracy. This is a positive outcome, given that the literature shows the most commonly used single measure of social position in US studies is household income. On the other end of the scale, marital status and household size (the number of people comprising the household) were very poorly predicted by aggregate proxies. These two social characteristics can be thought of as measuring the amount of social support available to an individual and, when the household size is precisely one, as a potential measure of social isolation or loneliness, which the literature has shown to be important outcome predictors, especially among older Americans.

What do your findings suggest about the risks of relying solely on neighborhood-level proxies when developing population health interventions or conducting outcomes research?

First, something is better than nothing. Although we saw differences in the accuracy of aggregate proxies by characteristic and neighborhood size, all the analyses were statistically significant. This means that, other things being equal, any inferences you draw based on aggregate proxies are more accurate than not having anything at all. Second, size matters: measures aggregated from smaller neighborhoods are better proxies because they ‘average over’ less variation in the underlying population. Finally, aggregate proxies are most useful in identifying population-level trends in the association between SDOH and other dimensions of healthcare including risk for disease, access to care, survival, resource utilization and costs. These demonstrated relationships are more difficult to apply at an individual clinical level.

With increasing access to tokenized, individual-level data, how do you see the role of SDOH evolving in HEOR over the next few years?

The widespread availability of individual-level SDOH data offers the potential to significantly impact clinical guidelines and clinical care by dramatically increasing the precision of the models of the relationship between SDOH and healthcare factors enumerated above.

While the general, population-level trends currently being promulgated are very useful, more precise models are needed that can accurately describe the relationship between the specifics of the medical condition of the patient and the most impactful SDOH for that type of patient. We need to move from general population trends down to the level of, “for patients with this disease state at this level of severity, the most significant social risk factors you need to address are X, Y and Z.”

Interviewee

Karl Kilgore, PhD
Director, Research Science and Advanced Analytics, Inovalon

Karl Kilgore has over 30 years of experience designing and developing health economics and outcomes studies, descriptive and analytic epidemiology studies, outcomes-based marketing programs, and disease registries.

Kilgore’s professional interests include social drivers of health, how social, behavioral and genetic factors interact to impact the health of individuals and populations, risk adjustment methodologies for health performance measurement and reimbursement, and real-world outcomes of novel therapies, particularly in oncology.

He received his PhD in Psychology from the University of Chicago with concentration in Epidemiology, Statistical Analysis, Research Methods, and Psychometrics.

Disclaimer

The opinions expressed in this feature are those of the interviewee/author and do not necessarily reflect the views of The Evidence Base® or Becaris Publishing Ltd.

Sponsorship for this Peek Behind the Poster was provided by Inovalon, Inc.