The Evidence Base Post

NIH expands All of Us into the world’s largest integrated genomic and EHR research database

  • Joanne Walker
Illustration of diverse patient profiles connected to DNA strands and electronic health records, representing NIH’s expanded All of Us research database.

The NIH has released the largest update to its All of Us Research Program, providing researchers with linked genomic, electronic health record, and emerging multiomics data from more than 747,000 participants. The expanded resource strengthens opportunities for precision medicine, real-world evidence generation, and population-scale health research.


The Baseline

  • 747,000+ participants now contribute linked genomic, clinical, survey, and physical measurement data.
  • 535,000 whole genome sequences are connected to nearly 482,000 electronic health records, creating the world's largest integrated genomic and EHR research resource.
  • New clinical notes, wearable device data, and multiomics datasets broaden opportunities for precision medicine and longitudinal real-world evidence studies.

The National Institutes of Health (NIH) has issued the largest data release in the history of its All of Us Research Program, making data from more than 747,000 participants available through its cloud-based Researcher Workbench.

Launched in May 2018, the All of Us Research Program set out to enroll a diverse cohort of at least 1 million people across the US to accelerate biomedical research and improve health. Eight years later, the program has made substantial progress toward that goal.

The release establishes All of Us as the world's largest integrated genomic and electronic health record (EHR) database, combining more than 535,000 whole genome sequences with nearly 482,000 linked EHRs. Researchers also have access to participant surveys, physical measurements, genetic data, and information on lifestyle, social circumstances, and environmental factors, creating a broad longitudinal resource of real-world data (RWD) for precision medicine research.

According to the NIH, the latest release adds more than 114,000 participants compared with the previous version and increases EHR coverage by 22%. The expansion was supported through additional participant-mediated health record submissions and data obtained through health information exchange networks, helping to fill gaps for individuals who previously had no linked clinical records.

NIH Director Jay Bhattacharya highlighted the scale required for precision medicine, stating:

"There's a paradox at the heart of precision medicine. To tailor treatments to individuals, you actually need very large populations to uncover the patterns that connect genetics, lifestyle, and the environment to health outcomes. That is exactly what All of Us provides: research at unprecedented scale."

A distinguishing feature of the program remains its participant diversity. More than 645,000 participants, representing 86% of the cohort, come from communities historically underrepresented in biomedical research, including women, older adults, rural populations, people with disabilities, and individuals from diverse racial and ethnic backgrounds. Participants live in all 50 states and in US territories.

The release also broadens the types of data available for research. For the first time, investigators can access proteomics data, RNA sequencing data, and long-read whole genome sequencing, marking the program's entry into large-scale multiomics research. Multiomics combines multiple layers of biological information, such as genes, proteins, and RNA, to provide a more comprehensive understanding of disease mechanisms.

As Geoffrey S Ginsburg, Chief Medical and Scientific Officer and Acting Chief Data Officer of All of Us, explained:

"We now offer researchers not just the world's most robust whole genome sequencing dataset, but a fully integrated multiomics resource, with thousands of participants whose genomes, proteomes, RNA-seq, and long-read sequences can be analyzed together. This resource makes entirely new science possible."

The clinical data available through the platform has also increased in depth. Researchers now have access to 9.5 million deidentified clinical notes from more than 99,000 participants. Using natural language processing, approximately 96 million clinical concepts have been extracted and mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, a standardized data model and vocabulary widely used to support consistent analyses across healthcare datasets.
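To show why mapping extracted text to a common data model matters, the Python sketch below resolves different wordings in free-text notes to a single standard concept, so counts stay consistent whatever phrasing a clinician used. It is a toy built on stated assumptions. The `NoteNlpRow` fields loosely mirror a few columns of the OMOP CDM NOTE_NLP table. The concept IDs are hypothetical placeholders, not real OMOP vocabulary entries. The simple dictionary matcher stands in for the actual NLP pipeline used by All of Us, which this article does not describe.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteNlpRow:
    """Hypothetical subset of columns loosely modeled on OMOP's NOTE_NLP table."""
    note_id: int
    lexical_variant: str      # the wording as it appeared in the note
    note_nlp_concept_id: int  # the standard concept that wording maps to


# Hypothetical lookup from surface wordings to standard concept IDs.
# These IDs are placeholders, NOT real OMOP vocabulary concept IDs.
CONCEPT_MAP = {
    "heart attack": 1001,
    "myocardial infarction": 1001,
    "mi": 1001,
    "type 2 diabetes": 2002,
    "t2dm": 2002,
}


def extract(note_id: int, text: str) -> list[NoteNlpRow]:
    """Toy dictionary matcher standing in for a real NLP pipeline.

    Matches whole words only (so "mi" does not fire inside "mild") and
    ignores case. It does not handle negation or clinical context.
    """
    rows = []
    for variant, concept_id in CONCEPT_MAP.items():
        pattern = rf"\b{re.escape(variant)}\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            rows.append(NoteNlpRow(note_id, variant, concept_id))
    return rows


def notes_per_concept(rows: list[NoteNlpRow]) -> Counter:
    """Count distinct notes mentioning each standard concept."""
    note_concept_pairs = {(r.note_id, r.note_nlp_concept_id) for r in rows}
    return Counter(concept_id for _, concept_id in note_concept_pairs)


# Three invented notes that describe the same condition in different words.
notes = {
    1: "History of heart attack in 2019.",
    2: "Pt with prior MI, also T2DM. Mild fatigue.",
    3: "Admitted for myocardial infarction.",
}

rows = [row for note_id, text in notes.items() for row in extract(note_id, text)]
counts = notes_per_concept(rows)
print(counts[1001], counts[2002])  # 3 1
```

Running it reports three notes for the placeholder myocardial infarction concept and one for type 2 diabetes, even though the three infarction mentions use three different wordings. A production pipeline would also need to handle negation (for example, "ruled out MI") and context, which this sketch deliberately ignores.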

Additional RWD assets include wearable device information from approximately 68,000 Fitbit users, including longitudinal sleep measures, with Apple Health integration planned in a future release. All of Us also outlined plans to introduce linked geospatial and environmental data, enabling studies that examine the influence of neighborhood characteristics, air quality, and other social determinants of health alongside genomic and clinical information.

The growing breadth of linked data has important implications for real-world evidence (RWE) generation. By integrating genomic information with longitudinal clinical records, unstructured clinical text, wearable data, and future environmental measures within standardized data models, the resource supports more comprehensive phenotyping, facilitates reproducible analyses across institutions, and provides a foundation for methodological development in regulatory science and precision medicine research. To date, nearly 23,000 registered researchers have used the platform, contributing to more than 1,400 peer-reviewed publications.
