UK researchers to train generative model on 57 million health records to advance disease prediction

A pioneering AI model is being trained on de-identified health records from 57 million NHS patients in England. The world-first pilot project, called Foresight AI, aims to forecast future health risks and support a shift toward personalized, preventative care.
Foresight is the first generative AI model to be trained on NHS England's population-level dataset. Designed to predict major health events, it has the potential to transform how the NHS plans and delivers preventative care nationwide. Its development is supported by the UK's newly announced Health Data Research Service, which aims to make it easier for projects like Foresight to access data securely and drive innovation to improve patient care.
The AI model is being developed by researchers at University College London (UCL) and King’s College London (KCL), supported by the NIHR Biomedical Research Centre at UCLH. Training is taking place within the NHS England Secure Data Environment (SDE), which provides controlled access to de-identified health records under strict governance. The dataset comprises routinely collected information from 57 million individuals, including GP consultations, hospital admissions, diagnoses, procedures, prescribed medications, and COVID-19 vaccinations. In total, Foresight draws on more than 10 billion healthcare events recorded between November 2018 and December 2023.
“This is the first time an AI model has been used within health research on 57 million people. This is a real step forward,” said Professor Angela Wood (Associate Director, British Heart Foundation Data Science Centre).
Foresight is built on a generative AI architecture similar to that of models like ChatGPT, but instead of generating text, it models longitudinal health trajectories and forecasts major medical events such as hospitalizations, cardiovascular incidents, and the onset or progression of chronic diseases. Although its use is initially limited to COVID-19 research, it is being evaluated for its ability to predict more than 1,000 different health outcomes, including the risk of hospitalization or death within a year.
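The core idea of treating a patient's record as an ordered sequence of events, then predicting or generating the next event, can be illustrated with a deliberately simple sketch. This is not Foresight's code, which the article does not publish; Foresight uses an architecture similar to ChatGPT's, whereas the sketch below swaps in a basic first-order Markov (bigram) counting model. The event codes, the patient timelines, and the function names `train_next_event_model`, `next_event_probs` and `sample_trajectory` are all hypothetical, chosen only to show the idea.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy timelines: each patient is an ordered list of coded events.
# These codes and patients are invented for illustration, not real NHS data.
timelines = [
    ["GP_visit", "hypertension_dx", "statin_rx", "GP_visit", "MI_admission"],
    ["GP_visit", "hypertension_dx", "statin_rx", "GP_visit"],
    ["GP_visit", "asthma_dx", "inhaler_rx", "GP_visit"],
]


def train_next_event_model(sequences):
    """Count how often each event is immediately followed by another (bigram model)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def next_event_probs(model, event):
    """Estimate the probability distribution over the event that follows `event`."""
    following = model.get(event, Counter())
    total = sum(following.values())
    if total == 0:
        return {}  # event never seen with a successor
    return {e: n / total for e, n in following.items()}


def sample_trajectory(model, start, max_steps, rng):
    """Generate a possible future trajectory by repeatedly sampling the next event."""
    trajectory = [start]
    for _ in range(max_steps):
        probs = next_event_probs(model, trajectory[-1])
        if not probs:  # no observed successor, so stop generating
            break
        events = list(probs)
        weights = [probs[e] for e in events]
        trajectory.append(rng.choices(events, weights=weights, k=1)[0])
    return trajectory


model = train_next_event_model(timelines)
print(next_event_probs(model, "GP_visit"))
# {'hypertension_dx': 0.5, 'MI_admission': 0.25, 'asthma_dx': 0.25}
print(sample_trajectory(model, "GP_visit", max_steps=6, rng=random.Random(42)))
```

A real model of this kind conditions on a patient's whole history rather than just the last event, which is why architectures like the one behind ChatGPT are used instead of simple transition counts; the sketch only conveys the "predict the next event" framing.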
“Foresight is an exciting step towards being able to predict disease and complications before they happen, giving us a window to intervene and enabling a shift towards more preventative healthcare at scale,” said Dr Chris Tomlinson (Lead Researcher, UCL Institute of Health Informatics).
By training on national-scale data, Foresight aims to capture diverse health patterns and support population-wide benefits. Potential applications include informing planning and resource allocation, predicting diseases to enable earlier interventions, advancing precision medicine for improved outcomes, and promoting health equity through inclusive, representative predictive analytics. “AI models are only as good as the data on which they’re trained,” Tomlinson explained. “Using national-scale data allows us to represent the kaleidoscopic diversity of England’s population, particularly for minority groups and rare diseases, which are often excluded from research.”
Simon Ellershaw (PhD researcher, UCL) highlighted the technical achievement:
“Combining the computing resources needed for AI with NHS data has always been challenging, but thanks to the support of our partners we’ve been able to safely and securely apply state-of-the-art AI methods to NHS data at unprecedented scale.”
Experts have praised the model’s scale and potential to reshape care delivery:
“This is the first time a predictive system has been trained on the entire health footprint of a nation, with about 57 million patient records,” said Dr Prasanth Kamma (Lead Architect at the Center of Excellence and Fellow at the Institution of Engineering and Technology). “The model helps identify which communities are being overlooked, allowing support to reach people earlier. It represents a shift from reactive healthcare policy to a more proactive and preventative model.”
The project builds on earlier research involving smaller NHS datasets. Researchers now aim to explore how Foresight can support clinical decision-making and NHS service planning. There are also plans to enhance the model by incorporating richer data sources, such as clinician notes, lab results and medical imaging, when these become available nationally.
Dr Vin Diwakar (National Director of Transformation, NHS England) said:
“AI has the potential to transform the way we prevent and treat disease, if trained on large datasets and safely tested. The NHS Secure Data Environment has been fundamental to this pioneering research, shaping a future where earlier treatments and interventions are targeted to those who will benefit, preventing future ill health. This will boost our ability to move quickly towards personalized, preventative care.”
A BHF Data Science Centre public contributor, involved in reviewing and approving the project, said:
“It’s important that people know how their health data is being used, so it’s encouraging to see a focus on transparency and making sure AI is used in the NHS in a safe, ethical way with public benefit at its heart.”
Foresight is a collaboration between NHS England, UCL, KCL, the British Heart Foundation Data Science Centre, NIHR Biomedical Research Centres and partners including AWS, Databricks and CogStack. It is supported by the NHS AI Lab, UK Research and Innovation, the Medical Research Council and Health Data Research UK.