The Changing Regulatory Landscape for Artificial Intelligence in Healthcare
Artificial intelligence is undergoing a transformation: AI technologies can now generate content (documents, images, videos etc) that can be indistinguishable from content created by human experts. Such technologies can produce output that is highly plausible, but not necessarily accurate or real. This has raised many concerns about how AI should be regulated. For many applications of AI in healthcare, however, regulations already exist. Where an AI device is used to prevent, diagnose, or treat a disease, it is covered by medical device regulations, which require not just that the output is “plausible” but that the device’s benefits are demonstrated to outweigh its risks, and that residual risks are well managed in clinical use. The medical device regulatory framework is already being updated to take account of AI, and it provides a useful framework for ensuring that AI enters healthcare with appropriate controls in place.

Artificial Intelligence Becomes Mainstream.
The last 18 months have seen a dramatic increase in public interest in artificial intelligence (AI). There has been a huge amount of media coverage of recent developments in AI – especially generative AI, such as the large language models behind ChatGPT. Many argue that AI algorithms can now generate output that is indistinguishable from content created by human experts, and that AI is therefore poised to transform all our lives, for better or worse.
Just a few years ago, there was debate about whether or not to regulate artificial intelligence – and much criticism was directed at the European Union for proposing the first regulatory framework for AI (the AI Act) in April 2021, on the grounds that this could stifle innovation. But in the last 12 months, as a result of rapid technological progress and rising public awareness, the debate has shifted from whether to regulate AI to how to regulate it. There are increasing concerns that, unless properly regulated, AI could soon start to have negative impacts on our quality of life. In late 2023, many world leaders responded to recent developments in AI, for example through President Biden’s executive order on AI and the AI Safety Summit at Bletchley Park hosted by UK Prime Minister Rishi Sunak and attended by many business and political leaders.
Within the healthcare world, it is increasingly argued that AI will transform the healthcare system for patients and staff. One widely reported publication showed that generative AI can answer questions well enough to pass third-year medical examinations, suggesting that AI could disrupt the patient-physician relationship.
Many Applications of AI in Healthcare are Already Regulated.
It is important to realise, however, that many applications of AI in healthcare are already regulated. If AI is used in software that is intended to diagnose, prevent, manage, treat or alleviate a disease or injury, then it is already regulated as a medical device. I have recently published a paper on the evolving regulatory landscape for AI in medical devices. That paper focuses on medical imaging applications, but the conclusions are more generally applicable, and below I summarise some of its key issues.
Medical device regulators have been regulating software, whether stand-alone (Software as a Medical Device – SaMD) or integrated into hardware (Software in a Medical Device – SiMD), for decades, and have been considering the implications of AI for many years. A guidance document proposing an internationally harmonised approach to the clinical evaluation of medical device software was proposed in 2017 by the International Medical Device Regulators Forum, and machine learning algorithms (a type of AI) were discussed in that document. More recently, medical device regulators have published discussion documents dedicated specifically to the use of AI in medical devices, and there are new global standards that describe how the risks of AI should be managed.
Many hundreds of AI-enabled products have already been put on the market as medical devices. The US FDA, which has the most comprehensive publicly available medical device database, regularly publishes the number of AI-enabled medical devices that have received marketing authorisation (510(k), de novo or PMA). In October 2023, the FDA published a list of nearly 700 such devices, either pure software devices or AI-enabled hardware.
It is notable in this FDA list that the overwhelming majority of AI-enabled medical devices on the market are for applications in radiology, with cardiology (e.g. ECG arrhythmia detection) the second largest category.
Medical imaging applications tend to dominate AI-enabled medical device marketing authorisations because it is relatively easy to bring these devices onto the market. That is because most AI-enabled medical imaging devices don’t actually collect data from patients themselves – they analyse data collected by traditional medical imaging systems such as radiography, CT, MRI and ultrasound. As a result, the AI models in these devices can be trained and validated on pre-existing data: there is no need to prospectively recruit patients for a clinical investigation to demonstrate that the device has adequate performance. Similarly, many AI-enabled cardiology devices are coming onto the market trained and validated on pre-existing ECG data, rather than needing longer and more expensive prospective studies.
Furthermore, many innovators in AI are able to take advantage of large, publicly available databases on which they can train and validate their algorithms. Examples include the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and UK Biobank. These shared databases can dramatically reduce the time needed to develop an AI-enabled device for applications that can be trained and validated on pre-existing data.
Does AI Work Well Enough to Make Decisions about Patients?
There has been a lot of excitement about generative AI because it can generate very plausible answers to complicated questions, or produce “deep fake” images and videos that are indistinguishable from the real thing. There is no doubt that recent innovations in AI have resulted in such “plausible” output, and that it is increasingly difficult to distinguish AI-generated content from human-generated content.
In the world of medical devices, however, the output of a device needs to have well-characterised performance that gives a positive benefit:risk profile. The output of a medical device doesn’t just need to be “plausible”. Medical device regulators insist that the performance of devices is assessed rigorously against industry standards, using well-established statistical metrics such as sensitivity, specificity and accuracy. A core part of medical device development is a detailed risk analysis: identifying how harm might arise to patients, the hazards that might lead to that harm, and putting in place mitigations for significant risks. Furthermore, medical device regulations require not only that the performance of a device is assessed before it is put on the market, but also that performance and safety monitoring continues during clinical use.
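To illustrate what these metrics mean in practice, here is a minimal Python sketch; the function name and the case counts are hypothetical and purely for illustration. It computes sensitivity, specificity and accuracy from the confusion-matrix counts of a binary diagnostic validation study.

```python
# Minimal sketch with hypothetical numbers: the standard metrics regulators
# expect for a binary diagnostic device, computed from confusion-matrix counts.

def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity and accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)   # proportion of diseased cases correctly detected
    specificity = tn / (tn + fp)   # proportion of disease-free cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}

# Example: a hypothetical validation set of 1,000 scans.
print(diagnostic_metrics(tp=180, fp=40, tn=760, fn=20))
# -> {'sensitivity': 0.9, 'specificity': 0.95, 'accuracy': 0.94}
```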
While there is evidence that some AI-enabled medical imaging devices can work as well as or better than normal clinical practice, there is also evidence that the real-world performance of AI-enabled devices can be substantially worse than the performance measured during validation.
The COVID-19 pandemic was also associated with a surge of innovation in AI algorithms, but a systematic review in the British Medical Journal found that the great majority appeared to have very limited clinical value.
AI algorithms are not written by programmers like traditional rule-based algorithms; rather, they learn how to generate their output through a process called “training”. The performance of the trained algorithm is then “tested”, a process often called validation. A major challenge in the development of AI-enabled medical devices is that their performance is highly dependent on having well-labelled training and test data. For example, to automatically identify a stroke on a CT scan, the AI model needs to be trained on a large number of scans known to contain strokes, as well as scans without strokes, so that the algorithm can distinguish the two cases. For this to work well, the images need to be precisely labelled with the location of the stroke, which is a time-consuming process.

While hospitals have huge amounts of healthcare data, obtaining sufficient high-quality labelled data from representative patient populations is very often a significant challenge. As a result, many AI-enabled devices have been trained on publicly available data such as ADNI or UK Biobank, which have been criticised for being unrepresentative of true clinical populations. When an AI-enabled medical device is trained and tested on unrepresentative data, it is likely to perform much less well when subsequently applied in “real-world” clinical conditions. The data used needs to be representative not only of the patient population (demographics, co-morbidities etc) but also of the data source (e.g. type of scanner or electrophysiology device) and of clinical practice (e.g. how patients are positioned in a scanner, or how electrodes are placed for ECG or EEG). When the data used for training and/or testing is not representative, it is described as “biased”.
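As an illustration of one way developers try to keep validation honest – a sketch only, using scikit-learn and entirely synthetic data – the example below splits training and test data at the patient level, so that images from the same patient never appear in both sets. A naive image-level split can leak patient-specific features into the test set and overstate performance.

```python
# Minimal sketch, synthetic data: a patient-level train/test split so that no
# patient contributes images to both the training and the test set.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_images = 500
features = rng.normal(size=(n_images, 16))          # stand-in for image-derived features
labels = rng.integers(0, 2, size=n_images)          # stroke / no-stroke labels
patient_ids = rng.integers(0, 100, size=n_images)   # several images per patient

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(features, labels, groups=patient_ids))

# No patient appears in both sets.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```

The same idea extends to holding out entire hospitals or scanner types, which gives a more realistic estimate of how the device will behave on data sources it has never seen.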
Proactive Response from Medical Device Regulators.
Medical device regulators are aware of these issues, and have therefore been proactive in providing innovators and manufacturers with further guidance on how to develop and evaluate AI-enabled medical devices. This is especially the case with the US medical device regulator, the FDA’s Center for Devices and Radiological Health (CDRH).
Regulators are responding to some of the challenges in this field by requiring that AI-enabled medical devices undergo more rigorous clinical evaluation than medical devices containing traditional rule-based algorithms. A recent standard proposed by AAMI and BSI (BS/AAMI 34971:2023), now recognised as a consensus standard by the FDA, starts with a cautionary note:
Despite the sophistication and complicated methodologies employed, machine learning systems can introduce risks to safety by learning incorrectly, making wrong inferences, and then recommending or initiating actions that, instead of better outcomes, can lead to harm.
The amplification of errors in an AI system has the potential to create large-scale harm to patients.
Regulators are now encouraging companies developing AI-enabled medical devices to consider the particular risks arising from AI, including the risk of bias from unrepresentative training and test data.
Regulation always involves a trade-off between ensuring safety and enabling innovation, and striking the wrong balance can either prevent effective new technology from reaching patients or result in a flood of unsafe devices entering clinical practice.
But a clear regulatory framework can also support innovation, by providing clarity on the evidence needed to demonstrate that a medical device is safe and effective, and by giving healthcare providers confidence that devices cleared or approved by regulators have been properly evaluated, speeding up adoption.
As several systematic reviews have shown, the quality of the rapidly rising volume of academic publications on medical applications of AI is highly variable. Recent regulatory guidance documents should encourage good practice – such as training and testing algorithms on representative data, validating algorithms on a dataset that is independent of the data used for training, properly considering the clinical application of the technology, carefully managing any upgrade of a device while it is in clinical use, and ensuring appropriate human oversight.
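To make the idea of ongoing oversight concrete, here is a deliberately simplified Python sketch; the claimed sensitivity, the tolerance and the case counts are assumptions for illustration, not taken from any guidance document. It tracks real-world sensitivity on recently confirmed positive cases against the performance claimed at validation, and escalates to human review when performance drifts.

```python
# Simplified sketch, hypothetical thresholds and data: post-market monitoring of
# a deployed AI device, flagging it for clinical review if real-world sensitivity
# drops well below the sensitivity claimed in the pre-market validation.

VALIDATION_SENSITIVITY = 0.90   # performance claimed at validation (assumed)
DRIFT_TOLERANCE = 0.05          # acceptable drop before escalation (assumed)

def monitor_sensitivity(device_flagged: list[bool]) -> None:
    """Each entry is True if the device correctly flagged a confirmed positive case."""
    if not device_flagged:
        return
    observed = sum(device_flagged) / len(device_flagged)
    if observed < VALIDATION_SENSITIVITY - DRIFT_TOLERANCE:
        print(f"ALERT: observed sensitivity {observed:.2f} is below the claimed "
              f"{VALIDATION_SENSITIVITY:.2f}; escalate for clinical review.")
    else:
        print(f"OK: observed sensitivity {observed:.2f} is within tolerance.")

# Example: 40 confirmed positive cases from recent clinical use, 32 correctly flagged.
monitor_sensitivity([True] * 32 + [False] * 8)   # observed 0.80 -> triggers an alert
```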
Are AI-enabled Medical Devices Going to Disrupt Healthcare?
An examination of the published information about AI-enabled medical devices that have been put on the market in the US suggests that the great majority of these devices are unlikely to have a significant impact on clinical workflows or professional practice. That is because the indications for use of these devices tend to emphasise that they need to be used under the supervision of clinical experts, with statements such as: “not intended to be used as a primary diagnostic device”, “notified clinicians are ultimately responsible for reviewing full images per the standard of care”, and to be used “in parallel with standard of care”.
These indications for use are caveated in this way because manufacturers have not generated adequate performance data to show that these AI-enabled devices can operate independently of normal clinical pathways, or without clinical supervision. Medical device regulators therefore insist that manufacturers mitigate the risk of inadequate performance by caveating, in the device labelling, the way the devices can be used.
Current AI-enabled medical devices are therefore unlikely to prove disruptive to healthcare. But with the huge increase in investment in AI, by small and large companies alike, it is very likely that disruptive technologies will come onto the market.
Conclusions
Artificial intelligence technologies are undergoing rapid development and attracting huge amounts of investment. Ever-increasing numbers of AI-enabled devices are coming onto the market. However, the requirement imposed by medical device regulators that manufacturers carefully assess the performance of devices before putting them on the market means that the impact of current AI-enabled devices is likely to be more incremental than disruptive to healthcare systems in the near future.
Regulators have been careful to insist that the risks of AI are mitigated in devices put on the market, and they are being proactive in providing manufacturers with updated guidance as the risks (as well as benefits) of AI in healthcare applications emerge. While the need for more rigorous assessment of AI-enabled devices than of traditional software devices does slow AI reaching the market, it is reassuring given the evidence that AI algorithms are particularly at risk of performing less well in a real-world setting than they did during testing. These evolving regulatory frameworks should mean that, as more sophisticated AI-driven technologies are developed, their testing and clinical implementation is done in a way that ensures human oversight and rigorous, ongoing testing in realistic clinical conditions.