Diabetes mellitus affects approximately 1 in 9 adults worldwide, with an estimated 589 million individuals living with the disease in 2024 and projections reaching 853 million by 2050; an estimated 43% remain undiagnosed.1,2 Diabetic retinopathy (DR) affects approximately 30% of individuals with diabetes and remains a leading cause of preventable blindness among working-age adults.3,4 Although health authorities recommend regular diabetic eye screening, the global ophthalmic workforce is insufficient to meet demand. An estimated 233,000 ophthalmologists across 194 countries serve this population, with the majority practicing in high-income countries.1,5,6 Approximately 80% of people with diabetes reside in low- and middle-income countries, where systematic screening programs are often limited or unavailable.1
These structural barriers have accelerated interest in automated retinal image analysis systems (ARIAS) powered by artificial intelligence (AI) as a scalable approach to diabetic eye screening.7 Several systems have achieved regulatory authorization, including 3 cleared by the US Food and Drug Administration (FDA) and multiple CE-marked platforms in Europe.8,9 However, regulatory clearance does not necessarily ensure readiness for real-world deployment. Here, we review current evidence supporting AI-based DR screening and outline key considerations for safe, equitable implementation.
Figure 1. In real-world settings, automated retinal image analysis systems powered by artificial intelligence must demonstrate high performance in detecting high-risk disease, as illustrated by this example of proliferative diabetic retinopathy.
Screening Programs
The English National Health Service (NHS) Diabetic Eye Screening Programme (DESP), launched in 2003, provides a model of what systematic screening involves and what it can achieve. By 2017 to 2018, the program screened 2.2 million people annually (82.7% uptake) using 2-field, 45° mydriatic digital photography graded by up to 3 trained human graders (primary, secondary, and tertiary) within a hierarchical quality assurance system.10,11 By 2009 to 2010, DR was no longer the leading cause of certifiable blindness among working-age adults in England for the first time in at least 5 decades, based on national certification records.4,10
Population screening comes at a cost, requiring either dilated examination by an ophthalmologist or centralized remote grading of retinal images by trained human assessors for millions of people with diabetes each year. An independent evaluation of 20,258 consecutive screening episodes in the English DESP showed that automated retinal image analysis systems (ARIAS) can cost-effectively replace first-level human graders, with all disease-positive cases proceeding to human assessment and a random 10% of negative cases reviewed for quality assurance.7
In Scotland, a deep learning system evaluated retrospectively over 11 years demonstrated the potential to reduce manual grading workload to approximately 44% of episodes while maintaining 96.6% sensitivity for observable disease.12 Singapore has deployed the SELENA+ system nationally within a semiautomated triage model, with economic modeling estimating a 20% reduction in screening costs.13,14 In the United States, which has no centralized whole-population screening program, fewer than two-thirds of adults adhere to recommended diabetic eye screening, with disparities by insurance status, ethnicity, and socioeconomic status.14,15
These examples illustrate that the organizational infrastructure of a centralized screening program, not just image acquisition, is a prerequisite for sustained reductions in blindness. Nationwide population screening programs exist in Iceland (since 1980) and across the United Kingdom, with regional programs in Sweden, Finland, and several European countries;16 however, aside from Scotland’s early adoption of automated triage in 2011, widespread integration of ARIAS into these programs has not yet occurred.
AI Screening
In 2016, Gulshan et al demonstrated that a deep convolutional neural network could detect referable DR with an area under the receiver operating characteristic curve of 0.991 on the EyePACS-1 validation set.17 Ting et al validated a deep learning system in the Singapore National DR screening program and 10 multiethnic cohorts.18
Three systems currently have FDA clearance: LumineticsCore (Digital Diagnostics), EyeArt (Eyenuk, Inc), and AEYE-DS (AEYE Health). The first clearance was achieved by Abràmoff et al in 2018 in a prospective pivotal trial of 900 primary care patients, which reported 87.2% sensitivity and 90.7% specificity for detection of more-than-mild diabetic retinopathy.19 The study was designed for point-of-care screening in primary care settings, where the prevalence of advanced disease in a sample of this size would typically be low. EyeArt received FDA clearance in 2020 with 95.5% sensitivity20 and was subsequently evaluated prospectively in 30,000 consecutive DESP patients across 3 UK screening centers.21 In Thailand, a prospective multisite study demonstrated that a deep learning system could match retina specialist performance in real-time community screening, although 8% to 39% of images were rejected based on quality thresholds, with poor lighting, connectivity limitations, and workflow disruption affecting the screening experience.22
In a landscape where multiple ARIAS are commercially available, understanding workflows, reference standards, and the populations in which each system was validated is essential to objectively assess performance and guide deployment in specific clinical contexts.8
Head-to-Head Validation
Differences in population, reference standard, and image acquisition across studies limit the reliability of cross-study comparisons. Head-to-head evaluation on the same dataset provides the most informative comparison.23,24
A head-to-head comparison of 7 algorithms (anonymized in the publication) on 311,604 retinal images from 23,724 veterans across 2 US Veterans Affairs centers found sensitivities ranging widely from 50.98% to 85.90%.25 Most algorithms performed no better than human graders, and one showed significantly lower sensitivity for proliferative DR (74.42%, P<.001).25
A comprehensive independent evaluation tested 8 CE-marked ARIAS (named in the publication) on 201,438 routine consecutive screening encounters from a curated set of 1.2 million images from the North East London DESP.9 Informed by evidence that ethnic disparities exist in DR progression rates,26,27 the study was powered to ensure equitable precision across ethnic groups for detection of rare but sight-threatening disease, including 1,978 encounters with proliferative DR (R3).9 The English DESP quality assurance standard requires human grader sensitivity above 85% for referable DR; there is no separate standard for R3. In this cohort, primary human graders correctly identified 96.4% of active R3 cases. Sensitivity for ARIAS detection of R3 ranged from 95.8% to 99.5%, and for moderate-to-severe nonproliferative DR from 96.7% to 99.8%. Results were largely consistent across age, sex, ethnicity, and socioeconomic deprivation, with within-ARIAS variation typically less than 2%. Sensitivity for referable DR overall ranged from 83.7% to 98.7%.9
False-positive rates for encounters with no observable DR varied across vendors from 4.3% to 61.4%, and overall screen-positive rates ranged from 23% to 74%.9 Applied to the 2.2 million people screened annually in England (65.4% with no observable DR and 12.2% with referable DR), the best-performing system would remove approximately 1.7 million encounters from the human grading queue (77%), whereas the worst would remove 572,000 (26%). ARIAS-based triage could save the NHS £8 million to £10 million per year.7,9 In the implementation model evaluated in these UK studies, ARIAS served as a first-pass triage system: all ARIAS-positive cases proceeded to human grading, and a proportion of ARIAS-negative cases were also reviewed for quality assurance.7
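The workload arithmetic above can be checked with a short back-of-envelope sketch (the screen-positive rates and annual volume are taken from the cited UK studies; the function name is illustrative, and the quality-assurance sample of negative cases is ignored for simplicity):

```python
# Back-of-envelope ARIAS triage arithmetic using the English DESP figures
# cited above. Assumption: every ARIAS-negative encounter is removed from
# the human grading queue (the 10% QA sample is ignored here).

ANNUAL_SCREENS = 2_200_000  # people screened annually in England


def encounters_removed(screen_positive_rate: float,
                       total: int = ANNUAL_SCREENS) -> int:
    """Encounters diverted from human grading when ARIAS triages first."""
    return round(total * (1 - screen_positive_rate))


best = encounters_removed(0.23)   # best-performing system: 23% screen-positive
worst = encounters_removed(0.74)  # worst-performing system: 74% screen-positive

print(f"Best:  {best:,} removed ({best / ANNUAL_SCREENS:.0%})")
print(f"Worst: {worst:,} removed ({worst / ANNUAL_SCREENS:.0%})")
```

Run as written, this reproduces the approximately 1.7 million (77%) and 572,000 (26%) figures quoted in the text.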
Variation in DR prevalence (from 20% in Southeast Asia to 27% in the Middle East and North Africa)1 affects the balance between workload reduction and referral volume for any given ARIAS. Although pooled estimates from meta-analyses (82 studies, 887,244 examinations, 25 devices, 28 countries) suggest overall diagnostic robustness,28 evaluations in target populations and real-world workflows prior to implementation provide the most direct evidence.8,28,29
Fairness
Equitable performance across populations is a prerequisite for AI in screening, not an aspiration. In other clinical domains, image-based algorithms have been shown to perform less well on more pigmented skin.30 Ophthalmic AI development faces similar risks of algorithmic bias. Publicly available retinal datasets are concentrated in a few regions, fewer than 20% include patient demographics, and nearly half of the global population has no representative imaging data.31 Ethnic disparities in DR progression rates further compound these concerns.26,27
Demographic labels are imprecise descriptors of biological variation in retinal imaging. The retinal pigment score (RPS) is a continuous metric that quantifies background retinal pigmentation from fundus photographs and has been validated in the UK Biobank and EPIC-Norfolk cohorts.32 A genome-wide association study of RPS identified 20 loci associated with skin and hair pigmentation, supporting its biological validity. Although RPS correlates with self-reported ethnicity, scores overlap substantially between groups, confirming that fundus pigmentation exists on a biological continuum rather than as a categorical trait defined by ethnicity.32 RPS provides a practical tool for describing pigmentation diversity in retinal image datasets and assessing AI performance across this spectrum, complementing demographic reporting.
Translating these findings into safe deployment requires agreed methodology, regulatory standards, and stakeholder engagement. A replicable, vendor-neutral framework for independent AI evaluation within a national screening program has been described, including standardized assessment of fairness across population subgroups, and is transferable to other settings.29 On behalf of the UK National Screening Committee, existing screening criteria have been mapped to the evidence requirements for AI integration, identifying that comparative accuracy data from diverse populations and prospective implementation studies are lacking for most commercial systems.23 Surveys of patients and practitioners in both the NHS DESP and low-resource settings have found support for AI in screening, conditional on retention of human oversight and clear communication of results.33,34
Future Directions
Beyond diagnostic accuracy, trial-level evidence indicates that AI screening can improve care delivery. A randomized controlled trial of point-of-care AI screening in youth with diabetes achieved 100% completion of eye examinations, compared with 22% for standard referral to an eyecare provider.35 AI-based screening has also been shown to improve follow-up uptake36 and increase access in underserved populations.37 Retinal biomarkers may further extend the role of fundus imaging beyond DR to broader systemic risk assessment.
As AI makes screening scalable in settings where population-level programs previously did not exist, treatment capacity must keep pace. The Wilson and Jungner criteria specify that facilities to diagnose and treat detected disease should be in place before screening is implemented. Screening without a functioning referral pathway risks identifying sight-threatening disease that cannot be treated within recommended timelines. Conversely, ARIAS may themselves free capacity within the system by reducing physician workload in low-risk screening and removing image grading burdens from nonmedical health professionals, allowing staff time to be redirected toward diabetes diagnosis, retinal image acquisition, or patient counseling. Practical barriers to implementation—including electronic health record integration, operator training, image quality assurance, and sustainable reimbursement—remain key determinants of successful adoption and require attention alongside clinical evidence.38
Conclusion
The evidence supports integrating ARIAS into established screening programs as a first-pass triage before human grading. Ensuring that algorithms do not perpetuate or amplify existing inequities in eye care is not a secondary consideration but a design requirement. Objective metrics such as retinal pigment score (RPS), transparent reporting of performance across subgroups, and independent population-level validation are practical steps through which this can be achieved and trust can be built among patients, clinicians, and policymakers. RP
References
1. International Diabetes Federation. IDF Diabetes Atlas 11th Edition. 2025. Accessed April 13, 2026. https://diabetesatlas.org/atlas/
2. Sun H, Saeedi P, Karuranga S, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119. doi:10.1016/j.diabres.2021.109119
3. Teo ZL, Tham YC, Yu M, et al. Global prevalence of diabetic retinopathy and projection of burden through 2045: systematic review and meta-analysis. Ophthalmology. 2021;128(11):1580-1591. doi:10.1016/j.ophtha.2021.04.027
4. Liew G, Michaelides M, Bunce C. A comparison of the causes of blindness certifications in England and Wales in working age adults (16-64 years), 1999-2000 with 2009-2010. BMJ Open. 2014;4(2):e004015. doi:10.1136/bmjopen-2013-004015
5. Resnikoff S, Lansingh VC, Washburn L, et al. Estimated number of ophthalmologists worldwide (International Council of Ophthalmology update): will we meet the needs? Br J Ophthalmol. 2020;104(4):588-592. doi:10.1136/bjophthalmol-2019-314336
6. Burton MJ, Ramke J, Marques AP, et al. The Lancet Global Health Commission on Global Eye Health: vision beyond 2020. Lancet Glob Health. 2021;9(4):e489-e551. doi:10.1016/S2214-109X(20)30488-5
7. Tufail A, Kapetanakis VV, Salas-Vega S, et al. An observational study to assess if automated diabetic retinopathy image assessment software can replace one or more steps of manual imaging grading and to determine their cost-effectiveness. Health Technol Assess. 2016;20(92):1-72. doi:10.3310/hta20920
8. Rajesh AE, Davidson OQ, Lee CS, Lee AY. Artificial intelligence and diabetic retinopathy: AI framework, prospective studies, head-to-head validation, and cost-effectiveness. Diabetes Care. 2023;46(10):1728-1739. doi:10.2337/dci23-0032
9. Rudnicka AR, Shakespeare R, Chambers R, et al. Automated retinal image analysis systems to triage for grading of diabetic retinopathy: a large-scale, open-label, national screening programme in England. Lancet Digit Health. 2025;7(11):100914. doi:10.1016/j.landig.2025.100914
10. Scanlon PH. The contribution of the English NHS Diabetic Eye Screening Programme to reductions in diabetes-related blindness, comparisons within Europe, and future challenges. Acta Diabetol. 2021;58(4):521-530. doi:10.1007/s00592-021-01687-w
11. Scanlon PH. The English National Screening Programme for diabetic retinopathy 2003-2016. Acta Diabetol. 2017;54(6):515-525. doi:10.1007/s00592-017-0974-1
12. Fleming AD, Mellor J, McGurnaghan SJ, et al. Deep learning detection of diabetic retinopathy in Scotland’s diabetic eye screening programme. Br J Ophthalmol. 2024;108(7):984-988. doi:10.1136/bjo-2023-323395
13. Xie Y, Nguyen QD, Hamzah H, et al. Artificial intelligence for teleophthalmology-based diabetic retinopathy screening in a national programme: an economic analysis modelling study. Lancet Digit Health. 2020;2(5):e240-e249. doi:10.1016/S2589-7500(20)30060-1
14. Gunasekeran DV, Miller S, Hsu W, et al. National use of artificial intelligence for eye screening in Singapore. NEJM AI. 2024;1(12):AIcs2400404. doi:10.1056/AIcs2400404
15. Eppley SE, Mansberger SL, Ramanathan S, Lowry EA. Characteristics associated with adherence to annual dilated eye examinations among US patients with diagnosed diabetes. Ophthalmology. 2019;126(11):1492-1499. doi:10.1016/j.ophtha.2019.05.033
16. Huemer J, Wagner SK, Sim DA. The evolution of diabetic retinopathy screening programmes: a chronology of retinal photography from 35 mm slides to artificial intelligence. Clin Ophthalmol. 2020;14:2021-2035. doi:10.2147/OPTH.S261629
17. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410. doi:10.1001/jama.2016.17216
18. Ting DSW, Cheung CYL, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318(22):2211-2223. doi:10.1001/jama.2017.18152
19. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39. doi:10.1038/s41746-018-0040-6
20. Ipp E, Liljenquist D, Bode B, et al. Pivotal evaluation of an artificial intelligence system for autonomous detection of referrable and vision-threatening diabetic retinopathy. JAMA Netw Open. 2021;4(11):e2134254. doi:10.1001/jamanetworkopen.2021.34254
21. Heydon P, Egan C, Bolter L, et al. Prospective evaluation of an artificial intelligence-enabled algorithm for automated diabetic retinopathy screening of 30,000 patients. Br J Ophthalmol. 2021;105(5):723-728. doi:10.1136/bjophthalmol-2020-316594
22. Ruamviboonsuk P, Tiwari R, Sayres R, et al. Real-time diabetic retinopathy screening by deep learning in a multisite national screening programme: a prospective interventional cohort study. Lancet Digit Health. 2022;4(4):e235-e244. doi:10.1016/S2589-7500(22)00017-6
23. Macdonald T, Zhelev Z, Liu X, et al. Generating evidence to support the role of AI in diabetic eye screening: considerations from the UK National Screening Committee. Lancet Digit Health. 2025;7(5):100840. doi:10.1016/j.landig.2024.12.004
24. Cleland CR, Tufail A, Egan C, et al. Independent and openly reported head-to-head comparative validation studies of AI medical devices: a necessary step towards safe and responsible clinical AI deployment. Lancet Digit Health. 2025;7(11):100915. doi:10.1016/j.landig.2025.100915
25. Lee AY, Yanagihara RT, Lee CS, et al. Multicenter, head-to-head, real-world validation study of seven automated artificial intelligence diabetic retinopathy screening systems. Diabetes Care. 2021;44(5):1168-1175. doi:10.2337/dci21-0007
26. Olvera-Barrios A, Owen CG, Anderson J, et al. Ethnic disparities in progression rates for sight-threatening diabetic retinopathy in diabetic eye screening: a population-based retrospective cohort study. BMJ Open Diabetes Res Care. 2023;11(6):e003683. doi:10.1136/bmjdrc-2023-003683
27. Olvera-Barrios A, Rudnicka AR, Anderson J, et al. Two-year recall for people with no diabetic retinopathy: a multi-ethnic population-based retrospective cohort study using real-world data to quantify the effect. Br J Ophthalmol. 2023;107(12):1839-1845. doi:10.1136/bjo-2023-324097
28. Wang TW, Luo WT, Tu YK, Chou YB, Wu YT. Systematic review and meta-analysis of regulator-approved deep learning systems for fundus diabetic retinopathy detections. NPJ Digit Med. 2025;9:110. doi:10.1038/s41746-025-02223-8
29. Fajtl J, Welikala RA, Barman S, et al. Trustworthy evaluation of clinical AI for analysis of medical images in diverse populations. NEJM AI. 2024;1(9). doi:10.1056/AIoa2400353
30. Buolamwini J, Gebru T. Gender shades: intersectional accuracy disparities in commercial gender classification. In: Friedler SA, Wilson C, eds. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, vol 81. PMLR; 2018:77-91.
31. Jacoba CMP, Celi LA, Lorch AC, et al. Bias and non-diversity of big data in artificial intelligence: focus on retinal diseases. Semin Ophthalmol. 2023;38(5):433-441. doi:10.1080/08820538.2023.2168486
32. Rajesh AE, Olvera-Barrios A, Warwick AN, et al. Machine learning derived retinal pigment score from ophthalmic imaging shows ethnicity is not biology. Nat Commun. 2025;16(1):60. doi:10.1038/s41467-024-55198-7
33. Wahlich C, Chandrasekaran L, Chaudhry UAR, et al. Patient and practitioner perceptions around use of artificial intelligence within the English NHS diabetic eye screening programme. Diabetes Res Clin Pract. 2025;219:111964. doi:10.1016/j.diabres.2024.111964
34. Mathenge W, Whitestone N, Nkurikiye J, et al. Impact of artificial intelligence assessment of diabetic retinopathy on referral service uptake in a low-resource setting: the RAIDERS randomized trial. Ophthalmol Sci. 2022;2(4):100168. doi:10.1016/j.xops.2022.100168
35. Wolf RM, Channa R, Liu TYA, et al. Autonomous artificial intelligence increases screening and follow-up for diabetic retinopathy in youth: the ACCESS randomized control trial. Nat Commun. 2024;15(1):421. doi:10.1038/s41467-023-44676-z
36. Rahmati M, Smith L, Piyasena MP, et al. Artificial intelligence improves follow-up appointment uptake for diabetic retinal assessment: a systematic review and meta-analysis. Eye (Lond). 2025;39(12):2398-2406. doi:10.1038/s41433-025-03849-4
37. Huang JJ, Channa R, Wolf RM, et al. Autonomous artificial intelligence for diabetic eye disease increases access and health equity in underserved populations. NPJ Digit Med. 2024;7(1):196. doi:10.1038/s41746-024-01197-3
38. Tran J, Estevez JJ, Howard NJ, Kumar S. Barriers and enablers influencing the implementation of artificial intelligence for diabetic retinopathy screening in clinical practice: a scoping review. Clin Experiment Ophthalmol. 2025;53(7):791-802. doi:10.1111/ceo.14567