A guide to understanding and evaluating studies on AI-assisted medical image analysis
New technologies based on deep learning, a subfield of artificial intelligence (AI), are gradually being integrated into clinical practice, particularly for the diagnostic analysis of medical images. Ensuring that these diagnostic tools are effective and reliable requires rigorous evaluation of how the models are developed and assessed. However, the complexity of the AI literature often makes it difficult for clinicians, most of whom currently have little training in AI, to understand and critique these studies. To meet this need, Prof. Jérémie Cohen, in collaboration with an international working group, has published a reading guide in the British Medical Journal. The guide is intended to help clinicians, researchers and public health decision-makers assess the quality of studies that use deep learning to analyze diagnostic images.
In the first part, the authors explain in an accessible way the fundamental principles of convolutional neural networks, the models most widely used in deep learning-based medical devices for image analysis. This gives readers the keys to understanding published studies in the field.
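As a rough illustration of the core operation behind a convolutional neural network (this sketch is not taken from the guide), the Python snippet below slides a small filter over a toy image to produce a feature map, then applies a ReLU activation. The function names `convolve2d` and `relu`, the 4x4 image and the hand-picked edge filter are all assumptions made for this example; in a real network the filter values are learned from training data, and many filters are stacked in layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(convolve2d(image, kernel))
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the filter straddles the edge, which is the sense in which a filter "detects" a local pattern in the image.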
In the second part, the authors propose a 20-point grid for the critical reading of articles, organized around several key themes, each addressing a specific challenge of using deep learning in medicine. Important elements of the grid include the clinical relevance of the technology (relevance to the targeted pathology and integration into the care pathway), the quality of the data used to train the models, and the robustness of the reference test. The authors also stress the importance of external validation, that is, evaluating models on real clinical data never encountered during training, and the need to compare model performance with that of diagnostic tools already in routine use.
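To make the idea of external validation concrete, here is a minimal Python sketch, not drawn from the guide, that computes sensitivity and specificity against a reference test on two datasets. The function name `diagnostic_accuracy` and all labels below are invented for illustration only; they do not come from any of the studies discussed.

```python
def diagnostic_accuracy(reference, predicted):
    """Return sensitivity and specificity of binary predictions.

    `reference` holds the reference test results (1 = disease, 0 = no disease)
    and `predicted` holds the model outputs for the same cases.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if ref == 1 and pred == 1:
            tp += 1  # true positive
        elif ref == 1 and pred == 0:
            fn += 1  # false negative (missed case)
        elif ref == 0 and pred == 0:
            tn += 1  # true negative
        else:
            fp += 1  # false positive (false alarm)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented example: held-out cases from the development site...
internal_reference = [1, 1, 1, 1, 0, 0, 0, 0]
internal_predicted = [1, 1, 1, 1, 0, 0, 0, 1]

# ...and cases from a different site, never seen during training.
external_reference = [1, 1, 1, 1, 0, 0, 0, 0]
external_predicted = [1, 1, 0, 0, 0, 0, 1, 1]

internal = diagnostic_accuracy(internal_reference, internal_predicted)
external = diagnostic_accuracy(external_reference, external_predicted)
print("internal:", internal)
print("external:", external)
```

In this made-up case the model looks strong on internal data and much weaker on external data, which is the kind of gap that external validation is meant to reveal. The same function could be applied to the results of a diagnostic tool already in routine use, to compare it with the model on the same cases.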
Finally, the authors illustrate the points of their reading grid with four clinical studies using deep learning in pediatrics: detection of otitis on otoscopic images, detection of fractures and pneumonia on X-rays, and automatic analysis of facial photographs to help diagnose certain genetic diseases.
Beyond the methodological aspects, the researchers point out that the adoption of AI in medicine must be based on rigorous ethical standards. In particular, it is essential to ensure that AI does not exacerbate inequalities in access to care or increase diagnostic errors, both of which are potential risks when certain populations are under-represented in the datasets used to train the models.
The researchers thus propose a widely accessible reading guide, so that clinicians and reviewers of studies evaluating the performance of deep learning models can familiarize themselves with this new and often complex literature.
By Jérémie Cohen
Link to the full article: https://www.bmj.com/content/387/bmj-2023-076703