Publication: A guide to understanding and evaluating studies on AI-assisted medical image analysis

Published on 11 December 2024

Modified on 11 December 2024

New technologies based on deep learning, a subfield of artificial intelligence (AI), are gradually being integrated into clinical practice, particularly for the analysis of medical images for diagnostic purposes. Ensuring that these diagnostic tools are effective and reliable requires rigorous evaluation of the methods used to develop and assess the models. However, the complexity of the AI literature often makes it difficult for clinicians, most of whom currently have little training in AI, to understand and critique these studies. To meet this need, Prof. Jérémie Cohen, in collaboration with an international working group, has published a reading guide in the British Medical Journal. The guide is intended to help clinicians, researchers and public health decision-makers assess the quality of studies using deep learning to analyze diagnostic images.

In the first part, the authors explain in accessible terms the fundamental principles of convolutional neural networks, the AI models most widely used in medical devices that rely on deep learning for image analysis. This gives readers the keys to understanding published studies in the field.

In the second part, the authors propose a 20-point grid for the critical reading of articles, organized around several key themes, each addressing a specific challenge of using deep learning in medicine. Important elements of the grid include the clinical relevance of the technology (relevance to the targeted pathology and integration into the care pathway), the quality of the data used to train the models, and the robustness of the reference test. The authors also stress the importance of external validation, that is, evaluating models on real clinical data never encountered during training, and the need to compare model performance with that of diagnostic tools already in routine use.

Finally, the authors illustrate the different points of their reading grid through four clinical studies using deep learning in pediatrics: detection of otitis on otoscopic images, detection of fractures and of pneumonia on X-rays, and automatic analysis of facial photographs to help diagnose certain genetic diseases.

Beyond the methodological aspects, the researchers point out that the adoption of AI in medicine must be based on rigorous ethical standards. In particular, it is essential to ensure that AI does not exacerbate inequalities in access to care or increase diagnostic errors, both of which are potential risks if certain populations are under-represented in the databases used to train the models.

The researchers thus propose a widely accessible reading guide, so that clinicians and reviewers of studies evaluating the performance of deep learning models can familiarize themselves with this new and often complex literature.

By Jérémie Cohen

Link to the full article: https://www.bmj.com/content/387/bmj-2023-076703

Members

Jeremie Cohen

Professor

OPPaLE
