RetinalGPT: Multimodal Large Language Model for Retinal Image Analysis

Invention Description

Multimodal large language models (MLLMs) have shown strong potential in analyzing complex data types such as images, video, and audio, prompting growing interest in their use for medical applications. While several general-domain MLLMs have been adapted for healthcare tasks, including retinal imaging, their performance remains limited when applied to clinically meaningful interpretation. In particular, existing models struggle to provide the quantitative analysis that medical experts rely on for accurate disease detection and assessment. This reveals a critical gap between general-purpose MLLMs and the specialized requirements of medical diagnostics, where precision, interpretability, and domain knowledge are essential. Bridging this gap is necessary for deploying MLLMs as reliable tools in clinical decision-making.

Researchers at Arizona State University have developed RetinalGPT, a multimodal conversational assistant designed specifically for clinically preferred quantitative analysis of retinal images. RetinalGPT is a vision-language model tailored to retinal imaging that combines large-scale retinal image datasets with a two-stage training process. The two stages align generic medical knowledge with specialized retinal diagnostics, which improves detection of retinal diseases and supports detailed lesion and vascular analyses. Beyond classification, the model provides quantitative measurements and lesion localization, improving interpretability and clinical relevance.

RetinalGPT is a multimodal large language model designed to improve retinal disease diagnosis and lesion localization through quantitative retinal image analysis.

Potential Applications

Clinical diagnosis and monitoring of retinal diseases
Automated lesion detection and localization tools in ophthalmology
Medical research in ophthalmology and retinal pathology
Development of AI-assisted diagnostic platforms for eye care
Integration in telemedicine platforms for remote retinal analyses

Benefits and Advantages

Detailed lesion localization and vascular structure analysis capabilities
Comprehensive processing of clinical features including disease labels, lesion bounding boxes, and vascular characteristics
Two-stage training that balances generic medical knowledge with retinal domain expertise
Improved interpretability in medical image analysis for clinical research
Specialized training dataset curated for clinical preferences in retinal analysis
Maintains broad medical knowledge while enhancing retinal-specific expertise
Superior performance across multiple benchmark datasets for retinal disease diagnosis

For more information about this opportunity, please see:

Zhu et al. – arXiv – 2025
