
COPE: Chain-of-Thought Prediction Engine for Open-Source Large Language Model-Based Stroke Outcome Prediction from Clinical Notes

Development and Evaluation of the Chain-of-Thought Outcome Prediction Engine (COPE)

Predicting post-stroke outcomes is a complex and critical task in the management of patients with acute ischemic stroke (AIS). The Chain-of-Thought Outcome Prediction Engine (COPE) is a reasoning-based framework designed to predict 90-day functional outcomes directly from unstructured clinical notes.

Methods

To evaluate COPE, researchers included a cohort of 464 AIS patients, using their discharge summaries and 90-day modified Rankin Scale (mRS) scores. COPE uses a two-stage Chain-of-Thought (CoT) framework built on two sequential open-source LLaMA-3-8B models. The first stage generates intermediate clinical reasoning from the discharge summary, and the second stage predicts the mRS outcome. COPE was benchmarked against four alternatives: GPT-4.1, ClinicalBERT, a machine learning model trained on structured variables (XGBoost), and a single-stage large language model (LLM) without CoT.
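The two-stage design described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the prompt wording, the names `predict_mrs` and `parse_mrs`, and the `generate` callable (standing in for any text-generation backend, such as a locally hosted LLaMA-3-8B) are all hypothetical. It also assumes that the second stage sees only the generated reasoning, which the summary does not specify.

```python
import re
from typing import Callable

# Hypothetical prompt templates; the study's actual prompts are not given here.
REASONING_PROMPT = (
    "Read the discharge summary below and explain, step by step, the factors "
    "likely to affect this patient's functional status 90 days after stroke.\n\n"
    "Discharge summary:\n{note}\n\nReasoning:"
)
PREDICTION_PROMPT = (
    "Clinical reasoning:\n{reasoning}\n\n"
    "Based on the reasoning above, give the predicted 90-day modified Rankin "
    "Scale score as a single integer from 0 to 6.\n\nmRS:"
)


def parse_mrs(text: str) -> int:
    """Return the first standalone digit 0-6 found in the model output."""
    match = re.search(r"(?<!\d)[0-6](?!\d)", text)
    if match is None:
        raise ValueError(f"no mRS score found in: {text!r}")
    return int(match.group())


def predict_mrs(note: str, generate: Callable[[str], str]) -> tuple[str, int]:
    """Two-stage prediction: generate reasoning first, then the mRS score.

    `generate` is an assumed interface: it takes a prompt string and
    returns the model's text completion.
    """
    # Stage 1: intermediate clinical reasoning from the raw note.
    reasoning = generate(REASONING_PROMPT.format(note=note))
    # Stage 2: mRS prediction conditioned on the stage-1 reasoning.
    answer = generate(PREDICTION_PROMPT.format(reasoning=reasoning))
    return reasoning, parse_mrs(answer)
```

Returning the reasoning alongside the score keeps the intermediate step available for clinician review, which is the main practical difference from a single-stage model that emits only a number.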

Model performance was assessed using three metrics: mean absolute error (MAE), accuracy within ±1 mRS point (±1 ACC), and exact accuracy (ACC). MAE measures how far predictions fall from the observed score on average, ±1 ACC counts predictions that land within one mRS level of the observed score, and ACC counts only exact matches.
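As a concrete illustration of these three metrics, the sketch below computes them from paired lists of observed and predicted mRS scores. The function name `evaluate` and the dictionary keys are illustrative choices, not taken from the study.

```python
def evaluate(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute MAE, accuracy within +/-1 mRS point, and exact accuracy."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    # Absolute difference between observed and predicted mRS per patient.
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "mae": sum(errors) / n,
        "acc_within_1": sum(e <= 1 for e in errors) / n,
        "acc_exact": sum(e == 0 for e in errors) / n,
    }


# Toy example with five patients (made-up scores, not study data).
metrics = evaluate([0, 2, 3, 5, 6], [1, 2, 5, 5, 3])
print(metrics)
```

In the toy example the absolute errors are 1, 0, 2, 0, and 3, so the MAE is 1.2, three of five predictions fall within one point, and two of five are exact.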

Results

COPE achieved an MAE of 1.01 (95% CI 0.92 to 1.11), a ±1 ACC of 74.4% (95% CI 69.9% to 78.8%), and an exact accuracy of 32.8% (95% CI 28.0% to 37.6%). These results are comparable to those of GPT-4.1, which achieved an MAE of 1.00 (95% CI 0.90 to 1.10) and a ±1 ACC of 77.9% (95% CI 73.7% to 82.0%). COPE also performed comparably to a strong structured-data baseline, XGBoost (MAE 0.89), and outperformed both ClinicalBERT and the single-stage LLM.
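This summary does not say how the 95% confidence intervals were obtained. One common option for metrics like these is a percentile bootstrap over patients, sketched below purely as an assumption. The function name `bootstrap_ci` and its defaults (2000 resamples, fixed seed) are hypothetical.

```python
import random


def bootstrap_ci(
    values: list[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Pass per-patient absolute errors to get an interval for MAE, or
    per-patient 0/1 hit indicators to get an interval for an accuracy.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample patients with replacement and record the mean each time.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

Because the interval is built from resampled means, the same helper works for all three reported metrics without a separate formula for each.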

Subgroup analysis indicated that COPE's performance was broadly consistent across sex and age groups. Prediction errors were nonetheless somewhat higher in older patients, in those who underwent thrombectomy, and in those with longer clinical summaries, which points to areas for refinement in future iterations of the model.

Conclusions

The COPE framework shows that lightweight open-source LLMs can be applied effectively to medical outcome prediction. COPE achieves performance on par with a proprietary model and a strong traditional baseline, without model retraining or manual feature engineering. Because it runs on open-source models, it offers an accurate and privacy-preserving way to extract actionable insights from unstructured clinical text.

For more detailed information, see the full study.
