
Optimizing large language models for clinical information extraction: a benchmarking study in the context of ulcerative colitis research

Exploring Optimal Adaptation Strategies for Open-Source Large Language Models in Clinical Information Extraction

In recent years, large language models (LLMs) have been applied increasingly to clinical information extraction, where they show promise for processing complex medical text. Closed-source models such as Generative Pre-trained Transformer 4o (GPT-4o) have been at the forefront, demonstrating powerful capabilities. However, concerns about cost, data security, and adaptability have spurred interest in open-source alternatives. This study sets out to identify effective adaptation strategies for open-source models and to compare their performance with that of a closed-source counterpart.

Methods and Analysis

The study evaluated three primary LLM adaptation strategies: chain-of-thought prompting, few-shot prompting, and fine-tuning. These strategies were used to extract the Mayo Endoscopic Subscore (MES) from colonoscopy procedure reports, drawing on annotated datasets from the University of California, San Francisco (N=608) and San Francisco General Hospital (N=217). The strategies were applied in various combinations to six open-source models (ranging from 8 to 70 billion parameters) to uncover the most effective approaches. A mixed-effects model was used to analyze the relationship between these strategies and several performance metrics while accounting for variability between centers and between LLMs. For comparison, GPT-4o served as a closed-source oracle, which also allowed the cost-effectiveness of the different options to be assessed.
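To make the two prompting strategies concrete, the following is a minimal sketch of how a few-shot prompt with an optional chain-of-thought instruction might be assembled, and how a model's answer might be parsed. It is not the study's actual prompt or code: the function names `build_prompt` and `parse_mes`, the instruction wording, and the `MES: <score>` answer format are all assumptions made for illustration. The only clinical fact relied on is that the MES takes integer values from 0 to 3.

```python
import re


def build_prompt(report, examples=(), chain_of_thought=False):
    """Assemble an extraction prompt (hypothetical wording, not the study's).

    examples: iterable of (report_text, mes_label) pairs used as few-shot
    demonstrations; leave empty for zero-shot prompting.
    """
    parts = [
        "Extract the Mayo Endoscopic Subscore (MES) from the colonoscopy report.",
        "End your answer with a final line of the form 'MES: <0|1|2|3>'.",
    ]
    if chain_of_thought:
        # Chain-of-thought: ask for the reasoning before the final answer.
        parts.append("Reason step by step about the findings before the final line.")
    for example_report, example_label in examples:
        # Few-shot: show worked input/output pairs ahead of the real report.
        parts.append(f"Report: {example_report}\nMES: {example_label}")
    parts.append(f"Report: {report}")
    return "\n\n".join(parts)


def parse_mes(model_output):
    """Return the last 'MES: n' value in the output, or None if absent.

    Taking the last match tolerates reasoning text that mentions a score
    before the final answer. None marks output that ignored the format.
    """
    matches = re.findall(r"MES:\s*([0-3])\b", model_output)
    return matches[-1] if matches else None
```

Returning `None` for unparseable output keeps instruction-following failures visible instead of silently mapping them to a score.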

Results

The findings revealed that fine-tuning with quantized low-rank adaptation (QLoRA) significantly enhanced the performance of open-source LLMs, with improvements of 9.1 to 15.7 percentage points in precision, recall, and annotation accuracy. Despite these gains, GPT-4o with prompt engineering still outperformed the best open-source model by 4.9% to 11.2%. A basic cost-effectiveness analysis indicated that GPT-4o was also more economical than the open-source alternatives.
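For readers unfamiliar with how such scores are computed, here is a minimal sketch of scoring extracted MES labels against annotations. The exact metric definitions used in the study are not given in this summary, so everything here is an assumption: the function name `evaluate` is hypothetical, precision and recall are macro-averaged over the four MES classes, and a prediction of `None` stands for model output that did not follow the requested answer format.

```python
def evaluate(gold, predicted, labels=("0", "1", "2", "3")):
    """Score predictions against gold annotations (illustrative definitions).

    gold, predicted: equal-length sequences of labels; a prediction of None
    means the model's output could not be parsed into a valid label.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal length")

    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    precisions, recalls = [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        # A class with no predictions (or no gold cases) contributes 0.0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / len(labels),
        "macro_recall": sum(recalls) / len(labels),
        # Share of outputs that were valid labels at all.
        "adherence": sum(p in labels for p in predicted) / len(predicted),
    }


scores = evaluate(["0", "1", "2", "3"], ["0", "1", "2", None])
print(scores["accuracy"])  # 0.75
```

A difference between two such scores, for example before and after fine-tuning, is expressed in percentage points: a rise from 0.70 to 0.80 is a gain of 10 percentage points.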

Conclusion

While GPT-4o currently stands as the most efficient LLM for MES extraction, QLoRA-optimized open-source models offer a viable alternative when closed-source options are unavailable. However, the study highlights that even the most advanced instruction-following LLMs, including GPT-4o, fall short of fully adhering to user-provided instructions, pointing to a need for further refinement. Continued research is essential to achieve consistent, near-perfect performance in clinical information extraction using LLMs.

