Optimizing large language models for clinical information extraction: a benchmarking study in the context of ulcerative colitis research

Exploring Optimal Adaptation Strategies for Open-Source Large Language Models in Clinical Information Extraction

In recent years, large language models (LLMs) have been increasingly applied to clinical information extraction, with promising results for processing complex medical data. Closed-source models such as Generative Pre-trained Transformer 4o (GPT-4o) have led the field. However, concerns about cost, data security, and adaptability have spurred interest in open-source alternatives. This study identifies effective adaptation strategies for open-source models and compares their performance with that of closed-source counterparts.

Methods and Analysis

The study evaluated three primary LLM adaptation strategies: chain-of-thought prompting, few-shot prompting, and fine-tuning. These strategies were used to extract the Mayo Endoscopic Subscore (MES) from colonoscopy procedure reports, using annotated datasets from the University of California, San Francisco (N=608) and San Francisco General Hospital (N=217). The strategies were applied in various combinations to six open-source models ranging from 8 to 70 billion parameters. A mixed-effects model was used to analyze the relationship between these strategies and several performance metrics while accounting for variability between centers and between LLMs. GPT-4o served as a closed-source oracle for comparison and as a reference point for assessing the cost-effectiveness of the different options.
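To make the prompting strategies concrete, here is a minimal sketch of how few-shot and chain-of-thought prompting might be combined into a single MES extraction prompt. The instruction wording, the example report excerpts, and the function name `build_prompt` are hypothetical illustrations, not the prompts used in the study; the MES definitions in the examples follow the standard scale (0 normal or inactive, 1 mild, 2 moderate, 3 severe).

```python
# Hypothetical few-shot examples: (report excerpt, reasoning, MES label).
# Illustrative placeholders only, not data from the study.
FEW_SHOT_EXAMPLES = [
    ("Sigmoid colon: normal vascular pattern, no erythema.",
     "Normal mucosa corresponds to normal or inactive disease.",
     0),
    ("Rectum: marked erythema, absent vascular pattern, friability, erosions.",
     "Marked erythema with friability and erosions indicates moderate disease.",
     2),
]

# Hypothetical instruction text; each strategy toggles a piece of it.
BASE_INSTRUCTION = (
    "Read the colonoscopy report and report the Mayo Endoscopic Subscore "
    "(MES), an integer from 0 to 3."
)
COT_SUFFIX = " Reason step by step before answering."
FORMAT_SUFFIX = " End your answer with a line of the form 'MES: <score>'."


def build_prompt(report, examples=FEW_SHOT_EXAMPLES, chain_of_thought=True):
    """Assemble a prompt; pass examples=[] for zero-shot prompting."""
    instruction = BASE_INSTRUCTION
    if chain_of_thought:
        instruction += COT_SUFFIX
    instruction += FORMAT_SUFFIX

    parts = [instruction, ""]
    # Few-shot: show worked examples, with reasoning only when CoT is on.
    for text, reasoning, label in examples:
        parts.append(f"Report: {text}")
        if chain_of_thought:
            parts.append(f"Reasoning: {reasoning}")
        parts.append(f"MES: {label}")
        parts.append("")

    # The target report; the trailing cue invites reasoning or a direct answer.
    parts.append(f"Report: {report}")
    parts.append("Reasoning:" if chain_of_thought else "MES:")
    return "\n".join(parts)
```

Calling `build_prompt(report, examples=[], chain_of_thought=False)` yields a zero-shot baseline, so the two flags cover the four prompting combinations; fine-tuning, the third strategy, changes the model weights rather than the prompt and is not represented here.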

Results

The findings showed that quantized low-rank adaptation (QLoRA) significantly improved the performance of open-source LLMs, with gains of 9.1 to 15.7 percentage points in precision, recall, and annotation accuracy. Despite these gains, GPT-4o with prompt engineering outperformed the best open-source model by 4.9% to 11.2%. A basic cost-effectiveness analysis indicated that GPT-4o is also more economical than the open-source alternatives.
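The results are reported as metric differences between strategies and models. The sketch below shows, under stated assumptions, how per-class precision and recall over the four MES classes, overall accuracy, and absolute (percentage-point) versus relative (percent) gains could be computed. The function names and the toy labels are hypothetical; an unparseable model answer is represented as `None` and counted as a miss.

```python
MES_CLASSES = (0, 1, 2, 3)


def accuracy(gold, pred):
    """Fraction of reports whose predicted MES matches the annotation."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_precision_recall(gold, pred, classes=MES_CLASSES):
    """Return {class: (precision, recall)}; None predictions are misses."""
    results = {}
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[c] = (precision, recall)
    return results


def macro_average(per_class):
    """Unweighted mean of per-class precision and recall."""
    n = len(per_class)
    return (sum(p for p, _ in per_class.values()) / n,
            sum(r for _, r in per_class.values()) / n)


def percentage_point_gain(before, after):
    """Absolute difference, e.g. 0.70 -> 0.80 is a 10-point gain."""
    return 100 * (after - before)


def relative_gain_percent(before, after):
    """Relative difference, e.g. 0.70 -> 0.80 is about a 14.3% gain."""
    return 100 * (after - before) / before


# Toy example (invented labels): one wrong class, one unparseable answer.
gold = [0, 1, 2, 3, 2, 1]
pred = [0, 1, 2, 2, None, 1]
print(accuracy(gold, pred))
print(macro_average(per_class_precision_recall(gold, pred)))
```

For the toy data, accuracy is 4/6 and macro-averaged precision and recall are both 0.625. The last two helpers illustrate why absolute and relative gains should be labeled explicitly: the same change from 0.70 to 0.80 is 10 percentage points but roughly 14.3 percent.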

Conclusion

While GPT-4o is currently the most effective and economical LLM for MES extraction, QLoRA-optimized open-source models offer a viable alternative when closed-source options are unavailable. However, the study shows that even the most advanced instruction-following LLMs, including GPT-4o, do not fully adhere to user-provided instructions, indicating a need for further refinement. Continued research is needed to achieve consistent, near-perfect performance in clinical information extraction with LLMs.
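One practical way to detect instruction non-adherence is to validate each model response against the required output format before scoring it. The sketch below assumes, hypothetically, that the prompt asked the model to end with a line of the form `MES: <0-3>`; the parser, its strictness, and the `adherence_rate` helper are illustrative choices, not the study's evaluation procedure.

```python
import re
from typing import Optional

# Assumed format: the last non-empty line must be exactly "MES: <0-3>".
_FINAL_LINE = re.compile(r"^MES:\s*([0-3])$")


def parse_mes(output: str) -> Optional[int]:
    """Return the MES if the response follows the format, else None."""
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    if not lines:
        return None
    match = _FINAL_LINE.match(lines[-1])
    return int(match.group(1)) if match else None


def adherence_rate(outputs):
    """Fraction of responses that satisfy the output-format instruction."""
    if not outputs:
        raise ValueError("outputs must be non-empty")
    return sum(parse_mes(o) is not None for o in outputs) / len(outputs)


responses = ["MES: 0", "The score is unclear.", "Erosions seen.\nMES: 2", "MES: two"]
print(adherence_rate(responses))
```

A strict parser like this treats answers such as "MES: two" or a score outside 0 to 3 as non-adherent rather than guessing the intended value; a more lenient parser would raise apparent accuracy but hide formatting failures.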

