Systematic Review of Machine Learning Models for Predicting Weaning and Extubation from Mechanical Ventilation
Machine learning (ML) has shown promise in healthcare, particularly in critical care settings. This systematic review evaluates the methodology and reporting quality of studies that use ML to develop predictive models for weaning and extubation from invasive mechanical ventilation. Examining these studies clarifies the current landscape and identifies areas for improvement.
Methods and Analysis
To ensure a comprehensive analysis, a protocol was registered with PROSPERO (CRD420250651389), and a detailed search strategy was developed for databases including MEDLINE (Ovid), Embase, and PubMed, covering studies from January 1, 2015, to February 19, 2025. The review focused on both prospective and retrospective studies that applied ML to predict weaning or extubation in both adult and pediatric populations. Preprints and studies evaluating non-invasive ventilation were excluded to maintain focus and relevance.
The search results were verified, and data were extracted using a standardized proforma. The methodology and reporting of the included studies were evaluated against the TRIPOD+AI checklist (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis, extended to cover artificial intelligence methods).
Risk of bias was assessed using the Prediction model Risk Of Bias ASsessment Tool with its artificial intelligence extension (PROBAST+AI), with results presented descriptively in tables or diagrams.
Results
Out of 1245 identified studies, 40 met the inclusion criteria for the final review. A significant majority of these studies were retrospective (90%) and conducted at single centers. Alarmingly, 85% of the studies lacked external validation, a critical component for generalizing findings.
The ML architectures most frequently employed were Logistic Regression (50%), Random Forest (50%), and XGBoost (45%). However, there was considerable inconsistency in reporting data preprocessing, handling of missing data, and feature selection processes. Outcome definitions varied widely, with limited adherence to consensus criteria, and many studies failed to consider time series data, opting instead for averages or final values within a feature window.
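The cost of collapsing a feature window to an average or a final value can be shown with a minimal sketch. The respiratory-rate values, the window length, and the `slope` trend feature below are hypothetical illustrations, not taken from any included study; a least-squares slope is only the simplest stand-in for a model that actually consumes the time series.

```python
from statistics import mean

def slope(values):
    """Least-squares slope per time step: a simple trend feature."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

# Hypothetical hourly respiratory rates (breaths/min) in a 5-hour feature window.
rising = [16, 18, 20, 22, 24]    # steadily worsening
falling = [24, 22, 20, 18, 16]   # steadily improving
settling = [30, 28, 26, 25, 24]  # improving, same final value as `rising`

# The window average cannot tell `rising` from `falling`.
same_mean = mean(rising) == mean(falling)

# The final value cannot tell `rising` from `settling`.
same_last = rising[-1] == settling[-1]

# A trend feature separates all three trajectories.
trends = {name: slope(window) for name, window in
          [("rising", rising), ("falling", falling), ("settling", settling)]}
print(same_mean, same_last, trends)
```

Here two patients with opposite trajectories receive the same averaged feature, which is the information loss the review flags.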
Model discrimination was reported in every study (100%), but calibration (35%) and net benefit analysis (13%) were often omitted. Interpretability was attempted with post hoc methods such as SHapley Additive exPlanations (SHAP) in 43% of studies, although the resulting explanations often did not align well with clinical reasoning. Clinical implementation was demonstrated in only 20% of the studies, and 83% were classified as having a high risk of bias in at least one area.
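The three kinds of performance measure named above answer different questions, which the following minimal sketch illustrates. The labels and predicted probabilities are made-up toy data, the function names are hypothetical, and calibration-in-the-large is used here as the simplest calibration summary (the included studies may have reported calibration differently). Net benefit follows the standard decision-curve definition, TP/n - (FP/n) * pt/(1 - pt), at a risk threshold pt.

```python
def auroc(y_true, y_prob):
    """Discrimination: chance a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def calibration_in_the_large(y_true, y_prob):
    """Calibration summary: mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)

def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of acting on predictions >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Toy data (assumption): 1 = successful extubation, 0 = failure.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.3, 0.4, 0.2, 0.85]

auc = auroc(y_true, y_prob)
citl = calibration_in_the_large(y_true, y_prob)
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
# Reference strategy: treat every patient as predicted positive.
nb_all = net_benefit(y_true, [1.0] * len(y_true), threshold=0.5)
print(auc, citl, nb_model, nb_all)
```

A model can rank patients well (high AUROC) while still misestimating absolute risk or adding little over a treat-all strategy, which is why reporting discrimination alone leaves clinical usefulness unassessed.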
Conclusion
This systematic review highlights significant methodological and reporting deficiencies within the body of research on ML models for predicting weaning and extubation from mechanical ventilation. Over 80% of the studies exhibited a high risk of bias in at least one area, undermining their reliability. Future research should prioritize prospective, multicenter data and ensure external validation of results. Adherence to TRIPOD+AI guidelines for design and performance reporting is crucial, as is the adoption of consensus-based criteria to facilitate study comparisons. Emphasizing architectures that leverage time series data, customizing interpretability to specific tasks, and involving clinician end-users in model development are essential steps forward.