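Predictive value of a machine learning model based on the MIMIC-Ⅳ database for in-hospital recurrent intensive care unit admission in critically ill patients with ischemic stroke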

Zhang Di (张迪); Liu Yuanyuan (刘圆圆); Zhang Jian (张键); Hu Xiangjun (胡项俊)

doi:10.12025/j.issn.1008-6358.2026.20260149

Development and validation of a machine learning model for predicting in-hospital recurrent intensive care unit admission in critically ill patients with ischemic stroke based on the MIMIC-Ⅳ database

Abstract:
Objective To develop and validate a machine learning (ML)-based model for predicting in-hospital recurrent intensive care unit (ICU) admission in critically ill patients with ischemic stroke (IS).
Methods Clinical data of 2 929 patients with IS were extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. Least absolute shrinkage and selection operator (LASSO) regression was used to select predictors, and the synthetic minority over-sampling technique (SMOTE) was applied to form a derivation cohort of 2 583 patients, which was randomly divided at an 8:2 ratio into a training set (n=2 066) and a test set (n=517). Five ML algorithms, namely decision tree, random forest, adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), and support vector machine (SVM), were used to construct prediction models. Model performance in the training set was evaluated by five-fold cross-validation. In the test set, the models were assessed and compared using the area under the receiver operating characteristic curve (ROC-AUC) and decision curve analysis (DCA). The best-performing model was interpreted using SHapley Additive exPlanations (SHAP).
Results Among the 2 929 patients included, 704 (24.0%) experienced in-hospital recurrent ICU admission. Of the five ML models, the random forest model showed the best predictive performance, with an AUC of 0.839 (95% CI 0.801–0.877). Feature importance analysis identified the five features contributing most to model predictions: Acute Physiology Score Ⅲ (APS Ⅲ), albumin, age, heart rate, and Sequential Organ Failure Assessment (SOFA) score.
Conclusions ML-based models can effectively predict the risk of in-hospital recurrent ICU admission in critically ill patients with IS. The random forest model performed best and may be useful for early clinical risk stratification and intervention.
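LASSO discards candidate predictors because its L1 penalty drives some coefficients to exactly zero. The minimal pure-Python sketch below shows that mechanism through cyclic coordinate descent with soft-thresholding. It assumes a squared-error loss on centred data and a fixed penalty `lam`. The abstract states neither the loss function nor how the penalty was tuned, so this is not the authors' procedure. The helper names and the toy data are hypothetical.

```python
import random


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; anything inside [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes every column of X and y are already centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each column j
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j itself
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical toy data: only features 0 and 1 drive the outcome; features 2-4 are noise.
rng = random.Random(0)
n_samples, n_features = 200, 5
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]

# Centre the columns and the response so that no intercept is needed.
means = [sum(row[j] for row in X) / n_samples for j in range(n_features)]
X = [[row[j] - means[j] for j in range(n_features)] for row in X]
y_mean = sum(y) / n_samples
y = [v - y_mean for v in y]

beta = lasso_coordinate_descent(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("selected features:", selected)
```

Raising `lam` removes more features. In practice the penalty is usually chosen by cross-validation instead of being fixed as it is here.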
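The data-preparation steps in the Methods (SMOTE oversampling, an 8:2 random split, and five-fold cross-validation) can be sketched using only the standard library. This is a minimal illustration with two stated assumptions. First, SMOTE is implemented as interpolation between a minority-class row and one of its k nearest minority-class neighbours. Second, `smote`, `split_80_20` and `k_fold_indices` are hypothetical helpers, not the authors' code or any library's API. With 2 583 rows, holding out round(0.2 × 2 583) = 517 rows leaves 2 066 for training, which matches the set sizes given in the Methods.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows, each on the segment between a minority row and one of
    its k nearest minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        neighbours = sorted(
            (row for idx, row in enumerate(minority) if idx != base_idx),
            key=lambda row: math.dist(base, row),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


def split_80_20(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle once and hold out round(n * test_fraction) rows as the test set."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (
        [rows[i] for i in train_idx], [labels[i] for i in train_idx],
        [rows[i] for i in test_idx], [labels[i] for i in test_idx],
    )


def k_fold_indices(n, k=5, seed=0):
    """Partition range(n) into k shuffled folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]


# Hypothetical toy data: 40 majority rows and 10 minority rows with two features each.
rng = random.Random(1)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]
synthetic = smote(minority, n_synthetic=30, k=3, seed=2)

rows = majority + minority + synthetic
labels = [0] * 40 + [1] * 40
X_train, y_train, X_test, y_test = split_80_20(rows, labels)
folds = k_fold_indices(len(X_train), k=5)
print(len(X_train), len(X_test), [len(f) for f in folds])
```

This sketch oversamples before splitting, following the order described in the abstract. A common alternative is to apply SMOTE only to the training data, or inside each cross-validation training fold. That way the test set contains only real patients.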
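An AUC with a 95% CI, of the kind reported in the Results, can be computed as in the sketch below. The AUC is the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one. The interval here comes from a percentile bootstrap over patients, which is an assumption: the abstract does not say which CI method the authors used, and DeLong's method is another common choice. The data are simulated and the function names are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative
    (ties count one half); equivalent to the normalised Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (patients resampled with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        if 0 < sum(boot_labels) < n:  # a resample needs both classes to define an AUC
            aucs.append(roc_auc(boot_labels, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(n_boot * alpha / 2)]
    upper = aucs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical toy scores: positives are shifted upward by one standard deviation.
rng = random.Random(3)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(120)]
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC {auc:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The pairwise loop is O(n_pos × n_neg), which is fast enough for a test set of a few hundred patients. For larger sets, a rank-based formula avoids the quadratic cost.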
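Decision curve analysis compares the net benefit of acting on a model's predictions with two reference strategies, treat-all and treat-none, across a range of threshold probabilities pt. Net benefit is defined as TP/n − FP/n × pt/(1 − pt). A minimal sketch follows. The ten patients and their predicted risks are invented, and counting a predicted risk exactly equal to the threshold as positive is an assumption.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy that treats everyone (treat-none always has net benefit 0)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted risks for ten patients (three of whom had the event).
labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.25]

for pt in (0.1, 0.2, 0.3, 0.4, 0.5):
    model_nb = net_benefit(labels, probs, pt)
    all_nb = net_benefit_treat_all(labels, pt)
    print(f"threshold {pt:.1f}: model {model_nb:+.3f}, treat-all {all_nb:+.3f}, treat-none +0.000")
```

A model is clinically useful at a given threshold when its net benefit exceeds both reference strategies at that threshold.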
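SHAP attributes a single prediction to its input features using Shapley values. The sketch below computes them exactly by enumerating every feature coalition, and fills features outside a coalition from one baseline vector. This simplifies what the SHAP library does, since the library uses specialised algorithms for tree ensembles such as random forests. `toy_model` is a hypothetical stand-in, not the fitted random forest. By construction, the attributions sum to f(x) − f(baseline).

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    The value of a coalition S is model(z), where z takes x's values for features in S
    and the baseline's values elsewhere. Cost grows as 2**p, so this suits small p only.
    """
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return model(z)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk function with an interaction between features 0 and 2.
def toy_model(z):
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[2]


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])
print("sum:", round(sum(phi), 3), "f(x) - f(baseline):", round(toy_model(x) - toy_model(baseline), 3))
```

The interaction term 0.1 × z0 × z2 contributes 0.6 at x. That contribution is split equally between features 0 and 2, which is how Shapley values share credit for joint effects.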
