Constructing a prediction model for stroke patients' activities of daily living risk based on interpretable machine learning

CLC number: R743.3

Funding: National Natural Science Foundation of China (82104993)

Abstract:

Objective: To use machine learning algorithms to predict risk factors affecting the activities of daily living (ADL) of stroke patients, providing a reference for ADL management decisions. Methods: A retrospective analysis was conducted on 423 stroke patients treated at the Rehabilitation Medicine Center of the First Affiliated Hospital of Nanjing Medical University from January 2015 to February 2019. Based on the Barthel index (BI) assessment scale, patients were divided into a better ADL group (BI ≥ 60 points) and a poorer ADL group (BI < 60 points), and the data were preprocessed. Feature variables were selected using collinearity diagnostics and the least absolute shrinkage and selection operator (LASSO). Five machine learning algorithms were used for predictive modeling: logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and K-nearest neighbor (KNN). After ten-fold cross-validation, the models were comprehensively evaluated using receiver operating characteristic (ROC) curves, area under the ROC curve (AUC), precision-recall (PR) curves, area under the precision-recall curve (PRAUC), accuracy, sensitivity, and specificity. Shapley additive explanation (SHAP) was introduced to interpret the optimal machine learning model. Results: After LASSO regression analysis, 16 feature variables were identified for constructing the machine learning models. The RF model had the highest AUC (0.74), PRAUC (0.64), accuracy (0.97), sensitivity (0.75), and specificity (0.97). SHAP analysis showed that, among the top 5 features contributing to ADL, Brunnstrom stage (lower limb) had the greatest effect, followed by Brunnstrom stage (upper limb), D-dimer, serum albumin level, and age. Conclusion: The RF model was the most effective at predicting ADL in stroke patients and provides a valuable reference for ADL management decisions.
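The grouping rule and the threshold-based metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The BI scores and predicted risk scores below are made up, treating the poorer ADL group as the positive class is an assumption, and the 0.5 decision threshold is hypothetical. Only the BI cutoff of 60 comes from the abstract.

```python
from typing import List, Sequence, Tuple

BI_CUTOFF = 60  # from the abstract: BI >= 60 is "better ADL", BI < 60 is "poorer ADL"


def label_poor_adl(bi_scores: Sequence[float]) -> List[int]:
    """Return 1 for poorer ADL (BI < 60) and 0 for better ADL (BI >= 60).

    Using the poorer group as the positive class is an assumption of this sketch.
    """
    return [1 if bi < BI_CUTOFF else 0 for bi in bi_scores]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Compute the three threshold-based metrics (assumes both classes are present)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # share of poorer-ADL patients correctly flagged
    specificity = tn / (tn + fp)  # share of better-ADL patients correctly cleared
    return accuracy, sensitivity, specificity


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (assumes both classes are present).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data, for illustration only.
bi_scores = [85, 40, 60, 55, 95, 20, 70, 35]
risk_scores = [0.10, 0.80, 0.35, 0.40, 0.05, 0.90, 0.55, 0.30]

y_true = label_poor_adl(bi_scores)
y_pred = [1 if s >= 0.5 else 0 for s in risk_scores]  # hypothetical 0.5 threshold

accuracy, sensitivity, specificity = accuracy_sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, risk_scores)
print(accuracy, sensitivity, specificity, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric can diverge for the same model.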

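Cite this article

Ye Qian, Yang Yun, Xu Wentao, Liu Lingling. Constructing a prediction model for stroke patients' activities of daily living risk based on interpretable machine learning[J]. Journal of Nanjing Medical University (Natural Sciences), 2024, (5): 672-680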

History
  • Received: 2024-01-04
  • Published online: 2024-05-15