Objective To evaluate the performance of machine learning models that integrate clinicopathological features and inflammatory markers for the preoperative prediction of lymphovascular invasion (LVI) in gastric cancer.
Methods This retrospective study included 193 gastric cancer patients from The First Hospital of Lanzhou University (training set) and 185 patients from Zhongshan Hospital, Fudan University (validation set). Preoperative clinicopathological characteristics, tumor markers, and inflammatory markers were collected to identify independent risk factors for LVI. Six machine learning models were then built in the training set. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), calibration curves, decision curve analysis (DCA), and Brier scores. Shapley additive explanations (SHAP) were used to interpret the models.
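The evaluation metrics named above can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names and the toy labels/probabilities below are hypothetical, and it assumes the standard definitions (Brier score as the mean squared error of predicted probabilities, AUC as the probability that a random positive case outranks a random negative one, and the decision-curve net benefit TP/n - FP/n * pt/(1 - pt) at threshold probability pt).

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome.
    Lower is better; 0 would be a perfect probabilistic forecast."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC computed as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win
    (equivalent to the normalised Mann-Whitney U statistic)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit used in decision curve analysis: patients with predicted
    probability >= threshold are treated as test-positive, and each false
    positive is weighted by the odds threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = LVI present, 0 = LVI absent.
    y = [0, 0, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80]
    print(brier_score(y, p), roc_auc(y, p), net_benefit(y, p, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it against the "treat all" and "treat none" strategies.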
Results Multivariate logistic regression showed that greater depth of tumor invasion (T-stage), lymph node metastasis (N-stage), and an elevated systemic immune-inflammation index (SII) were independent risk factors for LVI in gastric cancer (P<0.05). Six machine learning models built on these three predictors all showed favorable predictive performance, with minimum AUCs of 0.79 in the training set and 0.76 in the validation set. The light gradient boosting machine (LightGBM) model had the best overall performance, with AUCs of 0.83 and 0.82 and Brier scores of 0.163 and 0.187 in the training and validation sets, respectively. Calibration curves and DCA further supported its predictive accuracy and clinical utility. SHAP analysis of the LightGBM model ranked N-stage as the largest contributor to its predictions, followed by T-stage and SII.
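The abstract does not state how SII was calculated or which cut-off was used. The sketch below assumes the conventional definition, SII = platelet count × neutrophil count / lymphocyte count, with all counts in 10^9/L; the function name and the example values are hypothetical.

```python
def systemic_immune_inflammation_index(
    platelets: float, neutrophils: float, lymphocytes: float
) -> float:
    """Conventional SII: platelets x neutrophils / lymphocytes (counts in 10^9/L).
    Assumption: this standard formula, not necessarily the study's exact definition."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets * neutrophils / lymphocytes


if __name__ == "__main__":
    # Hypothetical preoperative blood counts (10^9/L).
    print(systemic_immune_inflammation_index(250.0, 4.0, 2.0))  # 500.0
```

Because SII grows with platelet and neutrophil counts and falls with lymphocyte count, a higher value reflects a stronger systemic inflammatory response relative to lymphocyte-mediated immunity.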
Conclusion Machine learning models incorporating clinicopathological features and inflammatory markers can effectively predict LVI status in gastric cancer before surgery, with the LightGBM model performing best.