Development and validation of an early diagnosis model for severe mycoplasma pneumonia in children based on interpretable machine learning

Xie, Si; Wu, Mo; Shang, Yu; Tuo, Wenbin; Wang, Jun; Cai, Qinzhen; Yuan, Chunhui; Yao, Cong; Xiang, Yun

doi:10.1186/s12931-025-03262-1

Published: 13 May 2025


Respiratory Research volume 26, Article number: 182 (2025)


Abstract

Background

Pneumonia is a major threat to the health of children, especially those under the age of five. Mycoplasma pneumoniae infection is a major cause of pediatric pneumonia, and the incidence of severe mycoplasma pneumoniae pneumonia (SMPP) has increased in recent years. There is therefore an urgent need for an early warning model for SMPP to improve the prognosis of pediatric pneumonia.

Methods

The study comprised 597 children with MPP aged between 1 month and 18 years. Clinical variables were selected through Lasso regression analysis, and eight machine learning algorithms were then applied to develop an early warning model. Model accuracy was assessed in an internal validation cohort and a prospective cohort. To facilitate clinical assessment, the indicators were simplified and a visualized simplified model was constructed. The clinical applicability of the model was evaluated using decision curve analysis (DCA) and clinical impact curves (CIC).

Results

After variable selection, eight machine learning models were developed using age, sex and 21 clinical, imaging and serum indicators identified as predictive factors for SMPP. The Light Gradient Boosting Machine (LightGBM) model performed best, achieving an AUC of 0.92 in the prospective validation cohort. SHAP analysis was used to screen the most informative variables, namely serum S100A8/A9, chest computed tomography (CT), retinol-binding protein (RBP), platelet large cell ratio (P-LCR) and CD4⁺CD25⁺ Treg cell count, which were used to construct a simplified model (SCRPT) with better clinical applicability. The SCRPT diagnostic model showed favorable diagnostic efficacy (AUC > 0.8). In addition, S100A8/A9 outperformed conventional clinical inflammatory markers and could, on its own, differentiate the severity of MPP.

Conclusions

The SCRPT model, consisting of five dominant variables (S100A8/A9, CT, RBP, P-LCR and Treg cells) screened with eight machine learning algorithms, is expected to serve as a tool for the early diagnosis of SMPP. When medical resources are limited, S100A8/A9 alone may also serve as a biomarker for differentiating SMPP.

Introduction

Respiratory infections pose a significant challenge to global public health [1]. According to the World Health Organization (WHO), pneumonia causes approximately 740,000 childhood deaths annually and is a leading cause of under-five mortality globally [2]. Among the causative agents of childhood pneumonia, Mycoplasma pneumoniae (MP) stands out for its pathogenicity and epidemiological importance in respiratory tract infections [3]. Severe mycoplasma pneumoniae pneumonia (SMPP) is a serious condition resulting from MP infection, characterized by a prolonged disease course, complex clinical manifestations, and a propensity for developing necrotizing pneumonia, atelectasis, and other pulmonary complications. It can also lead to a range of extrapulmonary manifestations, including myocardial and liver injury, which may be life-threatening [4]. In recent years, factors such as the wide prevalence of MP, its tendency to cause repeated infections [5], increasing drug resistance [6] and co-infections [7, 8] have led to a rise in the incidence of SMPP [9], which seriously threatens the health of children [10,11,12]. Early diagnosis and treatment of SMPP therefore deserve particular attention.

Presently, the diagnosis of SMPP relies predominantly on imaging techniques and clinical signs [9]. However, the clinical and radiological features of SMPP closely resemble those of viral infections and are heterogeneous [13]; pneumonia presentations differ markedly among pediatric patients [14]. This makes the prompt and precise diagnosis of SMPP in children, and the timely implementation of appropriate treatment, a formidable challenge [15]. To address the diverse presentations of pneumonia in adults, comprehensive severity scores such as CURB-65 and the Pneumonia Severity Index (PSI) have been established [16]. However, the assessment of pneumonia in children lacks an objective, quantitative, cost-effective, and convenient diagnostic framework. Traditional biomarkers, including white blood cell count (WBC), C-reactive protein (CRP), and procalcitonin (PCT), are inadequate for accurately distinguishing the severity of pulmonary infections in pediatric patients [17, 18]. Consequently, recent research has focused on identifying biomarkers specific to pneumonia diagnosis and developing evaluation systems tailored to children. Studies suggest that several assessment tools can enhance risk stratification for pediatric pulmonary diseases [19], such as the Pediatric Respiratory Emergency Severity Score (PRESS) [20], the Clinical Pulmonary Infection Score (CPIS) [21], and pro-adrenomedullin (Pro-ADM) [22]. However, these tools are not applicable to the assessment of SMPP because of their complexity, low sensitivity and lack of specificity for SMPP. Currently, there is no systematic approach for risk classification of pediatric SMPP.

Compared with traditional scoring systems, machine learning (ML) models have demonstrated superior performance in predicting various diseases and clinical conditions [23, 24]. ML models are typically built on large amounts of data recorded in Electronic Patient Record (EPR) systems. They can capture complex nonlinear relationships, and even previously unknown correlations, in large datasets, allowing more in-depth mining of clinical data [25]. They also show great potential in clinical settings where large amounts of data are collected and integrated daily [26]. Recently, Yang and colleagues developed an ML model that accurately identified severe community-acquired pneumonia (CAP) in adults [27]. ML has also been employed to differentiate pathogens in pediatric CAP [28] and to develop pneumonia-related prognostic models that predict mortality and complications, including acute respiratory distress syndrome (ARDS) [29]. However, to date, no ML model has been developed for predicting SMPP in children.

The objective of this study is to develop a robust assessment model for the severity of MPP in pediatric patients using machine learning algorithms. Such a model aims to provide healthcare professionals with a valuable tool to design personalized treatment plans for different medical situations, thus optimizing individualized therapeutic strategies.

Methodology

Research design

The research design, illustrated in Figure 1, comprises four steps: development, internal validation, prospective validation, and interpretation. First, predictive models were created using a training cohort comprising 78% of the derivation dataset. The remaining 22% of the derivation dataset was used for internal validation, and an independent dataset served as the prospective validation cohort. Finally, the SHapley Additive exPlanations (SHAP) algorithm was employed to quantify the importance of individual features in the predictive model and to identify non-linear relationships among risk predictors.
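As a minimal sketch of this split (419 training and 118 internal-validation patients from the 537-patient derivation cohort, as reported in the Results), the following Python uses scikit-learn's `train_test_split`. It is not the authors' code: stratification on the outcome, the random seed and the synthetic arrays are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_derivation(X, y, n_validation=118, seed=2024):
    """Hold out a fixed number of derivation patients for internal validation.

    Stratifying on the outcome is an assumption (the text does not say so);
    it keeps the mild/severe ratio similar in the two subsets.
    """
    return train_test_split(
        X, y, test_size=n_validation, stratify=y, random_state=seed
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(537, 23))      # synthetic stand-in for 23 predictors
    y = rng.integers(0, 2, size=537)    # 1 = SMPP, 0 = mild MPP (synthetic labels)
    X_tr, X_va, y_tr, y_va = split_derivation(X, y)
    print(len(y_tr), len(y_va), round(len(y_tr) / len(y), 3))
```

Passing an integer `test_size` fixes the validation set at exactly 118 patients, so the training share (419/537) comes out at about 78%.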

Study subjects

This was a prospective, open-label, non-blinded observational study of 859 hospitalized patients with CAP admitted to Wuhan Children’s Hospital, Tongji Medical College, Huazhong University of Science & Technology, from January 2023 to July 2024. Patients were included if they met the following criteria: age between 1 month and 18 years, presence of fever and respiratory symptoms, and at least one abnormality on physical examination or chest radiography according to the guidelines for CAP in children (Commission, 2019) [30]. Based on etiological test results and in accordance with the laboratory diagnostic consensus for mycoplasma pneumoniae pneumonia [31], 597 children with MPP were selected as research subjects. Under the direction of clinical physicians, these 597 patients were categorized into mild cases (321 patients) and severe cases (276 patients) according to the pediatric MPP guidelines [32]. Of these children, 537 formed the derivation cohort and 60 formed the prospective cohort. Exclusion criteria were: immunodeficiency disorders, chronic pulmonary diseases, cardiovascular conditions, chronic glomerulonephritis, rheumatic diseases, nutritional deficiencies, diabetes mellitus, and other inherited metabolic disorders. Patients co-infected with other pathogens and those who had previously undergone pulmonary surgery were also excluded. Children were further excluded if their parents or guardians did not provide proxy consent or if data were missing. The patient selection process is shown in Supplementary Figure 1.

The derivation cohort and the prospective validation cohort followed the same inclusion and exclusion criteria. The prospective cohort consisted of patients who presented to the respiratory medicine department of Wuhan Children’s Hospital, Tongji Medical College, Huazhong University of Science & Technology, from November 2023 to January 2024. These patients had been initially diagnosed with mycoplasma pneumoniae infection within the preceding three months and were followed up for at least two weeks. The clinical characteristics of the 60 patients in the prospective validation cohort are outlined in Supplementary Table S1.

Written informed consent was obtained from all parents. The study was reviewed and approved by the Medical Ethical Committee of the Wuhan Children’s Hospital, Huazhong University of Science and Technology (2022R048-E01).

Data collection and Laboratory measurements

Demographic and clinical data were collected from the electronic medical record system and the laboratory management system. The following data were included: (1) clinical characteristics: age, gender, clinical symptoms and length of hospital stay; (2) representative biomarkers with immunomodulatory and inflammatory effects: complete blood count parameters such as white blood cell (WBC) count; indicators of inflammation severity such as the systemic immune-inflammation index, inflammatory factors and cytokines; and biomarkers of host immunoregulatory function such as complement C3; (3) laboratory indicators associated with complications: hepatic enzymes, renal function tests, cardiac biomarkers, coagulation and electrolyte status; (4) imaging results. Fasting venous blood was collected within 24 h of admission for blood analysis. Chest X-ray or chest CT was performed between 3 days before and 2 days after admission, and the results were recorded.

Complete blood count, biochemical and immunological markers were obtained from the enrolled children on admission. MP infection was confirmed by targeted next-generation sequencing (tNGS) of pharyngeal swabs or bronchoalveolar lavage fluid, or by detection of M. pneumoniae nucleic acid (RNA). Details of the laboratory tests, including testing equipment and selected items, are provided in the supplementary materials.

Statistical analysis

To improve data quality and ensure accuracy, consistency, and availability, the raw medical data were cleaned and standardized: (1) Data inspection and cleaning: after the raw data were summarized and sorted, the display formats of values, times, dates, and full-width/half-width characters were unified. (2) Data normalization: four elements (specimen type, test item name, test result unit, and test reference range) were calibrated and normalized. The methods for screening variables from the dataset are detailed in the supplementary materials.
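The full-width/half-width harmonization and unit normalization described in steps (1) and (2) can be illustrated with Python's standard library. This is a hypothetical sketch: the `UNIT_SYNONYMS` table, the helper names and the choice to drop qualifiers such as "<" are assumptions, not the study's actual cleaning rules.

```python
import re
import unicodedata

# Hypothetical unit-synonym table; a real one would be built from the laboratory system.
UNIT_SYNONYMS = {"mg/l": "mg/L", "ng/ml": "ng/mL", "g/l": "g/L", "pg/ml": "pg/mL"}


def normalize_text(value: str) -> str:
    """Fold full-width characters to half-width and collapse whitespace."""
    # NFKC maps, for example, the full-width string "１２．５" to "12.5"
    text = unicodedata.normalize("NFKC", value)
    return re.sub(r"\s+", " ", text).strip()


def normalize_unit(unit: str) -> str:
    """Map spelling variants of a unit onto one canonical form."""
    cleaned = normalize_text(unit)
    return UNIT_SYNONYMS.get(cleaned.lower(), cleaned)


def parse_result(raw: str):
    """Return the first numeric value in a result string, or None if there is none.

    Qualifiers such as "<" are dropped here; a real pipeline would keep them
    in a separate column.
    """
    match = re.search(r"[-+]?\d+(?:\.\d+)?", normalize_text(raw))
    return float(match.group()) if match else None


if __name__ == "__main__":
    print(normalize_text("１２．５"), normalize_unit("ＭＧ／Ｌ"), parse_result("＜ ０．５"))
```

Unicode NFKC normalization is used because it converts full-width digits, letters and punctuation to their ASCII equivalents in a single step.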

All statistical analyses were conducted using R (version 4.3.2) in RStudio, GraphPad Prism (version 8.0), and Python (version 3.7). Baseline analysis began with normality tests on the quantitative data. Categorical variables are presented as counts (n, %), and continuous variables as medians with interquartile ranges (P25, P75). Differences in the distribution of clinical metrics between cohorts were compared using the Mann-Whitney U test and the chi-square test. Diagnostic performance was evaluated by receiver operating characteristic (ROC) curve analysis, with the cut-off value for each indicator determined by maximizing the Youden index. The DeLong test was used to compare areas under the curve (AUCs). Histograms, scatter plots, and ROC curves were created with GraphPad Prism, while probability density plots, heatmaps, calibration plots, decision curve analysis (DCA), and clinical impact curve (CIC) diagrams were generated with R packages such as "ggplot2" and "ComplexHeatmap". SHAP visualizations were built with Python. Variable selection was assisted by Lasso (least absolute shrinkage and selection operator) regression. A two-tailed P-value of less than 0.05 was considered statistically significant.
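The Youden-index cut-off is the ROC threshold that maximizes J = sensitivity + specificity - 1, which equals TPR - FPR. The sketch below uses scikit-learn's `roc_curve`; it is illustrative rather than the authors' implementation, and the toy scores are invented.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def youden_cutoff(y_true, scores):
    """Return (cut-off, Youden J, sensitivity, specificity) at the J-maximizing threshold."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    j = tpr - fpr                    # J = sensitivity - (1 - specificity)
    best = int(np.argmax(j))         # first maximum if several thresholds tie
    return thresholds[best], j[best], tpr[best], 1 - fpr[best]


if __name__ == "__main__":
    y = np.array([0, 0, 0, 1, 1, 1])
    s = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
    print(roc_auc_score(y, s), youden_cutoff(y, s))
```

A patient is classified as positive when the score is greater than or equal to the returned cut-off, which matches `roc_curve`'s convention.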

Model derivation and validation

The ML models were developed in Python and R. Eight algorithms were used: Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), Logistic Regression (Logistic), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT) and Naïve Bayes (NB). Their comparative effectiveness was measured by the Kappa statistic and F1 score to identify the best model, whose sensitivity and specificity were then assessed on the validation dataset. The efficacy of all eight algorithms was assessed in both the internal validation and prospective validation cohorts. The SHAP framework was then applied to interpret the model and quantify the contribution of individual predictors. Predictors with significant contributions were used to build a multivariate logistic regression model, whose performance was subsequently evaluated. Finally, CIC and DCA curves were used to assess the clinical utility of the various models, and the Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) were used to evaluate the effect of incorporating specific indicators on model efficacy.

Results

Demographic characteristics and clinical information

The derivation cohort (537 cases) was divided into a training set (419 cases) and a validation set (118 cases). Comparison of these sets with the prospective cohort showed no significant differences in gender or age among the groups (P > 0.05) (Supplementary Table S2), nor in several other clinical indicators (Supplementary Table S1).

Variable selection

To identify differential markers among pediatric patients with varying severities of MPP, the derivation cohort was stratified into mild and severe categories based on the established criteria [31], and differences in the 70 collected clinical indicators were assessed. As shown in Table 1, 35 of the 70 indicators differed significantly between mild MPP and SMPP. Clinically, severe cases differed markedly from mild cases in the duration of fever and cough before admission, and peak body temperature during the disease course also differed significantly. For pulmonary imaging, two radiologists jointly evaluated and scored the findings. Severe cases mostly showed unilateral or bilateral lobar consolidation, which differed significantly from the lesions in mild cases. Further analysis of serological markers revealed significant differences in markers related to immune and inflammatory regulation, including platelet count (PLT), mean platelet volume (MPV), platelet large cell ratio (P-LCR), neutrophil-to-lymphocyte ratio (NLR), C-reactive protein (CRP), procalcitonin, ferritin, calprotectin, interleukin-6 (IL-6), interleukin-10 (IL-10), interferon-γ (IFN-γ), CD8⁺ T cell count, CD16⁺CD56⁺ (natural killer, NK) cell count and CD4⁺CD25⁺ Treg cell count. To investigate whether mycoplasma infection damages other organs, indices of liver and kidney injury were also collected and compared across infection severities.
The results showed that serum alkaline phosphatase (ALP), prealbumin (PA), total protein (TP), albumin (ALB), uric acid (UA), cystatin C (Cys-C), retinol-binding protein (RBP), sodium (Na⁺), calcium (Ca²⁺), international normalized ratio of prothrombin time (PT-INR), D-dimer, activated partial thromboplastin time (APTT) and antithrombin (AT) in the severe group were significantly different from those in the mild group (P < 0.05).
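A univariate screen of this kind (Mann-Whitney U for continuous indicators and chi-square for categorical ones, the tests named in the statistical methods) could look like the following pandas/SciPy sketch. The column names, the 0/1 outcome coding and the absence of multiple-testing correction are assumptions; the paper does not state whether P-values were adjusted.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, mannwhitneyu


def screen_variables(df, outcome, continuous, categorical, alpha=0.05):
    """Return {variable: P-value} for variables that differ between outcome groups.

    `outcome` is assumed to be coded 1 = SMPP, 0 = mild MPP. No correction for
    multiple testing is applied.
    """
    severe, mild = df[df[outcome] == 1], df[df[outcome] == 0]
    pvalues = {}
    for col in continuous:
        # Mann-Whitney U test: rank-based, suitable for skewed laboratory values
        pvalues[col] = mannwhitneyu(
            severe[col].dropna(), mild[col].dropna(), alternative="two-sided"
        ).pvalue
    for col in categorical:
        # Chi-square test of independence on the variable-by-outcome contingency table
        _, p, _, _ = chi2_contingency(pd.crosstab(df[col], df[outcome]))
        pvalues[col] = p
    return {col: p for col, p in pvalues.items() if p < alpha}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 100)
    demo = pd.DataFrame({
        "smpp": y,
        "marker": rng.lognormal(mean=y * 0.8, sigma=0.5),  # shifted in severe cases
        "noise": rng.normal(size=200),                      # unrelated to outcome
        "consolidation": np.where(rng.random(200) < 0.2 + 0.5 * y, "yes", "no"),
    })
    print(screen_variables(demo, "smpp", ["marker", "noise"], ["consolidation"]))
```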

Table 1 The clinical and laboratory characteristics of the mild and severe group of children with MPP


Lasso regression analysis was then applied to the 35 pre-selected variables for further variable selection. Figure 2A shows how the number of retained variables changed with the lambda value under ten-fold cross-validation; across the two lambda selection rules (min and 1 se), between 8 and 27 variables were retained. Within the training set, classification performance did not differ significantly between the variable sets selected with the min and 1 se lambda values (P > 0.05). Because the min value retains more variables and can therefore maximize model accuracy, it was adopted, and a total of 23 variables were incorporated into model construction (Table 2): gender, age, S100 A8/A9, CT, X-ray, fever duration, cough duration, peak body temperature, PCT_per, P-LCR, TP, PA, UA, Cys-C, RBP, Ca²⁺, D-dimer, APTT, ferritin, hs-CRP, CD3⁺CD8⁺ T cell count, CD4⁺CD25⁺ Treg count and NLR. To ensure model stability and to assess the impact of inter-variable correlations on the results, a correlation analysis was performed on the 16 continuous variables (Figure 2B).
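The two lambda selection rules can be sketched with an L1-penalized (Lasso) logistic regression. This sketch assumes scikit-learn rather than R's glmnet, uses cross-validated AUC as the criterion (cv.glmnet defaults to deviance for binomial models), and applies the 1-SE rule by picking the strongest penalty whose mean AUC lies within one standard error of the best. The grid of C values (scikit-learn's C is the inverse of lambda) is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def l1_logistic(C):
    """L1-penalized logistic regression on standardized inputs; C = 1 / lambda."""
    return make_pipeline(
        StandardScaler(), LogisticRegression(penalty="l1", C=C, solver="liblinear")
    )


def lasso_select(X, y, Cs=np.logspace(-2, 1, 20), n_folds=10, seed=0):
    """Indices of variables retained at the 'min' and '1 se' lambda values."""
    Cs = np.asarray(Cs, dtype=float)
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    means, ses = [], []
    for C in Cs:
        scores = cross_val_score(l1_logistic(C), X, y, cv=cv, scoring="roc_auc")
        means.append(scores.mean())
        ses.append(scores.std(ddof=1) / np.sqrt(n_folds))
    means, ses = np.array(means), np.array(ses)
    i_min = int(np.argmax(means))                        # best mean cross-validated AUC
    near_best = np.flatnonzero(means >= means[i_min] - ses[i_min])
    i_1se = int(near_best[np.argmin(Cs[near_best])])    # smallest C = largest lambda
    result = {}
    for rule, i in (("min", i_min), ("1se", i_1se)):
        coef = l1_logistic(Cs[i]).fit(X, y)[-1].coef_.ravel()
        result[rule] = {"C": Cs[i], "selected": np.flatnonzero(coef)}
    return result
```

Because the 1-SE rule always picks a C no larger than the one at the minimum, it applies at least as strong a penalty and usually keeps fewer variables, which matches the pattern in Figure 2A.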

Table 2 Variable screening in the lasso regression analysis of variables to distinguish SMPP


Construction of the SMPP risk prediction model

Using machine learning algorithms to construct the SMPP risk prediction model

In model training, the positive class represented SMPP and the negative class mild MPP. The input data comprised the 23 indicators selected by Lasso regression, from which eight machine learning models were developed: LightGBM, XGBoost, Logistic, RF, KNN, SVM, DT and NB (Table 3). Compared with the other algorithms, the LightGBM model achieved higher AUC values in both the training set and the internal validation set (Figure 3A, B). In the internal validation cohort, LightGBM, Logistic, RF, XGBoost and SVM all achieved high accuracy (AUC > 0.90) (Table 4, Figure 3B). The predictive value of each model was further assessed with the F1 score and the Kappa statistic. Considering the F1 scores and Kappa values in both the training and validation sets, the decline in these values from training to validation, and sensitivity and precision, LightGBM was rated the best performer overall, followed by random forest (Figure 3C).

Table 3 Diagnostic performance of each model for SMPP in training cohort


Table 4 Diagnostic performance of each model for SMPP in internal validation cohort


Contribution degree of indicators

The contributions of the indicators in the LightGBM and RF models are displayed in SHAP summary plots (Figure 3D, E). The top eight indicators were S100 A8/A9, RBP, CT, hs-CRP, PLCR, APTT, NLR and CD4⁺CD25⁺ Treg, with S100 A8/A9 contributing markedly more than the other seven (Supplementary Figure 2A, B). When the dispersion of SHAP values was compared between the two models, each indicator was more tightly distributed in LightGBM than in RF, indicating that the influence of each feature was relatively stable across samples. To show the specific contribution of each feature to predicting mild and severe cases, SHAP force plots were drawn for individual samples (Supplementary Figure 2C, D). The diagnostic performance of the LightGBM model was also analyzed in the prospective validation set. Compared with the training and internal validation sets, the model's sensitivity and specificity decreased in the prospective validation cohort. However, the positive predictive value approached 85%, the negative predictive value exceeded 95%, and the Kappa value exceeded 0.8, suggesting good practical value (Table 5).
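The prospective-validation metrics quoted above (sensitivity, specificity, PPV, NPV and Kappa) all follow from a 2x2 confusion matrix. Below is a minimal sketch, assuming predictions have already been dichotomized at the model's cut-off and that 1 denotes SMPP; the function name is hypothetical.

```python
from sklearn.metrics import cohen_kappa_score, confusion_matrix


def diagnostic_summary(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and Cohen's Kappa for binary predictions."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),   # severe cases correctly flagged
        "specificity": tn / (tn + fp),   # mild cases correctly cleared
        "ppv": tp / (tp + fp),           # flagged patients who are truly severe
        "npv": tn / (tn + fn),           # cleared patients who are truly mild
        "kappa": cohen_kappa_score(y_true, y_pred),
    }


if __name__ == "__main__":
    print(diagnostic_summary([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of severe cases in the cohort, so they can stay high even when sensitivity and specificity fall.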

Table 5 Diagnostic performance of LightGBM model for SMPP in Prospective validation cohort


Simple model construction

To make clinical assessment more convenient, the indicators included in the model were simplified. A Venn diagram of the top 20 indicators by contribution score in the LightGBM and RF models (Supplementary Figure 2A, B) showed that 18 indicators overlapped (Figure 3F).
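The Venn-diagram step amounts to ranking features by mean absolute SHAP value in each model and intersecting the two top-20 lists. The sketch below assumes the SHAP matrices (patients x features) have already been computed, for example with `shap.TreeExplainer(model).shap_values(X)`. For some tree classifiers that call returns values per class, in which case the positive-class slice should be passed. Ranking the overlap by the first model's importance is an assumption, and the function names are hypothetical.

```python
import numpy as np


def mean_abs_shap(shap_values, feature_names):
    """Global importance per feature: mean absolute SHAP value over patients."""
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    return dict(zip(feature_names, importance))


def top_k(importance, k=20):
    """Names of the k features with the largest importance."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def overlapping_top_features(shap_a, shap_b, feature_names, k=20):
    """Intersect the top-k lists of two models; rank the overlap by model A's importance."""
    imp_a = mean_abs_shap(shap_a, feature_names)
    imp_b = mean_abs_shap(shap_b, feature_names)
    shared = set(top_k(imp_a, k)) & set(top_k(imp_b, k))
    return sorted(shared, key=lambda f: imp_a[f], reverse=True), imp_a
```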

These 18 overlapping indicators were then ranked by importance. Using the screening criterion mean (SHAP value) > 0.025, the five most important indicators were selected: S100 A8/A9, RBP, CT, PLCR and Treg cell count. These five indicators were fitted by logistic regression to construct a simplified model, named the SCRPT model, and its diagnostic value was analyzed in the training, internal validation and prospective validation sets. As shown in Table 6, the SCRPT model showed good diagnostic performance, with AUC values exceeding 0.80 in all three sets.
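The screening and refitting step can be sketched as follows. The sketch keeps overlapping features whose importance exceeds 0.025 and retains the top five. Interpreting that importance as mean absolute SHAP value is an assumption. It then fits an ordinary logistic regression and scores each cohort. `predicted_risk` writes out the standard logistic form p = 1 / (1 + exp(-(b0 + b1*x1 + ... + b5*x5))). The paper does not state how the online calculator encodes its inputs, so this is not a reproduction of it, and all function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def select_by_threshold(ranked_features, importance, threshold=0.025, max_features=5):
    """Keep features (already ranked) whose importance exceeds the threshold, up to max_features."""
    return [f for f in ranked_features if importance[f] > threshold][:max_features]


def fit_simple_model(X_train, y_train):
    # A very large C makes the fit effectively unpenalized, i.e. ordinary logistic regression
    return LogisticRegression(C=1e6, max_iter=5000).fit(X_train, y_train)


def predicted_risk(model, X):
    """Explicit logistic formula: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = model.intercept_[0] + np.asarray(X, dtype=float) @ model.coef_.ravel()
    return 1.0 / (1.0 + np.exp(-logit))


def auc_by_cohort(model, cohorts):
    """cohorts: {name: (X, y)}, e.g. training, internal validation, prospective."""
    return {name: roc_auc_score(y, predicted_risk(model, X)) for name, (X, y) in cohorts.items()}
```

Writing the risk formula out explicitly shows that a five-variable logistic model can be evaluated by hand or in a simple web form once its six coefficients are known.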

Table 6 Diagnostic efficacy of SCRPT model in different sets


Based on the SCRPT model, we also created an online calculator (https://www.evidencio.com/models/show/10603?v=2.0) that allows doctors and patients to compute risk online.

Association between serum S100 A8/A9 levels and SMPP in children

Because the contribution analysis showed that S100 A8/A9 outperformed the other indicators in guiding severity assessment, both in the machine learning algorithms and in the simplified model, this indicator was analyzed further (Figure 4). In the 597 samples from the derivation and prospective cohorts, the distribution of serum S100 A8/A9 did not differ significantly by age or gender (Figure 4A). Serum S100 A8/A9 levels were significantly higher in patients with severe MPP than in those with mild MPP (Figure 4B). As S100 A8/A9 is an inflammatory indicator, its diagnostic efficacy was also compared with that of commonly used clinical inflammatory markers: as shown in Figure 4C, S100 A8/A9 (AUC = 0.889) outperformed CRP (AUC = 0.746), ferritin (AUC = 0.634) and PCT (AUC = 0.618). To investigate the diagnostic capability of S100 A8/A9 as a single indicator for SMPP, the DeLong test was used to compare the SCRPT model with the S100 A8/A9 model in the three datasets. As shown in Supplementary Table S3, the S100 A8/A9 model performed significantly worse than the SCRPT model in both the training set and the internal validation set, but there was no statistically significant difference between the two models in the prospective validation set (P > 0.05). In the DCA, the net benefit (the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold probability) of the single S100 A8/A9 model was above 0.1 (Figure 5A), suggesting that S100 A8/A9 alone has good clinical value for predicting disease severity.
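The DeLong comparison of two correlated AUCs can be written compactly from its structural components. Here the two AUCs are SCRPT versus the single-marker model, measured on the same patients. This is a straightforward O(m x n) NumPy/SciPy sketch of the DeLong variance estimator, not the software the authors used; the paper does not say which package produced Supplementary Table S3.

```python
import numpy as np
from scipy.stats import norm


def _structural_components(pos, neg):
    """DeLong placement values for one score: V10 (per positive) and V01 (per negative)."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)    # 1 if correctly ordered, 0.5 for ties
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for two correlated AUCs measured on the same patients."""
    y = np.asarray(y_true).astype(bool)
    v10, v01, aucs = [], [], []
    for score in (score_a, score_b):
        s = np.asarray(score, dtype=float)
        comp10, comp01 = _structural_components(s[y], s[~y])
        v10.append(comp10)
        v01.append(comp01)
        aucs.append(comp10.mean())          # the mean placement value equals the AUC
    m, n = y.sum(), (~y).sum()
    # Covariance of the two AUC estimates: S10 / m + S01 / n
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 0:                       # e.g. identical scores for both models
        return aucs[0], aucs[1], float("nan"), float("nan")
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return aucs[0], aucs[1], z, 2 * norm.sf(abs(z))
```

The covariance term matters because both models are evaluated on the same children. Comparing the two AUCs as if they were independent would overstate the variance of their difference.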

Discussion

The proportion of severe Mycoplasma pneumoniae pneumonia (MPP) cases has increased since the pandemic, imposing a heavier social burden [33]. Using eight machine learning methods, this study identified five indicators (S100 A8/A9, CT, RBP, PLCR and Treg cells) that are significantly associated with SMPP and constructed a simplified model, which showed favorable diagnostic efficacy in the training set, validation set and prospective cohort (AUC > 0.80). We also explored the ability of S100 A8/A9 to differentiate MPP severity and found that it outperformed commonly used clinical severity indicators such as CRP and PCT in the diagnosis of severe MPP. These results provide multiple options for the early and accurate diagnosis of SMPP.

SMPP is a life-threatening pulmonary infectious disorder that not only compromises pulmonary parenchymal function but also frequently induces extrapulmonary manifestations such as skin rashes [34, 35]. SMPP precipitates immune dysregulation, markedly increasing susceptibility to polymicrobial infections [8] and thereby complicating treatment [36]. Early detection of SMPP is therefore crucial. In clinical practice, physicians often rely on changes in physical signs to assess the condition; while practical, this approach is subjective and lacks systematic quantitative standards [37]. Our review of the clinical literature revealed a lack of severity-assessment models for pediatric SMPP, a gap the present study addresses. Compared with previous studies, our model demonstrates superior diagnostic performance (AUC = 0.889) over the adult-modified pediatric pneumonia score (CPIS) [38], and its sensitivity surpasses that of the PRESS score, which is based on the oxygen saturation index [20]. Furthermore, our model is specifically designed for pediatric MPP and demonstrates higher specificity than recent ML-based pneumonia assessment models [27, 28].

In recent years, numerous studies have sought to elucidate the pathogenesis of MPP and identify novel biomarkers. Elevated levels of pro-adrenomedullin (Pro-ADM) [22] and interleukin-18 (IL-18) [39] have been observed in the body fluids of patients with MPP, and these indicators offer better diagnostic specificity than traditional inflammatory markers. However, the specialized nature of specimen collection, the lack of specificity for pediatric populations, and the limited diagnostic efficacy of single indicators have prompted the search for conventional, readily accessible biomarkers. This study aimed to identify clinically relevant laboratory indicators of SMPP in pediatric patients. Previous research has shown that Mycoplasma pneumoniae can induce the release of inflammatory mediators, triggering a cascade of host immune responses [40]. We therefore analyzed 23 clinical and serological parameters associated with SMPP, including circulating inflammatory biomarkers [41], enzymatic markers of liver and kidney function [35, 42], indices of physiological status, variables reflecting the body's internal milieu [43], and markers of host immune status [44]. Alterations in these indicators contribute to the clinical manifestations of severe pneumonia with multiple organ dysfunction syndrome (MODS), such as fever, inflammation, hepatic and renal impairment, myocarditis and vasculitis. Using these 23 indicators, eight ML algorithms were employed to develop a diagnostic model for SMPP. LightGBM performed best, which may be attributable to its efficient feature selection, rapid handling of large-scale datasets and robust generalization ability [45].
These characteristics make LightGBM particularly suitable for multistage disease classification tasks. This finding lays the groundwork for future research, especially in optimizing ML models to improve the diagnostic accuracy of severe pneumonia.

To enhance the model's clinical utility, we identified five key predictive variables consistently selected across multiple ML algorithms: S100 A8/A9 level, chest CT findings, RBP, PLCR and Treg cells, all of which are closely associated with severe pneumonia or inflammation. Our earlier research linked S100 A8/A9 to the severity of childhood CAP [46]. Serum RBP4 has been shown to activate the NLRP3 inflammasome in macrophages, releasing abundant inflammatory factors and triggering inflammation [47], and reduced serum RBP levels have been reported in COVID-19 patients [48], consistent with our analysis. Moreover, MPP is associated with a hypercoagulable state, and the observed alterations in platelet ratio may correlate with systemic inflammatory severity [49]. Pulmonary Treg cells have been shown to affect the severity of lung infection by regulating the pulmonary γδT17–neutrophil axis [50]. Based on these five variables, a visual and convenient SCRPT model was constructed. These studies support the credibility and clinical significance of the SCRPT model's indicators.

DCA and CIC curves were used to further assess the clinical applicability of the model. Compared with diagnostic models based solely on S100A8/A9 or on the SCRPT variables excluding S100A8/A9, the complete SCRPT model provided the highest net benefit. At threshold probabilities below 0.6, the net benefit of the SCRPT model was markedly higher than that of the model without S100A8/A9 and the single S100 A8/A9 model. At thresholds above 0.6, the single S100 A8/A9 model yielded a higher net benefit than the SCRPT model without S100A8/A9, though still slightly lower than the full SCRPT model. To further explore the ability of the SCRPT model to discriminate SMPP in clinical practice, a CIC was plotted. At a threshold of around 0.4, the number of patients classified as SMPP by the model was close to the actual number of patients with SMPP, with a clinical decision cost of 0.429 and a clinical decision benefit of 0.571. At a threshold of 0.8, the model's assessment was largely consistent with the observed probability in clinical practice (Figure 5B). These findings suggest that the SCRPT model is a rapid, convenient and accurate tool for identifying severe pneumonia, with considerable clinical application value. By simplifying the model, we aimed to reduce dependence on complex imaging data and to improve the model's operability and scalability, facilitating its application in diverse clinical settings.
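Decision-curve net benefit at threshold probability pt is TP/n - (FP/n) * pt / (1 - pt). It is compared against the treat-all and treat-none strategies, and treat-none has a net benefit of 0. A clinical impact curve plots, at each threshold, how many patients are classified high risk and how many of those have the event. Below is a minimal NumPy sketch, assuming 1 denotes SMPP and that a patient is classified high risk when the predicted probability is at least pt; the function names are hypothetical.

```python
import numpy as np


def net_benefit(y_true, prob, thresholds):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt) at each threshold pt."""
    y_true, prob = np.asarray(y_true), np.asarray(prob)
    n = len(y_true)
    out = []
    for pt in thresholds:
        high_risk = prob >= pt
        tp = np.sum(high_risk & (y_true == 1))
        fp = np.sum(high_risk & (y_true == 0))
        out.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(out)


def treat_all_net_benefit(y_true, thresholds):
    """Net benefit of treating everyone as severe (the 'all' reference line)."""
    prevalence = np.mean(y_true)
    return np.array([prevalence - (1 - prevalence) * pt / (1 - pt) for pt in thresholds])


def clinical_impact(y_true, prob, thresholds):
    """CIC counts per threshold: (number classified high risk, number of those with the event)."""
    y_true, prob = np.asarray(y_true), np.asarray(prob)
    return [(int(np.sum(prob >= pt)), int(np.sum((prob >= pt) & (y_true == 1))))
            for pt in thresholds]
```

The cost:benefit ratio associated with a threshold pt is pt / (1 - pt). For example, pt = 3/7 (about 0.43) corresponds to a cost of 0.429 against a benefit of 0.571, a ratio of 0.75.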

In the simplified model, S100A8/A9 contributed more than the other indicators. This finding may also help to elucidate the mechanisms underlying pneumonia, including MPP. S100A8/A9 has been reported to regulate the activation and migration of inflammatory cells [51] and to enhance the release of inflammatory mediators [52], thereby aggravating the inflammatory response [53]. As a key factor in the initiation and progression of inflammation [54], it has also been implicated in MPP [55]. In this study, its expression was higher in SMPP than in mild MPP (P < 0.05), and its diagnostic efficacy was better than that of commonly used clinical inflammatory markers. To further explore the potential of S100A8/A9 as an independent biomarker for diagnosing SMPP, we compared its diagnostic efficacy with that of the SCRPT model. As shown in Supplementary Table S3, the AUC of S100A8/A9 alone was lower than that of the SCRPT model in the training and validation sets (P < 0.05), but it still exceeded 0.8. In the prospective cohort, diagnostic efficacy did not differ significantly between the two (P > 0.05). To evaluate more comprehensively the improvement in diagnostic performance gained by incorporating variables, the models were compared using NRI and IDI. As shown in Supplementary Table S4, in the training set the diagnostic efficacy of the SCRPT model was significantly better than that of the SCRPT model without S100A8/A9 (both NRI and IDI were positive, P < 0.05), indicating a significant improvement in model performance after S100A8/A9 was added. In both the internal and prospective validation sets, no statistically significant difference in diagnostic efficacy was observed between the SCRPT model and the model without S100A8/A9.
However, both NRI and IDI remained positive, suggesting a modest, though not statistically significant, improvement in diagnostic efficacy attributable to S100A8/A9. Taken together, these results indicate that S100A8/A9 has potential utility in diagnosing SMPP. The ability to estimate the risk of severe disease from a single marker, S100A8/A9, may be particularly advantageous in primary healthcare settings with limited medical resources. Compared with traditional diagnostic approaches that rely on serial imaging or subjective clinical evaluation, the SCRPT model offers greater feasibility across clinical environments. In well-resourced hospitals, the SCRPT model can improve diagnostic precision. In primary care settings, serum S100A8/A9, which can be measured from a routine blood sample, may serve as a practical alternative to complex tests and help address infrastructural constraints. Additionally, the online risk calculator can support real-time decision-making, including in remote areas, and may help narrow technological disparities.
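The NRI and IDI reported in Supplementary Table S4 measure whether adding S100A8/A9 moves predicted risks in the right direction: upward for children with SMPP and downward for those without. The following hypothetical Python sketch illustrates this and is not the authors' analysis. It assumes the category-free (continuous) NRI. The study does not state which NRI variant was used, and a categorical NRI with fixed risk cut-offs would give different values. P values for NRI and IDI are usually obtained from asymptotic z-tests or bootstrapping and are not computed here.

```python
# Hypothetical sketch: category-free NRI and IDI for comparing an "old" model
# (e.g. SCRPT without S100A8/A9) with a "new" model (e.g. full SCRPT).
# Assumes the continuous NRI variant; toy data only.
from statistics import mean
from typing import Sequence


def continuous_nri(y_true: Sequence[int], p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """Category-free NRI: net share of events moved up plus net share of non-events moved down."""
    events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    # Booleans sum as 0/1, so each term is a count of upward or downward moves.
    nri_events = (sum(n > o for o, n in events) - sum(n < o for o, n in events)) / len(events)
    nri_nonevents = (sum(n < o for o, n in nonevents) - sum(n > o for o, n in nonevents)) / len(nonevents)
    return nri_events + nri_nonevents


def idi(y_true: Sequence[int], p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """IDI: mean risk gain among events minus mean risk gain among non-events."""
    gain_events = mean(n - o for y, o, n in zip(y_true, p_old, p_new) if y == 1)
    gain_nonevents = mean(n - o for y, o, n in zip(y_true, p_old, p_new) if y == 0)
    return gain_events - gain_nonevents


if __name__ == "__main__":
    # Hypothetical outcomes (1 = SMPP) and risks from the two models.
    y = [1, 1, 1, 0, 0, 0]
    p_old = [0.6, 0.5, 0.4, 0.3, 0.4, 0.2]
    p_new = [0.8, 0.6, 0.3, 0.2, 0.5, 0.1]
    print(f"NRI = {continuous_nri(y, p_old, p_new):.3f}")
    print(f"IDI = {idi(y, p_old, p_new):.3f}")
```

With these toy values both measures are positive (NRI ≈ 0.667, IDI ≈ 0.100). A positive value alone does not imply significance, which is why the validation-set results above are reported as positive but not statistically significant.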

This study has limitations. First, as a single-center retrospective cohort study using a prospectively collected dataset, it carries inherent design-related biases, and future research should involve multicenter cohorts to validate the current findings. Second, because of the small sample size, 10 stratification variables were used. A large number of stratification variables relative to the sample size may produce optimistic estimates in internal validation, so subsequent studies with larger samples are needed to improve model accuracy.

In summary, this study combines diverse clinical, laboratory, and imaging data and applies machine-learning algorithms to develop a model, which was ultimately implemented as the SCRPT online calculation platform. This platform is a reliable tool for early screening of SMPP in children. Additionally, the study identified serum S100A8/A9 as a biomarker that could be used for the early diagnosis of SMPP in resource-limited medical settings.

Data availability

The data supporting the conclusions of the current study are available from the corresponding author on reasonable request.

Abbreviations

AUC: Area under the curve
CIC: Clinical impact curve
CAP: Community-acquired pneumonia
CRP: C-reactive protein
CT: Computed tomography
DCA: Decision curve analysis
LightGBM: Light Gradient Boosting Machine
MPP: Mycoplasma pneumoniae pneumonia
ML: Machine learning
PLT: Platelet count
P-LCR: Platelet large cell ratio
PCT: Procalcitonin
ROC: Receiver operating characteristic
RF: Random forest
RBP: Retinol-binding protein
SMPP: Severe mycoplasma pneumoniae pneumonia
SHAP: Shapley additive explanations
WBC: White blood cell count

References

1. Torres A, Cilloniz C, Niederman MS, Menendez R, Chalmers JD, Wunderink RG, et al. Pneumonia. Nat Rev Dis Primers. 2021;7(1):25.
2. World Health Organization. Pneumonia in children. 2022.
3. Li ZJ, Zhang HY, Ren LL, Lu QB, Ren X, Zhang CH, et al. Etiological and epidemiological features of acute respiratory infections in China. Nat Commun. 2021;12(1):5026.
4. Waites KB, Xiao L, Liu Y, Balish MF, Atkinson TP. Mycoplasma pneumoniae from the Respiratory Tract and Beyond. Clin Microbiol Rev. 2017;30(3):747–809.
5. Kutty PK, Jain S, Taylor TH, Bramley AM, Diaz MH, Ampofo K, et al. Mycoplasma pneumoniae Among Children Hospitalized With Community-acquired Pneumonia. Clin Infect Dis. 2019;68(1):5–12.
6. Li H, Li S, Yang H, Chen Z, Zhou Z. Resurgence of Mycoplasma pneumonia by macrolide-resistant epidemic clones in China. Lancet Microbe. 2024;5(6):e515.
7. Xu Y, Yang C, Sun P, Zeng F, Wang Q, Wu J, et al. Epidemic features and megagenomic analysis of childhood Mycoplasma pneumoniae post COVID-19 pandemic: a 6-year study in southern China. Emerg Microbes Infect. 2024;13(1):2353298.
8. Koenen MH, de Groot RCA, de Steenhuijsen Piters WAA, Chu M, Arp K, Hasrat R, et al. Mycoplasma pneumoniae carriage in children with recurrent respiratory tract infections is associated with a less diverse and altered microbiota. EBioMedicine. 2023;98:104868.
9. Kant R, Kumar N, Malik YS, Everett D, Saluja D, Launey T, et al. Critical insights from recent outbreaks of Mycoplasma pneumoniae: decoding the challenges and effective interventions strategies. Int J Infect Dis. 2024;147:107200.
10. Dungu KHS, Holm M, Hartling U, Jensen LH, Nielsen AB, Schmidt LS, et al. Mycoplasma pneumoniae incidence, phenotype, and severity in children and adolescents in Denmark before, during, and after the COVID-19 pandemic: a nationwide multicentre population-based cohort study. Lancet Reg Health Eur. 2024;47:101103.
11. You J, Zhang L, Chen W, Wu Q, Zhang D, Luo Z, et al. Epidemiological characteristics of mycoplasma pneumoniae in hospitalized children before, during, and after COVID-19 pandemic restrictions in Chongqing, China. Front Cell Infect Microbiol. 2024;14:1424554.
12. Shah SS. Mycoplasma pneumoniae as a Cause of Community-Acquired Pneumonia in Children. Clin Infect Dis. 2019;68(1):13–4.
13. Gao L, Sun Y. Laboratory diagnosis and treatment of Mycoplasma pneumoniae infection in children: a review. Ann Med. 2024;56(1):2386636.
14. Zhao MC, Wang L, Qiu FZ, Zhao L, Guo WW, Yang S, et al. Impact and clinical profiles of Mycoplasma pneumoniae co-detection in childhood community-acquired pneumonia. BMC Infect Dis. 2019;19(1):835.
15. Dean P, Schumacher D, Florin TA. Defining Pneumonia Severity in Children: A Delphi Study. Pediatr Emerg Care. 2021;37(12):e1482–90.
16. Metlay JP, Waterer GW, Long AC, Anzueto A, Brozek J, Crothers K, et al. Diagnosis and Treatment of Adults with Community-acquired Pneumonia. An Official Clinical Practice Guideline of the American Thoracic Society and Infectious Diseases Society of America. Am J Respir Crit Care Med. 2019;200(7):e45–67.
17. Meyer Sauteur PM, Krautter S, Ambroggio L, Seiler M, Paioni P, Relly C, et al. Improved Diagnostics Help to Identify Clinical Features and Biomarkers That Predict Mycoplasma pneumoniae Community-acquired Pneumonia in Children. Clin Infect Dis. 2020;71(7):1645–54.
18. Florin TA, Ambroggio L, Brokamp C, Zhang Y, Rattan M, Crotty E, et al. Biomarkers and Disease Severity in Children With Community-Acquired Pneumonia. Pediatrics. 2020;145(6).
19. Rosman SL, Karangwa V, Law M, Monuteaux MC, Briscoe CD, McCall N. Provisional Validation of a Pediatric Early Warning Score for Resource-Limited Settings. Pediatrics. 2019;143(5).
20. Saleh NY, Ibrahem RAL, Saleh AAH, Soliman SES, Mahmoud AAS. Surfactant protein D: a predictor for severity of community-acquired pneumonia in children. Pediatr Res. 2022;91(3):665–71.
21. Zilberberg MD, Shorr AF. Ventilator-associated pneumonia: the clinical pulmonary infection score as a surrogate for diagnostics and outcome. Clin Infect Dis. 2010;51(Suppl 1):S131–5.
22. Florin TA, Ambroggio L, Brokamp C, Zhang Y, Nylen ES, Rattan M, et al. Pro-adrenomedullin Predicts Severe Disease in Children With Suspected Community-acquired Pneumonia. Clin Infect Dis. 2021;73(3):e524–30.
23. Nishi H, Oishi N, Ishii A, Ono I, Ogura T, Sunohara T, et al. Predicting Clinical Outcomes of Large Vessel Occlusion Before Mechanical Thrombectomy Using Machine Learning. Stroke. 2019;50(9):2379–88.
24. Macesic N, Bear Don't Walk OI, Pe'er I, Tatonetti NP, Peleg AY, Uhlemann AC. Predicting Phenotypic Polymyxin Resistance in Klebsiella pneumoniae through Machine Learning Analysis of Genomic Data. mSystems. 2020;5(3).
25. Artzi NS, Shilo S, Hadar E, Rossman H, Barbash-Hazan S, Ben-Haroush A, et al. Prediction of gestational diabetes based on nationwide electronic health records. Nat Med. 2020;26(1):71–6.
26. Li X, Wu M, Sun C, Zhao Z, Wang F, Zheng X, et al. Using machine learning to predict stroke-associated pneumonia in Chinese acute ischaemic stroke patients. Eur J Neurol. 2020;27(8):1656–63.
27. Yang T, Zhang L, Sun S, Yao X, Wang L, Ge Y. Identifying severe community-acquired pneumonia using radiomics and clinical data: a machine learning approach. Sci Rep. 2024;14(1):21884.
28. Chang TH, Liu YC, Lin SR, Chiu PH, Chou CC, Chang LY, et al. Clinical characteristics of hospitalized children with community-acquired pneumonia and respiratory infections: Using machine learning approaches to support pathogen prediction at admission. J Microbiol Immunol Infect. 2023;56(4):772–81.
29. Cilloniz C, Ward L, Mogensen ML, Pericàs JM, Méndez R, Gabarrús A, et al. Machine-Learning Model for Mortality Prediction in Patients With Community-Acquired Pneumonia: Development and Validation Study. Chest. 2023;163(1):77–88.
30. China National Health Commission. Guideline for diagnosis and treatment of community-acquired pneumonia in children (2019 version). Chin J Clin Infect Dis. 2019;12:6–13.
31. Expert Committee on Rational Use of Medicines for Children Pharmaceutical Group, National Health and Family Planning Commission. [Expert consensus on laboratory diagnostics and clinical practice of Mycoplasma pneumoniae infection in children in China (2019)]. Zhonghua Er Ke Za Zhi. 2020;58(5):366–73.
32. National Health Commission of the People's Republic of China. Guidelines for the diagnosis and treatment of Mycoplasma pneumoniae pneumonia in children (2023 edition). International Journal of Epidemiology and Infectious Disease. 2023;36(4):291–7.
33. Wu Q, Pan X, Han D, Ma Z, Zhang H. New Insights into the Epidemiological Characteristics of Mycoplasma pneumoniae Infection before and after the COVID-19 Pandemic. Microorganisms. 2024;12(10).
34. Cheng Q, Zhang H, Shang Y, Zhao Y, Zhang Y, Zhuang D, et al. Clinical features and risk factors analysis of bronchitis obliterans due to refractory Mycoplasma pneumoniae pneumonia in children: a nomogram prediction model. BMC Infect Dis. 2021;21(1):1085.
35. Zhang J, Ma HK, Li BW, Ma KK, Zhang YL, Li SJ. Changes in urinary renal injury markers in children with Mycoplasma pneumoniae pneumonia and a prediction model for related early renal injury. Ital J Pediatr. 2024;50(1):155.
36. Poddighe D. Mycoplasma pneumoniae-related extra-pulmonary diseases and antimicrobial therapy. J Microbiol Immunol Infect. 2020;53(1):188–9.
37. Sartori LF, Zhu Y, Grijalva CG, Ampofo K, Gesteland P, Johnson J, et al. Pneumonia Severity in Children: Utility of Procalcitonin in Risk Stratification. Hosp Pediatr. 2021;11(3):215–22.
38. Sdougka M, Simitsopoulou M, Volakli E, Violaki A, Georgopoulou V, Ftergioti A, et al. Evaluation of Five Host Inflammatory Biomarkers in Early Diagnosis of Ventilator-Associated Pneumonia in Critically Ill Children: A Prospective Single Center Cohort Study. Antibiotics (Basel). 2023;12(5):921.
39. Jia Z, Sun Q, Zheng Y, Xu J, Wang Y. The immunogenic involvement of miRNA-492 in mycoplasma pneumoniae infection in pediatric patients. J Pediatr (Rio J). 2023;99(2):187–92.
40. Zhang Z, Dou H, Tu P, Shi D, Wei R, Wan R, et al. Serum cytokine profiling reveals different immune response patterns during general and severe Mycoplasma pneumoniae pneumonia. Front Immunol. 2022;13:1088725.
41. Wei D, Zhao Y, Zhang T, Xu Y, Guo W. The role of LDH and ferritin levels as biomarkers for corticosteroid dosage in children with refractory Mycoplasma pneumoniae pneumonia. Respir Res. 2024;25(1):266.
42. Poddighe D. Mycoplasma pneumoniae-related hepatitis in children. Microb Pathog. 2020;139:103863.
43. Zheng Y, Hua L, Zhao Q, Li M, Huang M, Zhou Y, et al. The Level of D-Dimer Is Positively Correlated With the Severity of Mycoplasma pneumoniae Pneumonia in Children. Front Cell Infect Microbiol. 2021;11:687391.
44. Hu J, Ye Y, Chen X, Xiong L, Xie W, Liu P. Insight into the Pathogenic Mechanism of Mycoplasma pneumoniae. Curr Microbiol. 2022;80(1):14.
45. Han K, Lee B, Lee D, Heo G, Oh J, Lee S, et al. Forecasting the spread of COVID-19 based on policy, vaccination, and Omicron data. Sci Rep. 2024;14(1):9962.
46. Xie S, Wang J, Tuo W, Zhuang S, Cai Q, Yao C, et al. Serum level of S100A8/A9 as a biomarker for establishing the diagnosis and severity of community-acquired pneumonia in children. Front Cell Infect Microbiol. 2023;13:1139556.
47. Moraes-Vieira PM, Yore MM, Sontheimer-Phelps A, Castoldi A, Norseen J, Aryal P, et al. Retinol binding protein 4 primes the NLRP3 inflammasome by signaling through Toll-like receptors 2 and 4. Proc Natl Acad Sci U S A. 2020;117(49):31309–18.
48. Vollenberg R, Tepasse PR, Fobker M, Hüsing-Kabar A. Significantly Reduced Retinol Binding Protein 4 (RBP4) Levels in Critically Ill COVID-19 Patients. Nutrients. 2022;14(10).
49. Liu J, He R, Wu R, Wang B, Xu H, Zhang Y, et al. Mycoplasma pneumoniae pneumonia associated thrombosis at Beijing Children's hospital. BMC Infect Dis. 2020;20(1):51.
50. Xu R, Jacques LC, Khandaker S, Beentjes D, Leon-Rios M, Wei X, et al. TNFR2(+) regulatory T cells protect against bacteremic pneumococcal pneumonia by suppressing IL-17A-producing γδ T cells in the lung. Cell Rep. 2023;42(2):112054.
51. Sprenkeler EGG, Zandstra J, van Kleef ND, Goetschalckx I, Verstegen B, Aarts CEM, et al. S100A8/A9 Is a Marker for the Release of Neutrophil Extracellular Traps and Induces Neutrophil Activation. Cells. 2022;11(2).
52. Zhao B, Lu R, Chen J, Xie M, Zhao X, Kong L. S100A9 blockade prevents lipopolysaccharide-induced lung injury via suppressing the NLRP3 pathway. Respir Res. 2021;22(1):45.
53. Jukic A, Bakiri L, Wagner EF, Tilg H, Adolph TE. Calprotectin: from biomarker to biological function. Gut. 2021;70(10):1978–88.
54. Shabani F, Farasat A, Mahdavi M, Gheibi N. Calprotectin (S100A8/S100A9): a key protein between inflammation and cancer. Inflamm Res. 2018;67(10):801–12.
55. Bai S, Wang W, Ye L, Fang L, Dong T, Zhang R, et al. IL-17 stimulates neutrophils to release S100A8/A9 to promote lung epithelial cell apoptosis in Mycoplasma pneumoniae-induced pneumonia in children. Biomed Pharmacother. 2021;143:112184.

Acknowledgements

We would like to thank the parents and children for participating in the study. We thank the doctors and nursing staff of these centers for their detailed assessments. We thank all participants for their contributions to this study.

Funding

This work was supported by the National Natural Science Foundation of China (82202536) and the Natural Science Foundation of Wuhan Municipal Health Commission (grant Nos. WX21Q50, WX21M03 and WZ22Q08).

Author information

Si Xie, Mo Wu, Yu Shang contributed equally to the work.

Authors and Affiliations

Department of Laboratory Medicine, Wuhan Children’s Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science & Technology, Wuhan, 430016, China
Si Xie, Mo Wu, Yu Shang, Wenbin Tuo, Jun Wang, Qinzhen Cai, Chunhui Yuan & Yun Xiang
Health Care Department, Wuhan Children’s Hospital (Wuhan Maternal and Child Healthcare Hospital), Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430016, China
Cong Yao

Contributions

S X: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft. M W: Data acquisition. Y S: Data acquisition. J W: Data acquisition. Wb T: Data acquisition. Qz C: Data acquisition. Ch Y: Conceptualization, Writing – review & editing. C Y: Analysis and data interpretation. Y X: Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Chunhui Yuan, Cong Yao or Yun Xiang.

Ethics declarations

Ethics approval and consent to participate

This study was conducted in accordance with the Declaration of Helsinki and approved by The Ethics Committee of the Wuhan Children’s Hospital, Huazhong University of Science and Technology (2022R048-E01). Clinical trial number: Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Additional file 2.

Additional file 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xie, S., Wu, M., Shang, Y. et al. Development and validation of an early diagnosis model for severe mycoplasma pneumonia in children based on interpretable machine learning. Respir Res 26, 182 (2025). https://doi.org/10.1186/s12931-025-03262-1

Received: 01 February 2025
Accepted: 28 April 2025
Published: 13 May 2025
DOI: https://doi.org/10.1186/s12931-025-03262-1
