Fig. 3 | Respiratory Research

From: The large language model diagnoses tuberculous pleural effusion in pleural effusion patients through clinical feature landscapes

Machine learning models effectively diagnose TPE.

A The plot illustrates the coefficient paths of the biochemical and hematological variables in the lasso regression as the regularization parameter λ (Log Lambda) varies. The x-axis shows Log Lambda, the logarithm of the regularization parameter, and the y-axis shows the regression coefficient of each variable. As λ increases, the coefficients shrink toward zero, illustrating how lasso regression performs variable selection by shrinking coefficients. Curves of different colors represent different variables and trace how their coefficients change during regularization.

B The plot displays the cross-validation of the lasso regression: the y-axis shows the binomial deviance and the x-axis shows Log(λ). The gray shaded area indicates the standard-error range, and the red curve is the mean binomial deviance. Cross-validation shows that as λ changes the binomial deviance decreases to a minimum; the λ at this minimum is the optimal regularization parameter. The optimal λ, marked by the dashed line, is the regularization parameter chosen for the model.

C Heatmap comparing the AUC and F1 scores of the top 93 of 453 machine learning models, including those built with XGBoost, GBM, DeepLearning, and other algorithms. The results include AUC and F1 scores for the training set, the test set, and their averages. Models are ranked by their AUC and F1 scores, with higher values indicating better performance; the color bar encodes the score values, with color intensity reflecting the level of performance. AUC measures the model's classification ability, and values closer to 1 indicate better performance; AUC values are shown for the training set, test set, and average, revealing how performance varies across datasets. The F1 score measures a classification model's accuracy by balancing precision and recall; a higher F1 score indicates a better balance between the positive and negative classes. F1 scores are shown for the training and test sets, together with averages, to facilitate comparison of model performance across stages. Each row represents a different machine learning model, covering various configurations of algorithms such as XGBoost, GBM, and DeepLearning (e.g., XGBoost_grid_1_model_108), with its corresponding AUC and F1 scores on the training and test sets.

D The plot illustrates the importance of each variable in the best-performing XGBoost-based machine learning model on the test set. The importance of each variable is represented by the length of its bar; longer bars indicate a greater contribution to the model's diagnostic capability. The most important variables include ADA, alkaline phosphatase, PE biochemical markers (albumin, total protein), and hematological variables (neutrophil count, monocyte percentage, etc.).

E The plot displays the SHAP values of each feature in the best XGBoost-based machine learning model, representing each feature's contribution to the model output. Features are listed on the y-axis, and their SHAP values are plotted along the x-axis. Each point represents a data point, and its color indicates the feature value (from low to high, with the color scale shown on the right). Positive SHAP values (to the right of the vertical line) increase the model's diagnostic value, while negative SHAP values (to the left of the vertical line) decrease it. The features with the greatest impact on the diagnosis appear at the top of the plot.

F SHAP analysis showing the contribution of multiple features to the model's diagnosis of a specific sample (TPE vs. non-TPE). The x-axis represents the SHAP values, reflecting each feature's contribution to the diagnosis: movement to the right increases the diagnostic value, while movement to the left decreases it. The cumulative effect of the SHAP values determines the final model diagnosis. The difference between the final diagnostic value, f(x) = 0.0963, and the expected value, E(f(x)) = −0.626, is accounted for by the SHAP values of the individual features.

G Training set confusion matrix, displaying the model's diagnostic results on the training set. Control represents normal cases, and Case represents diseased cases. The model's correct diagnoses and misclassifications are as follows: true positives (TP): 68; false positives (FP): 1; true negatives (TN): 42; false negatives (FN): 4. The model's performance metrics on the training set are sensitivity = 0.977, specificity = 0.944, accuracy = 0.913, recall = 0.944, F1 score = 0.944, and Kappa = 0.908.

H Test set confusion matrix, showing the model's diagnostic results on the test set: true positives (TP): 35; false positives (FP): 1; true negatives (TN): 10; false negatives (FN): 2. The model's performance metrics on the test set are sensitivity = 0.909, specificity = 0.946, accuracy = 0.833, recall = 0.909, F1 score = 0.87, and Kappa = 0.829.
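For readers who want to reproduce plots like panels A and B, the following is a minimal sketch of an L1-penalized (lasso) logistic regression coefficient path and cross-validated binomial deviance. The original analysis was most likely done with R's glmnet; this is only a rough scikit-learn analogue, and `X`/`y` are hypothetical placeholders for the study's biochemical and hematological features and TPE labels.

```python
# Sketch of panels A-B: lasso coefficient paths and CV binomial deviance.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(150, 20)))  # placeholder features
y = rng.integers(0, 2, size=150)                                # placeholder labels (TPE vs. non-TPE)

lambdas = np.logspace(-3, 1, 50)          # grid of regularization strengths (lambda)
coef_paths, mean_deviance = [], []
for lam in lambdas:
    clf = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear", max_iter=5000)
    clf.fit(X, y)
    coef_paths.append(clf.coef_.ravel())
    # Binomial deviance = 2 * mean negative log-likelihood, estimated by 10-fold CV
    nll = -cross_val_score(clf, X, y, cv=10, scoring="neg_log_loss")
    mean_deviance.append(2 * nll.mean())

coef_paths = np.asarray(coef_paths)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(np.log(lambdas), coef_paths)          # panel A: one curve per variable
ax1.set(xlabel="Log Lambda", ylabel="Coefficient")
ax2.plot(np.log(lambdas), mean_deviance, "r")  # panel B: mean CV binomial deviance
best = lambdas[int(np.argmin(mean_deviance))]
ax2.axvline(np.log(best), linestyle="--")      # dashed line at the optimal lambda
ax2.set(xlabel="Log(lambda)", ylabel="Binomial Deviance")
plt.tight_layout()
plt.show()
```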
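Panel C ranks many candidate models by AUC and F1 on the training and test sets. The model names (e.g., XGBoost_grid_1_model_108) suggest an AutoML-style grid search; the sketch below only illustrates the comparison logic with a few generic scikit-learn estimators standing in, and the train/test data are hypothetical.

```python
# Sketch of panel C: leaderboard of AUC and F1 on training and test sets.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=161, n_features=20, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "GBM": GradientBoostingClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=5000),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    row = {"model": name}
    for split, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
        prob = model.predict_proba(Xs)[:, 1]
        pred = model.predict(Xs)
        row[f"AUC_{split}"] = roc_auc_score(ys, prob)
        row[f"F1_{split}"] = f1_score(ys, pred)
    row["AUC_mean"] = (row["AUC_train"] + row["AUC_test"]) / 2
    row["F1_mean"] = (row["F1_train"] + row["F1_test"]) / 2
    rows.append(row)

# Rank models by mean AUC and F1, mirroring the heatmap's ordering
leaderboard = pd.DataFrame(rows).sort_values(["AUC_mean", "F1_mean"], ascending=False)
print(leaderboard.round(3))
```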
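Panels D, E, and F are standard XGBoost variable-importance and SHAP visualizations. A minimal sketch follows, assuming a plain xgboost.XGBClassifier as a stand-in for the study's grid-selected model and placeholder data in place of the clinical features.

```python
# Sketch of panels D-F: variable importance, SHAP summary, and single-sample SHAP plot.
import shap
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=161, n_features=15, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X_train, y_train)

# Panel D: variable importance (bar length = contribution to the model)
xgb.plot_importance(model, max_num_features=15)

# Panels E-F: SHAP values on the test set (log-odds scale)
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)

shap.plots.beeswarm(shap_values)      # panel E: per-feature SHAP summary plot
shap.plots.waterfall(shap_values[0])  # panel F: one sample, cumulative SHAP
                                      # contributions from E[f(x)] to f(x)
```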
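The metrics reported in panels G and H can be derived from the 2x2 confusion-matrix counts. The sketch below uses the standard textbook definitions only; the paper's exact conventions (e.g., which class is treated as positive) are not restated here, and the example counts are hypothetical.

```python
# Sketch of panels G-H: performance metrics from confusion-matrix counts.
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement vs. chance agreement
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "accuracy": accuracy, "recall": sensitivity, "F1": f1, "kappa": kappa,
    }

# Example with hypothetical counts (not the figure's values)
print({k: round(v, 3) for k, v in confusion_metrics(tp=50, fp=5, tn=40, fn=5).items()})
```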
