- Research
- Open access
One-step diagnosis of infection and lung cancer using metagenomic sequencing
Respiratory Research volume 26, Article number: 48 (2025)
Abstract
Background
Traditional detection methods face challenges in meeting the diverse clinical needs for diagnosing both lung cancer and infections within a single test. Onco-mNGS has emerged as a promising solution capable of accurately identifying infectious pathogens and tumors simultaneously. However, critical evidence is still lacking regarding its diagnostic performance in distinguishing between pulmonary infections, tumors, and non-infectious, non-tumor conditions in real clinical settings.
Methods
In this study, data were gathered from 223 participants presenting symptoms of lung infection or tumor who underwent Onco-mNGS testing. Patients were categorized into four groups based on clinical diagnoses: infection, tumor, tumor with infection, and non-infection-non-tumor. Comparisons were made across different groups, subtypes, and stages of lung cancer regarding copy number variation (CNV) patterns, microbiome compositions, and clinical detection indices.
Results
Compared to conventional infection testing methods, Onco-mNGS demonstrated superior infection detection performance, with a sensitivity of 81.82%, a specificity of 72.55%, and an overall accuracy of 77.58%. In lung cancer diagnosis, Onco-mNGS achieved a sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of 88.46%, 100%, 91.41%, 100%, and 90.98%, respectively. In bronchoalveolar lavage fluid (BALF) samples, these values were 87.5%, 100%, 94.74%, 100%, and 91.67%, respectively. Notably, more CNV mutation types and higher mutation rates were observed in adenocarcinoma (ADC) than in squamous cell carcinoma (SCC). Onco-mNGS data also revealed specific enrichment of Capnocytophaga sputigena in the ADC group and Candida parapsilosis in the SCC group. These species exhibited significant correlations with C-reactive protein (CRP) and CA153 values. Furthermore, Haemophilus influenzae was enriched in the early-stage SCC group and significantly associated with CRP values.
Conclusions
Onco-mNGS has exhibited exceptional efficiencies in the detection and differentiation of infection and lung cancer. This study provides a novel technological option for achieving single-step precise and swift detection of lung cancer.
Introduction
Lung cancer (LC) is a malignant tumor that typically originates in lung tissues and ranks among the foremost causes of cancer-related fatalities worldwide [1]. It is generally classified into two major categories: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC encompasses subtypes such as adenocarcinoma (ADC), squamous cell carcinoma (SCC), and large cell carcinoma (LCC) [2, 3], collectively constituting approximately 85% of lung cancer cases [4, 5]. NSCLC tends to progress slowly, with early symptoms often manifesting subtly, making early diagnosis a formidable challenge. In contrast, SCLC exhibits rapid growth and is typically diagnosed after metastasis to other regions. Therefore, early detection and accurate diagnosis of lung cancer are imperative for enhancing cure rates and survival rates [6, 7].
Currently, in clinical practice, tumor screening typically involves a series of imaging examinations, including abdominal ultrasound, chest X-rays, and CT scans of the chest, abdomen, and pelvis. These tests are often combined with invasive procedures such as needle biopsies to ensure more accurate diagnosis and assessment [6]. In recent years, innovative methods such as liquid biopsies and circulating tumor DNA testing have shown significant potential [7, 8], providing more convenient ways for early screening and monitoring while reducing patient discomfort and risks. Studies have also shown that the human-derived reads in large-scale genomic sequencing data can be used to assess copy number variations (CNVs) in clinical samples, thereby contributing to the diagnosis and monitoring of tumors [11,12,13,14]. These findings indicate that the human-derived fraction of metagenomic sequencing data is a valuable resource for evaluating CNVs in clinical samples.
Lung cancer patients often present clinical symptoms resembling those of infections, including cough, sputum production, chest pain, and the presence of lung masses [5, 9]. Accurate identification and differentiation between infections and cancer are critical in the diagnosis and treatment of cancer. Earlier and faster identification of pathogens can guide precise individualized anti-infective treatment, which is of great significance for controlling lung infections [10]. Traditional infection detection primarily relies on culture methods, with limitations such as low positivity rates and extended testing periods. In recent years, metagenomic next-generation sequencing (mNGS) has emerged as a valuable tool for pathogen diagnosis due to its shorter testing periods, broader pathogen coverage, and reduced bias [11, 12]. Metagenomic sequencing allows for the simultaneous analysis of both the host and pathogen genomes. However, current mNGS methods are primarily designed for the rapid identification of infection-related pathogenic microorganisms and are not typically utilized for tumor screening [13, 14].
The newly developed Onco-mNGS technology excels at precise identification of pathogenic microorganisms in clinical samples [15]. Simultaneously, it analyzes human genome data derived from mNGS to detect CNVs within human chromosomes. Consequently, Onco-mNGS serves as a one-step, quick solution for identifying potential infection factors and tumor-related chromosomal abnormalities in patient body fluids and tissue samples. This approach enhances diagnostic efficiency and minimizes the physical impact of testing. In this study, we assessed a clinical cohort comprising 223 patients with suspected tumors or infections using Onco-mNGS. Our aim was to investigate the cancer detection capabilities of Onco-mNGS in bronchoalveolar lavage fluid (BALF) samples and assess its efficiency in lung cancer typing and staging, utilizing CNV and microbiological data generated through Onco-mNGS.
Methods
Data source and study design
This is a single-center retrospective study analyzing Onco-mNGS data and clinical information from 223 patients enrolled at the Department of Respiratory of the First Affiliated Hospital, Guangzhou Medical University between July 24th, 2020 and December 2nd, 2022 (Fig. 1 and Table 1). Only patients who underwent Onco-mNGS testing after admission were included. These patients presented with one of the following three symptoms upon admission: (I) new occurrence of cough, expectoration, or worsened respiratory symptoms, with or without purulent sputum, chest pain, dyspnea and hemoptysis; (II) fever; (III) signs of consolidation and/or moist rales on lung auscultation. Simultaneously, chest X-rays or computed tomography (CT) scans revealed new infiltrates, lobar or segmental consolidations, ground-glass opacities, or interstitial images. Patients infected with human immunodeficiency virus (HIV), those confirmed with tuberculosis at admission, and pregnant women were excluded. In addition to the mNGS data from BALF samples, infection-related test results, including white blood cell (WBC) count, procalcitonin (PCT), C-reactive protein (CRP), and neutrophil count, and tumor markers such as carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), cancer antigen 125 (CA125) and cancer antigen 153 (CA153), along with other biomarkers obtained from contemporaneous blood samples, were also collected. All patients or their families signed the informed consent form for the collection and study of clinical samples and medical records.
Overall design of the study and diagnostic performance of the mNGS test. (A) Flowchart of patient and sample classification. All enrolled samples were analyzed using Q-mNGS 2.0 and classified according to clinical diagnoses. All samples were used for analyzing the performance of mNGS, while only BALF samples were further included for microbiota and CNV analyses. Non non-tumor and non-infection samples, LC lung cancer, LY lymphadenoma, MC metastasis cancer, NSCLC non-small-cell lung cancer, SCLC small-cell lung cancer, SCC squamous cell carcinoma, ADC adenocarcinoma, L early stages of cancer, including Phase I, II and IIIa, H late stages of cancer, including Phase IIIb, IIIc and IV. (B), (D) The performance of Onco-mNGS in diagnosing cancers based on CNV signals compared with clinical diagnosis. (C), (E) The performance of Onco-mNGS in diagnosing infection compared with clinical diagnosis. (B) and (C) include all sample types, whereas (D) and (E) include only BALF samples. If the pathogenic microorganisms identified in the Onco-mNGS report completely matched those clinically confirmed as responsible pathogens, it was considered a concordant result. Otherwise, the Onco-mNGS result was regarded as negative. (F) Flowchart showing the statistical result of actual detection duration.
DNA extraction, library preparation and sequencing
Each sample was transported at low temperature (on ice packs) to the Guangzhou MatriDx Biotechnology laboratory for immediate Onco-mNGS testing. For body fluid samples, a total of 200 μL of fluid was used for DNA extraction. Briefly, an NGS Automatic Library Preparation System (Cat. MAR002, MatriDx Biotech Corp., Hangzhou, China) was used to automate DNA extraction and library preparation. The accompanying kits were a Nucleic Acid Extraction Kit (Cat. MD013, MatriDx Biotech Corp.) and a Total DNA Library Preparation Kit (Cat. MD001T, MatriDx Biotech Corp.). The Agilent 2100 Bioanalyzer System (Agilent Technologies, United States) was used for quality control of the prepared libraries (PCR-free), and adapter-specific quantitative PCR was used to quantify them. The libraries were then adjusted (aiming for 20 million (M) reads) and pooled for next-generation sequencing (NGS) on an Illumina NextSeq550Dx system using a single-end (SE) 50 base pair (bp) sequencing strategy. For contamination control, irrelevant cell line-based control samples were processed in parallel with the clinical samples throughout the workflow.
Pathogen determination
After obtaining clean reads through raw data demultiplexing and adapter trimming, microbial identification was performed using a proprietary reference database comprising over 20,000 microorganisms. Species identified in clinical samples via mNGS were initially filtered against microorganisms detected in the parallel no template control (NTC) to account for background microorganisms. A species was retained when its ratio of unique reads per million (RPM) exceeded 10 (RPM ratio = RPM_sample/RPM_NTC, or RPM ratio = RPM_sample if the organism was not detected in the parallel NTC). The authentic microbiota present in clinical samples was defined based on this filtering process. Furthermore, all identified microbiota species were cross-referenced with PubMed to assess their potential pathogenicity, and those with reported pathogenic attributes were designated as pathogens.
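The background-filtering rule described above can be sketched as follows. This is a minimal illustration, not the proprietary pipeline: the function names, the dictionary layout, and the RPM values are all hypothetical, and only the ratio definition and the threshold of 10 come from the text.

```python
def rpm(unique_reads, total_reads):
    """Unique reads per million sequenced reads (assumed definition of RPM)."""
    return unique_reads * 1_000_000 / total_reads


def rpm_ratio(sample_rpm, ntc_rpm):
    """RPM ratio as defined in the text: RPM_sample / RPM_NTC, or RPM_sample
    alone when the organism was not detected in the parallel NTC."""
    if ntc_rpm is None or ntc_rpm == 0:
        return sample_rpm
    return sample_rpm / ntc_rpm


def filter_background(sample_rpms, ntc_rpms, threshold=10.0):
    """Keep species whose RPM ratio exceeds the threshold (10 in the text)."""
    kept = {}
    for species, sample_value in sample_rpms.items():
        ratio = rpm_ratio(sample_value, ntc_rpms.get(species))
        if ratio > threshold:
            kept[species] = ratio
    return kept


# Hypothetical RPM values, for illustration only.
sample = {
    "Klebsiella pneumoniae": 250.0,
    "Cutibacterium acnes": 30.0,
    "Candida tropicalis": 12.0,
}
ntc = {"Cutibacterium acnes": 15.0}  # also seen in the no-template control

kept = filter_background(sample, ntc)
print(sorted(kept))
```

In this toy input, the species that also appears in the NTC has a ratio of 2 and is dropped as background, while the two species absent from the NTC are judged on their sample RPM alone.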
CNV signatures identification
Sequenced reads were initially aligned to the human reference genome (hg19), and only uniquely mapped reads were retained for subsequent analysis. The reference genome was divided into contiguous windows of fixed length, and read depths were computed for each window and normalized to the total number of reads per sample. The copy number ratio for each window was determined by dividing the normalized read depth by the average read depth in the reference dataset. This ratio was log2-transformed, and adjacent windows with similar ratios were merged into segments annotated with chromosome position and average ratio. The copy number of each segment was calculated from the mean ratio and the normal copy number of the corresponding chromosome and then compared to a preset threshold for CNV validation.
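The read-depth procedure described above might look like the following sketch. The article does not specify the window size, the segmentation algorithm, or the thresholds, so the greedy merge, the `tolerance` value, and all input numbers here are assumptions chosen only to make the steps concrete.

```python
import math


def log2_ratios(sample_counts, reference_depths):
    """Normalize per-window read counts by the sample total, divide by the
    reference mean normalized depth for that window, and take log2."""
    total = sum(sample_counts)
    ratios = []
    for count, reference in zip(sample_counts, reference_depths):
        ratio = (count / total) / reference
        ratios.append(math.log2(ratio) if ratio > 0 else float("-inf"))
    return ratios


def merge_segments(ratios, tolerance=0.2):
    """Greedily merge adjacent windows whose log2 ratio stays within
    `tolerance` of the running segment mean (a stand-in for whatever
    segmentation method the real pipeline uses).
    Returns (start, end, mean_log2) tuples with `end` exclusive."""
    if not ratios:
        return []
    segments = []
    start, values = 0, [ratios[0]]
    for i in range(1, len(ratios)):
        mean = sum(values) / len(values)
        if abs(ratios[i] - mean) <= tolerance:
            values.append(ratios[i])
        else:
            segments.append((start, i, mean))
            start, values = i, [ratios[i]]
    segments.append((start, len(ratios), sum(values) / len(values)))
    return segments


def copy_number(mean_log2, normal_copies=2):
    """Convert a segment's mean log2 ratio back to an absolute copy number."""
    return normal_copies * 2 ** mean_log2


# Hypothetical counts for eight windows; windows 4-6 carry extra reads.
counts = [100, 100, 100, 100, 150, 150, 150, 100]
reference = [1 / 8] * 8  # assumed uniform reference depth per window

segments = merge_segments(log2_ratios(counts, reference))
for start, end, mean in segments:
    print(start, end, round(copy_number(mean), 2))
```

Because depths are normalized to the sample total, a large gain slightly lowers the apparent ratio of the unaffected windows. Comparing each segment against a preset threshold, as the text describes, is one way to tolerate that shift.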
Results judgment and reporting
The results of the etiological screening of the enrolled patients were evaluated by a panel of clinical experts (including three experienced physicians). mNGS results were interpreted according to MatriDx Biotechnology Co., Ltd.'s own pathogen data filtering principles. Infectious diseases were diagnosed on the basis of microbiological tests, mNGS results, and clinical review results. Tumors were judged on the basis of Onco-mNGS results in addition to histopathology, cytological examination, microscopic examination, and other validation tests.
Statistical analysis
SPSS 26.0 statistical software was used to analyze the data. Categorical variables were summarized descriptively as cases (%) or individuals (%), and correlations between different indices were analyzed using Spearman’s correlation and coefficients of determination (R²). P values less than 0.05 were considered statistically significant.
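For readers unfamiliar with Spearman’s correlation, the sketch below shows the calculation in plain Python: each variable is converted to ranks (ties receive the average rank) and a Pearson correlation is computed on the ranks. The study itself used SPSS. The helper names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements (e.g. a microbe's abundance and CRP).
abundance = [0.1, 0.4, 0.9, 2.5, 7.0]
crp = [0.2, 0.5, 1.1, 3.0, 4.9]
rho = spearman(abundance, crp)
print(round(rho, 3))
```

Rank-based correlation only asks whether the two variables rise and fall together, so it suits skewed measurements such as relative abundances and biomarker concentrations.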
Results
Participant characteristics
A total of 223 patients, comprising 131 (58.7%) males and 92 (41.3%) females, were enrolled in the study. The median age was 58 years (range: 17–90). Among the cohort, 37.2% (83/223) had a history of smoking, and 49.3% (110/223) had underlying diseases. Various clinical samples, including BALF, sputum, blood, etc., were collected from these patients. Depending on the sample type, they were initially categorized into a BALF group (n = 190, 85.2%) and a non-BALF group (n = 33, 14.8%) (Table 1 and S1).
Overall, diagnoses included infection in 97 cases (41.63%), tumor with infection in 29 cases (12.45%), tumor alone in 75 cases (32.19%), and non-tumor, non-infection (Non) in 22 cases (9.44%) (Fig. 1A). Among all cases, 36.8% (70/190) of BALF samples exhibited varying degrees of CNV, while the corresponding percentage in non-BALF samples was higher at 66.7% (22/33). To explore the pathological characteristics of patients under different conditions, samples were first divided by CNV signal and then into sub-groups based on disease classification (Fig. 1A). Specifically, BALF samples with negative CNV signals were subdivided into infection (n = 88), non-tumor-and-non-infection (Non, n = 22), and tumor (n = 10) groups. Those with positive CNV signals were first divided into the lung cancer group (LC, n = 67), the lymphoma group (LY, n = 2), and the group with other cancers (Other, n = 1; cancer type: hematologic malignancy). Within the LC group, three sub-groups were identified: metastatic carcinoma (MC, n = 3), small cell lung cancer (SCLC, n = 4), and non-small cell lung cancer (NSCLC, n = 60). As the predominant component of the LC group, the NSCLC subgroup was subdivided into squamous cell carcinoma (SCC, n = 20), lung adenocarcinoma (ADC, n = 35), and NSCLC of other types (NSCLC-O, n = 5). Additionally, LC progression stages (early stages (L; phases I, II, and IIIA) and late stages (H; phases IIIB, IIIC and IV)) were considered for LC overall and for the two main subtypes of NSCLC (Fig. 1A, Table 1). The grouping of non-BALF samples mirrored that of BALF samples, with details provided in Supplementary Materials (Additional file 1: Table S1).
Three infection-related indices, WBC, neutrophil count, and PCT, varied minimally across groups. However, CRP results exhibited significant fluctuations across different groups, with the highest value recorded for SCC at 4.91 mg/L and the lowest at approximately 0.2 mg/L for Non and ADC. As anticipated, the levels of four potential diagnostic biomarkers, CEA, NSE, CA125, and CA153, in the majority of samples with positive CNV signals exceeded those in samples with negative signals. Notably, CEA and NSE remained within the normal range (CEA: 0–5 ng/ml; NSE: 0–17.5 ng/ml), showing lower sensitivity to the cancers investigated. In contrast, NSCLC, excluding SCC and ADC, had notably high CA125 levels (average: 73.8 U/ml).
Cancer and infection diagnosis performance of Onco-mNGS
As depicted in Fig. 1B, Onco-mNGS demonstrated a sensitivity of 90.2%, a specificity of 100%, and an accuracy of 95.5% in cancer diagnosis. Similar to the results observed for all sample types, Onco-mNGS performance for BALF samples showed a sensitivity of 87.5%, a specificity of 100%, and an accuracy of 94.74% (Fig. 1D). For infection diagnosis, with clinical diagnosis as the reference, Onco-mNGS demonstrated a sensitivity of 81.82%, a specificity of 72.55%, and an overall accuracy of 77.58% (Fig. 1C). In contrast, traditional testing exhibited a sensitivity of 32.23% and a specificity of 100%, resulting in an overall accuracy of 63.23% (Fig. S1). Among the 46 patients with positive results in traditional testing, 91.30% (42 cases) also tested positive with Onco-mNGS. In BALF samples, the performance of Onco-mNGS and traditional methods was consistent with the overall results (Fig. 1E and S1A).
Additionally, the retrospective statistical analysis of the timeliness of Onco-mNGS (Fig. 1F) indicates that 92.36% of the results were reported within 48 h, and 58.30% were reported within 24 h, even accounting for the time consumed in transportation.
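As a worked check of the BALF cancer-detection figures, the sketch below computes the standard metrics from a 2×2 table. The article does not print the confusion matrix, so the counts are an assumption inferred from the participant breakdown: 70 CNV-positive BALF samples taken as true positives, 10 CNV-negative tumor samples as false negatives, and the 88 infection plus 22 Non samples as true negatives.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic performance measures from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed counts for BALF samples (n = 190), inferred from the group sizes.
balf = diagnostic_metrics(tp=70, fp=0, fn=10, tn=88 + 22)

for name, value in balf.items():
    print(f"{name}: {value * 100:.2f}%")
```

Under these assumed counts the function reproduces the reported BALF values: 87.5% sensitivity, 100% specificity, 94.74% accuracy, 100% positive predictive value, and 91.67% negative predictive value.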
Significant clinical implications of distinguishing tumors and infections
Three representative cases exemplify the advantages of streamlined diagnosis for both infections and lung cancer (Supporting information). Case 1 initially raised suspicions of infection, but Onco-mNGS indicated the presence of a tumor, which was later confirmed through liquid biopsy and genetic screening. This ultimately led to a favorable prognosis. In Case 2, the first liquid biopsy showed a negative result, but Onco-mNGS suggested tumor concurrent with EBV infection, later confirmed through the second liquid biopsy. In Case 3, Onco-mNGS simultaneously detected both tumor and fungal infections. The strategy involved prioritizing treatment for the fungal infection before addressing the tumor, effectively preventing the spread of infection.
CNV pattern with lung cancer typing
To evaluate the genomic characteristics, CNVs across different cancer types were initially examined. In comparison to the LY and Other groups, the LC group displayed significantly higher frequencies of CNV changes, both in terms of variant types and variant sites (Fig. 2A and S8A-B). Specifically, duplication of chromosomal segments was the most frequently detected variant type and was found in all 22 autosomes. Chr3, Chr5, and Chr8 exhibited the highest frequencies of CNV, while Chr21 demonstrated the lowest. Chr21 also had the fewest variant types of chromosomal copy numbers. Among the three main LC types, NSCLC, which accounted for the large majority of cases, exhibited the highest incidence and the most complex variability of CNV (Fig. 2B and S8C-E).
CNV pattern of BALF samples and its clinical relevance with lung cancer typing. (A) The frequency of each CNV type occurring in each chromosome. (B) and (C) show the average frequency of each CNV type in subtypes of lung cancer (i.e., MC, NSCLC and SCLC) and of NSCLC (i.e., ADC and SCC), respectively. The data were statistically analyzed with IBM SPSS Statistics 26.0 (independent sample Mann–Whitney U test with the significance of p < 0.05). “*” represents p < 0.05; “**” represents p < 0.01; “***” represents p < 0.001. (D) The proportion of a certain CNV type of each chromosome in the same type of CNV of all chromosomes for comparison of the SCC and ADC groups.
Notably, significant differences in CNV changes were observed between the two groups (SCC and ADC) within NSCLC (Fig. 2C–D). ADC showed a higher variation rate of chromosomal microdeletions and large-scale duplication/deletion compared to SCC (Fig. 2D). In ADC, microdeletions were predominantly located on chromosome 6, while microduplications, along with duplication and deletion of chromosomal fragments, were found at multiple sites (Fig. 2C). In contrast, SCC showed preferential large-scale chromosome losses on Chr8, Chr20, and Chr22. Consistent with findings in the LC group, Chr21 in both SCC and ADC displayed the lowest rate of CNV (Fig. 2C). Additionally, with the exception of Chr2 across all cancer types and Chr11 with WBC in the LC group, chromosomal variants generally exhibited a negative correlation with infection-related biochemical indices and potential cancer-related diagnostic biomarkers, especially the tumor biomarker CA153 (Fig. 2E–H).
CNV patterns with lung cancer staging
Overall, there was no notable distinction in the CNV mutation frequency between early (LC_L) and advanced lung cancer patients (LC_H) (Fig. 3A). Nevertheless, in ADC, the late-stage group (ADC_H) showed a markedly higher frequency than the early-stage group (ADC_L) (Fig. 3D), whereas the reverse was observed in SCC (Fig. 3G). Distinct differences in CNV patterns were observed across different groups. For instance, early-stage lung cancer demonstrated a higher frequency of large-scale chromosome gain mutations than the late stage, with differences predominantly on Chr16 and Chr22 (Fig. 3B). A similar trend was observed in ADC (Fig. 3E), but not in SCC (Fig. 3H). The large-scale chromosome gains were mainly found on Chr20 and Chr22 in the early SCC group (SCC_L) and on Chr8 in the late-stage SCC group (SCC_H). Additionally, specific correlation patterns existed between clinical indicators and particular chromosomes. For instance, CEA was positively correlated with multiple chromosomes in LC_L but negatively in LC_H. CA153 had a significant negative correlation with multiple chromosomes in LC_H (Fig. 3C). Similar to LC_L and LC_H, multiple clinical biomarkers (WBC, CEA, NSE, CA153) showed significantly positive associations with CNVs in ADC_L but negative associations in ADC_H (Fig. 3F). However, these correlations were not observed in SCC (Fig. 3I). The detailed CNVs in each sample of the groups described above are summarized in Supplementary Materials (Fig. S9).
CNV pattern of BALF samples and its clinical relevance with lung cancer staging. (A), (D) and (G) showed the proportion of a certain CNV type of each chromosome in the same type of CNV of all chromosomes for different stages of LC, ADC and SCC, respectively. (B), (E) and (H) showed the average frequency of each CNV type in different stages of LC, ADC and SCC, respectively. (C), (F) and (I) displayed clinical relevance of each chromosome in samples of different stages of LC, ADC and SCC, respectively, with certain clinical parameters. The data were statistically analyzed with the use of IBM SPSS Statistics 26.0 (independent sample Mann–Whitney U test with the significance of p < 0.05). “*” represents p < 0.05; “**” represents p < 0.01; “***” represents p < 0.001
Microorganisms related with lung cancer and typing
Enterococcus faecium and Veillonella atypica, which dominated the infection group, were almost negligible in relative abundance in the other two groups. Similarly, Klebsiella pneumoniae was abundant only in the tumor group (Fig. 4A). Beauveria bassiana, Malassezia restricta, and Purpureocillium lilacinum were the most abundant fungi in both the tumor and Non groups, with relative abundances ranging from 14 to 39%. However, Candida tropicalis accounted for over 99% of the relative abundance in the infection group. Human gammaherpesvirus 4 (EBV) dominated the viruses in both the infection and tumor groups, while Human alphaherpesvirus 1 constituted over 68% in the Non group (Fig. 4A). The Simpson and Chao1 indices revealed significant differences in the diversity and richness of different groups (Fig. 4B), indicating that patients with tumors, infections, and non-tumor-non-infection conditions could be discriminated based on microbial composition.
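The two alpha-diversity measures named above can be illustrated with a short sketch. The article does not say which variant of each index was used, so this assumes the Gini-Simpson form (1 minus the sum of squared proportions) and the bias-corrected Chao1 estimator. The read counts are hypothetical.

```python
def simpson_index(counts):
    """Gini-Simpson diversity: 1 - sum(p_i^2) over observed taxa.
    Values near 0 mean one taxon dominates; values near 1 mean evenness."""
    total = sum(counts)
    return 1.0 - sum((count / total) ** 2 for count in counts if count > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate:
    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of taxa seen exactly once and twice."""
    observed = sum(1 for count in counts if count > 0)
    singletons = sum(1 for count in counts if count == 1)
    doubletons = sum(1 for count in counts if count == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical per-species read counts for two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [97, 1, 1, 1]

print(simpson_index(even_sample), simpson_index(skewed_sample))
print(chao1(even_sample), chao1(skewed_sample))
```

Both samples contain four observed species, yet the skewed sample scores much lower on Simpson diversity, while its three singletons push the Chao1 estimate above the observed richness. This is why the two indices are usually reported together.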
Characteristics of BALF microbiome in Tumor (for positive Onco-signal samples only), Infection and non-tumor-non-infection (Non) groups. A Relative abundance of the top 50 bacteria, top 15 fungi and all viruses in each group. B Alpha diversity of bacteria, fungi and viruses identified in the three groups of Tumor, Infection and Non. C Specifically enriched microorganisms identified when the Tumor group was compared with each of the other two groups. Data were analyzed using LEfSe with the LDA threshold = 2. D Clinical relevance of the identified specifically enriched microorganisms with certain clinical parameters. pink: positive correlation; purple: negative correlation. The darker the color, the stronger the correlation. “*” represents p < 0.05. E The fitted smoothing curves of parameters with statistical differences, obtained with a Quasi-Poisson model in R. F Alpha diversity of bacteria, fungi and viruses identified in SCC and ADC. G Specifically enriched microorganisms identified in SCC and ADC. Data were analyzed using LEfSe with the LDA threshold = 2. H Clinical relevance of the identified specifically enriched microorganisms in SCC and ADC with certain clinical parameters. pink: positive correlation; purple: negative correlation. The darker the color, the stronger the correlation. “*” represents p < 0.05.
Compared to the infection group, Veillonella parvula, Veillonella tobetsuensis, Rhodococcus erythropolis, Rothia mucilaginosa, Schaalia odontolytica, Moesziomyces aphidis, and Actinomyces graevenitzii had a higher proportion in the tumor group, whereas the proportion of Streptococcus australis and Corynebacterium striatum was higher in the infection group. Compared to the Non group, the relative abundance of Corynebacterium striatum, Alloprevotella tannerae, Capnocytophaga sputigena and Gemella sanguinis was higher in the tumor group (Fig. 4C). Additionally, a significant negative correlation was observed only between neutrophil count (NEU) and Gemella sanguinis (Fig. 4D–E). As for the two main subgroups within NSCLC, ADC and SCC could not be clearly differentiated based on overall microbial diversity (Fig. 4F). The LEfSe diagram showed that Candida parapsilosis and Capnocytophaga sputigena were more abundant in SCC and ADC, respectively (Fig. 4G). Moreover, the highly abundant Candida parapsilosis in the SCC group exhibited a significant positive correlation with CRP and a significant negative correlation with CA153 (Fig. 4H).
Microorganisms related with early- and late-stage lung cancer
Four bacteria exhibited over tenfold differences in relative abundance between the early and late stages of LC (Fig. 5A). Specifically, the percentage of Corynebacterium propinquum was higher in LC_H, whereas Neisseria sicca, Pseudomonas fluorescens, and Haemophilus influenzae had higher proportions in LC_L. Malassezia restricta and Beauveria bassiana were the two most abundant fungi in LC_H, while Malassezia restricta dominated in LC_L. LC_H exhibited significantly higher richness of microbial populations than LC_L, according to the Chao1 and Simpson assessment (Fig. 5B). Although Porphyromonas gingivalis was widely present in LC_H (Fig. 5E), it did not show a significant correlation with any clinical index.
Characteristics of BALF microbiome and its clinical relevance with lung cancer staging. A Relative abundance of top50 bacteria, top15 fungi and all viruses in each group of LC_H and LC_L. (B), (D) and (G) showed alpha diversities of bacteria, fungi and viruses identified in different stages of LC, SCC and ADC, respectively. C and E are specifically enriched microorganisms identified in LC_H and SCC_L. Data were analyzed using LEfSe with the LDA threshold = 2. F Clinical relevance of microorganisms specifically enriched in SCC_L with certain clinical parameters. pink: positive correlation; purple: negative correlation. The darker the color, the stronger the correlation. *** represents p < 0.001
Despite no significant differences in microbial diversity across different progression stages of SCC and ADC (Fig. 5D–G), Neisseria mucosa and Rhodococcus erythropolis were more common in SCC_L, and Malassezia restricta was more frequent in SCC_H (Fig. S10A). Haemophilus influenzae was identified as the differentially abundant microbe in the SCC_L group, showing significant correlations with CRP (Fig. 5E–F). In addition, Rothia mucilaginosa was the highest in ADC_H, and Streptococcus mitis was the highest in ADC_L, while Malassezia restricta and EBV dominated in fungi and viruses, respectively, for both groups (Fig. S10B).
Discussion
Lung cancer ranks among the most prevalent cancers globally, underscoring the critical importance of early diagnosis, typing, and staging to enhance patient survival [2, 16]. Recent strides in molecular biology and bioinformatics have opened novel avenues for the early diagnosis and staging of lung cancer [17, 18]. CNV, denoting the presence of insertions or deletions of DNA segments larger than 1 kb between homologous genes, stands as a key aspect of structural variation in the human genome and is a pervasive biological phenomenon [19]. In the realm of lung cancer, CNV signatures can precipitate aberrant expression of crucial cancer-related genes, consequently influencing cell growth, differentiation, and invasiveness. Hence, delving into the role of CNVs in lung cancer is paramount for gaining insights into the pathogenesis and clinical management of the disease [20]. Early detection plays a pivotal role in enhancing the success of lung cancer treatment. Studies have shown that CNV signatures can serve as potential biomarkers for the early diagnosis of lung cancer [21,22,23]. By analyzing the CNV differences between lung cancer patients and normal controls, diagnostic models of CNV signature can be established [24]. These models have demonstrated good sensitivity and specificity in real-world clinical applications [25].
Currently, the prevailing methods for detecting CNV include comparative genomic hybridization (CGH), single nucleotide polymorphism (SNP) arrays, whole genome sequencing (WGS), and single-cell whole genome sequencing technology [19, 26, 27]. These approaches are time-consuming, costly, and analytically complex, rendering them less suitable for widespread clinical adoption. The innovative Onco-mNGS technology, which was derived from mNGS for pathogen identification, can swiftly and accurately identify pathogens and host chromosome CNVs in a single step [15]. Compared to other sequencing-based tumor detection methods, Onco-mNGS offers significantly better timeliness. Specifically, 92.36% of cases received reports within 2 calendar days after sample collection, and the average time from sample collection to report was only 25 h.
Moreover, Onco-mNGS demonstrated outstanding performance in both infection diagnosis and cancer identification. In infection diagnosis, taking clinical diagnosis as the gold standard, Onco-mNGS improved sensitivity by 49.59 percentage points over traditional diagnostic methods (81.82% vs. 32.23%), and diagnostic accuracy increased by 14.35 percentage points (77.58% vs. 63.23%). Simultaneously, early tumor identification based on Onco-mNGS detection of CNV signatures in BALF samples achieved a sensitivity of 87.5% and a specificity of 100%. Additionally, this study highlights the value of Onco-mNGS in accurately identifying tumors combined with infections. Among the 29 patients confirmed with both tumor and infection, 82.76% (24 cases) were precisely identified for both conditions in a single Onco-mNGS analysis. This performance not only enhances the efficiency of detection but also provides robust support for early targeted interventions, exerting a positive impact on patient treatment and recovery.
Lung cancer encompasses various subtypes, each characterized by distinct biological features and therapeutic responses [28]. These subtypes differ significantly at both the clinical and molecular levels, including in their CNV features [29, 30]. CNV profiles can therefore be used for the typing and staging of lung cancer, which helps clinicians select individualized treatment and improve outcomes [22]. In this study, NSCLC patients exhibited a higher CNV incidence and a greater diversity of CNV features than SCLC patients. ADC patients also showed a higher CNV incidence than SCC patients, involving both chromosomal microdeletions and large-scale chromosomal duplications and deletions. Furthermore, patients with late-stage ADC showed a significantly higher CNV incidence than patients with early-mid-stage ADC. In contrast, patients with early-stage SCC had a higher CNV incidence than patients with late-stage SCC. These findings suggest that Onco-mNGS-based in-depth analysis of CNV features in BALF samples is valuable for the early clinical typing and staging of lung cancer, particularly in NSCLC.
In recent years, studies have suggested that the microbiota of the lower respiratory tract (LRT) may play a crucial role in both the pathogenesis and treatment of lung cancer [31]. Different types of lung cancer may exhibit distinct LRT microbiota characteristics, which are essential for a deeper understanding of lung cancer development and clinical management [32, 33]. These microorganisms are not only pivotal for lung homeostasis, immune balance, and airway mucus clearance [34] but have also been linked to the development and progression of lung cancer [35]. For instance, an increase in certain bacterial genera, such as Streptococcus, Veillonella, and Fusobacterium, has been associated with the development and progression of NSCLC [36], potentially implicating them in inflammation and immune escape in lung cancer [37]. In this study, significant variations in LRT microbial diversity were identified among the tumor group, the infection group, and the non-tumor-non-infection group. By comparing the tumor group with the infection group and with the non-tumor-non-infection group, we identified representative differential microbial species. LRT microbial diversity was significantly lower in the tumor group than in the infection group. Although no statistically significant difference in diversity was observed between patients with ADC and those with SCC, the abundance of Candida parapsilosis and Capnocytophaga sputigena differed significantly between these two histological subtypes. With regard to cancer stage, LRT microbial diversity was notably lower in patients with advanced lung cancer than in those with early-stage lung cancer. A similar trend was observed in SCC, although it did not reach statistical significance.
Additionally, Porphyromonas gingivalis has been identified as an indicative species associated with early-stage lung cancer, whereas Haemophilus influenzae was considered a representative species of late-stage SCC. In summary, these variations in microbial information can serve as pivotal features for the classification and staging of lung cancer.
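The diversity comparisons above depend on a within-sample diversity index. The text here does not name the index used, so the following sketch assumes the Shannon index purely as an illustration; the read counts are hypothetical and are not study data.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one taxon must have a positive count")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical per-taxon read counts for two samples (illustration only).
even_community = [25, 25, 25, 25]    # four taxa, equally abundant
dominated_community = [97, 1, 1, 1]  # one taxon dominates

h_even = shannon_index(even_community)
h_dominated = shannon_index(dominated_community)
print(round(h_even, 3), round(h_dominated, 3))
```

A community dominated by a single taxon yields a lower index than an evenly distributed one with the same number of taxa, which is the sense in which diversity is described as reduced.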
After identifying these characteristics, we attempted to predict the classification and staging of lung cancer using a random forest model (Additional file 9: Fig. S11). The results showed a relatively high AUC and low error rates for distinguishing early-stage from late-stage lung cancer. However, less favorable outcomes were obtained for the classification of ADC versus SCC, as well as for the staging of these two subtypes. Given that the current study is preliminary, further large-scale, multi-center investigations are essential to validate these findings before they can be applied in clinical practice to improve the management and treatment of patients with lung cancer.
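For readers unfamiliar with the two evaluation measures mentioned here, the sketch below computes an AUC and an error rate from a classifier's predicted scores. It does not reproduce the random forest model itself. The labels, scores, and the 0.5 decision threshold are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def error_rate(labels, scores, threshold=0.5):
    """Fraction of cases misclassified when scores >= threshold are called positive."""
    wrong = sum(1 for y, s in zip(labels, scores) if (s >= threshold) != (y == 1))
    return wrong / len(labels)

# Hypothetical example: 1 = late stage, 0 = early stage (not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

auc = auc_from_scores(labels, scores)
err = error_rate(labels, scores)
print(round(auc, 3), round(err, 3))
```

The AUC summarizes ranking quality across all thresholds, whereas the error rate depends on the single threshold chosen, so the two measures can diverge for the same model.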
Conclusions
This study highlights the strong performance of metagenomic sequencing in the early diagnosis of, and differentiation between, infection and tumor. The improvement spans detection sensitivity, specificity, accuracy, and timeliness. Looking ahead, Onco-mNGS holds the potential to establish a comprehensive array of clinical adjunctive diagnostic methods for the early diagnosis and prognostic assessment of NSCLC.
Availability of data and materials
The sequencing datasets presented in this study can be found in the National Genomics Data Center (NGDC, https://ngdc.cncb.ac.cn/) under project number PRJCA022824. Data related to the patients' clinical diagnoses can be obtained by contacting the corresponding author.
References
Hirsch FR, Scagliotti GV, Mulshine JL, Kwon R, Curran WJ Jr, Wu YL, Paz-Ares L. Lung cancer: current therapies and new targeted treatments. Lancet. 2017;389:299–311.
Goldstraw P, Chansky K, Crowley J, Rami-Porta R, Asamura H, Eberhardt WE, Nicholson AG, Groome P, Mitchell A, Bolejack V, et al. The IASLC lung cancer staging project: proposals for revision of the TNM stage groupings in the forthcoming (Eighth) edition of the TNM classification for lung cancer. J Thorac Oncol. 2016;11:39–51.
Travis WD, Brambilla E, Riely GJ. New pathologic classification of lung cancer: relevance for clinical practice and clinical trials. J Clin Oncol. 2013;31:992–1001.
Rudin CM, Brambilla E, Faivre-Finn C, Sage J. Small-cell lung cancer. Nat Rev Dis Primers. 2021;7:3.
Gridelli C, Rossi A, Carbone DP, Guarize J, Karachaliou N, Mok T, Petrella F, Spaggiari L, Rosell R. Non-small-cell lung cancer. Nat Rev Dis Primers. 2015;1:15009.
Nooreldeen R, Bach H. Current and future development in lung cancer diagnosis. Int J Mol Sci. 2021. https://doi.org/10.3390/ijms22168661.
Nikanjam M, Kato S, Kurzrock R. Liquid biopsy: current technology and clinical applications. J Hematol Oncol. 2022;15:131.
Alix-Panabieres C, Pantel K. Liquid biopsy: from discovery to clinical application. Cancer Discov. 2021;11:858–73.
Preda M, Tanase BC, Zob DL, Gheorghe AS, Lungulescu CV, Dumitrescu EA, Stanculeanu DL, Manolescu LSC, Popescu O, Ibraim E, Mahler B. The bidirectional relationship between pulmonary tuberculosis and lung cancer. Int J Environ Res Public Health. 2023. https://doi.org/10.3390/ijerph20021282.
Jain S, Self WH, Wunderink RG, Fakhran S, Balk R, Bramley AM, Reed C, Grijalva CG, Anderson EJ, Courtney DM, et al. Community-acquired pneumonia requiring hospitalization among U.S. adults. N Engl J Med. 2015;373:415–27.
Chiu CY, Miller SA. Clinical metagenomics. Nat Rev Genet. 2019;20:341–55.
Gu W, Miller S, Chiu CY. Clinical metagenomic next-generation sequencing for pathogen detection. Annu Rev Pathol. 2019;14:319–38.
Chen S, Kang Y, Li D, Li Z. Diagnostic performance of metagenomic next-generation sequencing for the detection of pathogens in bronchoalveolar lavage fluid in patients with pulmonary infections: Systematic review and meta-analysis. Int J Infect Dis. 2022;122:867–73.
Han D, Li Z, Li R, Tan P, Zhang R, Li J. mNGS in clinical microbiology laboratories: on the road to maturity. Crit Rev Microbiol. 2019;45:668–85.
Guo Y, Li H, Chen H, Li Z, Ding W, Wang J, Yin Y, Jin L, Sun S, Jing C, Wang H. Metagenomic next-generation sequencing to identify pathogens and cancer in lung biopsy tissue. EBioMedicine. 2021;73:103639.
Miller KD, Nogueira L, Devasia T, Mariotto AB, Yabroff KR, Jemal A, Kramer J, Siegel RL. Cancer treatment and survivorship statistics, 2022. CA Cancer J Clin. 2022;72:409–36.
Nana-Sinkam SP, Powell CA. Molecular biology of lung cancer: diagnosis and management of lung cancer, 3rd ed: American college of chest physicians evidence-based clinical practice guidelines. Chest. 2013;143:e30S-e39S.
Wu D, Wang X. Application of clinical bioinformatics in lung cancer-specific biomarkers. Cancer Metastasis Rev. 2015;34:209–16.
Fromer M, Moran JL, Chambert K, Banks E, Bergen SE, Ruderfer DM, Handsaker RE, McCarroll SA, O’Donovan MC, Owen MJ, et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet. 2012;91:597–607.
Samulin Erdem J, Arnoldussen YJ, Skaug V, Haugen A, Zienolddiny S. Copy number variation, increased gene expression, and molecular mechanisms of neurofascin in lung cancer. Mol Carcinog. 2017;56:2076–85.
Tischler V, Pfeifer M, Hausladen S, Schirmer U, Bonde AK, Kristiansen G, Sos ML, Weder W, Moch H, Altevogt P, Soltermann A. L1CAM protein expression is associated with poor prognosis in non-small cell lung cancer. Mol Cancer. 2011;10:127.
Zeng L, Zhou Y, Zhang X, Xu Q, Zhou C, Zeng F, Jiang W, Wang Z, Deng L, Yang H, et al. Copy number variations mediate major pathological response to induction chemo-immunotherapy in unresectable stage IIIA-IIIB lung cancer. Lung Cancer. 2023;178:134–42.
Li F, Sun L, Zhang S. Acquirement of DNA copy number variations in non-small cell lung cancer metastasis to the brain. Oncol Rep. 2015;34:1701–7.
Martignano F, Munagala U, Crucitta S, Mingrino A, Semeraro R, Del Re M, Petrini I, Magi A, Conticello SG. Nanopore sequencing from liquid biopsy: analysis of copy number variations from cell-free DNA of lung cancer patients. Mol Cancer. 2021;20:32.
Hieronymus H, Murali R, Tin A, Yadav K, Abida W, Moller H, Berney D, Scher H, Carver B, Scardino P, et al. Tumor copy number alteration burden is a pan-cancer prognostic factor associated with recurrence and death. Elife. 2018. https://doi.org/10.7554/eLife.37294.
Backenroth D, Homsy J, Murillo LR, Glessner J, Lin E, Brueckner M, Lifton R, Goldmuntz E, Chung WK, Shen Y. CANOES: detecting rare copy number variants from whole exome sequencing data. Nucleic Acids Res. 2014;42:e97.
Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:257.
Howlader N, Forjaz G, Mooradian MJ, Meza R, Kong CY, Cronin KA, Mariotto AB, Lowy DR, Feuer EJ. The effect of advances in lung-cancer treatment on population mortality. N Engl J Med. 2020;383:640–9.
Gao B, Baudis M. Signatures of discriminative copy number aberrations in 31 cancer subtypes. Front Genet. 2021;12:654887.
Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519–25.
Jin J, Gan Y, Liu H, Wang Z, Yuan J, Deng T, Zhou Y, Zhu Y, Zhu H, Yang S, et al. Diminishing microbiome richness and distinction in the lower respiratory tract of lung cancer patients: a multiple comparative study design with independent validation. Lung Cancer. 2019;136:129–35.
Zheng L, Sun R, Zhu Y, Li Z, She X, Jian X, Yu F, Deng X, Sai B, Wang L, et al. Lung microbiome alterations in NSCLC patients. Sci Rep. 2021;11:11736.
Han W, Wang N, Han M, Liu X, Sun T, Xu J. Identification of microbial markers associated with lung cancer based on multi-cohort 16S rRNA analyses: a systematic review and meta-analysis. Cancer Med. 2023;12:19301–19.
Langelier C, Kalantar KL, Moazed F, Wilson MR, Crawford ED, Deiss T, Belzer A, Bolourchi S, Caldera S, Fung M, et al. Integrating host response and unbiased microbe detection for lower respiratory tract infection diagnosis in critically ill adults. Proc Natl Acad Sci U S A. 2018;115:E12353–62.
Goto T. Microbiota and lung cancer. Semin Cancer Biol. 2022;86:1–10.
Tsay JJ, Wu BG, Badri MH, Clemente JC, Shen N, Meyn P, Li Y, Yie TA, Lhakhang T, Olsen E, et al. Airway microbiota is associated with upregulation of the PI3K pathway in lung cancer. Am J Respir Crit Care Med. 2018;198:1188–98.
Liu HX, Tao LL, Zhang J, Zhu YG, Zheng Y, Liu D, Zhou M, Ke H, Shi MM, Qu JM. Difference of lower airway microbiome in bilateral protected specimen brush between lung cancer patients with unilateral lobar masses and control subjects. Int J Cancer. 2018;142:769–78.
Acknowledgements
We thank Ruotong Ren, Wenchao Ding, Wenjie Wu, Zhenshan Du, Hongyun Sun, and Wenfang Cao from MatriDx Biotechnology Co., Ltd. for their assistance with onco-mNGS data acquisition, CNV data analysis, and microbial data analysis.
Funding
This work was supported by grants from the Youth Fund of the National Natural Science Foundation of China (NSFC) (No. 82100118), the National Key Research and Development Program (No. 2021YFC2301101), the Science Foundation for the State Key Laboratory for Infectious Disease Prevention and Control of China (No. 2022SKLID308), the National Natural Science Foundation of China (NSFC) (No. 92048203), and the Guangdong Special Support Program Project 2024. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.
Author information
Contributions
Zhengtu Li and Feng Ye are the primary physicians who provided diagnosis and treatment of the patients. Shaoqiang Li, Yangqing Zhan, Yan Wang, Weilong Li, Xidong Wang, Haoru Wang, and Wenjun Sun collected and analyzed clinical and sequencing data. Shaoqiang Li, Xuefang Cao, and Zhengtu Li wrote the manuscript. Shaoqiang Li and Feng Ye designed the project. All authors have read and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
The study was approved by the Ethics Committee of the First Affiliated Hospital, Guangzhou Medical University (approval number 2022No.51), and all data were anonymized prior to analysis. The study was conducted in accordance with the Declaration of Helsinki, and the study data were obtained from the Department of Respiratory, First Affiliated Hospital, Guangzhou Medical University. Informed consent was obtained from all participants or their legal guardians.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, S., Zhan, Y., Wang, Y. et al. One-step diagnosis of infection and lung cancer using metagenomic sequencing. Respir Res 26, 48 (2025). https://doi.org/10.1186/s12931-025-03127-7
DOI: https://doi.org/10.1186/s12931-025-03127-7