Lipid metabolic gene-wide profile and survival signature of lung adenocarcinoma

Background: Lung cancer has high morbidity and mortality worldwide, and lung adenocarcinoma (LUAD) is its most common histologic subtype. Disordered lipid metabolism is related to cancer development, and analysis of the lipid-related transcriptome may help identify diagnostic and prognostic biomarkers of LUAD. Methods: In this study, we analyzed the expression of 1045 lipid metabolism-related genes between LUAD tumors and normal tissues from The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) cohort. An interaction network of the differentially expressed genes (DEGs) was constructed to identify hub genes. The association between the hub genes and overall survival was evaluated, a nomogram-based model was built to predict the prognosis of LUAD, and the model was validated in an independent cohort (GSE13213). Results: A total of 217 lipid metabolism-related DEGs were detected in LUAD. They were significantly enriched in glycerophospholipid metabolism, fatty acid metabolic process, and eicosanoid signaling. Using the network and cytoHubba, we identified six hub genes: INS, LPL, HPGDS, DGAT1, UGT1A6, and CYP2C9. High expression of CYP2C9, UGT1A6, and INS, and low expression of DGAT1, HPGDS, and LPL, were associated with worse overall survival (OS) in 1925 LUAD patients. In our model, the high-risk score group had worse OS, and the validation cohort showed the same result. Conclusion: This study constructed a six-gene lipid metabolic signature that was significantly associated with the diagnosis and prognosis of LUAD patients. The signature may serve as a lipid metabolism-related biomarker for LUAD.


Background
Lung cancer is the most commonly diagnosed cancer (11.6% of all cases) and the leading cause of cancer death (18.4% of all cancer deaths) worldwide [1]. Among lung cancer subtypes, adenocarcinoma is the most common histologic subtype in both men and women [2]. A 2005-2014 epidemiological survey from China showed that the proportion of adenocarcinoma increased from 36.4% to 53.5%, while the proportion of squamous cell carcinoma decreased from 45.4% to 34.4% [3]. The increasing incidence of lung adenocarcinoma (LUAD) has also been reported to be associated with air pollution-related factors [4][5][6]. Studies have reported that PM2.5 increases pro-inflammatory lipid metabolism in the lung and is associated with lipid alterations [7,8]. The importance of alterations in lipid metabolism is beginning to be recognized, and increased de novo lipogenesis is considered a new hallmark of many aggressive cancers [9]. Lipid profiles of blood plasma exosomes could be used for the early detection of non-small cell lung cancer (NSCLC), the prevalent type of lung cancer [10]. Epidemiological data indicated that lung cancer patients with high levels of high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein (LDL), and low-density lipoprotein receptor (LDLR) had better survival [11,12]. Compared with healthy subjects, NSCLC patients showed significant increases in phosphatidylcholines (PCs) and phosphatidylethanolamines (PEs) [13]. Other lipid metabolism indicators associated with LUAD include sphingomyelins, phosphatidylinositols, phosphatidylserines, phosphatidylethanolamine, phospholipids, and phosphatidylcholine [14]. Cancer cells have an overwhelming requirement for metabolic intermediates to produce macromolecules. Fatty acid oxidation (FAO) can generate ATP to support membrane formation, energy storage, and the production of signaling molecules by coordinating the activation of lipid anabolic metabolism [15]. The regulation of lipid metabolism in LUAD is still being explored, and understanding the lipid-related mechanisms of the LUAD phenotype will inform better clinical interventions.
To further explore the regulatory networks and pathways related to lipid metabolism, we used an integrated bioinformatic approach to construct a transcriptome-wide profile, and analyzed a signature of lipid-related genes to identify potential diagnostic and prognostic biomarkers of LUAD in terms of lipid metabolism disorder.

Patients and datasets
We downloaded mRNA expression data and clinical information for 519 lung adenocarcinoma (LUAD) tissues and 58 normal tissues from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/) database using the R package TCGAbiolinks [16]. The Ensembl IDs of the TCGA samples were annotated to human genes using GRCh38/hg38. We then downloaded mRNA expression data and clinical information for 117 LUAD tissues from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) (GSE13213) using the R package GEOquery [17] to validate the final prediction model.
Identification of lipid metabolism-related differentially expressed genes
Using lipid-specific keywords (fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, sterol lipids, prenol lipids, saccharolipids, and polyketides), we collected 21 lipid metabolism-related pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) website (http://www.kegg.jp/blastkoala/) [18] and five lipid metabolism-related gene sets from the Molecular Signatures Database (MSigDB) website (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) [19] (Additional file 1). After removing overlapping genes, a total of 1045 lipid metabolism-related genes were obtained. Lipid metabolism-related differentially expressed genes (DEGs) between LUAD tissues and normal tissues were screened using the R package edgeR [20], with thresholds of FDR < 0.05 and |log2 fold change| (|logFC|) > 1.

Bioinformatic analysis
We used the R package clusterProfiler to further explore the biological significance of the lipid metabolism-related DEGs [21]. In the GO and KEGG analyses, FDR < 0.05 was considered significant enrichment. We then uploaded the DEGs, with their gene identifiers, FDR values, and log2 FC values, into the IPA software (Qiagen), and used its "core analysis" function to interpret the DEGs.
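The gene-set merging and DEG thresholds described above can be illustrated with a minimal Python sketch. It is not the edgeR workflow used in the study: it assumes a precomputed table of per-gene log2 fold changes and raw p-values (hypothetical toy values below), applies Benjamini-Hochberg FDR adjustment across all genes in that table (an assumption, since the text does not state over which gene set the FDR was computed), and then keeps lipid genes with FDR < 0.05 and |logFC| > 1. The function names are illustrative only.

```python
from typing import Dict, Iterable, List, Set, Tuple


def merge_gene_sets(gene_sets: Dict[str, Iterable[str]]) -> Set[str]:
    """Union of all pathway/gene-set members, so overlapping genes are counted once."""
    merged: Set[str] = set()
    for members in gene_sets.values():
        merged.update(members)
    return merged


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_lipid_degs(
    table: Dict[str, Tuple[float, float]],  # gene -> (log2 fold change, raw p-value)
    lipid_genes: Set[str],
    fdr_cutoff: float = 0.05,
    logfc_cutoff: float = 1.0,
) -> Dict[str, Tuple[float, float]]:
    """Return lipid genes passing FDR < cutoff and |logFC| > cutoff as gene -> (logFC, FDR)."""
    genes = list(table)
    fdrs = benjamini_hochberg([table[g][1] for g in genes])
    selected = {}
    for gene, fdr in zip(genes, fdrs):
        logfc = table[gene][0]
        if gene in lipid_genes and fdr < fdr_cutoff and abs(logfc) > logfc_cutoff:
            selected[gene] = (logfc, fdr)
    return selected


# Hypothetical toy input, not data from the study.
gene_sets = {
    "KEGG_pathway_A": ["LPL", "DGAT1", "HPGDS"],
    "MSigDB_set_B": ["LPL", "INS", "CYP2C9"],
}
table = {
    "LPL": (-2.1, 0.0001),
    "DGAT1": (-1.3, 0.004),
    "INS": (3.0, 0.02),
    "CYP2C9": (0.5, 0.0005),
    "GAPDH": (0.1, 0.8),
}
lipid_genes = merge_gene_sets(gene_sets)
degs = select_lipid_degs(table, lipid_genes)
print(sorted(degs))  # ['DGAT1', 'INS', 'LPL']
```

In this toy run CYP2C9 is significant but fails the fold-change cut-off, and GAPDH is excluded because it is not in the merged lipid gene list; edgeR itself would supply the p-values from its own count-based model.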

Interaction network generation and hub genes analysis
We built an interaction network of the differentially expressed lipid metabolism-related genes using the Search Tool for the Retrieval of Interacting Genes (STRING, http://string-db.org/) database [22], with a combined score of ≥0.4 as the cut-off. Cytoscape software (version 3.6.0) was used to visualize the network [23]. The top ten genes from each of the 12 ranking methods in cytoHubba [24], a Cytoscape app, were selected for overlap analysis, and the genes with the highest number of overlaps were taken as hub genes and potential biomarkers.
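The score cut-off and the top-ten overlap rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions: edges are given as (gene, gene, combined score) tuples with scores on a 0-1 scale, only two rankings are used instead of cytoHubba's 12 (node degree, which is one of the cytoHubba methods, and a hypothetical score-weighted degree used purely as a stand-in), and ties are broken alphabetically. The toy edges are invented and the helper names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (gene_a, gene_b, combined score on a 0-1 scale)


def filter_edges(edges: List[Edge], min_score: float = 0.4) -> List[Edge]:
    """Keep interactions whose combined score meets the cut-off."""
    return [e for e in edges if e[2] >= min_score]


def degree(edges: List[Edge]) -> Dict[str, float]:
    """Number of neighbours of each node (the 'Degree' ranking)."""
    deg: Dict[str, float] = defaultdict(float)
    for a, b, _ in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def weighted_degree(edges: List[Edge]) -> Dict[str, float]:
    """Hypothetical stand-in ranking: sum of combined scores per node."""
    wdeg: Dict[str, float] = defaultdict(float)
    for a, b, score in edges:
        wdeg[a] += score
        wdeg[b] += score
    return dict(wdeg)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Top-k nodes by score, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:k]]


def overlap_hubs(rankings: Dict[str, Dict[str, float]], k: int = 10) -> List[str]:
    """Genes appearing most often across the top-k lists of all ranking methods."""
    counts = Counter()
    for scores in rankings.values():
        counts.update(top_k(scores, k))
    best = max(counts.values())
    return sorted(gene for gene, c in counts.items() if c == best)


# Invented toy network, not the study's STRING network.
edges = filter_edges([
    ("INS", "LPL", 0.9), ("LPL", "DGAT1", 0.7), ("DGAT1", "INS", 0.5),
    ("HPGDS", "LPL", 0.45), ("UGT1A6", "CYP2C9", 0.95), ("CYP2C9", "INS", 0.3),
])
rankings = {"Degree": degree(edges), "WeightedDegree": weighted_degree(edges)}
print(overlap_hubs(rankings, k=2))  # ['LPL']
```

With k=2 only LPL is in both top lists; in the real analysis each of the 12 cytoHubba rankings would contribute its own top-ten list to the count.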

Survival analysis
Overall survival (OS) analysis of the hub genes was performed with the Kaplan-Meier Plotter (http://kmplot.com/analysis/), which includes clinical data and gene expression information for 719 LUAD patients from the GEO and TCGA databases [25]. The number of cases, median mRNA expression levels, hazard ratios (HR) with 95% confidence intervals (CI), and log-rank P-values were extracted from the KM Plotter webpage. Log-rank P-values < 0.05 were considered statistically significant.

Prediction model
Based on the selected hub genes, we used the nomogram function of the R package rms [26] to develop a model to evaluate the prognosis of TCGA-LUAD patients. Using the nomogram formula, we calculated a prognosis score for each patient, and patients were divided into low-risk and high-risk score groups at the median score. The prognosis score was validated against the patients' actual outcomes. We then obtained hub gene expression data and clinical information for the 117 LUAD patients in GSE13213, calculated each patient's prognosis score with the same nomogram formula, and divided them into two groups at the median to perform a survival analysis validating the model.
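The median split of prognosis scores and the log-rank comparison described above can be illustrated with a minimal, dependency-free Python sketch. It assumes the prognosis score is the Cox linear predictor (a weighted sum of hub-gene expression); because nomogram total points are typically a linear, increasing transformation of that predictor, both would give the same median split. The coefficients and survival data below are hypothetical, and assigning patients whose score equals the median to the low-risk group is an arbitrary choice made here. The log-rank statistic is the standard two-group version with one degree of freedom.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def risk_scores(samples: List[Dict[str, float]], coefs: Dict[str, float]) -> List[float]:
    """Cox linear predictor: sum of coefficient * expression over the hub genes."""
    return [sum(beta * s[gene] for gene, beta in coefs.items()) for s in samples]


def median_split(scores: Sequence[float]) -> List[str]:
    """Label scores above the median 'high', all others 'low'."""
    cut = statistics.median(scores)
    return ["high" if s > cut else "low" for s in scores]


def logrank_test(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P-value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for tt, _, g in data if g == 0 and tt >= t)
        at_risk = sum(1 for tt, _, _ in data if tt >= t)
        deaths_a = sum(1 for tt, e, g in data if g == 0 and tt == t and e)
        deaths = sum(1 for tt, e, _ in data if tt == t and e)
        observed_minus_expected += deaths_a - deaths * at_risk_a / at_risk
        if at_risk > 1:
            frac_a = at_risk_a / at_risk
            variance += deaths * frac_a * (1 - frac_a) * (at_risk - deaths) / (at_risk - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical coefficients and patients (months of follow-up; 1 = death, 0 = censored).
coefs = {"CYP2C9": 0.2, "LPL": -0.3}
samples = [
    {"CYP2C9": 6.0, "LPL": 1.0}, {"CYP2C9": 5.0, "LPL": 2.0}, {"CYP2C9": 4.0, "LPL": 2.0},
    {"CYP2C9": 2.0, "LPL": 3.0}, {"CYP2C9": 1.0, "LPL": 4.0}, {"CYP2C9": 1.0, "LPL": 5.0},
]
times = [10, 14, 20, 40, 52, 60]
events = [1, 1, 1, 0, 1, 0]

groups = median_split(risk_scores(samples, coefs))
high = [i for i, g in enumerate(groups) if g == "high"]
low = [i for i, g in enumerate(groups) if g == "low"]
chi2, p = logrank_test([times[i] for i in high], [events[i] for i in high],
                       [times[i] for i in low], [events[i] for i in low])
print(groups, round(chi2, 2), round(p, 3))
```

In practice the same grouping would be computed from the fitted model's coefficients for both the training cohort and GSE13213, and an established survival package would normally be used for the test itself.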

Results
Identification and functional analysis of lipid metabolism-related DEGs
A total of 217 lipid metabolism-related DEGs were identified in the TCGA-LUAD cohort. A volcano plot shows the significant DEGs (Fig. 1A), and a heatmap shows the hierarchical clustering of the DEGs (Fig. 1B). To obtain an overall understanding of the 217 lipid metabolism-related DEGs, we performed GO term and KEGG pathway enrichment with the clusterProfiler package, and canonical pathway analysis with IPA. KEGG pathway enrichment showed that the DEGs were significantly enriched in arachidonic acid metabolism, metabolism of xenobiotics by cytochrome P450, glycerophospholipid metabolism, and steroid hormone biosynthesis. In the GO analysis, they were significantly enriched in fatty acid metabolic process, glycerolipid metabolic process, fatty acid derivative metabolic process, and steroid metabolic process (Fig. 1C). The genes in each KEGG pathway and GO term are listed in Additional file 2. IPA identified significant canonical pathways associated with the DEGs, the top ones being eicosanoid signaling, FXR/RXR activation, and atherosclerosis signaling (Fig. 1D).
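Over-representation analysis of the kind run with clusterProfiler is based on a hypergeometric test: given a background of M genes, of which n belong to a term, it asks how likely it is to see at least k term genes among N DEGs by chance. The Python sketch below computes that upper-tail probability exactly with math.comb. The term size and overlap in the example are hypothetical, and using the 1045 lipid genes as the background is an assumption made only for illustration; the text does not state which background (universe) was used.

```python
import math


def hypergeom_upper_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M background genes, n in the term, N DEGs drawn)."""
    total = math.comb(M, N)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(math.comb(n, i) * math.comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / total


def fold_enrichment(k: int, M: int, n: int, N: int) -> float:
    """Ratio of the term's share among DEGs to its share in the background."""
    return (k / N) / (n / M)


# Hypothetical term: 40 of 1045 background genes, 20 of them among the 217 DEGs.
p = hypergeom_upper_tail(k=20, M=1045, n=40, N=217)
print(f"fold enrichment = {fold_enrichment(20, 1045, 40, 217):.2f}, P = {p:.2e}")
```

Each term's P-value would then be FDR-adjusted (for example, with Benjamini-Hochberg) before applying the FDR < 0.05 cut-off given in the Methods.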
Across the three functional analyses, the DEGs mainly converged on glycerophospholipid and steroid metabolism. The non-overlapping pathways provide additional information that warrants further exploration of the role of lipid metabolism in LUAD.
Interaction network construction and cytoHubba analysis
The lipid metabolism-related DEGs were analyzed with the STRING tool, yielding an interaction network with 216 nodes and 1140 edges, which was visualized in Cytoscape (Fig 2). A total of 6 hub genes were then identified from the overlap of the top 10 genes of the 12 ranking methods in cytoHubba (Additional file 3): Insulin (INS), Lipoprotein Lipase (LPL), Hematopoietic Prostaglandin D Synthase (HPGDS), Diacylglycerol O-Acyltransferase 1 (DGAT1), UDP Glucuronosyltransferase Family 1 Member A6 (UGT1A6), and Cytochrome P450 Family 2 Subfamily C Member 9 (CYP2C9).

Prediction model based on survival-related hub genes and validation
Based on the Cox regression model, a nomogram was built to predict the prognosis of TCGA-LUAD patients using the mRNA expression of the six survival-related hub genes (Fig. 4A). The concordance index of the nomogram was 0.61. We then calculated the prognosis score of each patient and found that patients in the high-risk score group had worse 3-year OS [HR = 1.51 (1.07-2.13), P = 0.02] (Fig. 4B). In the validation cohort, the high-risk score group also had worse OS [HR = 1.84 (1.00-3.37), P = 0.05] (Fig. 4C).
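The concordance index measures how often, among comparable patient pairs, the patient with the higher risk score has the shorter survival; 0.5 corresponds to random ordering and 1.0 to perfect ordering, so 0.61 indicates modest discrimination. A minimal Python sketch of Harrell's C is shown below. It is one common definition and not necessarily the exact estimator used by the authors' software: pairs with tied survival times are skipped, tied risk scores count as half-concordant, and the toy data are invented.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs where the higher-risk patient fails first."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # patient j outlived patient i's observed death
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: follow-up in months, 1 = death observed, 0 = censored.
times = [5, 12, 20, 33, 48]
events = [1, 1, 0, 1, 0]
risk = [2.1, 1.4, 1.6, 0.3, 0.5]
print(round(harrell_c_index(times, events, risk), 3))
```

Here 6 of the 8 comparable pairs are correctly ordered, so C = 0.75; the nested loop is quadratic in the number of patients, which is acceptable for cohorts of a few hundred.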

Discussion
Metabolic changes have been widely observed in cancer cells [27]. Among them, lipid metabolism participates in the regulation of many cellular processes, such as cell growth, proliferation, differentiation, survival, apoptosis, inflammation, motility, membrane homeostasis, chemotherapy response, and drug resistance [28]. Recent studies have reported that components of PM2.5 are risk factors for lung cancer [29][30][31]; these components promote pulmonary injury by modifying lipid metabolism [7], which might lead to lung cancer. However, few studies have examined the association between lipid metabolism and lung cancer through transcriptome-wide analysis. This study used a LUAD cohort to generate a transcriptome-wide profile of lipid-related genes comprising 217 DEGs. The main enriched biological functions in LUAD, including fatty acid, glycerolipid, and glycerophospholipid metabolism, are consistent with a previous report [32]. Arachidonic acid metabolism, the PPAR signaling pathway, insulin resistance, eicosanoid signaling, and other pathways have also been reported in cancer [33][34][35][36][37].
The results indicate that LUAD-related lipid metabolism was associated with nicotine, estrogen biosynthesis, melatonin, and atherosclerosis. Similar to PM2.5, nicotine may promote LUAD development through disordered lipid metabolism. The interaction between estrogen biosynthesis and lipid metabolism may be one of the risk factors for LUAD, which is consistent with the observations that LUAD incidence is rising in women and that the incidence rate among women is higher than that among men [38]. The lipid- and cancer-related genes were enriched in pathways related to both atherosclerosis and cancer; therefore, the health management of long-term LUAD survivors should involve both oncologists and cardiologists [39].
We constructed a network of the genes related to lipid metabolism and LUAD and found six hub genes. CYP2C9, a cytochrome P450 enzyme and a drug target in lung cancer, has been implicated in the regulation of tumorigenesis [44,45]. LUAD patients with lower expression of CYP2C9 had a better prognosis. UGT1A variants may play only a minor role in lung cancer risk [46]. LUAD patients with lower expression of UGT1A6 had a better prognosis. DGAT1 catalyzes the final step in triglyceride synthesis [47]. LPL is a key lipolytic enzyme that plays a crucial role in the catabolism of triglycerides in triglyceride-rich particles [48].
Both DGAT1 and LPL are involved in triglyceride metabolism, and HPGDS has been reported, in relation to triglycerides, to have therapeutic potential in allergic inflammation [49]. Serum triglyceride concentrations have been reported to be involved in the pathogenesis of lung cancer [50]. Higher expression of these three genes (DGAT1, LPL, and HPGDS) was associated with longer survival. INS encodes insulin, which plays a vital role in the regulation of carbohydrate and lipid metabolism. LUAD patients with lower expression of INS had a better prognosis. Regulating triglyceride synthesis, insulin, and inflammation may therefore offer effective interventions for LUAD patients. Based on these six genes, a risk model was constructed, and LUAD patients with a lower risk score had a better prognosis in both cohorts.

Strengths and limitations
The main strength of this study is the establishment of a lipid metabolic transcriptome-wide profile of LUAD and a gene signature that is significantly associated with the diagnosis and prognosis of LUAD patients in terms of lipid metabolism. Limitations include: 1) the data field information in the two cohorts is limited, so LUAD-related covariates were missing, which may have introduced bias; 2) the internal mechanisms of the six lipid-related genes could not be elucidated in this study. Well-designed experiments based on our results are required in further research.

Conclusions
In summary, we generated a lipid metabolic transcriptome-wide profile of LUAD patients and found significant lipid metabolic pathways correlated with LUAD. A signature of six lipid metabolic genes was significantly associated with the diagnosis and prognosis of LUAD patients. The gene signature may serve as a biomarker for LUAD and provide guidance for preventing LUAD and improving the prognosis of LUAD patients.
In (A) and (B), red, white, and blue represent higher expression levels, no expression differences, and lower expression levels, respectively.

Figures