Microarray analysis of long non-coding RNA expression profiles in low high-density lipoprotein cholesterol disease

Background Low high-density lipoprotein cholesterol (HDL-C) disease with unknown etiology has a high prevalence in the Xinjiang Kazak population. In this study, long noncoding RNAs (lncRNAs) that might play a role in low HDL-C disease were identified. Methods Plasma samples from 10 eligible individuals with low HDL disease and 10 individuals with normal HDL-C levels were collected. The lncRNA profiles for 20 Xinjiang Kazak individuals were measured using microarray analysis. Results Differentially expressed lncRNAs and mRNAs with fold-change values not less than 1.5 and FDR-adjusted P-values less than 0.05 were screened. Bioinformatic analyses, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and network analyses, were used to determine relevant signaling pathways and predict potential target genes. In total, 381 lncRNAs and 370 mRNAs were differentially expressed based on microarray analysis. Compared with those in healthy individuals, several lncRNAs were upregulated or downregulated in patients with low HDL-C disease, among which TCONS_00006679 was most significantly upregulated and TCONS_00011823 was most significantly downregulated. GO and KEGG pathway analyses as well as co-expression networks of lncRNAs and mRNAs revealed that the platelet activation pathway and cardiovascular disease were associated with low HDL-C disease. Conclusions Potential target genes integrin beta-3 (ITGB3) and thromboxane A2 receptor (TBXA2R) were regulated by the lncRNAs AP001033.3–201 and AC068234.2–202, respectively. Both genes were associated with cardiovascular disease and were involved in the platelet activation pathway. AP001033.3–201 and AC068234.2–202 were associated with low HDL-C disease and could play a role in platelet activation in cardiovascular disease. These results reveal the potential etiology of dyslipidemia in the Xinjiang Kazakh population and lay the foundation for further validation using large sample sizes.


Background
The dyslipidemia epidemic is a public health concern affecting millions of individuals worldwide. Low highdensity lipoprotein cholesterol (HDL-C) disease is an important type of dyslipidemia, as demonstrated by an epidemiological study showing that HDL-C levels are inversely associated with the development of cardiovascular disease [1].
HDL-C is thought to provide protection against cardiovascular disease by regulating cholesterol efflux from peripheral tissues [2], and low HDL-C disease has been recognized as a complex, multifactorial polygenic trait regulated by multiple genes and environmental factors [3].
In a genome-wide association study, almost all screened single nucleotide polymorphisms (SNPs) were located in non-coding regions; however, the contribution of long non-coding RNAs (lncRNAs) to low HDL-C disease remains unexplored.
LncRNAs, transcripts longer than 200 nucleotides lacking coding potential [10], play crucial roles in various key biological processes and human diseases [11]. However, research on the role of lncRNAs in the development of dyslipidemia is still in its preliminary stage [12,13].
In the present study, gene chip technology was applied to identify genome-wide differences in the lncRNA composition between the serum of healthy subjects and individuals with low HDL-C disease from a Kazakh population. Differentially expressed lncRNAs and mRNAs were examined and the relationships among these differentially expressed genes and their contributions to low HDL-C disease were analyzed.

Study population
A total of 20 subjects were recruited, including 10 subjects with low HDL-C disease and 10 healthy individuals. The inclusion criterion for subjects with low HDL-C disease was serum HDL-C level < 1.04 mmol/L [14].
Exclusion criteria were the presence of obesity, tumor, diabetes, coronary artery disease (CAD), stroke, and medication history, such as NSAIDs (such as aspirin).
Whole blood (5 mL) from each participant was collected in an EDTA-containing tube and stored at − 80°C until use.

Microarray experiment Fabrication of the DNA microarray
The Capital Biotech Human lncRNA+mRNA Array V4.0 was designed with four identical arrays per slide (4 × 180 K format), each array containing probes for approximately 41,000 human lncRNAs and approximately 34, 000 human mRNAs. The target lncRNA and mRNA sequences were merged from multiple databases, including 23,898 from GENCODE/ENSEMBL, 14353 from Human LincRNA Catalog [15], 7760 from RefSeq, 5627 from the UCSC database, 13,701 from NRED (ncRNA Expression Database), 21,488 from LNCipedia, 1038 from H-InvDB, 1053 from the Antisense ncRNA pipeline, 407 Hox ncRNAs, 962 UCRs, and 848 from the Chen Ruisheng lab (Institute of Biophysics, Chinese Academy of Science, China). Probes for each RNA were included in duplicate. The array also contained 4974 Agilent control probes.

RNA extraction, labeling, and hybridization
Total RNA (including lncRNAs) was extracted from the peripheral serum using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) and purified using the NucleoSpin® RNA Purification Kit (Life Technologies Corporation, Grand Island, NY, USA) according to the manufacturer's protocol. Spectrophotometry (NanoDrop ND-1000) was performed to determine RNA purity and concentration at OD260/280. RNA integrity was confirmed using 1% formaldehyde denaturing gel electrophoresis (Agilent Technologies, Santa Clara, CA, USA).

RNA amplification, labeling, and hybridization
The cDNA labeled with a fluorescent dye (Cy5 or Cy3-dCTP) was generated using Eberwine's linear RNA amplification method and subsequent enzymatic reaction, as described previously [16]. The CapitalBio cRNA Amplification and Labelling Kit (CapitalBio, Beijing, China) was used to produce higher yields of labeled cDNA. The cRNA amplification and labeling procedure is depicted in Fig. 1.

Microarray imaging and data analysis
Gene Spring GX was used to analyze the lncRNA and mRNA array data, including data aggregation, normalization, and quality control. To select differentially expressed genes, fold changes of ≥1.5 and ≤ 1.5 a Pvalue of < 0.05 for t-tests were selected as thresholds. The Adjust Data function of Cluster 3.0 was used to perform Log2 conversion of the data, median positioning centered on genes, and hierarchical clustering with average links for further analysis. Finally, Java TreeView (Stanford University School of Medicine, Stanford, CA, USA) was used to visualize the tree.

Construction of the coding-non-coding gene coexpression network
Based on a correlation analysis of the differentially expressed lncRNAs and mRNAs, a co-expression network (CNC network) was constructed. For each pair of genes, Pearson correlation coefficients (PCCs) were calculated and pairs with significant correlations were used to construct the network. The open source bioinformatics software Cytoscape was used to generate the lncRNA and mRNA mapping network with a PCC with an absolute value of not less than 0.90 and P < 0.05.

Cis-lncRNA prediction
Cis-acting lncRNA prediction was conducted based on close correlations with a group of expressed proteincoding genes (PCC > 0.90). Each protein-coding gene and lncRNA gene were within 10 kb of each other throughout the genome [17].

Trans-lncRNA prediction
Blat tools (Standalone BLAT v. 35 × 1 fast sequence search command line tool, downloaded from: http:// hgdownload.cse.ucsc.edu/admin/exe/) were used to perform reverse transcription to compare the complete sequence of lncRNA and the 3'UTR of the corresponding co-expressed mRNA with default parameter settings.

Functional analysis of differentially expressed genes GO and KEGG pathway analyses
The differentially expressed genes were input into KOBAS for annotation and visualization. GO was used to identify the molecular functions overrepresented in the gene profile and KEGG was used to analyze the pathways related to the genes, the   significance of GO term enrichment in the differentially expressed mRNAs was denoted by P-value (P < 0.05 was considered statistically significant).
To study the enrichment of potential genes and gene products in biological processes (BP), cellular components (CC), and molecular functions (MF), a GO analysis of differentially expressed mRNAs was performed. The P-value was used to determine significant changes in differentially expressed genes and GO.
To understand whether the dysregulated lncRNAs are involved in the regulation of genes and related signaling pathways associated with low HDL-C disease, potential targets of lncRNAs in the database via cis-and transregulation were predicted.

Statistical analysis
Statistical analyses were performed to compare the two groups in the microarray. Fold change values and Student's t-tests were used to analyze the statistical significance of the microarray results. The threshold value for screening differentially expressed lncRNAs and mRNAs was a fold change of ≥1.5 (P < 0.05). The false discovery rate (FDR) was used to correct the P-value. The differentially expressed genes were analyzed using Student's t-tests implemented in SPSS (version 22.0; SPSS, Chicago, IL, USA). Fisher's exact test was used for the KEGG pathway analysis and to compare healthy subjects with patients, where appropriate. PCC was used for the GO analysis and for the prediction of correlations between lncRNAs and protein-coding genes. P < 0.05 was considered statistically significant. Table 1 summarizes the characteristics of all individuals. There were significant differences in HDL-C levels between the two groups (P < 0.05). No differences in sex, age, TG, LDL-C, systolic pressure, diastolic pressure, and BMI were observed between the two groups (P > 0.05).

General characteristics of patients and healthy individuals
Total RNA was extracted and purified from the blood samples. The samples were of good quality and could be used for the microarray analysis. The distributions of lncRNA and mRNA levels are shown in Fig. 2.
lncRNA and mRNA expression profiles Differentially expressed lncRNAs Based on the microarray data, 381 lncRNAs were differentially expressed (fold change ≥1.5, P < 0.05) between individuals with low HDL-C disease and healthy individuals, including 166 up-regulated lncRNAs and 215 down-regulated lncRNAs in the low HDL-C disease group (Fig. 3). A hierarchical clustering analysis was used to arrange specimens into groups according to expression levels (Fig. 4). The top 15 differentially expressed lncRNAs are listed in Tables 2 and 3; TCONS_00006679 was the most significantly upregulated lncRNA (fold change: 2.9543) and TCONS_ 00011823 was the most significantly down-regulated lncRNA (fold change: 4.8541) (Fig. 5).

Differentially expressed mRNAs
According to the microarray results, 370 mRNAs were differentially expressed (fold change ≥1.5, P < 0.05) between the low HDL-C disease and healthy groups, using the same criteria applied to lncRNAs. Among these, 285 mRNAs were up-regulated and 85 mRNAs were downregulated in the disease group. The top 15 differentially expressed mRNAs are listed in Tables 4 and 5. OLFM4 was the most significantly up-regulated mRNA (fold change: 6.86421, P < 0.05) and LOC102723740 was the most significantly down-regulated mRNA (fold change: 4.05710, P < 0.05).
Values on the X and Y axes in the scatter plot are average normalized values for each group (log2-scaled). The lncRNAs shown above the upper red line and below the lower green line are those with expression differences of 1.5 fold change between the two groups; the red and green lines indicate the fold change threshold.
The red points in the plot represent the significantly differentially expressed lncRNAs (fold change > 1.5, P < 0.05), while the vertical lines correspond to 1.5-fold upand downregulation. Horizontal line indicates P = 0.05.     biological process; P = 4.14E-14), and 'oxygen transporter activity' (GO:0005344, ontology: molecular function, P = 5.66E-07). Additionally, a pathway analysis indicated that 23 pathways were significantly enriched among the differentially expressed transcripts (P < 0.05) and the top 15 pathways are shown in Fig. 7a. The most significantly enriched pathway was platelet activation (KEGG ID: hsa04611, P = 4.49E-06). Based on a disease enrichment analysis, we found that the most significantly enriched disease was cardiovascular disease (P = 3.14E-05) (Fig. 7b).

Construction of a co-expression network
A co-expression network was constructed to explore the associations between lncRNAs and potential target mRNAs with differential expression in low HDL-C disease. Using PCC ≥ 0.90 and P < 0.05 as thresholds, 467 pairs of co-expressed lncRNAs and mRNAs were identified (Figs. 8 and 9).

Long non-coding RNA target prediction
Target genes were identified for ten lncRNAs ( Table 6). The lncRNA AP001033.3-201 (ENST00000584509.1) may contribute to the trans-regulation of the proteincoding gene thromboxane A2 receptor (TBXA2R) and the lncRNA AC068234.2-202 (ENST00000575039.1) may act a cis-regulator of the protein-coding gene integrin subunit beta 3 (ITGB3). Both target genes are involved in the platelet activation pathway and cardiovascular disease.

Discussion
Low HDL-C disease has the highest prevalence among the three types of dyslipidemia [18,19] in the Xinjiang Kazakh population. Epidemiological studies have shown that HDL-C levels play a crucial role in the development of atherosclerosis and CAD [20], and low HDL-C disease has been identified as an independent traditional risk factor for CAD [21,22]. Research has mainly focused on the protein-coding genes associated with low HDL-C disease [5,7,8,23,24], and little is known about the underlying genetic mechanisms, including the roles of lncRNAs, which are closely involved in various important biological processes. Researchers have shown that lncRNAs are involved in the regulation of HDL metabolism [25]; HDL in patients with CAD and hypercholesterolemia could cause the abnormal expression of lncRNAs in vascular endothelial cells and have adverse effects on vascular function. A recent microarray analysis has demonstrated that the APOA1 gene cluster is regulated by the lncRNA APOA1-AS [26], which encodes APOA1, a major protein component of HDL in the plasma. Moreover, the lncRNA known as lncHR1 (HCV regulated 1) decreases lipid metabolism by repressing SREBP-1c gene expression [13]. However, the expression profiles of lncRNAs in low HDL-C disease have not been reported to date.
LncRNAs may function by altering the expression of protein-coding genes [27].
They are key regulators of gene expression, and dysregulated expression levels of certain lncRNAs are associated with a variety of diseases, including diseases arising from defects in lipid metabolism [12,28,29]. Generally, cholesterol biosynthesis in the liver is important for the subsequent synthesis of lipids and lipoproteins and for the correct functioning of lipid metabolism [30,31]. Aberrant expression of lncRNAs may induce specific mRNAs to alter HDL-C levels, which would ultimately lead to low HDL-C disease. In this study, genome-wide lncRNA and mRNA expression profiles were compared between individuals with low HDL-C disease and healthy individuals using a microarray approach and potential functions of differentially expressed loci were evaluated by GO and KEGG pathway analyses. A large number of differentially expressed lncRNA and mRNAs between low HDL-C disease and controls were identified. GO and KEGG pathway analyses showed that the differentially expressed genes were involved in a variety of cellular processes, cellular components, and molecular functions.
To elucidate the molecular mechanisms by which lncRNAs contribute to low HDL-C disease, a CNC network was further constructed by combining aberrantly expressed lncRNAs and mRNAs on the basis of lncRNA levels having an effect on the expression of flanking protein-coding genes [32]. The results of the current study indicated that multiple lncRNAs were clearly associated with mRNAs and the co-expression network could provide a strong foundation for predicting the functions of lncRNAs.
Additionally, a pathway analysis identified 23 pathways related to low HDL-C disease. Among these pathways, platelet activation (KEGG ID: hsa04611) involved 14 differentially expression genes and was the most significantly enriched pathway.
A disease enrichment analysis showed that 370 differentially expressed mRNAs were associated with many diseases, such as CAD, hematologic diseases, Bernard- Fig. 8 Coding-non-coding gene co-expression network. Yellow nodes represent lncRNAs, while green nodes represent mRNAs Soulier syndrome, thalassemia, systemic sclerosis, and dilated cardiomyopathy; the greatest enrichment was observed for CAD.
Based on genomic annotation, enrichment, coexpression, and target gene prediction analyses, 10 lncRNAs with target genes were identified. Based on combined GO, KEGG pathway, and disease enrichment analyses, the lncRNAs AC068234.2-202 and AP001033.3-201 were co-expressed with the target genes ITGB3 and TBXA2R.
Finally, the study results indicated that ITGB3 and TBXA2R were both differentially expressed between the two groups and were both related to the platelet activation pathway and cardiovascular disease in a GO enrichment analysis. They were also co-expressed with the lncRNAs AC068234.  Based on a literature review, ITGB3 is an important member of the integrin family of adhesion molecules. It is located on chromosome 17:47,253,827-47,313,743 forward strand. It is associated with thrombus formation and platelet aggregation [33], abnormal platelet activation and abnormal thrombosis [34], and inflammatory responses and atherosclerosis. TBXA2R, located on chromosome 19:3,594,507-3,606,875 reverse strand, may affect platelet function and venous thrombosis [35], platelet aggregation, and the process of atherosclerosis. Usually, TBXA2R is widely distributed in the cardiovascular system. However, the distribution of TBXA2R is altered in various cardiovascular diseases and is involved in pathophysiological processes. Platelet activation plays an important physiological role in inflammation, and extensive clinical and experimental evidence supports the importance of inflammatory processes in both the initiation and development of dyslipidemia [36,37].
Based on the above analysis, the results of this study suggest that the lncRNA AC068234.2-202 is a cis-acting regulator of ITGB3 and the lncRNA AP001033.3-201 is a trans-acting regulator of TBXA2R, contributing to the induction of platelet activation and subsequently to the pathogenesis of CAD. These results provide a foundation for further investigations of lncRNA functions, signaling pathways, and roles in low HDL-C disease.
To conclude, the expression profiles of lncRNAs and mRNAs in low HDL-C disease were determined using a microarray approach, and 381 lncRNAs and 370 mRNAs were identified. An lncRNA-mRNA co-expression network and predicted network were constructed and combined with the results of GO, KEGG pathway, and disease enrichment analyses. Results indicated that the lncRNA AP001033.3-201 is a transregulatory element for the protein-coding gene TBXA2R and the lncRNA AC068234.2-202 is a cisregulatory element for the protein-coding gene ITGB3. Both of these lncRNAs were involved in platelet activation and CAD. Based on these results and microarray data, the results suggest that platelet activation plays an important role in the occurrence and development of low HDL-C disease.
Moreover, Zhong et al. reported that both high and low HDL-C levels (ranging from 22 mg/dL to 97 mg/ dL) were associated with an increased risk of mortality from CAD [38]. The lncRNAs identified in this study provide insights into the potential molecular mechanisms underlying the effects of low HDL-C levels. However, further research is needed to confirm the functions of the candidate differentially expressed lncRNAs and their potential regulatory relationships in low HDL-C disease.

Study strengths and limitations
A large number of differentially expressed lncRNAs and mRNAs in low HDL-C disease were successfully screened in a Kazak population and potential target genes were predicted by microarray and bioinformatic analyses. However, the study had certain limitations. First, it was based on the results of bioinformatics analyses, and further biological experiments, such as qPCR on an independent cohort, are needed to verify the findings. Second, the cross-sectional study design could not reveal whether changes in lncRNA and mRNA expression were primary or secondary changes in the disease. Finally, it is important to acknowledge that the small study cohort might contribute to a reduced statistical power, to some extent.

Conclusions
In this study, lncRNA and mRNA expression profiles in low HDL-C disease were screened in a Kazak population, and a large number of lncRNAs and mRNAs were found to be differentially expressed. GO, KEGG pathway, and disease analyses suggested that the lncRNAs AC068234.2-202 and AP001033.3-201 may regulate the genes ITGB3 and TBXA2R, respectively. These loci are involved in platelet activation and the pathogenesis of CAD. This microarray analysis clarified the expression patterns of lncRNAs in low HDL-C disease and laid the foundation for future functional and mechanistic studies of lncRNAs associated with low HDL-C disease. Furthermore, key candidate genes associated with low HDL-C disease were identified, providing insights into the underlying genetic basis as well as a basis for follow-up research, including functional analyses of lncRNAs and studies of the roles of signaling pathways in the HDL-C disease in the Kazakh population in Xinjiang. In addition, the results of this study improve our understanding of the etiology of both low HDL-C and CAD as well as the relationship between the diseases.