Genetic risk score (GRS) constructed from polymorphisms in the PON1, IL-6, ITGB3, and ALDH2 genes is associated with the risk of coronary artery disease in Pakistani subjects

Background Coronary artery disease (CAD) is a major cause of death worldwide, and Pakistan, like other countries, is affected by this non-communicable disease. It is a multifactorial disease influenced by many gene-gene and gene-environment interactions. Methods A total of 623 Pakistani subjects (219 controls, 404 cases) were genotyped for four SNPs, rs662 (PON1), rs5918 (ITGB3), rs671 (ALDH2) and rs1800795 (IL-6), by PCR-RFLP. Anthropometric parameters were recorded and the serum lipid profile was measured using commercially available kits. Statistical analysis was done in SPSS version 22. A genetic risk score (GRS) was calculated from the individual SNPs. The association of the SNPs and the GRS with CAD was tested using logistic regression. Results The risk allele frequencies of all variants were higher in the cases than in the controls; however, the differences were not statistically significant (p > 0.0125). The mean GRS was 3.99 ± 1.42 in the controls and 4.29 ± 1.39 in the cases, and the difference between the groups was significant (p = 0.0109). Logistic regression showed that the individual SNPs were not significantly associated with CAD, whereas the GRS had a strong association (p = 1.4 × 10⁻⁴). The subjects were divided into three groups based on GRS (Gp 1 with GRS 0-2, Gp 2 with GRS 3-5 and Gp 3 with GRS 6-8). Analysis of the effect of the individual SNPs and the GRS groups on lipid profile parameters revealed no significant association of any of the tested SNPs with any lipid parameter; the GRS groups, however, showed a marginally significant association with TC and a highly significant association with TG, LDL-C and HDL-C. Conclusion A GRS can provide better information than individual SNPs. The larger the number of SNPs included in the analysis, the better the risk prediction is expected to be.


Background
Cardiovascular diseases (CVDs) are disorders of the heart and the circulatory system. Among them, coronary artery disease (CAD) is the most frequent [1]. The constriction of blood vessels due to atherosclerosis leads to a poor blood supply to the heart [2]. The prevalence of CAD is highest in the developing countries [1]. The increasing prevalence of coronary heart disease (CHD) in the South Asian countries poses a threat and a huge burden to healthcare. The major reason for this increase is the adoption of a modern lifestyle, which also increases the risk of metabolic syndrome [3]. According to the World Health Organization (WHO) report from the Global Burden of Disease (GBD), cardiovascular diseases are responsible for 31% of all global deaths. In 2012, coronary artery disease alone caused 7.4 million deaths, while the total number of deaths due to cardiovascular disease was 17.5 million [4]. In Pakistan, one in every four adults suffers from coronary artery disease [5].
Coronary artery disease is a multifactorial disorder involving a complex interaction between environmental and genetic factors [6]. Conventional risk factors fall into two broad categories, modifiable and non-modifiable [7]. Smoking, obesity, diabetes mellitus, hypertension, dyslipidemia, stress, depression and a sedentary lifestyle are the modifiable risk factors, while gender, age and family history are the non-modifiable risk factors [8]. We selected four SNPs from different genes. The rs662 SNP is located in exon 6 of the paraoxonase 1 (PON1) gene and is an A to G substitution that places arginine (R) instead of glutamine (Q) at position 192 of the protein [9]. This single nucleotide polymorphism affects the catalytic activity of the enzyme towards different substrates [10]. The change in catalytic activity is substrate dependent: the Q allele hydrolyzes soman, diazoxon and sarin rapidly, while the R allele hydrolyzes paraoxon more efficiently [11]. R allele carriers are more susceptible to cardiovascular disorders because the Q192 isoform is more effective than the R192 isoform in inhibiting the oxidation of low-density lipoprotein cholesterol [10]. The second selected SNP is rs5918 in the integrin beta 3 (ITGB3) gene, located on the long (q) arm of chromosome 17 at position 21.32. The product of the ITGB3 gene is a protein known as integrin beta III, also called platelet glycoprotein IIIa, GP3A, GPIIIa or antigen CD61. A single base change in exon 2 of this gene (C to T) results in a leucine (PlA1) to proline (PlA2) amino acid substitution [12]. This polymorphism alters the spatial orientation and conformation of the protein in its fibrinogen-binding region [12]. It has been suggested that this polymorphism plays an important role in the progression of coronary artery disease and coronary thrombosis [13], because platelet aggregation with thrombus formation is a key event in acute coronary syndrome [14].
The third SNP was rs671 in the aldehyde dehydrogenase 2 (ALDH2) gene, located on chromosome 12q24.2 [15]. The rs671 (G > A) polymorphism reduces the enzyme activity of ALDH2 through a glutamate-to-lysine amino acid substitution at the protein level (also known as Glu504Lys) [16]. The G allele encodes a functional ALDH2 enzyme needed for aldehyde detoxification, whereas the substituted A allele produces a non-functional isozyme [16]. The fourth selected SNP is in the promoter region of the interleukin 6 (IL-6) gene and involves a change of guanine to cytosine at position −174. The encoded protein is important in inflammation, which results in increased oxidative stress inside the coronary arteries. These SNPs were selected for analysis in Pakistani subjects because 1) the Pakistani population represents a unique ethnic group, which allows the study of concentrated genetic risk markers; 2) the selected SNPs have been reported to modulate serum lipids in Caucasians, so their analysis in Pakistan can provide information on the relationship of these genetic variants with lipids; and 3) these SNPs have not previously been investigated in the Pakistani population, and the current study is the first report of their investigation in our population. We therefore aimed to genotype them to find their association pattern with CAD in our cohort and to compare the association of the single SNPs with that of their cumulative genetic risk score.

Methods
For the current study, a total of 623 subjects (219 controls, 404 CAD cases) were recruited. The recruitment, inclusion and exclusion criteria have been described in detail earlier [17]. The patients were angiographically confirmed CAD cases with stenosis of at least one major coronary vessel (50% or more of the diameter), diagnosed by cardiologists using biochemical markers such as CK-MB and troponin T/I, echocardiography, ECG and radiological investigation. The patients were not using lipid-lowering or antihypertensive drugs. Only newly diagnosed cases were included in the study. Controls were selected from subjects without any history of CAD, at least in first-degree relatives [18]. Exclusion criteria for cases were a clinical diagnosis of cardiomyopathy, coagulopathy or collagenoses, the presence of inflammatory or autoimmune diseases, and acute poisoning. Exclusion criteria for controls were the presence of symptoms of CAD, myocardial infarction (MI), stroke, diabetes mellitus, inflammatory or autoimmune diseases, and a family history of cardiovascular diseases. Blood samples that tested positive on serum screening for infections such as HIV, hepatitis B or hepatitis C were not included. Subjects below 40 years of age were also excluded. Ethical protocols were strictly followed: all procedures complied with the Declaration of Helsinki, and approval was obtained from the institutional ethical committee (Ethical Committee, School of Biological Sciences, University of the Punjab, Lahore, Pakistan).

Anthropometric measurements
For each study participant, gender, age and smoking habit were recorded. The prevalence of comorbidities was noted for the cases and controls. Height (m) and weight (kg) were measured and body mass index (BMI) in kg/m² was calculated for every study participant.

Blood sampling
Venous blood was drawn from the median cubital vein using aseptic technique. A 5 ml blood sample was taken and divided into two parts. EDTA was added to one part to prevent clotting, while the rest was transferred to yellow vials containing gel clot activator to accelerate clotting. The blood in the EDTA vials was stored at 4°C and used for DNA isolation, while the blood in the gel vials was used to obtain serum for the determination of various biochemical parameters. Serum was separated from blood cells by centrifugation of the gel activator vials at 14,000 rpm for 10 min and stored in sterilized Eppendorf tubes. Before being used for the determination of biochemical parameters, the serum samples were screened with a one-step device (Accu-chek®) for hepatitis B virus (HBV), hepatitis C virus (HCV) and human immunodeficiency virus (HIV) infection.

Biochemical parameters determination
Serum lipid profile parameters, including total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C) and high-density lipoprotein cholesterol (HDL-C), were measured using commercially available kits (Spectrum Diagnostics, Obour City, Egypt). All optical density measurements were made on an Epoch microplate reader (Biotek Instruments, Highlands Park, USA).

Statistical analysis
Microsoft Excel and the Statistical Package for the Social Sciences (SPSS, IBM Statistics version 22) were used for the statistical analysis. Allele and genotype frequencies were calculated for each SNP and the study population was tested for Hardy-Weinberg equilibrium (HWE). The significance of differences in allele and genotype frequencies between cases and controls was checked by the chi-squared test. The independent t-test was used to test the difference in mean values of quantitative variables between two groups. Logistic regression analysis was done to check the association of the SNPs with CAD. One-way analysis of variance (ANOVA) was used to check the effect of the polymorphisms on lipid profile parameters. The GRS was calculated for each study subject as follows: the genotypes were uniformly coded as 0 for the homozygous protective genotype, 1 for the heterozygous genotype and 2 for the homozygous polymorphic (risk) genotype. The risk allele counts of the four SNPs were then summed for each participant. The mean GRS values of cases and controls were compared by t-test. As four SNPs were included, a subject could have a minimum of 0 and a maximum of 8 risk alleles. The subjects were divided into three groups by GRS: Group I with a risk allele count of 0-2, Group II with a risk allele count of 3-5 and Group III with a risk allele count of 6-8. A corrected p-value threshold of 0.0125 was used as the statistical cutoff for all tests because four SNPs were included (0.05/4 = 0.0125).
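The GRS coding and grouping described above can be sketched in a few lines of Python. This is only an illustration of the unweighted risk-allele count and the three groups; the analysis in this study was run in Excel and SPSS, and the function names, the dictionary layout and the example subject are hypothetical.

```python
# Minimal sketch of the unweighted GRS (illustrative; not the SPSS workflow used in the study).
SNPS = ("rs662", "rs5918", "rs671", "rs1800795")

# Corrected significance threshold for four SNPs: 0.05 / 4 = 0.0125.
CORRECTED_ALPHA = 0.05 / len(SNPS)


def genetic_risk_score(risk_allele_counts):
    """Sum the risk-allele counts (0, 1 or 2 per SNP) for one subject."""
    score = 0
    for snp in SNPS:
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: expected 0, 1 or 2 risk alleles, got {count}")
        score += count
    return score


def grs_group(score):
    """Map a GRS of 0-8 to Group I (0-2), Group II (3-5) or Group III (6-8)."""
    if not 0 <= score <= 2 * len(SNPS):
        raise ValueError(f"GRS must lie between 0 and {2 * len(SNPS)}, got {score}")
    if score <= 2:
        return "I"
    if score <= 5:
        return "II"
    return "III"


# Hypothetical subject: heterozygous at two SNPs, homozygous for the risk allele at one.
subject = {"rs662": 1, "rs5918": 0, "rs671": 2, "rs1800795": 1}
score = genetic_risk_score(subject)
print(score, grs_group(score))
```

Each SNP contributes equally in this scheme; a weighted score would need per-SNP effect sizes, which are not part of the method described here.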

Results

Study subjects' characteristics
The general characteristics of the study participants have been described in detail elsewhere [17]. The control group comprised 119 males and 100 females, while the case group comprised 238 males and 166 females. The mean age of the two groups differed significantly; the controls were on average older, indicating that they had been disease-free for a longer time. The prevalence of hypertension, diabetes and smoking was higher among the patients than among the controls. The cases had a more atherogenic lipid profile than the controls. The baseline characteristics of the subjects are summarized in Table 1.

Allele and genotype frequencies of the selected SNPs
The allele and genotype frequencies of all the SNPs are given in Table 2. The minor allele frequencies (MAFs) of all SNPs were higher in the cases than in the controls; however, the differences were not statistically significant. Logistic regression analysis likewise showed no significant association of the individual variants with CAD.
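The HWE check and the case-control comparison of allele frequencies can be sketched as follows. This is only an illustration using the standard library: the genotype counts are invented and are not the values in Table 2, the function names are hypothetical, and a Pearson chi-squared test with one degree of freedom and no continuity correction is assumed, which may differ from the settings used in SPSS.

```python
import math


def allele_counts(n_common_hom, n_het, n_risk_hom):
    """Return (common_allele_count, risk_allele_count) from genotype counts."""
    common = 2 * n_common_hom + n_het
    risk = 2 * n_risk_hom + n_het
    return common, risk


def chi2_sf_1df(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def hwe_chi2(n_common_hom, n_het, n_risk_hom):
    """Chi-squared goodness-of-fit test for HWE (assumes both alleles are present)."""
    n = n_common_hom + n_het + n_risk_hom
    common, _risk = allele_counts(n_common_hom, n_het, n_risk_hom)
    p = common / (2 * n)  # common-allele frequency
    q = 1.0 - p           # risk-allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_common_hom, n_het, n_risk_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_sf_1df(chi2)


def allele_chi2(case_genotypes, control_genotypes):
    """Pearson chi-squared test on the 2x2 table of risk/common alleles by group."""
    case_common, case_risk = allele_counts(*case_genotypes)
    ctrl_common, ctrl_risk = allele_counts(*control_genotypes)
    a, b, c, d = case_risk, case_common, ctrl_risk, ctrl_common
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi2_sf_1df(chi2)


# Invented genotype counts (common homozygote, heterozygote, risk homozygote).
controls = (50, 40, 10)
cases = (40, 45, 15)

hwe_stat, hwe_p = hwe_chi2(*controls)
assoc_stat, assoc_p = allele_chi2(cases, controls)
print(f"HWE in controls: chi2 = {hwe_stat:.3f}, p = {hwe_p:.3f}")
print(f"Allele association: chi2 = {assoc_stat:.3f}, p = {assoc_p:.3f}")
print("Significant at 0.0125:", assoc_p < 0.05 / 4)
```

With these invented counts the risk allele is more frequent in the cases (0.375 versus 0.30), yet the test does not pass the corrected threshold, which is the same qualitative pattern reported for the individual SNPs.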

Genetic risk score (GRS) analysis
The GRS was first summarized descriptively. The minimum GRS was 0 in the controls and 1 in the cases, while the maximum GRS was 6 in the controls and 8 in the cases. The mean GRS was 3.99 ± 1.42 in the controls and 4.29 ± 1.39 in the cases, and this difference was statistically significant (p = 0.011). The association of the GRS with CAD was tested by logistic regression and was found to be significant (OR: 4.12, CI: 1.003-6.781, p = 1.4 × 10⁻⁴). The number of subjects in each GRS group was then determined. There were 66 (10.6%) subjects in group I (GRS 0-2), 452 (72.6%) subjects in group II (GRS 3-5) and 105 (16.9%) subjects in group III (GRS 6-8).
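The comparison of mean GRS between the groups can be checked from the reported summary statistics alone. The sketch below assumes an equal-variance (pooled) independent t-test, which the text does not state explicitly, and replaces the t distribution with a normal approximation, which is reasonable only because the degrees of freedom are large. The function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic computed from means, SDs and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (mean2 - mean1) / se
    # Two-sided p-value from the normal approximation to the t distribution.
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return t, df, p_approx


# Reported summary statistics: controls 3.99 ± 1.42 (n = 219), cases 4.29 ± 1.39 (n = 404).
t_stat, df, p_value = pooled_t_from_summary(3.99, 1.42, 219, 4.29, 1.39, 404)
print(f"t = {t_stat:.2f}, df = {df}, approximate p = {p_value:.4f}")
print("Below the corrected threshold of 0.0125:", p_value < 0.05 / 4)
```

Under these assumptions the t statistic is about 2.55 on 621 degrees of freedom and the approximate p-value is close to 0.011, in line with the reported value.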

Comparison of the effect of individual SNPs and GRS on lipid profile parameters
The effect of the individual SNPs across the three genotypes and of the GRS across the three groups on lipid profile parameters was analyzed, and the results are shown in Table 3. The PON1 SNP rs662 increased TG and decreased HDL-C when the subjects with at least one risk allele were compared to the common homozygotes; however, this effect was not statistically significant, and the SNP had no effect on TC and LDL-C. The ITGB3 SNP rs5918 mildly increased TC, TG and LDL-C and decreased HDL-C, but the differences were not significant. The ALDH2 SNP rs671 risk allele increased TC and TG, but the effect was not significant, and it had no effect on LDL-C and HDL-C. The IL-6 SNP rs1800795 moderately increased TC, TG and LDL-C and decreased HDL-C, but the effect was again not significant. When the same analysis was done for the GRS groups, group III, with the highest GRS, showed a significant effect on all lipid parameters, with the strongest association for the decrease in HDL-C (p = 1.5 × 10⁻³), followed by the increases in TG (p = 0.001), LDL-C (p = 0.005) and TC (p = 0.012).

Discussion
In the current study, we selected a set of four SNPs from different genes known to affect the coronary arteries and genotyped them in a cohort of Pakistani individuals to construct a GRS and to investigate whether a GRS can provide better information than the individual SNPs. Because of its restricted religious, social and cultural setting, the Pakistani population offers a unique opportunity to investigate the contribution of genetic markers to disease.
The allele and genotype frequencies of the selected SNPs showed that the cases had a higher MAF for all SNPs than the controls; however, except for the PON1 SNP, this difference did not reach statistical significance. This indicates the low-to-modest effect size of the individual variants. At the same time, it must be kept in mind that the SNPs were genotyped only in a limited set of participants and that the frequencies may differ in the general population; the effect sizes of the SNPs may therefore appear small because of the limited sample size rather than because the SNPs have little role in CAD progression. These SNPs have previously been reported to be associated with CAD in different populations [19][20][21][22].
We showed that the GRS had a significant association with CAD even when none of the individual variants had a statistically significant association with the disease. This indicates the cumulative power of the GRS over individual SNPs: the risk homozygote frequencies for all SNPs were higher in the cases than in the controls, but for single SNPs this difference did not achieve statistical significance, whereas the association became apparent when the GRS of these SNPs was used. The GRS distributions of the cases and the controls overlapped, and the difference between their means was small. This can, to some extent, be attributed to the relatively small sample size. However, the difference is still conspicuous, and if the number of SNPs and the sample size are increased, the results could become highly applicable.
A similar pattern was seen for the relationship of the individual variants and the GRS with the lipid profile parameters. Individual variants showed a mild but non-significant association with an atherogenic profile, whereas the GRS had a highly significant association with increased TC, TG and LDL-C and decreased HDL-C concentrations. A previous study of Pakistani individuals used the same approach to construct a GRS from 21 variants [23] and reported findings similar to ours. The current study, in combination with the previous study, can be used to generate a panel of common variants that could be clinically implemented to calculate an individual's lifetime risk of developing CAD.
The current study was limited by its relatively small sample size and by the small number of SNPs that could be included. In addition, the more biochemical parameters are included, the better the mechanism of action of the SNPs can be elucidated. Future research with a larger sample size, more variants and more biochemical parameters should therefore be done on this unique ethnic group so that one day a strategy for preventing this fatal condition can be designed.

Conclusion
In conclusion, a GRS can provide better information for disease association than single SNPs. However, the panel of such SNPs needs to be carefully designed so that the included SNPs are representative of all the candidate and GWAS genes, are of modest effect size and intermediate frequency, and have previously been reported to be important in disease predisposition in various ethnicities. This information can then be added to the conventional risk factors so that high-risk individuals can be identified before the onset of disease.