Risk prediction model of dyslipidaemia over a 5-year period based on the Taiwan MJ health check-up longitudinal database

Objective This study aimed to provide an epidemiological model to evaluate the risk of developing dyslipidaemia within 5 years in the Taiwanese population. Methods A cohort of 11,345 subjects aged 35–74 years and was non-dyslipidaemia in the initial year 1996 and followed in 1997–2006 to derive a risk score that could predict the occurrence of dyslipidaemia. Multivariate logistic regression was used to derive the risk functions using the check-up centre of the overall cohort. Rules based on these risk functions were evaluated in the remaining three centres as the testing cohort. We evaluated the predictability of the model using the area under the receiver operating characteristic (ROC) curve (AUC) to confirm its diagnostic property on the testing sample. We also established the degrees of risk based on the cut-off points of these probabilities after transforming them into a normal distribution by log transformation. Results The incidence of dyslipidaemia over the 5-year period was 19.1%. The final multivariable logistic regression model includes the following six risk factors: gender, history of diabetes, triglyceride level, HDL-C (high-density lipoprotein cholesterol), LDL-C (low-density lipoprotein cholesterol) and BMI (body mass index). The ROC AUC was 0.709 (95% CI: 0.693–0.725), which could predict the development of dyslipidaemia within 5 years. Conclusion This model can help individuals assess the risk of dyslipidaemia and guide group surveillance in the community. Electronic supplementary material The online version of this article (10.1186/s12944-018-0906-2) contains supplementary material, which is available to authorized users.


Introduction
According to the World Health Organization, cardiovascular disease accounts for more than half of all non-communicable diseases and has become the leading cause of death worldwide [1]. Atherosclerosis is the major cause of cardiovascular disease, consistent and convincing evidence supports an association between dyslipidaemia and cardiovascular disease incidence [2]. Dyslipidaemia is defined as an abnormal lipid profile with high triglyceride (TG), total cholesterol (TC) or low-density lipoprotein cholesterol (LDL-C) or low high-density lipoprotein cholesterol (HDL-C). Studies have shown that LDL-C levels are directly associated with an increased risk of coronary artery disease (CAD) [3], whereas higher serum HDL-C levels are a protection factor [4]; however, there is much controversy concerning the proposal that high serum TG levels may be an additional factor for cardiovascular disease [5,6].
As a result of Westernization of the diet, obesity, and adverse lifestyle changes [7], the prevalence of dyslipidaemia is high and increasing yearly. A report from the American College of Cardiology has indicated that 39% of the global population has elevated cholesterol, and more than one-half of those individuals live in higher income countries [1]. The dyslipidaemia prevalence increased from 18.6% in 2002 [8] to 33.97% in 2010 [9]. Recently, the overall pooled dyslipidaemia prevalence in Chinese adults was estimated to be 41.9% [10]. In Taiwan, the incidence of dyslipidaemia is also high in both adolescents and adults. An epidemiological survey showed that the prevalence of dyslipidaemia significantly increased from 13% in 1996 to 22.3% in 2006 among adolescents [11]. In adults, the prevalence rates of hypercholesterolaemia, hypertriglyceridaemia, an elevated LDL-C level and a low LDL-C level were 53.3% in men and 48.2% in women, 29.3% in men and 13.7% in women, 50.7% in men and 37.9% in women, and 47.4% in men and 53% in women, respectively [12].
Former researchers developed prediction models to calculate a subject's probability of developing dyslipidaemia [13][14][15], but their models were not very suitable for the Taiwanese population [16][17][18][19]. Therefore, we constructed a risk prediction model of dyslipidaemia among Taiwanese individuals enrolled in the MJ Health Check-up Corporation to evaluate the onset risk of dyslipidaemia in a Taiwanese cohort with periodic check-ups and to provide a reference for individual prevention. The development of this prediction model was based on a cohort with a follow-up of 5 years using clinical information alone or in combination with simple laboratory measures.

Study subjects
The MJ Health Screening Centre comprises 4 health screening centres (Taipei, Taoyuan, Taichung, and Kaohsiung) located around Taiwan. The members involved in the physical examination covered nearly 840,000 people. Their ages range from 18 to 80 years old, and the regional distribution involves 23 cities and counties in Taiwan. Thus, there is a certain representation of the Taiwan population.
The study cohort consisted of 11,345 subjects with the following inclusion criteria: 1) aged 35-74 years; 2) information about the variables was complete; 3) no dyslipidaemia at the beginning of the survey and completed the follow-up in 5 years. The exclusion criteria were as follows: 1) had dyslipidaemia at the beginning of the survey; 2) were using lipid-lowering drugs; 3) were lost in the follow-up and had scant information about the key variables.

Ethics statement
This study was approved by the Peking University Institutional Review Board, which made the following decision: this study eliminated all identifiable personal information not belonging to studies involving human beings. Thus, we granted waivers of informed consent and ethical review to the study. All or part of the data used in this research were authorized by and received from the MJ Health Research Foundation (Authorization Code: MJHRFB2014003C). Any interpretation or conclusion described in this article does not represent the views of the MJ Health Research Foundation.

Measurements
The research data were collected in a unified manner that included a questionnaire, physical examination and laboratory testing. The questionnaire included more than 200 indicators, including the demographic data, personal medical history and medication history, family history of cardiovascular disease, physical activity status, smoking status, alcohol consumption level, food type, and symptoms. The physical examination included blood pressure, height, weight, and waist measurements. The laboratory tests included measurements of fasting serum total cholesterol (TC), HDL-C, TG, LDL-C, blood glucose and other routine physical indicators. All the specimens were evaluated at the MJ Central Laboratory, and the biochemical indices were measured in the Hitachi-7150 automatic analyser (Hitachi, Tokyo, Japan) [20].

Dyslipidaemia diagnostic criteria
Adult dyslipidaemia was defined according to the Chinese adult dyslipidaemia Prevention Guide, which was published in 2007 [21] and 2016 [22]. A subject was diagnosed with dyslipidaemia when he/she had one of the following conditions: TC ≥240 mg/dl; TG ≥200 mg/dl; HDL-C < 40 mg/dl in men and < 50 mg/dl in women; LDL-C ≥ 160 mg/dl or non-HDL-C (non-HDL-C = TC -HDL-C) ≥190 mg/dl.

Statistical analysis
We used the Taipei cohort to develop the model of different dyslipidaemia types. The statistical analysis was completed using SAS 9.1.3. After the medical centre verified the measurement data rigorously, our team performed a comprehensive clean-up of the data and sorted out the follow-up database for analysis. Continuous variables were expressed as the means ± standard deviations, and categorical variables were expressed as percentages.
Based on the logistic regression equation (Eq. 1), we constructed the probability prediction equation (Eq. 2). We used the Hosmer-Lemeshow χ 2 test to compare the forecasted probability with the actual probability.
To ensure the variables in the preventive model, after adjusting for differences in sex and age, we utilized one-factor logistic regression to determine the role of each variable in dyslipidaemia, used stepwise regression to filter the meaningful variables, and then fit the variables in the prediction model, followed by testing of the predicted probability with a receiver operating characteristic (ROC) curve for diagnosis.

Results
Baseline prevalence and 5-year incidence Table 1 shows the prevalence at baseline of each health screening centre. The prevalence rates of dyslipidaemia in Taipei, Taoyuan, Taichung, and Kaohsiung were 56.5, 56.2, 58.2 and 63.9%, respectively, and the total prevalence was 57.9%.
Table 2 (Page20) shows the 5-year incidence of dyslipidaemia in the subjects who did not have dyslipidaemia initially. The 5-year cumulative incidence was 19.39%. Those cases of dyslipidaemia included 252 cases with increased LDL-C, 243 cases with increased TG, 629 cases with decreased HDL-C, and 352 cases with increased TC. According to Table 2, most of the "1 item" cases were a HDL-C decrease, and most of the "2-4 item" cases were increases in LDL-C and TC, followed by an increase in TG combined with a decrease in HDL-C. Table 3 (Page 21) describes the characteristics of the 5-year follow-up of the study population according to the presence of dyslipidaemia.

Single variable risk analysis (adjusted for age and sex)
According to the risk factors of dyslipidaemia proven in published reports, the value of those factors was explored in the MJ database information. Table 4 (Page 24) lists the relationships between the variables and dyslipidaemia developed using single-variable logistic regression analysis.

Risk prediction model of dyslipidaemia
The prediction model was named as the MJ Dyslipidaemia Risk Score Model (MJ-DRSM). Table 5 contains information of the model. The prediction model is based on the multivariate logistic regression model. Although differences in the incidence of sex were not obvious after a certain age, all fittings of the prediction models were not divided by gender. The main risk factors in the MJ-DRSM (or the so-called variables) included sex, a family history of diabetes, the HDL-C level, the LDL-C level, the TG level and the BMI. Conversely, exercise, drinking or smoking, and dietary factors were excluded from the model. The included factors were consistent with the aetiology results from a large number of previous studies (both domestic and foreign), met the criteria of risk factors and were confirmed in some large-sample prospective studies.
After calculating the LogitP value of a subject, the probability of an individual developing dyslipidaemia within 5 years can be calculated in accordance with the following formula (Eq. 4): Predictive power and cut-off point of the model Figure 1a shows the receiver operator characteristic curve (ROC) of MJ-DRSM and the AUC of training set is 0.707 (0.008) (95% CI: 0.691, 0.723). The sensitivity and specificity shows that the best cut-off point is when the predictive probability is P = 0.1771, the sensitivity is 68.57%, the specificity is 63.53%, the positive predictive value is 30.33%, and the negative predictive value is 10.27.

Model validity test
The MJ-DRSM and formula parameters are based on the data of the Taipei medical group, and the model's validity must be tested in external samples. In this study, we utilized the data of the Taoyuan, Taichung, and Kaohsiung health screening centres to examine the model's validity. Figure 1b shows the area under the curve AUC = 0.708, indicating that the model has a high degree of curve fitting and good external validity. Thus, the prediction model above can be applied practically to an individual to predict the risk of dyslipidaemia within 5 years. Classification of the dyslipidaemia 5-year incidence risk Figure 2 shows the correspondence relationship of the prediction probability and risk level. After transforming the distribution of the 5-year prediction incidence probability into a normal distribution via natural log transformation, we classified the interval of the prediction probability of no dyslipidaemia in the subjects using the mean (M) and standard deviation (SD) and sorted them in ascending order as follows: lower than M-SD was deemed low risk; the range from M-SD to M + SD was deemed general risk; the range from M + SD to M + 2SD was deemed moderate risk; and higher than M + 2SD was deemed high risk.

Model application
According to the blood test results, the risk of dyslipidaemia within 5 years can be predicted using the following process. The first step is to calculate the LogitP using Eqs. 3 to predict an individual's risk of dyslipidaemia. Then, the individual's probability incidence of dyslipidaemia within 5 years can be predicted using eq. 4 to calculate the probability. Finally, Fig. 2 shows the process used to distinguish this person's risk grade.
(More details about the model example are provided in the Additional file 1.)

Discussion
Estimating the absolute risk required for the prevention or treatment of dyslipidaemia commonly relies on prediction models developed from the experience of prospective cohort studies [23]. Logistic regression analysis was used to examine the associations of some factors with dyslipidaemia in this study. The findings based on the training dataset revealed six common parameters significantly associated with dyslipidaemia (gender, a family history of diabetes, the HDL-C level, the LDL-C level, the TG level and the BMI). Those predictors are consistent with the large number of previous aetiological studies performed worldwide [24][25][26][27][28][29].
A Japanese study also used logistic regression to evaluate the predictive power of a body shape index (ABSI) for the development of dyslipidaemia; that article focused on the influence of the BMI, WC, and ABSI on the incidence of dyslipidaemia [14]. Another study tried to use plasma free amino acid (PFAA) profiles to predict dyslipidaemia [15]. Their findings suggested that each increase of 1 SD in PFAA index 1 was related to an    approximately 34% (12-60%) increased risk of developing dyslipidaemia; for PFAA index 2, the increase was 20% (2-40%). However, the researchers did not illustrate the precision of their model. Our model utilized relatively all-sided information, and its precision was good. Thus, MJ-DRSM was a better choice for predicting the incidence of dyslipidaemia. Recently, some studies had proposed that inflammatory factors such as high-sensitivity C-reactive protein(hs-CRP), Interleukin 6(IL-6) and tumor necrosis factor(TNF) were associated with dyslipidemia, cardiovascular disease and metabolic syndrome [30][31][32][33], which was based on the theory that these factors can affect lipid metabolism by promoting the expression of adhesion molecules, the recruitment and activating and gathering the inflammatory cells. Note: "-"means no dyslipidaemia after the 5-year follow-up, "+" means new patients with dyslipidaemia after the 5-year follow-up. Data are expressed as % or means Model sensitivity and specificity are important when testing whether a model can accurately recognize positive and negative outcomes [34]. The ideal model has both high sensitivity and high specificity [35]. The results of the predictive performance showed that the MJ-DRSM model could be used to screen undiagnosed dyslipidaemia patients, because it had good sensitivity (69.9%) and specificity (62.3%). The AUC provides a superior performance index in addition to superior accuracy; therefore, it has often been used to evaluate the predictive accuracy of classifiers [36]. The AUC of a classifier can be defined as the probability of the classifier ranking a randomly chosen positive example higher than a randomly chosen negative example, and higher AUC values can be interpreted as having a higher predictive accuracy [36,37]. For the MJ-DRSM model, the AUC value was 0.709 (0.693-0.725) in the train cohort and 0.708 (0.691-0.725) in the test cohort. The AUC value of the Japanese study [14] was 0.572 (0.564-0.580). Although the two models were appropriate for different populations, our model contained more variables and had a higher AUC value, which meant that the MJ-DRSM had a better probability of predicting dyslipidaemia accurately. The availability of the predictors is also very important when evaluating whether a model can be feasibly used to identify positive and negative outcomes [13].
To the best of our knowledge, the lipid profile measurement is a standard method to identify and diagnose dyslipidaemia. In this study, we used general epidemiological survey data (gender and a family history of diabetes) with biochemical parameters (HDL-C, LDL-C, TG, and BMI) to develop the MJ-DRSM and distinguish subjects who would be patients with dyslipidaemia after 5 years. To optimize resource use, researchers usually categorize more than just high-and low-risk groups and implement graded intensities of interventions according to the degree of risk [38]. Our data suggest that cut-off points for categorization of risk in the Taiwanese population may be based on the mean and standard deviation (Fig. 2). The high-risk individuals identified will benefit from receiving health education and having the According to the parameters listed in Table 5, we can obtain a formula (Eq. 3) to compute LogitP of dyslipidaemia; x 1 -x 6 represent sex, family history of diabetes (0 = no, 1 = yes), HDL-C (mg/dl), LDL-C (mg/dl), TG (mg/dl), and BMI(kg/m 2 ), respectively A B opportunity to engage in healthy lifestyles at an early stage to prevent or delay the onset of dyslipidaemia.
The MJ-DRSM model can accurately calculate the individual probability of dyslipidaemia after 5 years (P1); then, the model puts forward corresponding health suggestions for the subjects and recalculates the 5-year probability after adoption of the proposal (P2). Subjects can compare P1 and P2 to see the advantages of health educational intervention when they follow the advice. Thus, we predict that this model will improve the wiliness of people to change their unhealthy lifestyles according to the health promotive education.
The MJ-DRSM model has a good predictive ability and can directly estimate the 5-year risk of new dyslipidaemia patients in physical examination populations. Additionally, the model can calculate the benefit to the individual after changing the risk factor level to facilitate health education development. Although this study creates a simple scoring tool to predict dyslipidaemia, the study limitations should be noted. First, there were differences in the sociodemographic characteristics between subjects with long-term and rare participation in physical examinations. Second, the model could only be used to predict the 5-year incidence of dyslipidaemia and could not be extrapolated directly to people beyond 35 to 74 years of age. Third, we didn't measure the plasma hs-CRP, IL-6 and TNF in this study,then we cannot analyze these indexes' effects on the development of dyslipidemia, we would take those indexes into account in further research. Despite these limitations, the results were based on a large population-based study that combined multiple risk factors, and the prediction model was reliable and effective for the screening of undiagnosed dyslipidaemia patients among the MJ Health screening population.

Conclusion
The predictability and reliability of our dyslipidaemia risk score model based on the Taiwan MJ Longitudinal Health Check-up Population Database were satisfactory in the testing cohort, with simple and practical predictive variables and risk degree forms. This model can help individuals assess the risk of dyslipidaemia and guide group surveillance in the community.