A study of the GeGDP problem based on LightGBM regression

This study uses the analytic hierarchy process (AHP) to select three indicators as the measurement standard of green GDP (GeGDP): GDP, resource consumption and reduction cost, and environmental degradation cost. It then analyzes the weights and relative importance of the three indicators. With GeGDP as the main indicator of a country's economic health, a Lasso regression model is established to analyze the predicted impact on global climate mitigation. Next, a LightGBM regression model is established to predict the future GeGDP and GDP of the United States; R-squared and other metrics are used to test the accuracy of the model, and a Pearson correlation analysis of GeGDP and GDP before and after the prediction measures the degree of correlation between the two. Finally, taking the United States as an example, we substitute the relevant data into the LightGBM regression model, predict the values, and conduct a one-way analysis of variance to determine the degree of change before and after.


Research Background
Gross Domestic Product (GDP) is an important indicator of a country's economic health. It measures the monetary value of the final goods and services produced by the country over a given period of time. Although GDP is often quoted as the main indicator of economic health, it does not account for the depletion of natural resources or the negative impact on the environment, and therefore may not be a good measure of a country's true economic health. This has led to the creation of a new indicator, Green GDP (GeGDP), an economic indicator that takes environmental and sustainability factors into account, including the consumption of natural resources, the loss of ecosystems and environmental pollution. Compared with traditional GDP, GeGDP is intended to be a more comprehensive indicator of a country's true economic health.

Study purpose and significance
In this study, AHP is used to construct the measurement standard of green GDP, GeGDP is taken as the main indicator of a country's economic health, and a regression model is established to analyze the predicted impact on global climate mitigation. The first aim is to develop a stable and concise model that, using GeGDP as the primary indicator of a country's economic health, estimates the expected global impact on climate mitigation. The second aim is to determine whether the model shows that such a shift would be worthwhile on a global scale, by comparing the potential advantages and disadvantages, in terms of climate mitigation impacts, of replacing GDP with GeGDP.

Innovation point
Few scholars have gone on to subdivide these factors into different types and to study how the factors interact, either in combination or individually. This experiment uses the Lasso regression algorithm, which, like ridge regression, is a biased estimator that can handle multicollinear data.
Because it uses a histogram-based algorithm, LightGBM can be trained on large data sets in a relatively short time.
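To illustrate why a histogram-based approach is fast, the following is a minimal sketch and not LightGBM's actual implementation. Feature values are bucketed into a small number of bins, and candidate splits are evaluated only at bin boundaries using per-bin sums. The function names, the equal-width binning and the squared-error gain are assumptions made for illustration.

```python
def build_histogram(values, residuals, n_bins):
    """Bucket feature values into equal-width bins; accumulate residual sums and counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for v, r in zip(values, residuals):
        b = min(int((v - lo) / width), n_bins - 1)  # clip the maximum into the last bin
        sums[b] += r
        counts[b] += 1
    edges = [lo + width * (i + 1) for i in range(n_bins)]  # upper edge of each bin
    return sums, counts, edges


def best_split(values, residuals, n_bins=8):
    """Scan bin boundaries (not every sample) for the split with the largest
    squared-error gain: S_L^2/n_L + S_R^2/n_R - S^2/n."""
    sums, counts, edges = build_histogram(values, residuals, n_bins)
    total_sum, total_cnt = sum(sums), sum(counts)
    best_gain, best_threshold = 0.0, None
    left_sum, left_cnt = 0.0, 0
    for b in range(n_bins - 1):
        left_sum += sums[b]
        left_cnt += counts[b]
        right_sum, right_cnt = total_sum - left_sum, total_cnt - left_cnt
        if left_cnt == 0 or right_cnt == 0:
            continue  # a split must leave samples on both sides
        gain = (left_sum ** 2 / left_cnt + right_sum ** 2 / right_cnt
                - total_sum ** 2 / total_cnt)
        if gain > best_gain:
            best_gain, best_threshold = gain, edges[b]
    return best_threshold, best_gain


# Hypothetical toy data: residuals change sign between 4 and 5
threshold, gain = best_split([1, 2, 3, 4, 5, 6, 7, 8], [-1, -1, -1, -1, 1, 1, 1, 1])
print(threshold, gain)
```

The scan costs one pass over the bins rather than one pass over every sorted sample, which is the source of the speed-up on large data sets.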

Literature review
The net economic welfare index proposed by James Tobin became the theoretical basis on which many scholars discuss green GDP. Robert Repetto's measure of net domestic product adjusts for the net change in natural resources but excludes elements that cannot be measured in money, such as air pollution.
Keuning put forward the "National Accounting Matrix including Environmental Accounts" (NAMEA for short). The NAMEA matrix links the traditional input-output model with environmental data, records the environmental data in physical quantities, and then transforms them into comparable environmental indicators from which green GDP can be calculated.
In 1993, the Economic and Social Information and Policy Analysis Branch of the United Nations Statistical Office proposed the "System of Environmental-Economic Accounting" (SEEA). This system covers the framework of integrated environmental and economic accounting, asset classification, environmental cost and benefit assessment, and valuation methods for natural resources and the environment.

Experimental procedure
We use AHP to select three indicators (GDP, resource consumption and reduction cost, and environmental degradation cost), assign relative importance values to them, and from these derive the weight that each indicator carries in green GDP and hence its importance. The table above shows the judgment matrix constructed. The weights calculated by AHP (square root method) are 16.342% for environmental degradation cost, 29.696% for resource consumption and reduction cost, and 53.961% for GDP. The maximum eigenvalue is 3.009 and the corresponding RI value from the RI table is 0.525, so CI=(3.009-3)/(3-1)≈0.0045 and CR=CI/RI=0.009<0.1, which passes the consistency test.
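The square root method and the consistency check can be sketched as follows. The judgment matrix is not reproduced in this text, so the matrix below is an assumption: it is a hypothetical 3x3 matrix (rows and columns ordered as GDP, resource consumption and reduction cost, environmental degradation cost) chosen because it yields weights close to those reported. The RI value of 0.525 is the one quoted above.

```python
import math

# Hypothetical judgment matrix (assumption, not taken from the paper's table).
# Order: GDP, resource consumption and reduction cost, environmental degradation cost.
JUDGMENT = [
    [1.0, 2.0, 3.0],
    [1 / 2, 1.0, 2.0],
    [1 / 3, 1 / 2, 1.0],
]
RI_3 = 0.525  # random index for a 3x3 matrix, as quoted in the text


def ahp_weights(matrix):
    """Square root (geometric mean) method: normalise the row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights, ri):
    """Estimate lambda_max as the mean of (A w)_i / w_i, then
    CI = (lambda_max - n) / (n - 1) and CR = CI / RI."""
    n = len(matrix)
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return lambda_max, ci, ci / ri


weights = ahp_weights(JUDGMENT)
lambda_max, ci, cr = consistency_ratio(JUDGMENT, weights, RI_3)
print([round(w, 4) for w in weights], round(lambda_max, 3), round(cr, 3))
```

With this assumed matrix the weights come out near 0.540, 0.297 and 0.163, with a maximum eigenvalue of about 3.009 and CR of about 0.009, in line with the figures reported above.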

Results of the AHP model
Ultimately, we obtain the formula GeGDP = GDP - Resource Depletion Costs - Environmental Degradation Costs.

Experimental procedure
We collected annual GDP data, CO2 emissions data, temperature, environmental index data, and resource data for five countries over the period 2000-2020 from the World Bank and UNEP data sites, and fitted Lasso regressions to analyse, by prediction, the expected global impact on climate mitigation. The Lasso method is a shrinkage ("compression") estimator: it adds an L1 penalty on the coefficients to the least squares objective, which can shrink some coefficients exactly to zero.
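The shrinkage behaviour of Lasso can be seen in a small sketch. The text does not say which solver was used, so cyclic coordinate descent with soft-thresholding is an assumption here, as are the function names and the synthetic data (which is not the World Bank or UNEP data). The sketch also assumes the features and target are already centred, so no intercept is fitted.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n) * ||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent.
    Assumes the columns of X and the target y are already centred (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return beta


# Synthetic example: y depends strongly on feature 0 and only weakly on feature 1
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.05, 1.95, -1.95, -2.05]
beta = lasso_coordinate_descent(X, y, alpha=0.1)
print(beta)
```

In this toy case the least squares coefficients would be 2.0 and 0.05. The L1 penalty shrinks the first to about 1.9 and sets the second exactly to zero, which is how Lasso performs variable selection.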

Results of the Lasso regression

Experimental procedure
We first used the collected data to calculate the GeGDP indicator for each country in each period, and built a LightGBM regression model to forecast the future GeGDP and GDP of the US. We used metrics such as R-squared to test the accuracy of the model, and conducted a Pearson correlation analysis on GeGDP and GDP before and after the forecast to analyse the degree of correlation between the two.
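The two evaluation measures named above can be computed directly from their definitions. The sketch below uses hypothetical numbers for illustration only; they are not the study's data, and the variable names are assumptions.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation: co-deviation sum divided by the product of deviation norms."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical values (not the study's data)
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]
gdp = [1.0, 2.0, 3.0, 4.0]
gegdp = [0.8, 1.7, 2.5, 3.4]

r2 = r_squared(actual, predicted)
r = pearson_r(gdp, gegdp)
print(round(r2, 3), round(r, 3))
```

An R-squared close to 1 means the predictions explain most of the variance in the observed values, while a Pearson coefficient close to 1 means the two series move together almost linearly.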
After that, we collected the US GDP, GeGDP, economic status, and ability to provide for future generations for the years 2000-2020, substituted these data into the LightGBM regression model developed above, predicted their values, and performed a one-way ANOVA on them to determine the degree of change before and after.
Note: ***, ** and * represent 1%, 5% and 10% significance levels respectively.

Results of the LightGBM regression
The table above shows the results of the parameters tested for the model, including the correlation coefficient and the significance p-value. The test first determines whether there is a statistically significant relationship between X and Y, that is, whether the p-value is significant (p<0.05). If it is, the two variables are correlated; if not, no correlation between the two variables can be established. Finally, the sign and magnitude of the correlation coefficient give the direction and degree of the correlation.
Figure 6 Predicted graph of test data
The graph above shows LightGBM's predictions for the test data.

Results of one-way ANOVA
Analysis term: GDP. The Shapiro-Wilk test on the sample gives a significance p-value of 0.469, which is not significant, so the null hypothesis cannot be rejected and the data can be treated as normally distributed.
Analysis term: GeGDP. The Shapiro-Wilk test on the sample gives a significance p-value of 0.460, which is not significant, so the null hypothesis cannot be rejected and the data can be treated as normally distributed.
The results of the homogeneity-of-variance test show that for GDP the significance p-value is 0.294, which is not significant, so the null hypothesis cannot be rejected and the data satisfy homogeneity of variance.
The results of the homogeneity-of-variance test show that for GeGDP the significance p-value is 0.248, which is not significant, so the null hypothesis cannot be rejected and the data satisfy homogeneity of variance.

Figure 7 One-way ANOVA comparison chart
The graph above shows the ANOVA results for the group means; differences between groups can be examined by comparing these means. The mean values of GDP for the good, unstable and recession economic statuses are 15.319*/17.765*/14.420* respectively. The ANOVA gives a p-value of 0.632>0.05, so the result is not statistically significant, indicating that GDP does not differ significantly between the economic statuses.
The mean values of GeGDP for the good, unstable and recession economic statuses are 20.840*/24.925*/19.720* respectively. The ANOVA gives a p-value of 0.631>0.05, so the result is not statistically significant, indicating that GeGDP does not differ significantly between the economic statuses.
The effect size analysis for GDP gives an eta squared (η²) of 0.05, indicating that 5.0% of the variation in the data comes from differences between groups. Cohen's f is 0.229, indicating a small effect.
The effect size analysis for GeGDP likewise gives an eta squared (η²) of 0.05, indicating that 5.0% of the variation in the data comes from differences between groups. Cohen's f is 0.229, again indicating a small effect.
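The quantities reported for the one-way ANOVA follow from the between-group and within-group sums of squares. The sketch below is a minimal illustration with made-up groups, not the study's data. It computes the F statistic, η² and Cohen's f, and omits the p-value, which requires the F distribution.

```python
import math


def one_way_anova(groups):
    """Return (F, eta_squared, cohens_f) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    # Between-group sum of squares: group size times squared deviation of the group mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    eta_sq = ss_between / (ss_between + ss_within)
    cohens_f = math.sqrt(eta_sq / (1.0 - eta_sq))
    return f_stat, eta_sq, cohens_f


# Hypothetical groups (for example good / unstable / recession), not the study's data
f_stat, eta_sq, cohens_f = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, eta_sq, cohens_f)

# Cohen's f implied by the reported eta squared of 0.05
reported_f = math.sqrt(0.05 / (1.0 - 0.05))
print(round(reported_f, 3))
```

The last line shows that the reported values are mutually consistent: an η² of 0.05 corresponds to a Cohen's f of about 0.229.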

Conclusion
This study uses AHP to analyze GDP, resource consumption and reduction cost, and environmental degradation cost, and to determine the importance of each indicator. Using GeGDP as the primary indicator of a country's economic health, a Lasso regression model was established to analyze the predicted impact on global climate mitigation. A LightGBM regression model was then established to predict the future GeGDP and GDP of the United States; R-squared and other metrics were used to test the accuracy of the model, and a Pearson correlation analysis of GeGDP and GDP before and after the forecast was used to analyze the degree of correlation between the two. Finally, the relevant data were substituted into the LightGBM regression model to predict their values, and a one-way analysis of variance was performed to determine the degree of change before and after.