Study on the Demographic Analysis of Countries and the Construction of a Population Health Index

The data in this paper come from the Population Data Bureau of the United States, Eurostat (the statistical office of the European Union), the CEIC global database and other sources. SPSS was used to carry out statistical analysis of the population data of about 230 countries and regions from 2002 to 2012. Hierarchical cluster analysis of the population data of Middle Eastern countries shows that their population distribution is relatively uniform. Multiple linear regression was used to explore the determinants of the total population of the Americas, and the population aged 15-64 and the birth rate were found to have a significant positive effect on total population. Finally, a population health index was constructed by factor analysis; its two underlying factors are a life-expectancy health factor and a population-size factor.


Introduction
Because changes in a country's population have a great impact on its society, economy and policy, population size has long been a focus of researchers in many fields. In the 21st century the world is experiencing unprecedented demographic change: not only rapid population growth, but also other trends such as regional imbalances in population structure, the effects of modernization and urbanization, and population aging.
Many scholars in China and abroad have analyzed the age structure of the population with different methods. Chen Baoping used grey clustering theory to build an evaluation model of population age structure and divided the regions of China into three grades: excellent, medium and poor. Bai Xuemei et al. evaluated population age structure with an improved fuzzy comprehensive evaluation model, which can describe the degree and ranking of the age structure of each population. Zhang et al. explored the impact of population age structure on China's rural environment, taking biomass cogeneration capacity as an example.
Similarly, many scholars and professional institutions have studied the development trend of world population, which has become another focus of population research. Guo Ran et al. gave an in-depth account of the theory and reality of world population trends and demographic transition, and found that world population development remains unbalanced: the total population is still increasing, but its growth rate is declining. Hou Yankun combined data analysis with the Leslie model to build a world population growth forecasting model with a firmer mathematical basis. Ren Qiang analyzed the development trajectory of world population from the perspective of the evolution of life expectancy.
In recent years the world population has continued to grow rapidly, and population structures and development trends are highly unbalanced across the world. How to control population growth, and which policies can promote population health, have therefore become research priorities. Against this background, this paper analyzes the population data of about 230 countries and regions from 2002 to 2012, drawn from the U.S. population information bureau, Eurostat and the CEIC global database.

Descriptive Statistics of Global Population by Country, 2009-2012
Because of practical limitations, real-time population data could not be collected; only annual data for about 230 countries and regions from 2002 to 2012 were obtained. To save space, descriptive statistics are reported only for 2009-2012. Population figures for a few countries could not be collected for unknown reasons, and these are treated as missing values.
Table 1 shows the descriptive statistics of the population of each continent and its countries from 2009 to 2012. The average population per country on every continent shows a clear upward trend over this period. Population is concentrated mainly in Asia, the Americas and Africa. Asia has the largest average population, reaching 114,836,534 in 2012, with a standard error of 52,316,304.30. In terms of variance, the variance of country populations in Oceania remained small from 2009 to 2012 (3.543 × 10^13 in 2012), indicating relatively balanced population growth among Oceanian countries. By contrast, the variance for Asian countries is large, which may be caused by uneven population sizes and unbalanced growth across countries.

This section also classifies the population data of nearly 230 countries and regions, starting with population distribution charts of each continent and its countries in 2012. Because of the volume of data, only the radar chart of the population distribution of African countries in 2012 (see Figure 1) and the pie chart of the population distribution of Oceanian countries in 2012 are given here. The charts show that population is not evenly distributed among the countries of each continent, and is even polarized. In terms of total population in 2012, Asia is still the largest, at 3,904,442,155, and Oceania the smallest, at only 37,295,830. The global population in 2012 totaled 7,014,951,860 (excluding a small number of missing values).
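The per-continent summaries in Table 1 (count, total, mean, variance and standard error, with missing values excluded) can be reproduced with a short script. The sketch below uses only Python's standard library; the records are hypothetical toy numbers, not the paper's data, and it assumes the reported standard error is the standard error of the mean (sample standard deviation divided by the square root of the number of countries).

```python
import math
import statistics
from collections import defaultdict

# Hypothetical toy records (continent, country, population); not the paper's data.
# None marks a country whose figure could not be collected (treated as missing).
records = [
    ("Oceania", "A", 1_000_000),
    ("Oceania", "B", 3_000_000),
    ("Oceania", "C", None),
    ("Asia", "D", 10_000_000),
    ("Asia", "E", 50_000_000),
    ("Asia", "F", 90_000_000),
]

def describe_by_continent(rows):
    """Per-continent n, total, mean, sample variance and standard error of the mean."""
    groups = defaultdict(list)
    for continent, _country, pop in rows:
        if pop is not None:                  # exclude missing values
            groups[continent].append(pop)
    summary = {}
    for continent, pops in groups.items():
        n = len(pops)
        var = statistics.variance(pops)      # sample variance (divisor n - 1)
        summary[continent] = {
            "n": n,
            "total": sum(pops),
            "mean": statistics.mean(pops),
            "variance": var,
            "se": math.sqrt(var / n),        # standard error of the mean
        }
    return summary

for continent, stats in describe_by_continent(records).items():
    print(continent, stats)
```

As a quick consistency check on Table 1, the mean multiplied by the number of countries should equal the total: 3,904,442,155 / 114,836,534 ≈ 34, which suggests that about 34 Asian countries had non-missing data in 2012.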

Basic Methods and Steps of Multiple Linear Regression
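The general form of the multiple linear regression model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

where $x_1, x_2, \ldots, x_p$ are the influencing factors, which are usually controllable or given in advance and are called explanatory (independent) variables; $y$ is the object of study, i.e. the prediction target, called the explained (dependent) variable; $\varepsilon$ represents the combined effect of various random factors on $y$, called the random error term, and is assumed to follow a normal distribution, $\varepsilon \sim N(0, \sigma^2)$; $\beta_0, \beta_1, \ldots, \beta_p$ are the regression coefficients of the model; and $p$ is the number of explanatory variables.

Given a set of observations $\{(x_{i1}, x_{i2}, \ldots, x_{ip}; y_i) : i = 1, 2, \ldots, n\}$, the population regression model can also be written as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, 2, \ldots, n$$

In matrix form, the multiple linear statistical model is

$$Y = X\beta + \varepsilon \qquad (2)$$

where

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

In general, the regression model is estimated by least squares, that is, by choosing $\beta$ to minimize the objective function

$$Q(\beta) = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip}\right)^2 = (Y - X\beta)^{\mathsf T}(Y - X\beta)$$

Model tests then determine whether the preliminary estimated model objectively reveals the relationships among the factors in the phenomenon studied, whether it conforms to the objective laws governing the variables, whether the influencing factors introduced are effective, whether there is a linear correlation between the variables, and whether the model can be applied.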
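Minimizing the sum of squared residuals $\sum_i (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$ leads, in the matrix notation $Y = X\beta + \varepsilon$, to the normal equations $X^{\mathsf T}X\hat\beta = X^{\mathsf T}Y$, so that $\hat\beta = (X^{\mathsf T}X)^{-1}X^{\mathsf T}Y$ when $X^{\mathsf T}X$ is invertible. The paper performs the estimation in SPSS; the NumPy sketch below, on synthetic data with known coefficients, only illustrates the computation. It uses `np.linalg.lstsq`, which solves the same least-squares problem without forming the inverse explicitly, and also returns the residual variance estimate and R², two quantities used when testing the model.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares estimate of beta in Y = X beta + eps.

    X holds the explanatory variables only; a column of ones is prepended
    so that beta[0] is the intercept beta_0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    # lstsq minimizes ||y - design @ beta||^2 (numerically safer than inverting X^T X)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    n, k = design.shape
    sigma2 = resid @ resid / (n - k)              # unbiased estimate of the error variance
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, sigma2, r2

# Synthetic data generated from known coefficients (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=50)
beta, sigma2, r2 = ols_fit(X, y)
print(np.round(beta, 2), round(r2, 4))
```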

Regression Analysis of Population Data from Countries in the Americas
This paper takes the 2012 population data of the Americas as an example to conduct multiple regression analysis and establish a regression equation. The variables in the regression model are: total population (y), population aged 0-14 (x1), population aged 15-64 (x2), population aged 65 and over (x3), urban population (x4), birth rate (x5), neonatal mortality rate (x6), male life expectancy (x7) and female life expectancy (x8).

(1) Standardization of the sample data. Because the selected variables are measured in different units, the sample data are first standardized to avoid errors caused by differences in dimension.

(2) Regression analysis. SPSS was used to perform stepwise regression of the total population (y) on the standardized variables: population aged 0-14 (z1), population aged 15-64 (z2), population aged 65 and over (z3), urban population (z4), birth rate (z5), neonatal mortality rate (z6), male life expectancy (z7) and female life expectancy (z8). The variables finally selected are the population aged 15-64 (z2) and the birth rate (z5); the other six variables are excluded. The regression results are shown in Table 2.

Table 2 shows that both explanatory variables selected by stepwise regression are significant, the overall equation is significant, and the goodness of fit is good. The final regression equation takes the form

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_2 z_2 + \hat{\beta}_5 z_5 \qquad (5)$$

Both regression coefficients are positive, indicating that the population aged 15-64 and the birth rate have a significant positive effect on the total population. The coefficient of z2 is far greater than that of z5, which means that the population aged 15-64 contributes far more to the total population than the birth rate does.
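The two steps above, standardization followed by stepwise selection, can be sketched in NumPy. This is a minimal re-implementation under stated assumptions, not SPSS's procedure: each column is converted to a z-score using the sample standard deviation; at each step the candidate with the largest partial F statistic enters if F ≥ 3.84, and a selected variable is removed if its partial F falls below 2.71. These thresholds are assumed values (SPSS can also use p-value criteria instead), since the paper does not state which criteria it used. The labels z1 to z8 mirror the text, but the data are synthetic.

```python
import numpy as np

def standardize(X):
    """Z-score each column: (x - column mean) / column sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def rss(Z, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus Z[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [Z[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

def stepwise(Z, y, f_in=3.84, f_out=2.71, max_steps=50):
    """Stepwise selection: partial-F entry test, then partial-F removal test."""
    n, p = Z.shape
    selected = []
    for _ in range(max_steps):                 # guard against add/remove cycling
        changed = False
        # Entry: among unselected columns, find the one with the largest partial F.
        base = rss(Z, y, selected)
        df_in = n - len(selected) - 2          # residual df of the enlarged model
        best_f, best_c = -np.inf, None
        for c in range(p):
            if c not in selected:
                new = rss(Z, y, selected + [c])
                f = (base - new) / (new / df_in)
                if f > best_f:
                    best_f, best_c = f, c
        if best_c is not None and best_f >= f_in:
            selected.append(best_c)
            changed = True
        # Removal: drop the selected column with the smallest partial F if below f_out.
        if selected:
            full = rss(Z, y, selected)
            df_out = n - len(selected) - 1     # residual df of the current model
            worst_f, worst_c = min(
                ((rss(Z, y, [s for s in selected if s != c]) - full) / (full / df_out), c)
                for c in selected
            )
            if worst_f < f_out:
                selected.remove(worst_c)
                changed = True
        if not changed:
            break
    return sorted(selected)

# Synthetic example: 8 predictors on very different scales; y depends on z2 and z5 only.
rng = np.random.default_rng(42)
n = 35
scales = np.array([1e6, 5e6, 1e5, 3e6, 10.0, 5.0, 4.0, 4.0])
X_raw = rng.normal(size=(n, 8)) * scales + 100.0
Z = standardize(X_raw)
y = 50.0 + 5.0 * Z[:, 1] + 1.0 * Z[:, 4] + rng.normal(scale=0.5, size=n)
names = [f"z{j}" for j in range(1, 9)]
chosen = stepwise(Z, y)
print([names[j] for j in chosen])
```

With these synthetic data both z2 and z5 should be selected. Because each pure-noise column has roughly a 5% chance of passing the entry threshold, an extra variable can occasionally enter, which is one reason stepwise results are usually checked against subject-matter reasoning.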

Proposal of a Population Health Index
A population health index is a set of indicators that can comprehensively diagnose and objectively compare the health of population development in different countries and give early warning of problematic trends. As early as 2005, the International Monetary Fund stated explicitly in its World Economic Outlook that global economic imbalance is an important risk to world economic growth. This paper takes each country as the unit of analysis and selects typical indicators of population characteristics, such as the birth rate, neonatal mortality and male life expectancy, and combines them through factor analysis into a comprehensive measure of each country's population health, from which the population health index is calculated. Such an index can encourage governments to pay more attention to their own population development and to manage population in a balanced and coordinated way, which in turn supports the sustainable and coordinated development of the economy, society and resources in each country.

Constructing a Population Health Index Model: The Case of Asia
This paper takes the 2012 population data of Asian countries as an example to construct the health index model. The original variables are urban population (x1), birth rate (x2), neonatal mortality (x3), male life expectancy (x4), female life expectancy (x5) and total population (x6). First, the suitability of the data for factor analysis is tested: the KMO measure is 0.815 > 0.6 and the accompanying test statistic is significant, indicating that the data are suitable for factor analysis. Using SPSS with principal component extraction, factors are extracted from the correlation matrix so that as few factors as possible explain as much of the total variance as possible. The factors are then given an orthogonal varimax (variance-maximizing) rotation, which changes the position of the coordinate axes and redistributes the share of variance explained by each factor, making the factor structure simpler and easier to interpret. Two common factors were retained, with a cumulative variance contribution of 82.974%, indicating that they capture most of the information in the original variables. According to the rotated factor matrix (see Table 3), male and female life expectancy load heavily on the first common factor, while total population loads heavily on the second. Accordingly, the first factor is named the life-expectancy health factor and the second the population-size factor. The population health index of each Asian country is then constructed, using the proportion of variance explained by each common factor as its weight: the first common factor explains 65.457% of the variance and the second 17.517%.
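The steps described above (KMO check, principal-component extraction, varimax rotation and factor scores) can be sketched in NumPy. This is an illustrative re-implementation, not the SPSS procedure used in the paper, under these assumptions: the KMO statistic is $\sum_{i\ne j} r_{ij}^2 / (\sum_{i\ne j} r_{ij}^2 + \sum_{i\ne j} a_{ij}^2)$ with partial correlations $a_{ij}$ taken from the inverse correlation matrix; loadings are eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues; varimax is applied after Kaiser (row) normalization; and factor scores use the regression method. Signs and factor order may differ from SPSS output. The data are synthetic: four variables driven by one latent factor and two by another.

```python
import numpy as np

def kmo(X):
    """Kaiser-Meyer-Olkin measure of sampling adequacy for the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)                 # partial correlations a_ij
    off = ~np.eye(R.shape[0], dtype=bool)         # mask selecting i != j
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)

def varimax(L, max_iter=100, tol=1e-6):
    """Varimax rotation with Kaiser normalization; returns the rotated loadings."""
    h = np.sqrt(np.sum(L ** 2, axis=1, keepdims=True))   # sqrt of communalities
    Ln = L / h
    p, k = Ln.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        LR = Ln @ T
        u, s, vt = np.linalg.svd(
            Ln.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p))
        T = u @ vt                                # orthogonal rotation matrix
        d_old, d = d, np.sum(s)
        if d_old != 0 and d < d_old * (1 + tol):
            break
    return L @ T    # equals (Ln @ T) * h, since row scaling commutes with T

def pc_factor_analysis(X, n_factors):
    """Principal-component extraction, varimax rotation, regression-method scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    L = eigvec[:, :n_factors] * np.sqrt(eigval[:n_factors])   # unrotated loadings
    L_rot = varimax(L)
    explained = np.sum(L_rot ** 2, axis=0) / X.shape[1]       # variance share per factor
    scores = Z @ np.linalg.solve(R, L_rot)                     # regression-method scores
    return L_rot, explained, scores

# Synthetic data: columns 0-3 driven by latent f1, columns 4-5 by latent f2
rng = np.random.default_rng(1)
f = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.3, size=(200, 6))
X = np.column_stack([f[:, 0]] * 4 + [f[:, 1]] * 2) + noise
print("KMO:", round(kmo(X), 3))
L_rot, explained, scores = pc_factor_analysis(X, 2)
print(np.round(L_rot, 2))
print("variance shares:", np.round(explained, 3), "cumulative:", round(explained.sum(), 3))
```

Because the rotation is orthogonal, it redistributes variance between the two factors without changing their cumulative share, which is why a cumulative figure such as 82.974% is the same before and after rotation.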
The weighted sum of the two common factor scores is computed for each Asian country and then standardized to give the population health index of Asian countries (see Table 4). The results show that China and India rank first and second in Asia, mainly because of their large total populations. Overall, the index is generally higher for East and Southeast Asian countries, slightly lower for West Asian countries, and generally low for other small countries, which is also related to some extent to the overall national strength of Asian countries.
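In symbols, with the weights stated above, the composite score of country $i$ is $F_i = 0.65457\,F_{1i} + 0.17517\,F_{2i}$, where $F_{1i}$ and $F_{2i}$ are its scores on the first and second common factors. The sketch below computes this weighted sum and then standardizes it. The paper does not specify its standardization, so a z-score is assumed here, and the factor scores are hypothetical. The `normalize` flag (an illustrative parameter, not from the paper) divides the weights by their sum, 82.974%, to show that this choice does not affect the standardized index.

```python
import statistics

# Weights: shares of total variance explained by the two common factors (from the text).
W1, W2 = 0.65457, 0.17517

def health_index(scores, normalize=False):
    """Map {country: (F1, F2)} to a z-standardized composite population health index."""
    w1, w2 = W1, W2
    if normalize:                        # rescale weights to sum to 1 (divide by 0.82974)
        total = W1 + W2
        w1, w2 = W1 / total, W2 / total
    composite = {c: w1 * f1 + w2 * f2 for c, (f1, f2) in scores.items()}
    values = list(composite.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)        # sample standard deviation
    return {c: (v - mu) / sd for c, v in composite.items()}

# Hypothetical factor scores for illustration only (not the paper's data)
example = {"Country A": (0.2, 2.5), "Country B": (1.1, -0.3), "Country C": (-1.3, -0.2)}
index = health_index(example)
ranking = sorted(index, key=index.get, reverse=True)
print(ranking)
```

Since standardization removes any common positive scale factor, raw and normalized weights yield the same index; likewise, replacing the z-score with a min-max rescaling would change the values but not the ranking.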

Summary
This paper collects population data for about 230 countries and regions from the U.S. population information bureau, Eurostat and the CEIC global database. It performs hierarchical cluster analysis of the population characteristics of countries in different regions, carries out regression analysis of their population variables to obtain regression equations for different regions and countries, and finally uses factor analysis to construct a comprehensive population health index for each country.
The results show that the clustering of population characteristics across regions and countries is mainly determined by each country's total population. The variables with a significant influence on a country's total population are the population aged 15-64 and the birth rate; both have a positive effect, and the 15-64 age group contributes more. The population health index is built from a life-expectancy health factor and a population-size factor, meaning that life expectancy and population size are the main determinants of population health.
However, the analysis has some limitations. Clustering only on variables describing population size is one-sided and ignores the influence of population structure. Moreover, the factors that influence population size and health in different countries and regions go far beyond these surface demographic characteristics: economic, policy, cultural and other factors also have a strong influence on population, and they are not considered in this paper. Addressing these issues requires more in-depth study in the future.