S&P 500 Index and Volatility Forecast of Chinese Stock Market

The main purpose of this article is to examine the role of the S&P 500 index in predicting the volatility of China's stock market. Our work is based on the autoregressive (AR) model, and we extend this simple benchmark by adding the volatility of the S&P 500 index. In-sample regression shows that adding this indicator raises the overall goodness of fit, enhances the explanatory ability of the model, and that the added variable is itself significant. In the out-of-sample prediction, the statistical tests show that the extended model has a positive out-of-sample R-square relative to the benchmark model and passes the CW test; the economic tests show that the extended model delivers a positive gain in certainty equivalent return (CER) and Sharpe ratio (SR) relative to the benchmark model. Both sets of out-of-sample results indicate that the newly added S&P 500 index has good predictive power. In addition, we conduct several robustness tests: replacing the dependent variable (the Shanghai Stock Exchange Index) with the CSI 300 index, replacing the expanding window with a rolling window, and extending the single-period forecast to a multi-period forecast. In the multi-period forecast, we find that the S&P 500 index is effective only over a short horizon, for example within 3 months, and has no predictive role when the horizon is extended to 6 months.


Introduction
China's stock market started late. The Shanghai Stock Exchange was not established until November 26, 1990, and the Shenzhen Stock Exchange followed on July 3 of the next year. China's stock market therefore has a history of only about 30 years and remains a young and dynamic market. In the early stage, only a few stocks were listed and traded. Today about 3000 stocks are listed and traded in the whole A-share market. The market is also segmented more finely: the main board, the small and medium-sized enterprise board, the growth enterprise board and the science and technology innovation board meet the listing requirements of different companies and the investment preferences of different investors. These changes reflect the vigorous development and increasing maturity of China's stock market. On March 31, 2018, Morgan Stanley Capital International (MSCI) of the US began to include some Chinese large-cap A-shares in the MSCI Emerging Markets Index, showing that China's stock market is playing a more prominent role in the global stock market.
With the development of China's economy, the improvement of living standards and the growing awareness of investment and financial management, people show great enthusiasm for investing in stocks. At the same time, as financial reform in China deepens, the stock market is becoming more mature. Stocks are not only a reasonable investment product but also account for a large proportion of household financial assets. It is therefore necessary to study the stock market in China. The volatility of the stock market has become central to asset pricing, asset allocation and risk management. However, the stock market fluctuates frequently, and its volatility is especially large in periods of economic recession, such as the COVID-19 pandemic this year and the global financial crisis in 2008. Predicting volatility is therefore particularly important for avoiding risk. At present, there are two main approaches to predicting volatility in the academic literature. The first predicts future volatility from historical data and is called the historical information method. For example, the GARCH model used by Gaotong and Zhuhailong [1] to predict volatility is a historical information method, which is also the approach used in this paper. The second backs volatility out of the option pricing formula, which gives the implied volatility.
In terms of measuring volatility, the rapid development of financial technology has made it possible to use high-frequency data to capture market volatility, and the use of high-frequency data has become a trend in research. In a pioneering paper, Andersen and Bollerslev [2] first introduced the use of intraday high-frequency returns to characterize volatility, a measure known as realized volatility (RV). Realized volatility is simple to calculate. Compared with the ARCH and SV models that dominated before, it requires no additional parameter estimation, and it has gradually become a recognized measure in the academic community. The proposal of realized volatility has greatly promoted research on volatility. On this basis, Corsi [3] proposed the heterogeneous autoregressive (HAR) model to study daily volatility. Other scholars extended this model and proposed various HAR models with jumps, such as the HAR-J and HAR-C-J models. These improvements strengthen the prediction ability to a certain extent. When predicting monthly volatility, most scholars use the autoregressive (AR) model based on realized volatility, and the lag order can be chosen by reference to the AIC information criterion.
Because volatility is closely related to risk management, asset pricing and asset allocation, it has always been a focus of academic research. In addition, Chauvet et al. [4] show that volatility can also predict the business cycle and provide early warning of an upcoming recession. This practical relevance demonstrates the importance of volatility in finance and the need to study it. Many studies have found that, although some variables affect volatility, it is difficult to improve out-of-sample prediction by adding a single variable to the benchmark model. In our research, we introduce the S&P 500 index to predict the volatility of China's stock market and find that the benchmark model is significantly improved both in sample and out of sample, which shows that our work helps to improve the prediction effect. Our research also focuses on the Chinese market, for which studies are relatively scarce compared with the American market. It can therefore help fill this gap and help investors predict the future volatility of China's stock market more accurately, providing a basis for investment decisions and risk avoidance.
The rest of this article is organized as follows. The second section introduces the econometric methods and the regression models used in this paper. The third section covers data preprocessing and descriptive statistics, so that the data better meet the requirements of regression. The fourth section is the core of this paper, including the in-sample analysis, the out-of-sample prediction and the economic application. The fifth section tests the robustness of the results of the fourth section. The last section concludes and summarizes our research results.

Realized Volatility
This paper uses the monthly volatility of the S&P 500 index to predict the monthly volatility of China's Shanghai Stock Exchange Index. The literature has used several measures of volatility that require no parameter estimation. The first is the squared monthly return, used for example by Sadorsky [5] and Kang et al. [6]. The second is the realized kernel (RK) proposed by Barndorff-Nielsen et al. [7], and the third is the realized volatility (RV) proposed by Andersen and Bollerslev [2]. This paper uses the last measure, because realized volatility is widely used in the literature, performs well in applications and reduces noise. We construct the monthly volatility as the sum of the squared daily index returns. For each month, the realized volatility is calculated as follows:

$$RV_t = \sum_{j=1}^{M} r_{t,j}^{2}, \qquad t = 1, 2, \ldots, T \tag{1}$$

Here $t$ denotes the month, $M$ is the number of trading days in month $t$, and $r_{t,j}$ is the return of the index on day $j$ of month $t$.
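As an illustration of formula (1), the following minimal sketch sums squared daily returns within each month. The function name `monthly_realized_volatility`, the month labels and the toy returns are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def monthly_realized_volatility(month_ids, daily_returns):
    """Sum of squared daily returns within each month, as in formula (1)."""
    month_ids = np.asarray(month_ids)
    squared = np.asarray(daily_returns, dtype=float) ** 2
    months = np.unique(month_ids)  # sorted unique month labels
    rv = np.array([squared[month_ids == m].sum() for m in months])
    return months, rv

# Hypothetical toy data: three trading days in one month, two in the next.
months, rv = monthly_realized_volatility(
    ["2020-05", "2020-05", "2020-05", "2020-06", "2020-06"],
    [0.01, -0.02, 0.01, 0.03, -0.01],
)
```

In practice the daily returns would be computed from the daily closing prices before being passed to such a function.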

Prediction Model
When predicting monthly volatility, because volatility is strongly autocorrelated, the literature commonly adopts the autoregressive (AR) model as the benchmark model:

$$RV_{t+1} = \alpha + \sum_{i=1}^{p} \beta_i RV_{t+1-i} + \varepsilon_{t+1} \tag{2}$$

Here $\varepsilon_{t+1}$ is the residual term, assumed to follow a normal distribution with zero mean and constant variance, and $p$ is the lag order. In this paper, the lag order is set to 6 following Yudong Wang [8], who argues that a longer lag order can more fully capture the autocorrelation of stock volatility.
Based on the benchmark model, we add one explanatory variable to obtain the extended model. The newly added explanatory variable is the monthly realized volatility of the S&P 500 index. The model is as follows:

$$RV_{t+1} = \alpha + \sum_{i=1}^{p} \beta_i RV_{t+1-i} + \gamma RV_{SP,t} + \varepsilon_{t+1} \tag{3}$$

Here $RV_{SP,t}$ is the monthly volatility of the S&P 500 index in month $t$. As before, the residual term follows a normal distribution with zero mean and constant variance. Both models are estimated by ordinary least squares (OLS).
In order to evaluate the predictive contribution of the newly added explanatory variable, we divide the whole sample period into an in-sample estimation period and an out-of-sample prediction period. Let the total number of observations be $T$. If the number of in-sample observations is $M$, the number of out-of-sample observations is $T-M$. There are two methods for out-of-sample forecasting: the rolling window and the expanding window. The former is mainly used for forecasting daily volatility, where there is a large amount of data and sudden changes are common, while the latter is used for monthly forecasting, where data are scarcer. This paper therefore uses the expanding window for out-of-sample prediction, defined as follows. The initial estimation sample contains $M$ observations, and the coefficients from the regression on these $M$ observations are used to compute the forecast of volatility in month $M+1$. When predicting the volatility in month $M+2$, the estimation sample grows to $M+1$ observations, and the coefficients from the regression on these data are used for the forecast. By analogy, one observation is added at each step until the last month has been predicted. The first out-of-sample forecast is:

$$\widehat{RV}_{M+1} = \alpha_M + \sum_{i=1}^{p} \beta_{i,M} RV_{M+1-i} + \gamma_M RV_{SP,M}$$

where $\alpha_M$, $\beta_{i,M}$ and $\gamma_M$ are the regression coefficients of formula (3) estimated within the sample. The second out-of-sample forecast is:

$$\widehat{RV}_{M+2} = \alpha_{M+1} + \sum_{i=1}^{p} \beta_{i,M+1} RV_{M+2-i} + \gamma_{M+1} RV_{SP,M+1}$$

where $\alpha_{M+1}$, $\beta_{i,M+1}$ and $\gamma_{M+1}$ are the regression coefficients of formula (3) estimated from the first $M+1$ observations. The forecast for month $M+3$ uses the regression coefficients from the first $M+2$ observations, and so on until the last predicted value $\widehat{RV}_T$ is obtained.
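The expanding-window procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's actual code: the helper names `design_matrix` and `expanding_window_forecasts` are hypothetical, OLS is carried out with `numpy.linalg.lstsq`, and the toy series is simulated without noise so that the forecasts can be checked against the realized values.

```python
import numpy as np

def design_matrix(rv, x, p):
    """Rows [1, rv[t], ..., rv[t-p+1], x[t]] paired with the target rv[t+1]."""
    rv, x = np.asarray(rv, dtype=float), np.asarray(x, dtype=float)
    rows, targets = [], []
    for t in range(p - 1, len(rv) - 1):
        lags = rv[t - p + 1:t + 1][::-1]  # rv[t], rv[t-1], ..., rv[t-p+1]
        rows.append(np.concatenate(([1.0], lags, [x[t]])))
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)

def expanding_window_forecasts(rv, x, p, n_out):
    """One-step-ahead forecasts of the last n_out targets, re-estimated each step."""
    X, y = design_matrix(rv, x, p)
    n = len(y)
    preds = []
    for k in range(n - n_out, n):
        # Fit on every row before k: the estimation sample grows by one each step.
        coef, *_ = np.linalg.lstsq(X[:k], y[:k], rcond=None)
        preds.append(X[k] @ coef)
    return np.array(preds), y[n - n_out:]

# Hypothetical noise-free toy series following the extended model with p = 1.
rng = np.random.default_rng(0)
x = rng.normal(size=60)
rv = np.zeros(60)
for t in range(59):
    rv[t + 1] = 0.5 + 0.3 * rv[t] + 0.2 * x[t]

preds, actual = expanding_window_forecasts(rv, x, p=1, n_out=12)
```

Dropping the last column of the design matrix would give the corresponding benchmark AR forecasts.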

Evaluation of out of sample prediction results
Having obtained the forecasts in the previous step, we need to evaluate their quality in order to measure the explanatory and predictive contribution of the newly added variable. This requires an evaluation standard. Following Campbell and Thompson [9], we use the out-of-sample R-square ($R^2_{OOS}$), which is the percentage reduction in the mean squared prediction error of the extended model ($MSPE_{model}$) relative to that of the benchmark model ($MSPE_{bench}$):

$$R^2_{OOS} = 1 - \frac{MSPE_{model}}{MSPE_{bench}}, \qquad MSPE_i = \frac{1}{T-M} \sum_{t=M+1}^{T} \left( RV_t - \widehat{RV}_{t,i} \right)^2$$

Here $RV_t$ is the true value of volatility in month $t$, and $\widehat{RV}_{t,i}$ is the forecast of volatility in month $t$ from model $i$ (the extended model or the benchmark model). In addition, we use the CW test proposed by Clark and West [10] to test the significance of the out-of-sample R-square. The null hypothesis is that the mean squared prediction error of the benchmark model is less than or equal to that of the extended model, and the alternative hypothesis is that the mean squared prediction error of the benchmark model is greater than that of the extended model. The t-test here is one-tailed, unlike the usual two-tailed test. The CW statistic is calculated from:

$$f_t = \left( RV_t - \widehat{RV}_{t,bench} \right)^2 - \left( RV_t - \widehat{RV}_{t,model} \right)^2 + \left( \widehat{RV}_{t,bench} - \widehat{RV}_{t,model} \right)^2$$

The t statistic of the CW test is obtained by regressing $f_t$, the adjusted difference in squared prediction errors, on a constant term.
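The two evaluation statistics can be sketched in a few lines. The function names and the toy forecasts below are hypothetical, and the one-sided p-value uses a standard normal approximation, which is an assumption of this sketch rather than something stated in the paper.

```python
import math
import numpy as np

def r2_oos(actual, pred_bench, pred_model):
    """Out-of-sample R-square: 1 - MSPE_model / MSPE_bench."""
    actual, pred_bench, pred_model = (
        np.asarray(a, dtype=float) for a in (actual, pred_bench, pred_model)
    )
    mspe_bench = np.mean((actual - pred_bench) ** 2)
    mspe_model = np.mean((actual - pred_model) ** 2)
    return 1.0 - mspe_model / mspe_bench

def clark_west(actual, pred_bench, pred_model):
    """CW t statistic and one-sided p-value (normal approximation, assumed here)."""
    actual, pred_bench, pred_model = (
        np.asarray(a, dtype=float) for a in (actual, pred_bench, pred_model)
    )
    f = ((actual - pred_bench) ** 2
         - (actual - pred_model) ** 2
         + (pred_bench - pred_model) ** 2)
    # Regressing f on a constant gives its mean; the t statistic is mean / standard error.
    t_stat = f.mean() / (f.std(ddof=1) / math.sqrt(len(f)))
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))  # upper-tail probability
    return t_stat, p_value

# Hypothetical toy forecasts, not the paper's data.
actual = [1.0, 2.0, 3.0, 4.0]
bench = [1.5, 2.5, 2.0, 4.5]
model = [1.1, 2.2, 2.9, 4.1]
r2 = r2_oos(actual, bench, model)
t_stat, p_value = clark_west(actual, bench, model)
```

A positive `r2` means the extended model has the lower mean squared prediction error, and a small one-sided `p_value` rejects the null that the benchmark is at least as accurate.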

Data
The data for the Shanghai Composite Index come from the Tonghuashun software: we download the daily closing prices of the index from Tonghuashun and compute daily returns to obtain the realized volatility of each month. The data for the S&P 500 index come from the Yahoo Finance website, and its monthly volatility is calculated in the same way as for the Shanghai Composite Index.

Fig. 1 Volatility of SSEC and S&P 500
It can be seen from the above figure that the Shanghai Stock Exchange Index fluctuated greatly in the first few years, which is related to the development of China's stock market. China's stock market started late, and the Shanghai Stock Exchange did not begin operating until December 1990. The sample period selected in this paper is from July 1991 to June 2020. At the beginning, the Shanghai stock index contained only a small number of constituent stocks, and the immaturity of market investors also led to large fluctuations in stock prices. As the market matured, volatility flattened out. The S&P 500 series shows two periods of particularly large fluctuations in the U.S. stock market: one around 2008, caused mainly by the U.S. subprime mortgage crisis, and one this year, caused by the global COVID-19 pandemic. Because the Shanghai stock index fluctuated so much in its first two years, its later fluctuations look small by comparison, which masks the fact that China's stock market was also in turmoil during these two periods. In addition, China's stock market turned from a bull market to a bear market in 2015-2016, and its volatility increased. Table 1 shows that the Shanghai stock index is more volatile than the S&P 500: the mean is 1.05% for the Shanghai stock index and 0.28% for the S&P 500, and the standard deviation is 3.49% for the Shanghai stock index and 0.6% for the S&P 500. The skewness of both series is particularly large, indicating that both are right-skewed with a fat right tail, and the kurtosis of both series is also large, indicating that both distributions are sharply peaked.
The J-B statistic tests whether a series is normally distributed. Both the Shanghai Stock Exchange Index and the S&P 500 reject the null hypothesis and are not normally distributed. However, after taking natural logarithms of the two series, the J-B statistic decreases substantially and the series are closer to the normal distribution. Since the normal distribution has good statistical properties, the variables in this paper are used in logarithms. The penultimate row reports the Ljung-Box Q statistic, which tests whether the observations are independent random draws. The null hypothesis is rejected, indicating that the observations are not independent and that volatility has strong memory, so the autoregressive model is a suitable description. The last row reports the unit root test, which tests the stationarity of the time series. The results show that all series are stationary, so no further data transformation is required.
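To make the two diagnostic statistics concrete, here is a minimal sketch of the Jarque-Bera statistic and the Ljung-Box Q statistic using their standard textbook formulas. The function names and the simulated lognormal sample are hypothetical; the sample only mimics a right-skewed volatility series and is not the paper's data.

```python
import numpy as np

def jarque_bera(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with sample skewness S and kurtosis K."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2
    return len(x) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def ljung_box_q(x, max_lag):
    """Q = n (n + 2) * sum_k rho_k^2 / (n - k) over lags k = 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.sum(d[k:] * d[:-k]) / denom  # lag-k sample autocorrelation
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# Hypothetical right-skewed sample: taking logs should bring it closer to normal.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=348)
jb_raw = jarque_bera(sample)
jb_log = jarque_bera(np.log(sample))
```

In this simulated example the statistic for the logged series is far smaller than for the raw series, which is the same pattern the paper reports for its volatility series.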

In-sample regression
To evaluate the explanatory effect of the S&P 500 index, we first conduct the in-sample regression. According to Inoue and Kilian [11], in-sample testing is essential when making out-of-sample predictions. If the variable has no explanatory effect within the sample, the out-of-sample prediction is unlikely to be satisfactory. Here, we estimate formulas (2) and (3) from Section II in sample to compare the explanatory effect of the S&P 500. The in-sample regression shows that, without the S&P 500 index, only $\beta_6$ is not significant and the adjusted R-square is 53.85%, indicating that the benchmark model explains the in-sample fluctuations well. After the S&P 500 index is added, the signs of the autoregressive coefficients remain the same as in the benchmark model, and the coefficient on the S&P 500 is significant, indicating that the newly added variable enhances the explanatory power of the model. The regression coefficient of the S&P 500 index is -0.113, which means that when the volatility of the S&P 500 index rises, the volatility of the Shanghai Stock Exchange Index falls; in other words, China's stock market tends to be more stable during turmoil in the US stock market. One possible explanation is that when the U.S. stock market fluctuates greatly, speculators and hot money flow into the U.S. market, so the hot money in the Chinese market decreases. Hot money is mainly short-term investment that moves in and out quickly, and when it floods a market it easily causes stock market fluctuations. Therefore, when hot money flows to the U.S. market, the volatility of China's stock market declines. Finally, the last row reports the change in R-square: after adding the S&P 500 index, the explanatory power increases by 0.8%. As Fuwei Jiang [12] and many other scholars note, the explanatory power gained from adding a variable is typically very low, and an increase of more than 0.5% over the benchmark model already indicates that the variable has a very good explanatory effect.

Out of sample test
Although the S&P 500 index enhances the explanatory power of the benchmark model in the in-sample regression, the out-of-sample test is more important and more practical, and it is central to financial forecasting. Many scholars have pointed out that variables with good explanatory power in sample often fail to perform well out of sample; Goyal and Welch [13], for example, document this phenomenon. Here we use the out-of-sample R-square introduced above to test the prediction effect, that is, whether the mean squared prediction error of the extended model is smaller than that of the benchmark model. We divide the 348 observations into two parts: the first 312 are in sample and the last 36 are out of sample. We use the expanding window to run the regressions and predict the volatility from July 2017 to June 2020. The out-of-sample test results in the above table show that the out-of-sample R-square is positive, at 6.99%, and significant at the 5% level, indicating that the out-of-sample prediction is significantly improved relative to the benchmark model. As mentioned earlier, an improvement of more than 0.5% already indicates a good prediction effect. The t value is only 1.693, but the corresponding p-value is 4.95% because the CW test is one-tailed; under a two-tailed test, the result would be significant only at about the 10% level. The comparison in the above figure also shows that, most of the time, the volatility predicted by the extended model is closer to the true value than that predicted by the benchmark model. In short, judged by out-of-sample performance, the S&P 500 index helps to predict the volatility of China's stock market.
Beyond the out-of-sample statistical test, investors care more about whether the improved model can be used to earn higher returns. Therefore, after the statistical test, an economic test is necessary. We assume that the investor has mean-variance utility and allocates wealth between a risky asset (the stock index) and a risk-free asset (government bonds) to maximize utility, as is common in the literature, for example Guidolin and Na [14] and Rapach et al. [15]. The utility that the investor obtains from this portfolio is:

$$U_t(r_{t+1}) = E_t\left( w_t r_{t+1} + r_{t+1,f} \right) - \frac{\gamma}{2} Var_t\left( w_t r_{t+1} + r_{t+1,f} \right)$$

Here $w_t$ is the investor's weight on the risky asset, $r_{t+1}$ is the excess return, that is, the return on the risky asset in excess of the risk-free return, and $r_{t+1,f}$ is the risk-free return. $\gamma$ is the investor's risk aversion coefficient (distinct from the regression coefficient in formula (3)); the larger $\gamma$ is, the more risk averse the investor. The risk-free return data come from the Rex database.
$E_t(\cdot)$ and $Var_t(\cdot)$ denote the expected return and variance of the portfolio.
By maximizing the utility function, we obtain in advance the optimal weight on the stock index for month $t+1$:

$$w_t^{*} = \frac{1}{\gamma} \cdot \frac{\hat{r}_{t+1}}{\hat{\sigma}_{t+1}^{2}}$$

Here $\hat{r}_{t+1}$ and $\hat{\sigma}_{t+1}^{2}$ are the predicted excess return and volatility for month $t+1$. We use the historical mean to predict the excess return, which is a strong benchmark for predicting stock returns: according to Goyal and Welch [13], it is difficult for a single model to significantly beat the historical mean model out of sample. The predicted excess return is therefore the same for the benchmark model and the extended model, and the difference in optimal weights depends only on the predicted volatility.
The optimal weight also depends on the investor's risk aversion coefficient: the larger the coefficient, the smaller the weight on the stock index. Following the settings in Yudong Wang [8], we set $\gamma$ to 3, which is in line with the situation of most investors. We also restrict the weight to lie between 0 and 1.5 to make it more realistic. A weight above 1 means that the investor borrows from banks or other institutions to buy more of the stock index; a weight above 1.5 would mean that the investor has borrowed too much, which is not consistent with a rational investor. Short selling of the stock index is not allowed, because China has no mature securities lending institutions and markets.
Therefore, the investor's portfolio return in month $t+1$ is:

$$R_{p,t+1} = w_t^{*} r_{t+1} + r_{t+1,f}$$

Here $r_{t+1}$ and $r_{t+1,f}$ are realized values, not forecasts. We use the popular certainty equivalent return (CER) and Sharpe ratio (SR) to evaluate the performance of the portfolio:

$$CER_p = \hat{\mu}_p - \frac{\gamma}{2} \hat{\sigma}_p^{2}, \qquad SR_p = \frac{\hat{\mu}_p}{\hat{\sigma}_p}$$

Here $\hat{\mu}_p$ is the mean return of the portfolio over the whole out-of-sample period, and $\hat{\sigma}_p^{2}$ is the variance of the portfolio return over the same period. We calculate the difference in CER between the extended model and the benchmark model and multiply it by 1200 to annualize it in percent. We also annualize the Sharpe ratio by multiplying the difference by $\sqrt{12}$. The table shows that the extended model brings investors an additional certainty equivalent return of 2.70% per year, and the excess return per unit of risk is also 1.91% higher. To sum up, the extended model beats the benchmark model in the out-of-sample test, both statistically and economically. The S&P 500 index can therefore indeed help predict the volatility of China's stock market.
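The economic evaluation can be sketched as follows, assuming monthly returns in decimal form. The function name `portfolio_metrics` and all input numbers are hypothetical, and the Sharpe ratio is computed from the mean and standard deviation of the portfolio return as the text describes; other definitions based on excess returns are also common.

```python
import numpy as np

def portfolio_metrics(excess_ret, rf, ret_forecast, var_forecast,
                      gamma=3.0, w_min=0.0, w_max=1.5):
    """Mean-variance weights, certainty equivalent return and Sharpe ratio."""
    excess_ret, rf, ret_forecast, var_forecast = (
        np.asarray(a, dtype=float)
        for a in (excess_ret, rf, ret_forecast, var_forecast)
    )
    # Optimal weight, restricted to [0, 1.5]: no short selling, limited leverage.
    w = np.clip(ret_forecast / (gamma * var_forecast), w_min, w_max)
    port_ret = w * excess_ret + rf  # realized portfolio return
    mu, var = port_ret.mean(), port_ret.var(ddof=1)
    cer = mu - 0.5 * gamma * var
    sr = mu / np.sqrt(var)
    return w, cer, sr

# Hypothetical inputs: identical return forecasts, different volatility forecasts.
excess = np.array([0.02, -0.01, 0.03])
rf = np.array([0.002, 0.002, 0.002])
ret_fc = np.array([0.01, 0.01, 0.01])
w_b, cer_b, sr_b = portfolio_metrics(excess, rf, ret_fc, [0.001, 0.01, 0.0001])
w_m, cer_m, sr_m = portfolio_metrics(excess, rf, ret_fc, [0.002, 0.02, 0.004])

cer_gain = 1200 * (cer_m - cer_b)        # annualized, in percent
sr_gain = np.sqrt(12) * (sr_m - sr_b)    # annualized
```

Because the return forecast is shared by both models, any difference between `w_b` and `w_m` comes only from the volatility forecasts.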

Rolling window prediction
In the above evaluation of out-of-sample performance, we used the expanding window to calculate the out-of-sample R-square. Here we change the estimation method and replace the expanding window with a rolling window, in order to test whether the predictive ability of the S&P 500 index for the volatility of China's stock market is robust to the estimation scheme. As the name suggests, the rolling window keeps the number of observations in the estimation sample unchanged. When predicting the volatility of the next period, the oldest observation is dropped and the most recent one is added. For example, the initial estimation sample contains $M$ observations, and the coefficients from the regression on these data are used to compute the forecast of volatility in month $M+1$. When predicting the volatility in month $M+2$, the estimation sample still contains $M$ observations, but the first observation is dropped and the observation for month $M+1$ is added, and the coefficients from the regression on these data are used for the forecast. This continues until the last period. As before, there are 312 in-sample observations, and the volatility in the 36 months from July 2017 to June 2020 is predicted. The above table shows that, under the rolling window, the out-of-sample R-square is still positive, at 5.67%, and the CW test is significant at the 10% level. The S&P 500 index therefore still enhances the prediction ability of the model under the rolling window, and the extended model does not become worse than the benchmark model because of the change in estimation method.
It should be noted that, under the rolling window, the out-of-sample R-square and the CW t statistic are both smaller than under the expanding window, which suggests that the expanding window is the better choice for predicting monthly volatility.
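The only difference between the two schemes is which observations enter each regression. The sketch below makes this explicit with a hypothetical helper, `estimation_window`, that returns 0-based index ranges; it uses the paper's sample split of 312 in-sample observations for the example calls.

```python
def estimation_window(step, m, scheme):
    """Half-open index range (start, stop) of the estimation sample.

    step   -- 0 for the first out-of-sample forecast, 1 for the second, ...
    m      -- number of observations in the initial estimation sample
    scheme -- "expanding" or "rolling"
    """
    stop = m + step
    if scheme == "expanding":
        return 0, stop       # keep every observation seen so far
    if scheme == "rolling":
        return step, stop    # drop the oldest, keep exactly m observations
    raise ValueError(f"unknown scheme: {scheme!r}")

# With 312 in-sample observations, compare the samples used for the last forecast.
last_expanding = estimation_window(35, 312, "expanding")
last_rolling = estimation_window(35, 312, "rolling")
```

Under the expanding scheme the last regression uses 347 observations, while under the rolling scheme it still uses 312.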

Multi period forecast
In the preceding sections, we predicted the volatility of the next period. Here we extend the prediction horizon. The prediction model is:

$$RV_{t+1:t+h} = \alpha + \sum_{i=1}^{p} \beta_i RV_{t+1-i} + \gamma RV_{SP,t} + \varepsilon_{t+1:t+h}$$

Here $RV_{t+1:t+h} = \frac{1}{h} \sum_{k=1}^{h} RV_{t+k}$ is the average volatility over the $h$ months from $t+1$ to $t+h$. We change the single-period forecast into a multi-period forecast to examine whether the S&P 500 index can predict changes in China's stock market over longer horizons, and set the forecast horizon to 3 and 6 months. The above table shows that, for the three-month horizon, the out-of-sample R-square is positive and significant, indicating that the S&P 500 index also plays a role in forecasting three-month volatility. However, when the horizon is extended to 6 months, the result is no longer significant and the out-of-sample R-square is -3.18%, indicating that the benchmark model predicts better than the extended model and that the S&P 500 index fails to predict volatility over 6 months. The likely reason is that information from the US stock market has been fully digested by the Chinese stock market after six months, so it can no longer play a predictive role. This is similar to the idea put forward by Wang et al. [16], who point out that stock market activity responds to crude oil fluctuations only within a certain period of time. The same phenomenon appears here: the response of the Chinese stock market to the U.S. stock market exists in the short term but disappears after a certain period of time.
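The multi-period target is simply a forward-looking average of monthly volatility. A minimal sketch, with the hypothetical function name `forward_average` and toy numbers chosen for easy checking:

```python
import numpy as np

def forward_average(rv, h):
    """Entry t is the mean of rv[t+1], ..., rv[t+h].

    The last h positions have no complete forward window and are dropped,
    so the result has len(rv) - h entries.
    """
    rv = np.asarray(rv, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(rv)))
    # Sum of rv[t+1..t+h] equals csum[t+h+1] - csum[t+1].
    return (csum[h + 1:] - csum[1:len(rv) - h + 1]) / h

# Hypothetical toy series.
target_3m = forward_average([1.0, 2.0, 3.0, 4.0, 5.0], h=3)
target_1m = forward_average([1.0, 2.0, 3.0, 4.0, 5.0], h=1)
```

With `h=1` the target reduces to next month's volatility, which recovers the single-period setting.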

Substituted dependent variable
The purpose of this paper is to study the impact of the S&P 500 index on the volatility of China's stock market. So far, we have taken the Shanghai Composite Index as the research object. Here, we replace the Shanghai Composite Index with the CSI 300 index, so as to better cover China's stock market. Since the CSI 300 index was created later than the Shanghai index and was not issued until January 2005, the sample period runs from January 2005 to June 2020, giving 186 observations in total. To remain consistent with the analysis above, we still predict the volatility in the last three years, so the out-of-sample period is from July 2017 to June 2020. According to the above table, the out-of-sample R-square is 4.09% and is significant at the 10% level, so the extended model performs better than the benchmark model. Thus, the previous conclusion still holds when the CSI 300 index is used as the research object: the S&P 500 index can indeed predict the volatility of China's stock market to a certain extent.

Conclusion
In this article, we use the autoregressive model and an extended version of it to predict the volatility of China's stock market. The extended model adds the monthly volatility of the S&P 500 index to the benchmark model. The in-sample regression results show that, after adding the S&P 500 index, the explanatory ability of the model rises, and the sign and significance level of each original explanatory variable are unchanged. For out-of-sample performance, in the statistical sense, we use the out-of-sample R-square and the CW test to compare the benchmark model and the extended model. According to both, the extended model predicts better than the benchmark model after the S&P 500 index is added. In the economic sense, we use the popular certainty equivalent return (CER) and Sharpe ratio (SR) to compare the two models, and the extended model again performs better than the benchmark model. In addition, we conduct several robustness tests. First, we replace the expanding window with a rolling window. The results show that the S&P 500 index still predicts well under the rolling window, but less well than under the expanding window, which is why the expanding window is generally used to predict monthly volatility. Second, we extend the single-period forecast to multi-period forecasts with horizons of 3 months and 6 months. The results show that the S&P 500 index plays a role in the 3-month forecast but none in the 6-month forecast, which shows that the index can predict the state of China's stock market only over a limited horizon.
Third, we replace the dependent variable, the Shanghai Stock Exchange Index, with the CSI 300 index. The results show that the S&P 500 index still plays a predictive role. Combining these findings, we conclude that the S&P 500 index can predict the volatility of China's stock market.