Research on the components of CSI300: Perspective from quantitative finance

This paper takes the constituent stocks of the Shanghai and Shenzhen 300 (CSI 300) index as the research object. Specifically, it uses stock market data, risk data and corporate financial data from March 2020 to March 2022 for quantitative investment research, with the aim of identifying effective factors that explain excess returns and deliver timely, positive investment returns. The empirical process is as follows. First, the factor data obtained from CSMAR are examined by time-series single-factor combination analysis, which yields three effective factors: turnover rate, change ratio (fluctuation) and debt-to-capital ratio. Then, Fama-MacBeth regression is carried out to verify whether the factors have independent explanatory power and whether they interact. The regression results show that the intercept and the coefficients are relatively significant and that these three factors explain stock excess returns well.


Introduction
High speculation and volatility have always been prominent characteristics of the Chinese stock market. [1] China's stock market has been developing for a number of years, and both major institutional investors and individual investors are trying to find effective ways to invest profitably. The factor analysis method of quantitative investment analyzes the various kinds of information reflected by individual stocks in order to obtain an effective investment model that beats the market. According to the logic of stock value, the higher the value of the issuing company, the higher its stock price. The value of the company is related to various factors, such as the asset-liability ratio, cash-to-sales ratio, return on assets and asset turnover rate. The fluctuation of the company's stock price is related to the company's risk factors, such as the β value, the correlation coefficient of risk factors, non-systematic risk and systematic risk. It may also be related to trading information from the stock market, such as the price-earnings ratio, price-to-book ratio, price-to-sales ratio, turnover rate, float market capitalization and so on. The data used in previous studies are mostly 5 to 7 years old. The objective of this study is to build a multi-factor stock selection model by univariate analysis and regression analysis of the factors that affect company stock prices. The data are the quarterly stock data of Shanghai and Shenzhen 300 index stocks from June 2020 to March 2022. The aim is a more reasonable and timely multi-factor stock selection model that gives investors an updated tool for stock selection, helps them build sound portfolios in the current stock market, and meets the needs of the Chinese stock market.
Globally, quantitative investment has been developing for more than 50 years. In terms of investment strategy, it covers almost the whole investment process, with a very wide range of research in quantitative stock selection, quantitative arbitrage, asset allocation and risk control. In China, many scholars have constructed different types of quantitative stock selection models by drawing on advanced foreign quantitative models. Wang and Liu [2] took the Shanghai and Shenzhen 180 index stocks as research objects. Using a regression method, they selected the factor model quantitatively, obtained a portfolio model with relatively stable positive returns, and set a corresponding quantitative timing strategy to control investment risk. Lin [3] used Python to analyze the financial data of sample stocks and constructed a quantitative portfolio that achieved more than four times the investment income within a decade and was suitable for stock selection within an industry. Wang [4] proposed an eight-factor model index system by analyzing quantitative stock selection models in China and abroad and building on the six-factor model, and used random forests to predict the rise and fall of stock prices in 2013 more accurately. Taking the Shanghai and Shenzhen 300 as the stock pool, Lv [5] constructed a quantitative stock selection model that continuously beat the market, and introduced the vector algorithm to analyze long-term dominant stocks.
The objective of this study is to find more effective influencing factors in China's stock market over the past two years, using the constituent stocks of China's CSI 300 index. First, the research ideas and research methods of this paper are stated. After the data are processed, the single-factor combination analysis is carried out in the next part. Then, the Fama-MacBeth regression is carried out to eliminate the influence between factors and obtain the effective factors. The last part is the summary and reflection.

Research route
The research idea of this paper is that stock volatility depends on the market information, risk factors and financial information reflected by the stocks themselves. From the perspective of market information, the information reflected by the stock market largely reflects investors' attitude towards the stock. Risk information such as the risk factor (β), the market correlation coefficient and stock liquidity also reflects the future trend of stock prices and the behavior of investors. The financial information that enterprises release to the public also carries potential signals. The information reflected by investors or institutions will also affect the trend of stock prices. The market information, risk information and financial information of enterprises are listed below; the financial factors include the book-to-market ratio and the capital-liability ratio. This paper uses the quarterly data of the Shanghai and Shenzhen 300 index stocks from April 2020 to March 2022, which come from the CSMAR database. Stocks that have been listed for less than 6 months are eliminated, as are stocks that display extreme values of any factor. The financial data come from the enterprise financial statement database of the CSMAR research series. The book-to-market ratio and the asset-liability ratio are calculated from enterprise financial data such as the number of issued shares, the number of tradable and untradable shares, the total debt, the total assets and the total equity.
In the equations for these two ratios, BM represents the book-to-market ratio, DCR represents the corporate capital-liability ratio, and OE represents owner equity. TS represents the number of tradable shares, nTS represents the number of untradable shares, and Cp represents the closing price. Ta represents total assets and Tl represents total liabilities.
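One plausible form of the two ratios, written with the symbols defined above, is sketched here. It assumes that the book-to-market ratio divides owner equity by total market value (all shares, tradable and untradable, priced at the close) and that DCR is total liabilities over total assets. Both forms are assumptions based on the symbol definitions, not confirmed definitions from this paper.

```latex
\begin{align}
BM  &= \frac{OE}{(TS + nTS) \times Cp} \\
DCR &= \frac{Tl}{Ta}
\end{align}
```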

Data processing and research methods
The analysis methods used in this paper are the single-variable grouping test and Fama-MacBeth regression. The sample is divided into nine quarterly periods, each of which carries the stock information for that quarter. Some factors, such as PE, PS, PCF, PB, turnover rate and the liquidity indicators, have no quarterly data. For these, this paper takes a time-series weighted average of the monthly data for the three months preceding each time point. First, descriptive statistics and pre-processing are carried out on the selected factors. The correlation test shows that float market capitalization is highly correlated with the book-to-market ratio, COR and R², and that the risk factor β is highly correlated with non-systematic risk. Therefore, non-systematic risk and float market capitalization are eliminated.
The test method is the single-factor combination test. [8] First, stocks without data are eliminated, and the remaining stocks are divided into k groups according to the sample size. The number of samples selected in this paper is 328. After the stock information is processed, the stocks are sorted by the value of the factor under study. The returns of the 30 stocks with the largest factor values form the first group, and the returns of the 30 stocks with the smallest factor values form the second group. The group averages Yn,t and Y1,t are calculated for each period, and their difference gives YDiff. Whether the resulting time series is significantly different from 0 determines whether the factor has explanatory power.
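The correlation screening step can be illustrated with a minimal sketch that uses only the Python standard library. The factor names, the toy values and the 0.8 cut-off are all hypothetical. The paper does not state which threshold it used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(factors, threshold=0.8):
    """Return (name_a, name_b, corr) for factor pairs with |corr| >= threshold.

    `factors` maps a factor name to its list of values across stocks.
    The 0.8 cut-off is an illustrative assumption, not a value from the paper.
    """
    pairs = []
    for a, b in combinations(sorted(factors), 2):
        rho = pearson(factors[a], factors[b])
        if abs(rho) >= threshold:
            pairs.append((a, b, rho))
    return pairs


# Toy data: "float_cap" moves almost one-for-one against "bm"; "turnover" does not.
toy = {
    "bm": [0.2, 0.5, 0.9, 1.4, 2.0],
    "float_cap": [9.8, 8.1, 6.2, 3.9, 1.0],
    "turnover": [1.0, 3.0, 2.0, 5.0, 1.5],
}
flagged = highly_correlated_pairs(toy)
```

Each flagged pair is a candidate for dropping one of its two members before the grouping test, which is the role this screening plays in the text.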
The formula is as follows.
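A minimal sketch of this grouping test in Python follows, assuming that YDiff for each quarter is the mean return of the largest-factor group minus the mean return of the smallest-factor group, and that significance is judged with a one-sample t statistic on the resulting series. The function names and the toy data are hypothetical. The paper groups 30 stocks per side, while the toy example uses 2.

```python
from math import sqrt
from statistics import mean, stdev


def long_short_spread(factor, ret, group_size=30):
    """Mean return of the top-`group_size` factor group minus the bottom group.

    `factor` and `ret` map stock id -> value for one quarter; stocks missing
    either value are dropped first.
    """
    common = [s for s in factor if s in ret]
    ranked = sorted(common, key=lambda s: factor[s])
    low, high = ranked[:group_size], ranked[-group_size:]
    return mean(ret[s] for s in high) - mean(ret[s] for s in low)


def one_sample_t(series):
    """t statistic for H0: the mean of `series` equals zero."""
    return mean(series) / (stdev(series) / sqrt(len(series)))


# Toy example: six stocks, one factor, four quarters of returns (in percent).
factor = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
quarterly_returns = [
    {"a": 1.0, "b": 1.5, "c": 2.0, "d": 2.5, "e": 3.0, "f": 4.0},
    {"a": 0.5, "b": 1.0, "c": 1.5, "d": 2.0, "e": 2.5, "f": 3.0},
    {"a": -1.0, "b": 0.0, "c": 0.5, "d": 0.5, "e": 1.0, "f": 1.5},
    {"a": 2.0, "b": 2.0, "c": 3.0, "d": 3.5, "e": 4.0, "f": 4.5},
]
spreads = [long_short_spread(factor, r, group_size=2) for r in quarterly_returns]
t_stat = one_sample_t(spreads)
```

In the toy data the high-factor group earns more every quarter, so the spread series is positive throughout and its t statistic is large. A factor with no explanatory power would give a spread series that hovers around zero.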
The Fama-MacBeth regression is carried out on the factors that have significant explanatory power for stock excess returns. [9] In each period, the next quarter's returns are regressed on the current quarter's factor data, which gives the intercept term μi and the regression coefficient σi. The time series of μi and σi are collected and a one-sample t test is applied to each. Then the excess returns are regressed on all the effective factors jointly, and the μ and σ values are recorded. Finally, the independent explanatory power of each effective factor is determined from the regression results.
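The two-step procedure described above can be sketched in Python with only the standard library. This is an illustrative sketch on simulated data, not the paper's code. The function names, the simulated coefficients and the plain one-sample t test (with no autocorrelation adjustment) are assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev


def ols(X, y):
    """Least-squares coefficients for y = X b, solving the normal equations
    (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fama_macbeth(periods):
    """`periods` is a list of (factors, next_returns) pairs, one per quarter.

    `factors` holds one row of factor values per stock and `next_returns` the
    matching next-quarter excess returns. Returns (mean, t statistic) for the
    intercept (mu) followed by each slope (sigma_1, sigma_2, ...).
    """
    # Step 1: one cross-sectional regression per period.
    betas = [ols([[1.0] + list(row) for row in F], r) for F, r in periods]
    # Step 2: one-sample t test on each coefficient's time series.
    out = []
    for series in zip(*betas):
        se = stdev(series) / sqrt(len(series))
        out.append((mean(series), mean(series) / se))
    return out


# Simulated data: 8 quarters, 50 stocks, two hypothetical factors.
rng = random.Random(0)
periods = []
for _ in range(8):
    F = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    r = [1.0 + 0.5 * f1 - 0.3 * f2 + rng.gauss(0, 0.1) for f1, f2 in F]
    periods.append((F, r))
result = fama_macbeth(periods)
```

Each entry of `result` pairs an average coefficient with its t statistic, which is the form in which the paper reports its regression summary. With only a handful of quarters the standard error of the mean is estimated from very few points, so the t statistics are sensitive to the sample period.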

Single factor combination analysis
In the analysis, PS, PE, PCF and PB are first tested by single-factor grouping, and the return is the sum of the three monthly excess returns. PE and PB have a small number of missing values in some samples. PCF has a large number of missing values, leaving only 200 samples, so its group size is adjusted to 20. Group1 represents the average of the group with the smallest ranking result, and Group10 represents the average of the group with the largest ranking result. The excess return is multiplied by 100 to make the results easier to read.
$$\text{Return} = \text{Monthly return} \times 100 \tag{7}$$
Here Return is the value used in the analysis and Monthly return is the original data. The analysis results are as follows. The results for these four factors are not ideal: the t value of the first group of PE, PCF and PB is very small, with almost no difference from 0, and the t value of the difference between the largest and smallest groups is not large enough. Therefore, these four factors cannot explain the return.
Next, the turnover rate, change ratio, liquidity index and risk factor β are tested. Since both the change ratio and the risk factor β are taken directly from CSMAR, these data are not adjusted further. The analysis results are as follows. For liquidity and β, the t value of the time series of the largest group minus the smallest group is not significant, and the diff value of liquidity is clearly negative. Therefore, these two factors cannot explain the stock return. For the turnover rate and the change ratio, the t value of the time series of the largest group minus the smallest group is clearly significant, indicating that these two factors have explanatory power for the return.
Finally, the correlation coefficient, R², book-to-market ratio and debt-to-capital ratio are analyzed. The results are as follows. The diff results of the correlation coefficient, R² and the book-to-market ratio are not significantly different from 0, so these factors have weak explanatory power for the return. The t value of the debt-to-capital ratio is large, so it has strong explanatory power for the return. Of the initial factors tested, three therefore explain the return: turnover rate, change ratio and debt-to-capital ratio.

Fama-MacBeth regression
After the initial explanatory factors are obtained, Fama-MacBeth regression is used to test whether their predictive abilities affect one another and to verify the predictive ability of each factor for stock excess returns, controlling for the other cross-sectional information that affects returns. Returns in the next quarter are used as the dependent variable, and turnover, change ratio and DCR are the explanatory variables. Table 5 reports the summary of regression results of each period. The average values of the coefficients are μt, σ1, σ2 and σ3.
$$\text{Return}_{i,t+1} = \mu_{0,t} + \sigma_{1,t}\,\text{Turnover}_{i,t} + \varepsilon_{i,t}$$
$$\text{Return}_{i,t+1} = \mu_{0,t} + \sigma_{2,t}\,\text{ChangeRatio}_{i,t} + \varepsilon_{i,t}$$
$$\text{Return}_{i,t+1} = \mu_{0,t} + \sigma_{3,t}\,\text{DCR}_{i,t} + \varepsilon_{i,t}$$
$$\text{Return}_{i,t+1} = \mu_{0,t} + \sigma_{1,t}\,\text{Turnover}_{i,t} + \sigma_{2,t}\,\text{ChangeRatio}_{i,t} + \sigma_{3,t}\,\text{DCR}_{i,t} + \varepsilon_{i,t}$$
The coefficients σi represent the cross-sectional relationship between the independent variables Turnover, Change Ratio and DCR and the dependent variable, the excess return. When the regression contains more than one independent variable, statistical significance of a coefficient indicates that the variable is still related to the dependent variable after the effect of the other independent variables is controlled. Usually, the t statistic of the coefficient series needs to be significant at the 5% level, that is, the p value must be less than 0.05.
In the regression with all three factors, the average intercept (μ0) is 1.33 with a t statistic of 0.709. The average coefficient σ1 on the turnover rate is 0.372, with a standard deviation of 0.266 and a t statistic of 1.44, which is regarded as a clear result. The average coefficient σ2 on the change ratio is −0.744, with a standard deviation of 1.55 and a t statistic of −1.69. The average coefficient σ3 on the capital-liability ratio is −0.00194, with a standard deviation of 0.0158 and a t statistic of −0.1385. The R² and adjusted R² of the cross-sectional regression are 0.044 and 0.0259, and the average number of observations is 314.5.
The regression results are regarded as significant and can reasonably explain the cross-sectional relationship between the turnover rate, change ratio and capital-liability ratio and the return in the next quarter. The significance of the intercept term also indicates that the predictive ability of the other two variables for excess return remains significant when any one of the three variables is controlled.

Conclusion
This study finds that three factors affecting stocks' excess returns show relatively strong explanatory power from mid-2020 to March 2022. In the Fama-MacBeth regression, the intercept term and the coefficient terms are significant, showing a strong correlation. The change ratio, turnover rate and asset-liability ratio of enterprises are all related to their stock returns, and the relationship is strong at the quarterly horizon. In the period after 2020, excluding ST shares, GEM stocks and financial stocks, these are effective factors in China's A-share market. This finding offers a clear reference for investors in future investment decisions.
The limitations of this study lie in three aspects. First, in terms of data, the financial data obtained are limited. They mostly relate to the size and market value of the enterprise and do not cover more detailed financial factors such as earnings per share, net asset value per share and the acid-test ratio. In addition, the sample covers only about two years, so the time span is short and the t tests in the regression analysis may fail to show significance. Second, bivariate combination analysis is not used in the data analysis, and the Fama-MacBeth regression alone cannot fully explain how the interaction between two variables affects the return. Finally, the effect of the change ratio is positive in the single-factor analysis but its coefficient is negative in the second regression, which may be because the quarterly analysis interval is long and therefore susceptible to the reversal effect. As quantitative investment gains ground around the world, practitioners in the financial industry are paying more and more attention to it, and research on quantitative investment strategies and investment income has gradually become a popular topic among them. [10] Many of the related influencing factors are also worth studying, and I will continue to improve this research in future work.