Financial Time Series Prediction Based on EMD-SVM

The market plays an essential role in the national economy and society, and people pay increasing attention to investment in the capital market. Every valuable investment needs the guidance of scientific theory. With the improvement of China's securities market system, more and more researchers have conducted in-depth research on the development of the stock market. This paper studies stock price prediction, takes the Bank of Ningbo as the research object, and proposes a prediction model that combines the EMD and SVM algorithms. The original sequence is first decomposed by EMD, and three related factors, the Shanghai Composite Index, the Shenzhen Composite Index, and China Merchants Bank, are considered in the prediction model. The SVM model is then used to predict each decomposed sequence separately, and the prediction results are obtained after integrated processing. Regression analysis is then carried out on them, and weights are designed according to the constructed regression model. The final prediction result is obtained after reconstruction. Compared with the basic SVM model, the combined prediction model constructed in this paper has a better prediction effect, and the prediction results are more objective and scientific.


Introduction
Today, the market plays an important role in the national economy and society, and people pay more attention to capital market investments. Every valuable investment needs the guidance of scientific theories, for example to understand the range (box) within which a stock price fluctuates, its reasonable price, and the potential and future of the stock. Many securities analysts and investment advisers are keen to predict stock prices and give investment suggestions such as recommended ratings and target prices. With proper investment guidance, investors can invest with confidence. Stock price forecasting plays an important role in guiding funds, which is conducive to the rational allocation of capital, the sound operation of the capital market, and the reduction of financial risks. Stock price forecasting has therefore become an important research direction with great research value. This paper is written against this background and aims to help meet this demand.
Stock market prediction has a long history both at home and abroad. Researchers continue to introduce new models and parameters, and research on stocks has gradually matured. However, the stock price is affected by many factors, such as the economic situation, political policy, the social environment, and market investment sentiment. Early studies regarded stock data simply as a traditional time series [1][2][3], and the techniques for studying stock price data as a time series are now mature. However, because many factors affect the stock price and are themselves often unstable, these earlier forecasting methods have considerable limitations. With the arrival of the era of big data analysis and intelligence, machine learning has become one of the most popular research directions, and the support vector machine model has achieved remarkable results in practical applications.
Stock price forecasting has long received great attention. Prediction methods can be divided into difference models and causal models. Difference models mainly include the ARIMA model [4][5][6] and the grey GM (1, 1) model [7][8][9], while causal models mainly include neural networks and multiple linear regression [10][11]. ARIMA and GM (1, 1) models can only predict time series that are stable; if the data is unstable, its regularity cannot be captured. Stock data, for example, is hard to predict with these models because it is unstable and often fluctuates under the influence of policies and news.
For causal models, predicting the dependent variable requires predicting the independent variables first, and predicting the independent variables can sometimes be more challenging than predicting the dependent variable itself.
The EMD-SVM algorithm in this paper is suitable for predicting unstable time series and is less affected by noise. These properties make up for the defects of the four algorithms discussed above. In addition, existing algorithms do not fit their prediction results against the original series, which leaves some error in the predictions. The EMD-SVM algorithm in this paper addresses this with a regression-fitting step.
Financial time series data is highly volatile, and high-frequency and low-frequency components are mixed throughout the series. EMD [12][13][14] decomposes the signal according to the time-scale characteristics of the data itself. SVM is a generalized linear classifier that classifies data through supervised learning; its decision boundary is the maximum-margin hyperplane fitted to the learning samples. This paper therefore uses a combined model based on EMD-SVM to predict financial time series [15]. Taking the financial time series of Bank of Ningbo from 2007 to 2022 as the research object, the idea is to decompose the time series by EMD, establish an SVM prediction model for each decomposed component, and then use EMD reconstruction to obtain the predicted values of the original time series.

EMD
EMD (empirical mode decomposition) can decompose any signal into the sum of several IMFs (intrinsic mode functions) and a remainder [11]. A signal or function that meets the following two requirements is defined as an IMF: (1) over the whole series, the number of extreme points (maxima and minima) and the number of zero-crossing points are equal or differ by at most one; (2) at any time, the mean of the upper envelope defined by the local maxima of the signal and the lower envelope defined by the local minima is zero.
The specific steps of EMD decomposition are as follows:
(1) Assume that the signal is $x(t)$ and that the sequence formed by the local mean of its upper and lower envelopes is $m_1(t)$; then
$$h_1(t) = x(t) - m_1(t). \tag{1}$$
For non-linear and non-stationary data, one pass is usually insufficient to form an IMF, and some asymmetric waves may remain. In that case, $h_1(t)$ is regarded as the data to be processed and the above operation is repeated $k$ times:
$$h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t), \tag{2}$$
where $h_{10}(t) = h_1(t)$ and $m_{1k}(t)$ is the envelope mean of $h_{1(k-1)}(t)$. When the IMF conditions are met, the first IMF is obtained and recorded as $c_1(t) = h_{1k}(t)$.
(2) Separate the first IMF from the signal; the remaining signal $r_1(t)$ is
$$r_1(t) = x(t) - c_1(t). \tag{3}$$
(3) Regard $r_1(t)$ as the signal to be decomposed, repeat the steps of equations (1) to (3), and decompose in turn to obtain
$$r_2(t) = r_1(t) - c_2(t),\ \dots,\ r_n(t) = r_{n-1}(t) - c_n(t). \tag{4}$$
The process stops when the information in the remaining $r_n(t)$ has little impact on the research content, or when $r_n(t)$ becomes a monotonic function from which no further IMF can be extracted. At this point, the signal $x(t)$ has been decomposed into $n$ IMFs, that is, the sum of $c_i(t)$ $(i = 1, 2, \dots, n)$ and a remainder $r_n(t)$:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t). \tag{5}$$
Equation (5) shows that the EMD decomposition is complete, a property that follows from the decomposition process itself.
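The sifting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the envelopes are piecewise linear (cubic splines are the usual choice), each IMF is extracted with a fixed number of sifting passes instead of a formal IMF test, and the function names `local_extrema`, `envelope`, `sift`, and `emd` are hypothetical.

```python
import math

def local_extrema(x):
    """Return the indices of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through (i, x[i]) for i in idx.
    Assumption: held constant outside the first and last knot."""
    env, k = [], 0
    for t in range(len(x)):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            while idx[k + 1] < t:
                k += 1
            a, b = idx[k], idx[k + 1]
            w = (t - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    return env

def sift(x, n_sift=10):
    """Extract one IMF candidate by repeatedly removing the envelope mean."""
    h = list(x)
    for _ in range(n_sift):  # assumption: fixed pass count as stopping rule
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [hi - (u + l) / 2 for hi, u, l in zip(h, upper, lower)]
    return h

def emd(x, max_imfs=8):
    """Return (imfs, residue) such that x equals sum(imfs) + residue."""
    imfs, r = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(r)
        if len(maxima) < 2 or len(minima) < 2:  # too few extrema to sift
            break
        c = sift(r)
        imfs.append(c)
        r = [ri - ci for ri, ci in zip(r, c)]  # peel the IMF off the residue
    return imfs, r

# Synthetic test signal: fast wave + slow wave + linear trend.
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50)
          + 0.01 * t for t in range(200)]
imfs, residue = emd(signal)
```

Because every IMF is subtracted from the running residue, adding the IMFs and the final residue back together recovers the input up to rounding error, which is the completeness property stated for the decomposition.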

SVM
SVM (support vector machine) is one of the most widely used machine learning classification algorithms. It handles data sets that are not linearly separable in their original space by mapping them into a higher-dimensional space. The main idea is to find an (n-1)-dimensional hyperplane in the n-dimensional space such that the margin, that is, the distance from the hyperplane to the nearest sample points of each class, is as large as possible. SVM follows the structural risk minimization principle: it minimizes the confidence interval on the premise of controlling empirical risk.
Assume that the training set is $(x_i, y_i)$. After the error bandwidth $\varepsilon$ and the penalty function are set, the margin-maximization problem is transformed into the optimization problem of equation (6), where $w$ and $b$ are the two parameters of the discriminant function, $\xi_i$ represents the slack variable of the sample point $x_i$, and $C$ is the penalty coefficient of the slack variables. The kernel function projects the original feature data into a high-dimensional space. The required separating hyperplane is obtained by solving equation (6), and the corresponding label can then be predicted by inputting the features of the test set.
The EMD-SVM decomposition prediction algorithm in this paper differs from the basic EMD-SVM algorithm, whose flow chart is shown in Figure 1. This paper introduces three influencing factors and applies multiple regression to make the prediction more accurate and reasonable. The flow chart of the improved algorithm is shown in Figure 2.
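As a reference for the optimization problem mentioned above, the discriminant-function parameters, slack variables, and penalty coefficient it describes match the standard soft-margin formulation below. This is a textbook form stated under assumptions: the symbols $w$, $b$, $\xi_i$, $C$ are conventional names, the labels are taken as $y_i \in \{-1, +1\}$, $N$ is the number of training samples, and $\phi$ is the feature map induced by the kernel. The exact expression used by the authors, and the way the error bandwidth enters it, may differ.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad\text{s.t.}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad
\xi_i \ge 0,\qquad i = 1,\dots,N.
```

A larger $C$ penalizes margin violations more heavily, while a smaller $C$ tolerates more violations in exchange for a wider margin.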

Prediction model of Bank of Ningbo based on EMD-SVM
This paper improves on the basic EMD-SVM algorithm by introducing three related factors, the Shanghai Composite Index, the Shenzhen Composite Index, and China Merchants Bank, which improves the accuracy and rigor of the prediction. The steps of the EMD-SVM algorithm in this paper are as follows: (1) decompose the original sequence by EMD; (2) use the Shanghai Composite Index, the Shenzhen Composite Index, and China Merchants Bank in turn to predict each decomposed sequence by SVM; (3) reconstruct the predictions to obtain the prediction sequence under each factor, and apply regression analysis to each of them; (4) finally, according to the relationship between the original series and the three related factors, combine the fitted sequences in proportion to obtain the final prediction series.
The flow chart of the EMD-SVM algorithm in this paper is shown in Figure 2.

Data acquisition and preprocessing
The sample of this model is the logarithmic yield of the daily closing price of Bank of Ningbo from July 20, 2007 (the second day after the listing of Bank of Ningbo) to February 18, 2022. Excluding holidays and trading suspensions, the sample contains 3524 observations. The first 70%, that is, the first 2467 observations, are used to establish the prediction model, which then predicts the next 1056 values. Data source: Netease financial database.
Three factors related to the share price of Bank of Ningbo are added to the SVM prediction model: the Shanghai Composite Index, the Shenzhen Composite Index, and China Merchants Bank. The reasoning behind this choice is as follows. Bank of Ningbo is a constituent stock of the Shenzhen Composite Index, so that index is considered first. Banking is also a large and important sector whose movement is closely related to the trend of the overall market, so the Shanghai Composite Index is selected. Finally, Bank of Ningbo and China Merchants Bank are two major banks with relatively good equity and strong performance in the banking sector; they belong to the same sector and are closely linked, so China Merchants Bank is selected. These three factors are intended to capture the impact of performance and investment sentiment.
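The preprocessing described above can be sketched as follows. The prices here are synthetic stand-ins, not Bank of Ningbo data, and the helper names `log_returns` and `split` are hypothetical. One hedged observation: 3524 closing prices yield 3523 log returns, and 2467 + 1056 = 3523, so the sketch assumes the split is applied to the return series; the text does not state whether the count of 3524 refers to prices or returns.

```python
import math

def log_returns(prices):
    """Log yield r_t = ln(P_t / P_{t-1}) for consecutive closing prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def split(series, n_train):
    """Chronological split: first n_train points for fitting, rest for testing."""
    return series[:n_train], series[n_train:]

# Synthetic closing prices (assumption: placeholder for the real data source).
prices = [10.0 * math.exp(0.0002 * t + 0.05 * math.sin(t / 7.0))
          for t in range(3524)]

returns = log_returns(prices)        # one fewer element than prices
train, test = split(returns, 2467)   # training count quoted in the text
print(len(returns), len(train), len(test))
```

The split is chronological and unshuffled, so every test value lies strictly after the training window.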

Decomposition results based on EMD
The original data is decomposed by EMD into nine IMFs, $\mathrm{imf}_1$ to $\mathrm{imf}_9$, and a residual value $\mathrm{res}$, and their relationship is:
$$x(t) = \sum_{i=1}^{9} \mathrm{imf}_i(t) + \mathrm{res}(t). \tag{7}$$
The decomposition results are shown in Figure 3. Next, SVM prediction is required for each IMF and for res. Here, the Shenzhen Composite Index is taken as an example. Figure 4 shows the EMD-SVM prediction of imf1 to imf9 and res under the Shenzhen Composite Index. After EMD reconstruction, the EMD-SVM prediction under the Shenzhen Composite Index is obtained.
Similarly, after EMD reconstruction, the EMD-SVM prediction results under the Shanghai Composite Index and China Merchants Bank are also obtained. The EMD-SVM prediction results under the three related factors are shown in Figure 5.
After EMD decomposition of the original series, SVM prediction is carried out for each decomposed series using the Shanghai Composite Index, the Shenzhen Composite Index, and China Merchants Bank. After reconstruction, the prediction series under each factor is obtained, but there is still room for improvement in the prediction effect. This paper therefore applies regression analysis to them to obtain the fitted prediction series under the Shanghai Composite Index, the Shenzhen Composite Index, and China Merchants Bank. Finally, according to the relationship between the original series and the three related factors, the final prediction series is formed as a weighted average of the fitted series.
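The reconstruction step amounts to forecasting each component separately and adding the forecasts pointwise. The sketch below illustrates only that structure. As an explicit assumption, a simple persistence rule stands in for the trained SVM, and the names `persistence_forecast` and `predict_and_reconstruct` are hypothetical.

```python
import math

def persistence_forecast(series, n_train):
    """Stand-in one-step-ahead predictor: forecast series[t] by series[t - 1].
    In the paper's method a trained SVM model would take this role."""
    return [series[t - 1] for t in range(n_train, len(series))]

def predict_and_reconstruct(components, n_train, predictor=persistence_forecast):
    """Forecast every component on its own, then sum the forecasts pointwise."""
    forecasts = [predictor(c, n_train) for c in components]
    return [sum(values) for values in zip(*forecasts)]

# Two synthetic components playing the role of an IMF and a residue.
fast = [math.sin(t / 2.0) for t in range(50)]
slow = [0.1 * t for t in range(50)]
prediction = predict_and_reconstruct([fast, slow], n_train=35)
print(len(prediction))
```

Any predictor with the same signature can be passed in through the `predictor` argument, which keeps the decomposition and the forecasting model independent of each other.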
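One way the regression fitting and weighted combination could look is sketched below. The text does not give the regression form or the weighting rule, so both are assumptions here: each prediction series is fitted to the original series by simple least squares, and the weights are taken proportional to the correlation of each fitted series with the original. The names `ols_fit`, `pearson`, and `combine` are hypothetical.

```python
import math

def ols_fit(x, y):
    """Least-squares slope a and intercept b of y ~ a * x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)  # assumes x is not constant
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def combine(predictions, actual):
    """Fit each prediction series to `actual`, then average the fitted series
    with weights proportional to their correlation (assumed weighting rule)."""
    fitted, weights = [], []
    for p in predictions:
        a, b = ols_fit(p, actual)
        f = [a * pi + b for pi in p]
        fitted.append(f)
        weights.append(abs(pearson(f, actual)))
    total = sum(weights)
    weights = [w / total for w in weights]
    combined = [sum(w * f[t] for w, f in zip(weights, fitted))
                for t in range(len(actual))]
    return combined, weights

actual = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = [[2.0, 4.0, 6.0, 8.0, 10.0], [1.0, 3.0, 2.0, 5.0, 4.0]]
combined, weights = combine(preds, actual)
print(weights)
```

In practice the regression coefficients and weights would need to be estimated on data that is not used to evaluate the final forecast; otherwise the reported fit is optimistic.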

Comparison and analysis of prediction results
The prediction results using only SVM are shown in Figure 7, and the prediction effect is not very good. The final prediction result of EMD-SVM is shown in Figure 8. The prediction is relatively accurate and fits the original sequence very well. Next, four error indicators, MAE, RMSE, MAPE, and R2, are used to analyze the error of the prediction results of SVM and EMD-SVM.

RMSE
The RMSE (Root Mean Square Error) formula is shown in (8):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{8}$$
where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples. RMSE is the square root of the mean of the squared errors. Its value range is [0, +∞). Because it depends on the scale of the data, it is used to compare the prediction errors of different models on the same dataset rather than prediction errors across datasets. A lower RMSE is better than a higher one.

MAE
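The MAE (Mean Absolute Error) formula is shown in (9):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{9}$$
with $y_i$ the real values, $\hat{y}_i$ the predicted values, and $n$ the number of samples. Its value range is [0, +∞). When the predicted values are completely consistent with the real values, it equals 0, that is, a perfect model; the larger the error, the larger the value. The smaller the MAE, the better the accuracy of the prediction model.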

MAPE
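The MAPE (Mean Absolute Percentage Error) formula is shown in (10):
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{10}$$
with $y_i$ the real values, $\hat{y}_i$ the predicted values, and $n$ the number of samples. Its value range is [0, +∞). A MAPE of 0% indicates a perfect model, and a MAPE greater than 100% indicates a poor model. The smaller the MAPE value, the better the accuracy of the prediction model.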

R2
The R2 (R-squared) formula is shown in (11):
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{11}$$
where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the real values. The numerator is the sum of the squared differences between the real values and the predicted values, similar to the mean square error (MSE); the denominator is the sum of the squared differences between the real values and their mean, similar to the variance. The quality of the model is judged by the value of R-squared, which usually lies in [0, 1] and becomes negative only when the model predicts worse than the mean of the real values. A value of 0 means the model fits no better than that mean; a value of 1 means the model has no error. Generally speaking, the larger the R-squared, the better the fitting effect of the model. The error results are shown in Table 3. The comparison shows that the error indices of the prediction model constructed in this paper are small.
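The four indicators can be computed directly from their standard definitions, as in the minimal sketch below. The function names and the small example series are illustrative assumptions, not values from the paper.

```python
import math

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error, in percent.
    Undefined when any real value is 0 (division by zero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred),
      mape(y_true, y_pred), r2(y_true, y_pred))
```

Since MAPE divides by the real value, it is unstable for series that pass through or near zero, which is worth keeping in mind when the target is a return series.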
The stock price is affected by many factors, such as performance and investment sentiment, and the financial time series is unstable and irregular. The traditional SVM algorithm cannot effectively take many kinds of influencing factors into account. It has certain limitations and large errors when predicting intermittent and highly volatile financial time series. Figure 8 shows that the EMD-SVM model fits the original sequence closely. The correlation coefficient is as high as 0.997, and the prediction result is good with a high degree of fit. Compared with the traditional SVM algorithm, the EMD-SVM algorithm constructed in this paper can effectively improve the prediction effect.
SPSS software is further used for correlation analysis. The correlation coefficient between the sequence predicted by SVM alone and the original sequence is 0.85. The correlation coefficient between the prediction series under the Shanghai Composite Index and the Bank of Ningbo series is 0.562; after fitting, it increases to 0.995. The correlation coefficient between the prediction series under the Shenzhen Composite Index and the Bank of Ningbo series is 0.608; after fitting, it increases to 0.997. The correlation coefficient between the prediction series under China Merchants Bank and the Bank of Ningbo series is 0.928; after fitting, it increases to 0.987. The correlation coefficient between the final prediction series and the Bank of Ningbo series is as high as 0.997, which is in good agreement with the original series. This verifies the feasibility of applying the EMD-SVM algorithm to stock price forecasting. Compared with the traditional SVM algorithm, the prediction effect of this algorithm is better.

Conclusion
To address the problem of forecasting stock prices in highly volatile financial time series, this paper constructs a combined forecasting model based on EMD-SVM, considers a variety of influencing factors in the model, forecasts the financial time series, and carries out an empirical analysis with SPSS to verify the effectiveness of the model. The main conclusions of this paper are as follows: (1) Financial series often show instability and irregularity. The traditional SVM algorithm has certain limitations in processing such signals, and its predictions of financial series contain large errors. (2) Stock data is a kind of noisy time series. EMD decomposition can effectively reduce data noise while largely preserving information, and it significantly improves the stock price prediction algorithm. (3) The EMD-SVM algorithm is feasible and practical for predicting stock prices, and a specific EMD-SVM model can be constructed according to the actual situation.
This paper constructs a model framework based on the EMD-SVM stock price prediction algorithm, which provides a reference for the wider use of EMD-SVM in stock price prediction. To predict other stocks, or to adapt the model to a new situation, it is only necessary to replace the influencing factors and carry out the corresponding regression test.
As far as investment suggestions are concerned, the share price of Bank of Ningbo shows a combination of band and box trends, and understanding this pattern helps when investing in the stock. From the management perspective, the prediction model applied to financial stocks has strong resistance to noise and greatly improved prediction ability. It is suitable for stocks with strong volatility and can provide more effective suggestions for investors.