Comparison Between ARIMA Model and OLS Model Based on the Economic Representation

This paper compares the Autoregressive Integrated Moving Average (ARIMA) model with the Ordinary Least Squares (OLS) model. As two effective ways to deal with time series data, these methods have been widely used in economics, so comparing them has significant research value in finance. Since both the ARIMA model and the OLS model fit historical data and make predictions, it is important to understand their characteristics. This paper first discusses the basic information of the ARIMA and OLS models, including their definitions, modeling processes and a summary of their main properties. It then compares them in three respects: applicable data types, treatment of errors, and validity, and concludes that both models achieve a good fitting effect. Finally, the paper summarizes the practical applicability of the ARIMA and OLS models as a reference for researchers.


Introduction
In recent centuries, as the economy and finance have developed with globalization, more and more people have joined the business market, and related financial knowledge has been updated, developed and gradually popularized. At the same time, in the face of changing market prices and investment directions, people need scientific forecasting methods to avoid most of the risks and secure the maximum return in a complex business world. With the development of modern science and technology, mathematical methods and computer analysis are increasingly applied to financial data such as stock prices, producing various financial time series.
A time series model is a method of observing one or several variables over time, where the data are serially correlated. In economics and finance, time series are widely used in the analysis and forecasting of many kinds of data, such as stocks, virtual currencies, asset prices and volatility, since these are usually presented as time series. Because many economists around the world conduct research on economics and finance, time series are a fundamental part of their work. The approach has a logical aesthetic, and a model usually takes the form of a mathematical method or a set of methods. Time series models are not only simple to express but also interpretable and comprehensible, which provides great convenience for further processing, derivation and application [1]. There are numerous theories and techniques for forecasting time series in use today, including the seasonal adjustment approach and the exponential smoothing method [2]. As two of the most popular models for studying time series, the ARIMA model and Ordinary Least Squares (OLS) focus on short-term forecasting, are gradually being used in real transactions, and are discussed frequently in finance. They capture the basic components of a time series, namely a trend, a seasonal component and an irregular component, all of which are important in the world of economics. ARIMA and OLS models both perform well, so this paper shows the differences between the two models by comparing them in detail, with the intention of giving readers a clear basis for model selection.
The two models discussed in this paper are representative of time series processing, and their research methods and ideas broadly reflect some of the underlying logic of time series analysis, which should be of practical use to readers.

ARIMA Model
ARIMA processes are a type of stochastic process commonly used to analyze time series. Box and Jenkins introduced the ARIMA methodology into the study of time series analysis [3]. In reality, most series are non-stationary and contain trends, seasonality or cyclicality, and it is often difficult to extract effective factors from historical data because of the diversity and uncertainty of their components. The ARIMA model provides a differencing idea that can extract deterministic factors with a good fitting effect. The autocorrelation function and partial autocorrelation function can be used to estimate the stochastic aspect of the time series, from which information such as trend, random variation, periodic components, cyclic patterns and serial correlation can be identified [4]. In general, the ARIMA model carries parameters (p, d, q), where p denotes the order of the autoregressive process, d denotes the order of differencing, and q denotes the order of the moving average process; the model decomposes into three parts, AR(p), I(d) and MA(q) [5].

Autoregression (AR)
The ARIMA model predicts future changes through autoregression, using a linear combination of past values of the variable to calculate the coefficients of interest. On this basis, the autoregressive process of order p is denoted AR(p) and defined by equation (1), X_t = φ_1 X_{t−1} + φ_2 X_{t−2} + ⋯ + φ_p X_{t−p} + ε_t, and for p = 1 the AR(1) process reduces to equation (2), X_t = φ_1 X_{t−1} + ε_t, where each φ_i is a fixed constant corresponding to X_{t−i} and ε_t is a random variable. In a pure autoregression, ε_t is white noise with mean 0 and variance σ², as in a multiple regression.
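The autoregressive idea above can be sketched numerically: simulate an AR(1) series with a known coefficient, then recover that coefficient by a linear least-squares fit of X_t on X_{t−1}. This is a minimal illustration with made-up parameters, not the full ARIMA estimation procedure.

```python
import numpy as np

# Simulate X_t = phi * X_{t-1} + eps_t with a known phi (an assumed value),
# then estimate phi by regressing X_t on its own lag.
rng = np.random.default_rng(0)
phi_true = 0.7
n = 5000

x = np.zeros(n)
eps = rng.normal(0.0, 1.0, n)  # white noise: mean 0, variance sigma^2 = 1
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + eps[t]

# Least-squares estimate of phi from the lagged pairs (X_{t-1}, X_t).
x_lag, x_cur = x[:-1], x[1:]
phi_hat = np.sum(x_lag * x_cur) / np.sum(x_lag ** 2)
print(f"true phi = {phi_true}, estimated phi = {phi_hat:.3f}")
```

With a long enough series, the estimate lands close to the true coefficient, which is the sense in which autoregression "learns" from historical data.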

Integrated (I)
In this step, the data need to be differenced, and the order of differencing is determined according to the actual situation. The ARIMA model is an autoregressive analysis of historical data, so the series being fitted is required to be stationary. The parameter d represents the d-th order difference applied to a non-stationary time series until it is transformed into a stationary one, thus ensuring adequate extraction of the series' historical information. After differencing, further fitting by the ARIMA model can proceed. This is the key step in the ARIMA model, which is essentially an organic combination of the difference transformation and the ARMA model, so that more time series can be fitted and predicted.
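As a small sketch of this step (with an assumed synthetic series): a series with a linear trend is non-stationary, but its first difference fluctuates around a constant, so d = 1 suffices here.

```python
import numpy as np

# Trend-plus-noise series: the mean drifts with t, so it is non-stationary.
rng = np.random.default_rng(1)
t = np.arange(500)
trend_series = 0.5 * t + rng.normal(0.0, 1.0, 500)

# First-order difference (d = 1): removes the linear trend, leaving a
# series whose mean is roughly the constant slope 0.5.
diff1 = np.diff(trend_series)
print(f"mean of differenced series = {diff1.mean():.3f}")
```

A quadratic trend would need d = 2, which is why the order of differencing depends on the actual series.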

Moving Average (MA)
The moving average part, which has parameter q, is defined by equation (3), X_t = ε_t + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q}, where {θ_1, θ_2, ⋯, θ_q} are fixed constants and {ε_t, ε_{t−1}, ⋯, ε_{t−q}} is white noise, i.e. random variables with mean 0 and variance σ². An MA process is invertible because any MA process can be written as an AR process if suitable constraints are imposed on the MA parameters.

ARIMA Modelling Process
(1) Acquisition of data: Data on the target study subjects over time are selected and combined; invalid data are eliminated and missing data are supplemented [6].
(2) Determining the stationarity of the series: An ADF (Augmented Dickey-Fuller) stationarity test is performed on the data; only stationary data can be fitted by the ARMA part of the model.
(3) Differencing the non-stationary series d times: If the data are found to be non-stationary, they are differenced d times until they become stationary.
(4) Fitting the stationary series using AR and MA: The values of the parameters p and q for AR and MA can be identified by inspecting the sample autocorrelation and partial autocorrelation functions. The specific values are then substituted into the equations to determine the final form of the model.
(5) Forecasting with the model: Model forecasting uses the fitted ARIMA model to project future trends over time based on historical time series data. The ARIMA modeling approach is now used in many fields, in large part because of its success in forecasting, especially in the short term [7].
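The five steps above can be sketched end to end by hand for the simplest non-trivial case, ARIMA(1,1,0). This is a hedged NumPy illustration on a synthetic series; a real analysis would use a statistics library's ARIMA routine and a proper ADF test.

```python
import numpy as np

rng = np.random.default_rng(2)

# (1) "Acquire" data: a random walk with drift, which is non-stationary.
eps = rng.normal(0.0, 1.0, 1000)
y = np.cumsum(0.3 + eps)

# (2)-(3) The level series is non-stationary, so take one difference (d = 1).
dy = np.diff(y)

# (4) Fit AR(1) to the (demeaned) differenced series by least squares
# (p = 1, q = 0), as identified for this simple sketch.
z = dy - dy.mean()
phi_hat = np.sum(z[:-1] * z[1:]) / np.sum(z[:-1] ** 2)

# (5) Forecast: iterate the AR(1) one step ahead on the differences,
# then undo the differencing to get a forecast on the original scale.
next_diff = dy.mean() + phi_hat * (dy[-1] - dy.mean())
forecast = y[-1] + next_diff
print(f"phi_hat = {phi_hat:.3f}, one-step forecast = {forecast:.2f}")
```

Since the simulated differences are actually uncorrelated, the fitted AR coefficient comes out near zero and the forecast is dominated by the drift term, which is the expected behavior for this series.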

OLS Model
The principles of the OLS model are more purely statistical: the historical data are regarded as a sample and the combined past and future data as a population, so the sample is used to estimate the population. The OLS model is presented here through the one-dimensional linear regression model, which divides into a population part and a sample part, defined by equations (4) and (5), y = β_0 + β_1 x + u and ŷ = β̂_0 + β̂_1 x, respectively.
According to the conditional mean independence principle, the explanatory variables must contain no information on the mean of the unobserved factors. In the fitting and forecasting stage, the model connects the variables with coefficients in a linear fit, but the values of the coefficients β_1, β_2, ⋯, β_n are not known in advance. Finding suitable values is therefore essential so that the model can work correctly and make predictions.
Ordinary least squares provides a principle that deals with this question effectively: minimize the sum of squared errors. The OLS model squares the errors so that positive and negative errors cannot cancel each other out. By equation (6), min Σᵢ (yᵢ − β̂_0 − β̂_1 xᵢ)², the least value of the sum of squared errors can be found. Minimizing the sum of squared regression residuals yields the estimated values of the parameters β_0 and β_1; this process is called estimation and is defined by equation (7), β̂_1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², β̂_0 = ȳ − β̂_1 x̄.
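The least-squares principle can be illustrated directly with the closed-form estimates for the simple regression (the small data set below is made up for the example):

```python
import numpy as np

# Made-up sample roughly following y = 2x, with small deviations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form OLS estimates for simple regression:
# beta1_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# Sum of squared residuals at the minimizing estimates.
residuals = y - (beta0 + beta1 * x)
sse = np.sum(residuals ** 2)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, SSE = {sse:.4f}")
```

Any other choice of (β_0, β_1) would produce a larger sum of squared residuals than the pair computed here, which is exactly what "least squares" means.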
Fitting data by OLS regression has some algebraic properties: (1) Deviations from the regression line sum to zero, as in equation (8), Σᵢ ûᵢ = 0.
(2) The correlation between the deviations and the regressors is zero, i.e. the residuals are uncorrelated with x, as in equation (9), Σᵢ xᵢ ûᵢ = 0.
(3) The sample averages of y and x lie on the regression line, as in equation (10), ȳ = β̂_0 + β̂_1 x̄.
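These three algebraic properties hold for any OLS fit, so they can be checked numerically on an arbitrary synthetic data set (the data-generating values below are assumptions for the example):

```python
import numpy as np

# Synthetic sample from an assumed linear model y = 1 + 2x + noise.
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 200)

# Closed-form OLS estimates and residuals.
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
u = y - (beta0 + beta1 * x)

print(np.isclose(u.sum(), 0.0))                        # property (8)
print(np.isclose(np.sum(x * u), 0.0))                  # property (9)
print(np.isclose(y.mean(), beta0 + beta1 * x.mean()))  # property (10)
```

All three checks hold up to floating-point error regardless of the particular sample drawn, because they follow from the first-order conditions of the minimization, not from the data.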

Applicable Data Types
There are two forms of time series data: stationary and non-stationary. A stationary time series is one whose properties are unaffected by the time at which it is observed. More precisely, if {X_t} is a stationary time series, then for all s, the distribution of (X_t, …, X_{t+s}) does not depend on t; a series for which this fails is non-stationary. For the OLS model, the core idea is that the expectation and variance of the data as a whole are stable, so the time series data the model handles need to be stationary. The ARIMA model, by contrast, can transform a non-stationary series into a stationary one by differencing the data. Differencing can aid in stabilizing a time series' mean by removing variations in the level of the series, hence eliminating (or reducing) trend and seasonality [8]. Essentially, only linear relationships can be captured, not nonlinear ones. So whether it is the ARIMA model or the OLS model, both are used for modeling linear relationships, while ARIMA can mine the data more deeply and therefore has broader applicability.

Treatment of Errors
In terms of the treatment of errors, the OLS model cleverly uses the assumption that the expectation of the error is zero to achieve a fit to the data, and it keeps adjusting as the endogenous variables increase. When the sample size reaches a certain level, the estimated parameters are essentially equal to the actual ones. The ARIMA model, by contrast, deals with the effect of errors through the moving average method. The moving average method computes averages of the time series over a fixed number of terms to depict the long-term trend in accordance with the series' progressive development. When the values of a time series undergo periodic changes and irregular changes with large ups and downs that obscure the development trend, the moving average approach can reduce the influence of these elements and assess and anticipate the long-term trend of the series.
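The smoothing effect of the moving average method can be seen on a short synthetic example (the window length of 12 terms is an assumption for the illustration):

```python
import numpy as np

# Trend plus irregular noise: the raw series has large ups and downs
# around its long-term trend 0.1 * t.
rng = np.random.default_rng(4)
t = np.arange(240)
series = 0.1 * t + rng.normal(0.0, 2.0, 240)

# 12-term moving average: each point is the mean of a 12-value window.
window = 12
smoothed = np.convolve(series, np.ones(window) / window, mode="valid")

# Deviation from the underlying trend, before and after smoothing
# (the smoothed series is aligned with the centers of its windows).
raw_dev = np.std(series - 0.1 * t)
smooth_dev = np.std(smoothed - 0.1 * (t[window - 1:] - (window - 1) / 2))
print(f"raw deviation from trend:      {raw_dev:.2f}")
print(f"smoothed deviation from trend: {smooth_dev:.2f}")
```

Averaging over the window shrinks the irregular component (roughly by the square root of the window length), so the long-term trend stands out in the smoothed series.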

Validity
OLS belongs to multiple linear regression (MLR), and the validity of the MLR statistical properties (also called the Gauss-Markov theorem) rests on the following standard assumptions: (1) Linearity in parameters: the population model is defined by equation (11), y = β_0 + β_1 x_1 + β_2 x_2 + ⋯ + β_k x_k + μ. This equation is flexible enough to adjust the number of variables and change the relevant parameters according to the actual situation. (2) Random sampling: the OLS model obtains the sample {(xᵢ, yᵢ) : i = 1, 2, …, n} by random sampling; the thematic idea is to predict the overall situation from the sample.
No matter what value the explanatory variables take, the errors have the same variance, because the explanatory variables' values must not contain any information regarding the variability of the unobserved factors. From MLR.1 to MLR.4, the property that the OLS estimator is unbiased is derived. This means that, on the one hand, the unbiasedness of the intercept and the other slope estimates is unaffected by the inclusion of an irrelevant variable in the model. On the other hand, the OLS predictions may become biased if a relevant variable is excluded. In most cases, OLS models are unbiased [9]. The fifth assumption determines that the OLS estimator β̂ is the best linear unbiased estimator (BLUE).
While the individual series values that make up a time series have some degree of uncertainty, the variation of the entire series has a certain regularity and can be roughly characterized by the associated mathematical model. This is the underlying notion behind forecasting using ARIMA models. This mathematical model's examination and study allow for the attainment of optimal prediction in the sense of least variance and a deeper comprehension of the time series' structure and properties. [10].

Conclusion
Financial and economic data are complex and volatile because they are always influenced by many factors, and using them in reality often requires consideration of their endogenous variable relationships, which is the core idea of most time series models. In this study, the reliability of time series models has been discussed based on their performance in data fitting and forecasting: whether a model can clearly reflect the inner laws of financial and economic data and help modelers make sound decisions based on it. This paper chooses the ARIMA model and the OLS model, two representative models, as the objects of study, starting from their fitting essence, combing through the definition and modeling process of both, and summarizing their properties along the way. In order to give a more convincing conclusion, and based on the elements people should consider, this paper carries out a detailed point-by-point comparison of the two models in three parts: data applicability, treatment of errors and validity. In terms of applicable data types, ARIMA models can handle a wider variety of data sets, including both stationary and non-stationary data, while the OLS model can only handle the stationary type. In terms of handling errors, both models can minimize the impact of errors on the model in different ways to prevent fitting errors. In terms of validity, both models essentially capture the linear relationship between the dependent and independent variables and reflect it in the established model to achieve prediction. From the above comparison, it can be concluded that both the ARIMA model and the OLS model are regression models with practical significance. In use, they can be applied flexibly according to the type of data and play a forecasting role in financial and economic scenarios.