Gold futures forecasting based on the HAR model from 2019 to 2021

In the international monetary system, gold plays a significant role, so the ability to forecast gold prices is valuable to a wide range of market participants, and improving forecasts of gold futures matters accordingly. This paper studies the forecasting of gold futures volatility. Building on heterogeneous autoregressive (HAR) theory and the Heterogeneous Autoregressive model of Realized Volatility (HAR-RV model), we add gold's daily trading volume and the CBOE Volatility Index (VIX) to create three new models: the Heterogeneous Autoregressive model of Realized Volatility and Trading Volume (HAR-RV-T model), the Heterogeneous Autoregressive model of Realized Volatility and Volatility Index (HAR-RV-VIX model), and the Heterogeneous Autoregressive model of Realized Volatility, Trading Volume, and Volatility Index (HAR-RV-T&VIX model). Improving the ability to forecast the volatility of gold prices helps the futures market perform its functions effectively, including hedging, risk management, and price analysis. The research concludes that adding trading volume and a sentiment indicator yields a more robust HAR model that forecasts better.


Introduction
Gold is an essential metal, and it is widely traded in commodity futures markets. It is extremely popular because of its high acceptability among financial institutions and the public. Since the early 1970s, the volume of gold produced each year has tripled, and the amount of gold bought annually has quadrupled. Nowadays, gold is bought by a diverse set of consumers: it is used in jewelry and technology and is held by investors. Mine production alone cannot satisfy demand, so some gold is also recycled, mainly from jewelry. From 2008 to 2017, mining supplied 67% of gold and recycling supplied 33%. Gold also plays an important role in the international monetary system. The US dollar was linked to gold in 1944 under the Bretton Woods system. However, President Nixon later announced that the US would end on-demand convertibility of the dollar into gold, which caused the collapse of the Bretton Woods system. As a result, the monetary status of gold dropped dramatically. In 2007, the world encountered a financial crisis [1]. Gold prices then soared from around 600 dollars per ounce in 2007 to nearly 1900 dollars per ounce in 2011. Gold is held by a large number of households and institutions, and it is commonly used as a store of value and a means of payment. As Fang et al. mention, investors generally use gold as a tool to hedge inflation because of its diversification benefits [2]. Researchers have verified gold as a typical safe haven during financial crises [3,4]. Forecasting the gold price is therefore practical and useful, and improving the ability to forecast the price volatility of gold is important: it helps the futures market perform its functions, including hedging, risk management, and price analysis. The following paragraphs introduce related research on the volatility forecasting of gold futures.
The heterogeneous autoregressive (HAR) model attracts many researchers due to its high forecasting accuracy [5]. With the widespread availability of high-frequency financial data, Andersen and his co-authors found that realized volatility (RV) based on high-frequency data forecasts better than the popular GARCH and stochastic volatility (SV) models. Corsi [6] proposed a heterogeneous autoregressive model of realized volatility (HAR-RV model) and showed that it is significantly better than the GARCH model and the autoregressive fractionally integrated moving average model with realized volatility (ARFIMA-RV model) at forecasting financial market volatility. In this paper, we use the HAR-RV model for volatility forecasting in the gold futures market.
Model Confidence Set (MCS) test results have shown that the HAR-RV model beats other models across all loss functions. Among 22 high-frequency models, HAR-log(RV) was found to be the best for modeling and forecasting the Chinese stock market after HAR-RV and HAR-log(RV) were compared on future-volatility forecasts with the MCS test [7]. This paper presents four HAR-type models for predicting gold futures market volatility: a benchmark model (HAR-RV) and three modified HAR-type models, HAR-RV-T, HAR-RV-VIX, and HAR-RV-T&VIX, that account for daily effects in gold, price volatility, trading volume, and the CBOE Volatility Index. We analyze 5-minute frequency trade data to estimate gold futures price volatility. We first present the standard HAR-RV model, in both linear and logarithmic form, to see whether the RV carries a significant amount of information. The logarithmic forms of trading volume and the CBOE Volatility Index are then included as independent variables to form the HAR-RV-T, HAR-RV-VIX, and HAR-RV-T&VIX models. The results indicate that trading volume and sentiment indicators affect model performance and the future price of gold. Medium-term (weekly) and long-term (monthly) volatility contain a significant amount of forecasting information, whereas short-term (daily) volatility is informative only for one-day-ahead forecasts.
The remainder of the paper is organized as follows: Section 1 introduces the background of gold and related research; Section 2 describes the sample and data; Section 3 introduces the econometric model and the HAR-RV models; Section 4 presents our regression analysis of the HAR-type models; the last section presents our conclusions.

Data
To assess gold futures price volatility, we use trading data sampled at a 5-minute frequency. Previous research found that sampling too frequently or too infrequently makes the volatility estimate inaccurate, and that the 5-minute sampling frequency gives the best accuracy [6,8-10]. Our gold futures data come from GFIS (Global Financial Information Services) and cover January 1, 2019, to June 4, 2021, totaling 171,899 observations; after computing the RV and removing missing values, we have 743 trading days of gold futures data over that period. In addition, we have VIX data from the Wind Economic Database from January 1, 2019, to July 2, 2021, totaling 500 observations of the CBOE Volatility Index.
As seen in Figure 1, gold prices climb from January 2019 to August 2020, with strong swings from March to April 2020 followed by a recovery and an upward trend until August 2020. From August 2020 to March 2021, gold prices trend downward, and after March 2021 they begin to climb steadily again. Figure 2 shows the five-minute logarithmic returns of gold futures. There is return asymmetry in the gold futures market: upward fluctuations in futures prices are larger than downward fluctuations, implying that positive price shocks lead to greater return volatility. Owing to the significant impact of the coronavirus epidemic, the most dramatic volatility occurred from March to April 2020. This phenomenon is also visible in Figures 1 and 3.
Figure 3 shows a significant structural change in gold futures volatility in March 2020. Major changes in economic conditions, such as policy moves, natural disasters, and oil crises, can cause structural change. In March 2020, the COVID-19 outbreak triggered a dramatic decline in gold futures prices, generating significant volatility. Figure 4 shows the dynamics of the CBOE Volatility Index (VIX) from January 2, 2019, to July 2, 2021. The index rises sharply from the end of February 2020, as the COVID-19 pandemic caused sharp swings in shareholder and investor sentiment. This is consistent with what is shown in Figures 2 and 3.

Econometric Model
This section uses four HAR-type models to forecast gold futures market volatility, starting with the original HAR-type model (HAR-RV model). The other three extended models (HAR-RV-T, HAR-RV-VIX, and HAR-RV-T&VIX) combine price volatility with trading volume and the CBOE Volatility Index.
To assess whether the RV contains substantial information, we first present the classical HAR-RV model, which has both a linear and a logarithmic form. The HAR-RV-T model is then created by adding the logarithmic form of trading volume as an independent variable. The HAR-RV-VIX model is created by adding the logarithmic form of the CBOE Volatility Index as an independent variable. The HAR-RV-T&VIX model is created by adding the logarithmic forms of both trading volume and the CBOE Volatility Index as independent variables. We go through these models in detail in the following subsections.

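HAR-RV model
Following Andersen and Bollerslev's method for calculating realized volatility (RV) [8], consider a trading day t whose trading session is divided into M segments. Let $p_{t,i}$ denote the closing price of the i-th segment of trading day t, for i = 1, 2, 3, ..., M, and let $r_{t,i}$ be the logarithmic rate of return for the i-th period of trading day t, which can be expressed as

$$r_{t,i} = \ln p_{t,i} - \ln p_{t,i-1} \tag{1}$$

The RV of trading day t, $RV_t$, can be expressed as

$$RV_t = \sum_{i=1}^{M} r_{t,i}^{2} \tag{2}$$

According to Corsi's research [6], the weekly RV, $RV_t^{w}$, and the monthly RV, $RV_t^{m}$, can be calculated from $RV_t$ as averages over windows of 6 and 25 trading days, respectively:

$$RV_t^{w} = \frac{1}{6}\sum_{j=0}^{5} RV_{t-j} \tag{3}$$

$$RV_t^{m} = \frac{1}{25}\sum_{j=0}^{24} RV_{t-j} \tag{4}$$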
COMEX gold futures trade from Sunday through Friday, for about 308 trading days each year. As a result, we use 6 trading days per week when computing the weekly RV and 25 trading days per month when computing the monthly RV. The same windows apply when calculating continuous and jump volatility on a weekly and monthly basis.
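The RV definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the daily RV series is synthetic, and the sketch assumes that returns are computed only between consecutive intraday prices (so the overnight return is excluded) and that the weekly and monthly RVs are simple averages over 6 and 25 trading days.

```python
import numpy as np

WEEK, MONTH = 6, 25  # window lengths stated in the text for COMEX gold futures


def realized_volatility(prices):
    """Daily RV: sum of squared log returns between consecutive intraday prices."""
    log_returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    return float(np.sum(log_returns ** 2))


def rolling_mean_rv(daily_rv, window):
    """Average of the current and previous (window - 1) daily RVs.

    Days that do not yet have a full window are left as NaN.
    """
    daily_rv = np.asarray(daily_rv, dtype=float)
    out = np.full(daily_rv.shape, np.nan)
    for t in range(window - 1, len(daily_rv)):
        out[t] = daily_rv[t - window + 1 : t + 1].mean()
    return out


# Synthetic daily RV series for illustration only (not the paper's data).
rng = np.random.default_rng(0)
daily_rv = rng.uniform(0.05, 2.0, size=60)
weekly_rv = rolling_mean_rv(daily_rv, WEEK)
monthly_rv = rolling_mean_rv(daily_rv, MONTH)
```

With real data, `realized_volatility` would be applied to each trading day's 5-minute closing prices to build `daily_rv` before the rolling averages are taken.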
The logarithmic form of the HAR-RV model:
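For reference, a logarithmic HAR-RV regression in the standard Corsi form can be sketched as follows. The coefficient names $\beta_0, \beta_d, \beta_w, \beta_m$, the error term $\varepsilon$, and the one-day-ahead dependent variable are assumptions made here for illustration; for the weekly and monthly horizons reported later, only the left-hand side would change.

```latex
\ln RV_{t+1} = \beta_0 + \beta_d \ln RV_t + \beta_w \ln RV_t^{w} + \beta_m \ln RV_t^{m} + \varepsilon_{t+1}
```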

HAR-RV-T
Building on the HAR-RV model and gold's daily trading volume, our first improved model is the Heterogeneous Autoregressive model of Realized Volatility and Trading Volume (HAR-RV-T model). It adds the logarithmic form of trading volume to the HAR-RV model as an independent variable.
The logarithmic form of the HAR-RV-T model:

HAR-RV-VIX
The second improved model, the Heterogeneous Autoregressive model of Realized Volatility and CBOE Volatility Index (HAR-RV-VIX model), is based on the HAR-RV model and the CBOE Volatility Index. It adds the logarithmic form of the CBOE Volatility Index to the HAR-RV model as an independent variable.
The logarithmic form of the HAR-RV-VIX model:

HAR-RV-T&VIX
The third improved model, the Heterogeneous Autoregressive model of Realized Volatility, Trading Volume, and CBOE Volatility Index (HAR-RV-T&VIX model), is based on the HAR-RV model, gold's trading volume, and the CBOE Volatility Index. It adds the logarithmic forms of trading volume and the volatility index to the HAR-RV model as independent variables. Its logarithmic form is equation (8).
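The three extended specifications described above can be sketched by adding the extra regressors to a standard logarithmic HAR-RV regression. This is an illustrative reading rather than the authors' exact equations: $T_t$ (daily trading volume), $VIX_t$, the coefficient names $\gamma$ and $\delta$, and the one-day-ahead dependent variable are assumptions introduced here.

```latex
\begin{aligned}
\text{HAR-RV-T:} \quad & \ln RV_{t+1} = \beta_0 + \beta_d \ln RV_t + \beta_w \ln RV_t^{w} + \beta_m \ln RV_t^{m} + \gamma \ln T_t + \varepsilon_{t+1} \\
\text{HAR-RV-VIX:} \quad & \ln RV_{t+1} = \beta_0 + \beta_d \ln RV_t + \beta_w \ln RV_t^{w} + \beta_m \ln RV_t^{m} + \delta \ln VIX_t + \varepsilon_{t+1} \\
\text{HAR-RV-T\&VIX:} \quad & \ln RV_{t+1} = \beta_0 + \beta_d \ln RV_t + \beta_w \ln RV_t^{w} + \beta_m \ln RV_t^{m} + \gamma \ln T_t + \delta \ln VIX_t + \varepsilon_{t+1}
\end{aligned}
```

Written this way, the HAR-RV model is the special case $\gamma = \delta = 0$, which is why the adjusted R-squared is a natural yardstick for comparing the four models.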

Regression analysis
In this section, we analyze the estimation results of the models. First, the descriptive statistics in Table 1 characterize the variables by their mean, standard deviation, maximum, and minimum values.
Next, Tables 2 and 3 present the parameter estimates of the HAR-RV-type models, followed by a discussion of the significance of the variables for volatility prediction.
Finally, Table 4 reports the in-sample regression robustness test. To test whether the prediction results remain plausible under different samples, we perform in-sample regressions on two subsamples and calculate the adjusted R-squared to assess the models' prediction accuracy.

Descriptive statistics
According to the descriptive statistics of the main variables (Table 1), Daily RV ranges from 0.05 to 20.72, with large and sharp variations. Weekly RV and Monthly RV have narrower ranges, from 0.097 to 11.289 for Weekly RV and from 0.213 to 6.278 for Monthly RV. The standard deviations also show that the dispersion of Daily RV is considerably higher than that of Weekly RV and Monthly RV; that is, Daily RV fluctuates the most.

Results analysis
The estimation results of the HAR-RV model show that daily volatility is not sufficiently significant, whereas weekly and monthly volatility are both significant. The coefficients on weekly volatility are all significantly positive. For monthly volatility, the coefficient is negative at the 1-day horizon and positive at the 1-week and 1-month horizons. The results reveal that the medium-term (weekly) and long-term (monthly) volatility of the gold futures market contain a large amount of forecasting information on the RV, but short-term (daily) volatility is not significant enough for forecasting gold futures.
T represents the trading volume of gold. The estimation results of the HAR-RV-T model show that short-term (daily) volatility is largely insignificant, while medium-term (weekly) and long-term (monthly) volatility have a strong effect. For daily volatility, only the 1-day horizon is significant, with a negative sign. The coefficients on weekly volatility are all significantly positive. For monthly volatility, the 1-day horizon has a negative effect, while the 1-week and 1-month horizons have positive effects. These results are quite similar to those of the HAR-RV model: medium-term (weekly) and long-term (monthly) volatilities contain a large amount of forecasting information on the RV and T, but short-term (daily) volatility is informative only at the 1-day horizon. For the log trading volume (ln vol), only the 1-week horizon is significantly positive.
We then compare the in-sample fit of the models using the adjusted R-squared: the higher the adjusted R-squared, the better the model fits. From Table 2, we conclude that the HAR-RV-T model improves slightly on the HAR-RV model in forecasting the price volatility of gold futures. In Table 3, the estimation results of the HAR-RV-VIX model reveal that short-term (daily) volatility is not significant enough for forecasting gold futures. The coefficients on weekly volatility are all significantly positive. For long-term (monthly) volatility, the 1-week and 1-month horizons have a positive effect, but the 1-day horizon does not have a significant effect.
We also add trading volume to the VIX model, which gives the HAR-RV-T&VIX model. Its estimates show the same pattern as those of the HAR-RV-VIX model. Short-term (daily) volatility is not significant enough. The coefficients on weekly volatility are all significantly positive. For long-term (monthly) volatility, the 1-week and 1-month horizons have a positive effect, but the 1-day horizon does not have a significant effect.
The effect of trading volume in the HAR-RV-T&VIX model is significantly positive from the short term to the long term. The effect of VIX in the HAR-RV-VIX model is also significantly positive. However, the effect of VIX in the HAR-RV-T&VIX model is significantly positive only in the short term and the long term, and is not significant in the medium term.
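For reference, the adjusted R-squared used in this comparison is conventionally defined as below. The symbols are introduced here for illustration and do not come from the paper: $n$ is the number of observations, $k$ is the number of regressors excluding the intercept, $y_t$ is the dependent variable, $\hat{y}_t$ its fitted value, and $\bar{y}$ its sample mean.

```latex
R^2 = 1 - \frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{n} (y_t - \bar{y})^2},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```

Because the penalty factor grows with $k$, adding a regressor such as log trading volume raises $\bar{R}^2$ only if it improves the fit by more than the loss of one degree of freedom.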

Robustness test for in-sample regression
To examine whether the forecasting results can still be trusted in different samples, we divide the 742 observations into two subsamples. Subsample 1 contains observations 1 to 371, and subsample 2 contains observations 372 to 742. We then perform an in-sample regression on each subsample and calculate the adjusted R-squared to test the models' forecasting accuracy (Table 4).
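The split-sample check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the regression is plain OLS with an intercept, and the data are synthetic stand-ins with 742 rows and three regressors.

```python
import numpy as np


def adj_r2(X, y):
    """Adjusted R-squared of an OLS fit of y on X plus an intercept."""
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least-squares coefficients
    resid = y - Z @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def split_sample_adj_r2(X, y):
    """Adjusted R-squared on the full sample and on each chronological half."""
    half = len(y) // 2  # 742 observations -> rows 1..371 and 372..742
    return {
        "full": adj_r2(X, y),
        "subsample_1": adj_r2(X[:half], y[:half]),
        "subsample_2": adj_r2(X[half:], y[half:]),
    }


# Synthetic stand-in data (NOT the paper's data): 742 rows, three regressors
# playing the role of log daily, weekly, and monthly RV.
rng = np.random.default_rng(1)
X = rng.normal(size=(742, 3))
y = X @ np.array([0.1, 0.4, 0.3]) + rng.normal(scale=0.5, size=742)
scores = split_sample_adj_r2(X, y)
```

Comparing `scores["subsample_1"]` and `scores["subsample_2"]` with `scores["full"]` mirrors the comparison of subsample and full-sample values discussed in this section.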
In the HAR-RV model, the adjusted R-squared values of subsample 1 and subsample 2 differ markedly. In short-term volatility estimation, 0.736 (subsample 1) is greater than 0.677 (full sample), which implies that subsample 1 fits better than the full sample, and 0.484 (subsample 2) is smaller than 0.677, which indicates that subsample 2 fits worse than the full sample. In the medium term, 0.974 is greater than 0.960, and 0.932 is smaller than 0.960, so again subsample 1 is more robust and subsample 2 is less robust. In long-term volatility estimation, 0.997 is greater than 0.994, and 0.993 is smaller than 0.994, with the same implication. From the short term to the long term, the estimates lead to the same conclusion: subsample 1 is more robust than subsample 2.
In the HAR-RV-T model, the adjusted R-squared values of the three samples are almost the same as in the HAR-RV model, so the same conclusion holds: subsample 1 performs better than subsample 2.
In the HAR-RV-VIX model, the short-term, medium-term, and long-term estimates share the same result: the adjusted R-squared values of subsample 1 and subsample 2 are both smaller than that of the full sample. Comparing the two subsamples, subsample 2 performs better than subsample 1 in all three cases.
In the HAR-RV-T&VIX model, the result is the same as for the previous model. In the short term, medium term, and long term, the adjusted R-squared values of subsample 1 and subsample 2 are both smaller than that of the full sample, and subsample 2 performs better than subsample 1 in all three cases.
We conclude that, for the first two models (HAR-RV and HAR-RV-T), subsample 1 performs better than subsample 2 and may even perform better than the full sample. However, as the horizon lengthens, the difference between the full sample and the subsamples becomes markedly smaller, which suggests that longer horizons give more robust results. For the HAR-RV-VIX and HAR-RV-T&VIX models, the full sample is more robust than either subsample, and subsample 2 is more stable than subsample 1.

Conclusion
This paper studies volatility forecasting in the gold futures market using 5-minute high-frequency data. The HAR-RV model is one of the most popular models for forecasting return volatility, and we use it to model the daily, weekly, and monthly volatility of gold futures. To improve forecasting accuracy, we add trading volume, which gives the HAR-RV-T model.
The results indicate that the daily volatility of gold is insignificant, whereas weekly and monthly volatilities are quite significant, so our models forecast more accurately in the medium and long term. In the HAR-RV model, short-term volatility is not significant enough. In the HAR-RV-T model, short-term volatility is not significant at the 1-week and 1-month horizons. In the HAR-RV-VIX and HAR-RV-T&VIX models, short-term volatility is not significant enough, medium-term volatility is significantly positive, and long-term volatility is significant except at the 1-day horizon.
Although this paper tries a new direction in forecasting gold futures, certain flaws still need to be addressed, and some work needs additional investigation. The findings demonstrate that the newly constructed HAR-RV-T model is more robust than the HAR-RV model, but this does not prove that the HAR-RV-T model has the same advantage in forecasting other futures or stocks. The next step will be to use the new model to examine other factors that affect futures volatility, such as irrational investing behavior caused by unexpected calamities that alter investors' emotions.