Prediction of Rural Tourist number in Guangdong Province Based on Prophet-LSTM in the Context of Blockchain

Guangdong, a province with a large population and economy, has substantial tourism market demand. This paper predicts public preference for "blockchain + rural tourism" development, so that the characteristics of blockchain can be applied to rural tourism in a more targeted way. Using monthly tourist reception data for Guangdong Province from April 2000 to November 2020, the paper establishes a vector autoregressive (VAR) model and a Prophet-LSTM-based prediction model for the number of tourists. The VAR model is used to test empirically the impact of the blockchain background on the number of rural tourists; the combined Prophet-LSTM model is used to forecast the number of rural tourists over the next two years. The results show that blockchain technology has influenced the number of rural tourists in Guangdong Province to a certain extent; in addition, over the next two years the number of travelers will show a fluctuating upward trend, demand will continue to expand, and "blockchain + rural tourism" has market potential.


Introduction
The "Guangdong Provincial Tourism Bureau implementation plan for the three-year campaign for precise poverty alleviation and precise poverty eradication in the province's tourism industry" has been implemented, indicating that, with strong government policy support, rural tourism in Guangdong Province has been optimized through a multidimensional layout of rural attractions. The first blockchain industry support policy in Guangdong Province was introduced in December 2017, and Guangzhou, Shenzhen, Foshan, and Zhuhai have successively formulated industry development goals for the integration of blockchain technology. In addition, Guangdong Province has responded positively to the "14th Five-Year National Informatization Plan" issued by the State Council in December 2021, combining blockchain technology with economic construction and social governance, and blockchain technology has become more and more widely applied in the province. Most academic researchers have also argued qualitatively that the targeted integration of blockchain technology into the rural tourism sector will help to solve the difficulties in the development of rural tourism and improve its economic benefits and service efficiency. However, the relative backwardness of domestic tourism statistics has left little data available, so these conclusions lack quantitative support, and future research on the application of blockchain in rural tourism should pay more attention to quantitative analysis.
Therefore, it is of practical significance to demonstrate how the number of tourists behaves under the influence of blockchain and to establish a model that, from a small amount of historical data, can accurately and effectively predict the number of rural tourists in Guangdong Province. Doing so helps to measure comprehensively and scientifically the impact of blockchain technology on rural tourism development, and provides an empirical basis for formulating appropriate response measures. In this paper, building on empirical evidence of the impact of blockchain on the number of rural tourists, a Prophet-LSTM-based passenger number prediction model is established. Guangdong Province is selected as the research case: the model is fitted to data from April 2000 to November 2020 and used to predict the number of rural tourists in Guangdong Province for the next two years.

Literature Review
The comprehensive competitiveness of Guangdong's regional tourism industry has improved significantly, with the province receiving more than 500 million overnight visitors, total tourism revenue reaching 160 million yuan, and the added value of tourism accounting for 7% of GDP. Rural tourism is considered a key development focus in Guangdong. Prediction models for the number of tourists are mainly based on traditional time series models and machine learning algorithms, and most scholars have found that combined models improve prediction accuracy compared with traditional single models.
In 2016, Li et al. proposed a grid search index method combined with ARMA forecasting to predict tourism demand in the Altai region, and the prediction accuracy improved compared with the traditional time series prediction model. In 2018, Liang et al. established an autoregressive distributed lag model based on a search index to predict tourist demand; the empirical results showed that the two are positively correlated and that the search index can improve the accuracy of tourism forecasting. In 2019, Li established two models, a BP network and ARIMA, to forecast and analyze the number of tourists in Beijing, Shanghai and Shenzhen, compared them, and found that "outliers" with sudden increases in value affect the simulation of the BP network model. In 2020, Hu et al. compared predictions of inbound tourist numbers in Beijing from ARIMA and nonlinear autoregressive (NARX) neural network models, and the study showed that the NARX neural network predicts better than the ARIMA model.
The research results of the above scholars have provided effective methods for predicting tourism demand and important ideas and directions for this paper. Since this paper predicts the trend in the number of tourists against the background of blockchain, it is necessary first to establish the empirical evidence and then to use the combined model to make a more accurate prediction, so as to provide a theoretical basis for the decision planning of relevant departments and enterprises.

VAR-based model of blockchain impact on the number of travelers
The vector autoregressive (VAR) model is the joint form of the autoregressive model. If N variables are linked, building N separate autoregressive models cannot capture the relationships among them, whereas the joint form can. The structure of the VAR model depends on two parameters: the number of variables N and the maximum lag order k. The VAR model containing N variables lagged by k periods is expressed as
$$Y_t = c + \Pi_1 Y_{t-1} + \Pi_2 Y_{t-2} + \cdots + \Pi_k Y_{t-k} + u_t \quad (1)$$
where $Y_t$ is the $N \times 1$ vector of variables at time $t$, $c$ is an $N \times 1$ constant vector, $\Pi_1, \dots, \Pi_k$ are $N \times N$ coefficient matrices, and $u_t$ is the $N \times 1$ error vector.
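To make the lag structure concrete, the following is a minimal sketch of a one-step VAR(k) forecast, assuming the standard form $Y_t = c + \sum_{i=1}^{k} \Pi_i Y_{t-i} + u_t$ with the error term set to zero. The function names and all coefficient values are hypothetical and are not the estimates reported in this paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(a: Matrix, v: Vector) -> Vector:
    """Multiply an N x N matrix by a length-N vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def var_one_step(c: Vector, coefs: List[Matrix], history: List[Vector]) -> Vector:
    """One-step VAR(k) forecast: c + sum_i Pi_i * Y_{t-i}.

    coefs[i - 1] holds Pi_i. history is ordered oldest to newest and
    must contain at least k = len(coefs) observations.
    """
    k = len(coefs)
    if len(history) < k:
        raise ValueError("need at least k past observations")
    forecast = list(c)
    for lag, pi in enumerate(coefs, start=1):
        lagged = history[-lag]  # Y_{t-lag}
        contribution = mat_vec(pi, lagged)
        forecast = [f + x for f, x in zip(forecast, contribution)]
    return forecast


# Hypothetical two-variable VAR(1); the numbers are illustrative only.
example_c = [1.0, 0.0]
example_pi_1 = [[0.5, 0.1],
                [0.0, 0.8]]
example_forecast = var_one_step(example_c, [example_pi_1], [[2.0, 10.0]])
print(example_forecast)
```

Each variable's forecast depends on the lagged values of every variable in the system, which is what N separate autoregressions cannot express.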

Prophet-based passenger number forecasting model
The Prophet model is a decomposable time series forecasting model; its principle and the overall multiplicative model formula are shown in (2).
where $g(t)$ represents the growth trend, i.e., the growth function used to fit the non-periodic variation in the time series; $s(t)$ represents the seasonal trend, i.e., periodic variation such as monthly or annual seasonality; $h(t)$ represents the effect of holidays, i.e., the effect on the predicted values of holidays that occur at non-fixed intervals in the time series; and $\epsilon_t$ is the noise term, which represents the fluctuations not predicted by the model and is assumed to follow a Gaussian distribution.

Time series prediction model growth trend
A linear growth model is used for the growth function $g(t)$, since the data show an overall linear growth trend. The model equation is shown in (3).
$$g(t) = \left(k + a(t)^{\top}\delta\right)t + \left(b + a(t)^{\top}\gamma\right) \quad (3)$$
where $k$ denotes the growth rate, $b$ is the offset, and $a(t)$ is the sample (indicator) vector. The model first defines the points at which the growth rate changes; each of these points has a corresponding slope adjustment value, and all the slope adjustment values form a vector $\delta$, so the growth rate at time $t$ can be expressed as $k + a(t)^{\top}\delta$. When the growth rate is adjusted, the offset $b$ at each point must be adjusted accordingly so that the end of each segment connects to the next; this adjustment is expressed as $a(t)^{\top}\gamma$.
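The piecewise-linear trend can be sketched as follows. This is an illustrative reading of the growth function, assuming the usual Prophet convention that the j-th entry of the indicator vector is 1 once changepoint $s_j$ has passed and that the offset adjustment is $\gamma_j = -s_j\delta_j$, which keeps the segments connected. The function name and the example values are hypothetical.

```python
from typing import Sequence


def piecewise_linear_trend(t: float, k: float, b: float,
                           changepoints: Sequence[float],
                           deltas: Sequence[float]) -> float:
    """g(t) = (k + a(t)^T delta) * t + (b + a(t)^T gamma).

    Assumes gamma_j = -s_j * delta_j so the trend is continuous at
    every changepoint s_j.
    """
    rate = k
    offset = b
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:  # a_j(t) = 1 once changepoint s_j has passed
            rate += delta_j
            offset += -s_j * delta_j
    return rate * t + offset


# Illustrative values: the slope rises from 1 to 2 at t = 10.
for t in (5.0, 10.0, 12.0):
    print(t, piecewise_linear_trend(t, k=1.0, b=0.0,
                                    changepoints=[10.0], deltas=[1.0]))
```

Before the changepoint the trend follows the base rate; after it, the slope changes while the value at the changepoint itself is unchanged.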

Time series forecasting model seasonal trends
Since a time series may contain seasonal trends of several cycle types, a Fourier series can be used to approximate this periodic behavior; the model equation is shown in (4).
$$s(t) = \sum_{n=1}^{N}\left(a_n\cos\frac{2\pi n t}{P} + b_n\sin\frac{2\pi n t}{P}\right) \quad (4)$$
where P denotes a fixed period (e.g., with days as the unit, P is 365.25 for annual data and 30 for monthly data). Fitting seasonality requires estimating 2N parameters, $a_1, b_1, \dots, a_N, b_N$, where N is the number of Fourier terms used in the model. Larger values of N allow more complex seasonal functions to be fitted, but they also increase the risk of overfitting. Following empirical values, N is taken as 10 for the annual cycle and 3 for the weekly cycle.
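As an illustration of why N terms give 2N parameters, the sketch below builds the cosine and sine features of a truncated Fourier series and combines them with coefficients. It assumes the standard Fourier form with period P; the function names and coefficient values are hypothetical.

```python
import math
from typing import List, Sequence


def fourier_features(t: float, period: float, order: int) -> List[float]:
    """Return [cos_1, sin_1, ..., cos_N, sin_N] evaluated at time t."""
    features: List[float] = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.append(math.cos(angle))
        features.append(math.sin(angle))
    return features


def seasonal_component(t: float, period: float,
                       coefficients: Sequence[float]) -> float:
    """s(t) as a weighted sum of Fourier features (2N coefficients)."""
    if len(coefficients) % 2 != 0:
        raise ValueError("expected an even number of coefficients (2N)")
    order = len(coefficients) // 2
    features = fourier_features(t, period, order)
    return sum(c * f for c, f in zip(coefficients, features))


# N = 10 for an annual cycle gives 20 features per time point.
print(len(fourier_features(0.0, 365.25, 10)))

# Hypothetical coefficients for a weekly cycle with N = 3.
weekly_coefficients = [0.5, -0.2, 0.1, 0.3, 0.0, 0.05]
print(seasonal_component(5.0, 7.0, weekly_coefficients))
```

Because every feature repeats with period P, the fitted seasonal component takes the same value at t and t + P.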

Time series forecasting model holiday trends
Holidays and major events can have a large impact on the time series, and these time points are often not cyclical. Because holidays differ in date and in degree of impact, the holiday model treats the impact of different holidays at different points in time as separate models. A time window is also set for each model, since the impact of a holiday extends over a window period (e.g., a few days before and a few days after National Day), and the model sets the impact within the same window period to the same value. If i denotes a holiday and D_i denotes the times included in its window period, then the holiday model h(t) can be expressed by the formula shown in (5).
In this paper, New Year's Day (January 1), May Day (May 1) and National Day (October 1) of each year are defined as holidays.
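The window idea can be sketched as follows, assuming the common indicator form in which each holiday i contributes a fixed effect $\kappa_i$ whenever t falls inside its window $D_i$. The window lengths and the $\kappa$ values are hypothetical, chosen only to show the mechanics, and the function names are not from the paper.

```python
from datetime import date, timedelta
from typing import Dict, Set


def holiday_window(day: date, before: int, after: int) -> Set[date]:
    """All dates from `before` days ahead of the holiday to `after` days past it."""
    return {day + timedelta(days=offset) for offset in range(-before, after + 1)}


def holiday_effect(t: date, windows: Dict[str, Set[date]],
                   kappa: Dict[str, float]) -> float:
    """h(t): sum of kappa_i over every holiday i whose window D_i contains t."""
    return sum(kappa[name] for name, days in windows.items() if t in days)


# The three holidays used in this paper, for one illustrative year.
windows = {
    "new_year": holiday_window(date(2019, 1, 1), before=1, after=1),
    "may_day": holiday_window(date(2019, 5, 1), before=1, after=3),
    "national_day": holiday_window(date(2019, 10, 1), before=1, after=6),
}
# Hypothetical effect sizes, not estimates from the paper.
kappa = {"new_year": 0.05, "may_day": 0.10, "national_day": 0.20}

print(holiday_effect(date(2019, 10, 3), windows, kappa))
print(holiday_effect(date(2019, 7, 15), windows, kappa))
```

Every date inside the same window receives the same effect, and dates outside all windows receive none.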

LSTM based passenger number prediction model
The LSTM model is an improvement on the recurrent neural network (RNN) model. It consists of a main line and three "gates", namely the forget gate, the input gate and the output gate; its structure is shown in Figure 1.

Figure 1. LSTM Framework
In predicting the time series of passenger numbers, $f_t$ is the forget gate control function, which determines how much of the long-term memory $C_{t-1}$ is "forgotten" based on the output state $h_{t-1}$ of the previous period and the input $x_t$ at the current moment, i.e., the forgetting function is
$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \quad (6)$$
where $\sigma$ is the sigmoid function; $W_f$ is the weight matrix corresponding to each layer of the forget gate; $[h_{t-1}, x_t]$ is the long (concatenated) vector; and $b_f$ is the deviation vector of the forget gate.
$i_t$ is the input gate control function, which keeps less relevant data from entering the memory cell by changing the weight matrix, and thereby selects the new passenger number data. $C'_t$ in Figure 1 is the value transformed by the tanh activation function, which represents the information received in the current period, i.e.
$$C'_t = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \quad (7)$$
where $W_c$ is the weight matrix corresponding to each layer and $b_c$ is the deviation vector. Since accepting all the information at once is less efficient, information must be accepted selectively, so the selection ratio function is introduced, i.e.
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \quad (8)$$
where $W_i$ is the weight matrix corresponding to each layer of the input gate and $b_i$ is the deviation vector of the input gate.
$C_t$ is the selected, transformed and stored passenger number data vector for the current period. The data stored in the last period, $C_{t-1}$, and the new data of the current period, $C'_t$, are processed with $f_t$ from the forget gate and $i_t$ from the input gate respectively, thereby storing information over a longer span, namely
$$C_t = f_t \odot C_{t-1} + i_t \odot C'_t \quad (9)$$
$o_t$ is the output gate control function, i.e., the output is selected after the memory update, so the selection ratio function is
$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \quad (10)$$
where $W_o$ is the weight matrix corresponding to each layer of the output gate and $b_o$ is the deviation vector of the output gate. The memory output after selection, i.e., the selection ratio multiplied by the memory in the main line, is
$$h_t = o_t \odot \tanh(C_t) \quad (11)$$
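A single LSTM time step can be sketched in plain Python as below. It assumes the standard gate equations (sigmoid forget, input and output gates, a tanh candidate, and an element-wise memory update); the parameter dictionary keys and all weight values are hypothetical. This is a didactic sketch, not the network trained in this paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(w: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W v + b row by row."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(w, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM step; returns (h_t, C_t)."""
    z = h_prev + x_t  # list concatenation: the long vector [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    c_cand = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    # Memory update: keep part of the old memory, admit part of the new.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    # Output: the output gate scales the squashed memory.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical parameters: one hidden unit, one input feature.
zero_params = {
    "W_f": [[0.0, 0.0]], "b_f": [0.0],
    "W_i": [[0.0, 0.0]], "b_i": [0.0],
    "W_c": [[0.0, 0.0]], "b_c": [0.0],
    "W_o": [[0.0, 0.0]], "b_o": [0.0],
}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], zero_params)
print(h_t, c_t)
```

With all weights at zero every gate equals 0.5 and the candidate is 0, so the step simply halves the previous memory, which makes the roles of the gates easy to check by hand.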

Prophet-LSTM Combined Model Construction
The Prophet model is grounded in statistics, so it is more interpretable and has the prominent advantage of decomposing trend, seasonal and holiday factors. The LSTM, by contrast, is purely a machine learning model: it is less interpretable, relies entirely on data features and is less capable of identifying abrupt changes. Since the two models have different strengths and weaknesses, Prophet and LSTM are integrated by OLS for better fitting and prediction. Suppose the prediction of the Prophet model at moment t is $\hat{y}^{P}_t$ and the prediction of the LSTM model at moment t is $\hat{y}^{L}_t$, $t = 1, 2, \cdots$. The weights $\omega_1$ and $\omega_2$ are obtained by ordinary least squares (OLS) fitting of the model combination, so the integrated Prophet-LSTM combined traveler number prediction model is
$$\hat{y}_t = \omega_1\,\hat{y}^{P}_t + \omega_2\,\hat{y}^{L}_t + c$$
where $c$ is the constant term of the OLS fit.
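The OLS combination step can be sketched as a regression of the observed values on the two model predictions plus a constant. The sketch below solves the normal equations directly in plain Python; the function names and the toy numbers are hypothetical, and including a constant term is an assumption made to match the fitted combined equation reported later in the paper.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: predictions may be collinear")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_combine(prophet_pred: Sequence[float], lstm_pred: Sequence[float],
                actual: Sequence[float]) -> Tuple[float, float, float]:
    """Return (w1, w2, c) minimizing sum (actual - w1*p - w2*l - c)^2."""
    rows = [[p, l, 1.0] for p, l in zip(prophet_pred, lstm_pred)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, actual)) for i in range(3)]
    w1, w2, c = solve_linear(xtx, xty)
    return w1, w2, c


# Toy data generated from known weights so the fit can be checked.
toy_prophet = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_lstm = [2.0, 1.0, 4.0, 3.0, 6.0]
toy_actual = [1.2 * p - 0.3 * l + 5.0 for p, l in zip(toy_prophet, toy_lstm)]
print(ols_combine(toy_prophet, toy_lstm, toy_actual))
```

A negative weight on one model is possible under OLS: it means that model's prediction is used to correct the other rather than being averaged with it.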

Data Collection and Analysis
The data for this study are the overnight visitors received by tourist accommodation units in the province, obtained from the Guangdong Statistical Information Network: the monthly number of overnight visitors from April 2000 to November 2020, 248 observations in total. The data are divided into a training set and a validation set, and the two years from December 2020 to December 2022 are then added as the test set. The training set is a rolling one: the initial training set comprises the 148 monthly observations from April 2000 to July 2012, and the validation set comprises the remaining 100 monthly observations. The first trained model predicts the number of visitors in August 2012; the real value for August 2012 is then added to the training set, which now holds 149 monthly observations, and the model predicts September 2012, and so on. The model is retrained at each update of the training set, and the prediction rolls forward in a continuous cycle. Because the test set is unknown data, once the training set has grown to 248 observations, the value added to the training set at each step is the predicted value of the previous month, replacing the real value that would otherwise have been added.

BCP Business & Management, EDMI 2022, Volume 21 (2022)
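The rolling scheme described in this section can be sketched as two loops: one over the validation period, where each real observation is added to the training set after it has been predicted, and one over the test period, where the model's own prediction is added instead. The `fit_predict` callable is a hypothetical stand-in for retraining any model and predicting one month ahead; the naive last-value model and the placeholder series are used only to make the sketch runnable.

```python
from typing import Callable, List, Sequence

FitPredict = Callable[[List[float]], float]


def rolling_forecast(series: Sequence[float], initial_train_size: int,
                     fit_predict: FitPredict) -> List[float]:
    """Validation phase: retrain on all real data before each target month."""
    predictions: List[float] = []
    for end in range(initial_train_size, len(series)):
        train = list(series[:end])  # real observations only
        predictions.append(fit_predict(train))
    return predictions


def recursive_forecast(series: Sequence[float], horizon: int,
                       fit_predict: FitPredict) -> List[float]:
    """Test phase: future values are unknown, so predictions are fed back."""
    history = list(series)
    predictions: List[float] = []
    for _ in range(horizon):
        next_value = fit_predict(history)
        predictions.append(next_value)
        history.append(next_value)  # prediction replaces the missing real value
    return predictions


def last_value_model(train: List[float]) -> float:
    """Naive placeholder model: predict the most recent value."""
    return train[-1]


# 248 placeholder observations standing in for the monthly series.
placeholder_series = [float(i) for i in range(248)]
validation_preds = rolling_forecast(placeholder_series, 148, last_value_model)
test_preds = recursive_forecast(placeholder_series, 24, last_value_model)
print(len(validation_preds), len(test_preds))
```

With 248 observations and an initial training set of 148, the first loop produces exactly 100 one-step predictions, matching the size of the validation set.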
The data used in the VAR model of blockchain's influence on tourist numbers are the monthly number of tourists from January 2017 to November 2020, after the rise of the blockchain concept, together with the closing price of the Blockchain 50 Index from the National Political Index Network and Baidu search index data. At this stage no data directly reflect the impact of blockchain on rural tourism, so the research uses similar data as proxies: the Blockchain 50 Index reflects the rise of the blockchain market as a whole, and the Baidu search index in Guangdong Province is screened for the keywords "rural tourism" and "blockchain tourism"; the search index for "blockchain rural tourism" reflects public interest in blockchain rural tourism. Over this period the average number of travelers is 36,412,930, the average price of the Blockchain 50 Index is 3189.097 yuan, and the average number of searches is 1364.152.
Figure 4 shows the overnight tourist flow received in Guangdong Province from April 2000 to November 2020, downloaded from the Guangdong Provincial Bureau of Statistics. The number of tourists plummeted at certain points in time, but the overall trend is upward. Influenced by the global epidemic, the number of tourists dropped sharply in 2020, when the average number of tourists was 19,483,340. Since the number of passengers, the Blockchain 50 Index and the Baidu search index are on different scales, the line graph is drawn after standardization, and it can be seen intuitively from the graph that the three series are connected at certain moments. To investigate further whether the two index series that proxy for blockchain have an impact on the passenger data, a VAR model is constructed for empirical testing.
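The paper does not state which standardization is used for the line graph; the sketch below assumes the common z-score form, which rescales each series to mean 0 and standard deviation 1 so that series with very different magnitudes can share one axis. The function name and the sample values are hypothetical.

```python
import math
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score standardization: (x - mean) / population standard deviation."""
    n = len(values)
    if n == 0:
        raise ValueError("empty series")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("constant series cannot be standardized")
    return [(v - mean) / std for v in values]


# Illustrative values on very different scales (not the paper's data).
passengers = [30_000_000.0, 36_000_000.0, 42_000_000.0]
index_price = [3000.0, 3200.0, 3400.0]
print(standardize(passengers))
print(standardize(index_price))
```

After standardization the two illustrative series coincide, which shows that only the shape of each series, not its magnitude, survives the transformation.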

ADF test
The VAR model requires each time series to be stationary; non-stationary data can be differenced until they are stationary, and the VAR model is then established on the differenced data. The ADF unit root test is therefore conducted on the three series, and the results are shown in Table 1. Note: People is the monthly number of travelers, Close_price is the monthly average closing price of the Blockchain 50 Index, and Search is the monthly Baidu search index; the three variables followed by _diff are the differenced data.
At the 1% significance level, the three series are non-stationary, so they are tested again after first-order differencing. The differenced series pass the ADF test, so the data are stationary after first-order differencing.
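The differencing step applied before the second ADF test can be sketched as below. Only the differencing is shown; the unit root test itself would normally come from a statistics package (for example `adfuller` in statsmodels), which is not called here. The function name and the sample series are hypothetical.

```python
from typing import List, Sequence


def difference(series: Sequence[float], order: int = 1) -> List[float]:
    """Apply first differences `order` times: y_t - y_{t-1}."""
    if order < 0:
        raise ValueError("order must be non-negative")
    out = list(series)
    for _ in range(order):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# A trending series becomes much flatter after one difference.
trending = [1.0, 3.0, 6.0, 10.0, 15.0]
print(difference(trending))
print(difference(trending, order=2))
```

Each round of differencing shortens the series by one observation, which matters for a short sample such as the January 2017 to November 2020 window.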

VAR model order selection and estimation
The optimal lag order of the VAR model was determined according to the AIC and BIC criteria, and the results are shown in Table 2. From the AIC and BIC results, the optimal lag order of the model is 1. The parameters of the model are then estimated, and the estimation results are shown in Table 3. Table 3 reports only one of the three equations of the VAR model, namely the equation for the number of passengers, because the study is concerned only with whether the number of passengers is affected by the blockchain factor. The stability of the model parameters is examined with the CUSUM test; the p-value is 0.26995, so the null hypothesis is not rejected, the test is passed and the coefficients are stable. From the estimation results, the p-values for all three series pass the test. The coefficients of the Blockchain 50 Index and the Baidu search index are both positive, so it can be concluded that the number of travelers in Guangdong Province is influenced by the blockchain factor.

Impulse response analysis
The impulse response function visually reflects how variables affect one another; the impulse response of the number of passengers to the blockchain factors is shown in Figure 6. From Figure 6, it can be seen that the impulse response function converges after 8 periods, while before that it alternates between positive and negative values. This leads to the conclusion that both the number of travelers itself and the proxy blockchain factors have a certain influence on the number of rural tourists in Guangdong Province. Based on this result, the number of travelers is forecast to explore the future development trend of tourism in Guangdong under the influence of blockchain.
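For a VAR(1) model, the response to a one-unit shock can be traced by applying the coefficient matrix repeatedly to the shock vector. The sketch below shows this simple (non-orthogonalized) impulse response; the coefficient matrix is hypothetical and chosen only to reproduce the qualitative pattern of alternating signs followed by convergence.

```python
from typing import List

Matrix = List[List[float]]


def impulse_responses(pi: Matrix, shock_index: int,
                      horizons: int) -> List[List[float]]:
    """Responses of all variables to a unit shock in one variable.

    Element h of the result is Pi^h applied to the unit shock vector.
    """
    n = len(pi)
    state = [1.0 if i == shock_index else 0.0 for i in range(n)]
    responses = [state]
    for _ in range(horizons):
        state = [sum(pi[i][j] * state[j] for j in range(n)) for i in range(n)]
        responses.append(state)
    return responses


# Hypothetical stable VAR(1) coefficients (not the estimates in Table 3).
example_pi = [[0.5, 0.2],
              [0.0, -0.4]]
for horizon, response in enumerate(impulse_responses(example_pi, 1, 8)):
    print(horizon, [round(r, 4) for r in response])
```

The negative own-lag coefficient makes the shocked variable's response flip sign each period, and because the matrix is stable the responses shrink toward zero as the horizon grows.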

Prophet-LSTM based passenger number prediction model
Time series data contain a large amount of uncertainty, so training and prediction with a single model alone yield high errors and unsatisfactory results. To improve prediction accuracy, the Prophet model and the LSTM model are therefore integrated, with the combination coefficients of the two models solved by OLS, so as to give full play to the advantages of the combined model. Finally, the regression evaluation indexes RMSE, MAE and MAPE are used to compare the models.

The Prophet model is used for a rolling forecast of tourist overnight stays in Guangdong Province, i.e. the number of tourists. The data run from April 2000 to November 2020, where the initial size of the training set is 148 and the initial size of the validation set is 100. Throughout the process, the data already predicted in the validation set are added to the training set. The predicted results are shown in Figure 7, from which it can be concluded that the fit is good.
The fitting effect of the rolling forecast of the number of passengers based on the LSTM model is shown in Figure 8.

Figure 8. LSTM Passenger Number Prediction
The OLS method is used to integrate the Prophet and LSTM models, and the coefficients $\omega_1$ and $\omega_2$ of the combined model are solved to obtain the combined forecasting model
$$\hat{y}_t = 1.124\,\hat{y}^{P}_t - 0.2357\,\hat{y}^{L}_t + 2153000 \quad (14)$$
The fitting effect is shown in Figure 9.

Figure 9. Prophet+LSTM Passenger Number Prediction
To better evaluate the advantages and disadvantages of the three models, the models were evaluated by RMSE, MAE and MAPE. Because the base values are very large, the evaluation data were normalized before presentation, and the evaluation results are shown in Table 4. From Table 4, it is observed that both evaluation indicators of the combined Prophet+LSTM model are smaller than those of the single models, so the combined Prophet+LSTM model is selected for forecasting. The prediction interval is December 2020 to December 2022, and the prediction results are shown in Figure 10. From Figure 10, it can be concluded that the number of travelers shows an upward trend over the next two years, which indicates that blockchain technology has some potential to drive growth in the rural tourism market in the future.
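The three evaluation indexes can be computed as in the sketch below, using their usual definitions: root mean squared error, mean absolute error, and mean absolute percentage error (expressed here in percent). The function names and sample values are hypothetical, and the normalization applied in Table 4 is not reproduced.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Illustrative values only.
sample_actual = [100.0, 200.0]
sample_predicted = [110.0, 180.0]
print(rmse(sample_actual, sample_predicted))
print(mae(sample_actual, sample_predicted))
print(mape(sample_actual, sample_predicted))
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which is why it remains comparable across series with very large base values.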

Research Conclusion
The vector autoregressive (VAR) model provides empirical evidence that, as measured through proxy data, the number of rural tourists in Guangdong Province is influenced to some extent by blockchain technology at this stage.
The combined Prophet+LSTM time series prediction model indicates that, under the influence of blockchain, the number of overnight tourist stays received in Guangdong Province will fluctuate over the next two years but show an overall upward trend.
The prediction results suggest that people are willing to choose the new rural tourism model supported by blockchain technology, that market capacity and demand will continue to expand, and that the future market prospect is large.

Suggestions
From the above analysis, the market volume of "blockchain + rural tourism" in the next two years can be judged, and the following targeted suggestions are made for the different parties involved.
In view of the difficulties in the development of rural tourism in Guangdong Province, the relevant departments should, through the construction of hardware and software facilities, make greater efforts to formulate policy guidance for the development mode of rural tourism after blockchain technology is added, and draw a blueprint for the future development of rural tourism in Guangdong Province that is in line with public opinion. This will help to reach the expected targets for rural tourism in Guangdong Province as soon as possible.
In view of the existing development difficulties of rural tourism, blockchain technology should be applied precisely so that "blockchain + rural tourism" delivers a 1+1>2 benefit, market demand opportunities are fully grasped, and the reform and upgrading of enterprises in the field of rural tourism in Guangdong Province is promoted.
(3) Rural tourism attractions and travel agencies.
Combining the unique characteristics of rural tourism and culture in Guangdong Province, attractions and travel agencies can adapt rural tourism to technological resources, help upgrade the industry, strengthen integration with the government and enterprises, and meet people's aspirations for a better life while improving revenue.