Development and application of investment prediction model based on gold and bitcoin

How to predict future trends in asset prices and choose a course of action in advance so as to maximize returns is a central concern of investors. Taking gold and bitcoin as examples, this paper develops a mathematical model that uses only the past stream of daily prices to help traders decide each day whether to buy, hold or sell the assets in their portfolio. The robustness of the model is also analyzed. The study applies the model to US $1,000 held on September 11, 2016, with the goal of maximizing the portfolio's value by September 10, 2021.


Introduction
Nowadays, more and more people trade funds, stocks, bonds and other assets. Two of the most important traded assets are gold and bitcoin [1]. Generally speaking, successful market traders buy and sell volatile assets frequently with the goal of maximizing total return. The way to obtain the maximum benefit is to predict future trends in asset prices and decide on a course of action in advance.
The task of this paper is to develop a mathematical model that uses only the past stream of daily prices to help traders decide each day whether to buy, hold or sell the assets in their portfolio. The trader starts with US $1,000 on September 11, 2016, and the goal is to maximize the portfolio's value on September 10, 2021 [2]. See Figure 1 for the definition of the symbols used in this paper.

Model establishment
A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a category. Common decision tree algorithms are C4.5, ID3 and CART (Figure 2). A random forest selects training samples and features at random and classifies each test sample many times [5]. Applying this algorithm can therefore mitigate the weak generalization ability of a single decision tree and, to a certain extent, compensate for the tendency of decision trees to overfit. The model is built as follows.
1. Starting on day three, feed data into the random forest model every two days, with random_state set to 5. Using the bootstrap method, the data of the preceding two days are taken as samples in time-series order.

Model
2. Build a CART decision tree on the sampled asset prices.
3. Repeat the two steps above. Each iteration generates one decision tree.
4. Each day, when a new price enters the random forest, every tree makes its decision and the best predicted price is obtained by voting. This price is the basis for subsequent asset allocation.
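The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each tree is reduced to a depth-1 CART regression stump, the trees' predictions are averaged (the regression counterpart of voting), the price list is made up, and the helper names `make_windows`, `fit_stump`, `fit_forest` and `predict_forest` are hypothetical. The seed of 5 mirrors the random_state mentioned in step 1.

```python
import random

def make_windows(prices, window=2):
    """Pair each run of `window` consecutive prices with the price that follows."""
    X = [prices[i:i + window] for i in range(len(prices) - window)]
    y = [prices[i + window] for i in range(len(prices) - window)]
    return X, y

def fit_stump(X, y):
    """Fit a depth-1 CART regression tree: the single split with the lowest squared error."""
    mean = lambda values: sum(values) / len(values)
    best = (float("inf"), None, None, mean(y), mean(y))
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]  # (feature index, threshold, left value, right value)

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    if j is None:  # no valid split was found, so predict the sample mean
        return left_value
    return left_value if row[j] <= t else right_value

def fit_forest(X, y, n_trees=25, seed=5):
    """Steps 1 to 3: draw a bootstrap sample, fit one tree on it, repeat."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict_forest(forest, row):
    """Step 4: aggregate the trees' predictions (here by averaging)."""
    preds = [predict_stump(stump, row) for stump in forest]
    return sum(preds) / len(preds)

# Made-up daily prices, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0]
X, y = make_windows(prices, window=2)
forest = fit_forest(X, y)
forecast = predict_forest(forest, prices[-2:])
print(round(forecast, 2))
```

In practice a library implementation with full-depth trees and random feature selection would replace the stumps; the sketch only makes the bootstrap, fit, repeat and aggregate sequence concrete.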

Data integration and implementation of the model
Because gold and bitcoin trade on different schedules (the gold market is closed on holidays), we use the merge function of Python's pandas library to join the prices of the two assets on date (Figures 4 and 5). A sliding-window transformation then converts the time series into regression data. Simply put, it turns a single series into X->Y regression pairs (Tables 1 and 2).
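A minimal sketch of these two steps with pandas is given below. The prices are made up, the column names and the `sliding_window` helper are hypothetical, and carrying the last gold quote forward over closed days is an assumption; the paper does not say how the missing gold prices are treated.

```python
import pandas as pd

# Made-up prices: bitcoin trades every day, gold has no quote on the weekend.
btc = pd.DataFrame({
    "date": pd.to_datetime(["2016-09-09", "2016-09-10", "2016-09-11", "2016-09-12"]),
    "btc": [600.0, 610.0, 605.0, 620.0],
})
gold = pd.DataFrame({
    "date": pd.to_datetime(["2016-09-09", "2016-09-12"]),
    "gold": [1300.0, 1310.0],
})

# Join the two series on date; days without a gold quote become NaN.
merged = pd.merge(btc, gold, on="date", how="left")

# Assumption: reuse the last available gold quote on days the market is closed.
merged["gold"] = merged["gold"].ffill()

def sliding_window(series, window=2):
    """Build X -> Y rows: the previous `window` values predict the current one."""
    frame = pd.DataFrame({f"lag_{k}": series.shift(k) for k in range(window, 0, -1)})
    frame["target"] = series
    return frame.dropna().reset_index(drop=True)

table = sliding_window(merged["btc"], window=2)
print(merged)
print(table)
```

A left join keeps every bitcoin trading day; an outer join would be the alternative if gold-only dates also had to be preserved.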

The genetic algorithm (GA) is a computational model that simulates natural selection and the genetic mechanism of Darwin's theory of biological evolution (Figures 6 and 7). It searches for the optimal solution by simulating the process of natural evolution [3,4]. [...] value of the fee, we will choose to sell both products. If one of the two products rises and the rise exceeds the handling fee, then we will choose to trade.
We then go through the four parts of the genetic algorithm. A Python genetic algorithm package handles our constraints, which we enforce with the penalty method. Here f(BTVT) and f(GBTVT) are our objective functions. Under certain conditions, the solution of the evaluation function is equivalent to the solution of the original problem. The fitness evaluation function is usually constructed in one of two ways, additive or multiplicative. The additive approach simply adds a penalty term to the original objective function, so the evaluation function is \( \mathrm{eval}(x) = f(x) + P(x) \), where x represents the chromosome, f(x) > 0 is the objective function and P(x) is the penalty term. The crossover operation follows.
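The additive penalty construction can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the chromosome is taken to be a vector of portfolio weights (cash, gold, bitcoin), the constraint is that the weights are non-negative and sum to 1, the penalty is negative because the objective is maximized, and the function names and the penalty strength are hypothetical.

```python
def objective(weights, predicted_returns):
    """f(x): expected growth factor of the portfolio for the given weights."""
    return 1.0 + sum(w * r for w, r in zip(weights, predicted_returns))

def penalty(weights, strength=10.0):
    """P(x): zero for a feasible allocation, increasingly negative as constraints are violated."""
    violation = abs(sum(weights) - 1.0) + sum(-w for w in weights if w < 0)
    return -strength * violation

def evaluate(weights, predicted_returns):
    """Additive evaluation function: f(x) plus the penalty term P(x)."""
    return objective(weights, predicted_returns) + penalty(weights)

# Made-up predicted daily returns for (cash, gold, bitcoin).
returns = [0.0, 0.01, 0.03]
feasible = evaluate([0.2, 0.3, 0.5], returns)
infeasible = evaluate([0.0, 0.8, 0.8], returns)  # spends 160% of the budget
print(feasible, infeasible)
```

Because the penalty outweighs any gain from overspending, the infeasible allocation scores lower than the feasible one, which is what steers the search back into the feasible region.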
Based on a review of available data and on experiments, we set the mutation probability to 0.001. With the method fixed, we initialize the population, choosing a sufficiently large population and determining the number and length of the chromosomes and the number of generations. The predicted prices of the two assets for each day are then fed into the model to obtain a daily customized trading strategy. Under these constraints we arrived at reasonable genetic parameters: a maximum of 500 generations, a population size of 100 and a mutation probability of 0.001. Although more generations and a larger population generally give better results, computing limitations led us to choose moderate values.
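A compact genetic algorithm using the parameters quoted above (population 100, 500 generations, mutation probability 0.001) might look like the sketch below. It is an illustration under assumptions, not the package the authors used: the 8-bit encoding of the gold and bitcoin weights, tournament selection, single-point crossover, elitism, the penalty strength and the predicted returns are all choices made here, and the fees of 1% and 2% are the ones quoted in the sensitivity analysis.

```python
import random

BITS = 8                          # bits per weight (illustrative choice)
FEE_GOLD, FEE_BTC = 0.01, 0.02    # transaction fees quoted in the paper

def decode(chrom):
    """Map a list of 2 * BITS bits to (gold weight, bitcoin weight), each in [0, 1]."""
    scale = 2 ** BITS - 1
    gold = int("".join(map(str, chrom[:BITS])), 2) / scale
    btc = int("".join(map(str, chrom[BITS:])), 2) / scale
    return gold, btc

def portfolio_value(gold, btc, r_gold, r_btc):
    """Value of one unit of cash after paying the fees and earning the predicted returns."""
    cash = 1.0 - gold - btc
    return cash + gold * (1 - FEE_GOLD) * (1 + r_gold) + btc * (1 - FEE_BTC) * (1 + r_btc)

def fitness(chrom, r_gold, r_btc):
    """Objective plus an additive penalty for spending more than the budget."""
    gold, btc = decode(chrom)
    overspend = max(0.0, gold + btc - 1.0)
    return portfolio_value(gold, btc, r_gold, r_btc) - 10.0 * overspend

def run_ga(r_gold, r_btc, pop_size=100, generations=500, p_mut=0.001, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(2 * BITS)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(c, r_gold, r_btc), c) for c in pop]
        best = max(scored, key=lambda s: s[0])[1]
        children = [best[:]]                      # elitism: keep the best chromosome
        while len(children) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            b = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            cut = rng.randrange(1, 2 * BITS)      # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability p_mut.
            children.append([bit ^ 1 if rng.random() < p_mut else bit for bit in child])
        pop = children
    return decode(max(pop, key=lambda c: fitness(c, r_gold, r_btc)))

# Made-up predicted returns: gold rises less than its fee, bitcoin more than its fee.
gold_w, btc_w = run_ga(r_gold=0.005, r_btc=0.05)
print(gold_w, btc_w)
```

With these inputs the search should favor bitcoin over gold, since only the bitcoin rise exceeds its fee.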

While selecting our mathematical model, we introduced several statistical measures of model accuracy.
The closer the SSE (sum of squares due to error) is to 0, the better the model selection and fit, and the more successful the prediction.
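For reference, the SSE is the sum over all observations of the squared difference between the actual value and the predicted value. A short sketch follows; the function name and the numbers are made up for illustration.

```python
def sse(actual, predicted):
    """Sum of squared errors between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Made-up prices and forecasts.
actual = [100.0, 102.0, 101.0]
predicted = [101.0, 102.0, 99.0]
print(sse(actual, predicted))  # errors of 1, 0 and 2 give 1 + 0 + 4 = 5.0
```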

Justification that the model captures the most profit
The performance of a random forest can be measured with the out-of-bag (OOB) error. In the ideal case, about 36.8% of the training data forms the OOB sample of each tree. This can be shown as follows.
Suppose the training data set has N rows. The probability of not picking a given row in one random draw is \( (N-1)/N = 1 - 1/N \). With sampling with replacement, the probability of not picking that row in N random draws is \( (1 - 1/N)^N \), which in the limit of large N becomes \( e^{-1} \approx 0.368 \). Therefore, about 36.8% of the training data is available as the OOB sample for each decision tree (DT), and it can be used to evaluate or validate the random forest model.
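The limit can be checked numerically. The sketch below compares the closed-form probability with 1/e and with the out-of-bag fraction of one simulated bootstrap sample; the function names and the seed are arbitrary choices for illustration.

```python
import math
import random

def prob_never_drawn(n):
    """Probability that a given row is missed by n draws with replacement."""
    return (1 - 1 / n) ** n

def simulated_oob_fraction(n, seed=0):
    """Fraction of rows left out of one bootstrap sample of size n."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

for n in (10, 100, 10_000):
    print(n, round(prob_never_drawn(n), 4))
print("1/e:", round(math.exp(-1), 4))
print("simulated:", round(simulated_oob_fraction(10_000), 4))
```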
The effectiveness of the OOB error in measuring the performance of random forests has been demonstrated. Alternatively, the data can be divided into a training set and a test set, which allows the performance of different random forests to be compared, because the test set is guaranteed to be the same. Compared with time series analysis models or autoregressive models, we consider the model used here more reasonable and accurate. We can therefore use out-of-bag samples to predict future prices very early, generally without large deviations.
The model thus learns continuously, rather than relying on a training set that stays fixed, as in a time series model. In other words, traders can invest earlier based on the forecasts and obtain higher returns.

Robustness and Sensitivity analysis [7-9]
The CART algorithm in the random forest model has the pruning loss function [10] \( C_\alpha(T_t) = C(T_t) + \alpha |T_t| \), where α is the regularization parameter (analogous to regularization in linear regression), C(Tt) is the prediction error on the training data, and |Tt| is the number of leaf nodes in the subtree Tt. When α = 0, there is no regularization and the originally generated CART tree is the optimal subtree. For a fixed α, there must exist a unique subtree that minimizes the loss function Cα(Tt).
For the sensitivity analysis we use the control-variable method, changing one fee at a time in steps of 1%. The final value of $276,239, obtained with the original gold and bitcoin transaction fees αgold = 1% and αbitcoin = 2%, serves as the standard value. By changing αgold and αbitcoin, the final value under different transaction costs is calculated through the asset return forecast distribution model. Its difference from the standard value of $276,239, divided by the standard value, gives the sensitivity to the transaction cost (Table 3).
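The sensitivity measure described above is a relative difference from the standard value. A minimal sketch follows; the function name is hypothetical, and the final values in the scenarios are placeholders chosen for illustration, not results from Table 3.

```python
BASELINE = 276_239.0  # final value at the original fees (gold 1%, bitcoin 2%), from the paper

def sensitivity(final_value, baseline=BASELINE):
    """Relative change of the final value with respect to the standard value."""
    return (final_value - baseline) / baseline

# Placeholder final values for two hypothetical fee settings.
scenarios = {
    "gold fee 2%, bitcoin fee 2%": 270_000.0,
    "gold fee 1%, bitcoin fee 1%": 280_000.0,
}
for label, value in scenarios.items():
    print(f"{label}: {sensitivity(value):+.2%}")
```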
Based on the sensitivity values obtained above, we produced a range map. It shows that under relatively small changes in transaction costs, the final rate of return of our quantitative investment strategy does not change much, so the strategy has good resistance to risk.

Figure 8. Sensitivity analysis
As Figure 8 shows, under relatively small changes in transaction costs the final rate of return of our quantitative investment strategy does not change significantly, and the strategy resists risk well.
Based on the above analysis of decision indicators, robustness and sensitivity, our quantitative trading decision model delivers good economic benefits while remaining resistant to interference and risk, and we recommend it to traders.