The feasibility of arbitrage between TESLA and cryptocurrency

Abstract. As cryptocurrencies attract a growing number of investors, it has been speculated that the trading prices of cryptocurrencies may be correlated with those of other assets (e.g., TESLA and BITCOIN). On this basis, we build an arbitrage model among TESLA, BITCOIN, and DOGECOIN and validate its feasibility through simulations on five years of trading data. After conducting the Augmented Dickey-Fuller test, the co-integration test, etc., we find that TESLA and BITCOIN are the most strongly correlated pair and are co-integrated over a relatively long period. Within the co-integrated range, we construct the arbitrage model and design the transaction signals by setting a threshold on the spread. Backtests are then carried out, in which different spread thresholds lead to large differences in returns. These results shed light on arbitrage investment decisions involving cryptocurrencies and other assets.


Introduction
Digital currencies are a very volatile underlying and TESLA is a well-known high-volatility NASDAQ component [1], so there is a certain connection between them that allows us to look for profit opportunities. TESLA CEO Elon Musk has made a number of moves regarding digital currency. First, in the fourth quarter of last year, Musk took $1.5 billion out of TESLA's fixed expansion capital and bought BTC in the 20,000 USDT price area. After the position was built, it was announced that TESLA vehicles could be purchased with BTC, which stimulated the BTC price to skyrocket. This year, during March-April, the BTC price soared to the 60,000 USDT area and 10% of the position was sold [2]. Musk bought DOGECOIN from April 1-14 this year with an average entry price of $0.05. When the price of DOGECOIN soared to $0.3 and $0.7 within 20 days, Musk quickly cashed out and left the market, with profits ranging from 6 to 14 times. BTC plummeted after Musk announced on May 12 that BTC mining consumes a large amount of energy and harms the environment [3].
Additionally, we find that TESLA and BTC prices are significantly correlated over time. We tentatively interpret this to mean that those who invest in TESLA and those who invest in Bitcoin are the same type of investors with the same investment preferences.
In general, two assets with a stable long-term relationship and an unstable short-term relationship are suitable for statistical arbitrage. Gatev et al. [4] used the distance between the prices of two assets to measure this relationship. Vidyamurthy [5] used the Pearson correlation coefficient to measure the absolute value of the distance between stocks; the higher the absolute value, the better the cointegration pairing. Huck and Afawubo [6] obtained a stronger pairing convergence by using the cointegration relationship. In this paper, we observe a strong correlation between the stock price of TESLA and the prices of cryptocurrencies (e.g., BTC and DOGECOIN) by means of Spearman's rank correlation coefficient from June 11, 2016 to June 11, 2021. Meanwhile, in the short run, the TESLA stock price and cryptocurrency prices exhibit high volatility from time to time. Thus, this paper attempts to construct a statistical arbitrage model between TESLA stock and cryptocurrencies through the cointegration method. To illustrate the feasibility of this arbitrage model, we backtest it on historical data.
The empirical study shows that there is a cointegration relationship between TESLA and BTC prices during the sample period, whereas there is no cointegration relationship between the DOGECOIN and TESLA prices. Using the variation of the spread between TESLA and BTC for arbitrage, the empirical results indicate that the returns of this arbitrage model are much higher than the market returns. Therefore, statistical arbitrage using TESLA and BTC prices is feasible.
The rest of the paper is organized as follows: Sec. 2 presents the data and the methods used in this research. Sec. 3 reports the results of the empirical analysis. Sec. 4 discusses the work further. Finally, a brief conclusion is given.

Data
TESLA, BTC, and DOGECOIN are chosen because of the intrinsic connection between these assets, and their trading prices over the last five years are investigated. The open and close prices of TESLA trading on the NasdaqGS [7,8] as well as of BTC [9] and DOGECOIN [10] on CoinMarketCap are collected from Yahoo Finance from 2016/6/11 to 2021/6/13. Because the assets differ in trading duration, the daily average price, defined as the average of the daily open and close prices, is used for the analysis. After removing the dates on which the three assets are not all trading, 1255 trading bars are obtained.
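The preprocessing described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes each asset is held in a pandas DataFrame indexed by date with columns named "Open" and "Close" (a common layout for Yahoo Finance downloads), and the function names are hypothetical.

```python
import pandas as pd


def average_price(frame: pd.DataFrame) -> pd.Series:
    """Daily average of the open and close prices (column names are assumed)."""
    return (frame["Open"] + frame["Close"]) / 2


def align_assets(frames: dict) -> pd.DataFrame:
    """Average each asset's open/close and keep only dates shared by all assets."""
    averages = {name: average_price(frame) for name, frame in frames.items()}
    # join="inner" drops every date on which at least one asset has no bar,
    # e.g. weekends, when cryptocurrencies trade but the stock market is closed.
    return pd.concat(averages, axis=1, join="inner").sort_index()
```

Calling `align_assets({"TESLA": tesla, "BTC": btc, "DOGECOIN": doge})` would return one column of average prices per asset, restricted to the common trading dates.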

Correlation test
Spearman's rank correlation coefficient [11] is a way to measure whether there is a correlation between two variables. It makes no assumption about the distribution of the variables, i.e., it is a non-parametric test. Moreover, it can detect both linear and nonlinear monotonic relationships between variables. In the case of no duplicate data, it equals 1 or -1 when one variable is strictly monotonic in the other. Therefore, it is more widely applicable and more robust in portraying the correlation of variables.
When calculating the Spearman rank correlation coefficient, if the data contain no repeated values, the original data $\{x_i\}$ and $\{y_i\}$ are sorted by size and labeled with the ordinal numbers $\{x_i'\}$ and $\{y_i'\}$. Calculate the rank difference corresponding to the ordinal numbers:

$$d_i = x_i' - y_i' \tag{1}$$

Then the Spearman correlation coefficient is calculated as:

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{2}$$

where $n$ is the number of observations. If there are duplicates in the original data, an adjustment is made when finding the ranks. For duplicated data, the original ranks are found first, and the adjusted rank is taken as the average of these ranks. Afterward, one should check the critical value table of the rank correlation coefficient test. If the correlation coefficient is greater than the critical value at the X% confidence level, then the two variables are considered correlated. Otherwise, they are considered uncorrelated.
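The ranking procedure, including the averaging of tied ranks, can be sketched in plain Python. This is an illustrative sketch with hypothetical function names; it computes the coefficient as the Pearson correlation of the ranks, which coincides with the closed-form expression when there are no ties.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is a duplicate.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient as the Pearson correlation of tie-adjusted ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

For a strictly increasing but nonlinear pair such as `x = [1, 2, 3, 4]` and `y = [1, 4, 9, 16]`, the ranks coincide and the coefficient is 1.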

Co-integration theory
The basis of pairs trading is that the variables satisfy a cointegration relationship [12]. The existence of a cointegration relationship indicates a stable relationship between the variables in the long run. Even if there are short-term deviations, such deviations will be corrected towards the long-term relationship. Therefore, when deviations accumulate to a certain level, investors can short the overvalued asset and go long the undervalued asset. As the deviation gradually returns to the mean, the opposite action is taken to close the position. This process generates the arbitrage profit.

Order of integration test
For a series $\{y_t\}$, a unit root test is performed. If $\{y_t\}$ is stationary, it has no unit root and is called integrated of order zero. If it becomes stationary after d differences, it has d unit roots and is called integrated of order d, denoted as $y_t \sim I(d)$. In this paper, the Augmented Dickey-Fuller (ADF) test is chosen for the unit root test.

Co-integration test
If the series are stationary, a regression can be performed directly. For non-stationary series, a direct regression may be spurious. Therefore, a cointegration test is required for non-stationary series. Although the EG two-step method is often used for the cointegration test of two series, the Johansen test is chosen in this paper because it yields more stable results. It should be noted that the Johansen test only determines how many cointegration relationships exist and does not determine the specific form of the regression.

Vector error correction model
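Variables are not always in equilibrium. The variables observed on a daily frequency are often in disequilibrium. Therefore, short-term fluctuations of variables are also considered. Let $y_t = (y_{1t}, y_{2t}, \ldots, y_{kt})'$ contain k time series with t=1,2, …, T and $y_t \sim I(1)$, i.e., every $y_{it} \sim I(1)$, i=1~k. If $y_t$ is not affected by the d-dimensional exogenous time series $x_t = (x_{1t}, x_{2t}, \ldots, x_{dt})'$, a vector autoregressive model with p lags can be built as follows:

$$y_t = A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \varepsilon_t \tag{3}$$

Transforming the above equation, we obtain

$$\Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t \tag{4}$$

Here,

$$\Pi = \sum_{i=1}^{p} A_i - I, \qquad \Gamma_i = -\sum_{j=i+1}^{p} A_j \tag{5}$$

If there is a cointegration relationship for $y_t$, then $\Pi y_{t-1} \sim I(0)$. At this point, writing $\Pi = \alpha\beta'$, one derives

$$\Delta y_t = \alpha \beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t \tag{6}$$

where $\mathrm{ecm}_{t-1} = \beta' y_{t-1}$ is the error correction term. $\beta$ is a non-zero vector that makes $\beta' y_{t-1} \sim I(0)$. Hence, the above equation can be written as:

$$\Delta y_t = \alpha\, \mathrm{ecm}_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t \tag{7}$$

Unit root tests, co-integration tests, and error correction models were carried out in Eviews software [13].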

Arbitrage strategy
The theoretical basis of statistical arbitrage is mean reversion. Thus, the arbitrage operation starts when the spread deviates from the long-term equilibrium relationship. When the spread returns to its mean value, the operation is reversed to end the arbitrage process. Following this idea, we backtest the arbitrage results on historical data to confirm the feasibility of arbitrage between the stock market and the cryptocurrency market.

Correlation analysis
Python Seaborn [14] is used to calculate and illustrate the Spearman rank correlation coefficients of the TESLA, BTC, and DOGECOIN prices from 2016/6/11 to 2021/6/13 (five years) and from 2019/6/11 to 2021/6/13 (two years), respectively. As shown in Fig. 1, the correlation coefficient between TESLA and BTC is higher than that between TESLA and DOGECOIN in both the five-year and the two-year samples. Over the five years, the coefficient between TESLA and BTC is 0.64, compared with 0.49 between TESLA and DOGECOIN. Over the two years, the coefficient between TESLA and BTC is 0.92, compared with 0.87 between TESLA and DOGECOIN. In terms of time, both coefficients are markedly lower over the five years than over the two years.

Unit root test
The ADF unit root test is performed on the TESLA, BTC, and DOGECOIN prices and on the corresponding first-order difference series over the past five years. According to the results listed in Table 1, the t-statistic for DOGECOIN over the past five years is -3.897853, which is less than the critical value at the 5% significance level. Therefore, the null hypothesis of a unit root is rejected, so the DOGECOIN price is a stationary time series over the past five years. The t-statistics for the TESLA stock price and the BTC price are 0.812030 and 0.329011, respectively, so the null hypothesis cannot be rejected for either series. The ADF tests on the first-order difference series of the TESLA and BTC prices are significant, which indicates that both the BTC and TESLA prices are integrated of order one. However, since the DOGECOIN and TESLA prices are not integrated of the same order over the past five years, the regression cannot be performed directly. When tested over the past 10 months only, the DOGECOIN price is found to be integrated of order one.

Note: $DOGE^*_t$ is the sequence of DOGECOIN prices for the past 10 months, 2020/8/11 to 2021/6/13. $\Delta TESLA_t$ is the first difference series of TESLA. $\Delta BTC_t$, $\Delta DOGE_t$, and $\Delta DOGE^*_t$ are defined in the same way.

Co-integration test
The Johansen cointegration test for the prices of TESLA, BTC, and DOGECOIN is shown in Table 2. From Table 2, when the null hypothesis is "there is no cointegration relationship", the trace test statistic of BTC and TESLA is 27.40298, which is significant at the 5% level, so the null hypothesis is rejected. When the null hypothesis is "there is at most 1 cointegration relationship", the trace test statistic for BTC and TESLA is 0.620911, and the null hypothesis cannot be rejected. Therefore, there is one cointegration relationship between the prices of BTC and TESLA. Since the DOGECOIN price only exhibits nonstationary characteristics from around August 2020, its cointegration test is performed using data from the past 10 months. According to the results, when the null hypothesis is "there is no cointegration relationship", the trace test statistic of the DOGECOIN and TESLA prices is 8.291053, i.e., the null hypothesis cannot be rejected. Therefore, there is no cointegration relationship between the DOGECOIN and TESLA prices in the past 10 months.

FIBA 2022
Volume 26 (2022) 799

Note (Table 2): "None" means that the variables have no cointegration relationship. "At most 1" means that the number of cointegration relationships is not more than 1. An asterisk (*) indicates that the null hypothesis is rejected. The sample interval of BTC&TESLA is from 2016/6/11 to 2021/6/13, and that of DOGECOIN&TESLA is from 2020/8/11 to 2021/6/13.

A least-squares regression model (Table 3) is developed for the BTC and TESLA prices. According to the coefficient of TESLA reported in Table 3, the TESLA stock price has a significant positive effect on the bitcoin price. Therefore, the regression model of the BTC and TESLA prices is established as follows:

$$\mathrm{BTC}_t = 52.08138\,\mathrm{TESLA}_t + 1887.535 \tag{8}$$
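A least-squares fit of the BTC price on the TESLA price can be sketched with NumPy. This is a minimal illustration with a hypothetical function name, not the Eviews estimation behind Table 3; it also returns the regression residual, which is one common way to define the spread of a cointegrated pair.

```python
import numpy as np


def fit_price_regression(tesla, btc):
    """Fit btc = slope * tesla + intercept by ordinary least squares."""
    tesla = np.asarray(tesla, dtype=float)
    btc = np.asarray(btc, dtype=float)
    slope, intercept = np.polyfit(tesla, btc, deg=1)
    residual = btc - (slope * tesla + intercept)  # deviation from the fitted line
    return slope, intercept, residual
```

Applied to the aligned average prices, `slope` and `intercept` play the roles of the two coefficients in the regression model above.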

Error correction model
Since there is no long-run equilibrium relationship between DOGECOIN and TESLA, a vector error correction model is considered only for the BTC and TESLA prices. The results of the vector error correction model are shown in Table 4. The coefficient of the error correction term is negative, which indicates that there is a mean reversion effect. Additionally, the t-statistic of the error correction term is -4.79315, which indicates that the error correction term is significant. However, the error correction term coefficient is -0.011938, which indicates that the mean reversion is slow and the correction effect is weak.
According to the results of Table 4, an error correction model can be established. Note: The values in parentheses are the standard errors of the coefficients. The values in square brackets are the t-statistics.

Arbitrage backtesting
We set three sigma factors, 0.5, 1, and 1.5, as the thresholds of our strategy. First, we obtain the spread by subtracting Price B from Price A. Then, for each day, we compute the difference between the spread and the mean of all spreads (denoted as spread), and take the standard deviation of spread as our unit sigma. Subsequently, we divide spread by sigma to obtain the z-score. When the z-score is greater than the sigma factor under consideration, the z-score is set to 1. When the z-score is less than the negative of that sigma factor, the z-score is set to -1, and the remaining z-score values are set to 0. In this way, the z-score sequence becomes the desired trading signal: buy BTC when z-score=1, buy TESLA when z-score=-1, and close the position when the z-score changes from 1 or -1 to 0.
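The signal construction described above can be sketched in plain Python. This is an illustrative sketch under assumptions: the function name is hypothetical, and the population standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, pstdev


def zscore_signals(price_a, price_b, threshold):
    """Map each day's spread z-score to a signal of 1, -1, or 0."""
    spread = [a - b for a, b in zip(price_a, price_b)]
    mu = mean(spread)
    sigma = pstdev(spread)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("the spread is constant, so the z-score is undefined")
    signals = []
    for value in spread:
        z = (value - mu) / sigma
        if z > threshold:
            signals.append(1)
        elif z < -threshold:
            signals.append(-1)
        else:
            signals.append(0)
    return signals
```

Running the function with `threshold` set to 0.5, 1, and 1.5 on the same price series gives the three signal sequences that are compared in the backtest.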
We use Python to carry out the backtesting. The initial amount of the account is set to ten million dollars. In the backtest loop, the z-scores of the current day and the previous day are checked. If the z-score of the previous day is 0 and the z-score of the current day is -1 or 1, the number of units of the corresponding asset to buy is calculated: we take the floor of the previous day's cash balance divided by the asset price, and then calculate the cash remaining after the purchase. If the previous day's z-score is -1 or 1 and the current day's z-score is 0, the corresponding position is closed: the assets are converted back into cash and the position is reset to 0.
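The loop can be sketched as follows. This is a simplified illustration, not the authors' script: the function name is hypothetical, trades are assumed to execute at the same day's price with no fees, and a direct flip between 1 and -1 (which the text does not describe) simply keeps the existing position.

```python
import math


def backtest(signals, price_btc, price_tesla, initial_cash=10_000_000.0):
    """Return the daily account value for a long-only signal-following strategy."""
    cash = initial_cash
    units = 0    # number of units of the held asset
    held = None  # "BTC", "TESLA", or None when flat
    account = []
    for t, current in enumerate(signals):
        previous = signals[t - 1] if t > 0 else 0
        prices = {"BTC": price_btc[t], "TESLA": price_tesla[t]}
        if previous == 0 and current != 0 and held is None:
            # Open: buy as many whole units as the cash balance allows.
            held = "BTC" if current == 1 else "TESLA"
            units = math.floor(cash / prices[held])
            cash -= units * prices[held]
        elif previous != 0 and current == 0 and held is not None:
            # Close: convert the position back into cash.
            cash += units * prices[held]
            units, held = 0, None
        position_value = units * prices[held] if held else 0.0
        account.append(cash + position_value)
    return account
```

The returned list holds the total account value for each day, i.e., cash plus the market value of the open position.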
In each cycle, the account list is appended with the cash amount plus the number of units of each asset held multiplied by its price on that day, which is the total value of the account on that day. Afterward, the logarithm of the ratio of consecutive account values is taken to obtain the daily log return. The average return per year in the table is obtained by subtracting the initial capital from the total assets on the last day (2021.6.11) and dividing the result by the product of the initial capital and the length of the period. The maximum return is obtained by calling the max function on the account list, subtracting the initial capital from the maximum value, and dividing by the initial capital. The average return per day is obtained by calling the mean function on the log-return list, and the volatility is calculated by calling the std function on the log-return list. The average number of trades and the average position time are obtained from counter variables updated conditionally inside the loop.
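These summary statistics can be sketched from a list of daily account values. This is an illustrative sketch with hypothetical names; the sample standard deviation is assumed for the volatility, since the text only refers to a std function.

```python
import math
from statistics import mean, stdev


def performance(account, initial_cash, years):
    """Summary statistics of a backtest from its daily account values."""
    log_returns = [
        math.log(account[i] / account[i - 1]) for i in range(1, len(account))
    ]
    return {
        "avg_return_per_year": (account[-1] - initial_cash) / (initial_cash * years),
        "max_return": (max(account) - initial_cash) / initial_cash,
        "avg_daily_log_return": mean(log_returns),
        "volatility": stdev(log_returns),  # assumption: sample standard deviation
    }
```

For example, an account that grows from 1000 to 1200 over two years has an average return per year of (1200 - 1000) / (1000 × 2) = 0.1.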
We find that lowering the open and close thresholds gives better arbitrage performance. The reason is that lower thresholds mean more trades over the same time interval of data. Since the success rate of our paired trades is high, more trades also mean higher returns. A higher number of trades also means higher volatility, so the risk of our strategy is correspondingly higher. However, the results suggest that the increase in risk is small relative to the increase in reward, meaning that strategies with lower signal thresholds could have been chosen for trading TESLA and BTC to achieve higher returns. Most of our profits, as well as most of our maximum drawdown, occurred in late 2020 and early 2021. Moreover, the BTC price had two large rises from November 2020 to March 2021, while the TESLA stock price also rose sharply in late 2020, though not as fast as BTC, and then went through another round of declines in early 2021. These moves made the pairing strategy's trade signals occur many times during this period, generating large drawdowns but ultimately yielding significant profits.

Discussion
After performing the regression, it is found that the regression results are not satisfactory for DOGECOIN. We did not choose DOGECOIN and BTC or DOGECOIN and TESLA for arbitrage because these pairs are not highly correlated. Consequently, they do not satisfy the requirement of cointegration over a period that is conducive to trading. Therefore, we chose the combination with the best correlation test results, i.e., TESLA and BTC, to perform the arbitrage test and confirmed that the regression results between TESLA and BTC were more in line with our expectations. Within that period, arbitrage trades can be carried out by following the method.
In terms of the choice of trading threshold, in much of the previous literature the trading signal for paired trades is set to n times the standard deviation of the regression residual series of the traded price series, often defaulting to n=1. However, when constructing a trading model, the quality of the returns depends heavily on the trading signal, and fixing n=1 is too rigid. Arbitrage is indeed possible when n=1, but it does not maximize the annualized return, and it is better to choose n depending on the industry and year. For this reason, we tried n=0.5, 1.0, and 1.5, and the results show that the best return is achieved at n=0.5, which also has the highest trading frequency. The next best case is n=1, while the worst is n=1.5.
Arbitrage is possible over the selected period because the price series co-move, i.e., they are intrinsically correlated in terms of price movements, which is in line with our initial hypothesis. Musk (TESLA) has a huge impact on the price of BTC or DOGECOIN every time he makes an announcement. Besides, his news affects the psychological expectations and holding attitudes of a portion of cryptocurrency holders and some potential traders regarding cryptocurrency prices, which is ultimately reflected in price changes.
It should be noted that, according to the theory of information economics, news is reflected fairly fully and quickly in the respective price changes in the market. We tried shifting the data of one of the traded assets by t days (t∈[-3,3], t∈Z) before running the regression and found that the best regression results are obtained for t=0. In other words, the delay with which information takes effect is within one day, although the market is not fully efficient.

Conclusion
In summary, an arbitrage model has been successfully constructed based on the cointegration relationship between TESLA and BITCOIN in this paper. Besides, the differences between trading thresholds are discussed on an arbitrage basis, whereas previous work typically defaulted to a spread threshold of 1 sigma. The arbitrage model chosen in this paper requires that the trading objects are cointegrated over time. Based on this premise, we perform ADF and cointegration tests on the potential pairs of underlying assets identified beforehand. After combining the regression results, we select TESLA and BITCOIN as the final trading objects. After searching the relevant literature, we found that the criteria for entering and exiting trades have a huge impact on the final return. According to the backtesting results, a threshold of 0.5 sigma is better than 1 sigma for both the 2-year and the 5-year data: it significantly improves the return while also increasing the trading frequency. On the one hand, our study verifies that there is a strong connection between cryptocurrencies and TESLA. On the other hand, although DOGECOIN is not selected as the final trading object, TESLA also appears to be strongly correlated with DOGECOIN at some key points in time. These results offer a guideline for future understanding of cryptocurrency pricing methods, price movements, and investors' investment choices.