Stock Price Prediction of Apple Based on SVM and KNN

. Stock price prediction using machine learning has always been an enigmatic program for many researchers. Different models suit different circumstances, even though they are all after the same result. In addition, stock prices appear to be no upper or lower bound on how values can rise or fall. Therefore, studying stock price prediction can be very important to shareholders or investors. However, one of the research significances of this essay is about understanding more about machine learning in python while knowing how to use this knowledge of machine learning flexibly for a specific circumstance. Also, successfully using all the data from seven days ago to predict the rise and fall of the eighth day to help eliminate the hesitation of purchasing is another significant meaning for this study. For this research, logistic regression, KNN (K-Nearest Neighbors), and SVM (support vector machine) are selected to compare their advantages and disadvantages in this project. Three models are individually evaluated in this essay, by observing the accuracy value, which is the percentage of test examples that were properly categorized relative to all test instances, and the classification of predicting the rise and fall of stocks. Combining all the considerations, the result shows that the accuracy of the SVM model is higher than the other two models. Additionally, python, which is a superior code language for machine learning, has been chosen for this project.


Introduction
One of the most crucial areas of financial economics is asset pricing, which aims to explain the price or value of an asset that is to be paid in the future under uncertain conditions. The problem of asset pricing is very straightforward in deterministic markets. In layman's terms, the current price of the asset can be directly obtained by discounting the future earnings of an asset with a risk-free rate or rate of return. In reality, however, there is a lot of ambiguity in the financial markets, which increases risk. Sometimes, this risk can be huge. The so-called risk is the discrepancy between expected future asset price trends and actual asset price trends. When dealing with asset pricing under uncertainty, it is essential for engineers to consider investors' perspectives on risk, as well as investors' trade-offs between return and risk. All the above factors explain why predicting stock prices can be such an important topic.
The analysis of three models, which are logistic regression, KNN (K-Nearest Neighbors), and SVM (support vector machine), is the main component of this article. In addition, some brief definitions of these three models are present in this section. First, a logistic regression model requires the selection of predictors and their specific form. Despite regression in its name, logistic regression cannot actually be used to predict continuous values. Kumar [1] found that the factors that influence logistic regression can be either categorical or numerical, which is compatible with multiple linear regression, except that multiple linear regression forecasts constant Y values, whereas records are exclusively categorized by logistic regression [2]. Next, the KNN algorithm is a nonparametric statistical technique for classification and regression in the field of pattern recognition [3][4]. Enter the k closest training samples in feature space in both circumstances. A popular classification technique in data mining technology is the KNN algorithm. Because of its simplicity, it is widely employed in a variety of industries. However, the KNN algorithm's classification effectiveness will be significantly decreased when both the sample size and the feature characteristics are huge. Last, SVM is a supervised machine learning algorithm and an extension of the maximum margin classifier, a straightforward classifier [2,5]. Although it may be used for regression as well, SVM is favored for classification. The following list of SVM's building blocks must be cleared before diving into Support Vector Classifier, Support Vector Machines, Maximum Margin Classifier, and Hyperplane. In addition, the improved clustering algorithm based on a support vector machine has another name called support vector clustering [3,6].
After building these models in python language, the data from the previous 7 days is used as a reference to predict the price on the 8th day. Then calculate the forecast data on day 8 minus the actual data on day 8 divided by the actual data to predict the stock price rise and fall. Moreover, to compare them, calculating the accuracy is also an important task to do. The result shows the fact that the accuracy of the SVM model is higher than the other two models.
As a final note, The remaining sections of the essay are structured as follows: Sample and data are described in Section 2; Topic 3 performs methods and results in analyzations of these three modules; Section 4 enumerates the advantages and disadvantages of machine learning. The last section presents our conclusions.

Firm background
The headquarters of Apple Inc., originally known as Apple Computer, Inc., is located in Cupertino, California, in the United States. The main business of Apple is electronic technology products, including smartphones, tablet computers, computer peripherals, computer software, and so on. As a high-tech company with a reputation for innovation, it changed its name to Apple Inc. on January 9, 2007. Although the first personal computer spreadsheet VisiCalc, which is released by Dan Bricklin and Bob Frankston in 1979, helped Apple open the small-business market, it cannot be ignored that Apple was a dominant player among educational institutions, with its platform dominating software for primary schools until the 1990s, for its aggressive discounts, donations and the absence of the other early competitors [7].
In these years, with the rapid development of Apple Inc., the total revenue of it become the top in the list of the total revenue of the American telecommunications equipment industry. In the third quarter 2022 financial report, the total revenue of apple is 82.959 billion, which increased by about 1.08% year-on-year and was far ahead of second place (Cisco).

Analysis of financial statements
Gross profit is the balance between revenue and the operating costs corresponding to revenue. According to the financial report of Apple Inc. from 2017 to 2021, the gross profit of Apple experienced an upward trend from the 16/17 fiscal year (868.7$ billion) to the 20/21 fiscal year (1528.36$ billion).
The higher net profit ratio can reflect the better profitability of the business, and vice versa. Based on Table 1, from the financial statements of Apple from 2018 to 2021, the net profit ratio of Apple saw an increase from 22.4% in 2018 to 25.88% in 2021. Furthermore, the net profit ratio of Apple became higher than the other corporation in 2021. Although it got a bit decrease in 2020 to 20.94%, its net profit ratio of it is still higher than that of CIEN and that of Apple is almost the same as that of CSCO. ROE is a gauge of profitability and it can reflect the level of return on equity of shareholders. The ROE becomes higher, which means that the management of a firm is more effective at generating revenue and growth from its equity financing [5]. And ROE can be calculated as follows: (1) And the ROE of Apple showed an upward trend from 2018 (49.36%) to 2021 (147.44%), which has the same trend as ROA.

Stock price data
This paper will use the AAPL stock price data from September 15, 2020, to September 15, 2022, including the date, closing price, opening price, highest price, lowest price, volume, and the up and down rate of each day. This data is from the website called investing (https://cn.investing.com/).
The trend of the closing price in this period is as follows: Overall, the closing price of Apple Inc. experienced an upward trend over two years. Although it came down dramatically from April 2002 to around July 2022, it increased rapidly again and reach the level as before (Fig 1).

logistic regression
Logical Regression (LR for short) is a classification problem in machine learning. Logistic regression is a statistical method used to predict the results of dependent variables based on previous observations. It is a kind of regression analysis and a common algorithm to solve the problem of binary classification [8]. It is often used in the second classification problem. For example, predict the rising and falling trend of the stock, and judge whether the disease is negative or positive. This paper is applied to the classification of stock forecasts.
Conditional probability distribution P (Y | X) is used to represent the logical regression model. Y, the random variable, is either 1 or 0, it is called the binomial logical regression model.
Model test: calculate the accuracy of the logical regression model

SVM
A two-dimensional classification problem is a common machine learning problem. SVM has three key points. The first point is the interval, the maximum classification interval of the goal of SVM; The second point is duality, the way to solve SVM; The third point is the kernel technique, the improvement of SVM under nonlinear conditions. Set a threshold value of 0 for the discrimination of the classification function, and further judge its category by comparing the results obtained from the calculation of the classification function. Anomaly detection, intrusion detection, text classification, time series analysis, and the use of deep learning methods such as artificial neural networks are further implementation areas [9].In this paper, the classification of stock price rise and fall is to judge its category in this way (Table 2-3).

KNN
K-Nearest Neighbor is what we often call KNN. It is also a classification algorithm that can be used to deal with classification problems. Its algorithm idea can be understood as that for any dimension input vector, they respectively correspond to a point in a special space, and the output is the category label or predictive value corresponding to the feature vector. K-nearest neighbor algorithm stores all available data, and classifies new data points according to similarity measures. This means when new data appears. After that, it may be quickly categorized into a highly appropriate category using the KNN method [8][9][10].
A unique machine learning algorithm is the KNN algorithm. The feature vector space is divided using training data as a basis, and the final partition result serves as the basis for the algorithm model. Typically, when classifying new data, users merely select the category with the greatest number of occurrences among the first K most similar data in the sample dataset. Concerning classification issues, it is used to predict species, etc. This paper, it is used to predict the rise or fall of stocks (Table  4).

Discussion
Machine learning prediction models have both advantages and drawbacks compared with econometrics forecasting models.

efficiency and effectiveness in data handling
Machine learning models for prediction are more efficient than econometrics. Machine learning models could make more quick calculations in amounts of data and process the data, study, and process these data again for further predictions and decisions. It is automatic and the whole process of data interpretation and handling is completed by the computer so it helps to save a lot of time for men to spend their time on handling a large amount of complex data step by step. Different from econometrics, the working process of machine learning does not need the intervention of men.
Furthermore, machine learning models can be used in various fields, including weather prediction, stock price prediction, and even medicine, and so on. And it is one of the drawbacks of econometrics.

Space for progress
The way for humans to improve themselves is by experience. Same to which, it is also a way for machine learning to improve the accuracy and efficiency of the results. However, cause machine learning can deal with so much data set more quickly, machine learning can gain experiences faster and more than humans. Due to this, machine learning can improve itself more rapidly and it has more room to make progress. And it is easier for the models of machine learning to get better and more accurate forecasts than econometrics.

Time-consuming
Although a machine learning machine can process data faster than econometric models, it costs more time to learn the process of the code to get a better and more efficient result and prediction. Furthermore, machine learning models need many data sets for training and testing for better prediction. If the data source is difficult to find and it requires a lot of space to store, it will be timeconsuming to download these data and need larger internal storage, and it even maybe need more computers to help deal with these data.

Wrong diagnosis
Machine learning is recognized for its higher accuracy rate, while it would be hard to find the errors in the code that is wrong and unsuitable for this series of data if the machine learning models go the incorrect way and get bad predictions. So it is more important for machine learning models compared with econometrics to have a correct program with low biased or without a set of errors.

Conclusion
In this work, we adopt three different models to predict the change of stock trends. Before using the three models for prediction, we determined a judgment standard to judge whether it can rise or fall by 5% after seven days, that is, compare the data of the eighth day with the data of the previous seven days. If the stock can rise by 5% after seven days, it will be judged as 1. If it cannot rise to this standard, it will be judged as 0. If it reaches the set standard, it will be bought. After seven days, it will be sold. Then it will be judged again to choose to buy the stock. The same criteria will be used for the judgment again, and this cycle will be repeated. Then, the project also builds a time-oriented analysis based on the LSTM model. After building the corresponding LSTM, adjust the data structure to the desired way, and finally iterate the LSTM by setting the loss function, and then calculate the indicator judgment of each model.
In the three different models, we study their accuracy in the classification problem. The results show that the SVM model has higher accuracy than the other two models. In addition, from the data comparison between the SVM model and the KNN model, it can be seen that macro avg and weighted avg are two types of data. Since the macro average represents the average of all categories, the weighted average is the improvement of the macro average, which takes into account the proportion of the number of samples of each category in the total samples. The values of SVM are greater than those of KNN, so it is further judged that the SVM model is superior to the KNN model. and further calculate and compare the two indicators of their macro avg and weighted avg using SVM and KNN models. However, the model in this paper is limited to three common models and basic comparison methods. In the future, we can also try to study Bayesian models to predict the stock trend, further achieve the goal of buying low and selling high, and better, more convenient, and more accurate prediction of the rise and fall of stocks.