Used Car Price Prediction by Using XGBoost
DOI:
https://doi.org/10.54691/bcpbm.v44i.4794
Keywords:
Car Price; XGBoost; Variable Selection.
Abstract
This article shows that Extreme Gradient Boosting (XGBoost), together with dummy variables, can accurately predict the selling price of a used car from its condition and attributes. The used car dataset is split into a training set (83%) and a test set (17%). Three data processing methods are compared to find the most accurate one. The first removes the outliers from the training and test sets and then applies XGBoost directly. The second removes the outliers and also drops the variable power, which is the variable most closely related to the price of the used car, before applying XGBoost. The third removes the outliers, normalizes the training and test sets, and then applies XGBoost. The experimental results show that normalizing the dataset and then using XGBoost with dummy variables predicts the selling price accurately and efficiently across the different usage conditions of each used car.
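The preprocessing steps named in the abstract (the 83%/17% split, outlier removal, normalization, and dummy variables) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say how outliers are identified, so the interquartile-range rule used here is an assumption, and the column names `power`, `fuel`, and `price` as well as all function names are hypothetical. Fitting the XGBoost model itself is not shown.

```python
import random
import statistics


def split_rows(rows, train_fraction=0.83, seed=0):
    """Shuffle the rows and split them into training and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences from the interquartile range.

    Assumption: the abstract does not name an outlier rule; IQR is a stand-in.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def drop_outliers(rows, column, bounds):
    """Keep only the rows whose value in `column` lies inside the fences."""
    low, high = bounds
    return [r for r in rows if low <= r[column] <= high]


def min_max_scale(rows, column, lo, hi):
    """Rescale one numeric column with (x - lo) / (hi - lo)."""
    span = hi - lo
    return [
        {**r, column: (r[column] - lo) / span if span else 0.0}
        for r in rows
    ]


def add_dummies(rows, column, categories):
    """Replace a categorical column with one 0/1 column per category."""
    encoded_rows = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != column}
        for c in categories:
            encoded[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded_rows.append(encoded)
    return encoded_rows


if __name__ == "__main__":
    # Hypothetical toy data standing in for the used car dataset.
    cars = [
        {"power": 60 + i, "fuel": "Diesel" if i % 2 else "Petrol", "price": 2.0 + 0.1 * i}
        for i in range(50)
    ]
    train, test = split_rows(cars)
    bounds = iqr_bounds([r["price"] for r in train])
    train = drop_outliers(train, "price", bounds)
    test = drop_outliers(test, "price", bounds)
    lo = min(r["power"] for r in train)
    hi = max(r["power"] for r in train)
    train = add_dummies(min_max_scale(train, "power", lo, hi), "fuel", ["Diesel", "Petrol"])
    test = add_dummies(min_max_scale(test, "power", lo, hi), "fuel", ["Diesel", "Petrol"])
    print(len(train), len(test))
```

In this sketch the outlier fences and the minimum and maximum used for scaling are computed on the training rows only and then reused for the test rows, so scaled test values can fall slightly outside the 0 to 1 range. That is a design choice of the sketch; the abstract does not specify how the two sets were normalized.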