Enhancing Housing Price Prediction Accuracy through Hybrid POI-XGBoost Models: A Case Study of Nanjing
DOI: https://doi.org/10.54691/v6hkjm84
Keywords: Real Estate Price Prediction; XGBoost; POI (Point of Interest); Web Search Data; Machine Learning; Nanjing.
Abstract
[Purpose] This study aims to improve the precision of second-hand housing price prediction by overcoming the limitations of traditional statistical data, such as poor timeliness and questionable authenticity. [Method] Using Python web crawlers, the paper collects housing listings from Lianjia and derives neighborhood characteristics from Baidu POI (Point of Interest) data. Three models are constructed and compared: Vector Autoregression (VAR), Random Forest, and the proposed POI-XGBoost model. [Findings] The empirical results on Nanjing data show that the XGBoost model enhanced with POI features achieves the best predictive performance (a reported score of 0.952), significantly outperforming the linear (VAR) and Random Forest baselines. [Originality] This research provides a methodological framework for integrating multi-source geospatial data with gradient boosting algorithms to capture the non-linear impact of neighborhood amenities on housing values.
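The abstract does not say how the Baidu POI data are turned into neighborhood features. One common construction, sketched here purely as an assumption, counts the POIs of each category within a fixed radius of a listing. The function names, the 1 km radius, the category labels, and the sample coordinates below are all hypothetical and are not taken from the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def poi_count_features(listing, pois, radius_m=1000.0):
    """Count POIs per category within radius_m of a listing (radius is an assumed value)."""
    counts = {}
    for poi in pois:
        d = haversine_m(listing["lat"], listing["lon"], poi["lat"], poi["lon"])
        if d <= radius_m:
            key = f"n_{poi['category']}"
            counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical sample data, invented for illustration only.
listing = {"lat": 32.0600, "lon": 118.7900}
pois = [
    {"category": "subway", "lat": 32.0610, "lon": 118.7910},  # within 1 km
    {"category": "school", "lat": 32.0650, "lon": 118.7900},  # within 1 km
    {"category": "school", "lat": 32.1000, "lon": 118.7900},  # outside 1 km
]
features = poi_count_features(listing, pois)
print(features)
```

Counts like these could be appended to each listing's structural attributes (area, floor, age, and so on) before fitting a tree-based regressor. The radius and the category set would normally be tuned rather than fixed in advance.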
License
Copyright (c) 2026 Scientific Journal of Economics and Management Research

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




