Predicting Trip Purposes of Households in Makurdi Using Machine Learning: A Comparative Analysis of Decision Tree, CatBoost, and XGBoost Algorithms

Emmanuel Okechukwu Nwafor
Folake Olubunmi Akintayo

Abstract

This study applies machine learning to the prediction of household trip purposes in Makurdi, Nigeria, comparing three algorithms: Decision Tree (DT), CatBoost, and XGBoost. The aim is to determine which model most effectively predicts household trip purposes from demographic, socioeconomic, and travel data. Model performance was assessed using R-squared (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), which revealed distinct strengths and weaknesses among the models. CatBoost achieved the highest R² (73%), indicating that it captured the most variance in trip purposes, but it also had a higher MAE (0.353) and RMSE (0.850), which suggests a potential for larger prediction errors. XGBoost, with an R² of 72% and the lowest RMSE (0.545), delivered a balanced performance, pairing near-top explained variance with the smallest errors by RMSE. The Decision Tree model was acceptable, with an R² of 68%, MAE of 0.314, and RMSE of 0.615, but ranked lower in predictive accuracy. The findings support XGBoost as the most reliable model for this task. Future research directions include hyperparameter optimization and the investigation of ensemble methods to improve predictive accuracy.
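As an illustration of the three evaluation metrics named in the abstract, the following minimal sketch computes MAE, RMSE, and R² from their standard definitions. It assumes trip purposes are encoded as numeric labels, which the abstract does not state; the sample values and function names are hypothetical and are not taken from the study.

```python
import math

def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of the errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: squares errors first, so large misses weigh more.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    # R-squared: 1 minus the ratio of residual to total sum of squares.
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical encoded trip purposes (illustrative only, not study data).
y_true = [1, 2, 3, 4]
y_pred = [1, 2, 3, 2]

print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

In this toy example a single large miss yields an RMSE (1.0) twice the MAE (0.5), which is why RMSE is the more sensitive of the two to occasional large prediction errors.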

How to Cite
Nwafor, E. O., & Akintayo, F. O. (2024). Predicting Trip Purposes of Households in Makurdi Using Machine Learning: A Comparative Analysis of Decision Tree, CatBoost, and XGBoost Algorithms. Engineering Applications, 3(3), 260–274. Retrieved from https://publish.mersin.edu.tr/index.php/enap/article/view/1605
