Predicting Startup Success Using Tree-Based Machine Learning Algorithms

Saifur Rohman Cholil; Rahmat Gernowo; Catur Edi Widodo; Adi Wibowo; Budi Warsito; Alauddin Maulana Hirzan

doi:10.22456/2175-2745.133375

Authors

Saifur Rohman Cholil Universitas Semarang https://orcid.org/0000-0002-2969-980X
Rahmat Gernowo Universitas Diponegoro https://orcid.org/0000-0002-2409-7295
Catur Edi Widodo Universitas Diponegoro
Adi Wibowo Universitas Diponegoro
Budi Warsito Universitas Diponegoro
Alauddin Maulana Hirzan Universitas Semarang https://orcid.org/0000-0002-2486-6787

Keywords:

Benchmark, Prediction, Startup Tech, Tree-based Algorithms

Abstract

Startups are an important element of today’s digital economy. Growing interest in startups as a source of innovation and economic growth has prompted many studies to identify the factors that influence startup success. One challenge in predicting startup success is the diversity and complexity of the data. This research aims to identify the tree-based algorithm that achieves the highest accuracy in predicting startup success. The study compares tree-based methods in three groups: a single estimator (decision tree), bagging ensembles (bagging, random forest, and Extra Trees), and boosting ensembles (AdaBoost, gradient boosting, LGBM, and XGBoost). The models are tested using evaluation metrics such as accuracy, classification model formation, and the confusion matrix. The results show that eXtreme Gradient Boosting (XGBoost) is the most effective of the tree-based algorithms compared for predicting startup success, achieving an accuracy of 88.1%. Analysis of the model test results also identifies the factors that most influence startup success, which can help startup entrepreneurs and investors improve business strategy, decision-making, and performance.
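To make two of the evaluation measures named in the abstract concrete, the following is a minimal sketch of how a binary confusion matrix and accuracy can be computed from predictions. It is not the authors' code: the function names, the 1 = success / 0 = failure label encoding, and the sample labels are all assumptions chosen for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    # Return all four cells, including any that are zero.
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def accuracy(cm):
    """Share of correct predictions: (TP + TN) / all predictions."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


# Hypothetical labels (assumption): 1 = startup succeeded, 0 = startup failed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(f"accuracy = {accuracy(cm):.3f}")
```

Applying the same two functions to the predictions of each classifier on a shared held-out test set gives the kind of side-by-side comparison the abstract describes.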
