Catboost Algorithm Application in Legal Texts and UN 2030 Agenda




Natural Language Processing, Legal Text Classification, Machine Learning, UN 2030 Agenda


This article evaluates the CatBoost algorithm for the automatic classification of legal texts according to the United Nations (UN) 2030 Agenda and its Sustainable Development Goals (SDGs). The task consists of labeling texts from initial petitions and rulings by identifying topics related to the objectives of the 2030 Agenda, which include sustainable development, quality education, gender equality, and environmental preservation, among other topics of interest to UN member countries. The work aims to support Judicial System employees in case management, an activity that is manual and repetitive. Because CatBoost can combine textual, numerical, and categorical features in a single classification model, the proposed approach supplements the text with traditional metadata about legal proceedings, such as the Supreme Court Class and Field of Law. The main contributions of this work are an analysis of metadata in machine learning workflows and an evaluation of the CatBoost algorithm for text classification in legal contexts.
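As a rough illustration of how text and metadata can share one model, the sketch below arranges each case as a row holding the raw text, the two categorical metadata fields, and one numerical column, then passes the column roles to CatBoost. It is not the authors' pipeline: the `LegalCase` record, the `n_pages` numerical feature, the integer `sdg_label`, and the single-label `MultiClass` setup are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LegalCase:
    """Hypothetical record layout; field names are illustrative only."""
    text: str          # petition or ruling text
    court_class: str   # Supreme Court Class (categorical metadata)
    field_of_law: str  # Field of Law (categorical metadata)
    n_pages: float     # assumed numerical feature, not named in the article
    sdg_label: int     # assumed single SDG label per document


# Column positions inside each feature row built by to_rows().
TEXT_IDX = [0]    # free text column
CAT_IDX = [1, 2]  # categorical metadata columns


def to_rows(cases):
    """Split cases into feature rows and labels, keeping column order fixed."""
    rows = [[c.text, c.court_class, c.field_of_law, c.n_pages] for c in cases]
    labels = [c.sdg_label for c in cases]
    return rows, labels


def train(cases, iterations=200):
    """Fit one CatBoost model on text, categorical and numerical columns."""
    # Imported here so the data-preparation helpers work without CatBoost installed.
    from catboost import CatBoostClassifier, Pool

    rows, labels = to_rows(cases)
    pool = Pool(
        data=rows,
        label=labels,
        cat_features=CAT_IDX,    # handled by CatBoost's categorical encoding
        text_features=TEXT_IDX,  # tokenized by CatBoost's text processing
    )
    model = CatBoostClassifier(
        iterations=iterations,
        loss_function="MultiClass",  # assumption: one SDG label per text
        verbose=False,
    )
    model.fit(pool)
    return model
```

Calling `train(cases)` on a labeled corpus returns a fitted classifier. Text features generally need a reasonably sized corpus to build useful token dictionaries, so a handful of toy rows is unlikely to be enough.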








How to Cite

Gonçalves Freitas, L. J., Edokawa, P. S. D., Carvalho Valadares Rodrigues, T., Thomé de Farias, A. H., & Rodrigues de Alencar, E. (2023). Catboost Algorithm Application in Legal Texts and UN 2030 Agenda. Revista De Informática Teórica E Aplicada, 30(2), 51–58.



Regular Papers