Catboost Algorithm Application in Legal Texts and UN 2030 Agenda

Lucas José Gonçalves Freitas; Pamella Sada Dias Edokawa; Thaís Carvalho Valadares Rodrigues; Ariane Hayana Thomé de Farias; Euler Rodrigues de Alencar

doi:10.22456/2175-2745.128836

Authors

Lucas José Gonçalves Freitas Universidade de Brasília https://orcid.org/0000-0003-4385-6346
Pamella Sada Dias Edokawa Secretaria de Gestão Estratégica, Supremo Tribunal Federal https://orcid.org/0000-0001-7387-8533
Thaís Carvalho Valadares Rodrigues Universidade de Brasília
Ariane Hayana Thomé de Farias Corregedoria Geral de Justiça, Tribunal de Justiça de Roraima https://orcid.org/0000-0003-1571-8739
Euler Rodrigues de Alencar Secretaria de Gestão Estratégica, Supremo Tribunal Federal https://orcid.org/0000-0002-7648-6102

Keywords:

Natural Language Processing, Legal Text Classification, Machine Learning, UN 2030 Agenda

Abstract

This article evaluates the Catboost algorithm for the automatic classification of legal texts according to the Sustainable Development Goals (SDGs) of the United Nations (UN) 2030 Agenda. The task consists of labeling the texts of initial petitions and rulings by identifying topics related to the goals of the 2030 Agenda, which include sustainable development, quality education, gender equality, and environmental preservation, among other topics of interest to UN member countries. This work aims to help Judicial System employees with case management, an activity that is manual and repetitive. Because the Catboost algorithm can combine textual, numerical, and categorical features in the same classification model, the proposed approach adds traditional metadata about legal proceedings, such as the Supreme Court Class and Field of Law, to the classifier. The main contributions of this work are an analysis of metadata in machine learning workflows and an evaluation of the Catboost algorithm for text classification in legal contexts.
