Group Labeling Methodology Using Distance-based Data Grouping Algorithms


  • Francisco Imperes Filho Federal University of Piaui
  • Vinicius Ponte Machado Departamento de Computação, Universidade Federal do Piauí, Brasil
  • Rodrigo de Melo Souza Veras Departamento de Computação, Universidade Federal do Piauí, Brasil
  • Kelson Romulo Teixeira Aires Departamento de Computação, Universidade Federal do Piauí, Brasil
  • Aline Montenegro Leal Silva Centro de Educação Aberta e a Distância, Universidade Federal do Piauí, Brasil



Data Labeling, Data Definition, Data Clustering, Machine Learning


Clustering algorithms are often used to form groups based on the similarity of their members. In this context, understanding a group is just as important as knowing its composition. Identifying, or labeling, groups can assist with their interpretation and, consequently, guide decision-making by taking into account the features of each group. Interpreting groups is useful when it is necessary to know what makes an element part of a given group, what the main features of a group are, and what the differences and similarities among groups are. This work describes a method for finding relevant features and generating labels for the elements of each group, uniquely identifying the groups. In this way, our approach addresses the problem of finding relevant definitions that can identify groups. The proposed method transforms the standard output of an unsupervised distance-based clustering algorithm into a Pertinence Degree (GP), where each element of the database receives a GP with respect to each formed group. The elements and their GPs are then used to formulate ranges of values for their attributes, and these ranges identify the groups uniquely. The labels produced by this approach averaged 94.83% correct answers on the analyzed databases, allowing a natural interpretation of the generated definitions.
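The pipeline summarized in the abstract (clustering output, then a GP per element and group, then attribute ranges per group) can be illustrated with a minimal Python sketch. The abstract does not state how the GP is computed, so the normalized inverse-distance formula below is an assumption, as are the function names `pertinence_degrees` and `label_ranges`, the `threshold` value of 0.6, and the toy data; the centroids stand in for the output of any distance-based clustering algorithm.

```python
import math


def pertinence_degrees(point, centroids):
    """Return one GP per group for `point`.

    ASSUMPTION: GP is taken as the inverse distance to each centroid,
    normalized so the values sum to 1. The paper's formula may differ.
    """
    dists = [math.dist(point, c) for c in centroids]
    zeros = dists.count(0.0)
    if zeros:
        # The point coincides with one or more centroids: share full
        # membership among them and give 0 to every other group.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    inv = [1.0 / d for d in dists]
    total = sum(inv)
    return [w / total for w in inv]


def label_ranges(points, centroids, threshold=0.6):
    """Build a (min, max) range per attribute for every group.

    Only elements whose highest GP reaches `threshold` (a hypothetical
    cut-off) contribute to the range of their best-matching group.
    """
    members = {g: [] for g in range(len(centroids))}
    for p in points:
        gp = pertinence_degrees(p, centroids)
        best = max(range(len(gp)), key=gp.__getitem__)
        if gp[best] >= threshold:
            members[best].append(p)
    # zip(*pts) iterates over attributes (columns) of the kept elements.
    return {g: [(min(col), max(col)) for col in zip(*pts)]
            for g, pts in members.items()}


# Toy data: two well-separated groups plus one ambiguous element (5.0, 5.0).
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4),
          (8.0, 8.0), (9.0, 9.5), (8.5, 8.2), (5.0, 5.0)]
centroids = [(1.2, 1.5), (8.5, 8.6)]  # e.g. produced by k-means

for group, ranges in label_ranges(points, centroids).items():
    print(group, ranges)
```

In this sketch the ambiguous element has a GP close to 0.5 for both groups, falls below the cut-off, and is left out, so the resulting ranges do not overlap and each group can be described by its attribute intervals alone.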



Author Biography

Francisco Imperes Filho, Federal University of Piaui

Computer Science

Artificial Intelligence

Machine Learning






How to Cite

Filho, F. I., Machado, V. P., Veras, R. de M. S., Aires, K. R. T., & Montenegro Leal Silva, A. (2020). Group Labeling Methodology Using Distance-based Data Grouping Algorithms. Revista De Informática Teórica E Aplicada, 27(1), 48–61.



Regular Papers
