Relative Scalability of NoSQL Databases for Genotype Data Manipulation

Arthur Lorenzi Almeida, Vinícius Junqueira Schettino, Thiago Jesus Rodrigues Barbosa, Pedro Fernandes Freitas, Pedro Gabriel Silva Guimarães, Wagner Arbex

Abstract


Genotype data manipulation is one of the greatest challenges in bioinformatics and genomics, mainly because of the high dimensionality and unbalanced nature of the data. These peculiarities explain why Relational Database Management Systems (RDBMSs), the "de facto" standard storage solution, have not been regarded as the best tools for this kind of data. However, Big Data has been driving the development of modern database systems that might be able to overcome the deficiencies of RDBMSs. In this context, we extend our previous work on evaluating the relative performance of NoSQL engines from different families, adapting the schema design according to the conclusions of that work in order to achieve better performance and thus store more SNP markers for each individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical SNP sequences. The results indicate that although Tarantool has the best overall throughput, MongoDB is less affected by an increase in the number of SNP markers per individual.


Keywords


Database; NoSQL; Bioinformatics; Data Science; SNP; Genotype


DOI: https://doi.org/10.22456/2175-2745.79334

Copyright (c) 2018 Arthur Lorenzi Almeida, Vinícius Junqueira Schettino, Thiago Jesus Rodrigues Barbosa, Pedro Fernandes Freitas, Maurício Henrique Laier, Pedro Gabriel Silva Guimarães, Wagner Arbex

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.