2024-03-29T07:05:00Z
https://seer.ufrgs.br/index.php/rita/oai
oai:seer.ufrgs.br:article/79333
2018-07-18T00:56:21Z
rita:BCB
driver
A Genetic Programming Model for Association Studies to Detect Epistasis in Low Heritability Data
Ribeiro, Igor Magalhães
Borges, Carlos Cristiano Hasenclever
Silva, Bruno Zonovelli
Arbex, Wagner
Bioinformatics
GWAS
SNP
Genetic Programming
Random Forest
Computational Modeling
Mathematical Modeling
Genome-wide association studies (GWAS) aim to identify the markers that most influence phenotype values. One of the major challenges is finding a non-linear mapping between genotype and phenotype, known as epistasis, which usually makes searching for and identifying functional SNPs more complex. Some diseases, such as cervical cancer, leukemia and type 2 diabetes, have low heritability. The heritability of a sample is directly related to the proportion of phenotypic variation explained by the genotype: the lower the heritability, the greater the influence of environmental factors and the smaller the genotypic explanation. In this work, we propose an algorithm capable of identifying epistatic associations at different levels of heritability. The proposed model is an application of genetic programming in which the initial population is built by a specialized initialization based on a random forest strategy. This initialization ranks the most important SNPs, increasing the probability that they are inserted into the initial population of the genetic programming model. The presented model is expected to recover the causal markers robustly across heritability levels. The simulated experiments are of the case-control type, with heritability levels of 0.4, 0.3, 0.2 and 0.1, in scenarios with 100 and 1000 markers. Our approach was compared with the GPAS software and with a genetic programming algorithm without the initialization step. The results show that an efficient population initialization method based on a ranking strategy is very promising compared to the other models.
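The ranking-based initialization described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the SNP importance scores are taken as given (in practice they might come from a random forest, for example scikit-learn's `feature_importances_`), the function set `FUNCTIONS` and the helpers `rank_weights`, `grow_tree`, `initial_population` and `terminals` are hypothetical, and each SNP's chance of being drawn as a tree terminal is assumed to be proportional to its importance rank.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, Union

# Hypothetical binary function set for the internal nodes of a GP tree.
FUNCTIONS = ("AND", "OR", "XOR")

# A tree is either a SNP index (leaf) or (operator, left subtree, right subtree).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def rank_weights(importances: Sequence[float]) -> List[float]:
    """Map importance scores to rank weights: least important -> 1, most -> n."""
    order = sorted(range(len(importances)), key=lambda i: importances[i])
    weights = [0.0] * len(importances)
    for rank, snp in enumerate(order, start=1):
        weights[snp] = float(rank)
    return weights


def grow_tree(weights: Sequence[float], depth: int, rng: random.Random,
              p_leaf: float = 0.3) -> Tree:
    """Grow a random tree whose SNP terminals are drawn with the given weights."""
    if depth == 0 or rng.random() < p_leaf:
        # Weighted draw: higher-ranked SNPs are more likely to become terminals.
        return rng.choices(range(len(weights)), weights=weights, k=1)[0]
    op = rng.choice(FUNCTIONS)
    return (op,
            grow_tree(weights, depth - 1, rng, p_leaf),
            grow_tree(weights, depth - 1, rng, p_leaf))


def initial_population(importances: Sequence[float], size: int,
                       max_depth: int = 3, seed: int = 0) -> List[Tree]:
    """Build a GP initial population biased toward high-importance SNPs."""
    rng = random.Random(seed)
    weights = rank_weights(importances)
    return [grow_tree(weights, max_depth, rng) for _ in range(size)]


def terminals(tree: Tree) -> List[int]:
    """Collect the SNP indices used as leaves of a tree."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return terminals(left) + terminals(right)


if __name__ == "__main__":
    # Hypothetical importance scores for six SNPs (SNPs 2 and 4 dominate).
    importances = [0.01, 0.02, 0.40, 0.03, 0.35, 0.01]
    population = initial_population(importances, size=200, seed=1)
    counts = Counter(snp for tree in population for snp in terminals(tree))
    print(rank_weights(importances))
    print(counts.most_common(3))
```

With the importances above, the rank weights are [1, 3, 6, 4, 5, 2], so in every terminal draw SNP 2 is six times more likely than SNP 0 to be chosen. Weighting by rank rather than by raw importance is a design choice in this sketch: it keeps a few dominant scores from crowding out every other SNP while still favoring the top-ranked markers.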
Instituto de Informática - Universidade Federal do Rio Grande do Sul
2018-07-17
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
application/pdf
https://seer.ufrgs.br/index.php/rita/article/view/RITA-VOL-25-NR2-85
10.22456/2175-2745.79333
Revista de Informática Teórica e Aplicada; Vol. 25 No. 2 (2018); 85-92
Revista de Informática Teórica e Aplicada; v. 25 n. 2 (2018); 85-92
2175-2745
0103-4308
eng
https://seer.ufrgs.br/index.php/rita/article/view/RITA-VOL-25-NR2-85/pdf
Copyright (c) 2018 Igor Magalhaes Ribeiro; Carlos Cristiano Hasenclever Borges; Bruno Zonovelli da Silva; Wagner Arbex
oai:seer.ufrgs.br:article/79334
2018-07-18T00:56:21Z
rita:BCB
driver
Relative Scalability of NoSQL Databases for Genotype Data Manipulation
Almeida, Arthur Lorenzi
Schettino, Vinícius Junqueira
Barbosa, Thiago Jesus Rodrigues
Freitas, Pedro Fernandes
Guimarães, Pedro Gabriel Silva
Arbex, Wagner
Database
NoSQL
Bioinformatics
Data Science
SNP
Genotype
Genotype data manipulation is one of the greatest challenges in bioinformatics and genomics, mainly because of the high dimensionality and unbalanced structure of the data. These peculiarities explain why relational database management systems (RDBMSs), the de facto standard storage solution, have not been shown to be the best tools for this kind of data. However, Big Data has been driving the development of modern database systems that may be able to overcome the deficiencies of RDBMSs. In this context, we extended our previous work on evaluating the relative performance of NoSQL engines from different families, adapting the schema design according to its conclusions to achieve better performance and thus store more SNP markers per individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical SNP sequences. Results indicate that although Tarantool has the best overall throughput, MongoDB is less impacted by the increase in SNP markers per individual.
Genotype data manipulation is one of the greatest challenges in research fields such as population genetics, bioinformatics and genomics, mainly because of the high dimensionality and unbalanced structure of the data. These peculiarities explain why relational database management systems (RDBMS), the de facto standard storage solution, have not been shown to be the best tools for this kind of data. However, the advent of Big Data has been driving the development of modern database systems that may be able to overcome the deficiencies of RDBMS. In this context, we extended our previous work on evaluating the relative performance of NoSQL engines from different families, adapting the schema design according to its conclusions to achieve better performance and thus store more SNP markers per individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical genotype data (SNP markers). Results indicate that Tarantool is approximately 21.8% more efficient than MongoDB when storing 770,000 SNP markers, but MongoDB is less impacted by the increase in SNP markers per individual.
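The abstracts do not describe the adapted schema itself, so the following is only a hypothetical sketch of one way to fit more SNP markers into each individual's record. It assumes genotypes coded 0, 1 or 2 (with 3 reserved here for a missing call) and packs them four to a byte; the field names `_id`, `n_snps` and `genotypes` and the helpers `pack_genotypes`, `unpack_genotypes` and `individual_record` are illustrative and not taken from the paper. At 770,000 markers the packed payload is 192,500 bytes, well under MongoDB's 16 MB maximum BSON document size.

```python
import random
from typing import Dict, List, Sequence, Union

MISSING = 3  # hypothetical code for a missing genotype call


def pack_genotypes(genotypes: Sequence[int]) -> bytes:
    """Pack genotype codes (0, 1, 2 = allele copies; 3 = missing) into 2 bits each."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, MISSING):
            raise ValueError(f"invalid genotype code {g!r} at position {i}")
        # Four markers share one byte; marker i occupies bits 2*(i % 4) and 2*(i % 4)+1.
        out[i // 4] |= g << (2 * (i % 4))
    return bytes(out)


def unpack_genotypes(data: bytes, n_snps: int) -> List[int]:
    """Inverse of pack_genotypes for the first n_snps markers."""
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_snps)]


def individual_record(individual_id: str,
                      genotypes: Sequence[int]) -> Dict[str, Union[str, int, bytes]]:
    """Hypothetical one-record-per-individual layout for a document or tuple store."""
    return {
        "_id": individual_id,
        "n_snps": len(genotypes),
        "genotypes": pack_genotypes(genotypes),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    genotypes = [rng.choice((0, 1, 2)) for _ in range(770_000)]
    record = individual_record("animal-0001", genotypes)
    print(len(record["genotypes"]))  # 192500 bytes, versus ~770000 at one character per marker
    assert unpack_genotypes(record["genotypes"], record["n_snps"]) == genotypes
```

Keeping one compact record per individual turns each read or update in a YCSB-style workload into a single key lookup; whether this matches the layout actually benchmarked is not stated in the abstracts.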
Instituto de Informática - Universidade Federal do Rio Grande do Sul
2018-07-17
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
application/pdf
https://seer.ufrgs.br/index.php/rita/article/view/RITA-VOL-25-NR2-93
10.22456/2175-2745.79334
Revista de Informática Teórica e Aplicada; Vol. 25 No. 2 (2018); 93-100
Revista de Informática Teórica e Aplicada; v. 25 n. 2 (2018); 93-100
2175-2745
0103-4308
eng
https://seer.ufrgs.br/index.php/rita/article/view/RITA-VOL-25-NR2-93/pdf
Copyright (c) 2018 Arthur Lorenzi Almeida, Vinícius Junqueira Schettino, Thiago Jesus Rodrigues Barbosa, Pedro Fernandes Freitas, Maurício Henrique Laier, Pedro Gabriel Silva Guimarães, Wagner Arbex