A gene based bacterial whole genome comparison toolkit

Luciano Antonio Digiampietri, Vivian Mayumi Yamassaki Pereira, Geraldo José Santos-Júnior, Giovani Sousa-Leite, Priscilla Koch Wagner, Leandro Márcio Moreira, Caio Rafael do Nascimento Santiago


Most computational biology analyses rely on comparing genomic features. Nucleotide and amino acid sequence alignments are frequently used for gene function identification and genome comparison. Despite their widespread use, these alignments have limitations in their analysis capabilities that need to be considered but are often overlooked by, or unknown to, many researchers. This paper presents a gene-based whole-genome comparison toolkit that can be used not only as an alternative and more robust way to compare a set of whole genomes, but also to understand the trade-offs of using local sequence alignment in this kind of comparison. A case study was performed on fifteen whole genomes of the genus Xanthomonas. The results were compared with the phylogeny of the 16S rRNA-processing protein RimM, and some thresholds for the use of sequence alignments in this kind of analysis are discussed.


Bioinformatics; Whole genome; Genome comparison; Phylogeny; Pangenome; Genome visualization




DOI: https://doi.org/10.22456/2175-2745.84814

Copyright (c) 2019 Luciano Antonio Digiampietri, Vivian Mayumi Yamassaki Pereira, Geraldo José Santos-Júnior, Giovani Sousa-Leite, Priscilla Koch Wagner, Leandro Márcio Moreira, Caio Rafael do Nascimento Santiago

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.