On the Influence of Latent Semantic Analysis Parameterization for Bug Localization
The bug localization problem has benefited from modern information retrieval techniques such as Latent Semantic Analysis (LSA). Many factors influence the quality of the results of this approach, including stop-words, term-document matrix transformations, dimensionality reduction, and the filtering criteria applied to the corpus. In this paper, we study how different combinations of these factors affect the accuracy of the query results of the technique when it is applied to bug localization. Bugs from three real-world software systems were analyzed with different combinations of input parameters for the LSA technique. Our results suggest that the term-document matrix transformations and the corpus filtering criteria have a major influence on the quality of the results, and that combining individually adequate parameter values does not necessarily produce the best combination. Furthermore, the observed results suggest some general guidance for parameterizing the LSA technique for bug localization.
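To make the parameters named above concrete, the following is a minimal sketch of an LSA retrieval pipeline, not the configuration evaluated in this paper. It assumes NumPy is available, and every name in it (`build_lsa`, `rank_documents`, `DEFAULT_STOP_WORDS`, the toy documents) is hypothetical. The stop-word list, the choice between raw counts and a TF-IDF weighting, and the number of retained dimensions `k` are exposed as arguments; corpus filtering is left out. Folding the query in as q^T U_k S_k^{-1} and scoring by cosine similarity is one common convention, assumed here for illustration.

```python
import re
import numpy as np

# Hypothetical stop-word list; a real study would vary this set.
DEFAULT_STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "on", "the", "to", "when"})


def tokenize(text, stop_words):
    """Lowercase, split on non-letters, and drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]


def build_lsa(docs, k, stop_words=DEFAULT_STOP_WORDS, use_tfidf=True):
    """Build a rank-k LSA model from a list of document strings."""
    tokens = [tokenize(d, stop_words) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-document matrix of raw counts (rows: terms, columns: documents).
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokens):
        for t in doc:
            counts[index[t], j] += 1.0

    # Matrix transformation: TF-IDF weighting, or raw counts if disabled.
    if use_tfidf:
        df = np.count_nonzero(counts, axis=1)
        weights = np.log(len(docs) / df) + 1.0
    else:
        weights = np.ones(len(vocab))
    matrix = counts * weights[:, None]

    # Dimensionality reduction: keep the k largest singular triplets.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))
    return {"index": index, "weights": weights, "stop_words": stop_words,
            "u": u[:, :k], "s": s[:k], "doc_vecs": vt[:k].T}


def rank_documents(query, model):
    """Return (document index, cosine score) pairs, best match first."""
    q = np.zeros(len(model["index"]))
    for t in tokenize(query, model["stop_words"]):
        if t in model["index"]:  # terms unseen in the corpus are ignored
            q[model["index"][t]] += 1.0
    q *= model["weights"]

    # Fold the query into the reduced space: q^T U_k S_k^{-1}.
    q_hat = (q @ model["u"]) / model["s"]
    docs = model["doc_vecs"]
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    # Documents with (near-)zero reduced vectors get a score of 0.
    scores = np.divide(docs @ q_hat, norms,
                       out=np.zeros(len(docs)), where=norms > 1e-12)
    order = np.argsort(-scores)
    return [(int(j), float(scores[j])) for j in order]


# Toy corpus standing in for source files (illustrative only).
corpus = [
    "parse token syntax tree grammar",
    "socket connect network timeout retry packet",
    "draw pixel color frame",
]
model = build_lsa(corpus, k=2)
ranking = rank_documents("Crash on network timeout when the socket is retried", model)
```

Here a bug report plays the role of the query and each source file is a document; `ranking` lists the files from most to least similar. Changing `k`, `use_tfidf`, or `stop_words` and comparing the resulting rankings is the kind of parameter sweep the study describes.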