Scientific metrics on bibliometric studies: detection of outliers for univariate data
DOI:
https://doi.org/10.19132/1808-5245230.254-273
Keywords:
Outliers. Exploratory Data Analysis. Asymmetry. Bibliometry. Univariate.
Abstract
This study presents formulas for detecting outliers in univariate data that take into account both positive and negative asymmetry of the data. The new formulation is based on Exploratory Data Analysis and is evaluated by comparing its results with those of the Exploratory Data Analysis rules found in statistical textbooks and statistical software, which, however, apply only to normal (Gaussian) distributions, i.e., symmetric or slightly asymmetric data. Real data published in two scientific papers on metrics are used for the simulation. For moderate or strong positive (negative) asymmetries, the new formulation detects a lower (higher) number of upper outliers. It is important to take the existence of outliers in bibliometric data into account; quantifying their influence on statistical calculations, such as the mean and standard deviation, is recommended.
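For orientation, the conventional Exploratory Data Analysis rule that the abstract refers to can be sketched as follows. This is only the textbook boxplot rule (fences at 1.5 times the interquartile range beyond the quartiles), not the asymmetry-aware formulation proposed in the study, which the abstract does not state. The function names, the quartile interpolation method, and the citation counts are assumptions made up for illustration.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences of the textbook boxplot rule.

    Assumption: quartiles use the "inclusive" interpolation method;
    textbooks and software differ slightly on this choice.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using the fences above."""
    lower, upper = tukey_fences(values, k)
    inliers = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return inliers, outliers


# Hypothetical citation counts, invented for illustration only.
citations = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 48]
inliers, outliers = split_outliers(citations)

print("fences:", tukey_fences(citations))
print("outliers:", outliers)
# Influence of the outliers on the mean and standard deviation.
print("all data:", statistics.mean(citations), statistics.stdev(citations))
print("without outliers:", statistics.mean(inliers), statistics.stdev(inliers))
```

Comparing the mean and standard deviation with and without the flagged values is one simple way to quantify the influence of outliers, as the abstract recommends. Because the fences are symmetric around the quartiles, this rule is the kind that fits symmetric or slightly asymmetric data.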
License
Copyright (c) 2017 Em Questão

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
Authors retain their copyright and grant the journal the right of first publication, with the work licensed under the Creative Commons Attribution License (CC BY 4.0), which allows the work to be shared with acknowledgment of authorship.
Authors may enter into separate, additional contracts for the non-exclusive distribution of the version of the work published in this journal (for example, depositing it in an institutional repository), provided that its initial publication in this journal is acknowledged.
The articles are open access and free. In accordance with the license, you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.