Retractions and post-retraction citations in the COVID-19 infodemic: is Academia spreading misinformation?

The speed of information production and the rush to publish scientific articles on COVID-19 across several knowledge areas have resulted in what is known as an infodemic in the scientific field as well, potentially producing inaccurate information and sources of misinformation in scholarly communication. As a result, some articles have been retracted or withdrawn due to unintentional errors or deliberate misconduct, yet they continue to be cited. This article (i) gives an overview of the retracted COVID-19 articles and preprints, and (ii) analyzes a set of post-retraction citations in the context of the COVID-19 infodemic. We analyzed 56 retracted articles and preprints drawn from the list in the “retracted coronavirus (COVID-19) papers” section of the Retraction Watch (RW) webpage. We found that 64.3% of these retractions were articles published in journals, 33.9% were uploaded to preprint servers, and 1.8% were conference papers. We also analyzed 162 eligible articles out of 612 records identified through the Google Scholar search engine. We found that an article from The Lancet continued to be cited even after being retracted. In this case, we identified 214 post-retraction citations, of which 38% were negative (n=81), 32% were neutral (n=69), and 30% were positive (n=64).


INTRODUCTORY REMARKS
In the ongoing pandemic of the coronavirus disease 2019 (COVID-19), the need for fast responses from the scientific community to the catastrophic effects of the virus led researchers and publishers to speed up scientific publications and peer-review processes (Else 2020; Palayew et al. 2020). As of this writing, the World Health Organization (WHO) COVID-19 database had already screened 247,100 records, including scientific articles, preprints, and reports from around the world and from various knowledge areas. The speed of information production on the pandemic has resulted in an informational epidemic, known as an infodemic, potentially producing inaccurate information and sources of misinformation in scholarly communication (Zarocostas 2020).
Furthermore, this rush could lead to problems in public health (Steen 2011), as well as define and redefine some countries' research agendas and investments. In this context, some articles have already been corrected, retracted, or withdrawn. Compared to the number of publications, the number of retractions is not substantial; nonetheless, because many of the retractions are widely reported in the media, they raise questions of trust in science.
In this scenario, one of the concerns in Academia is that, despite being retracted due to research misconduct, some of these articles continue to be cited, in some cases in a positive light (Bar-Ilan, Halevi 2017; Teixeira da Silva, Bornemann-Cimenti 2017; Teixeira da Silva, Dobránszki 2017; Schneider et al. 2020; Santos-d'Amorim et al. 2021, in press). Indeed, the effect of persistent misinformation in scholarly communication through post-retraction citations should be a matter of concern. Considering that scientific knowledge must be built on solid foundations, and considering Bornmann and Daniel's (2008) statement that "[s]cientists have complex citing motives which, depending on the intellectual and practical environment, are variously socially constructed", it is important to understand the citing behavior in short-term post-retraction citations in the current context (Moed et al. 1985; Bornmann, Daniel 2008).
Thus, this study provides (i) an overview of the retracted COVID-19 articles and preprints and (ii) an analysis of the post-retraction citations in the context of the COVID-19 infodemic. In this arena, researchers are committed to solving the most diverse problems caused by the new coronavirus. In addition to the research developed in the different scientific fields and the medical sphere, researchers from the social, cognitive, computational, and information sciences have dedicated themselves to understanding informational problems associated with the social media dilemma and its potential to undermine the efforts of institutions that struggle to control the destructive effects of the virus on society (Marin 2020; Pennycook et al. 2020; Quinn, Fazel, Peters 2020; Stephens 2020).
On this matter, Bramstedt (2020, p. 804) states that "[r]esearch normally occurs at the speed of a marathon, but during a pandemic, the pace is more like a sprint". An example of this is the COVID-19 Primer database (https://covid19primer.com/). Through its daily mapping of journals indexed by PubMed and of preprints uploaded to the bioRxiv, medRxiv, and arXiv servers, it offers a comprehensive view of COVID-19 research. Figure 1 shows the increase in publications in the period between Jan 21, 2020 and Jan 15, 2021.
With this in mind, we can say that this "sprint" to publish research results has led to an infodemic, allowing research with potentially flawed methodologies, misconduct, and misinformation in general to go public. Lazer (2020, p. 434) defined health misinformation as "information that is contrary to the epistemic consensus of the scientific community regarding a phenomenon". In this research, we show that retracted articles and preprints are potential misinformation sources because they publish results based on flawed research. Therefore, in this context, is Academia contributing to generating infodemic and misinformation? In this article, the concept of misinformation (i.e., misleading or inaccurate information) is used in its broader sense because the focus is not on 'intent', given that articles are retracted or withdrawn due to both intentional misconduct and unintentional errors (footnote 6).
2 According to the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, available at https://bit.ly/3aaJecn (accessed Apr 14, 2021).
3 As pointed out by Dr. Tedros Adhanom Ghebreyesus, World Health Organization Director-General, who said that "we're not just fighting an epidemic; we're fighting an infodemic" at the Munich Security Conference (Feb 15, 2020). Available at: https://www.who.int/dg/speeches/detail/munich-security-conference (accessed Jan 6, 2021).
4 Available at: https://www.oed.com/view/Entry/88407009 (accessed Jan 7, 2021).

DATA COLLECTION AND METHODS
For the first research objective, the data were retrieved from the "retracted coronavirus (COVID-19) papers" section of the Retraction Watch (RW) webpage (https://retractionwatch.com/retracted-coronavirus-covid-19-papers), accessed on January 3, 2021. Retractions due to journal error, retracted and reinstated articles, and expressions of concern were not considered. Thus, 60 documents composed the corpus, and the following information was extracted and organized for analysis: 1) document typology; 2) reasons for retraction; 3) science field; 4) journal name and Journal Impact Factor (JIF); and 5) author's affiliation (country).
5 For disinformation and malinformation disambiguations, see Fallis (2009, 2014), Karlova, Fisher (2013) and Wardle and Derakhshan (2018).
6 Research misconduct occurs when there is "an intentional (or deliberate) deviation from accepted norms of scientific behavior" while "[d]eviations that are unintended (or accidental) are regarded as honest error, not misconduct" (Resnik, Stewart 2011, p. 2).
Each document was manually checked. In this process, we found that two articles included in the corpus had been retracted due to editor error and that two others had already been corrected and republished. We therefore excluded these four articles, leaving a final corpus of 56 documents.
For the second research objective, Figure 2 summarizes the data collection process.
[Figure 2. Data collection process. Source: research data (2020).]
First, we retrieved all documents citing the article by Mehra, Desai, Ruschitzka, and Patel by using the Google Scholar search engine on October 11, 2020, and found 612 records.

Second, for the sake of accuracy, we manually checked the receiving date of each article to separate the ones that were submitted before and after the retraction notice.
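The manual separation described above amounts to comparing each citing article's received date with the date of the retraction notice. A minimal sketch in Python follows, assuming the retraction date of June 4, 2020 reported for the Lancet article. The record fields (`title`, `received`), the sample records, and the choice to treat an article received on the retraction date itself as submitted before it are illustrative assumptions, not part of the workflow actually used in this study.

```python
from datetime import date

# Retraction date of the Lancet article, as reported in this study.
RETRACTION_DATE = date(2020, 6, 4)


def split_by_received_date(records, cutoff=RETRACTION_DATE):
    """Split citing records into (before, after, undated) relative to cutoff.

    Each record is a dict with a hypothetical "received" key holding a
    datetime.date, or None when no date could be found. Records received
    on the cutoff day itself are counted as "before" (an assumption).
    """
    before, after, undated = [], [], []
    for record in records:
        received = record.get("received")
        if received is None:
            undated.append(record)
        elif received > cutoff:
            after.append(record)
        else:
            before.append(record)
    return before, after, undated


# Hypothetical sample records, for illustration only.
sample = [
    {"title": "Citing article A", "received": date(2020, 5, 28)},
    {"title": "Citing article B", "received": date(2020, 6, 20)},
    {"title": "Citing article C", "received": None},
]

before, after, undated = split_by_received_date(sample)
print(len(before), len(after), len(undated))
```

Keeping undated records in a separate group, rather than silently assigning them to either side, leaves them available for the kind of manual check the study relied on.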
We then analyzed the context of each citation in the text, considering that the "citation context is defined by the words which are located around a specific citation" (Bornmann, Haunschild, Hug 2018). Each citation was classified using the categorization proposed by Bar-Ilan and Halevi (2017), who classify post-retraction citations as:

1. "Positive: A positive citation indicates that the retracted article was cited as legitimate prior work and its findings used to corroborate the author/s current study.
2. Negative: A negative citation indicates that the authors mentioned the retracted article as such and its findings inappropriate.
3. Neutral: A neutral citation indicates that the retracted article was mentioned as a publication that appears in the literature and does not include judgement on its validity".

RESULTS AND DISCUSSION
This section is presented in two parts. Part I gives an overview of the retracted COVID-19 articles and preprints, and Part II presents an analysis of the post-retraction citations in the context of the COVID-19 infodemic.

Part I: an overview of the COVID-19 retracted articles and preprints
Within the scope of COVID-19 research, Bramstedt (2020) found that 19 articles and 14 preprints were retracted or withdrawn up to July 2020. As of January 3, 2021, the Retraction Watch list had already traced 60 retracted documents. To offer an overview of the retracted COVID-19 articles and preprints, in addition to investigating the reasons for retraction, we also analyzed some characteristics of these documents, as detailed in Table 1. According to the data collected, 64.3% of these retractions were articles published in journals and 33.9% were uploaded to preprint servers. Of all the retracted articles, 30.6% were published in journals with a Journal Impact Factor (JIF) between 2 and 5. After checking each retraction notice, among the 32 retracted documents (articles and preprints) with a stated reason, we were able to identify only 11 reasons for retraction (Table 2); in the remaining 24 documents (17 articles, 6 preprints, and one conference paper) the reasons were not disclosed.
Of the 17 articles with undisclosed reasons, 13 were published in Elsevier journals, which is cause for concern because the notices do not provide clear information about the retraction, as recommended by the retraction guidelines of the Committee on Publication Ethics (COPE) (Wager 2009). The standard statement in these 13 cases is "[t]his article has been withdrawn at the request of the authors and the editors. The Publisher apologizes for any inconvenience this may cause".
[Table 2 (excerpt). Reasons for retraction and number of documents.]
Lack of consent for an institutional review board (IRB): 1
"Serious scientific fraud": 1
"The wrong paper has been published due to some technical glitch. The information pertaining in this paper is misleading the readers and creating massive conflicts amid the scientific community": 1
Besides the 17 instances where no reason for the retraction was given, we found recurrent cases of duplicate publication, that is, papers that had already been published in part or in full elsewhere (n= 4), plagiarism (n= 3), raw data not made available to the authors/editor/referee/auditor (n= 3), and unreliable results (n= 2). The generic terms used in the retraction notices, such as "serious scientific fraud" and "technical glitch", did not allow us to conduct a proper analysis.
Regarding the Journal Impact Factor (JIF), Figure 3 shows that, of the 36 retracted journal articles, 14% (n= 5) had been published in journals with a JIF under 2, 31% (n= 11) between 2 and 5, 8% (n= 3) between 5 and 8, and 19% (n= 7) higher than 8. We also note that 28% (n= 10) of the retractions were from journals that were not indexed in the 2019 Journal Citation Reports (JCR). These results differ from those of Fang and Casadevall (2011), who investigated the correlation between Journal Impact Factor and retractions (what they called the "retraction index") and suggested that "the probability that an article published in a higher-impact journal will be retracted is higher than that for an article published in a lower-impact journal" (p. 3856). However, the limited size of the corpus precludes drawing more precise conclusions.
[Figure 3. Retractions by Journal Impact Factor. Source: research data (2021).]
Figure 5 shows that articles written by three or more authors accounted for 75% (n= 42) of the retractions, followed by single-author articles, 16% (n= 9), and two-author articles, 5% (n= 3). In 4% (n= 2) of the cases, it was not possible to access the documents. Considering that 86% of these retractions were in the medical literature, including its specialties, it is worth noting that even in normal circumstances (i.e., not in a pandemic situation) the number of authors per article has been growing in this field. This is explained by the complexity of multidisciplinary and multi-institutional collaboration.
In addition to the percentage of retractions by number of authors per article, the authors' affiliations were considered in order to identify institutional collaboration between countries around the globe. Among the co-authored and multi-authored articles, 11% (n= 5) were written by authors from the same institution and 81% (n= 38) by authors from different institutions; 8% (n= 4) of the articles were not available for consultation. The highest incidence of interaction between countries was China-USA (n= 4). Other interactions were Italy-USA-Russia; Italy-USA; Australia-USA; Spain-Mozambique; Belgium-France-Switzerland-East Timor; and USA-Switzerland, among others, as shown in Figure 6.
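The percentages reported above for the JIF bands and for the number of authors per article can be re-derived from the raw counts. The following Python sketch does so under stated assumptions: the counts are those given in the text, while the function name `percentage_shares`, the dictionary labels, and the rounding to whole percentages are illustrative choices rather than the procedure used in the original analysis.

```python
def percentage_shares(counts):
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must sum to a positive number")
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts as reported in the text; the labels are illustrative.
jif_bands = {
    "JIF < 2": 5,
    "JIF 2-5": 11,
    "JIF 5-8": 3,
    "JIF > 8": 7,
    "not indexed in JCR 2019": 10,
}
authors_per_article = {
    "three or more authors": 42,
    "one author": 9,
    "two authors": 3,
    "document not accessible": 2,
}

for name, counts in (("JIF band", jif_bands), ("Authors", authors_per_article)):
    print(name, "total =", sum(counts.values()), percentage_shares(counts))
```

Recomputing the shares this way also confirms the denominators: the JIF bands sum to the 36 journal articles, and the author counts sum to the 56 documents of the final corpus.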
Regarding preprints, some researchers have been opting to upload their manuscripts to preprint servers because, at first glance, they offer advantages such as accessibility, faster dissemination, easy feedback (Vlasschaert, Topf, Hiremath 2020), or credit for priority in scientific discovery (Merton 1957). Although preprints have not been subject to peer scrutiny, they have been an important subject of discussion, which is why we included them in this study. Table 3 summarizes the reasons for retraction/withdrawal of the preprints analyzed. We found eight different reasons. As in the case of the research articles, preprints with non-disclosed reasons (n= 6) are the most frequent, followed by those withdrawn because of "the authors' wish to update their database" (n= 3). Unlike the research articles, we found no preprint retractions due to plagiarism or duplicate publication.
In a comment for Retraction Watch, Richard Sever, co-founder of bioRxiv and medRxiv preprint servers, clarifies that unless a preprint contains fraud or some ethical or legal violation, "[a] preprint is usually withdrawn only at an authors' request. A preprint server will not typically withdraw a flawed preprint against the wishes of the author since the server made no claim to have peer-reviewed and certified the scientific content in the first place" (Retraction… 2019 n.p.).
However, to curb the publication of potentially poor research, preprint servers have been taking measures of their own. In the case of bioRxiv, as reported by Kwon (2020), "[t]he biomedical repository would no longer accept manuscripts making predictions about treatments for COVID-19 solely on the basis of computational work", since, according to Sever, numerous speculative papers have been published using computational models.

Part II: post-retraction citations in the COVID-19 infodemic context
The example of post-retraction citations we chose to analyze is the article "Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis", published in The Lancet on May 22, 2020. The results disclosed in this paper led the WHO to temporarily suspend trials of these drugs (Offord 2020). It is therefore important to understand what happens when retracted articles are cited and whether short-term post-retraction citations in research publications on the COVID-19 pandemic suffer the effect of misinformation.
This article, at the center of what became known as "The Surgisphere Scandal" (Offord 2020; Piller 2020), was retracted on June 4, 2020 (http://doi.org/10.1016/S0140-6736(20)31180-6) at the request of three of the four authors, because they could no longer guarantee the veracity of the primary sources of the data used to support the research.
According to the retraction notice, the authors justified their action by stating that one of the co-authors, who was also the founder of Surgisphere Corporation (Chicago, IL), claimed not to be able to transfer the complete database, which would have allowed an independent review to ensure the integrity of the primary sources and the replicability of the analysis (Mehra, Ruschitzka, Patel 2020).
Chloroquine and hydroxychloroquine were persistently used as a political device, being promoted as "miraculous" drugs. Both have long been used in the treatment of malaria and autoimmune diseases and showed potential in vitro effects in the treatment and prevention of infections caused by the new coronavirus. Thus, they were seen as promising drugs and as the fastest route to a treatment for COVID-19 infection (Lei et al. 2020; Mohamed, Rezaei 2020). However, their inefficacy for the treatment of COVID-19 has been confirmed (Cavalcanti et al. 2020).
In summary, the retracted article by Mehra, Desai, Ruschitzka, and Patel reported no evidence of the effectiveness of chloroquine and hydroxychloroquine and associated their use with a risk of ventricular arrhythmias and a greater risk of in-hospital death for patients with COVID-19.
Nonetheless, our investigation found that this article continued to be cited even after the retraction. We found 214 post-retraction citations in 162 articles, some of which cite it more than once. By analyzing the citations using the Bar-Ilan and Halevi (2017) categorization, we found 38% negative citations (n= 81), 32% neutral citations (n= 69), and 30% positive citations (n= 64), as shown in Figure 7. By definition, neutral citations do not include a value judgment. They act as information devices that indicate the existence of that article in the literature, not contributing effectively to the misinformation effect. They are probably cited for social reasons, such as for having been published in a prestigious journal (see Shadish et al.). Positive citations, in turn, consider the work to be legitimate and are used to corroborate the citing study.
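Because a single citing article can contribute several citations (214 citations in 162 articles), the tally has to be made at the citation level rather than the article level. The Python sketch below illustrates one way to do this, assuming each citation has already been labelled manually with one of the three Bar-Ilan and Halevi (2017) categories. The function `tally_citations`, the `(article_id, category)` input format, and the toy data are hypothetical and only illustrate the counting step.

```python
from collections import Counter

CATEGORIES = ("positive", "negative", "neutral")


def tally_citations(labelled):
    """Tally (article_id, category) pairs at the citation level.

    Returns the count per category, whole-number percentages, and the
    number of distinct citing articles. One article may appear several times.
    """
    counts = Counter()
    articles = set()
    for article_id, category in labelled:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        counts[category] += 1
        articles.add(article_id)
    total = sum(counts.values())
    shares = {c: round(100 * n / total) for c, n in counts.items()} if total else {}
    return counts, shares, len(articles)


# Hypothetical labelled citations: article "a1" cites the retracted paper twice.
toy = [("a1", "negative"), ("a1", "neutral"), ("a2", "positive"), ("a3", "negative")]
counts, shares, n_articles = tally_citations(toy)
print(dict(counts), shares, n_articles)
```

Applied to the counts reported here (81 negative, 69 neutral, 64 positive), the same rounding yields the 38%, 32%, and 30% shares given in the text.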

CONCLUDING REMARKS
To investigate the main characteristics of the retracted COVID-19 articles and preprints and to examine the nature of a set of post-retraction citations in the context of the COVID-19 infodemic, using one paper as a case study, we adopted a methodological strategy based on information metrics studies and the in-text citation analysis technique. To enhance the understanding of the object, we analyzed the reasons for retraction and the following characteristics of the research corpus: document typology, science field, journal name and Journal Impact Factor (JIF), author's affiliation (country), and the respective collaborations through authorship analysis.
In summary, we analyzed 56 retracted articles and preprints in COVID-19 research and found that 64.3% (n= 36) of these retractions were articles published in journals, 33.9% (n= 19) were uploaded to preprint servers, and 1.8% (n= 1) were conference papers. Our analysis identified the prevalence of retraction in multi-authored articles characterized by the complexity of multidisciplinary and multi-institutional collaborations, mainly in the field of medicine and its specialties. Based on the findings presented here, we can conclude that this informational epidemic, known as an infodemic, is also present in the scientific sphere.
Concerning the reasons for retraction, the lack of transparency in retraction notices remains a matter of concern. In 17 of the 36 analyzed articles, the reasons that led to the retraction are unknown and are recorded as "reason not disclosed". The same holds for the preprints: the reason was not disclosed for 6 of the 19 retracted preprints in the analyzed corpus.
We also analyzed 214 post-retraction citations in 162 articles. Of these, we found 81 negative citations (38%), 69 neutral citations (32%), and 64 positive citations (30%). The prevalence of negative and neutral post-retraction citations may indicate that Academia is aware of the dynamics of scientific publication, is not spreading misinformation, and is, in a sense, shielded from political interference.
In conclusion, we emphasize that retraction is an efficient mechanism for correcting the scientific record that flags flaws in published research, whether due to unintentional errors or deliberate misconduct. It is also noteworthy that the rate of retracted articles can serve as an indicator for measuring the misinformation phenomenon in science, enabling one to assess trends and mitigate problems in current and future scenarios. Given the need for fast responses from the scientific community to the catastrophic effects caused by the coronavirus disease 2019 (COVID-19), more efficient and effective peer review processes can prevent faulty articles from being published and avoid, ab origine, the spread of misinformation.

FUNDING
This work was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001, and also by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

DATA AVAILABILITY
Our full dataset is available in the Zenodo repository and can be downloaded at https://doi.org/10.5281/zenodo.4695045