Chinese researchers appear to have deleted important data from a global database operated by the National Institutes of Health that could provide key insights into the origins of the COVID-19 pandemic, a preprint study claims.
An American scientist recovered the deleted data from cloud storage and published his analysis Tuesday. The paper, "Recovery of deleted deep sequencing data sheds more light on the early Wuhan SARS-CoV-2 epidemic," suggests that early virus samples from the Wuhan seafood market that until now have been the focus of most studies on the origins of the pandemic "are not fully representative of the viruses actually present in Wuhan at that time."
The paper is not yet peer-reviewed, and its findings should not be considered conclusive. The recovered virus samples do not support either the "lab leak" or the "natural origins" hypothesis for SARS-CoV-2, according to scientists who have examined the paper. But these scientists say it does suggest the virus was spreading in Wuhan earlier than the Chinese government claimed, and the paper's author, Dr. Jesse Bloom, says his findings should reinforce skepticism that China has shared all relevant data on COVID-19.
Bloom, an influenza virus expert at the Fred Hutchinson Cancer Research Center, also says his study should be a cause for hope that scientists can recover additional information about the early spread of SARS-CoV-2 without an international investigation.
In the course of his research into SARS-CoV-2, Bloom read a paper that analyzed data from a Wuhan University project that sequenced samples from 45 positive coronavirus cases from January and early February 2020. The Chinese study, which developed an improved technique to test for and diagnose COVID-19 cases, was peer-reviewed and published in June 2020.
The SARS-CoV-2 sequences obtained by the Chinese researchers were uploaded to the NIH's Sequence Read Archive (SRA), a database for storing what are essentially maps of how viruses are built. These sequences can help scientists study how a virus originated and evolved over time, and such study may lead to knowledge that can prevent the next pandemic.
But when Bloom went to the SRA to examine the Chinese sequences, he found the data had been deleted. He explained in his paper that the SRA "is designed as a permanent archive of deep sequencing data." Data can be removed only if the original researchers make an email request to have it deleted, provide reasons for doing so, and have that request approved by SRA staff.
A spokesperson for the NIH told the Telegraph that the NIH had "reviewed the submitting investigator's request to withdraw the data" in June 2020 and subsequently removed it.
"The requestor indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues," the spokesperson said. "Submitting investigators hold the rights to their data and can request withdrawal of the data."
Bloom attempted to contact the Wuhan University researchers to ask why they requested that the data be deleted, but he did not receive a response. He wrote in his paper that "there is no plausible scientific reason for the deletion" and suggested "it therefore seems likely the sequences were deleted to obscure their existence."
Fortunately, he was able to recover some of the data from Google Cloud, obtaining data for 34 early positive COVID-19 samples, and he reconstructed partial viral sequences from 13 of them.
In a Twitter thread about his paper, Bloom explained why these sequences are crucial for understanding the origins of the virus.
"Although events that led to emergence of #SARSCoV2 in Wuhan are unclear (zoonosis vs lab accident), everyone agrees deep ancestors are coronaviruses from bats," Bloom said.
"Therefore, we'd expect the first #SARSCoV2 sequences would be more similar to bat coronaviruses, and as #SARSCoV2 continued to evolve it would become more divergent from these ancestors. But that is *not* the case!" he continued.
"Instead, early Huanan Seafood Market #SARSCoV2 viruses are more different from bat coronaviruses than #SARSCoV2 viruses collected later in China and even other countries."