The origin and early spread of SARS-CoV-2 remains shrouded in mystery. Here I identify a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that has been deleted from the NIH’s Sequence Read Archive. I recover the deleted files from the Google Cloud, and reconstruct partial sequences of 13 early epidemic viruses. Phylogenetic analysis of these sequences in the context of carefully annotated existing data suggests that the Huanan Seafood Market sequences that are the focus of the joint WHO-China report are not fully representative of the viruses in Wuhan early in the epidemic. Instead, the progenitor of known SARS-CoV-2 sequences likely contained three mutations relative to the market viruses that made it more similar to SARS-CoV-2’s bat coronavirus relatives.
Who deleted it?
From The Washington Post
State Department cables warned of safety issues at Wuhan lab studying bat coronaviruses
By Josh Rogin
Two years before the novel coronavirus pandemic upended the world, U.S. Embassy officials visited a Chinese research facility in the city of Wuhan several times and sent two official warnings back to Washington about inadequate safety at the lab, which was conducting risky studies on coronaviruses from bats. The cables have fueled discussions inside the U.S. government about whether this or another Wuhan lab was the source of the virus — even though conclusive proof has yet to emerge…
The origin story is not just about blame. It’s crucial to understanding how the novel coronavirus pandemic started because that informs how to prevent the next one. The Chinese government must be transparent and answer the questions about the Wuhan labs because they are vital to our scientific understanding of the virus, said Xiao Qiang, a research scientist at the School of Information at the University of California at Berkeley.
“We don’t know whether the novel coronavirus originated in the Wuhan lab, but the cable pointed to the danger there and increases the impetus to find out,” he said.