Advancing FAIR phylogenetics for health threats
The Prestige Paradox: Why High-Impact COVID-19 Research Failed the Phylogenetics Stress Test
During the COVID-19 pandemic, phylogenetic analysis became a household concept. Evolutionary trees guided public health interventions, tracked the spread of the virus, and monitored the emergence of new variants. But the demand for rapid answers created a severe tension between speed and methodological rigor.
In a newly published paper in Cladistics, my co-author Omkar Marne and I asked a critical question: did the urgency of the pandemic compromise the quality of viral phylogenetics?
To find out, we systematically reviewed 217 SARS-CoV-2 phylogenetic studies published between January 2020 and March 2025, spanning 121 different journals. The results were alarming.
Key Vulnerabilities in Pandemic-Era Phylogenetics
Our review uncovered persistent methodological shortcomings that undermine the reproducibility and reliability of published findings. Five years after initial warnings were raised about weak phylogenetic practices in coronavirus research, the field has not course-corrected.
Here is what we found:
- Missing Outgroups: 185 out of the 217 studies omitted an outgroup. Without an outgroup, evolutionary direction cannot be reliably determined, which directly undermines inferences about viral origins and transmission directionality.
- Faulty Data Annotation: Despite 199 studies using whole-genome datasets, only 29 incorporated gene annotation before alignment. Aligning whole genomes without prior annotation can erroneously align nucleotides from completely different genes, producing spurious homology statements.
- Misinterpreted Branch Statistics: 164 studies relied predominantly on bootstrap support values. The bootstrap measures how repeatably a clade is recovered when the data are resampled, not the probability that the clade is correct.
- A Systemic Failure in Open Science: Only 9% (20 of 217 studies) fully met FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. When researchers fail to deposit data matrices, alignments, and analytical scripts, their work cannot be independently verified or extended by the scientific community.
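To make the bootstrap point concrete, here is a minimal sketch of what "repeatability under resampling" means. It is not the procedure used in our paper or in any phylogenetics package: the toy alignment is invented, and a simple "closest pair by Hamming distance" rule stands in for real tree inference, purely as an illustrative assumption. The function names (`hamming`, `closest_pair`, `bootstrap_support`) are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(alignment):
    """Return the two taxa with the smallest Hamming distance.

    This is a toy stand-in for tree inference. Ties are broken by
    taxon-name order so the result is deterministic.
    """
    pairs = combinations(sorted(alignment), 2)
    return min(
        pairs,
        key=lambda p: (hamming(alignment[p[0]], alignment[p[1]]), p),
    )


def bootstrap_support(alignment, pair, replicates=1000, seed=0):
    """Fraction of resampled alignments in which `pair` is the closest pair."""
    rng = random.Random(seed)
    n_cols = len(next(iter(alignment.values())))
    target = tuple(sorted(pair))
    hits = 0
    for _ in range(replicates):
        # Draw alignment columns with replacement, same length as the original.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = {
            taxon: "".join(seq[c] for c in cols)
            for taxon, seq in alignment.items()
        }
        if closest_pair(resampled) == target:
            hits += 1
    return hits / replicates


# Invented toy alignment, for illustration only.
toy = {
    "A": "ACGTACGTAA",
    "B": "ACGTACGTCC",
    "C": "ACGAACGTCC",
    "D": "TTGAACGGCC",
}
print(bootstrap_support(toy, ("A", "B")))
```

The value returned is the share of replicates that recover the grouping. It tells you how stable the grouping is under column resampling of this one dataset, which is a different quantity from the probability that the grouping reflects the true evolutionary history.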
The “Prestige Paradox”
Perhaps the most concerning finding is what we call the “prestige paradox”.
We tested whether journal prestige was associated with methodological rigor. Our statistical analysis revealed no significant association between a journal’s impact factor and its adherence to methodological standards. Studies that skipped gene annotation, omitted outgroups, or failed to share their data appeared with equal or greater frequency in the highest-impact journals.
In short: the most visible studies in the most prestigious venues are not necessarily the most methodologically transparent. This structural fragility represents a latent public health liability for the next outbreak.
Guidelines for Future Outbreaks
These shortcomings are widespread, but they are not inevitable. Fixing them does not require new software or complex new methods; it requires explicit standards and the will to enforce them.
To close this gap, our paper provides a straightforward, actionable checklist distributed across authors, reviewers, and editors.
By requiring researchers to document their methodological decisions—such as outgroup justification, gene annotation procedures, exact software parameters, and FAIR data deposition—we can streamline peer review and ensure that published analyses remain a trustworthy resource for future health threats.
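As a rough illustration of how such documentation could be checked mechanically, here is a minimal sketch of a machine-readable record. It is not the checklist from our paper: the class name, field names, and placeholder values are hypothetical, chosen only to mirror the four items named above.

```python
from dataclasses import dataclass, fields


@dataclass
class MethodsChecklist:
    # Hypothetical field names mirroring the items named in the text.
    outgroup_justification: str = ""
    gene_annotation_procedure: str = ""
    software_parameters: str = ""
    fair_data_deposition: str = ""


def missing_items(checklist):
    """Return the names of checklist items that were left blank."""
    return [
        f.name
        for f in fields(checklist)
        if not getattr(checklist, f.name).strip()
    ]


# Placeholder entries; a real submission would contain the actual details.
submission = MethodsChecklist(
    outgroup_justification="(rationale for the chosen outgroup)",
    software_parameters="(program, version, and full command line)",
)
print(missing_items(submission))
```

A reviewer or editorial system could run a check like this at submission time and return the list of undocumented items to the authors before peer review begins.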