Machine learning enables scalable and systematic hierarchical virus taxonomy.

Bolduc B, Zablocki O, Turner D, Bin Jang H, Guo J, Adriaenssens EM, Dutilh BE, Sullivan MB. 2025. Machine learning enables scalable and systematic hierarchical virus taxonomy. Nat Biotechnol.

Abstract

Although virus ecogenomics has expanded access to and understanding of the virosphere, existing classification tools lack taxonomic resolution and are unable to scale to modern discovery-based datasets or classify previously unknown sequence space. Here we develop vConTACT3, a machine learning-based tool that improves scalability and accuracy of virus taxonomy. By optimizing gene-sharing thresholds and leveraging adaptive, realm-specific cut-offs, vConTACT3 expands classification to both eukaryote and prokaryote viruses for four of the six officially recognized realms, and establishes accurate hierarchical taxonomy from genus to order. Specifically, vConTACT3 achieves >95% agreement with official taxonomy for 35,545 and 13,524 public prokaryotic and eukaryotic virus genomes, respectively, to surpass vConTACT2 across most realms, while still uniquely classifying previously uncharacterized taxa, and doing so even faster. vConTACT3 application provides taxonomy assignments for tens of thousands of unclassified taxa rapidly, automatically and systematically; evaluates virus sequence space to reveal support for fewer taxonomic ranks than currently available and identifies taxonomically challenging areas across the virosphere.

Links