VirJenDB: a FAIR (meta)data and bioinformatics platform for all viruses.

Saghaei S, Siemers M, Ossetek KL, Richter S, Edwards RA, Roux S, Zielezinski A, Dutilh BE, Marz M, Cassman NA (2025) VirJenDB: a FAIR (meta)data and bioinformatics platform for all viruses. Nucleic Acids Res.

Abstract

High-throughput sequencing has generated an unprecedented volume of data. However, researcher-submitted data in repositories require extensive curation and quality control before they can be reused. These tasks are hindered by the multiplicity of repositories, the sheer volume of the data, and the complexity of virus (meta)data curation. To address these challenges, VirJenDB offers a user-friendly platform that facilitates versioned, community-driven curation and ontology development. Virus sequences were ingested from 16 sources, together with ~200 fields of metadata or standards covering taxonomy, sample, and host information. Up to 85 metadata fields have undergone at least one round of curation and are linked to 15.4 million virus sequences, 88% of which come from viruses infecting eukaryotes and the remainder from viruses infecting prokaryotes. Subsets were created, including a novel collection of 0.91 million viral operational taxonomic unit (vOTU) sequences across all viruses; the original sequences belonging to each vOTU are retained to facilitate downstream analyses, e.g. of sequence variation. The VirJenDB web portal (https://www.virjendb.org) provides HTTPS and Application Programming Interface (API) access to the sequence datasets and metadata, and offers a search engine, filtering, downloads, visualizations, and documentation. VirJenDB aims to connect the phage and eukaryotic virus research communities by supporting webtool integration, meta-analyses, and metadata schema extensions.
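
To make the scale of these figures concrete, the short sketch below turns the percentages quoted in the abstract into approximate sequence counts. The totals (15.4 million sequences, 88% eukaryote-infecting, 0.91 million vOTUs) are taken from the abstract; the function name `breakdown` is illustrative only, and the mean cluster size rests on the assumption, not stated in the abstract, that the vOTU collection was derived from the full set of 15.4 million sequences.

```python
# Figures quoted in the abstract.
TOTAL_SEQUENCES = 15_400_000   # virus sequences linked to curated metadata
EUKARYOTE_FRACTION = 0.88      # share from viruses infecting eukaryotes
VOTU_COUNT = 910_000           # viral operational taxonomic units (vOTUs)


def breakdown(total, euk_fraction, votus):
    """Return (eukaryotic count, prokaryotic count, mean sequences per vOTU).

    The mean is only meaningful under the ASSUMPTION that the vOTUs were
    clustered from all `total` sequences; the abstract does not confirm this.
    """
    euk = round(total * euk_fraction)
    prok = total - euk
    mean_per_votu = total / votus
    return euk, prok, mean_per_votu


euk, prok, mean_per_votu = breakdown(TOTAL_SEQUENCES, EUKARYOTE_FRACTION, VOTU_COUNT)
print(f"eukaryote-infecting:  {euk:,}")
print(f"prokaryote-infecting: {prok:,}")
print(f"mean sequences per vOTU (assumed full-set clustering): {mean_per_votu:.1f}")
```

Under that assumption, each vOTU would group roughly 17 original sequences on average, which suggests why retaining the member sequences matters for analyses of within-vOTU variation.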

Links