Abstract
Natural microbial communities are phylogenetically and metabolically diverse. In addition to underexplored organismal groups [1], this diversity encompasses a rich discovery potential for ecologically and biotechnologically relevant enzymes and biochemical compounds [2,3]. However, studying this diversity to identify genomic pathways for the synthesis of such compounds [4] and assigning them to their respective hosts remains challenging. The biosynthetic potential of microorganisms in the open ocean remains largely uncharted owing to limitations in the analysis of genome-resolved data at the global scale. Here we investigated the diversity and novelty of biosynthetic gene clusters in the ocean by integrating around 10,000 microbial genomes from cultivated and single cells with more than 25,000 newly reconstructed draft genomes from more than 1,000 seawater samples. These efforts revealed approximately 40,000 putative, mostly new biosynthetic gene clusters, several of which were found in previously unsuspected phylogenetic groups. Among these groups, we identified a lineage rich in biosynthetic gene clusters (‘Candidatus Eudoremicrobiaceae’) that belongs to an uncultivated bacterial phylum and includes some of the most biosynthetically diverse microorganisms in this environment. From these, we characterized the phosphopeptin and pythonamide pathways, revealing cases of unusual bioactive compound structure and enzymology, respectively. Together, this research demonstrates how microbiomics-driven strategies can enable the investigation of previously undescribed enzymes and natural products in underexplored microbial groups and environments.
Summary
Microbes are phylogenetically and metabolically diverse. Yet capturing this diversity, assigning functions to host organisms and exploring the biosynthetic potential in natural environments remain challenging. We reconstructed >25,000 draft genomes, including genomes from >2,500 uncharacterized species, from globally distributed ocean microbial communities, and combined them with ∼10,000 genomes from cultivated and single cells. Mining this resource revealed ∼40,000 putative biosynthetic gene clusters (BGCs), many from unknown phylogenetic groups. Among these, we discovered ‘Candidatus Eudoremicrobiaceae’, whose members are among the most biosynthetically diverse microbes detected to date. Discrete transcriptional states structuring natural populations were associated with a potentially niche-partitioning role for BGC products. Together with the characterization of the first Eudoremicrobiaceae natural product, this study demonstrates how microbiomics enables prospecting for candidate bioactive compounds in underexplored microbes and environments.