A 97% identity in 16S rRNA gene sequences is commonly used to group “species-level” phylotypes [1, 11, 12]. A 3% variation within a short hypervariable region of the small subunit (SSU) rRNA gene may not correlate exactly with a 3% variation along the entire SSU rRNA gene. In fact, the correlation between genetic differences may well
vary among regions of the gene and among classes of organisms. However, most microbial diversity projects to date have used 3% OTUs [1, 13, 14], and to be consistent with other research using pyrosequencing sequences, we have chosen to use 3% OTUs as well. We have also
clustered sequences into OTUs using more conservative genetic differences of 6% and 10% (Table 1, Additional file 2, Additional file 3). In the remainder of the text, however, we refer only to OTUs at the 3% difference. These OTUs were grouped into 112 higher taxa (Additional file 4) consisting of 78 genera and 34 more inclusive taxa (e.g., family, order, class), representing eight bacterial phyla (Table 2). The size of the OTUs (number of reads per OTU) correlated significantly (p < 0.001; Spearman's rho 0.930) with the number of unique
sequences within an OTU (Figure 1), i.e., the most abundant OTUs harboured the highest counts of unique sequences. An obvious outlier was one abundant OTU (0.9% of all reads), classified as Fusobacterium, which contained only three unique sequences. Six other abundant OTUs (1.4 – 6.7% of all reads) contained more than 140 (range 145 – 265) unique sequences each. Four of these OTUs were assigned to the genus Streptococcus (OTU IDs 803, 165, 230 and 262), one to the genus Corynebacterium (ID 145), and one to the genus Neisseria (ID 637). Two-thirds of all OTUs contained a single sequence; however, these were low-abundance OTUs (5 – 49 reads), together contributing just 0.7% of all reads (Figure 1, Additional file 1).

Figure 1 The size of OTU clusters and the number of unique sequences per cluster. The number of reads within each OTU (sequences that clustered at 3% genetic distance level) and the number of unique sequences per OTU are plotted in the rank order of OTU cluster size (high to low).
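
The rank correlation between OTU size and the number of unique sequences per OTU can be illustrated with a minimal sketch. It assumes the standard definition of Spearman's rho (the Pearson correlation of the ranks, with tied values given their average rank) and is not the software used in this study. The function names and the toy counts below are hypothetical and are not the study's data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (NOT the study's values): reads per OTU and the
# number of unique sequences in the same OTU, including one abundant OTU
# with very few unique sequences and several single-sequence OTUs.
reads_per_otu = [5200, 3100, 900, 450, 120, 49, 12, 5]
unique_seqs_per_otu = [265, 180, 3, 40, 15, 1, 1, 1]

rho = spearman_rho(reads_per_otu, unique_seqs_per_otu)
print(f"Spearman's rho = {rho:.3f}")
```

Because the statistic depends only on ranks, a single abundant OTU with few unique sequences lowers rho but does not dominate it, which is consistent with a strong overall correlation despite an outlier. A significance test (the reported p value) is not part of this sketch.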