Contig cluster assignments from cd-hit, varying the sequence identity cutoff from 0.95 to 1.00 and the length difference cutoff from 0 to 0.8. Take care: the cluster ids are not unique on their own! Only the combination of cluster id, len_diff_cutoff and ident_cutoff is unique. See the code for how this is managed in practice, and the manuscript for further details.
A tidy data frame with 6426 rows and 8 variables:
Cluster id from cd-hit
Sequence id within cluster from cd-hit
Length of sequence, bases
Contig unique identifier
Sample ID
Length difference cutoff (-s in cd-hit)
Sequence identity cutoff (-c in cd-hit)
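
Because a cluster id is only meaningful together with the two cutoffs it was produced under, any grouping of contigs needs a composite key. The following is a minimal sketch of that idea, not the code used for the manuscript. The rows are made-up illustrations, and the column names `cluster` and `contig_id` are assumptions; only `len_diff_cutoff` and `ident_cutoff` are named in the description above.

```python
from collections import defaultdict

# Made-up example rows. "cluster" and "contig_id" are hypothetical column
# names; the values are illustrative and not taken from the real data.
rows = [
    {"cluster": 0, "contig_id": "c1", "len_diff_cutoff": 0.0, "ident_cutoff": 0.95},
    {"cluster": 0, "contig_id": "c2", "len_diff_cutoff": 0.0, "ident_cutoff": 0.95},
    {"cluster": 0, "contig_id": "c1", "len_diff_cutoff": 0.8, "ident_cutoff": 1.00},
    {"cluster": 1, "contig_id": "c2", "len_diff_cutoff": 0.8, "ident_cutoff": 1.00},
]


def cluster_key(row):
    """Return the composite key that identifies one cluster in one cd-hit run."""
    return (row["cluster"], row["len_diff_cutoff"], row["ident_cutoff"])


def group_by_cluster(rows):
    """Group contig ids by composite key rather than by cluster id alone."""
    groups = defaultdict(list)
    for row in rows:
        groups[cluster_key(row)].append(row["contig_id"])
    return dict(groups)


groups = group_by_cluster(rows)

# Grouping by cluster id alone would wrongly merge clusters from different runs.
naive_groups = {row["cluster"] for row in rows}

for key, contigs in sorted(groups.items()):
    print(key, contigs)
```

In this toy example, grouping on the cluster id alone yields two groups, whereas the composite key correctly separates three clusters across the two parameter settings.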