Extract unique sequences


Given a fasta file and a groups file, removes identical sequences from the fasta file, keeping one representative of each.




Many sequences are identical, and aligning the same sequence many times later would be computationally wasteful. It is therefore better to keep only one representative sequence in the fasta file and to record how many sequences it represents in a count_table file. Alternatively, the names of all represented sequences could be listed in a names file, but that file would be very large because sequence names are long.
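
The idea can be illustrated with a short Python sketch. This is not Mothur's implementation, only a simplified illustration under stated assumptions: the groups file is assumed to hold one whitespace-separated "sequence name, group name" pair per line, every sequence in the fasta file is assumed to have a group, and the function names parse_fasta and unique_with_counts are hypothetical. The first sequence seen is kept as the representative, and the number of sequences it represents is counted per group.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header line as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def unique_with_counts(fasta_text, groups_text):
    """Return (unique_fasta, count_rows) for the given inputs.

    unique_fasta: list of (representative_name, sequence)
    count_rows:   list of (representative_name, total, {group: count})
    """
    # Assumed groups format: "<sequence name> <group name>" per line.
    group_of = dict(
        line.split()[:2] for line in groups_text.splitlines() if line.strip()
    )
    groups = sorted(set(group_of.values()))

    reps = {}  # sequence string -> (representative name, per-group counts)
    for name, seq in parse_fasta(fasta_text):
        if seq not in reps:
            # First occurrence becomes the representative.
            reps[seq] = (name, dict.fromkeys(groups, 0))
        reps[seq][1][group_of[name]] += 1

    unique_fasta = [(rep, seq) for seq, (rep, _) in reps.items()]
    count_rows = [(rep, sum(c.values()), c) for rep, c in reps.values()]
    return unique_fasta, count_rows
```

Storing one row of counts per representative keeps the bookkeeping small: its size grows with the number of unique sequences and groups, not with the number of sequence names.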

This tool is based on the Unique.seqs and Count.seqs commands of the Mothur package.


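The analysis output includes a fasta file containing the unique representative sequences and a count_table file recording how many sequences each representative stands for.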


Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.