Filters columns out of a FASTA-formatted sequence alignment, then removes any identical sequences that the filtering creates.
Removes overhangs at both ends of the alignment by removing columns where one or more sequences have a '.' character. Also removes empty columns, that is, columns where every character is '-'. Such gap-only columns carry no information and are not used when calculating distances, so removing them speeds up the later distance calculation.
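The two column rules described above can be illustrated with a minimal Python sketch. It is not mothur's implementation; the function name filter_columns and the dictionary-based input are assumptions made for illustration only.

```python
def filter_columns(seqs):
    """Apply the two column filters to an alignment.

    seqs is a hypothetical input format: a dict mapping sequence
    names to aligned sequences of equal length.
    """
    rows = list(seqs.values())
    if not rows:
        return {}
    keep = [
        i
        for i, column in enumerate(zip(*rows))
        # Drop the column if any sequence has '.' (overhang)...
        if "." not in column
        # ...or if every sequence has '-' (gap-only column).
        and any(ch != "-" for ch in column)
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in seqs.items()
    }


alignment = {
    "seq_a": "..AC-GT.",
    "seq_b": "AAAC-GTT",
    "seq_c": ".AAT-GT.",
}
filtered = filter_columns(alignment)
```

In this toy alignment, the first two and the last columns are dropped because at least one sequence has '.', and the fifth column is dropped because it contains only '-'. Note that seq_a and seq_b become identical once the columns are removed.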
Because removing columns can make previously distinct sequences identical, duplicates are detected and removed after filtering. In addition to the FASTA file, you need to provide a count file so that the sequence counts can be updated accordingly.
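The following Python sketch shows one way duplicate removal and count updating could work together. It is a simplification under stated assumptions: the function name merge_identical is hypothetical, and counts are modeled as a single total per sequence, whereas a real count file may hold additional columns.

```python
def merge_identical(seqs, counts):
    """Collapse identical sequences and sum their counts.

    seqs maps names to filtered sequences; counts maps the same
    names to the number of reads each sequence represents. Both
    are hypothetical, simplified stand-ins for the FASTA and
    count files.
    """
    representative = {}  # sequence string -> first name seen with it
    merged_seqs = {}
    merged_counts = {}
    for name, seq in seqs.items():
        rep = representative.setdefault(seq, name)
        if rep == name:
            # First occurrence: keep it as the representative.
            merged_seqs[name] = seq
            merged_counts[name] = counts[name]
        else:
            # Duplicate: drop it and add its count to the representative.
            merged_counts[rep] += counts[name]
    return merged_seqs, merged_counts


example_seqs = {"seq_a": "ACGT", "seq_b": "ACGT", "seq_c": "ATGT"}
example_counts = {"seq_a": 3, "seq_b": 2, "seq_c": 1}
unique_seqs, unique_counts = merge_identical(example_seqs, example_counts)
```

Here seq_b is identical to seq_a, so it is dropped and its count is added to seq_a. The total number of reads stays the same, which is why the count file is needed alongside the FASTA file.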
This tool is based on the filter.seqs and unique.seqs commands of the mothur package.
The analysis output consists of the following:
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.