Remove redundant sequences


Remove redundant sequences from an input set

This tool uses the programs cd-hit (proteins) and cd-hit-est (nucleotides) to remove redundancy from an input sequence set. The user should define the type of the input sequences (nucleotide or protein) and a sequence identity threshold (e.g. 90%).
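
As a rough illustration of how these two choices map onto the underlying programs, the Python sketch below builds a command line for cd-hit or cd-hit-est. The helper name build_cdhit_command and the file names are hypothetical. The options -i (input file), -o (output file) and -c (identity threshold, given as a fraction, so 90% becomes 0.9) are standard cd-hit options, but the exact options this tool passes may differ.

```python
def build_cdhit_command(seq_type, input_fasta, output_prefix, identity=0.9):
    """Return the argument list for a cd-hit or cd-hit-est run.

    Hypothetical helper for illustration only; it does not run anything.
    """
    # Proteins are clustered with cd-hit, nucleotides with cd-hit-est.
    programs = {"protein": "cd-hit", "nucleotide": "cd-hit-est"}
    if seq_type not in programs:
        raise ValueError("seq_type must be 'protein' or 'nucleotide'")
    # cd-hit expects the identity threshold as a fraction, not a percentage.
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be a fraction in (0, 1], e.g. 0.9 for 90%")
    return [
        programs[seq_type],
        "-i", input_fasta,    # input FASTA file
        "-o", output_prefix,  # output file name
        "-c", str(identity),  # sequence identity threshold
    ]


print(" ".join(build_cdhit_command("protein", "input.fasta", "nr90", 0.9)))
# prints: cd-hit -i input.fasta -o nr90 -c 0.9
```

The returned list could be passed to subprocess.run on a system where cd-hit is installed. Lower identity thresholds may also require a different word size; the cd-hit manual describes the appropriate settings.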

The input sequences must be in FASTA format. Note that this tool reads just one input file, which should contain all the sequences to be analyzed.
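
If the sequences are spread over several FASTA files, they need to be combined into one file first. The following Python sketch shows one possible way to do that. The function name merge_fasta and the file names are hypothetical, and the code assumes plain, uncompressed text files.

```python
def merge_fasta(input_paths, output_path):
    """Concatenate FASTA files into one file.

    Returns the number of sequence records (header lines) written.
    Hypothetical helper for illustration only.
    """
    records = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip()
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        records += 1  # each header line starts a new record
                    # Always end with a newline, so a file lacking a final
                    # newline cannot run into the next file's header.
                    out.write(line + "\n")
    return records
```

For example, merge_fasta(["set_a.fasta", "set_b.fasta"], "all_sequences.fasta") would write both sets to all_sequences.fasta, which can then be supplied as the single input file. The sketch does not check for duplicated sequence identifiers across files.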

This tool produces two output files:

For more details, please check the manual pages of the cd-hit command.