Utilities / Trim primers/adaptors

Description

Trim tags from sequences.

Parameters

Details

A tag (i.e. primer/adapter) sequence can occur not only at the end of a sequence but also at any other position within it. To ensure that only true tags are trimmed, the "Trim within" parameter restricts tag matching to the sequence ends while allowing for a certain number of variable bases. If specified, the value must be at least the number of bases in the tag sequence; a typical value is about 1.5x the tag length. A value of 0 means the option is ignored.
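The effect of "Trim within" can be illustrated with a minimal Python sketch. This is not the TagCleaner implementation: the function name trim_5prime_tag and its arguments are hypothetical, only exact matches of a 5' tag are handled, and it assumes that a value of 0 leaves the search unrestricted.

```python
def trim_5prime_tag(seq, tag, trim_within=0):
    """Remove an exact 5' tag match from seq (illustrative only).

    trim_within limits the search to the first trim_within bases;
    0 is assumed here to mean the search is not restricted.
    """
    if trim_within and trim_within < len(tag):
        raise ValueError("trim_within must be 0 or at least the tag length")

    # Only look for the tag inside the allowed window at the 5' end.
    window = seq[:trim_within] if trim_within else seq
    pos = window.find(tag)
    if pos == -1:
        return seq  # no tag inside the window: leave the sequence untouched

    # Drop everything up to and including the tag.
    return seq[pos + len(tag):]


tag = "ACGT"
# Tag preceded by two variable bases, inside a 6-base window (1.5x tag length).
near_end = trim_5prime_tag("NNACGTTTTTACGTGG", tag, trim_within=6)
# Tag occurs only in the interior, so the window protects it from trimming.
interior = trim_5prime_tag("TTTTTTTTACGTGG", tag, trim_within=6)
```

In this sketch the second call returns its input unchanged, because the only copy of the tag lies outside the 6-base window; with trim_within=0 the same call would cut the sequence at the interior match.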

When the "Split fragment-to-fragment concatenations" option is selected, the tool removes tag contamination inside the sequences and splits fragment-to-fragment concatenations into separate sequences. The "Allowed mismatches" value specifies the maximum number of mismatches allowed for the internal (concatenated) tag sequence(s). Use this feature with caution for inputs that have only a 5' or only a 3' tag sequence: a single tag is much shorter than a concatenated 5' and 3' tag, so it is more likely to match naturally occurring sequence by chance and produce false-positive splits. Selecting this option decreases processing speed.
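The splitting step can be sketched in Python as follows. This is a simplified illustration, not TagCleaner's algorithm: the names hamming and split_concatenations are hypothetical, mismatches are counted as substitutions only (no insertions or deletions), and the internal tag is assumed to be given as a single string.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_concatenations(seq, internal_tag, allowed_mismatches=0):
    """Split seq at every internal tag hit and drop the tag itself."""
    fragments = []
    n = len(internal_tag)
    start = 0  # start of the fragment currently being collected
    i = 0      # current scan position
    while i + n <= len(seq):
        if hamming(seq[i:i + n], internal_tag) <= allowed_mismatches:
            if i > start:
                fragments.append(seq[start:i])
            start = i + n  # resume right after the removed tag
            i = start
        else:
            i += 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


# Hypothetical 8-base internal tag (a 3' tag followed by a 5' tag).
internal_tag = "GGCCAATT"
# The read carries the tag with one substitution (GGCCTATT).
read = "AAAAAA" + "GGCCTATT" + "CCCCCC"

strict = split_concatenations(read, internal_tag, allowed_mismatches=0)
relaxed = split_concatenations(read, internal_tag, allowed_mismatches=1)
```

With no mismatches allowed the read stays in one piece, while allowing one mismatch yields two fragments. The same scan run with a short single tag would hit by chance far more often, which is the reason for the caution above, and the position-by-position comparison hints at why the option slows processing.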

Input can be a FASTQ file or a FASTA file. If using a FASTA file, you can provide an optional QUAL file so that the sequences and their quality scores are trimmed together.

Output

The output is a tab-delimited text file.

References

This tool uses the TagCleaner package. Please cite the article:

Schmieder R, Lim YW, Rohwer F, Edwards R. TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets. BMC Bioinformatics. 2010, 11:341.

Please see the TagCleaner homepage for more details.