This tool calculates several quality control statistics for reads using the PRINSEQ package. If your file is larger than 4 GB, we recommend submitting only a sample of the reads for the quality statistics analysis: PRINSEQ uses a lot of memory when producing the HTML report and may fail on larger files. You can use the tool "Utilities / Make a subset of FASTQ" to create the sample.
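If you prefer to create the sample outside the tool, reservoir sampling (Algorithm R) draws a uniform random subset in a single pass while keeping only the sampled records in memory. The Python sketch below assumes plain 4-line FASTQ records. The function names `fastq_records` and `reservoir_sample` are hypothetical, and this is not necessarily how "Make a subset of FASTQ" works.

```python
import random
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
Record = Tuple[str, str, str, str]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ lines into (header, sequence, '+' line, quality) records.

    Assumes the common 4-line layout without wrapped sequence lines.
    """
    it = iter(lines)
    for header in it:
        rest = [next(it, None) for _ in range(3)]
        if any(line is None for line in rest):
            raise ValueError("truncated FASTQ record")
        yield (header, rest[0], rest[1], rest[2])


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return a uniform random sample of k items in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over the i + 1 items seen so far
            if j < k:
                sample[j] = item  # replace a random slot in the reservoir
    return sample
```

For example, `reservoir_sample(fastq_records(open("reads.fastq")), k=100_000)` returns up to 100,000 randomly chosen records (the file name is a placeholder). Each record is a tuple of four lines that still end in newlines, so it can be written back with `file.writelines(record)`.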
The statistics are calculated using the PRINSEQ option -stats_all. The input data can be in FASTQ or FASTA format.
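Because the input may be in either format, a reader only needs the first line to decide how to parse it: `>` starts a FASTA header and `@` a FASTQ header. The Python sketch below (the function name `read_sequences` is hypothetical) yields only the sequences and assumes FASTQ records use the plain 4-line layout. It illustrates the two formats and does not describe PRINSEQ's own parser.

```python
from typing import Iterable, Iterator, List


def read_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequences from FASTA or FASTQ lines.

    The format is chosen from the first line: '>' means FASTA (sequences may
    span several lines), '@' means FASTQ (assumed 4-line records).
    """
    it = iter(lines)
    first = next(it, "")
    if first.startswith(">"):
        chunks: List[str] = []
        for line in it:
            line = line.strip()
            if line.startswith(">"):
                yield "".join(chunks)  # finish the previous record
                chunks = []
            elif line:
                chunks.append(line)
        yield "".join(chunks)  # last record
    elif first.startswith("@"):
        header = first
        while header.strip():
            yield next(it, "").strip()  # sequence line
            next(it, None)  # '+' separator line
            next(it, None)  # quality line
            header = next(it, "")
    elif first.strip():
        raise ValueError("input starts with neither '>' (FASTA) nor '@' (FASTQ)")
```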
This tool produces a comprehensive quality report, reads-stats.html, containing many useful plots. To view this file, select the visualization method "Open in external web browser". In addition, the tool produces a table, reads-stats.tsv, containing the following information:

| Statistic | Description |
| --- | --- |
| stats_dinuc aatt | Dinucleotide odds ratio for AA/TT. |
| stats_dinuc acgt | Dinucleotide odds ratio for AC/GT. |
| stats_dinuc agct | Dinucleotide odds ratio for AG/CT. |
| stats_dinuc at | Dinucleotide odds ratio for AT. |
| stats_dinuc catg | Dinucleotide odds ratio for CA/TG. |
| stats_dinuc ccgg | Dinucleotide odds ratio for CC/GG. |
| stats_dinuc cg | Dinucleotide odds ratio for CG. |
| stats_dinuc gatc | Dinucleotide odds ratio for GA/TC. |
| stats_dinuc gc | Dinucleotide odds ratio for GC. |
| stats_dinuc ta | Dinucleotide odds ratio for TA. |
| stats_dupl 3 | Number of 3' duplicates. |
| stats_dupl 3maxd | |
| stats_dupl 5 | Number of 5' duplicates. |
| stats_dupl 5maxd | |
| stats_dupl exact | Number of exact duplicates. |
| stats_dupl exactmaxd | |
| stats_dupl exactrevcomp | Number of exact duplicates, considering reverse complements. |
| stats_dupl exactrevcompmaxd | |
| stats_dupl revcomp | Number of 5'/3' duplicates, considering reverse complements. |
| stats_dupl revcompmaxd | |
| stats_dupl total | Total number of duplicates. |
| stats_info bases | Total number of bases in the input file. |
| stats_info reads | Number of reads in the input file. |
| stats_len max | Length of the longest read. |
| stats_len mean | Mean read length. |
| stats_len median | Median read length. |
| stats_len min | Length of the shortest read. |
| stats_len mode | Most common read length (mode). |
| stats_len modeval | Number of reads with the mode length. |
| stats_len range | Range of the read lengths (longest minus shortest). |
| stats_len stddev | Standard deviation of the read lengths. |
| stats_ns maxn | Maximum number of Ns in a single read. |
| stats_ns maxp | Maximum percentage of Ns in a single read. |
| stats_ns seqswithn | Number of reads containing at least one ambiguous base (N). |
| stats_tag midnum | Number of predefined MIDs (multiplex identifiers). |
| stats_tag prob3 | Probability (in percent) of a tag sequence at the 3' end. |
| stats_tag prob5 | Probability (in percent) of a tag sequence at the 5' end. |
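Each statistic is identified by a two-part key followed by a value, for example `stats_len max 101`. Assuming reads-stats.tsv holds one statistic per line as three whitespace-separated fields (category, name, value), the table can be loaded into a dictionary for downstream checks. The function `parse_stats` below is a hypothetical Python sketch, and the exact file layout is an assumption.

```python
from typing import Dict, Iterable, Tuple, Union

Value = Union[int, float, str]


def parse_stats(lines: Iterable[str]) -> Dict[Tuple[str, str], Value]:
    """Map (category, statistic) to its value, e.g. ("stats_len", "max") -> 101.

    Assumes one statistic per line with three whitespace-separated fields:
    category, statistic name, value. Other lines are skipped.
    """
    stats: Dict[Tuple[str, str], Value] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        category, name, raw = fields
        value: Value
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw  # keep non-numeric values as text
        stats[(category, name)] = value
    return stats
```

For example, `parse_stats(open("reads-stats.tsv"))[("stats_info", "reads")]` would give the read count under this layout assumption.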
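The ten dinucleotide classes pair each dinucleotide with its reverse complement (AA with TT, AC with GT, and so on), which matches the strand-symmetric odds ratio rho_XY = f_XY / (f_X * f_Y), where the frequencies are counted on each sequence and on its reverse complement. A ratio near 1 means the dinucleotide occurs about as often as expected from the base composition. The Python sketch below pools counts over all reads and ignores non-ACGT bases. Both choices, the function name, and the assumption that PRINSEQ uses exactly this formula are unverified.

```python
from collections import Counter
from typing import Dict, Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "ACGT"
# stats_dinuc name -> one representative dinucleotide of the class.
CLASSES = {"aatt": "AA", "acgt": "AC", "agct": "AG", "at": "AT", "catg": "CA",
           "ccgg": "CC", "cg": "CG", "gatc": "GA", "gc": "GC", "ta": "TA"}


def dinucleotide_odds_ratios(seqs: Iterable[str]) -> Dict[str, float]:
    """Strand-symmetric odds ratios rho_XY = f_XY / (f_X * f_Y), pooled over reads."""
    mono: Counter = Counter()
    di: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        # Count on the read and on its reverse complement (both strands).
        for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
            mono.update(b for b in strand if b in BASES)
            di.update(strand[i:i + 2] for i in range(len(strand) - 1)
                      if strand[i] in BASES and strand[i + 1] in BASES)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("no ACGT dinucleotides in input")
    ratios: Dict[str, float] = {}
    for name, (x, y) in CLASSES.items():
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        f_xy = di[x + y] / n_di
        # Undefined when a base never occurs; report NaN in that case.
        ratios[name] = f_xy / (f_x * f_y) if f_x and f_y else float("nan")
    return ratios
```

Counting both strands is what makes, for example, the AA and TT counts equal, so a single value per class is enough.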
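The stats_len values follow directly from the list of read lengths. This Python sketch (the function name `length_stats` is hypothetical) uses the standard library. Whether PRINSEQ reports the population or the sample standard deviation, and how it breaks ties for the mode, are assumptions here: the sketch uses `statistics.pstdev` and the first length that reaches the highest count.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, Union


def length_stats(seqs: Iterable[str]) -> Dict[str, Union[int, float]]:
    """Read-length summary named after the stats_len fields."""
    lengths = [len(s) for s in seqs]
    if not lengths:
        raise ValueError("no reads")
    # most_common() lists tied lengths in first-seen order.
    mode, modeval = Counter(lengths).most_common(1)[0]
    return {
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
        "mode": mode,
        "modeval": modeval,  # number of reads having the mode length
        "range": max(lengths) - min(lengths),
        "stddev": statistics.pstdev(lengths),  # population SD (assumption)
    }
```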
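The stats_ns values are per-read maxima plus a count. Note that maxn and maxp can come from different reads: a short read may have the highest percentage of Ns while a longer read has the most Ns. A small Python sketch, with the hypothetical function name `n_stats`:

```python
from typing import Dict, Iterable, Union


def n_stats(seqs: Iterable[str]) -> Dict[str, Union[int, float]]:
    """Ambiguous-base summary named after the stats_ns fields."""
    maxn, maxp, seqswithn = 0, 0.0, 0
    for seq in seqs:
        n = seq.upper().count("N")
        if n:
            seqswithn += 1
            maxn = max(maxn, n)
            maxp = max(maxp, 100.0 * n / len(seq))  # percent of this read
    return {"maxn": maxn, "maxp": maxp, "seqswithn": seqswithn}
```

For the reads "NN" and "NNNACGTACGTACG", maxn is 3 (from the second read) while maxp is 100.0 (from the first).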
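The duplicate categories can be illustrated with exact string matches. In this hedged Python sketch, a 5' duplicate is taken to be a read identical to the start of a longer read, a 3' duplicate a read identical to the end of a longer read, and duplicates with reverse complements treat a read and its reverse complement as equal. The counting conventions (for example, n - 1 duplicates for n identical reads, and including plain exact duplicates in the reverse-complement count) are assumptions and may differ from PRINSEQ's. The function names are hypothetical, and the maxd, revcomp and total statistics are not reproduced.

```python
from collections import Counter
from typing import Dict, Iterable, List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def prefix_duplicates(seqs: List[str]) -> int:
    """Count reads that are a proper prefix (5' end) of some longer read."""
    counts = Counter(seqs)
    distinct = sorted(counts)
    total = 0
    for cur, nxt in zip(distinct, distinct[1:]):
        # Strings starting with `cur` sort directly after it, so checking
        # the next distinct string is enough.
        if nxt.startswith(cur):
            total += counts[cur]
    return total


def duplicate_stats(seqs: Iterable[str]) -> Dict[str, int]:
    reads = [s.upper() for s in seqs]
    # A read and its reverse complement share one canonical form.
    canonical = [min(s, revcomp(s)) for s in reads]
    return {
        "exact": len(reads) - len(set(reads)),
        "5": prefix_duplicates(reads),
        # A 3' duplicate is a prefix duplicate of the reversed reads.
        "3": prefix_duplicates([s[::-1] for s in reads]),
        "exactrevcomp": len(reads) - len(set(canonical)),
    }
```

Sorting the distinct reads avoids comparing every pair of reads: a single pass over neighbouring pairs finds all 5' duplicates, and reversing each read turns the 3' case into the same problem.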