This tool calculates several quality control statistics for reads using the PRINSEQ package. If your file is larger than 4 GB, we recommend that you submit only a sample of the reads for the quality statistics analysis, because PRINSEQ uses a lot of memory when producing the HTML report and might fail with larger files. You can create such a sample with the tool "Utilities / Make a subset of FASTQ".
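
For readers who prepare the sample outside this tool collection, the idea of drawing a random subset of reads can be sketched in Python as follows. This is only an illustration, not how "Utilities / Make a subset of FASTQ" is implemented: the function name `sample_fastq_records` is hypothetical, reservoir sampling is our choice of technique, and the sketch assumes every FASTQ record spans exactly four lines.

```python
import random
from itertools import islice


def sample_fastq_records(lines, k, seed=0):
    """Reservoir-sample up to k FASTQ records from an iterable of lines.

    Assumes each record is exactly four lines (header, sequence, '+', qualities).
    Only k records are held in memory at any time.
    """
    rng = random.Random(seed)
    reservoir = []
    it = iter(lines)
    seen = 0
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated trailing record)
        seen += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = record
    return reservoir
```

The function accepts any iterable of lines, so an open file handle can be passed directly and the sampled records written back out with `writelines`.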

The statistics are calculated using the PRINSEQ option *-stats_all*. The input data can be in FASTQ or FASTA format.
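
As a rough illustration of what such a PRINSEQ call might look like, the sketch below builds an argument list for `prinseq-lite.pl` with the `-stats_all` option. The helper name `build_prinseq_stats_command` and the file name are hypothetical, and the exact command this tool runs internally may differ.

```python
def build_prinseq_stats_command(input_path, input_format="fastq"):
    """Build an argument list for a PRINSEQ statistics run.

    input_format selects the PRINSEQ input option: -fastq or -fasta.
    """
    if input_format not in ("fastq", "fasta"):
        raise ValueError("input_format must be 'fastq' or 'fasta'")
    return ["prinseq-lite.pl", f"-{input_format}", input_path, "-stats_all"]


# Hypothetical file name, for illustration only.
command = build_prinseq_stats_command("reads.fastq")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True)` on a system where PRINSEQ is installed.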

This tool produces a comprehensive quality report, reads-stats.html, containing many useful plots. To view this file, select the visualization method "Open in external web browser". In addition, a table, reads-stats.tsv, is produced, containing the following information:

| Statistic | Description |
|---|---|
| stats_dinuc aatt | Dinucleotide odds ratio for AA/TT. |
| stats_dinuc acgt | Dinucleotide odds ratio for AC/GT. |
| stats_dinuc agct | Dinucleotide odds ratio for AG/CT. |
| stats_dinuc at | Dinucleotide odds ratio for AT. |
| stats_dinuc catg | Dinucleotide odds ratio for CA/TG. |
| stats_dinuc ccgg | Dinucleotide odds ratio for CC/GG. |
| stats_dinuc cg | Dinucleotide odds ratio for CG. |
| stats_dinuc gatc | Dinucleotide odds ratio for GA/TC. |
| stats_dinuc gc | Dinucleotide odds ratio for GC. |
| stats_dinuc ta | Dinucleotide odds ratio for TA. |
| stats_dupl 3 | The number of 3' duplicates. |
| stats_dupl 3maxd | |
| stats_dupl 5 | The number of 5' duplicates. |
| stats_dupl 5maxd | |
| stats_dupl exact | The number of exact duplicates. |
| stats_dupl exactmaxd | |
| stats_dupl exactrevcomp | Number of exact duplicates with reverse complements. |
| stats_dupl exactrevcompmaxd | |
| stats_dupl revcomp | Number of 5'/3' duplicates with reverse complements. |
| stats_dupl revcompmaxd | |
| stats_dupl total | Total number of duplicates. |
| stats_info bases | Total number of bases in the input file. |
| stats_info reads | Number of reads in the input file. |
| stats_len max | Length of the longest read. |
| stats_len mean | Mean length of the reads. |
| stats_len median | Median of the read lengths. |
| stats_len min | Length of the shortest read. |
| stats_len mode | Mode of the read lengths. |
| stats_len modeval | Number of mode length sequences. |
| stats_len range | Range of the sequence lengths. |
| stats_len stddev | Standard deviation of the read lengths. |
| stats_ns maxn | Maximum number of Ns in one read. |
| stats_ns maxp | The maximum percentage of Ns per read. |
| stats_ns seqswithn | Number of reads with ambiguous base N. |
| stats_tag midnum | The number of predefined MIDs. |
| stats_tag prob3 | The probability of a tag sequence at the 3'-end (in percentage). |
| stats_tag prob5 | The probability of a tag sequence at the 5'-end (in percentage). |
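
To use these values in downstream scripts, the table can be loaded into a nested dictionary. The sketch below assumes that reads-stats.tsv has three tab-separated columns (statistic group, statistic name, value) and no header row; that layout is an assumption, so check it against your own file. The function name `parse_prinseq_stats` and the example values are made up for illustration.

```python
import csv
import io


def parse_prinseq_stats(handle):
    """Parse tab-separated 'group<TAB>name<TAB>value' rows into a nested dict.

    Values are kept as strings; convert them as needed.
    """
    stats = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        group, name, value = row[0], row[1], row[2]
        stats.setdefault(group, {})[name] = value
    return stats


# Made-up example content standing in for reads-stats.tsv.
example = "stats_info\tbases\t1010\nstats_info\treads\t10\nstats_len\tmax\t101\n"
stats = parse_prinseq_stats(io.StringIO(example))

# Example use: mean read length derived from total bases and read count.
mean_length = int(stats["stats_info"]["bases"]) / int(stats["stats_info"]["reads"])
print(mean_length)
```

With a real file, replace the `io.StringIO` object with `open("reads-stats.tsv", newline="")`.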