This Drop-seq tool does two things:
1) identifies and corrects bead synthesis errors, and
2) extracts the digital gene expression data from an aligned library.
In the bead synthesis error correction step (step 1), the tool identifies cell barcodes with aberrant “fixed” UMI bases. If only the last UMI base is fixed as a T, the cell barcode is corrected (its last base is trimmed off) and all cell barcodes that share the same first 11 bases are merged together. If any other UMI base is fixed, the reads with that cell barcode are discarded.
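The decision rule above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual code: the function names, the representation of fixed bases as (position, base) pairs, and the default UMI length of 8 are all assumptions made for the example.

```python
from collections import defaultdict


def classify_barcode(fixed_positions, umi_length=8):
    """Decide what to do with a cell barcode (hypothetical helper).

    fixed_positions: list of (0-based UMI position, base) pairs that
    were found to be fixed. umi_length=8 is an assumed default.
    """
    if not fixed_positions:
        return "keep"      # no synthesis error detected
    if fixed_positions == [(umi_length - 1, "T")]:
        return "repair"    # only the last UMI base is fixed as T
    return "discard"       # any other fixed base: drop the reads


def merge_repaired(read_counts, repairable):
    """Trim the last base of repairable barcodes and sum the read
    counts of all barcodes that end up with the same sequence."""
    merged = defaultdict(int)
    for barcode, n_reads in read_counts.items():
        key = barcode[:-1] if barcode in repairable else barcode
        merged[key] += n_reads
    return dict(merged)
```

For example, two repairable 12-base barcodes that differ only in their last base collapse into a single 11-base barcode whose read count is the sum of the two.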
The tool asks the user to select the number of barcodes on which to perform the correction. The original Drop-seq manual recommends using roughly 2 times the anticipated number of cells, as the tool developers have found empirically that this recovers nearly every defective cell barcode that corresponds to a STAMP (rather than an empty-bead cell barcode).
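As a rough sketch of how such a barcode count could be applied, the snippet below picks twice the expected number of cells. It assumes that barcodes are ranked by read count, which is not stated above, and the function and parameter names are hypothetical.

```python
def select_barcodes(read_counts, expected_cells, factor=2):
    """Return the barcodes to run the correction on (hypothetical helper).

    read_counts: dict mapping cell barcode -> number of reads.
    Ranking by read count is an assumption of this sketch.
    """
    n_barcodes = factor * expected_cells
    ranked = sorted(read_counts, key=read_counts.get, reverse=True)
    return ranked[:n_barcodes]
```

With 1000 anticipated cells and the default factor, this would pass the 2000 most abundant barcodes to the correction step.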
This program reads in the BAM file and looks at the distribution of bases at each position of all UMIs for a cell barcode. It detects unusual base frequency distributions: a base with >=80% frequency at any position is flagged as an error. Barcodes with fewer than 25 total UMIs are ignored.
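The detection rule can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name is hypothetical, the UMIs of one cell barcode are assumed to be given as a list of equal-length strings, and the thresholds simply mirror the 80% and 25 UMI values mentioned above.

```python
from collections import Counter


def find_fixed_positions(umis, min_umis=25, threshold=0.8):
    """Return (0-based position, base) pairs where a single base
    reaches the threshold frequency across the UMIs of one barcode.

    Returns None when the barcode has fewer than min_umis UMIs and
    is therefore not evaluated. Assumes all UMIs have equal length.
    """
    if len(umis) < min_umis:
        return None
    fixed = []
    for pos in range(len(umis[0])):
        counts = Counter(umi[pos] for umi in umis)
        base, n = counts.most_common(1)[0]
        if n / len(umis) >= threshold:
            fixed.append((pos, base))
    return fixed
```

A barcode whose UMIs all end in T but are otherwise evenly mixed would be reported with a single fixed position at the last UMI base.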
The tool also checks for PRIMER_MATCHes, where the UMI perfectly matches one of the PCR primers. These cell barcodes are dropped. These errors are detected only if the PRIMER_SEQUENCE parameter is supplied.
The selection of the sets of cells:
For more details, please check the Drop-seq manual.