In order to extract mapped read pairs we show here two alternatives. Alternative 1 gives 242627 read pairs, and Alternative 2 gives 344381 read pairs. The latter data set was used in the examples of Chapter 5. In both cases, in order to use samtools for extracting chromosome 18 reads, we must first index the BAM file $ samtools index wgEncodeCaltechRnaSeqH1hescR2x75Il200AlignsRep1V2.bam ################################### Alternative 1: 242627 read pairs ################################### This is based on using only samtools. We utilize flag information (2nd column of SAM/BAM file) SAM flag 66 = 2+64 means 002: read mapping in proper pair 064: first in pair SAM flag 130 = 2+128 means 002: read mapping in proper pair 128: second in pair More information about SAM flags can be found e.g. in https://broadinstitute.github.io/picard/explain-flags.html In our BAM file read 1 is missing from read pair 42YT6AAXX_HWUSI-EAS627_1:3:89:843:1303 (contrary what SAM flag says) so we have to skip also read 2 from this read pair (this is done with grep -v in the second command below) $ samtools view -hf 66 wgEncodeCaltechRnaSeqH1hescR2x75Il200AlignsRep1V2.bam chr18 | samtools bam2fq - > chr18_1.fq $ samtools view -hf 130 wgEncodeCaltechRnaSeqH1hescR2x75Il200AlignsRep1V2.bam chr18 | grep -v 42YT6AAXX_HWUSI-EAS627_1:3:89:843:1303 | samtools bam2fq - > chr18_2.fq The word "proper" in "read mapped in proper pair" is strict so this gives 242627 read pairs which is less compared to the second alternative. ################################### Alternative 2: 344381 read pairs ################################### Here we utilize Perl script bam2pefq.pl which is given below. The command for creating the files chr18_1.fq and chr18_2.fq is $ samtools view wgEncodeCaltechRnaSeqH1hescR2x75Il200AlignsRep1V2.bam chr18 | perl bam2pefq.pl chr18 #### Perl script bam2pefq.pl ###### #!/usr/bin/perl $obase = shift; while (<>) { ($id,$flag,$chr,$pos,$mapQ,$cigar,$pe,$pairpos,$foo,$seq,$qv)=split; if ($pe eq '=') { $binflag = reverse(sprintf("%b",$flag)); if (substr($binflag,6,1) eq '1') { $rd1{$id} = "$seq\n+\n$qv\n"; } elsif (substr($binflag,7,1) eq '1') { $seq =~ tr/ACGT/TGCA/; $seq = reverse($seq); $qv = reverse($qv); $rd2{$id} = "$seq\n+\n$qv\n"; } } } $ofile1 = $obase . "_1.fq"; $ofile2 = $obase . "_2.fq"; open (FD1,">$ofile1") or die "ERROR: cannot open '$ofile1' for writing.$!\n"; open (FD2,">$ofile2") or die "ERROR: cannot open '$ofile2' for writing.$!\n"; while (($id,$val) = each(%rd1)) { if (exists($rd2{$id})) { print FD1 "\@$id/1\n$val"; print FD2 "\@$id/2\n$rd2{$id}"; } } close(FD1); close(FD2); ###################################