Hi all,
very quickly because at the moment I am in London with an intermittent connection.
The quality scores are computed as follows: for each read,
HWI-ST661:153:D0FTJACXX:8:2201:16410:93092 163 chr1 14704 180 75M = 14770 -134 CCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGCTGCGGTGGCGGCAGAGGAGGGATGGAGGCTGACACGCGGG CCCFFFFFHHHGHJJJIGIJIIIIIJIJHIIJJJB?BFHG7@F@AB:9=(6=(;>B29<@############### RG:Z:0 NM:i:7 XT:A:U md:Z:62T12 XA:Z:chr15,-102516387,75M,8;chr9,+14815,75M,10;chr16,+64389,19M1I1M2I52M,10;chr2,-114356235,75M,11;has an overall quality score of 180. This is due to the fact that there are several mappings, but the best one (the one described in the regular SAM record), to chr1:14704/chr1:14770 is anyway better than the second best one (the one to chr15,-102516387/chr15,+102516328 described in the XA optional field). In fact, the mapping to chr1 has 7 mismatches (following GEM's scoring system), while the mapping to chr15 has 8 mismatches. So: in absolute terms all mappings are bad because the second end does not map well to any position, but still one can say that the mapping to chr1 is better than the other ones, and this is why it gets a high relative score.
HWI-ST661:153:D0FTJACXX:8:2201:16410:93092 83 chr1 14770 180 39M1D22M1I3M2I1M1I2M4S = 14704 134 ACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCAGGTCCTTTCCCAGAGATGCCTTGGCTCGTGGCTGT 5@
9DDDDDDDDDC@BDDDDDDFHHIIIIJIHFJJJJIIJIJIJJIJJJJJJJJJJIIIJJJJHHHHHFFDDA11B RG:Z:0 NM:i:7 XT:A:U md:Z:(4)2>1-1>2-T2>1-22>1+39 XA:Z:chr15,+102516328,1S1M2I4M3I1M1I1M1I24M1D36M,8;chr9,-14881,39M1D22M1I3M2
I1M1I2M4S,10;chr16,-64452,39M1D22M1I3M2I1M1I2M4S,10;chr2,+114356176,1S1M2I4M3I1M1I1M1I24M1D36M,11;
Hi Matthias and others,
I noticed the read length discordance myself very recently, and was going to bring it up. This is really unfortunate - people processing the data in Barcelona didn't follow the explicit instructions of submitting 75 bp data. This is something that I should have checked though, and I thought I had, but apparently I didn't for the mRNA data. However, I don't think this will really affect things. The Barcelona samples don't show worse mapping stats, and I definitely don't want to remap these samples without a very strong reason.
Regarding the GEM mapping quality scores, Paolo, Thasso and Micha, can you please comment on this? Matthias and Daniela, what do you mean by referring to the example reads as being well or badly mapped? The classifications below is the information I have from Paolo, but I don't know the algorithms that they use to calculate the MAPQ. In my understanding the qualities are calculated independently for the two mates. I'm using reads with MAPQ >150, i.e. reads in categories 1 and 2 below. It depends on the analysis whether I require both mates to have this quality (e.g. exon quantifications) or if I can just use a single good-quality mate regardless of the quality of its mate (e.g. ASE).
- Matches which are unique, and do not have any subdominant match:
251 >= MAPQ >= 255, XT=U
- Matches which are unique, and have subdominant matches but a different score:
175 >= MAPQ >= 181, XT=U- Matches which are putatively unique (not unique, but distinguishable by score):
119 >= MAPQ >= 127, XT=U- Matches which are a perfect tie:
78 >= MAPQ >= 90, XT=R.
best regards,
Tuuli
Tuuli Lappalainen, PhD Department of Genetic Medicine and Development University of Geneva Medical School CMU / Rue Michel-Servet 1 1211 Geneva 4 Switzerland Tel. +41-(0)22-3795550 tuuli.lappalainen@unige.chOn 7/4/12 3:24 PM, Matthias Barann wrote:
Dear all,
we noticed some inconsistencies in the read lengths of the GEM mappings (I didn't check the BWA mappings, but it's probably the same).
Some samples appear to have up to 76 bp matched, while other samples only have 75 bp. In regard to comparability this might cause some (probably very little) differences between the samples.
i.e.
NA20589.1.M_111124_3.bam has 75 bp
NA20812.2.M_111216_6.bam has 76 bp
NA20760.3.M_120202_5.bam has 75 bp
NA20783.4.M_120208_6.bam has 75 bp
NA20768.5.M_120131_1.bam has 75 bp
NA20798.6.M_120119_6.bam has 75 bp
NA20803.7.M_120219_1.bam has 75 bp
We checked some more files for institute 2, and they seem to have generally 1 bp more than the others.
We're also a bit lost regarding GEM's quality score. We would like to filter reads for good mapping quality, we're just not sure how to accomplish this.
Below are two read pairs as an example.
The fist pair has a mapping quality of 180. The second read however was terribly aligned.
The second pair has only a quality of 99, while the reads mapped much better (at least I would say so).
We're not sure what's the reason for this.
Curiously, both reads of a pair get the same mapping quality. This could be by intention (i.e. it's always the pair quality rather than the read quality) or, which could also explain the quality differences for the reads, both reads of the pair get the quality of the first mapped read.
We're thankful for any suggestions on how to filter for 'good' mappings.
HWI-ST661:153:D0FTJACXX:8:2201:16410:93092 163 chr1 14704 180 75M = 14770 -134 CCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGCTGCGGTGGCGGCAGAGGAGGGATGGAGGCTGACACGCGGG CCCFFFFFHHHGHJJJIGIJIIIIIJIJHIIJJJB?BFHG7@F@AB:9=(6=(;>B29<@############### RG:Z:0 NM:i:7 XT:A:U md:Z:62T12 XA:Z:chr15,-102516387,75M,8;chr9,+14815,75M,10;chr16,+64389,19M1I1M2I52M,10;chr2,-114356235,75M,11;
HWI-ST661:153:D0FTJACXX:8:2201:16410:93092 83 chr1 14770 180 39M1D22M1I3M2I1M1I2M4S = 14704 134 ACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCAGGTCCTTTCCCAGAGATGCCTTGGCTCGTGGCTGT 5@
9DDDDDDDDDC@BDDDDDDFHHIIIIJIHFJJJJIIJIJIJJIJJJJJJJJJJIIIJJJJHHHHHFFDDA11B RG:Z:0 NM:i:7 XT:A:U md:Z:(4)2>1-1>2-T2>1-22>1+39 XA:Z:chr15,+102516328,1S1M2I4M3I1M1I1M1I24M1D36M,8;chr9,-14881,39M1D22M1I3M2
I1M1I2M4S,10;chr16,-64452,39M1D22M1I3M2I1M1I2M4S,10;chr2,+114356176,1S1M2I4M3I1M1I1M1I24M1D36M,11;
HWI-ST661:153:D0FTJACXX:8:2201:6270:52066 99 chr1 14582 99 1M2I71M1S = 14665 158 CCCTGGTTCCGTCACCCCCTCCCAGGGAAGCAGGTCTGAGCAGCTTGTCCTGGCTGTGTCAATGTCAGAGCAACA @11ADFFFHHHHHIIJJJJIJJIJJJJHHJJIJJHIJJJIIGFGIIJIIJF@@EGDAE>?)).7?BDFFDCCCB@ RG:Z:0 NM:i:8 XT:A:U md:Z:1>2-21A5T29C13(1) XA:Z:chrY,-59358087,75M,4;chrX,-155255081,75M,4;chr9,+14691,75M,7;chr2,-114356359,75M,7;chr16,+64266,75M,8;chr12,-90971,75M,8;
HWI-ST661:153:D0FTJACXX:8:2201:6270:52066 147 chr1 14665 99 75M = 14582 -158 GGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTAGGATTCCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGTG <B@A<9;>@CAAADDDDCCC>CC=?;>FHCGGEGIGJJIIGGIHHIIIIJIIJIJIGJJIJIHHFDDFFFFDB@; RG:Z:0 NM:i:8 XT:A:U md:Z:AG38G34 XA:Z:chrY,+59358002,75M,4;chrX,+155254996,75M,4;chr9,-14776,75M,7;chr2,+114356274,75M,7;chr16,-64350,58M1I1M2I11M1D2M,8;chr12,+90885,2M1D73M,8;
best wishes,
Matthias & Daniela
-- Matthias Barann Institute of Clinical Molecular Biology Christian Albrechts University Kiel Schittenhelmstr. 12 D-24105 Kiel, Germany m.barann@ikmb.uni-kiel.de +49 - (0)431 - 597 8681 (office)
_______________________________________________ Geuvadis_rna_analysis mailing list Geuvadis_rna_analysis@lists.crg.es http://davinci.crg.es/mailman/listinfo/geuvadis_rna_analysis