Hello,
Just to be sure about this: how do you calculate the edit distance?
Our guess is that you count each of the following variations as an edit
distance of 1:
- mismatches
- each insertion/deletion *independent of the length*
- split-reads
That's actually how we would like to have it, though it's borderline for
the split-reads.
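To make the guess above concrete, here is a minimal Python sketch of that counting scheme. It only illustrates the assumption stated in this message and is not the mapper's confirmed behaviour. The function name `guessed_edit_distance` is hypothetical, split-reads are assumed to show up as `N` operations in the CIGAR string, and the mismatch count is assumed to be supplied separately, because a plain CIGAR `M` does not distinguish matches from mismatches.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def guessed_edit_distance(cigar, mismatches):
    """Edit distance under the guessed scheme (an assumption, not confirmed).

    Each mismatch counts 1, each insertion or deletion counts 1 regardless
    of its length, and each split (CIGAR 'N') counts 1.
    """
    ops = CIGAR_OP.findall(cigar)
    indels = sum(1 for _length, op in ops if op in "ID")
    splits = sum(1 for _length, op in ops if op == "N")
    return mismatches + indels + splits
```

Under this scheme, `guessed_edit_distance("30M5I40M", 2)` gives 3, whereas counting every inserted base individually would give 7.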
Thanks,
Matthias & Daniela
On 05.07.2012 02:18, Tuuli Lappalainen wrote:
Hello,
In my opinion we need to have a filter for the maximum number of mismatches
(as well as MAPQ>150) when we want to have well-mapped reliable reads
- if a read has lots of mismatches, I wouldn't trust it even if the
other matches were even worse. But you're right that 3 or 4 is too
stringent, I was thinking of the 75 bps and not the total of 150.
I'd say that we keep reads with <=6 mismatches according to the NM
tag. If no one objects by Thursday noon, I'll proceed with this. I'll
provide a script for filtering, and upload a filtered set of BAM files
to the FTP site - you can do whatever is easier for you.
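The proposed filter could be sketched roughly as follows. This is not the script that will be provided, only an illustration under several assumptions: the input is uncompressed SAM text (for example piped from `samtools view -h`), the threshold is applied per alignment record rather than per read pair, the mismatch count is read from the `NM:i:` optional field, and the MAPQ comparison is strict. All function names are hypothetical.

```python
def get_nm(fields):
    """Return the NM:i: value from a split SAM record, or None if absent."""
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def keep_read(sam_line, max_nm=6, min_mapq=150):
    """True if the record has NM <= max_nm and MAPQ > min_mapq (assumed rule)."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])  # column 5 is MAPQ
    nm = get_nm(fields)
    return nm is not None and nm <= max_nm and mapq > min_mapq


def filter_sam(lines, max_nm=6, min_mapq=150):
    """Yield header lines unchanged and only the records that pass the filter."""
    for line in lines:
        if line.startswith("@") or keep_read(line, max_nm, min_mapq):
            yield line
```

Records without an NM field are dropped in this sketch. Whether that is the desired behaviour would need to be decided.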
best regards,
Tuuli
Tuuli Lappalainen, PhD
Department of Genetic Medicine and Development
University of Geneva Medical School
CMU / Rue Michel-Servet 1
1211 Geneva 4
Switzerland
Tel. +41-(0)22-3795550
tuuli.lappalainen(a)unige.ch
--
Matthias Barann
Institute of Clinical Molecular Biology
Christian Albrechts University Kiel
Schittenhelmstr. 12
D-24105 Kiel, Germany
m.barann(a)ikmb.uni-kiel.de
+49 - (0)431 - 597 8681 (office)