Hello,

just to be sure about this, how do you calculate the edit distance?

Our guess is that you consider each of the following variations...

- mismatches
- each insertion/deletion independent of the length
- split-reads

with an edit distance of 1? That's actually how we would like to have it, though it's borderline for the split-reads.
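
To make the guess above concrete, here is a minimal sketch of that counting scheme. It assumes the mismatch count is already known and that split-reads appear as N operations in the CIGAR string. The function name and arguments are hypothetical and this is not taken from the actual pipeline. Note that an NM value reported by a mapper may count each inserted or deleted base separately, so it need not agree with this event-based number.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def event_edit_distance(cigar, mismatches):
    """Count every mismatch, every indel (regardless of its length)
    and every split (N operation) as one edit each.

    `mismatches` is assumed to be supplied by the caller; it cannot be
    derived from a CIGAR string that only uses M for aligned bases.
    """
    ops = CIGAR_OP.findall(cigar)
    indel_events = sum(1 for _, op in ops if op in "ID")
    split_events = sum(1 for _, op in ops if op == "N")
    return mismatches + indel_events + split_events

# One mismatch, one 2-base insertion and one split count as 3 edits.
print(event_edit_distance("30M2I20M500N23M", 1))
```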

Thanks,
Matthias & Daniela

On 05.07.2012 02:18, Tuuli Lappalainen wrote:
Hello,

In my opinion we need to have a filter for the maximum number of mismatches (as well as MAPQ>150) when we want to have well-mapped reliable reads - if a read has lots of mismatches, I wouldn't trust it even if the other matches were even worse. But you're right that 3 or 4 is too stringent, I was thinking of the 75 bp and not the total of 150.

I'd say that we keep reads with <=6 mismatches according to the NM tag. If no one objects by Thursday noon, I'll proceed with this. I'll provide a script for filtering, and upload a filtered set of BAM files to the FTP site - you can do whatever is easier for you.
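
As a rough illustration of the proposed filter (not the script mentioned above), the following sketch keeps SAM text records whose NM value is at most 6. The function names are hypothetical, and dropping reads that carry no NM value is an assumption made here. The MAPQ criterion is left out to keep the example short.

```python
def nm_value(sam_line):
    """Return the integer NM value of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional fields start after the 11 mandatory SAM columns.
    for field in fields[11:]:
        if field.startswith("NM:i:"):
            return int(field[len("NM:i:"):])
    return None

def keep_read(sam_line, max_nm=6):
    """Keep header lines and reads with NM <= max_nm.

    Assumption: reads without an NM value are dropped.
    """
    if sam_line.startswith("@"):
        return True
    nm = nm_value(sam_line)
    return nm is not None and nm <= max_nm

def filter_sam(lines, max_nm=6):
    """Yield only the lines that pass the NM filter."""
    for line in lines:
        if keep_read(line, max_nm):
            yield line
```

Applied to SAM text read line by line, filter_sam(lines) yields the header plus the reads that pass; the threshold can be changed through max_nm.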

best regards,
Tuuli

Tuuli Lappalainen, PhD
Department of Genetic Medicine and Development
University of Geneva Medical School
CMU / Rue Michel-Servet 1
1211 Geneva 4
Switzerland
Tel. +41-(0)22-3795550
tuuli.lappalainen@unige.ch

-- 
Matthias Barann
Institute of Clinical Molecular Biology
Christian Albrechts University Kiel
Schittenhelmstr. 12
D-24105 Kiel, Germany

m.barann@ikmb.uni-kiel.de
+49 - (0)431 - 597 8681 (office)