Dear all,
please find attached a very basic QC for the 5 common samples plus one
extra sample (NA20524). To deliver this now, we have not been able to
fully test the pipeline, but the results should be fine.
The samples we have sequenced show different read-length profiles,
although 4 of the common ones have a broadly similar distribution. You
will see that the extra one has very few reads and a different profile
(mode at 15 bp, also seen in NA20527).
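For anyone who wants to recompute a length profile and its mode from a
set of trimmed reads, a minimal Python sketch follows. The function names
and the toy reads are illustrative assumptions and are not part of our
pipeline.

```python
from collections import Counter


def length_profile(reads):
    """Count how many reads have each length (in bases)."""
    return Counter(len(read) for read in reads)


def modal_length(profile):
    """Return the most frequent read length; ties go to the shorter length."""
    return max(profile.items(), key=lambda item: (item[1], -item[0]))[0]


if __name__ == "__main__":
    # Toy reads, made up for the example (not real data).
    toy_reads = ["A" * 15, "C" * 15, "G" * 22, "T" * 15, "A" * 21]
    profile = length_profile(toy_reads)
    for length in sorted(profile):
        print(length, profile[length])
    print("mode:", modal_length(profile))
```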
Workflow:
1) Trimming -> using far (fast adaptor removal) with a minimum overlap
of 6 bases and at most 2 mismatches per 10 bases. The trimming for this
test was very harsh; we can probably make it stricter and run it in
several rounds.
2) Selection of uniquely mapping reads (mapping done with GEM+Bfast).
3) Generation of mature miRNA coordinates with the tool from
http://cm.jefferson.edu/downloadables/
4) Reciprocal comparison (unique reads vs mature miRNA) with bedtools
(only miRNA counts are attached here)
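
To make the trimming rule in step 1 concrete, here is a minimal Python
sketch. It is not far itself and does not claim to reproduce how far
works internally. The function name, the left-most-match policy, the
rounding down of the mismatch allowance and the toy adapter sequence are
all assumptions made for the example.

```python
def trim_3prime_adapter(read, adapter, min_overlap=6, max_mismatches_per_10=2):
    """Remove a 3' adapter (or adapter prefix) from a read.

    The read is scanned from left to right. At each position the read
    suffix is compared with the start of the adapter; the first position
    where the overlap is at least `min_overlap` bases and the number of
    mismatches is within the allowance is taken as the adapter start.
    The allowance is rounded down (assumption): overlap * 2 // 10.
    """
    if len(adapter) < min_overlap:
        return read
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        allowed = overlap * max_mismatches_per_10 // 10
        if mismatches <= allowed:
            return read[:start]
    return read


if __name__ == "__main__":
    # Toy sequences, made up for the example (not a real adapter).
    adapter = "CCCCGGGGTTTT"
    insert = "A" * 20
    print(trim_3prime_adapter(insert + "CCCCGGGG", adapter))  # adapter removed
    print(trim_3prime_adapter(insert + "CCCC", adapter))      # overlap too short
```

With these settings an overlap of 6 to 9 bases tolerates one mismatch and
an overlap of 10 to 14 bases tolerates two, so short adapter remnants are
only removed when they match almost exactly.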
The sequence that can clearly be read
(CGCGACCTCAGATCAGACGTGGCGACCCGCTGA) has a "perfect" megablast match on
the human genome.
Best wishes,
Sergi
---------------------------------
Sergi Beltran Agulló
Bioinformatics Analysis Group
CNAG - National Center for Genomic Analysis
Parc Científic Barcelona - Torre I
Baldiri i Reixac 4, 2a p.
08028 Barcelona
(+34)934033748
sbeltrana(a)pcb.ub.cat
www.cnag.cat
---------------------------------