Hello all,
We're having a Geuvadis RNAseq analysis group call today (July 26th) at
2pm.
Many people are on holidays, and we don't have a presentation scheduled
for today, but let's have at least a short call to catch up on any
analysis updates.
After that we'll continue to discuss the QC companion paper.
Call details are:
From outside Spain: 0034917911859
From Spain: 900800678
Access code: 3160100
best regards,
Tuuli
--
Tuuli Lappalainen, PhD
Department of Genetic Medicine and Development
University of Geneva Medical School
CMU / Rue Michel-Servet 1
1211 Geneva 4
Switzerland
Tel. +41-(0)22-3795550
tuuli.lappalainen(a)unige.ch
Hello all,
Here is a spreadsheet with information on which mRNA samples to use in
analysis.
Some numbers:
- Total number of samples: 667, from 464 unique individuals
- 6 samples fail QC (QC_OK column == 0, with some description under QC_Info)
- Total number of unique individuals that pass QC: 462 (UseThisSample==1)
For individuals that have both the low-coverage and high-coverage
replicate passing QC, I've always chosen the primary high-coverage
sample. For the 5 replicates I took the one with the highest number of
well mapped reads.
I doubt that there will be a need to change the QC pass/fail status due
to any future QC discoveries. However, we can add some other QC-related
warnings in the QC_Info column of samples that pass QC.
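As a minimal sketch of how the spreadsheet columns could be used in R: only the column names QC_OK, QC_Info and UseThisSample come from this email; the Sample and Individual columns, the IDs and all values below are invented for illustration.

```r
# Toy stand-in for the sample spreadsheet. Only the column names QC_OK,
# QC_Info and UseThisSample come from the email; everything else is invented.
samples <- data.frame(
  Sample        = c("S1.rep1", "S1.rep2", "S2.rep1", "S3.rep1"),
  Individual    = c("IND1", "IND1", "IND2", "IND3"),
  QC_OK         = c(1, 1, 1, 0),
  QC_Info       = c("", "", "", "outlier"),
  UseThisSample = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Keep the one selected sample per individual.
use <- samples[samples$UseThisSample == 1, ]

# Sanity checks: every selected sample passes QC, and no individual
# appears more than once.
stopifnot(all(use$QC_OK == 1))
stopifnot(!any(duplicated(use$Individual)))

n_selected <- nrow(use)
```

Running the two checks on the real spreadsheet would be a cheap way to confirm that UseThisSample never selects a sample with QC_OK == 0.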
best regards,
Tuuli
Dear all,
The functional annotations for all Geuvadis variants are now available.
Many thanks to everyone in the loss-of-function group for all the work
that it took to put this together.
The file for all 1000g variants:
/upload/geuvadis/wp4_rnaseq/main_project/external_data/genetic_variation/variant_info/ALL.phase1_release_v3.20101123.snps_indels_sv.sites.gdid.gdannot.v2.vcf.gz
.
The files of variants in Geuvadis data (genotype files and sites file)
have also been updated to include the new annotations:
/upload/geuvadis/wp4_rnaseq/main_project/external_data/genetic_variation/genotype/phase1_phase2/
The wiki is now read-only, so the documentation for this can be found in
the following Google Doc:
https://docs.google.com/document/d/1wR0oyg01M-wfoWxjN7KX6c6UqqksbW_3to2_N7S…
best regards,
Tuuli
Tuuli Lappalainen, PhD
Department of Genetic Medicine and Development
University of Geneva Medical School
CMU / Rue Michel-Servet 1
1211 Geneva 4
Switzerland
Tel. +41-(0)22-3795550
tuuli.lappalainen(a)unige.ch
I'll also upload the file to http://jungle.unige.ch/~lappalainen/ .
The documentation is under the link below. Can you please take a look -
especially Monkol and Manny - and let me know by Thursday if everything
seems OK, and add/edit if necessary? I'll share this with the rest of
the analysis group after that.
https://docs.google.com/document/d/1wR0oyg01M-wfoWxjN7KX6c6UqqksbW_3to2_N7S…
cheers,
Tuuli
Hello all,
The latest updates on the Geuvadis quantification data front:
*Gene, exon and transcript quantifications*
- The raw quantification data files of all 667 samples are available as
  before, unchanged.
- I changed the library depth normalization from what I uploaded a few
  days ago - please don't use the old files.
- The normalized files now include only the 662 samples that we'll use
  in analysis.
*Intron and junction quantifications*
- Micha's group has produced the quantifications: intron and junction
  read counts, and fractions of intron coverage.
- The available files are:
    - raw gtf files, one per individual
    - tab-separated tables of the read counts and coverages: raw counts
      in 667 samples, and library depth normalized counts in 662 samples
The directory is
/upload/geuvadis/wp4_rnaseq/main_project/analysis_data/quantification/ .
See the readmes for details.
A word about library depth normalization: previously I normalized the
exon quantifications by the total number of exonic reads. However, this
is not a good correction for intronic reads, and for the sake of
consistency I wanted to normalize exon, intron and junction
quantifications by the same factor. Thus, they are now all normalized
by the total number of well mapped reads (mapped, properly paired,
MAPQ>150, NM<=6).
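The idea of one shared per-sample factor can be sketched in R as follows. This is an illustration under assumptions, not the exact procedure used for the uploaded files: the counts and read totals are invented, and rescaling to the median depth is my assumption, since the email does not say which target depth was used.

```r
# Toy counts: rows are features (exons, introns or junctions), columns are
# samples. All numbers are invented.
counts <- matrix(
  c(10, 20,  30,
    40, 80, 120),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("feature1", "feature2"),
                  c("sampleA", "sampleB", "sampleC"))
)

# Hypothetical totals of well mapped reads per sample.
well_mapped <- c(sampleA = 1e6, sampleB = 2e6, sampleC = 3e6)

# Assumption: rescale every sample to the median depth. The email does not
# state the target depth.
scale_factor <- well_mapped / median(well_mapped)

# Divide each column by its sample's scale factor.
normalized <- sweep(counts, 2, scale_factor[colnames(counts)], "/")
```

Because the factor depends only on the sample, the same scale_factor vector can be applied to the exon, intron and junction tables alike, which is the consistency argument made above.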
Have a nice weekend,
Tuuli
Hello,
Here is the final list of Geuvadis RNAseq samples to *exclude from all
mRNA analyses:*
NA18861.4.M_120208_5 (quantification, QC and ASE outlier)
NA19225.6.M_120119_5 (ASE outlier probably due to contamination)
HG00237.4.M_120208_1 (ASE outlier probably due to contamination)
NA12399.7.M_120219_1 (ASE outlier probably due to contamination)
NA07000.1.M_120209_2 (ASE outlier probably due to contamination)
There are a couple of other samples to keep an eye on; within a couple
of days I will finish the analysis of these and provide an updated
sample information file with clear tags of samples to include in
analysis and warning flags.
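For anyone filtering their own tables in the meantime, here is a minimal R sketch. The five sample IDs are copied from the list above; the toy matrix, its values and the "KEEP_1"/"KEEP_2" column names are invented.

```r
# Sample IDs copied from the exclusion list in this email.
excluded <- c(
  "NA18861.4.M_120208_5",
  "NA19225.6.M_120119_5",
  "HG00237.4.M_120208_1",
  "NA12399.7.M_120219_1",
  "NA07000.1.M_120209_2"
)

# Toy quantification matrix with one column per sample. "KEEP_1" and
# "KEEP_2" are invented IDs standing in for samples that stay.
quant <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("gene1", "gene2"),
                  c("KEEP_1", "NA18861.4.M_120208_5", "KEEP_2"))
)

# Drop every column whose name is on the exclusion list.
quant_kept <- quant[, !(colnames(quant) %in% excluded), drop = FALSE]
```

Using drop = FALSE keeps the result a matrix even if only one sample remains.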
best regards,
Tuuli
Dear all,
Peter and I were planning to have a call to discuss the QC companion
paper and QC analyses in general. We'll have the call on Thursday 26th
at 3pm, i.e. right after our normal analysis group TC, and everyone who
is interested is welcome to join.
best regards,
Tuuli
Hello all,
I have drafted an outline for the Geuvadis RNAseq main paper based on
the presentations and discussions at the Barcelona meeting. Please take
a look and let me know if I've forgotten or misunderstood something, or
if you have other suggestions. I will send it to the whole RNAseq group
Friday afternoon, so I'd appreciate it if you could send your comments
before that.
best regards,
Tuuli
Hello,
Since we just had the Barcelona meeting, I suggest that we skip this
week's Geuvadis analysis group call. But let's schedule one for next
week, Thursday 26th of July.
best regards,
Tuuli
Hello,
At the Barcelona Geuvadis meeting I promised to send everyone a list of
colors to use for populations and labs. They can be found below; please
use these in all of your plots.
best regards,
Tuuli
popnames <- c("CEU", "FIN", "GBR", "TSI", "YRI")
Rpopcols <- c("red", "royalblue2", "olivedrab4", "purple1", "orange")
The RGB values are as follows:
Population  R_color_name  RGB_red  RGB_green  RGB_blue
CEU         red           255      0          0
FIN         royalblue2    67       110        238
GBR         olivedrab4    105      139        34
TSI         purple1       155      48         255
YRI         orange        255      165        0
labnumbers <- 1:7
labnames <- c("UNIGE", "CNAG_CRG", "MPIMG", "ICMB", "HMGU", "UU", "LUMC")
Rlabcols <- c("mediumturquoise", "sienna3", "olivedrab4", "deeppink2",
"purple1", "royalblue3", "darkgoldenrod1")
SeqLabName  SeqLabNumber  R_color_name     RGB_red  RGB_green  RGB_blue
UNIGE       1             mediumturquoise  72       209        204
CNAG_CRG    2             sienna3          205      104        57
MPIMG       3             olivedrab4       105      139        34
ICMB        4             deeppink2        238      18         137
HMGU        5             purple1          155      48         255
UU          6             royalblue3       58       95         205
LUMC        7             darkgoldenrod1   255      185        15
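The RGB values can be recovered from the R color names with col2rgb() from grDevices, which is handy when a plotting tool outside R needs them. A minimal sketch for the population colors follows; the variable names pop_rgb and fin_col are mine, and the same pattern should work for Rlabcols.

```r
popnames <- c("CEU", "FIN", "GBR", "TSI", "YRI")
Rpopcols <- c("red", "royalblue2", "olivedrab4", "purple1", "orange")

# Name the color vector so colors can be looked up by population code.
names(Rpopcols) <- popnames

# col2rgb() returns a 3 x n matrix with rows "red", "green" and "blue";
# transpose it so that each population is one row.
pop_rgb <- t(grDevices::col2rgb(Rpopcols))

# Look up a single color by population code, e.g. for the col= argument
# of plot().
fin_col <- Rpopcols[["FIN"]]
```

With the named vector, a per-sample color vector is just Rpopcols[pop], where pop is a character vector of population codes for the samples being plotted.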
Hello all,
The final versions of the exon, gene and transcript quantifications are
now on the ftp site in:
/upload/geuvadis/wp4_rnaseq/main_project/analysis_data/quantification/ .
These are in Gencode v12, with reads with >6 mismatches removed. See
the readme files and wiki for documentation - there are raw counts,
counts normalized by the number of exonic reads, and RPKMs for
transcripts. Note that the files include all 667 samples; later this
week I will send information on the couple of samples that we'll drop
from all analyses, and provide files without them.
Splice junction and intron quantifications will follow later this week.
best regards,
Tuuli