Hello,
One more clarification about samples to exclude:
Sample NA19144.4.M_120208_2 looked suspicious enough that it was
flagged as a QC failure in the sample sheet, and its replicate sample
NA19144.1.M_120209_1 has always been marked as the sample to use for
this individual. However, the wiki did not previously list
NA19144.4.M_120208_2 as a sample to remove from all analyses. I have now
changed that to keep things consistent: let's just explicitly omit this
sample. If you're analyzing only the set of 462 unique individuals, this
changes nothing.
To summarize, these 7 samples should NOT be used in analyses:
NA18861.4.M_120208_5
NA19225.6.M_120119_5
NA12399.7.M_120219_1
NA07000.1.M_120209_2
HG00237.4.M_120208_1
NA19095.5.M_120131_5
NA19144.4.M_120208_2
We're left with 660 samples from 462 individuals.
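For anyone filtering programmatically, here is a minimal Python sketch of
applying the exclusion list above. It assumes the individual ID is the part
of each sample ID before the first "." (as with NA19144.4.M_120208_2 and
NA19144.1.M_120209_1); the function names are hypothetical and not part of
any project tooling.

```python
from typing import Iterable, List, Tuple

# The 7 samples that should NOT be used in any analysis (from this message).
EXCLUDED_SAMPLES = frozenset({
    "NA18861.4.M_120208_5",
    "NA19225.6.M_120119_5",
    "NA12399.7.M_120219_1",
    "NA07000.1.M_120209_2",
    "HG00237.4.M_120208_1",
    "NA19095.5.M_120131_5",
    "NA19144.4.M_120208_2",
})


def individual_id(sample_id: str) -> str:
    """Return the individual prefix of a sample ID.

    Assumption: the individual is the text before the first '.'.
    """
    return sample_id.split(".", 1)[0]


def filter_samples(sample_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Drop excluded samples; return (kept samples, sorted unique individuals)."""
    kept = [s for s in sample_ids if s not in EXCLUDED_SAMPLES]
    individuals = sorted({individual_id(s) for s in kept})
    return kept, individuals
```

Run over every sample ID in the sheet, this should leave the 660 samples
from 462 individuals stated above, assuming each sample appears once.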
best,
Tuuli
Tuuli Lappalainen, PhD
Department of Genetic Medicine and Development
University of Geneva Medical School
CMU / Rue Michel-Servet 1
1211 Geneva 4
Switzerland
Tel. +41-(0)22-3795550
tuuli.lappalainen(a)unige.ch
On 9/1/12 8:52 PM, Tuuli Lappalainen wrote:
Hello all,
IMPORTANT:
- Sample NA19095.5.M_120131_5 should be excluded from all analyses.
This sample is one of the 5 replicates, and for this individual we
originally chose another sample (NA19095.2.M_120131_2) to use. Thus,
the set of 462 unique samples (which should be used in most analyses)
doesn't change. However, if you're analyzing replicates in any way,
exclude NA19095.5.M_120131_5.
Some more info: Peter analyzed sex-specific expression, which generally
shows very clean clustering into female and male samples. However, 4
samples showed a mixed pattern suggestive of cross-contamination. Three
of these had already been excluded because they showed clear signs of
contamination in the ASE analysis (NA12399.7.M_120219_1,
NA07000.1.M_120209_2, HG00237.4.M_120208_1). The fourth,
NA19095.5.M_120131_5, behaved slightly abnormally in the ASE analysis
but was not excluded on that basis; the sex-specific analysis, however,
shows that this sample needs to be thrown out.
Additionally, some samples that will be kept in the dataset have QC
warnings. In the sex-specific transcription analysis, the sample
NA11930.3.M_120202_8 behaved slightly abnormally, and Jonas and Olof
have also listed samples that are slight outliers based on various
statistics.
An updated sample information sheet is attached. The UseThisSample
flag gives you the set of 462 unique samples (it hasn't changed from
the previous file, GD667_mRNA_SampleInformation_270712.txt).
Additionally, the QC_OK column marks all samples that have passed QC,
and QC_Info explains why a sample failed QC or lists QC warnings for
QC-passed samples. Both columns have been updated with the information
above.
This information can be found in the wiki
(http://sanabre.net/geuvadis/index.php/QC_sample_info), and the sample
information sheets have been uploaded to the FTP site
(/upload/geuvadis/wp4_rnaseq/main_project/analysis_data/qc).
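A minimal Python sketch of selecting samples from the sample information
sheet using these columns. The column names UseThisSample, QC_OK and
QC_Info come from this message; the tab-separated layout, the flag encoding
("1"/"0" or TRUE/FALSE) and the "SampleID" column name are assumptions, so
check them against the actual file.

```python
import csv
from typing import Dict, Iterable, List, Optional, TextIO

TRUE_VALUES = {"1", "true", "yes", "y"}  # assumed flag encodings


def is_set(value: Optional[str]) -> bool:
    """Interpret a flag cell (case-insensitive) as a boolean."""
    return (value or "").strip().lower() in TRUE_VALUES


def read_sheet(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a tab-separated sheet with a header row into row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def unique_samples(rows: Iterable[Dict[str, str]],
                   id_column: str = "SampleID") -> List[str]:
    """The 462-sample set: rows whose UseThisSample flag is set."""
    return [r[id_column] for r in rows if is_set(r.get("UseThisSample"))]


def qc_passed_samples(rows: Iterable[Dict[str, str]],
                      id_column: str = "SampleID") -> List[str]:
    """All samples that passed QC, e.g. for replicate analyses."""
    return [r[id_column] for r in rows if is_set(r.get("QC_OK"))]


def qc_warnings(rows: Iterable[Dict[str, str]],
                id_column: str = "SampleID") -> Dict[str, str]:
    """QC-passed samples that still carry a note in QC_Info."""
    warnings = {}
    for r in rows:
        info = (r.get("QC_Info") or "").strip()
        if is_set(r.get("QC_OK")) and info:
            warnings[r[id_column]] = info
    return warnings
```

Use unique_samples() for most analyses and qc_passed_samples() when working
with replicates; qc_warnings() lists kept samples worth a second look.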
best,
Tuuli