Hello,

One more clarification about samples to exclude:
Sample NA19144.4.M_120208_2 looks suspicious enough that it has been flagged as a QC failure in the sample sheet, and its replicate NA19144.1.M_120209_1 has always been marked as the sample to use for this individual. However, NA19144.4.M_120208_2 was not previously listed in the wiki as a sample to remove from all analyses; I have now changed this to keep things consistent, so let's just clearly omit this sample. If you're analyzing only the set of 462 unique individuals, this changes nothing.

To summarize, these 7 samples should NOT be used in analyses:
NA18861.4.M_120208_5
NA19225.6.M_120119_5
NA12399.7.M_120219_1
NA07000.1.M_120209_2
HG00237.4.M_120208_1
NA19095.5.M_120131_5
NA19144.4.M_120208_2

We're left with 660 samples from 462 individuals.
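
In case it helps, here is a minimal Python sketch of how the exclusion list could be applied to a list of sample IDs. The IDs are copied from the list above; the names EXCLUDED_SAMPLES and drop_excluded are only illustrative and are not part of any project script.

```python
# The seven sample IDs that should not be used in any analysis,
# copied from the list in this email.
EXCLUDED_SAMPLES = frozenset({
    "NA18861.4.M_120208_5",
    "NA19225.6.M_120119_5",
    "NA12399.7.M_120219_1",
    "NA07000.1.M_120209_2",
    "HG00237.4.M_120208_1",
    "NA19095.5.M_120131_5",
    "NA19144.4.M_120208_2",
})


def drop_excluded(sample_ids):
    """Return the given sample IDs in their original order, minus the excluded ones."""
    return [s for s in sample_ids if s not in EXCLUDED_SAMPLES]


# Usage: of the two NA19144 replicates, only the one marked for use remains.
kept = drop_excluded(["NA19144.4.M_120208_2", "NA19144.1.M_120209_1"])
print(kept)
```

Applied to the full set of sample IDs, the length of the result is a quick check against the count of 660 given above.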

best,
Tuuli


Tuuli Lappalainen, PhD
Department of Genetic Medicine and Development
University of Geneva Medical School
CMU / Rue Michel-Servet 1
1211 Geneva 4
Switzerland
Tel. +41-(0)22-3795550
tuuli.lappalainen@unige.ch
On 9/1/12 8:52 PM, Tuuli Lappalainen wrote:
Hello all,

IMPORTANT:
- sample NA19095.5.M_120131_5 should be excluded from all analyses. This sample is one of the 5 replicates, and for this individual we originally chose another sample (NA19095.2.M_120131_2) to use. Thus, the set of 462 unique samples (which should be used in most analyses) doesn't change. However, if you're analyzing replicates in any way, exclude NA19095.5.M_120131_5.

Some more info: Peter did an analysis of sex-specific expression, which generally shows very nice clustering into female and male samples. However, 4 samples showed a mixed pattern that is suggestive of cross-contamination. Three of these had already been excluded because they showed clear signs of contamination in the ASE analysis (NA12399.7.M_120219_1, NA07000.1.M_120209_2, HG00237.4.M_120208_1). The fourth, NA19095.5.M_120131_5, behaved slightly abnormally in the ASE analysis but was not excluded on that basis; the sex-specific analysis now shows that this sample needs to be thrown out.

Additionally, there are some QC warnings for samples that will be kept in the dataset. In the sex-specific transcription analysis the sample NA11930.3.M_120202_8 behaved slightly abnormally, and Jonas and Olof have also listed samples that are slight outliers based on different statistics.

An updated sample information sheet is in the attachment. The UseThisSample flag gives you the set of 462 unique samples (and hasn't changed from the previous file, GD667_mRNA_SampleInformation_270712.txt). Additionally, the QC_OK column denotes all the samples that have passed QC, and QC_Info explains why a sample failed QC or lists QC warnings for QC-passed samples. These two columns have been updated with the information above.
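
For those who read the sheet programmatically, a rough Python sketch of selecting samples by these flags follows. Only the column names UseThisSample, QC_OK and QC_Info come from the sheet as described above. The tab delimiter, the ID column name "Sample", the "1"/"0" encoding of the flags and the toy rows are all assumptions for illustration, so please check them against the actual file.

```python
import csv
import io


def select_samples(sheet_file, unique_only=True, id_col="Sample", true_value="1"):
    """Return sample IDs whose flag column is set.

    unique_only=True filters on UseThisSample (the unique-individual set);
    unique_only=False filters on QC_OK (all QC-passed samples, replicates included).
    id_col and true_value are assumptions about the sheet layout.
    """
    reader = csv.DictReader(sheet_file, delimiter="\t")
    flag = "UseThisSample" if unique_only else "QC_OK"
    return [row[id_col] for row in reader if row[flag] == true_value]


# Toy two-row sheet, made up for illustration only (not the real file contents).
TOY_SHEET = (
    "Sample\tUseThisSample\tQC_OK\tQC_Info\n"
    "NA19095.2.M_120131_2\t1\t1\t\n"
    "NA19095.5.M_120131_5\t0\t0\tillustrative failure note\n"
)

print(select_samples(io.StringIO(TOY_SHEET)))
print(select_samples(io.StringIO(TOY_SHEET), unique_only=False))
```

With the real sheet, one would pass an opened file handle instead of the io.StringIO object.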

This information can be found in the wiki (http://sanabre.net/geuvadis/index.php/QC_sample_info), and the sample information sheets have been uploaded to the ftp site (/upload/geuvadis/wp4_rnaseq/main_project/analysis_data/qc).

best,
Tuuli

