Summary
Myers and colleagues generated massive neocortical transcriptome data sets from a set of unrelated elderly humans who were neurologically and neuropathologically normal and from patients with confirmed late-onset Alzheimer's disease (LOAD; n = 187 normal and 176 LOAD cases; see DOI:10.1016/j.ajhg.2009.03.011 for details). They used an Illumina Sentrix Bead array (HumanRef-8) that measures expression of approximately 19,730 curated RefSeq sequences (Human Build 34).
Case identifiers: All case identifiers (IDs) in GeneNetwork begin with a capital C followed by a six-digit GEO identifier, followed by the sex and the age in years. Non-Alzheimer cases are labeled with the suffix letter N: C225652M85N. Alzheimer cases are labeled with the suffix letter A: C388217F97A.
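The ID convention described above can be unpacked programmatically. The following is a minimal sketch, not GeneNetwork code: the regular expression is inferred only from the two example IDs in the text, and the names `CaseId` and `parse_case_id` are hypothetical.

```python
import re
from typing import NamedTuple


class CaseId(NamedTuple):
    geo_id: str   # six-digit GEO identifier
    sex: str      # "M" or "F"
    age: int      # age in years
    status: str   # "N" = non-Alzheimer, "A" = Alzheimer


# Assumption: pattern inferred from the examples C225652M85N and C388217F97A;
# it is not an official GeneNetwork specification.
_CASE_ID = re.compile(r"^C(\d{6})([MF])(\d+)([NA])$")


def parse_case_id(case_id: str) -> CaseId:
    """Split a case ID into GEO identifier, sex, age, and diagnosis."""
    match = _CASE_ID.match(case_id)
    if match is None:
        raise ValueError(f"not a recognized case ID: {case_id!r}")
    geo_id, sex, age, status = match.groups()
    return CaseId(geo_id, sex, int(age), status)


# Example: parse_case_id("C388217F97A").status is "A" (an Alzheimer case).
```

Raising on a mismatch rather than returning a partial result makes malformed IDs visible early when a case list is loaded.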
Data were initially downloaded from the NCBI GEO archive under the experiment ID GSE15222. All data were generated using the Illumina HumanRef-8 expression BeadChip (GPL2700) v2 Rev0. This data set in GeneNetwork includes data for 24,354 probes. We have realigned the 50-mer sequences by BLAT to the latest version of the human genome (Feb 2009, hg19) and reannotated the array (August 2009). The annotation in GN will differ from that provided in GEO for this platform. We were unable to obtain 50-mer sequences for several thousand probes (e.g., HTT), and these probes have therefore not been realigned to the human genome.
The GEO data set was processed by Myers and colleagues using Illumina's Rank Invariant transform. We applied a series of QC and renormalization steps to the data to make comparison with other data sets in GeneNetwork easier. In brief, the data were log2 transformed, and each array was then recentered to a mean expression of 8 units and a standard deviation of 2 units (2z + 8 transform). The values are therefore modified z scores, and each unit represents roughly a two-fold difference in expression. Average expression across all 363 cases ranges from a low of 6 units (e.g., SYT15) to a high of 19 units for ARSK. APOE has an average expression of 15 units and APP has an average expression of 11.5 units. The distribution is far from normal, with a great excess of measurements of genes with low to moderate expression clustered between 6.5 and 8.5 units.
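As an illustration of the 2z + 8 transform, here is a minimal sketch for a single array. It assumes strictly positive input values and uses the population standard deviation; the text does not say which standard deviation estimator or which handling of non-positive values was actually used, and the function name is hypothetical.

```python
import math
import statistics


def two_z_plus_eight(raw_values):
    """log2-transform one array's values, then rescale to mean 8 and SD 2."""
    # math.log2 raises ValueError for zero or negative inputs.
    logged = [math.log2(v) for v in raw_values]
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # assumption: population SD
    if sd == 0:
        raise ValueError("all values are identical; z scores are undefined")
    # z score of each value, stretched to SD 2 and shifted to mean 8
    return [2.0 * (x - mean) / sd + 8.0 for x in logged]
```

Applied to every array in turn, this puts all arrays on the same scale regardless of their original overall intensity.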
A small number of arrays (n = 6: GSM226040, GSM226041, GSM226042, GSM226044, GSM226045, GSM226046) had a different distribution from the great majority of other arrays, probably because of a batch processing effect. Members of this minority group included both normal and LOAD cases. This putative batch effect has been eliminated in the GeneNetwork rendition of the Myers data. To do so, we computed a mean offset for each probe in the "minority set" relative to the remaining "majority set" and added or subtracted this offset to force the mean of each probe in the minority set to conform to the mean of the same probe in the majority set.
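The per-probe offset correction can be sketched as follows. This is an illustration under stated assumptions rather than the code actually used: the nested-dictionary layout (probe, then array ID, then value) and the function name are hypothetical.

```python
from statistics import fmean


def remove_batch_offset(expr, minority_ids):
    """Shift minority-set values so each probe's minority mean equals its majority mean.

    expr: dict mapping probe ID -> dict mapping array ID -> expression value.
    minority_ids: set of array IDs in the minority batch.
    Returns a new dict; the input is not modified.
    """
    corrected = {}
    for probe, by_array in expr.items():
        minority = [v for a, v in by_array.items() if a in minority_ids]
        majority = [v for a, v in by_array.items() if a not in minority_ids]
        if not minority or not majority:
            # Nothing to align against; copy the probe unchanged.
            corrected[probe] = dict(by_array)
            continue
        # Positive offset raises the minority set, negative lowers it.
        offset = fmean(majority) - fmean(minority)
        corrected[probe] = {
            a: (v + offset if a in minority_ids else v)
            for a, v in by_array.items()
        }
    return corrected
```

Only the mean of each probe is aligned; the spread of the minority arrays around that mean is left as it was.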
Experiment design
Expression profiling by array
We recently surveyed the relationship between the human brain transcriptome and genome in a series of neuropathologically normal postmortem samples. We now have analyzed additional samples with a confirmed pathologic diagnosis of late onset Alzheimer's disease (LOAD, final n=187 controls, 176 cases). Nine percent of the cortical transcripts we analyzed had expression profiles correlated with their genotypes in the combined cohort and approximately 5% of transcripts had SNP-transcript relationships that could distinguish LOAD samples. Two of these transcripts have been previously implicated in LOAD candidate gene SNP-expression screens. This study shows how the relationship between common inherited genetic variants and brain transcript expression can be used in the study of human brain disorders. We suggest that studying the transcriptome as a quantitative endo-phenotype has greater power to find risk SNPs influencing expression than the use of discrete diagnostic categories such as presence or absence of disease. See DOI:10.1016/j.ajhg.2009.03.011 for further details and the complete author list.
Expression quantitative trait loci study using 363 human brain cortical samples. The Affymetrix 500K chip was used for genotyping and the Illumina Sentrix Human-ref 8 bead array chip for expression. Genotyping data will be available at dbGaP.
About cases
About platform
Illumina Human 50-mer probes. The array carries a total of 24,357 probes according to Myers et al.; 24,354 of these are included in this GeneNetwork file.
From the Methods section of the paper:
Genotyping and Expression Profiling. DNA was hybridized to the Affymetrix GeneChip Human Mapping 500K Array Set (502,627 SNPs) as previously described [11,12]. Genotypes were extracted with the use of both SNiPer-HD [13] and BRLMM (Affymetrix, Santa Clara, CA) algorithms. Genotypes that exhibited less than 98% concordance between calls were excluded. SNPs with call rates less than 90% were excluded from the analysis. Hardy-Weinberg equilibrium (HWE) was assessed with exact tests and the PLINK analysis toolset [14]. SNPs with HWE exact-test p values less than 0.05, as well as SNPs with minor-allele frequencies less than 1%, were excluded. Allele calls had a mean of 97% and a range of 90%–99%. cRNA was hybridized to Illumina Human Refseq-8 Expression BeadChip (24,357 transcripts) via standard protocols. Expression profiles were extracted and rank invariant normalized [15–17] with the use of the BeadStudio software available from Illumina, with the Illumina custom error model used. Rank-invariant-normalized expression data were log10 transformed, and missing data were encoded as missing, rather than as a zero level of expression.
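The SNP-level exclusion thresholds quoted above (call rate, HWE p value, minor-allele frequency) can be expressed as a simple filter. This is only a sketch of the stated cutoffs, not the authors' pipeline, which used PLINK; the `SnpStats` record, the function names, and the SNP IDs in the usage note are hypothetical, and the genotype-level 98% concordance check between the two calling algorithms is not covered.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    call_rate: float  # fraction of samples with a genotype call, 0 to 1
    hwe_p: float      # Hardy-Weinberg exact-test p value
    maf: float        # minor-allele frequency, 0 to 0.5


def passes_qc(snp: SnpStats) -> bool:
    """Apply the three SNP-level cutoffs quoted from the Methods section."""
    return (
        snp.call_rate >= 0.90    # exclude call rates below 90%
        and snp.hwe_p >= 0.05    # exclude HWE exact-test p < 0.05
        and snp.maf >= 0.01      # exclude minor-allele frequency below 1%
    )


def filter_snps(snps):
    """Return the IDs of SNPs that survive all three cutoffs."""
    return [s.snp_id for s in snps if passes_qc(s)]
```

For example, a record such as `SnpStats("snp_b", 0.85, 0.50, 0.20)` would be dropped on call rate alone, whatever its other statistics.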
Contributors
Webster J, Gibbs R, Myers A
Acknowledgment
Notes
Access to the original data from Dr. Myers' laboratory
or GEO GSE15222
PMI = Post Mortem Interval
Cannot find this record in the GEO website: WGACON-120
This data is based on May 2004 (NCBI35/hg17).