Summary
This April 2005 data freeze provides estimates of mRNA expression in adult forebrain and midbrain from 45 lines of mice including C57BL/6J, DBA/2J, their F1 hybrids, and 42 BXD recombinant inbred strains. Data were generated at UTHSC and the University of Memphis with support from grants from the NIAAA Integrative Neuroscience Initiative on Alcoholism (INIA). Samples were hybridized in small pools (n = 3) to a total of 105 Affymetrix M430A and B array pairs. This particular data set was processed using the RMA protocol. To simplify comparisons among transforms, RMA values of each array were adjusted to an average of 8 units and a standard deviation of 2 units.
About cases
We have used a set of BXD recombinant inbred strains generated by crossing C57BL/6J (B6 or B) with DBA/2J (D2 or D). The BXDs are particularly useful for systems genetics because both parental strains have been sequenced (8x coverage of B6 and 1.5x coverage of D). Physical maps in WebQTL incorporate approximately 1.75 million B vs D SNPs from Celera. BXD2 through BXD32 were bred by Benjamin A. Taylor starting in the late 1970s. BXD33 through BXD42 were bred by Taylor in the 1990s. These strains are available from The Jackson Laboratory. BXD43 through BXD99 were bred by Lu Lu, Jeremy Peirce, Lee M. Silver, and Robert W. Williams in the late 1990s and early 2000s using advanced intercross progeny (Peirce et al. 2004). Many of the 50 new BXD strains are available from Lu Lu and colleagues.
All stock was obtained originally from The Jackson Laboratory between 1999 and 2003. Most BXD animals were born and housed at the University of Tennessee Health Science Center. Some cases were bred at the University of Memphis (Douglas Matthews) or the University of Alabama (John Mountz and Hui-Chen Hsu).
About tissue
The INIA M430 brain Database (April05) consists of 105 Affymetrix 430A and 430B microarray pairs. Each pair was hybridized in sequence (A array first, B array second) with a pool of brain tissue (forebrain minus olfactory bulb, plus the entire midbrain) taken from three adult animals of closely matched age and the same sex. RNA was extracted at UTHSC by Lu Lu, Zhiping Jia, and Hongtao Zhai. All samples were subsequently processed in the INIA Bioanalytical Core at the W. Harry Feinstone Center of Excellence by Thomas R. Sutter, Shirlean Goodwin, and colleagues at the University of Memphis.
Replication and Sample Balance: Our goal is to obtain data for independent biological sample pools from at least one sample of each sex for all BXD strains. We have not yet achieved this goal. Ten of 45 strains are still represented by single-sex samples: BXD2 (F), BXD8 (F), BXD15 (F), BXD18 (F), BXD25 (F), BXD29 (F), BXD33 (M), BXD45 (F), BXD77 (M), and BXD90 (M). Eleven strains are represented by three or more independent samples with the following breakdown by sex: C57BL/6J (1F 2M), DBA/2J (2F 2M), B6D2F1 (2F 2M) + D2B6F1 (1F 1M), BXD6 (2F 1M), BXD13 (2F 1M), BXD14 (1F 2M), BXD28 (2F 1M), BXD34 (1F 2M), BXD36 (1F 2M), BXD38 (1F 2M), BXD42 (1F 2M).
Batch Structure: Before running the first batch of 30 pairs of arrays (dated Jan04), we ran four test samples (Nov03). The main batch of 30 includes the four test samples (four technical replicates). The Nov03 data were combined with the Jan04 data and treated as a single batch that consists of one male and one female pool from C57BL/6J, DBA/2J, the B6D2F1 hybrid, 11 female BXD samples, and 11 male BXD samples. The second large batch was run in February 2005 (Feb05) and consists of 71 pairs of arrays. Batch effects were corrected at the individual probe level as described below.
The table below summarizes information on strain, sex, age, sample name, batch result date, and source of mice.
| Id | Strain | Sex | Age | Sample_name | Result date | Source |
|---|---|---|---|---|---|---|
| 1 | C57BL/6J | F | 65 | R0903F1 | Nov03 | UTM RW |
| 2 | C57BL/6J | F | 65 | R0903F1 | Jan04 | UTM RW |
| 3 | C57BL/6J | M | 66 | R0906F1 | Nov03 | UTM RW |
| 4 | C57BL/6J | M | 66 | R0906F1 | Jan04 | UTM RW |
| 5 | C57BL/6J | M | 66 | R0906F1 | Feb05 | UTM RW |
| 6 | C57BL/6J | M | 76 | R0997F1 | Feb05 | UTM RW |
| 7 | D2B6F1 | F | 57 | R1066F1 | Feb05 | UTM RW |
| 8 | D2B6F1 | M | 59 | R1381F1 | Feb05 | UTM RW |
| 9 | DBA/2J | F | 60 | R0917F1 | Nov03 | UTM RW |
| 10 | DBA/2J | F | 60 | R0917F1 | Feb05 | UTM RW |
| 11 | DBA/2J | F | 60 | R0917F2 | Jan04 | UTM RW |
| 12 | DBA/2J | F | 64 | R1123F1 | Feb05 | UTM RW |
| 13 | DBA/2J | M | 60 | R0918F1 | Nov03 | UTM RW |
| 14 | DBA/2J | M | 60 | R0918F1 | Jan04 | UTM RW |
| 15 | DBA/2J | M | 73 | R1009F1 | Feb05 | UTM RW |
| 16 | B6D2F1 | F | 127 | R0919F1 | Jan04 | UTM JB |
| 17 | B6D2F1 | F | 127 | R0919F2 | Jan04 | UTM JB |
| 18 | B6D2F1 | F | 64 | R1053F1 | Feb05 | UTM RW |
| 19 | B6D2F1 | F | 64 | R1053F1 | Feb05 | UTM RW |
| 20 | B6D2F1 | M | 127 | R0920F1 | Jan04 | UTM JB |
| 21 | B6D2F1 | M | 127 | R0920F2 | Jan04 | UTM JB |
| 22 | B6D2F1 | M | 66 | R1057F1 | Feb05 | UTM RW |
| 23 | BXD1 | M | 181 | R0956F1 | Feb05 | UTM JB |
| 24 | BXD1 | F | 95 | R0895F1 | Jan04 | UMemphis |
| 25 | BXD2 | F | 142 | R0907F1 | Feb05 | UAB |
| 26 | BXD5 | F | 56 | R0744F1 | Feb05 | UMemphis |
| 27 | BXD5 | M | 71 | R0728F1 | Jan04 | UMemphis |
| 28 | BXD6 | F | 57 | R1711F1 | Feb05 | JAX |
| 29 | BXD6 | F | 92 | R0901F1 | Feb05 | UMemphis |
| 30 | BXD6 | M | 92 | R0902F1 | Jan04 | UMemphis |
| 31 | BXD8 | F | 72 | R0167F1 | Jan04 | UAB |
| 32 | BXD9 | F | 86 | R0908F1 | Feb05 | UMemphis |
| 33 | BXD9 | M | 86 | R0909F1 | Jan04 | UMemphis |
| 34 | BXD11 | F | 97 | R0745F1 | Feb05 | UAB |
| 35 | BXD11 | M | 92 | R0666F1 | Feb05 | UMemphis |
| 36 | BXD12 | F | 64 | R0896F1 | Feb05 | UMemphis |
| 37 | BXD12 | M | 64 | R0897F1 | Jan04 | UMemphis |
| 38 | BXD13 | F | 86 | R0730F1 | Feb05 | UMemphis |
| 39 | BXD13 | F | 86 | R0748F1 | Jan04 | UMemphis |
| 40 | BXD13 | M | 76 | R0929F1 | Feb05 | UMemphis |
| 41 | BXD14 | M | 91 | R0912F1 | Jan04 | UMemphis |
| 42 | BXD14 | M | 68 | R1051F1 | Feb05 | UTM RW |
| 43 | BXD15 | F | 80 | R0928F1 | Feb05 | UMemphis |
| 44 | BXD18 | F | 108 | R0771F1 | Jan04 | UAB |
| 45 | BXD19 | M | 157 | R1229F1 | Feb05 | UTM JB |
| 46 | BXD19 | F | 56 | R0236F1 | Jan04 | UAB |
| 47 | BXD21 | F | 67 | R0740F1 | Jan04 | UAB |
| 48 | BXD21 | F | 67 | R0740F1 | Feb05 | UAB |
| 49 | BXD23 | F | 66 | R1035F1 | Feb05 | UTM RW |
| 50 | BXD23 | M | 66 | R1037F1 | Feb05 | UTM RW |
| 51 | BXD23 | F | 88 | R0815F1 | Jan04 | UAB |
| 52 | BXD23 | F | 88 | R0815F1 | Feb05 | UAB |
| 53 | BXD24 | F | 71 | R0914F1 | Feb05 | UMemphis |
| 54 | BXD24 | M | 71 | R0913F1 | Jan04 | UMemphis |
| 55 | BXD25 | F | 74 | R0373F1 | Jan04 | UTM RW |
| 56 | BXD28 | F | 79 | R0910F1 | Jan04 | UMemphis |
| 57 | BXD28 | M | 79 | R0911F1 | Feb05 | UMemphis |
| 58 | BXD28 | F | 113 | R0892F1 | Feb05 | UTM RW |
| 59 | BXD29 | F | 76 | R0693F1 | Jan04 | UMemphis |
| 60 | BXD31 | F | 61 | R1199F1 | Feb05 | UTM RW |
| 61 | BXD31 | M | 61 | R1141F1 | Feb05 | UTM RW |
| 62 | BXD32 | F | 93 | R0898F1 | Jan04 | UAB |
| 63 | BXD32 | F | 76 | R1214F1 | Feb05 | UMemphis |
| 64 | BXD32 | M | 65 | R1478F1 | Feb05 | UMemphis |
| 65 | BXD33 | M | 77 | R0915F1 | Jan04 | UMemphis |
| 66 | BXD34 | F | 92 | R0900F1 | Feb05 | UMemphis |
| 67 | BXD34 | M | 56 | R0617F1 | Feb05 | UMemphis |
| 68 | BXD34 | M | 72 | R0916F1 | Jan04 | UMemphis |
| 69 | BXD36 | F | 61 | R1145F1 | Feb05 | UTM RW |
| 70 | BXD36 | M | 77 | R0926F1 | Jan04 | UMemphis |
| 71 | BXD36 | M | 61 | R1211F1 | Feb05 | UMemphis |
| 72 | BXD38 | M | 83 | R1208F1 | Feb05 | UMemphis |
| 73 | BXD38 | F | 69 | R0729F1 | Feb05 | UMemphis |
| 74 | BXD38 | M | 69 | R0731F1 | Jan04 | UMemphis |
| 75 | BXD39 | F | 76 | R1712F1 | Feb05 | JAX |
| 76 | BXD39 | M | 71 | R0602F1 | Feb05 | UAB |
| 77 | BXD40 | F | 184 | R0741F1 | Feb05 | UAB |
| 78 | BXD40 | M | 56 | R0894F1 | Feb05 | UMemphis |
| 79 | BXD42 | F | 100 | R0742F1 | Feb05 | UAB |
| 80 | BXD42 | M | 97 | R0936F1 | Jan04 | UMemphis |
| 81 | BXD42 | M | 105 | R0937F1 | Feb05 | UMemphis |
| 82 | BXD43 | M | 63 | R1047F1 | Feb05 | UTM RW |
| 83 | BXD44 | F | 57 | R1069F1 | Feb05 | UTM RW |
| 84 | BXD44 | M | 58 | R1072F1 | Feb05 | UTM RW |
| 85 | BXD45 | F | 58 | R1398F1 | Feb05 | UTM RW |
| 86 | BXD48 | F | 59 | R0946F1 | Feb05 | UTM RW |
| 87 | BXD48 | M | 64 | R0970F1 | Feb05 | UTM RW |
| 88 | BXD51 | F | 63 | R1430F1 | Feb05 | UTM RW |
| 89 | BXD51 | M | 65 | R1001F1 | Feb05 | UTM RW |
| 90 | BXD60 | F | 64 | R0976F1 | Feb05 | UTM RW |
| 91 | BXD60 | M | 59 | R1075F1 | Feb05 | UTM RW |
| 92 | BXD62 | F | 59 | R1033F1 | Feb05 | UTM RW |
| 93 | BXD62 | M | 58 | R1027F1 | Feb05 | UTM RW |
| 94 | BXD69 | F | 60 | R1438F1 | Feb05 | UTM RW |
| 95 | BXD69 | M | 64 | R1193F1 | Feb05 | UTM RW |
| 96 | BXD73 | F | 60 | R1275F1 | Feb05 | UTM RW |
| 97 | BXD73 | M | 76 | R1442F1 | Feb05 | UTM RW |
| 98 | BXD77 | M | 61 | R1426F1 | Feb05 | UTM RW |
| 99 | BXD86 | F | 77 | R1414F1 | Feb05 | UTM RW |
| 100 | BXD86 | M | 77 | R1418F1 | Feb05 | UTM RW |
| 101 | BXD87 | F | 89 | R1713F1 | Feb05 | UTM RW |
| 102 | BXD87 | M | 84 | R1709F1 | Feb05 | UTM RW |
| 103 | BXD90 | M | 61 | R1452F | Feb05 | UTM RW |
| 104 | BXD92 | F | 58 | R1299F1 | Feb05 | UTM RW |
| 105 | BXD92 | M | 59 | R1307F1 | Feb05 | UTM RW |
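The sample balance described earlier can be checked directly against the sample table. The following is a minimal Python sketch, not part of the original processing; it uses only rows 25 through 31 of the table, and the function names `sexes_by_strain` and `single_sex_strains` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict

# Rows 25-31 of the sample table: (strain, sex, sample_name).
ROWS = [
    ("BXD2", "F", "R0907F1"),
    ("BXD5", "F", "R0744F1"),
    ("BXD5", "M", "R0728F1"),
    ("BXD6", "F", "R1711F1"),
    ("BXD6", "F", "R0901F1"),
    ("BXD6", "M", "R0902F1"),
    ("BXD8", "F", "R0167F1"),
]


def sexes_by_strain(rows):
    """Map each strain to the set of sexes seen among its samples."""
    seen = defaultdict(set)
    for strain, sex, _sample in rows:
        seen[strain].add(sex)
    return dict(seen)


def single_sex_strains(rows):
    """Return strains with samples of only one sex, in order of first appearance."""
    return [s for s, sexes in sexes_by_strain(rows).items() if len(sexes) == 1]


print(single_sex_strains(ROWS))  # prints ['BXD2', 'BXD8']
```

For this excerpt the result agrees with the text, which lists BXD2 and BXD8 among the strains represented by female samples only.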
About platform
Affymetrix Mouse Genome 430A and B array pairs: The 430A and B array pairs consist of 992,936 25-nucleotide probes that collectively estimate the expression of approximately 39,000 transcripts. The array sequences were selected late in 2002 using Unigene Build 107. The arrays nominally contain the same probe sequences as the 430 2.0 series. However, we have found that roughly 75,000 probes differ between the A and B arrays and the 430 2.0 array.
About data processing
Probe (cell) level data from the CEL file: These CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell.
Probe set data: The expression data were processed by Yanhua Qu (UTHSC). The original CEL files were read into the R environment (Ihaka and Gentleman 1996). Data were processed using the Robust Multichip Average (RMA) method (Irizarry et al. 2003). Values were log2 transformed. Probe set values listed in WebQTL are the averages of biological replicates within strain. A few technical replicates were averaged and treated as single samples. A 1-unit difference represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
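As an illustration of how these values can be read, here is a small Python sketch (not the R code actually used). It averages technical replicates before averaging biological samples, and converts a difference on the log2 scale into an approximate fold difference. The function names, the grouping of technical replicates by sample name, and the numeric values are assumptions made for the example.

```python
import statistics


def strain_mean(values_by_sample):
    """Average technical replicates per sample, then average across samples.

    values_by_sample maps a sample name to the expression values from its
    arrays; more than one value means technical replicates (an assumption
    about how replicates are grouped).
    """
    sample_means = [statistics.fmean(v) for v in values_by_sample.values()]
    return statistics.fmean(sample_means)


def fold_difference(value_a, value_b):
    """Approximate fold difference implied by two log2-scale values."""
    return 2.0 ** (value_a - value_b)


# Hypothetical expression values for one probe set in one strain.
example = {"R0903F1": [8.0, 8.5], "R0906F1": [9.25]}
print(strain_mean(example))       # (8.25 + 9.25) / 2 = 8.75
print(fold_difference(9.0, 8.0))  # 2.0, a 1-unit difference
```

Because the values are rescaled after the log2 transform, the fold difference should be taken as approximate rather than exact.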
- Step 1: We added an offset of 1.0 unit to each cell signal to ensure that all values could be logged without generating negative values. We then computed the log base 2 of each cell.
- Step 2: We performed a quantile normalization of the log base 2 values for the total set of 105 arrays (processed as two batches) using the same initial steps used by the RMA transform.
- Step 3: We computed the Z scores for each cell value.
- Step 4: We multiplied all Z scores by 2.
- Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
- Step 6: We eliminated much of the systematic technical variance introduced by the two batches (n = 34 and n = 71 array pairs) at the probe level. To do this we calculated the ratio of each batch mean to the mean of both batches and used this as a single multiplicative probe-specific batch correction factor. The consequence of this simple correction is that the mean probe signal value for each batch is the same.
- Step 7a: The 430A and 430B arrays include a set of 100 shared probe sets (a total of 2200 probes) that have identical sequences. These probes and probe sets provide a way to calibrate expression of the 430A and 430B arrays to a common scale. To bring the two arrays into alignment, we regressed Z scores of the common set of probes to obtain a linear regression correction to rescale the 430B arrays to the 430A array. In our case this involved multiplying all 430B Z scores by the slope of the regression and adding or subtracting a small offset. The result of this step is that the mean of the 430A expression is fixed at a value of 8, whereas that of the 430B chip is typically reduced to 7. The average of the merged 430A and 430B array data set is approximately 7.5.
- Step 7b: We recentered the merged 430A and 430B data sets to a mean of 8 and a standard deviation of 2. This involved reapplying Steps 3 through 5.
- Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. Technical replicates were averaged before computing the mean for independent biological samples. Note that we have not (yet) corrected for variance introduced by differences in sex, age, source of animals, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We eventually hope to add statistical controls and adjustments for some of these variables.
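Steps 1 through 5 can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the original R code. The function names and input values are hypothetical, the tie handling in the quantile normalization is an assumption, and the use of the population standard deviation is also an assumption because the text does not say which form was used.

```python
import math
import statistics


def log2_with_offset(cell_signals, offset=1.0):
    """Step 1: add an offset, then take log base 2 of each cell signal."""
    return [math.log2(s + offset) for s in cell_signals]


def quantile_normalize(arrays):
    """Step 2: give every array the same distribution of values.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position (an assumption).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [statistics.fmean(s[i] for s in sorted_arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=a.__getitem__)  # cell indices, lowest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def modified_z(values, scale=2.0, center=8.0):
    """Steps 3-5: Z score each value, multiply by 2, then add 8."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (assumption)
    return [center + scale * (v - mean) / sd for v in values]


# Hypothetical cell signals for two tiny "arrays" (illustration only).
raw = [[120.0, 455.0, 980.0, 3100.0], [90.0, 610.0, 1500.0, 2200.0]]
logged = [log2_with_offset(a) for a in raw]
rescaled = [modified_z(a) for a in quantile_normalize(logged)]
```

After this transformation each array in `rescaled` has a mean of 8 and a standard deviation of 2, which matches the stated outcome of Step 5.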
This data set includes further normalization to produce final estimates of expression that can be compared directly to the other transforms (average of 8 units and stabilized standard deviation of 2 units within each array). Please see Bolstad and colleagues (2003) for a helpful comparison of RMA and two other common methods of processing Affymetrix array data sets.
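The batch correction (Step 6) and the alignment of the 430B arrays to the 430A arrays (Step 7a) are both simple rescalings, and one possible reading of them is sketched below in Python. This is not the original code. The function names and data are hypothetical. The sketch assumes that "the mean of both batches" is the mean of the two batch means, that each value is divided by the ratio of its batch mean to that overall mean, and that the regression predicts 430A values from 430B values for the shared probes.

```python
import statistics


def batch_correct(probe_values, batch_labels):
    """Step 6 (one reading): rescale one probe so all batches share a mean."""
    by_batch = {}
    for value, batch in zip(probe_values, batch_labels):
        by_batch.setdefault(batch, []).append(value)
    batch_means = {b: statistics.fmean(v) for b, v in by_batch.items()}
    overall = statistics.fmean(batch_means.values())  # mean of batch means (assumption)
    # Dividing by (batch mean / overall mean) moves every batch mean to `overall`.
    return [v * overall / batch_means[b] for v, b in zip(probe_values, batch_labels)]


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rescale_430b(b_values, slope, intercept):
    """Step 7a: put 430B values on the 430A scale using the fitted line."""
    return [slope * v + intercept for v in b_values]


# Hypothetical values for one probe across four arrays in two batches.
corrected = batch_correct([6.0, 8.0, 9.0, 11.0], ["Jan04", "Jan04", "Feb05", "Feb05"])

# Hypothetical Z scores for shared probes measured on both array types.
shared_b = [6.0, 8.0, 10.0, 12.0]
shared_a = [5.5, 7.0, 8.5, 10.0]
slope, intercept = fit_line(shared_b, shared_a)  # slope 0.75, intercept 1.0
```

In the real data set the correction in Step 6 is computed separately for each probe, and the regression in Step 7a uses the shared probes on the 430A and 430B arrays.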
About the chromosome and megabase position values:
The chromosomal locations of probe sets included on the microarrays were determined by BLAT analysis using the Mouse Genome Sequencing Consortium May 2004 Assembly (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Dr. Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.
Acknowledgment
Support for acquisition of microarray data was generously provided by the NIAAA and its INIA grant program to RWW, Thomas Sutter, and Daniel Goldowitz (U01AA013515, U01AA013499-03S1, U01AA013488, U01AA013503-03S1). Support for the continued development of the GeneNetwork and WebQTL was provided by an NIMH Human Brain Project grant (P20MH062009). All arrays were processed at the University of Memphis by Thomas Sutter and colleagues with support of the INIA Bioanalytical Core.
Notes
This text file was originally generated by RWW, YHQ, and EJC, Oct 2004. Updated by RWW, Nov 5, 2004; April 7, 2005; RNA/tissue preparation protocol updated by JLP, Sept 2, 2005; Sept 26, 2005.