INIA Brain mRNA M430 (Feb04) PDNN

Summary

This October 2004 data freeze provides initial estimates of mRNA expression in brains of adult BXD recombinant inbred mice measured using Affymetrix M430AB microarrays. In contrast to the U74Av2 array data set, this new data set provides broader coverage (~45,000 transcripts) but does not include replicates or as many strains (25 vs 35). Data were generated at UTHSC and the University of Memphis with support from grants from the NIAAA Integrative Neuroscience Initiative on Alcoholism (INIA). Data were processed using the PDNN method of Zhang. To simplify comparison among transforms, PDNN values of each array were adjusted to an average of 8 units and a standard deviation of 2 units.

About cases

We have exploited a set of BXD recombinant inbred strains. The parental strains from which all BXD lines are derived are C57BL/6J (B) and DBA/2J (D). Both B and D strains have been almost fully sequenced (8x coverage for B by a public consortium and approximately 1.5x coverage for D by Celera).

BXD1 through BXD32 were produced by Benjamin A. Taylor starting in the late 1970s. BXD33 through BXD42 were also produced by Taylor, but from a second set of crosses initiated in the early 1990s. These strains are all available from the Jackson Laboratory, Bar Harbor, Maine.

About tissue

The data set consists of a single batch of Affymetrix mouse expression 430A and 430B GeneChip array pairs. Each AB pair was hybridized in sequence (A array first, B array second) with a pool of brain tissue (forebrain minus olfactory bulb, plus the entire midbrain) taken from three adult animals of closely matched age and the same sex. RNA was extracted at UTHSC by Lu Lu, Zhiping Jia, and Hongtao Zhai. All samples were subsequently processed in the INIA Bioanalytical Core at the W. Harry Feinstone Center of Excellence by Thomas R. Sutter and colleagues at the University of Memphis. Before running the main batch of 30 array pairs, we ran four "test" samples (one male and one female pool from each of the two parental strains, C57BL/6J and DBA/2J). The main set of 30 array pairs includes the same four samples (in other words, we have four technical replicates), two F1 hybrid samples (each run twice for within-batch technical replication), and 22 BXD strains. The data set therefore consists of one male and one female pool from each of C57BL/6J, DBA/2J, and the B6D2F1 hybrid, plus 11 female BXD samples and 11 male BXD samples. We should note that the four technical replicates between batches were eventually combined with a correction for a highly significant batch effect. This was done at both the probe and probe set levels to "align" the test batch values with the two main batches. (The ratio of the probe average in the four test arrays to the average of the same probe in the four corresponding main batch arrays was used as a correction factor.) The F1 within-batch technical replicates were simply averaged. In the next batch we will reverse the sex of the BXD samples to achieve a balance with at least 22 BXD strains with one male and one female sample each.

The table below lists the arrays by strain, sex, age, sample identifier, and the date on which results were obtained from the Bioanalytical Core at the University of Memphis. Each array was hybridized to a pool of mRNA from three mice.

Strain	Sex	Age	SampleID	Date
B6D2F1	F	127	919-F1	Jan04
B6D2F1	F	127	919-F2	Jan04
B6D2F1	M	127	920-F1	Jan04
B6D2F1	M	127	920-F2	Jan04
C57BL/6J	F	65	903-F1	Nov03
C57BL/6J	F	65	903-F2	Jan04
C57BL/6J	M	66	906-F1	Nov03
C57BL/6J	M	66	906-F2	Jan04
DBA/2J	F	60	917-F1	Nov03
DBA/2J	F	60	917-F2	Jan04
DBA/2J	M	60	918-F1	Nov03
DBA/2J	M	60	918-F2	Jan04
BXD1	F	95	895-F1	Jan04
BXD5	M	71	728-F1	Jan04
BXD6	M	92	902-F1	Jan04
BXD8	F	72	S167-F1	Jan04
BXD9	M	86	909-F1	Jan04
BXD12	M	64	897-F1	Jan04
BXD13	F	86	748-F1	Jan04
BXD14	M	91	912-F1	Jan04
BXD18	F	108	771-F1	Jan04
BXD19	F	56	S236-F1	Jan04
BXD21	F	67	740-F1	Jan04
BXD23	F	88	815-F1	Jan04
BXD24	M	71	913-F1	Jan04
BXD25	F	74	S373-F1	Jan04
BXD28	F	79	910-F1	Jan04
BXD29	F	76	693-F1	Jan04
BXD32	F	93	898-F1	Jan04
BXD33	M	77	915-F1	Jan04
BXD34	M	72	916-F1	Jan04
BXD36	M	77	926-F1	Jan04
BXD38	M	69	731-F1	Jan04
BXD42	M	97	936-F1	Jan04

About platform

Affymetrix 430A and 430B GeneChip Set: Expression data were generated using 430AB array pairs. The chromosomal locations of probe sets were determined by BLAT analysis of concatenated probe sequences using the Mouse Genome Sequencing Consortium May 2004 (mm5) assembly. This BLAT analysis is performed periodically by Yanhua Qu as each new build of the mouse genome is released. We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis. You can confirm the BLAT alignment results yourself by clicking on either the Verify UCSC or the Verify Ensembl link in the Trait Data and Editing Form (right side of the Location line).

About data processing

Probe (cell) level data from the CEL file: These CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell.

Step 1: We added an offset of 1 to the CEL expression values for each cell to ensure that all values could be logged without generating negative values.

Step 2: We took the log base 2 of each probe signal.

Step 3: We computed the Z scores for each probe signal.

Step 4: We multiplied all Z scores by 2.

Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
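Steps 1 through 5 can be sketched in a few lines of Python. This is only an illustration, not the code used to generate the data set: the function name, the made-up intensities, and the use of the population standard deviation in Step 3 are all assumptions.

```python
import math
from statistics import fmean, pstdev

def modified_z_scores(cel_values, offset=1.0, scale=2.0, center=8.0):
    """Apply Steps 1-5 to the raw CEL values of one array."""
    # Steps 1 and 2: add the offset, then take log base 2.
    logged = [math.log2(v + offset) for v in cel_values]
    # Step 3: Z scores (population SD is an assumption of this sketch).
    mu = fmean(logged)
    sigma = pstdev(logged)
    z_scores = [(x - mu) / sigma for x in logged]
    # Steps 4 and 5: multiply by 2, then add 8.
    return [scale * z + center for z in z_scores]

# Made-up CEL intensities for a single array.
raw = [35.0, 120.0, 480.0, 1900.0, 7600.0]
rescaled = modified_z_scores(raw)
```

By construction, the values in `rescaled` have a mean of 8 and a standard deviation of 2, whatever the raw intensities were.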

Step 6a: The 430A and 430B arrays include a set of 100 shared probe sets (2200 probes) that have identical sequences. These probes provide a way to calibrate expression of the A and B arrays to a common scale. The absolute mean expression on the 430B array is almost invariably lower than that on the 430A array. To bring the two arrays into alignment, we regressed Z scores of the common set of probes to obtain a linear regression correction that rescales the 430B arrays to the 430A array. In our case this involved multiplying all 430B Z scores by the slope of the regression and adding or subtracting a very small offset. The result of this step is that the mean of the 430A GeneChip expression is fixed at a value of 8, whereas that of the 430B chip is typically 7. Thus the average of the A and B arrays is approximately 7.5.
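The calibration in Step 6a might look roughly like the sketch below. It assumes an ordinary least-squares fit of the 430A values on the 430B values for the shared probes. The text does not specify the regression details, and the function names and numbers are hypothetical.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rescale_b_to_a(b_values, slope, intercept):
    """Apply the fitted correction to every 430B Z score."""
    return [slope * v + intercept for v in b_values]

# Made-up Z scores for five shared probes measured on both arrays.
shared_b = [6.0, 7.0, 8.0, 9.0, 10.0]
shared_a = [5.2, 6.1, 7.0, 7.9, 8.8]

slope, intercept = fit_line(shared_b, shared_a)
# The same slope and offset are then applied to all 430B probes.
all_b_rescaled = rescale_b_to_a([5.0, 8.0, 12.0], slope, intercept)
```

In this toy example the fit gives a slope of about 0.9 and an offset of about -0.2, so the rescaled 430B values end up lower than before, in line with the lower 430B mean described above.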

Step 6b: We recenter the whole set of 430A and B transcripts to a mean of 8 and a standard deviation of 2. This involves reapplying Steps 3 through 5 above but now using the entire set of probes and probe sets from a merged 430A and B data set.

Step 7: We corrected for technical variance introduced by the two batches. For each gene, the mean of the first batch was adjusted to match the mean of the second batch.
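A minimal sketch of a ratio-based batch correction follows, based on the correction factor described in the About tissue section (the probe average in the four test arrays divided by the average of the same probe in the four matching main batch arrays). Dividing the test values by that ratio is an assumption about how the factor was applied, and all names and values are hypothetical.

```python
from statistics import fmean

def correction_factors(test_arrays, main_arrays):
    """Per-probe ratio: mean over test arrays / mean over main arrays.

    Each array is a list of probe values in the same probe order.
    """
    factors = []
    for test_vals, main_vals in zip(zip(*test_arrays), zip(*main_arrays)):
        factors.append(fmean(test_vals) / fmean(main_vals))
    return factors

def align_to_main(array, factors):
    """Divide each test-batch probe value by its correction factor."""
    return [value / factor for value, factor in zip(array, factors)]

# Four test arrays and four matching main-batch arrays, two probes each
# (made-up values).
test_batch = [[8.4, 6.3], [8.8, 6.1], [8.6, 6.5], [8.6, 6.3]]
main_batch = [[8.0, 6.0], [8.2, 6.4], [8.1, 6.1], [8.1, 6.3]]

factors = correction_factors(test_batch, main_batch)
aligned = [align_to_main(array, factors) for array in test_batch]
```

After this adjustment the per-probe mean of the aligned test arrays equals the per-probe mean of the main batch arrays, so the two sets can be combined.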

Step 8: Finally, we computed the arithmetic mean of the values for the set of microarrays for each strain. In this data set we have modest numbers of replicates and for this reason we do not yet provide error terms for transcripts or probes. Note that we have not (yet) corrected for variance introduced by differences in sex, age, or any interaction terms. We have not corrected for background beyond the background correction implemented by Affymetrix in generating the CEL file. We expect to add statistical controls and adjustments for these variables in subsequent versions of WebQTL.

Probe set data: The original expression values in the Affymetrix CEL files were read into PerfectMatch to generate the normalized PDNN data set.

PDNN values of each array were subsequently normalized to achieve a mean value of 8 units and a standard deviation of 2 units.

When necessary, we computed the arithmetic mean for technical replicates and treated these as single samples. We then computed the arithmetic mean for the set of 2 to 5 biological replicates for each strain.
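The two-stage averaging can be illustrated as follows: technical replicates are first collapsed into one value per sample, and the sample values are then averaged within each strain. The record layout, the sample keys, and the expression values are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import fmean

def strain_means(records):
    """records: (strain, sample_id, value) tuples for one probe set."""
    # Stage 1: average technical replicates that share a sample ID.
    by_sample = defaultdict(list)
    for strain, sample_id, value in records:
        by_sample[(strain, sample_id)].append(value)
    # Stage 2: average the per-sample values within each strain.
    by_strain = defaultdict(list)
    for (strain, _sample_id), values in by_sample.items():
        by_strain[strain].append(fmean(values))
    return {strain: fmean(values) for strain, values in by_strain.items()}

# Made-up values: two F1 pools, each run twice, and one BXD1 pool.
records = [
    ("B6D2F1", "919", 8.0),
    ("B6D2F1", "919", 8.4),
    ("B6D2F1", "920", 9.0),
    ("B6D2F1", "920", 9.2),
    ("BXD1", "895", 7.5),
]
means = strain_means(records)
```

Averaging in two stages keeps a sample that happens to have more technical replicates from dominating its strain mean.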

About the array probe sets names:

Most probe sets on the mouse 430A and 430B arrays consist of a total of 22 probes, divided into 11 perfect match (PM) probes and 11 mismatch (MM) controls. Each set of these 25-nucleotide-long probes has an identifier code that includes a unique number, an underscore character, several suffix characters that highlight design features, and a final A or B character to specify the array pair member. The most common probe set suffix is _at. This code indicates that the probes should hybridize relatively selectively with the complementary anti-sense target (i.e., the complementary RNA) produced from a single gene.
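As an illustration of this naming scheme, the sketch below splits an identifier into its number, suffix, and array member. The regular expression and the example identifiers are assumptions based only on the description above. Identifiers that follow a different pattern (for example, control probe sets) would not match.

```python
import re

# number, underscore, lowercase suffix characters, underscore, A or B
PROBE_SET_ID = re.compile(
    r"^(?P<number>\d+)_(?P<suffix>[a-z_]+)_(?P<array>[AB])$"
)

def parse_probe_set_id(identifier):
    """Split a probe set identifier into number, suffix, and array."""
    match = PROBE_SET_ID.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized probe set identifier: {identifier!r}")
    return match.groupdict()

# Hypothetical identifiers used only to exercise the pattern.
simple = parse_probe_set_id("1415670_at_A")
compound = parse_probe_set_id("1415670_a_at_B")
```

Here `simple` has the suffix `at` on array A, while `compound` has the longer suffix `a_at` on array B.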

Acknowledgment

The microarray data were generated with generous support from NIAAA INIA grants to RWW and Thomas Sutter. Support for sample acquisition and WebQTL has been provided by the NIMH Human Brain Project and the Dunavant Chair of Excellence, University of Tennessee Health Science Center. All arrays were processed at the University of Memphis by Dr. Thomas Sutter and colleagues with support of the INIA Bioanalytical Core.

Notes

This text file was originally generated by RWW, YHQ, and EJC, Oct 2004. Updated by RWW, Nov 5, 2004.