Summary
FINAL RECOMMENDED EYE DATA SET. The HEIMED September 2008 RMA data release provides estimates of gene expression in whole eyes of 103 lines of young adult mice, generated using 221 Affymetrix M430 2.0 arrays. This data set is intended for exploration of the genetics and genomics of the mouse eye, retina, lens, retinal pigment epithelium, cornea, iris, and choroid. Data were generated at UTHSC with support from a grant from Dr. Barrett Haik, Director of the Hamilton Eye Institute (HEI). For most lines of mice we used two independent pooled RNA samples, usually one male and one female pool. This data set was processed using the RMA protocol. A total of 2223 probe sets are associated with LRS values greater than 46 (LOD > 10).
Users of these mouse eye data may also find the following complementary resources extremely useful:
- NEIBank collection of ESTs and SAGE data.
- RetNet: the Retinal Information Network--tables of genes and loci causing inherited retinal diseases
- Mouse Retina SAGE Library from the Cepko laboratory. This site provides extensive developmental data from as early as embryonic day E12.5.
- Digital reference of ophthalmology from Columbia provides high quality photographs of human ocular diseases, case studies, and short explanations. This reference does not have a molecular focus.
- Mouse Retinal Developmental Gene Expression data sets from the Friedlander laboratory. This site provides extensive developmental data using the Affymetrix U74 v 2 array (predecessor of the M430).
- Data sets on differential gene expression in anatomical compartments of the human eye from Pat Brown's lab. View expression signatures for different ocular tissues using the geneXplorer 2.0.
Experiment design
Expression profiling by array
About cases
This is the complete and final HEIMED data set. HEIMED consists of expression data for 103 genetically defined lines of mice with standard errors of the mean. Almost all animals are young adults between 50 and 80 days of age (Table 1, maximum age is 123 days). We measured expression in conventional inbred strains, BXD recombinant inbred (RI) strains, reciprocal F1s between C57BL/6J and DBA/2J, and several mutant and knockout lines. We have combined all common strains, F1 hybrids, and mutants into a group called the Mouse Diversity Panel (MDP). Four lines, namely, C57BL/6J (B6), DBA/2J (D2), and the pair of B6D2F1 and D2B6F1 hybrids are common to both the MDP and the BXD set. This is a breakdown of cases that are part of HEIMED:
- 68 BXD strains. The first 32 of these strains are from the Taylor series of BXD strains generated at the Jackson Laboratory by Benjamin A. Taylor. BXD1 through BXD32 were started in the late 1970s, whereas BXD33 through BXD42 were started in the 1990s. Only one of these strains, BXD24 (also known as BXD24b), has retinal degeneration (a spontaneous mutation). The other 36 BXD strains (BXD43 and higher) were bred by Lu Lu, Jeremy Peirce, Lee M. Silver, and Robert W. Williams starting in 1997 using B6D2 generation 10 advanced intercross progeny. This modified breeding protocol doubles the number of recombinations per BXD strain and improves mapping resolution (Peirce et al. 2004). All of the Taylor series of BXD strains and many of the new BXD strains are available from the Jackson Laboratory. All of the new BXD strains (BXD43 and higher) are also available directly from Lu Lu and colleagues at the University of Tennessee Health Science Center in Memphis, TN, USA.
- 35 MDP lines, including 26 inbred strains representing closely related substrains (e.g., BALB/cJ and BALB/cByJ), many of the most widely used common Mus musculus domesticus inbred strains (e.g., C57BL/6J and 129S1/SvImJ), inbred but wild-derived representatives of common subspecies (Mus musculus domesticus, e.g., WSB/EiJ; M. musculus musculus, e.g., CZECHII/EiJ; M. musculus molossinus, e.g., MOLF/EiJ; M. musculus castaneus, e.g., CAST/EiJ), and even one different species of mouse (Mus spicilegus, PANCEVO/EiJ). The MDP also includes the reciprocal F1 hybrids (B6D2F1 and D2B6F1), the following 6 KO lines, and the Nyx-nob mutant:
- 6 knockouts (KO), including a KO of Rpe65 and 5 DeltaGen Inc. knockout lines provided by Dr. Ted Choi. In these KO lines a bacterial lacZ construct has been inserted into the gene, and the endogenous promoter drives expression of beta-galactosidase. RT-PCR analysis detects a gene transcript in most tissues. The following KOs from DeltaGen were studied: Gabra1, Gabbr1, Gnb1, Gpr19, and Clcn3. We also included one spontaneous mutant of the nyctalopin gene (Nyx, no b-wave or "nob"; Pardue et al., 1998) that is on a BALB/cByJ background.
Rod photoreceptor degeneration in inbred mice: Six strains of mice included in HEIMED suffer from severe loss of photoreceptors (mainly rods) and have the equivalent of night blindness in human patients. The death of photoreceptors in these strains occurs by one to two months of age and is often caused by the retinal degeneration 1 (rd1) mutant allele in the rod cyclic-GMP phosphodiesterase 6 beta subunit gene (Pde6b). The following strains are known to have photoreceptor degeneration: C3H/HeJ, FVB/NJ, MOLF/EiJ, SJL/J and BXD24/TyJ. BXD24/TyJ is now known as BXD24b/TyJ and has nearly complete retinal degeneration. BXD24a/TyJ, a 1988 F80 inbred stock that has been rederived from cryogenic storage, does not have retinal degeneration (stock number 005243) and is an ideal coisogenic control, but is not included in the HEIMED data set.
As expected (Dickerson LW et al., 2002) and as judged from the absence of rhodopsin expression, one of the DeltaGen KO lines (chloride ion channel 3, Clcn3) also has retinal degeneration: B6129P2F2N1-Clcn3. Degeneration in this strain is likely to include all rods and all cones. The cone defect is obvious from the decrease in expression of Gnat2, a gene associated with cones and achromatopsia in mice and humans.
Lines of mice were selected using the following criteria:
- genetic and phenotypic diversity, including use by the Phenome Project
- representation of a fairly wide variety of different subspecies of Mus
- their use in making genetic reference populations including recombinant inbred strains, cosomic strains, congenic and recombinant congenic strains
- their use by the Complex Trait Consortium to make the Collaborative Cross (Tel Aviv/Wellcome, Oak Ridge/DOE, and Perth/UWA)
- genome sequence data from three sources (NHGRI, Celera, and Perlegen-NIEHS)
- interesting mutations or knockouts affecting genes with high expression in the eye
- general availability from The Jackson Laboratory. The only exceptions are the DeltaGen KO mice.
We have included all eight parents of the Collaborative Cross (129S1/SvImJ, A/J, C57BL/6J, CAST/EiJ, NOD/LtJ, NZO/HlLtJ, PWK/PhJ, and WSB/EiJ) in the MDP. Fourteen MDP strains have been partially sequenced by Perlegen for the NIEHS, including 129S1/SvImJ, A/J, AKR/J, BALB/cByJ, C3H/HeJ, CAST/EiJ, DBA/2J, FVB/NJ, KK/HlJ, MOLF/EiJ, NOD/LtJ, NZW/LacJ, PWD/PhJ, and WSB/EiJ (see the GeneNetwork SNP Browser for data and details, and Perlegen's excellent data resources and browser).
- 129S1/SvImJ: Collaborative Cross strain sequenced by NIEHS; background for many knockouts (R1 ES cell line); Phenome Project A list. This strain (aka 129S1/Sv-++Kitl/+) carries hypopigmentation mutations (white-bellied chinchilla) of the tyrosinase gene on Chr 7 and a mutant allele of the steel (Kitl) gene. It is also a cone photoreceptor function loss 3 (Cpfl3) mutant of the Gnat2 gene and a model for achromatopsia. (JAX Stock Number: 002448)
- A/J: Collaborative Cross strain sequenced by Perlegen/NIEHS; parent of the AXB/BXA panel. A tyrosinase (Tyr c allele) albino mutant. This strain is particularly sensitive to light-induced photoreceptor loss (Danciger et al., 2007). (JAX Stock Number: 000646)
- BALB/cByJ: Sequenced by NIEHS; maternal parent of the CXB panel; Phenome Project old group A list. A tyrosinase (Tyr c allele) albino mutant and also a tyrosinase related protein 1 (Tyrp1 b) brown allele mutant. Small brain, not aggressive (JAX Stock Number: 001026)
- BALB/cJ: Phenome Project A list. A tyrosinase (Tyr c allele) albino mutant and also a tyrosinase related protein 1 (Tyrp1 b) brown allele mutant. Large brain and aggressive (JAX Stock Number: 000651)
- BXSB/MpJ: A white-bellied agouti strain with an interesting autoimmune disease restricted to males. The disease is associated with a mutation in the Yaa gene and causes glomerulonephritis, a dramatic increase in the number of peripheral monocytes, and pre-B-cell deficiency. (JAX Stock Number: 000740)
- C3H/HeJ: The Heston (He) substrain with a wildtype agouti (A allele) coat color. Sequenced by Perlegen/NIEHS; paternal parent of the BXH panel; Phenome Project old group A list. Important for this eye expression data set: C3H/HeJ is a Pde6b rd1 mutant with near-total photoreceptor loss by as early as postnatal day 30. Also a Tlr4 mutant that is endotoxin resistant. (JAX Stock Number: 000659)
- C57BL/6J: Sequenced by NIH/NHGRI; parental strain of AXB/BXA, BXD, and BXH; Phenome Project A list. Single most widely used inbred strain of mouse. (JAX Stock Number: 000664)
- C57BLKS/J: Black Kaliss strain (non-agouti a allele) derived from C57BL/6J, but genetically contaminated at some point mainly with DBA/2J and then reinbred. Now at the Jackson Laboratory. (JAX Stock Number: 000662)
- CAST/EiJ: A wild-derived inbred Mus musculus castaneus strain. Samples of this subspecies were captured in Southeast Asia. One of three wild-derived strains in the Collaborative Cross; sequenced by NIEHS; Phenome Project A list. CAST/Ei and CAST/EiJ are the same strain. The addition of the "J" is trivial; it was added when stocks were transferred from Dr. Eicher's lab to the Jackson Laboratory production facility in about 2004. (JAX Stock Number: 000928)
- CBA/CaJ: Agouti strain from the Jackson Laboratory. Wildtype pigment genes. (JAX Stock Number: 000654)
- CZECHII/EiJ: Czech 2 is a wild-derived inbred M. musculus musculus strain. Samples of this subspecies were caught in the Czech Republic and inbred at the Jackson Laboratory by Eva Eicher. White-bellied agouti. (JAX Stock Number: 001144)
- DBA/2J: The dilute, brown, non-agouti (dba) strain is the oldest inbred strain of mouse; inbreeding was started in 1909 by Little. A tyrosinase related protein 1 (Tyrp1 b) brown allele mutant and a myosin 5a (Myo5a d) dilute allele mutant. Sequenced by Perlegen/NIEHS and Celera; paternal parent of the BXD panel; Phenome Project old group A list. (JAX Stock Number: 000671)
- FVB/NJ: Friend's leukemia virus B (FVB) strain. Sequenced by Perlegen/NIEHS and Celera. A Tyr c locus albino and a Pde6b rd1 mutant derived from Swiss mice at NIH. This has been the most common strain used to make transgenic mice because of its large and easily injected oocytes; Phenome Project A list. (JAX Stock Number: 001800)
- KK/HlJ: K Kondo's (KK) Kasukabe strain is a homozygous age-related hearing loss (ahl) allele mutant of the Cdh23 gene. A Tyr c locus albino strain. Males have a form of type 2 diabetes. Sequenced by Perlegen/NIEHS. (JAX Stock Number: 002106)
- LG/J: Large (LG) strain. Paternal parent of the Large-by-Small set of RI strains made by James Cheverud and colleagues (the LGXSM panel, not to be confused with the LongXShort or LXS panel). A Tyr c locus albino strain. (JAX Stock Number: 000675)
- LP/J: White-bellied agouti strain from the Jackson Laboratory with a piebald mutation in the endothelin receptor type B (Ednrb) gene. Melanocytes in the choroid of the eye are somewhat reduced due to neural crest migration abnormalities. (JAX Stock Number: 000676)
- MOLF/EiJ: A wild-derived inbred strain derived from M. musculus molossinus samples captured in Fukuoka, Japan. This strain has the retinal degeneration rd1 allele in Pde6b. There appears to have been some genetic contamination of this strain with conventional inbred strains in the past several decades (F. Pardo, personal communication to RWW, August 2006). However, the strain is currently fully inbred. (JAX Stock Number: 000550)
- NOD/LtJ: Non-obese diabetic strain, originally from M. Hattori in Kyoto, Japan. This is the Edward Leiter (Lt) substrain from the Jackson Laboratory. Collaborative Cross strain sequenced by NIEHS; Phenome Project B list. Homozygous age-related hearing loss (ahl) allele mutant of the Cdh23 gene. A Tyr c locus albino strain. (JAX Stock Number: 001976)
- NZO/HlLtJ: New Zealand Obese strain. This is a severely obese and hypertensive strain. Males often develop type 2 diabetes. Collaborative Cross strain. Agouti coat color. (JAX Stock Number: 002105)
- NZB/BlNJ: New Zealand Black inbred strain from Bielschowsky (BL, substrain is "B lowercase L N", not "BiN") now maintained at the Jackson Laboratory. (JAX Stock Number: 000648)
- NZW/LacJ: New Zealand White strain from the Laboratory Animal Center (Carshalton, UK), now maintained at the Jackson Laboratory. Carries the Tyr c locus albino mutation, the pink-eye dilution mutation in the Oca2 or p locus, and the brown allele at Tyrp1. (JAX Stock Number: 001058)
- PANCEVO/EiJ: A wild-derived inbred strain from Mus spicilegus samples caught in Pancevo, Serbia. This species of mouse is also known as the Steppe mouse (taxon identifier 10103). M. spicilegus is a colonial mound-building species. No known ocular or retinal mutations, but the expression level of Gnat2 is low in this strain, either due to a 3' UTR length variant or possible achromatopsia (cone degeneration). (JAX Stock Number: 001384)
- PWD/PhJ: A wild-derived Mus musculus musculus agouti strain inbred from samples caught near Prague, Czech Republic. Sequenced by Perlegen/NIEHS; parental strain for a consomic set by Forejt and colleagues. (JAX Stock Number: 004660)
- PWK/PhJ: A wild-derived Mus musculus musculus inbred strain from samples caught near Lhotka, Czech Republic. Collaborative Cross strain; Phenome Project D list. (JAX Stock Number: 003715)
- SJL/J: Swiss Webster inbred strain from Jim Lambert's lab at the Jackson Laboratory. This strain has the retinal degeneration rd1 allele in Pde6b. It also carries both the Tyr c albino mutation and the pink-eye dilution mutation in the Oca2 or p locus. Highly aggressive males. (JAX Stock Number: 000686)
- WSB/EiJ: Watkin Star line B (or "wild son-of-a-bitch") is a wild-derived Mus musculus domesticus inbred strain from samples caught in Maryland, USA. A Collaborative Cross strain sequenced by NIEHS; Phenome Project C list (JAX Stock Number: 001145)
- B6D2F1 and D2B6F1 (also listed as BDF1 and DBF1 in some graphs and tables): F1 hybrids generated by crossing C57BL/6J with DBA/2J. These black reciprocal F1 hybrids can be used to detect dominance effects, and comparison of the two reciprocal F1s can be used to detect parent-of-origin (imprinting) effects. D2B6F1 animals are currently available from the Jackson Laboratory as a special order. (JAX Stock Number for B6D2F1 hybrids, aka B6D2F1/J: 100006)
Most of the common inbred strains harbor mutations in genes that control pigmentation (Silvers, 2008, and the material above in this INFO file). These genes include the albino and chinchilla alleles of the tyrosinase gene (Tyr, or human OCA1), the brown allele of tyrosinase related protein 1 (Tyrp1), the pink-eye dilution allele of Oca2 (probe set 1418211), the non-agouti (black) and white-bellied alleles of the agouti signaling protein gene Asip, the steel allele of Kitlg, the dilute allele of Myo5a (probe set 1419754), and the piebald allele of Ednrb. In some of these cases the effects of the mutation are easily detected at the transcript level (Tyrp1, Oca2, and Myo5a), but in the other cases (Tyr, Asip, Ednrb, and Kitlg) the mutations do not leave a strong imprint on expression.
About tissue
Tissue preparation protocol. Animals were killed by rapid cervical dislocation. Eyes were removed immediately and placed in RNAlater at room temperature. Usually six eyes from animals of the same sex, age, and strain were stored in a single tube.
Each array was hybridized with a pool of cRNA from 4 to 8 eyes from 2 to 4 animals. RNA was extracted at UTHSC by Zhiping Jia. If tissue was saved for RNA extraction at a later time, eyes were placed directly in RNAlater (Ambion, Inc.) and treated per the manufacturer’s directions. If eyes were used for immediate RNA extraction then we proceeded immediately to the next steps.
Dissecting and preparing eyes for RNA extraction
- Place eyes for RNA extraction in RNA STAT-60 (Tel-Test Inc.) and process per manufacturer’s instructions (in brief form below).
- Store RNA in 75% ethanol at –80 deg. C until use.
Total RNA was extracted with RNA STAT-60 (Tel-Test Inc.) according to the manufacturer's instructions. Briefly, we:
- homogenized tissue samples in RNA STAT-60 (1 ml per 50 to 100 mg tissue)
- allowed the homogenate to stand for 5 min at room temperature
- added 0.2 ml of chloroform per 1 ml RNA STAT-60
- shook the sample vigorously for 15 sec and let it sit at room temperature for 3 min
- centrifuged at 12,000 × g for 15 min
- transferred the aqueous phase to a fresh tube
- added 0.5 ml of isopropanol per 1 ml RNA STAT-60
- vortexed and allowed the sample to stand at room temperature for 5-10 min
- centrifuged at 12,000 × g for 10-15 min
- removed the supernatant and washed the RNA pellet with 75% ethanol
- stored the pellet in 75% ethanol at -80 deg C until use
Sample Processing. All samples were processed in the VA Medical Center, Memphis, Rheumatology Disease Research Core Center led by Dr. Weikuan Gu. All arrays were processed by Dr. Yan Jiao. In brief, samples were purified using a standard sodium acetate in alcohol method (recommended by Affymetrix). RNA quality was checked using a 1% agarose gel: the 18S and 28S bands had to be clear, and the 28S band had to be more prominent. RNA concentration was measured using a spectrophotometer. The 260/280 ratios had to be greater than 1.7, and the majority were 1.8 or higher. We used a total of 8 micrograms of RNA as the starting amount for cDNA synthesis using a standard Eberwine T7 polymerase method (Superscript II RT, Invitrogen Inc., Affy Part No 900431, GeneChip Expression 3' Amplification One-Cycle cDNA Synthesis Kit). The Affymetrix IVT labeling kit (Affy 900449) was used to generate labeled cRNA. At this point the cRNA was evaluated again using both the 260/280 ratio (values of 2.0 or above were acceptable) and 1% agarose gel inspection of the product (a size range from 200 to 7000 bp is considered suitable for use). We used 45 micrograms of labeled cRNA for fragmentation. Samples that passed both QC steps (usually fewer than 10% fail) were then sheared using a fragmentation buffer included in the Affymetrix GeneChip Sample Cleanup Module (Part No. 900371). After fragmentation, samples were either stored at -80 deg. C until use (roughly one third) or used immediately for hybridization.
Dealing with ocular pigmentation: Variable ocular pigmentation is a potential confound in a study of the whole-eye transcriptome. Even the most careful RNA preparations taken from brown and beige colored mice tend to have faint residual pigmentation that affects hybridization signal. To address this problem, Dr. Yan Jiao purified total RNA from all four batches using the Qiagen RNeasy MinElute Cleanup Kit (Cat No. 74204).
Replication, sex, and sample balance: Our goal was to obtain data for independent biological sample pools from both sexes for most lines of mice. The four batches of arrays included in this final data set collectively represent a reasonably well-balanced sample of males and females, in general without within-strain-by-sex replication. Two strains are represented by a single male sample pool (BXD29 and A/J). Four lines are represented by two or three male sample pools (all of the five DeltaGen KO lines). SJL/J may be represented by a single mixed-sex sample. Users can study possible sex effects by comparing their expression data with surrogate measurements that summarize the overall sex balance of HEIMED: probe set 1427262_at (Xist, high in females) and probe set 1426438_at (Ddx3y, high in males). These two sex-specific probe sets are quantitative surrogates for the sex balance in this data set.
Technical duplicates: One sample, highlighted in the tables below, is a technical duplicate. Both arrays of the duplicate pair were of high quality. For statistical analysis, they should be combined and treated as a single biological sample.
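The two QC checkpoints above can be expressed as a simple filter. This is a minimal sketch, not the core facility's software: the `SampleQC` record and its field names are hypothetical, and we assume that the cRNA size criterion means the visible product should fall within 200 to 7000 bp.

```python
from dataclasses import dataclass

# Thresholds taken from the protocol text.
MIN_RNA_260_280 = 1.7          # total RNA: ratio had to be greater than 1.7
MIN_CRNA_260_280 = 2.0         # labeled cRNA: 2.0 or above was acceptable
CRNA_SIZE_BP = (200, 7000)     # acceptable cRNA size range on a 1% agarose gel


@dataclass
class SampleQC:
    """Hypothetical QC record for one pooled sample (field names are illustrative)."""
    sample_id: str
    rna_260_280: float
    rrna_bands_ok: bool        # clear 18S and 28S bands, 28S more prominent
    crna_260_280: float
    crna_min_bp: int           # smallest visible cRNA product
    crna_max_bp: int           # largest visible cRNA product


def passes_qc(s: SampleQC) -> bool:
    """True if the sample passes both the total RNA and the cRNA checkpoints."""
    lo, hi = CRNA_SIZE_BP
    rna_ok = s.rrna_bands_ok and s.rna_260_280 > MIN_RNA_260_280
    crna_ok = (s.crna_260_280 >= MIN_CRNA_260_280
               and lo <= s.crna_min_bp and s.crna_max_bp <= hi)
    return rna_ok and crna_ok


print(passes_qc(SampleQC("tube01", 1.85, True, 2.05, 250, 6000)))
```

A sample with a total RNA 260/280 ratio of 1.65, or a cRNA ratio of 1.9, would fail this filter.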
Batch structure: This data set consists of four batches (Table 2, far right column). The final September 2008 data set consists of a total of 221 arrays and 220 independent samples.
- Batch 1: November 2005, n = 78 original arrays, of which 76 were accepted into this final data set.
- Batch 2: January 2006, n = 62 arrays of which 62 were accepted.
- Batch 3: August 2006, n = 39 arrays of which 36 were accepted. (These three batches, including some arrays that were eventually dropped from the final 2008 data set, were combined to form the September 2006 data set.)
- Batch 4: Summer 2008, n = 53 arrays of which 47 were accepted.
Table 1: HEIMED case IDs, including sample tube ID, strain, age, sex, and source of mice (see Table 2 for information on array quality control)
About platform
Affymetrix Mouse Genome 430 2.0 arrays: The 430 2.0 array consists of 992,936 25-nucleotide probes that estimate the expression of approximately 39,000 transcripts (many probes overlap and target the same transcript). The array sequences were selected late in 2002 using UniGene Build 107. The array nominally contains the same probe sequences as the older M430A and M430B array pair. However, we have found that roughly 75,000 probes differ between those on the A and B arrays and those on the new 430 2.0.
As part of the development of HEIMED, we manually annotated individual probe sets by sequence alignment to the mouse genome and transcriptome. Approximately 13,000 probe sets that have comparatively high expression in eye and CNS were curated by one of the authors (RWW) and now have specific information on the part of the transcript targeted by each probe set. The remaining ~33,000 probe sets have corresponding data that were generated by Xusheng Wang using computational methods (BLAT analysis combined with annotated genome sequence).
One example may help explain how to exploit this annotation. The four probe sets for rhodopsin include information on the target location. Only the first probe set targets the last two coding exons; the other three probe sets target different parts of the 3' UTR (mid, distal, and far distal regions). The probe sets can be reordered from high to low expression using the Sort By function in Search Results pages. In the case of rhodopsin, the probe set that targets the last two coding exons and proximal parts of the 3' UTR also has the highest expression. Finally, the HEIMED gene descriptions have been customized to help vision researchers. In the case of rhodopsin, the description appended after the gene name reads “rod photoreceptor pigment, retinitis pigmentosa-associated”. For less well-known genes this kind of annotation can be extremely useful. For example, the more verbose annotation for Cerkl reads “neuronal survival and apoptosis-related, retinal ganglion cell expressed, retinitis pigmentosa 26); alternative 3' UTR of short form message, intron 2”.
Legend: Distribution of expression values for all probe sets in HEIMED.
About data processing
Range of Gene Expression in the Eye. Expression of transcripts in HEIMED and most other GN data sets is measured on a log2 scale, with each unit corresponding approximately to a 2-fold difference in hybridization signal intensity. To simplify comparisons among different data sets and cases, log2 RMA values of each array have been adjusted to an average expression of 8 units and a standard deviation of 2 units (variance stabilized). Values of all 45,101 probe sets in this data set range from a low of 4.8 (Tcf15, probe set 1420281_at) to a high of 15.5 (crystallin gamma C, Crygc, probe set 1422674_s_at). This span of 10.7 units corresponds to a dynamic range of expression of roughly 1 to 1700 (2^10.73).
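The range arithmetic is simply a power of two of the difference between log2 values. The sketch below uses the rounded end points quoted above (4.8 and 15.5), so it yields about 1,660-fold rather than the ~1,700 obtained from the unrounded 10.73-unit span.

```python
def fold_range(low: float, high: float) -> float:
    """Fold difference between two values on the log2 expression scale."""
    return 2.0 ** (high - low)


low_tcf15 = 4.8      # probe set 1420281_at
high_crygc = 15.5    # probe set 1422674_s_at
print(f"span = {high_crygc - low_tcf15:.1f} units, "
      f"~{fold_range(low_tcf15, high_crygc):,.0f}-fold")
```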
We calibrated this log intensity scale using Affymetrix spike-in control probe sets. These 18 control probe sets target exogenous bacterial mRNAs that are added to each sample (a graded dose spike cocktail) during preparation at concentrations of 1.5, 5, 25, and 100 pM. (To find these probe sets, search GN’s ALL search field using the string “AFFX pM”.) A value of 6 or less is equivalent to an mRNA concentration of under 0.4 pM, a value of 8 is equivalent to ~1.5 pM, 9.5 is equivalent to ~5 pM, 11.5 is equivalent to ~25 pM, 13.5 is equivalent to ~100 pM, and a value of 15.5 is equivalent to an mRNA concentration of 400 pM or greater.
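One way to turn an expression value into an approximate spike-in concentration is to interpolate log-linearly between the calibration points listed above. This is a sketch under that assumption: the interpolation rule is ours, not part of the Affymetrix or GN procedure, and values outside 6 to 15.5 are clamped because the text gives only bounds there.

```python
import math
from bisect import bisect_right

# (expression value, approximate concentration in pM), taken from the text above.
CALIBRATION = [(6.0, 0.4), (8.0, 1.5), (9.5, 5.0),
               (11.5, 25.0), (13.5, 100.0), (15.5, 400.0)]


def approx_pm(value: float) -> float:
    """Approximate mRNA concentration (pM) by log-linear interpolation.

    Values at or below 6 return 0.4 ("under 0.4 pM") and values at or above
    15.5 return 400 ("400 pM or greater"); treat both as bounds only.
    """
    xs = [x for x, _ in CALIBRATION]
    if value <= xs[0]:
        return CALIBRATION[0][1]
    if value >= xs[-1]:
        return CALIBRATION[-1][1]
    i = bisect_right(xs, value)            # xs[i - 1] <= value < xs[i]
    (x0, c0), (x1, c1) = CALIBRATION[i - 1], CALIBRATION[i]
    t = (value - x0) / (x1 - x0)
    log_c = math.log2(c0) + t * (math.log2(c1) - math.log2(c0))
    return 2.0 ** log_c


print(round(approx_pm(10.5), 1))  # between the ~5 pM and ~25 pM calibration points
```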
This range can be converted to mRNA molecules per cell in the eye, assuming that a value of 8 is equivalent to about 1 mRNA copy per cell (Kanno et al. 2006, see http://www.biomedcentral.com/1471-2164/7/64). Since the expression of rhodopsin mRNA is normally 15 units, we predict that there are 2^7, or ~128, Rho mRNAs per cell in the whole eye and ~256 in rods themselves (assuming that rods make up about half of all cells in the eye). For this purpose it may be useful to know that a normal mouse eye contains between 6 and 8 million rod photoreceptors (Guo, Lu, and Williams; GN BXD Phenotype ID 11024).
Note that some probe sets with very low expression still provide reliable data. For example, probe set 1440397_at (Cacna2d1) has an expression of only 5.5 units (a value that would be declared "absent" using conventional Affymetrix procedures), but the values for this calcium channel transcript are associated with a very strong cis QTL with an LRS of 79 (LOD ≈ 17). This strong linkage is definitely not due to chance, since the probability of the expression data mapping precisely to the location of the parent gene itself is about 10^-16. This indicates a high signal-to-noise ratio and the detection of significant strain variation in the correct transcript.
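The copies-per-cell estimate follows from the log2 scale and the assumption, taken from the text, that a value of 8 corresponds to about one copy per cell. A minimal sketch; `ROD_FRACTION` encodes the stated assumption that rods are about half of all cells and that essentially all Rho mRNA is in rods.

```python
def copies_per_cell(value: float, ref_value: float = 8.0, ref_copies: float = 1.0) -> float:
    """Rough mRNA copies per cell, assuming `ref_value` corresponds to `ref_copies`."""
    return ref_copies * 2.0 ** (value - ref_value)


rho_whole_eye = copies_per_cell(15.0)        # 2**7 copies, averaged over all cells of the eye
ROD_FRACTION = 0.5                           # assumption from the text: rods are ~half of all cells
rho_per_rod = rho_whole_eye / ROD_FRACTION   # copies per rod if all Rho mRNA is in rods
print(rho_whole_eye, rho_per_rod)
```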
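LRS and LOD scores differ only by a constant factor, LRS = 2 ln(10) × LOD ≈ 4.605 × LOD, which is how the LRS values in this INFO file map to the LOD values given in parentheses (an LRS of 46 is a LOD of about 10, and 79 is about 17). A small conversion helper:

```python
import math

LRS_PER_LOD = 2.0 * math.log(10.0)   # about 4.605


def lrs_to_lod(lrs: float) -> float:
    """Convert a likelihood ratio statistic (LRS) to a LOD score."""
    return lrs / LRS_PER_LOD


def lod_to_lrs(lod: float) -> float:
    """Convert a LOD score to an LRS."""
    return lod * LRS_PER_LOD


for lrs in (46, 50, 79):
    print(lrs, round(lrs_to_lod(lrs), 2))
```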
The standard error of the mean for the HEIMED data set is computed for 2 to 6 biological replicates. The standard error of such small samples tends to systematically underestimate the population standard error. With n = 2 the underestimate is about 25%, whereas for n = 6 the underestimate is 5%. Gurland and Tripathi (1971) provide a correction and equation for this effect (see Sokal and Rohlf, Biometry, 2nd ed., 1981, p. 53, for an equation of the correction factor for small samples of n < 20).

Probe (cell) level data from the CEL file: These CEL values produced by GCOS are 75% quantiles from a set of 91 pixel values per cell. The CEL files were processed using the RMA protocol. We processed the first three batches together. The last batch was processed separately and merged as described below.
- Step 1: We added an offset of 1.0 unit to each cell signal to ensure that all values could be logged without generating negative values. We then computed the log base 2 of each cell.
- Step 2: We performed a quantile normalization of the log base 2 values for the total set of arrays using the same initial steps used by the RMA transform.
- Step 3: We computed the Z scores for each cell value.
- Step 4: We multiplied all Z scores by 2.
- Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
- Step 6: Finally, when appropriate, we computed the arithmetic mean of the values for the set of microarrays for each strain. Technical replicates were averaged before computing the mean for independent biological samples.
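Steps 1 to 6 can be sketched with NumPy as below. This is a simplified illustration, not the code used to build HEIMED: it omits the RMA background correction and median-polish summarization, breaks ties naively during quantile normalization, and assumes that Step 3 computes Z scores within each array (after quantile normalization all arrays share one distribution, so per-array and global Z scores coincide). For the standard error, the sketch uses the exact normal-theory unbiasing constant c4(n) in place of the Gurland and Tripathi approximation cited above; it gives the same ~25% (n = 2) and ~5% (n = 6) corrections.

```python
import math

import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Give every column (array) the same distribution: the mean of the sorted columns.

    Rows are probes (cells), columns are arrays. Ties are broken by position,
    a simplification of production RMA code.
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    return reference[ranks]


def heimed_scale(cel: np.ndarray) -> np.ndarray:
    """Steps 1-5: offset and log2, quantile normalize, Z score, x2, +8."""
    logged = np.log2(cel + 1.0)                    # Step 1
    qn = quantile_normalize(logged)                # Step 2
    z = (qn - qn.mean(axis=0)) / qn.std(axis=0)    # Step 3 (per array, assumed)
    return 2.0 * z + 8.0                           # Steps 4 and 5


def c4(n: int) -> float:
    """Normal-theory bias factor of the sample SD: E[s] = c4(n) * sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def strain_mean_and_sem(samples: list[list[float]]) -> tuple[float, float]:
    """Step 6 for one strain, plus a small-sample corrected SEM.

    Each inner list holds the technical replicate values of one biological
    sample; technical replicates are averaged first.
    """
    bio = np.array([np.mean(tech) for tech in samples])
    n = bio.size
    mean = float(bio.mean())
    if n < 2:
        return mean, float("nan")
    sd = float(bio.std(ddof=1)) / c4(n)            # bias-corrected SD
    return mean, sd / math.sqrt(n)


# Usage: one strain with a technical duplicate pair and one other biological sample.
print(strain_mean_and_sem([[8.0, 8.2], [10.0]]))
```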
After RMA processing using the Biobase affy10 build running under R version 2.7.1, all array data sets were rank-order normalized. This second round of quantile normalization removes much of the residual non-linearity across arrays and forces every array to have the same distribution of values as the mean of all arrays. Comparative array data quality was then evaluated in DataDesk. Outlier arrays were flagged by visual inspection in DataDesk, usually by means of an analysis of scatter plots and, more quantitatively, by generating a correlation matrix of all arrays (a mean correlation below 0.96 versus all other arrays indicates trouble or a biological outlier). In some cases outliers were expected, such as samples from strains with retinal degeneration (FVB/NJ, NOD/LtJ, MOLF/EiJ, C3H/HeJ, and BXD24), samples from wild subspecies such as WSB/EiJ, CAST/EiJ, PWD/PhJ, and PWK/PhJ, and knockouts. However, arrays that were anomalous both within and across strains were often simply discarded, on the assumption that anomalous data are much more likely to be due to experimental and technical errors than to informative biological variation. Approximately 10% of arrays were discarded.
After this process, the acceptable set of arrays was renormalized using all of the steps above, starting with the original RMA procedure.
We reviewed the data set using a new method developed by RW Williams, Jeremy Peirce, and Hongqiang Li. For the full set of arrays that passed the standard QC protocols described above, we computed the strain means for the BXD strains, B6, D2, and the F1s. Using this set of strain means we then computed LRS scores for all 45,101 probe sets and counted the number of transcripts that generated QTLs with LRS values greater than 50. This value (e.g., 1800) represented the QTL harvest for the full data set. We then dropped a single array from the data set, recomputed strain means, and recomputed the number of transcripts with LRS scores greater than 50. Dropping an array is expected to typically reduce the number of QTLs that reach the criterion level (e.g., 1750 QTLs with LRS > 50). This process was repeated for every array to obtain an array-specific difference value: the effect of removing that array on the total QTL count. For example, the loss of a single array might cause a decrease of 50 QTLs. Values ranged from approximately -90 (good arrays) to +40 (bad arrays). This procedure is similar in some ways to a jackknife protocol, although we are not using it to estimate an error term, but rather as a method to polish a data set.
During this process we discovered that nearly 20 arrays in batch 2 had been mislabeled at some point in processing. We determined the correct strain membership of each array using a large number (more than 50) of Mendelian probe sets, comparing their match to standard SNP and microsatellite markers and to the original array data set of November 2005. This allowed us to rescue a large number of high-quality arrays.
A third batch of approximately 40 arrays was processed by Yan Jiao and Weikuan Gu in August 2006. The complete data set was assembled by Hongqiang Li. This process again included a correction for a batch effect.
For the June 2006 data set, Hongqiang Li used a new batch correction method that stabilizes the range of expression in each batch. For each of the three large batches, we extracted the minimum and maximum raw probe expression (CEL file level) values. We then adjusted raw probe values in each batch to have the same range as the first and largest batch (batch 1) using a simple linear interpolation. These procedures generated new, corrected CEL files, which were then used with RMA to generate the final probe set estimates.
For the final fourth batch of arrays (September 2008), Arthur Centeno and Rob Williams corrected for a systematic difference (offset) in probe set expression values between the first three batches, run in 2005 and 2006 (a total of 174 acceptable arrays), and the new batch added in 2008 (n = 47 acceptable arrays). This difference is due to unknown technical batch effects that are probably associated with labeling, hybridization, and scanning. We performed a simple correction to normalize values of the new set of arrays to those of the old set (batches 1 through 3). No changes were made to any values of the previous three batches, and we corrected only the probe set level (RMA) values, not the CEL files. The correction was applied as follows. (1) RWW selected 51 high-quality arrays with similar expression characteristics (r = 0.97 or better between pairs of arrays) in the old data set (from batches 1, 2, and 3) and 34 high-quality arrays in the final batch. RWW used scatterplots of full RMA transcriptome data sets to review many pairs of arrays within these new and old array batches. Strains with retinal degeneration or unusual eye gene expression characteristics were excluded from these selected subsets. (2) The average expression values for each probe set were then computed for both the old and new array subsets. (3) The offset value (old minus new) was added to each probe set across all 47 new arrays. This process forces the average probe set in the new arrays to be very close to that of the previous arrays.
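The correlation screen can be sketched as follows, assuming a matrix of probe set values with one column per array. Arrays it flags are only candidates; as described above, expected biological outliers (retinal degeneration strains, wild-derived strains, knockouts) were reviewed rather than discarded automatically.

```python
import numpy as np


def flag_low_correlation(expr: np.ndarray, threshold: float = 0.96) -> list[int]:
    """Return column indices whose mean correlation with all other arrays is < threshold.

    expr: probe sets (rows) x arrays (columns).
    """
    r = np.corrcoef(expr, rowvar=False)            # arrays x arrays Pearson matrix
    n = r.shape[0]
    mean_other = (r.sum(axis=1) - 1.0) / (n - 1)   # exclude the diagonal (r = 1 with itself)
    return [int(i) for i in np.flatnonzero(mean_other < threshold)]
```

With a data set of about 200 arrays, a single aberrant array barely lowers the mean correlations of the others, so the threshold separates it cleanly; with very few arrays the screen is much less discriminating.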
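The leave-one-array-out QTL harvest can be written generically. Here `count_qtls` is a hypothetical callback standing in for the full pipeline (recompute strain means, map every probe set, count LRS > 50); only the bookkeeping around it is shown. Following the text, a negative difference marks a good array and a positive difference a bad one.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")


def qtl_harvest_deltas(arrays: Sequence[A],
                       count_qtls: Callable[[Sequence[A]], int]) -> list[int]:
    """For each array, the change in QTL count when that array alone is dropped.

    Negative: removing the array loses QTLs (a good array).
    Positive: removing the array gains QTLs (a bad array).
    """
    full = count_qtls(arrays)
    deltas = []
    for i in range(len(arrays)):
        subset = [a for j, a in enumerate(arrays) if j != i]
        deltas.append(count_qtls(subset) - full)
    return deltas


# Toy usage: each "array" is a number and the harvest is their sum.
print(qtl_harvest_deltas([3, 5, -2], sum))
```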
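A sketch of how an array's strain identity can be checked against Mendelian probe sets. The names are hypothetical, and we assume that each probe set is called as a B or D allele by whichever parental (B6 or D2) mean its value is closer to, and that reference genotypes for each candidate strain come from SNP or microsatellite markers near the target gene.

```python
def call_alleles(values: dict[str, float],
                 b6_means: dict[str, float],
                 d2_means: dict[str, float]) -> dict[str, str]:
    """Call 'B' or 'D' for each Mendelian probe set by the nearer parental mean."""
    return {p: ("B" if abs(v - b6_means[p]) <= abs(v - d2_means[p]) else "D")
            for p, v in values.items()}


def best_strain(calls: dict[str, str],
                strain_genotypes: dict[str, dict[str, str]]) -> tuple[str, float]:
    """Return the candidate strain whose reference genotypes best match the calls."""
    def concordance(geno: dict[str, str]) -> float:
        shared = [p for p in calls if p in geno]
        if not shared:
            return 0.0
        return sum(calls[p] == geno[p] for p in shared) / len(shared)

    scores = {strain: concordance(g) for strain, g in strain_genotypes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In practice the more than 50 Mendelian probe sets mentioned above make ties between candidate strains unlikely.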
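The range-stabilizing batch correction amounts to a linear map of each batch's raw intensities onto the range of batch 1. A minimal sketch, assuming the minimum and maximum are taken over all raw cell values in a batch:

```python
import numpy as np


def rescale_to_reference(batch: np.ndarray, ref_min: float, ref_max: float) -> np.ndarray:
    """Linearly map raw (CEL-level) intensities of one batch onto [ref_min, ref_max]."""
    b_min, b_max = float(batch.min()), float(batch.max())
    return ref_min + (batch - b_min) * (ref_max - ref_min) / (b_max - b_min)


# Usage: batch 1 defines the reference range; other batches are mapped onto it.
batch1 = np.array([[12.0, 40.0], [300.0, 60000.0]])
batch2 = np.array([[10.0, 55.0], [250.0, 52000.0]])
print(rescale_to_reference(batch2, float(batch1.min()), float(batch1.max())))
```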
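The offset correction reduces to one subtraction per probe set. In this sketch rows are probe sets and columns are arrays; `old_ref` and `new_ref` stand for the 51 and 34 selected reference arrays, and `new_all` for all 47 new arrays. Only the new batch is modified.

```python
import numpy as np


def offset_correct(new_all: np.ndarray, old_ref: np.ndarray, new_ref: np.ndarray) -> np.ndarray:
    """Add (old reference mean - new reference mean) to every new array, per probe set."""
    offset = old_ref.mean(axis=1) - new_ref.mean(axis=1)   # one offset per probe set
    return new_all + offset[:, None]                       # broadcast across new arrays
```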
Table 2: Sample tube ID, strain, original CEL filename, and Affymetrix quality control values. Columns labeled Scale factor, Background Average, Present, Absent, Marginal and 3'/5' ratios for actin and Gapdh were collated from the Affymetrix Report (RPT) files.
Contributors
Eldon E. Geisert, Lu Lu, Natalie E. Freeman-Anderson, Xusheng Wang, Weikuan Gu, Yan Jiao, Robert W. Williams
Citation
Eldon E. Geisert, Lu Lu, Natalie E. Freeman-Anderson, Xusheng Wang, Weikuan Gu, Yan Jiao, Robert W. Williams. Gene expression landscape of the mammalian eye: A global survey and database of mRNAs of 103 varieties of mice. Molecular Vision 2009; in press.
Acknowledgment
Support for acquisition of microarray data sets was generously provided by Dr. Barrett Haik, Chair of the Department of Ophthalmology and Director of the Hamilton Eye Institute. Support for the continued development of GeneNetwork was provided by a NIDA/NIMH/NIAAA Human Brain Project grant, by funds from an NEI grant to Dr. Eldon Geisert (R01EY017841), by an NEI Vision Core grant (EY14080), and by an Unrestricted Grant from Research To Prevent Blindness.
We thank Dr. Ted Choi, Chief Scientific Director of Predictive Biology, Inc. (past director of molecular genetics at Deltagen Inc.) for providing us with eye samples from several interesting DeltaGen knockouts.
Notes
This data set is available as a bulk download in several formats. The data are available as either strain means or the individual arrays. Due to the involved normalization procedures required to correct for batch effects we strongly recommend not using the raw CEL files without special statistical procedures.