GNF Stem Cells U74Av2 (Mar04) RMA


This data set is now superseded by the March 2004 RMA data set. The original March 2003 data freeze provides estimates of mRNA expression in hematopoietic stem cells (HSC) from adult female BXD recombinant inbred mice, measured using Affymetrix U74Av2 microarrays. Data were generated at the Genomics Institute of the Novartis Research Foundation (GNF) and by de Haan and colleagues at the University of Groningen. Samples from 22 strains were hybridized to 44 arrays in a single batch. Data were processed using the Microarray Suite 5 (MAS 5) protocol of Affymetrix. To simplify comparison between data sets (HSC and other tissues), the MAS 5 values of each array were log2 transformed and adjusted to an average of 8 units.

Experiment design

About amplification and hybridization:

Total RNA was quantified using RiboGreen and split into equal aliquots of approximately 10 ng, representing RNA from approximately 10,000 cells, and labeled using a total of three rounds of RNA amplification, exactly as described previously (Scherer et al. 2003). Labeled cRNA was fractionated and hybridized to the U74Av2 microarray following standard Affymetrix protocols.

About the chromosome and megabase position values:

The chromosomal locations of probe sets and gene markers were determined by BLAT analysis using the Mouse Genome Sequencing Consortium Oct 2003 Assembly. We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis.

About cases

BXD recombinant inbred mice were purchased from The Jackson Laboratory and upon arrival were housed under clean conventional conditions in the Central Animal Facility of the University of Groningen, Netherlands. We used female mice between 3 and 6 months old.

Stem cells (described below) were isolated from pooled bone marrow obtained from three BXD animals per strain. Each pooled RNA sample was split into two aliquots, and each aliquot was independently amplified and hybridized to the U74Av2 array (3 mice x 2 arrays).

About tissue

Bone marrow cells were flushed from the femurs and tibiae of three mice and pooled. After standard erythrocyte lysis, nucleated cells were incubated with normal rat serum for 15 min at 4 degrees Celsius. Subsequently, cells were stained with a panel of biotinylated lineage-specific antibodies (murine progenitor enrichment cocktail, containing anti-CD5, anti-CD45R, anti-CD11b, anti-TER119, anti-Gr-1, and anti-7-4, Stem Cell Technologies, Vancouver, Canada), FITC-anti-Sca-1, and APC-anti-c-kit (Pharmingen). Cells were washed twice and incubated for 30 minutes with streptavidin-PerCP (Pharmingen). After two washes, cells were resuspended in PBS with 1% BSA and purified using a MoFlo flow cytometer. The lineage-depleted bone marrow cell population was defined as the 5% of cells showing the lowest PerCP fluorescence intensity. Stem cell yield across all BXD samples varied from 16,000 to 118,000 Lin-Sca-1+ c-kit+ cells. A small aliquot of each sample of purified cells was functionally tested for stem cell activity by directly depositing single cells in a cobblestone area forming cell assay. The remainder of the cells was immediately collected in RNA lysis buffer. Total RNA was isolated using the StrataPrep Total RNA Microprep kit (Stratagene) as described by the manufacturer. RNA pellets were resolved in 500 microliters absolute ethanol, and sent on dry ice by courier to GNF, La Jolla, CA.

About data processing

Probe (cell) level data from the CEL file: These CEL values produced by MAS 5 are the 75% quantiles from a set of 36 pixel values per cell.
  • Step 1: We added an offset of 1.0 to the CEL expression values for each cell to ensure that all values could be logged without generating negative values.
  • Step 2: We took the log2 of each cell.
  • Step 3: We computed the Z score for each cell.
  • Step 4: We multiplied all Z scores by 2.
  • Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
  • Step 6: We computed the arithmetic mean of the values for the set of microarrays for each of the individual strains.
Probe set data from the TXT file: These TXT files were generated using MAS 5. The same simple steps described above were also applied to these values. Every microarray data set therefore has a mean expression of 8 with a standard deviation of 2. A 1-unit difference therefore represents roughly a two-fold difference in expression level. Expression levels below 5 are usually close to background noise levels.
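The six steps above can be sketched in a few lines of Python. This is a minimal illustration and not the code used to build the data set: the function names, the strain labels, and the raw intensity values are all hypothetical, and the use of the population standard deviation in Step 3 is an assumption, since the text does not say which estimator was used.

```python
import math
import statistics


def modified_z_scores(raw_values, offset=1.0, scale=2.0, center=8.0):
    """Apply Steps 1-5 to the raw values of a single array."""
    # Steps 1 and 2: add the offset, then take log2 of each value.
    logged = [math.log2(v + offset) for v in raw_values]
    # Step 3: Z score within the array (population SD is an assumption).
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)
    # Steps 4 and 5: multiply by 2 and add 8.
    return [scale * (x - mean) / sd + center for x in logged]


def strain_means(arrays_by_strain):
    """Step 6: arithmetic mean across the arrays of each strain, value by value."""
    return {
        strain: [statistics.fmean(values) for values in zip(*arrays)]
        for strain, arrays in arrays_by_strain.items()
    }


# Hypothetical raw intensities: two arrays per strain, four cells per array.
raw = {
    "BXD1": [[120.0, 850.0, 40.0, 2300.0], [100.0, 900.0, 55.0, 2100.0]],
    "BXD2": [[130.0, 400.0, 60.0, 2500.0], [140.0, 420.0, 45.0, 2600.0]],
}
transformed = {
    strain: [modified_z_scores(array) for array in arrays]
    for strain, arrays in raw.items()
}
means = strain_means(transformed)
```

Because the rescaling fixes the standard deviation of every array at 2, a 1-unit difference equals a two-fold difference only to the extent that the log2 values of the original array already had a standard deviation close to 2, which is why the correspondence is described as approximate.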

About the array probe set names:

Most probe sets on the U74Av2 array consist of a total of 32 probes, divided into 16 perfect match probes and 16 mismatch controls. Each set of these 25-nucleotide-long probes has an identifier code that includes a unique number, an underscore character, and several suffix characters that highlight design features. The most common probe set suffix is _at. This code indicates that the probes should hybridize relatively selectively with the complementary anti-sense target (i.e., the complementary RNA) produced from a single gene. Other codes include:

  • f_at (sequence family): Some probes in this probe set will hybridize to identical and/or slightly different sequences of related gene transcripts.
  • s_at (similarity constraint): All probes in this probe set target common sequences found in transcripts from several genes.
  • g_at (common groups): Some probes in this set target identical sequences in multiple genes and some target unique sequences in the intended target gene.
  • r_at (rules dropped): Probe sets for which it was not possible to pick a full set of unique probes using the Affymetrix probe selection rules. Probes were picked after dropping some of the selection rules.
  • i_at (incomplete): Designates probe sets for which there are fewer than the standard numbers of unique probes specified in the design (16 perfect match for the U74Av2).
  • st (sense target): Designates a sense target; almost always generated in error.

Descriptions for the probe set extensions were taken from the Affymetrix GeneChip Expression Analysis Fundamentals.
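As a rough illustration of how these suffixes can be read off a probe set identifier, the sketch below maps each suffix to the label given in the list above. The function name and the example identifiers are made up for this illustration, and the sketch assumes that each suffix follows an underscore at the end of the identifier.

```python
# Suffix labels taken from the list above. Longer suffixes come first so that
# "_s_at" is not mistaken for the plain "_at" suffix (dicts keep insertion order).
SUFFIX_MEANINGS = {
    "_f_at": "sequence family",
    "_s_at": "similarity constraint",
    "_g_at": "common groups",
    "_r_at": "rules dropped",
    "_i_at": "incomplete",
    "_st": "sense target",
    "_at": "single-gene anti-sense target",
}


def classify_probe_set(probe_set_id):
    """Return the design label implied by the suffix of a probe set ID."""
    for suffix, meaning in SUFFIX_MEANINGS.items():
        if probe_set_id.endswith(suffix):
            return meaning
    return "unknown"


# Hypothetical identifiers, used only to exercise the function.
for example_id in ("12345_at", "12345_g_at", "12345_st"):
    print(example_id, "->", classify_probe_set(example_id))
```

Checking the longer suffixes first is the only design decision that matters here, because every compound suffix in the list also ends in "_at".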



Scherer A, Krause A, Walker JR, Sutton SE, Seron D, Raulf F, Cooke MP (2003) Optimized protocol for linear RNA amplification and application to gene expression profiling of human renal biopsies. Biotechniques 34:546-550, 552-554, 556.

de Haan G, Bystrykh LV, Weersing E, Dontje B, Geiger H, Ivanova N, Lemischka IR, Vellenga E, Van Zant G (2002) A genetic and genomic analysis identifies a cluster of genes associated with hematopoietic cell turnover. Blood 100:2056-2062.

Wang J, Williams RW, Manly KF (2003) WebQTL: Web-based complex trait analysis. Neuroinformatics 1:299-308.

Williams RW, Manly KF, Shou S, Chesler E, Hsu HC, Mountz J, Wang J, Threadgill DW, Lu L (2002) Massively parallel complex trait analysis of transcriptional activity in mouse brain. International Mouse Genome Conference 16:46.


Cells and samples were generated by Leonid V. Bystrykh, Ellen Weersing, Bert Dontje, and Gerald de Haan, Department of Stem Cell Biology, University of Groningen, the Netherlands. RNA amplification and array processing were carried out by Michael Cooke, John Hogenesch, Andrew Su, and colleagues at GNF.

Data normalization and conversion for WebQTL were handled by Robert Williams, Kenneth Manly, Jintao Wang, and Yanhua Qu at UTHSC and Roswell Park Cancer Institute.


Information about this text file:

This text file was originally generated by GdH and RWW, March 2003. Updated by RWW, October 30, 2004.