Monks CEPH B-cells Agilent (Dec04) Log10Ratio



PUBLISHED DATA SET: This is the first human data set entered into GeneNetwork and not all features have been implemented. You can currently explore and use the data for expression analysis and correlations among transcripts. However, mapping functions have not been implemented. These data were provided by Stephanie Santorico and are taken from her paper (Monks et al., 2004).

Probes are mapped to the UCSC Genome Browser hg18 assembly. Please update the array annotation to hg19 (RWW to Arthur C., Sept 2009).


Combining genetic inheritance information, for both molecular profiles and complex traits, is a promising strategy not only for detecting quantitative trait loci (QTLs) for complex traits but also for understanding which genes, pathways, and biological processes are under the influence of a given QTL. As a first step in determining the feasibility of such an approach in humans, we present the largest survey to date, to our knowledge, of the heritability of gene-expression traits in segregating human populations. In particular, we measured expression for 23,499 genes in lymphoblastoid cell lines for members of 15 Centre d'Etude du Polymorphisme Humain (CEPH) families. Of the total set of genes, 2,340 were found to be expressed, of which 31% had significant heritability at a false-discovery rate of 0.05. QTLs were detected for 33 genes on the basis of at least one P value < 0.000005. Of these, 13 genes had a QTL within 5 Mb of their physical location. Hierarchical clustering was performed on the basis of both the Pearson correlation of gene expression and the genetic correlation. Both clusterings reflected biologically relevant activity taking place in the lymphoblastoid cell lines, with greater coherence in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways than in Gene Ontology (GO) pathways. However, more pathway coherence was observed in KEGG pathways when clustering was based on genetic correlation than when it was based on Pearson correlation. As more expression data in segregating populations are generated, viewing clusters or networks based on genetic correlation measures and shared QTLs may offer novel insights into the relationships among genes that underlie complex traits.
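The abstract reports heritability calls at a false-discovery rate of 0.05 but does not name the FDR procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up rule, a common choice; the function name `benjamini_hochberg` is hypothetical and this is not presented as the authors' code.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Mark which hypotheses are rejected at FDR level q using the
    Benjamini-Hochberg step-up rule (assumed procedure, for illustration)."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-gene heritability p-values.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals))
```

The step-up rule is used rather than a per-gene 0.05 cutoff because it controls the expected proportion of false calls across all tested genes, which is what an FDR of 0.05 refers to.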

About cases

About the cases and families used to generate this set of data:

The text below is taken from Monks et al. (2004). We will add further annotation over the next several months.

Families: Fifteen families from the CEPH/Utah family collection were selected for profiling. The family identifiers were 1334, 1340, 1345, 1346, 1349, 1350, 1358, 1362, 1375, 1377, 1408, 1418, 1421, 1424, and 1477. These families were selected because of the availability of genotypes and lymphoblastoid cell lines for all three generations and because of their large numbers of children. In total, the families represent 210 individuals. Of these, 167 individuals provided adequate quantity and quality of RNA for expression profiling.

About tissue

Tissue Growth, Processing, and Profiling: Lymphoblastoid cell lines were obtained from the Coriell Cell Repositories and propagated. All cell lines were grown in media and supplements purchased from the Invitrogen Corporation. The culture medium consisted of RPMI supplemented with 15% fetal bovine serum, 1% penicillin/streptomycin, and 0.5% sodium pyruvate. To minimize variability between experiments, all fetal bovine serum used was from lot number 10082147 1129480. The cell lines were grown at 37°C in humidified incubators, in an atmosphere of 5% CO2.
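The medium recipe gives supplement percentages without stating whether they are v/v. The sketch below assumes each percentage is the volume of supplement stock as a fraction of the final volume, with RPMI making up the remainder; the function `medium_recipe` is a hypothetical helper, not part of the original protocol.

```python
from typing import Dict

# Assumed v/v fractions of the final volume, taken from the recipe above.
SUPPLEMENT_FRACTIONS = {
    "fetal bovine serum": 0.15,
    "penicillin/streptomycin": 0.01,
    "sodium pyruvate": 0.005,
}


def medium_recipe(final_volume_ml: float) -> Dict[str, float]:
    """Return the volume (ml) of each component for a batch of medium."""
    volumes = {name: final_volume_ml * f for name, f in SUPPLEMENT_FRACTIONS.items()}
    # RPMI fills whatever volume the supplements do not.
    volumes["RPMI"] = final_volume_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for component, ml in medium_recipe(500.0).items():
        print(f"{component}: {ml:.1f} ml")
```

Under these assumptions a 500-ml batch needs 75 ml serum, 5 ml penicillin/streptomycin, 2.5 ml sodium pyruvate, and 417.5 ml RPMI.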

Experiment series were set up by seeding 25-ml cultures in T25 flasks at a density of 2.5 × 10^5 cells/ml. Each culture was grown for 48 h or until the cell density was at least 780,000 cells/ml. To harvest the cells, the cultures were centrifuged, the medium was decanted, and 500 μl of guanidine isothiocyanate cell lysis buffer (Buffer RLT, Qiagen) was added. Cell lysates were then transferred to 96-well block format and stored at −80°C.
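The seeding and harvest densities imply a minimum expansion per culture. The arithmetic below is a sketch assuming exponential growth over the full 48 h and the minimum harvest density; the constant and function names are hypothetical.

```python
import math

SEED_DENSITY = 2.5e5       # cells/ml at seeding
HARVEST_DENSITY = 7.8e5    # minimum cells/ml at harvest
CULTURE_VOLUME_ML = 25.0   # culture volume per T25 flask
GROWTH_HOURS = 48.0        # nominal growth period


def total_cells(density_per_ml: float, volume_ml: float) -> float:
    """Total cell count in a culture of the given density and volume."""
    return density_per_ml * volume_ml


def doublings(seed: float, harvest: float) -> float:
    """Number of population doublings between two densities."""
    return math.log2(harvest / seed)


def doubling_time_h(seed: float, harvest: float, hours: float) -> float:
    """Mean doubling time, assuming steady exponential growth."""
    return hours / doublings(seed, harvest)


if __name__ == "__main__":
    print("cells seeded:", total_cells(SEED_DENSITY, CULTURE_VOLUME_ML))
    print("cells at harvest (min):", total_cells(HARVEST_DENSITY, CULTURE_VOLUME_ML))
    print("fold expansion:", HARVEST_DENSITY / SEED_DENSITY)
    print("doublings:", round(doublings(SEED_DENSITY, HARVEST_DENSITY), 2))
    print("doubling time (h):", round(doubling_time_h(SEED_DENSITY, HARVEST_DENSITY, GROWTH_HOURS), 1))
```

A 3.12-fold expansion is about 1.64 doublings, so reaching the threshold in 48 h corresponds to a mean doubling time of roughly 29 h under these assumptions.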

Total RNA was isolated using RNeasy 96 kits (Qiagen) with the following protocol modifications. Cells were harvested in 500 μl of lysis buffer instead of the 150 μl specified by the protocol. To eliminate DNA contamination, the DNase protocol appended to the kit protocol was used in concert with the isolation. DNase was added to the membrane after the first 350-μl RW1 wash (guanidinium thiocyanate and ethanol) and was left on the RNeasy membrane for 30 min. An additional 350-μl RW1 buffer wash and an additional 500-μl RPE buffer wash were performed.

To quantify the RNA and perform quality control, the A260/A280 ratio was measured with a SpectraMax spectrophotometer (Molecular Devices). Samples whose A260/A280 ratio deviated from the accepted value of 2.0 by more than ±0.2 were excluded. Formaldehyde gels (1.2%) were run on each sample to ensure that ribosomal RNA bands were intact and that no significant degradation had occurred. Samples that met the minimum mass requirement of 13 μg (enough for two replicates) and whose ribosomal bands were visible on the QC gel were transferred from the 96-well block and aliquoted into microcentrifuge tubes with a Multiprobe II EX (Packard BioScience Company). For individuals whose samples were to be used in the pool, 46 μg of RNA was allocated by the same procedure. In total, 167 individuals in 15 pedigrees provided adequate quantity and quality of RNA for expression profiling.

The microcentrifuge tubes were vacuum dried and stored at −80°C before processing. Dried total RNA samples were reconstituted, and 3 μg of total RNA from each sample was used for subsequent RT-PCR–in vitro transcription amplification using the T7 promoter, which produced allyl-UTP–labeled single-stranded complementary RNA (sscRNA) (Hughes et al. 2001). Amplified cRNA was purified using the RNeasy purification kit (Qiagen) and coupled with either Cy3 or Cy5 (Hughes et al. 2001). Purified Cy3/Cy5-labeled cRNA was fragmented by ZnOAc/EDTA addition and hybridized for 24 h in a hybridization chamber to at least two DNA microarray slides with fluor reversal (dye swap); the slides were then washed and scanned with a laser confocal scanner (Hughes et al. 2001). Arrays were quantified on the basis of the intensity of each spot relative to background, using the Qhyb program (Rosetta Inpharmatics) (Marton et al. 1998).
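Fluor reversal means each sample is hybridized once with the sample in Cy5 and once with the sample in Cy3, so that a dye-specific bias enters the two log ratios with opposite signs. The sketch below shows the simplest way to combine such a pair, an unweighted mean of the two sample/pool log10 ratios. It assumes background-subtracted intensities, and it is not the Rosetta error-weighted processing actually used for these data; the function name is hypothetical.

```python
import math
from typing import Tuple


def dye_swap_log10_ratio(forward: Tuple[float, float],
                         reverse: Tuple[float, float]) -> float:
    """Combine a dye-swap pair into one sample/pool log10 ratio.

    forward: (cy5, cy3) intensities with the sample in Cy5 and the pool in Cy3.
    reverse: (cy5, cy3) intensities with the pool in Cy5 and the sample in Cy3.
    """
    fwd = math.log10(forward[0] / forward[1])   # sample/pool, forward slide
    rev = math.log10(reverse[1] / reverse[0])   # sample/pool, reversed slide
    # A multiplicative Cy5 bias adds +log10(b) to fwd and -log10(b) to rev,
    # so it cancels in the mean.
    return (fwd + rev) / 2.0


if __name__ == "__main__":
    # Hypothetical spot: true 2-fold up-regulation, Cy5 reads 1.3x brighter.
    bias = 1.3
    forward = (2000.0 * bias, 1000.0)   # sample Cy5, pool Cy3
    reverse = (1000.0 * bias, 2000.0)   # pool Cy5, sample Cy3
    print(round(dye_swap_log10_ratio(forward, reverse), 4))
```

In the example, each slide alone misestimates the ratio (2.6-fold and about 1.54-fold), but the dye-swap mean recovers log10(2).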

Expression profiling of lymphoblastoid cell lines was performed using a 25K human gene oligonucleotide microarray. All individuals were compared with a common pool created from equal portions of RNA from all founders within the 15 pedigrees whose samples passed quality control (Gene Expression Omnibus Web site). Sequences for the microarray were selected from the RefSeq database (NCBI Reference Sequence Web site) and from EST contigs (van’t Veer et al. 2002).

Genotype Data and Genetic Maps: GENOTYPE DATA HAVE NOT YET BEEN INTEGRATED INTO GENENETWORK. Genotype data for 346 autosomal genetic markers for 210 of the pedigree members were obtained from the CEPH genotype database, version 9.0 (CEPH Genotype Database Web site). Genetic markers were selected from the 14,404 markers represented in the full database, so that at least 75% of the pedigrees had genotypes available for at least 75% of the families. The median intermarker distance was 11 cM, on the basis of the deCODE genetic map (Kong et al. 2002). Marker-allele frequencies available from the CEPH genotype database were used for estimating identity-by-descent probabilities.
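The 11-cM figure is a median of the gaps between adjacent markers on the deCODE map. The sketch below shows one way to compute it, assuming gaps are taken between consecutive markers within each chromosome and then pooled across chromosomes; the function name and the marker positions in the example are hypothetical.

```python
from statistics import median
from typing import Dict, List


def median_intermarker_distance(positions_cm: Dict[str, List[float]]) -> float:
    """Median gap (cM) between adjacent markers, pooled over chromosomes."""
    gaps: List[float] = []
    for chrom_positions in positions_cm.values():
        ordered = sorted(chrom_positions)
        # Only consecutive markers on the same chromosome form a gap.
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return median(gaps)


if __name__ == "__main__":
    # Hypothetical genetic-map positions (cM) for a few markers.
    toy_map = {"1": [0.0, 10.0, 22.0], "2": [5.0, 16.0]}
    print(median_intermarker_distance(toy_map))
```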

Statistical Methods: MORE TO COME. For each profile, genes were tested for differential expression relative to the pool, using procedures described elsewhere (Hughes et al. 2000). For each transcript (probe), the value is the expression level in an individual relative to that in the pool, expressed as a log10 ratio.
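Because each value is a log10 ratio of individual to pool, it can be converted back to a fold change or to the log2 scale used by many other data sets. The helpers below are a sketch assuming values are stored as log10(individual / pool); the function names are hypothetical.

```python
import math


def fold_change(log10_ratio: float) -> float:
    """Individual/pool expression ratio from a log10 ratio."""
    return 10.0 ** log10_ratio


def to_log2_ratio(log10_ratio: float) -> float:
    """Re-express a log10 ratio on the log2 scale."""
    return log10_ratio / math.log10(2.0)


if __name__ == "__main__":
    for value in (0.0, 0.301, -0.301, 1.0):
        print(value, round(fold_change(value), 3), round(to_log2_ratio(value), 3))
```

A value of 0 means the same expression as the pool; roughly +0.301 and -0.301 correspond to 2-fold higher and 2-fold lower expression.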

About platform

GEO platform information: Rosetta (Merck) custom-commercial array, GPL564

About data processing

Data provided by Stephanie Monks Santorico, University of Colorado, Denver (Oct 8, 2008). Annotation files to follow in late Oct 2008. Mapping functions will not be implemented until 2009.

Data entry by Arthur Centeno, Oct 30, 2008.

Agilent annotation entry by Hongqiang Li and Xusheng Wang, Oct 31, 2008.

This annotation file was started by Robert W. Williams, Oct 15, 2008. Most text taken from Monks et al., 2004. Last update, Nov 17, 2008 by RWW.


Monks SA(1), Leonardson A, Zhu H, Cundiff P, Pietrusiak P, Edwards S, Phillips JW, Sachs A, Schadt EE.

(1) Department of Statistics, Oklahoma State University, Stillwater, OK 74078-1056, USA.


Monks SA, Leonardson A, Zhu H, Cundiff P et al. Genetic inheritance of gene expression in human cell lines. Am J Hum Genet 2004 Dec;75(6):1094-105. PMID: 15514893