Advan. Physiol. Edu. 26: 256-270, 2002;
1043-4046/02 $5.00
© 2002 American Physiological Society

APS REFRESHER COURSE REPORT

GENE EXPRESSION STUDIES USING MICROARRAYS: PRINCIPLES, PROBLEMS, AND PROSPECTS

David Murphy

University of Bristol Research Centre for Neuroendocrinology, Bristol Royal Infirmary, Bristol BS2 8HW, England

Abstract

A number of mammalian genomes having been sequenced, an important next step is to catalog the expression patterns of all transcription units in health and disease by use of microarrays. Such discovery programs are crucial to our understanding of the gene networks that control developmental, physiological, and pathological processes. However, despite the excitement, the full promise of microarray technology has yet to be realized, as the superficial simplicity of the concept belies considerable problems. Microarray technology is very new; methodologies are still evolving, common standards have yet to be established, and many problems with experimental design and variability have still to be fully understood and overcome. This review will describe the time course of a microarray experiment: RNA isolation from sample, target preparation, hybridization to the microarray probe, data capture, and bioinformatic analysis. For each stage, the advantages and disadvantages of competing techniques are compared, and inherent sources of error are identified and discussed.

Key words: functional genomics; microarray; cDNA; oligonucleotide; bioinformatics

The sequencing projects that have elucidated the human genome (http://www.nature.com/cgi-taf/dynapage.taf?file=/nature/journal/v409/n6822/index.html, http://www.ncbi.nlm.nih.gov/genome/guide/human/, http://www.sciencemag.org/content/vol291/issue5507/, and Refs. 8, 52) and will soon reveal the genetic complement of key model vertebrate organisms such as the mouse (http://www.ncbi.nlm.nih.gov/genome/guide/mouse/index.html), rat (http://www.ncbi.nlm.nih.gov/genome/guide/R_norvegicus.html), puffer fish (http://fugu.hgmp.mrc.ac.uk/), and zebra fish (http://www.ncbi.nlm.nih.gov/genome/guide/D_rerio.html) must rank among the greatest achievements of human civilization. This is all the more so as these efforts have been truly international—the Human Genome Sequencing Consortium includes scientists at 16 institutions in France, Germany, Japan, China, the United Kingdom, and the United States. Furthermore, public and charitable funding has ensured that genome data are freely available to all, enabling information to be put to immediate use to the maximal possible benefit of all humankind.

Despite the magnificent efforts, the exact number of genes needed to make a human being is still not known. The computing technologies that recognize transcription units in raw genomic information are still being developed. Current estimates range from 28,000 to 120,000 (9, 17, 28, 33, 42, 52). Irrespective of the exact number, the challenge remains to determine exactly what all of these genes do in terms of the development and physiological functioning of the organism. This is the task of a newly emerging discipline—functional genomics. A crucial aspect of functional genomics is the description of global expression patterns. In this regard, we need to address two questions:

For many years, molecular biologists tackled the analysis of one or a few genes at a time. Techniques relied on the use of nucleic acid (in situ hybridization, Northern blotting, RNase protection, etc.) or antibody (immunocytochemistry, Western blotting, etc.) probes. Descriptions of gene expression were a haphazard process, largely dependent on the opportunistic availability of these probes and the intuition of researchers. The availability of genome information and the parallel development of microarray technology have provided the means to perform global analyses of the expression of thousands of genes in a single assay (15, 27). The results provide an assessment of the expression levels of the genes included on the microarray in a particular cell, tissue, or organ. The basic concept of microarray analysis is simple (Fig. 1). RNA is harvested from a cell type or tissue of interest and labeled to generate the target, the free nucleic acid sample whose identity or abundance is being detected. This is hybridized to the tethered probe, DNA sequences corresponding to specific genes that have been affixed, in a known configuration, onto a solid matrix. Hybridization between probe and target, based on Watson-Crick base pairing, provides a quantitative measure of the abundance of a particular sequence in the target population. This information is captured digitally and subjected to various analyses to extract biological information. Comparison of hybridization patterns enables the identification of mRNAs that differ in abundance in two or more target samples (Fig. 1). Thus microarrays provide a powerful tool with which to screen biological specimens for alterations in the expression of mRNAs that accompany, and may regulate, physiological and pathological change.



 
FIG. 1 Course of a typical microarray experiment, starting from the collection of biological samples (in this case, brain) from which the targets are derived, through the preparation and hybridization of the target, to the microarray probe and the elaboration and analysis of gene expression profiling data (bioinformatics), to the biological conclusions that one might wish to derive. Each stage is a source of variability, namely tissue source, tissue harvesting, microarray platforms, microarray variability, sample labeling, data analysis, and human error! The identification and control of these sources of error are a crucial aspect of present microarray development. In this particular case, the samples give very similar hybridization patterns but reveal a number of differentially expressed genes (arrows).

 
Before we consider in detail the many reasons for not carrying out a microarray study, we should reiterate the reasons why this technique has elicited so much excitement in recent years. Microarrays are a novel way of getting fast answers to obvious questions. Although microarray "discovery research" is not in itself hypothesis driven, rigorous but unbiased data gathering can narrow the search for genes that represent the best targets for future hypothesis-driven studies. Thus microarray techniques are finding utility in

However, the superficial simplicity of the microarray concept belies considerable problems, and the full promise of the technology has yet to be realized. Microarray technology is very new; methodologies are still evolving, common standards have yet to be established, and many problems with experimental design and variability have still to be fully understood and overcome. Each stage in the course of a microarray experiment is thus a source of error (Fig. 1).

BIOLOGICAL SAMPLES

Microarray analysis has revealed a large degree of variability in gene expression patterns in particular tissues, even within supposedly genetically identical individuals. In addition, differences can be expected as a consequence of time of day, age, and physiological state (including sex and stage of the estrous cycle) and as a response to disease. Furthermore, variability can be introduced into a microarray experiment by way of sample heterogeneity and sampling error. The only way to overcome these difficulties is to

What Is Normal?

Before we can interpret microarray data related to physiological and pathological transitions, we need to answer the question, "What is normal?" For any system under study, we need to understand the magnitude and diversity of gene expression in the unperturbed state. Different strains of the same species have been shown to exhibit marked variability in the overall pattern of gene expression in the brain. Sandberg et al. (44) compared the expression profiles of more than 10,000 genes in six brain regions of two inbred mouse lines, C57Bl/6 and 129SvEv. Around 1% of genes were identified as being differentially expressed in at least one brain region. This is not unexpected, and indeed, these gene expression differences may represent a functional substrate for physiological and behavioral differences between strains. Similarly, it has been shown that gene expression patterns differ greatly between genetically identical individuals of the same age, sex, and physiological condition. Comparison of the expression profiles of 5,406 genes in a number of tissues of the C57Bl/6 inbred mouse strain revealed considerable variability, ranging from 0.8% in the liver to 3.3% in the kidney (40). It is accepted that, for any particular parameter, physiological "normalcy" is not a strict value but is rather a range of values presented by healthy individuals. It is perhaps no surprise that "normal" gene expression displays similar variability. Such variability between individuals emphasizes the need to pool samples, and to perform replicates.

Sample Collection

Ideally, a tissue sample from which target RNA is isolated must be pure. Although this might be possible for cultured cells, it is impossible for samples obtained from animals or from patient material due to the cellular heterogeneity of tissues. This is a problem compounded by human error and the variability inherent in tissue collection.

Because microarrays can be seen as a new and precise way to define cellular phenotype based on patterns of gene expression, it follows that the ideal sampling methodology should enable the analysis of the cellular contents of single cells or of groups of selected cells (12, 14). The mRNA content of an individual cell can be isolated using a patch pipette to penetrate the cell membrane. The cellular contents are then aspirated into the pipette and transferred to a microcentrifuge tube for RNA isolation and target preparation. Samples can be obtained from randomly selected cells, from electrophysiologically characterized cells, or from defined cell types in transgenic mice that specifically express reporters, such as enhanced green fluorescent protein (55). This level of analysis has now been taken to the subcellular level. Previous studies have revealed that a complex subset of mRNAs is present within the dendritic subdomain of neurons, where their local translation may contribute to synaptic plasticity (35). Microarray analysis has been used to screen for genes that are enriched in neuronal dendrites in response to various stimuli. A patch pipette was used to harvest individual dendrites and cell soma from primary rat hippocampal neuronal cultures treated with (RS)-3,5-dihydroxyphenylglycine, a metabotropic glutamate receptor agonist that modulates protein translation in dendrites. Targets derived from these samples were used to analyze microarrays, which revealed that a few mRNAs changed in abundance as a result of stimulation (12).

Another method for rapidly procuring pure, targeted, single or multiple cells from specific microscopic regions of tissue sections is laser capture microdissection (LCM) (6). A tissue sample is covered with a transparent plastic film and observed under the light microscope. The cells of interest having been identified, a focused infrared laser beam is activated. The heat of the beam melts the film, causing it to adhere to the targeted cells, which then can be lifted away, leaving the rest of the tissue section intact. Bonaventure et al. (4) have elegantly validated this approach by cataloguing gene expression profiles of seven rat brain nuclei or subnuclei, thus identifying putative specific markers of potential functional importance. Arcturus sells the industry standard LCM machine (http://www.arctur.com/about/technology/technology_lcm.htm).

ARRAY PLATFORMS

There are two microarray platforms in common use—cDNA microarrays, which utilize cloned probe molecules corresponding to characterized expressed sequences, and oligonucleotide microarrays, made of synthetic probe sequences based on database information (20).

cDNA Microarrays

A cDNA microarray comprises a collection of gene sequences [usually pure PCR products ranging in size from 100–2,000 bp derived from cDNA and expressed sequence tag (EST) clones] that are applied individually to precise locations on a solid matrix, usually nylon or glass. Nylon membranes have a number of advantages:

Glass, too, has its advantages: Two different targets derived from two samples to be compared can be labeled with different fluors and simultaneously hybridized with a glass microarray in a single competitive reaction. In contrast, nylon microarrays are generally probed in serial or parallel hybridization reactions.

Fabrication of cDNA microarrays. For both glass and membrane matrixes, each microarray element is generated by the deposition of a few nanoliters of purified PCR product at a concentration of 100–500 µg/ml. Spots are typically 100 µm in diameter and can be deposited at a density of up to 20,000 features/cm². Spotting is achieved by contact (mechanical microspotting) or noncontact (ink jetting) methods.

mechanical microspotting.
A DNA sample is loaded into a spotting pin by capillary action, and a small volume is transferred to a solid surface by physical contact between the pin and the solid substrate. After the first spotting cycle, the pin is washed, and a second sample is loaded and deposited to an adjacent address. Robotic control systems and multiplexed print heads allow automated microarray fabrication (29, 46).

ink jetting.
A DNA sample is loaded into a miniature nozzle equipped with a piezoelectric fitting (or other form of propulsion), which is used to expel a precise amount of liquid from the jet onto the substrate. After the first jetting step, the nozzle is washed and a second sample is loaded and deposited to an adjacent address. A repeat series of cycles with multiple jets enables rapid microarray production (47).

A number of contact and noncontact robotic arraying systems are commercially available (http://ihome.cuhk.edu.hk/~b400559/array.html#Arrayer/%20Spotter and http://www.lab-on-a-chip.com/files/maauto.pdf). Prefabricated cDNA chips are available from a number of suppliers (18, 20, and http://ihome.cuhk.edu.hk/~b400559/array.html#Microarray%20slide and http://www.lab-on-a-chip.com/suppliers/inform.html).

Target labeling and hybridization of cDNA microarrays. Sample RNA is converted to target by use of the enzyme reverse transcriptase, an oncoretroviral enzyme that uses RNA as a template for the synthesis of a single-stranded cDNA. Reverse transcriptase requires a short primer to initiate cDNA synthesis, and this is usually provided by oligo(dT), which anneals to the poly(A) tail found at the 3' end of the vast majority of mammalian mRNAs. The label incorporated into the cDNA can be either radioactive or fluorescent. Radioactive target is generated by incorporation of [33P]dCTP, a relatively weak emitter that reduces interference between the closely physically juxtaposed microarray elements. Clearly, the use of a radioactive target requires that comparison of different targets must be carried out using serial hybridizations to the same microarray or by parallel analyses using separate microarrays.

An advantage of fluorescence detection is that competitive hybridization to the same microarray (usually glass, see above) can be used to compare targets derived from different samples. The relative hybridization of the targets labeled with different fluors to the same probe can be readily quantified. The fluorescent labels Cy3-dUTP and Cy5-dUTP are frequently paired, as they have high incorporation efficiencies with reverse transcriptase and good photostability and yield and are widely separated in their excitation and emission spectra, allowing highly discriminating optical filtration. However, it should be noted that the different fluors produce targets with different characteristics. Thus microarray experiments must either be repeated with the fluors swapped around or be performed with the same fluor in parallel on different probes.
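The reasoning behind the fluor-swap design can be illustrated with a short numerical sketch. It assumes, purely for illustration, that the dye effect is a constant multiplicative bias for a given gene; the function name `dye_swap_log_ratio` and the intensity values are hypothetical and do not come from any particular analysis package.

```python
import math


def dye_swap_log_ratio(forward, reverse):
    """Combine a fluor-swapped pair of hybridizations into one log2 ratio.

    forward: (cy5, cy3) intensities with the experimental target labeled Cy5
    reverse: (cy5, cy3) intensities with the experimental target labeled Cy3
    Assumption (illustrative only): the dye bias is multiplicative and is the
    same in both hybridizations.
    """
    m_forward = math.log2(forward[0] / forward[1])  # effect + dye bias
    m_reverse = math.log2(reverse[0] / reverse[1])  # -effect + dye bias
    # The bias has the same sign in both runs while the biological effect
    # changes sign, so half the difference cancels the bias.
    return (m_forward - m_reverse) / 2


# Hypothetical gene: true 2-fold induction, Cy5 channel reads 1.5x too bright.
forward = (2.0 * 1.5 * 1000, 1000)  # experimental target in Cy5
reverse = (1.5 * 1000, 2.0 * 1000)  # experimental target in Cy3
print(round(dye_swap_log_ratio(forward, reverse), 3))  # 1.0, i.e. 2-fold
```

Under this simplified model the forward hybridization alone would suggest a 3-fold change; only the swapped pair recovers the 2-fold value.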

RNA purity is a critical factor in hybridization performance, particularly when fluorescence is used, as cellular protein, lipid, and carbohydrate can mediate significant nonspecific binding of labeled cDNAs to matrix surfaces.

A limitation of cDNA microarray technology is the large amount of RNA required to produce an adequate signal over noise (11). This is a particular issue with low-abundance transcripts. Fluorescence detection requires ≥10 µg of total RNA (equivalent to a million cells), whereas radioactive detection enables detection with as little as 0.1 µg of starting total RNA (10,000 cells). However, as described above, the ultimate aim is to carry out expression profiling with as few cells as possible, preferably single cells. For targets to be derived from such samples, some form of amplification process needs to be incorporated into the procedure. PCR (43) is a highly efficient method for exponentially amplifying a population of single-stranded cDNA. However, the nonlinear amplification results in a target in which sequence representation is skewed compared with the original mRNA pool. In contrast, the amplified antisense RNA (aRNA) procedure (13, 50) is a linear procedure that produces a target more representative of the initial mRNA population. An mRNA sample is converted into cDNA using an oligo(dT) primer that contains a bacteriophage T7 RNA polymerase promoter site. After the cDNA is rendered double stranded, T7 RNA polymerase is used to transcribe antisense RNA copies. The procedure can produce up to 10^6-fold amplification. Both Ambion (http://www.ambion.com/catalog/CatNum.php?1750) and Arcturus (http://www.arctur.com/products/riboamp_main.htm) sell linear amplification kits.

Data capture. Once targets have been hybridized to probes and the microarray has been washed to remove as much unbound and nonspecifically bound target as possible, the array must be scanned to determine how much target is bound to each probe spot. Data are captured from microarrays hybridized with 33P-labeled target by means of a phosphorimager system (e.g., the Molecular Dynamics Storm and Typhoon machines; http://www.mdyn.com). Microarrays hybridized with fluorescent targets are stimulated with a laser. The emitted light is then captured by either a charge-coupled device or a confocal scanner. A number of companies produce machines for scanning fluorescently labeled microarrays (http://ihome.cuhk.edu.hk/~b400559/array.html#Scanner and http://www.lab-on-a-chip.com/files/mascanner.pdf).

Advantages of cDNA microarrays. cDNA microarrays are a relatively accessible and cost-effective technology. Hybridization does not need specialized equipment, and data capture can be carried out using equipment that is very often already available in the laboratory. Prefabricated microarrays are relatively cheap, and custom chip manufacture is within the reach of many researchers, affording flexibility of design as necessitated by the scientific goals of the experiment.

The long target sequences (≤2 kbp) increase detection sensitivity.

Disadvantages of cDNA microarrays. Sequence homologies between clones representing different closely related members of the same gene family may result in a failure to specifically detect individual genes. However, closely related genes can often be distinguished by using probes corresponding to the 3'-untranslated region of an mRNA, as these regions often display gene-specific sequence diversity.

The state of the double-stranded DNA on the microarray is ill defined and may well have constraining contacts with the matrix and inter- and intrastrand cross-links that will affect hybridization.

Each sample must be synthesized, purified, and stored before microarray fabrication.

Microarray fabrication is dependent on the curation of extensive clone sets. Even the best maintained sets are prone to mix-ups, with clones not containing the sequence that they are supposed to. Halgren et al. (24) sequenced 1,189 IMAGE consortium cDNAs (http://image.llnl.gov/) obtained commercially from Research Genetics (http://www.resgen.com). Only 62.2% were uncontaminated and contained cDNA inserts that had significant sequence identity to published data for the ordered clones; 7.1% contained both a correct and an incorrect plasmid; and 5.9% contained multiple, distinct, incorrect plasmids, indicating the likelihood of multiple contaminating events. This kind of analysis should lead to systems that enable better curation of clone stocks.

Oligonucleotide Microarrays

Oligonucleotide microarrays are made by synthesizing single-stranded probes on the basis of sequence information in databases. A number of technologies are available (http://www.lab-on-a-chip.com/suppliers/inform.html). For example, oligonucleotide synthesis has been combined with ink jet spotting. Motorola Life Sciences (http://www.motorola.com/lifesciences) synthesizes 30-mer oligonucleotides "offline" and spots them onto slides coated with a three-dimensional, branched polymeric substrate gel surface (Motorola Life Sciences recently sold their microarray business to Amersham Biosciences; http://www.amersham.com). Agilent Technologies (http://www.chem.agilent.com/Scripts/IDS.asp?lPage=1624), in partnership with Rosetta Inpharmatics (http://www.rii.com), has described oligonucleotide synthesis in situ using an ink-jet printing method employing standard phosphoramidite chemistry (26).

Affymetrix GeneChips. The industry leader in the field of oligonucleotide microarrays is undoubtedly Affymetrix Corporation (http://www.affymetrix.com), which uses photolithography-directed combinatorial chemical synthesis to manufacture so-called GeneChips, microarrays bearing hundreds of thousands of different oligonucleotides on a derivatized glass surface (Fig. 2 and Refs. 19, 34, 40). By use of the Affymetrix Fluidics system, GeneChips are hybridized with fragments (35–200 residues long) of biotinylated target RNA derived from 5 µg of total cell RNA or 0.2 µg of poly(A)-selected mRNA. Hybridized target is recognized by a streptavidin-phycoerythrin conjugate, and then the fluorescent image is captured using the Affymetrix Microarray Reader.



 
FIG. 2 Principles underlying the manufacture and use of Affymetrix oligonucleotide arrays. A: GeneChip manufacture. 1) A silicon wafer solid support is derivatized with a covalent linker molecule terminated with a photolabile protecting group (X). Light is then directed through a mask to deprotect and activate selected sites. 2) The wafer is then flooded with a photoprotected (X) DNA base, resulting in spatially defined coupling on the chip surface. Each base is represented by a different symbol (square, star, circle, cartwheel). 3) A second photomask is used to deprotect different, but defined, regions of the wafer. 4) The wafer is then flooded with a second photoprotected DNA base. 5) Repeated deprotection and coupling cycles enable the preparation of high-density oligonucleotide microarrays. 6) The resultant chip bears ~400,000 groups of oligonucleotides in an area of ~1.6 cm², with each feature containing ~10 million oligonucleotides of a given sequence. B: Probe selection. A model gene consists of a promoter (TATA) driving the transcription of a pre-mRNA consisting of two exons (I and II) separated by an intron. Transcription termination occurs downstream of the polyadenylation signal (AAUAAA). Introns are removed by splicing to produce the mature translated mRNA consisting of a 5'-untranslated region (5' UTR), the coding sequences, and a 3' UTR. Based on cDNA, expressed sequence tags (EST), and genomic sequence information corresponding to mature mRNA sequences, ≤20 different independent 25-mer and minimally overlapping oligonucleotides are selected to serve as sensitive, unique sequence detectors for each transcript (in this case i to vi). These are the perfect match (PM) sequences. Mismatch (MM) control probes are identical to their perfect match partners except for a single base difference in a central position (arrow). Comparison of the PM and MM signal following hybridization allows automatic background subtraction.

 
Advantages of Affymetrix GeneChips. Because probes are based entirely on sequence information, there is no need to prepare and verify physical intermediates such as bacterial clones, PCR products, or cDNAs. As such, there is less possibility of probe mix-ups. However, Affymetrix has experienced some problems in this regard. Up to one-quarter of the sequences on one set of mouse microarrays were wrong. The company had used sequences from the public sequence databases that were known to be ambiguous and that actually corresponded to the wrong strand from the DNA double helix. As a result, the oligonucleotides could not detect their target mRNAs (http://www.affymetrix.com/support/technical/product_updates/mgu74_product_bulletin.affx).

The use of multiple, short-sequence detectors enables splice variants and closely related members of a gene family to be distinguished (Fig. 2B). By use of probes representing regions of genes that significantly diverge or "are significantly unique" between family members, microarrays can distinguish transcripts that are up to 90% identical.

For each probe designed to be perfectly complementary to a target sequence, a partner probe is generated that is identical except for a single base mismatch in its center (Fig. 2B). This probe mismatch strategy, along with the use of multiple probes for each transcript, helps identify and minimize the effects of nonspecific hybridization and background signal and allows the direct subtraction of cross-hybridization signals and discrimination between real and nonspecific signals.
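The perfect match/mismatch subtraction can be sketched as follows. This is a deliberately simple average-difference calculation written for illustration; it is not a reproduction of the algorithm used by the Affymetrix software, and the function names and intensity values are hypothetical.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs representing one transcript."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def fraction_positive_pairs(pm, mm):
    """Fraction of probe pairs in which the PM signal exceeds the MM signal."""
    return sum(p > m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for four probe pairs of a single transcript.
pm = [1200, 950, 1430, 1100]
mm = [300, 280, 310, 900]
print(average_difference(pm, mm))       # 722.5
print(fraction_positive_pairs(pm, mm))  # 1.0
```

The fourth pair, in which the mismatch probe is almost as bright as its partner, shows why several probe pairs are combined: a single cross-hybridizing probe contributes little to the averaged signal.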

Short-chain oligonucleotides with single points of constraint are probably more accessible for hybridization to target than cDNA probes.

Disadvantages of Affymetrix GeneChips. There are several disadvantages to the Affymetrix GeneChips. First is a need for access to expensive specialized equipment. Second, oligonucleotide chips are only available from commercial manufacturers. Custom oligonucleotide microarrays can be commissioned, but at great expense. Third, prefabricated GeneChips are themselves very expensive, although the price, particularly for academic users, is falling. Fourth, although short-sequence probes confer high specificity, they may have decreased sensitivity/binding compared with cDNA microarrays. Low sensitivity is compensated for by employing multiple probes.

Comparability of Different Microarray Platforms

Technical problems inherent in probe manufacture and use still confound the extraction of meaningful data from comparative microarray experiments. Such sources of variability include

Furthermore, little work has been carried out on the comparability of different microarray platforms. In the absence of common standards, different platforms are likely to give different results. Difficulties with interplatform comparability will be compounded by different gene and probe choices.

ANALYSIS OF MICROARRAY DATA

Microarray experiments produce a huge amount of data. A single microarray run can produce between 100,000 and a million data points, and a typical experiment may require tens or hundreds of runs (21). Quite simply, for the first time in the history of the biomedical sciences, our ability to generate data in vast quantities is running ahead of our ability to make sense of them. Moving from data to knowledge is a considerable challenge. Although procedures for the assessment, curation, and presentation of microarray data are rapidly evolving, statistical approaches are neither routine nor standardized (10, 76). The Microarray Gene Expression Database (MGED; 53) consortium has the goal of facilitating the adoption of common standards for microarray experiment annotation and data representation, as well as the introduction of standard experimental controls, and data normalization methods. The projects being pursued by MGED are:

The hope is that researchers will be able to share MAGE-compatible data seamlessly. Some central internet sites have been established for depositing microarray data (e.g., http://www.ebi.ac.uk/arrayexpress).

A variety of microarray analysis software packages are available from commercial and academic sources (http://ihome.cuhk.edu.hk/~b400559/array.html#Software and http://www.lab-on-a-chip.com/suppliers/inform.html).

Low-Level Analysis

Primary image data having been collected from a microarray experiment, the aims of the first level of analysis, so-called low-level analysis, are background elimination, filtration, and normalization, all of which should contribute to the removal of systematic variation between chips, enabling group comparisons. Background noise is removed from cDNA microarrays by subtracting nonspecific signal from spot signal. In contrast, preprocessing of Affymetrix data is intrinsic to the perfect match and mismatch strategy (Fig. 2B). Normalization in both cases involves comparing different microarrays relative to some standard intensity value. This could be the overall intensity of the microarray, the overall intensity of all of the genes on the microarray, the intensity of so-called housekeeping genes (the expression of which is supposedly constant), or spiked targets, containing a known and constant amount of a labeled control. Negative normalization controls might be represented by target sequences from a different organism. Data are often then subjected to log transformation to improve the characteristics of the distribution of the expression values.
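The three low-level steps can be sketched for a spotted array as below. This is a minimal illustration that assumes normalization to overall array intensity, only one of the standards mentioned above; the function names, the clamping floor, and the target total are hypothetical choices rather than established conventions.

```python
import math


def background_subtract(spot, background, floor=1.0):
    """Subtract background from each spot, clamping at `floor` so that the
    later log transformation is always defined (the floor is an assumption)."""
    return [max(s - b, floor) for s, b in zip(spot, background)]


def normalize_to_total(intensities, target_total=1_000_000.0):
    """Rescale one array so that its overall intensity equals target_total."""
    scale = target_total / sum(intensities)
    return [x * scale for x in intensities]


def log2_transform(intensities):
    """Log-transform normalized intensities."""
    return [math.log2(x) for x in intensities]


# Two hypothetical arrays of the same sample; array_b was scanned twice as bright.
array_a = background_subtract([500, 1500, 3000], [100, 100, 100])
array_b = background_subtract([900, 2900, 5900], [100, 100, 100])
norm_a = normalize_to_total(array_a)
norm_b = normalize_to_total(array_b)
print([round(x, 2) for x in log2_transform(norm_a)])
print([round(x, 2) for x in log2_transform(norm_b)])  # same values as above
```

After scaling, the twofold difference in overall brightness between the two arrays disappears, which is the systematic variation that normalization is meant to remove.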

High-Level Analysis

High-level microarray analysis is often called "data mining," the uncovering of relevant patterns of interest in data from a particular problem domain. Typically this will involve data processing using various statistical techniques to identify the patterns. In addition, data need to be packaged, presented, archived, and compared with other types of information.

Statistical analysis. The statistical analysis of microarray data is probably the most difficult problem associated with the use of these techniques. The aim is to apply standard statistical approaches to determine gene expression and gene expression alteration significance, thus enabling the extraction of significant biological information from a morass of noise and variability. However, present methodologies do not deal well with the number of possible combinations. Statisticians are experienced with handling data involving a limited number of variables, but a large number of samples (e.g., the average weight of persons in England is a problem of a single variable and 49 million samples). Microarrays turn this problem on its head, producing thousands of variables from a small number of samples. A number of different methods have been explored.

fold change.
Simple and intuitive, this method involves the calculation of a ratio relating the expression level of a gene under control and experimental conditions. An arbitrary ratio (usually 2-fold) is then selected as being "significant." Because this ratio has no biological merit, this approach amounts to nothing more than a blind guess. The selection of an arbitrary threshold results in both low specificity (false positives, particularly with low-abundance transcripts or when a data set is derived from a divergent comparison) and low sensitivity (false negatives, particularly with high-abundance transcripts or when a data set is derived from a closely linked comparison). It is now accepted that the use of the fold change method should be discontinued.
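The weakness of a fixed threshold can be seen in a small Python sketch. The function name, gene labels, and intensity values below are hypothetical and chosen only to illustrate the point made above.

```python
def fold_change_calls(control, treated, threshold=2.0):
    """Flag genes whose treated/control ratio exceeds an arbitrary fold threshold."""
    calls = {}
    for gene in control:
        ratio = treated[gene] / control[gene]
        # "Significant" if up- or downregulated by at least the threshold factor.
        calls[gene] = ratio >= threshold or ratio <= 1.0 / threshold
    return calls

# Hypothetical intensities: a noisy low-abundance transcript and an abundant one.
control = {"low_abundance": 10.0, "high_abundance": 10000.0}
treated = {"low_abundance": 22.0, "high_abundance": 15000.0}
calls = fold_change_calls(control, treated)
```

Here a change of 12 intensity units in the low-abundance transcript is called "significant," whereas a change of 5,000 units in the abundant transcript is not, although the former could easily be noise.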

unusual ratio.
This method selects genes for which the ratio of control and experimental values is an arbitrarily selected distance from the mean control-to-experimental ratio. This is usually taken to be ±2 standard deviations. This can be calculated by applying a z-transformation (subtraction of the mean followed by division by the standard deviation) to the log ratio values. As a fixed proportion threshold is used, the unusual-ratio method will always identify the most affected genes. However, genes will be reported even if there are no differentially expressed genes. Although flawed, the unusual-ratio method is commonly used for the analysis of cDNA microarrays.
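A minimal Python sketch of the z-transformation and the ±2 standard deviation cutoff follows. The function name and the log ratio values are hypothetical, and the sample standard deviation is used here as one reasonable choice.

```python
import statistics

def unusual_ratio_genes(log_ratios, cutoff=2.0):
    """Return indexes of genes whose z-transformed log ratio exceeds the cutoff."""
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)  # sample standard deviation
    z_scores = [(value - mean) / sd for value in log_ratios]
    return [i for i, z in enumerate(z_scores) if abs(z) > cutoff]

# Hypothetical log ratios: 20 genes near zero and one strongly changed gene.
log_ratios = [0.1, -0.1] * 10 + [3.0]
flagged = unusual_ratio_genes(log_ratios)
```

Because the cutoff is defined relative to the spread of the data themselves, some genes will generally fall outside it whenever the data set is large, whether or not any gene is truly differentially expressed.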

univariate statistics.
If log ratios follow a normal distribution, a probability (P value) that the gene is erroneously reported as being differentially regulated above a given threshold can be assigned using a univariate statistical test (e.g., the t-test). However, such tests require correction for multiple testing. From a statistical point of view, interrogating R genes on a microarray is the same as running R parallel tests. The Bonferroni correction takes this into account by adjusting the significance threshold from P to P/R. However, because a microarray experiment involves an R of thousands, very few, if any, differentially expressed genes would be reported as reaching significance. Less conservative correction methods have been reported (10).
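The effect of the Bonferroni correction on a microarray-sized experiment can be sketched in a few lines of Python. The function name, the P values, and the figure of 10,000 genes are hypothetical and serve only as an illustration.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Apply a Bonferroni-corrected threshold (alpha / n_tests) to per-gene P values."""
    threshold = alpha / n_tests
    significant = [gene for gene, p in p_values.items() if p < threshold]
    return threshold, significant

# Hypothetical P values for three genes on a chip interrogating 10,000 genes.
p_values = {"gene_a": 1e-7, "gene_b": 1e-3, "gene_c": 0.04}
threshold, significant = bonferroni_significant(p_values, n_tests=10000)
```

With 10,000 parallel tests, the corrected threshold is about 0.000005, so a gene with an uncorrected P value of 0.001 no longer reaches significance.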

analysis of variance.
Ultimately, the analysis of microarray data, and the selection of differentially expressed genes, will be achieved by analysis of variance (ANOVA) based on explicit experimental models.

Identifying patterns in microarray data. The output from the analysis of a microarray experiment is usually a large data spreadsheet filled with numbers related to the signal intensity for each gene on the chip. Further analysis is required to identify groups of genes that are similarly regulated across the biological samples under study. A variety of mathematical procedures have been developed that partition genes or samples into groups, or clusters, with maximum similarity, thus enabling the identification of gene signatures or informative gene subsets. Methods for classification are either unsupervised or supervised. Supervised methods use existing biological information about specific genes that are functionally related to "guide" or "test" the cluster algorithm. With unsupervised methods, no prior test set is required.

The most commonly employed unsupervised classification methods are the clustering techniques (16). They fall within the categories of hierarchical and nonhierarchical (partitional) clustering. Most cluster analysis techniques are hierarchical; the resultant classification has an increasing number of nested classes, and the result resembles a phylogenetic classification. Hierarchical clustering has the advantage that it is simple and the result can be easily visualized. Nonhierarchical clustering techniques, such as k-means clustering (51), partition objects into different clusters without trying to specify the relationship between/among individual elements. A self-organizing map [SOM (48)] is a neural-network-based divisive clustering approach. A SOM assigns genes to a series of partitions on the basis of the similarity of their expression vectors to reference vectors that are defined for each partition. It is the process of defining these reference vectors that distinguishes SOMs from k-means clustering.
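To make the partitioning idea concrete, the following is a minimal Python sketch of k-means clustering applied to expression profiles. The function names, the use of squared Euclidean distance, the fixed number of iterations, and the choice of starting centroids are all assumptions for illustration; published analyses would normally rely on an established statistical package.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        members = [[] for _ in centroids]
        for point in points:
            members[nearest(point, centroids)].append(point)
        for i, group in enumerate(members):
            if group:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(col) / len(group) for col in zip(*group))
    labels = [nearest(point, centroids) for point in points]
    return labels, centroids

# Hypothetical expression profiles of four genes across three conditions.
profiles = [(1.0, 1.1, 0.9), (0.9, 1.0, 1.0), (5.0, 5.2, 4.9), (5.1, 4.9, 5.0)]
labels, centroids = kmeans(profiles, [profiles[0], profiles[2]])
```

Note that the user must supply the number of clusters (here, two starting centroids), and that the result says nothing about how the members of a cluster relate to one another.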

Principal component analysis [PCA (1, 41)] is a mathematical decomposition technique that picks out the most prominent recurring themes in an experiment. A set of expression patterns, called principal components, is identified, and linear combinations of these are assembled to represent the behavior of genes in a data set. PCA can be applied to both genes and experiments as a means of classification. In most implementations of PCA, it is difficult to define accurately the precise boundaries of distinct clusters in the data or to define genes (or experiments) belonging to each cluster. However, PCA is a powerful technique for the analysis of gene expression data when used with another classification technique, such as k-means clustering or SOMs, that requires the user to specify the number of clusters.
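The first principal component can be sketched in plain Python as the leading eigenvector of the covariance matrix, found here by power iteration. Power iteration is simply one convenient way of obtaining that vector and is an assumption of this sketch, as are the function name and the example data; the sketch also assumes that the data are not all identical and that the starting vector is not orthogonal to the leading component.

```python
import math

def first_principal_component(rows, iterations=100):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    # Sample covariance matrix between conditions (columns).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    vec = [1.0 / math.sqrt(dims)] * dims  # arbitrary unit starting vector
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    return vec

# Hypothetical genes measured in two conditions, the second always twice the first.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
pc1 = first_principal_component(rows)
```

For these perfectly correlated data, a single component pointing along the direction (1, 2) captures all of the variation between genes.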

One approach to supervised modeling is linear discriminant analysis (LDA), which uses a training set consisting of all classes of interest and then tries to set up a model that classifies an unknown sample unambiguously into one of the already established classes [(23); http://www.stat.berkeley.edu/users/terry/zarray/Html/discr.html].

Relational and functional databases. Microarray data need to be interpreted within the context of gene function and the functional relationships between genes. This demands relating microarray data with existing biological knowledge. However, this project has had to face up to the linguistic ambiguities of the existing scientific literature; supposedly rigid, solid scientific concepts are often couched in imprecise terms. What is needed is a common gene language. Thus the aim of the Gene Ontology Consortium (http://www.geneontology.org) is to "produce a dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing."

dChip. One of the best Affymetrix analysis packages is dChip (31, 32, 45), available online at no cost (http://www.dchip.org). dChip is based on a statistical model for Affymetrix expression data at the probe level. This approach facilitates automatic probe selection in the analysis stage to reduce errors caused by outliers, cross-hybridizing probes, and image contamination. Data are then normalized using a rank-selection method. The program selects a set of genes with the property that the rank of a gene in this set according to its expression measurement in one microarray is similar to its rank using values for the second microarray. Genes thus selected tend to be nondifferentially expressed, and this forms a valid basis for the computation of a normalization relation. By pooling information across multiple microarrays, it is possible to assess standard errors for the model-based expression indexes (MBEI) calculated for each gene. After obtaining MBEIs, dChip can perform some high-level analysis, such as hierarchical and functional clustering, involving ANOVA-based gene filtering, comparative analysis, PCA, and LDA. Some of these functions require R (http://www.r-project.org), a statistical package and language, as the engine for computational and graphic tasks.
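The idea behind rank-based selection of normalization genes can be illustrated with a short Python sketch. This is not dChip's actual algorithm; the function names, the fixed rank tolerance, and the intensity values are assumptions made for illustration only.

```python
def ranks(values):
    """Rank of each value within its array (0 = lowest intensity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def rank_invariant_genes(array_a, array_b, max_rank_shift=1):
    """Indexes of genes whose rank differs little between the two arrays."""
    ranks_a, ranks_b = ranks(array_a), ranks(array_b)
    return [i for i in range(len(array_a))
            if abs(ranks_a[i] - ranks_b[i]) <= max_rank_shift]

# Hypothetical intensities for five genes on two arrays; gene 2 is strongly induced.
array_a = [10.0, 20.0, 30.0, 40.0, 50.0]
array_b = [12.0, 22.0, 90.0, 41.0, 52.0]
invariant = rank_invariant_genes(array_a, array_b)
```

The strongly induced gene changes rank substantially and is excluded, leaving a set of apparently nondifferentially expressed genes on which a normalization relation between the two arrays could be based.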

CONFIRMATIONAL STUDIES

Because of the statistical issues raised by microarray technology, it is necessary that findings be confirmed using independent methodological criteria, preferably with separate samples rather than with the tissue or RNA used to derive the original targets.

A rapid, high-throughput, but expensive, method for confirmation of microarray data is quantitative (real-time) RT-PCR using the TaqMan (Applied Biosystems; http://www.appliedbiosystems.com/products/productdetail.cfm?prod_id=42), iCycler (Bio-Rad Laboratories; http://www.bio-rad.com/iCycler/), or LightCycler (Roche Diagnostics; http://www.lightcycler-online.com/) machines. TaqMan PCR (http://www.appliedbiosystems.com) exploits the 5'-nuclease activity of Taq DNA polymerase in conjunction with DNA probes labeled with quencher and reporter dyes. A positive PCR results in the removal of the reporter dye from the influence of the quencher dye, leading to an increase in measurable fluorescence. The real-time reaction information allows quantification of target nucleic acid. Advantages of TaqMan are that it is a closed-tube assay, reducing the risk of contamination, and that no post-PCR processing (such as gel electrophoresis) is required. Multiplex reactions, detecting more than one sequence per reaction, are possible using different reporter dyes.

Alternatively, Northern blots or ribonuclease protection assays provide the benefit of direct quantification. Finally, in situ hybridization can be used as a sensitive measure of gene expression changes in specific cell types within a mixed tissue. This is important, as significant gene expression changes detected on a microarray may be related to a small fraction of the cells in a tissue.

As steady-state levels of RNA are not necessarily reflective of the final steady-state level of the functional protein translation product of an mRNA, further studies might involve the use of specific antibody probes in Western blot or immunocytochemical studies.

Because a microarray experiment may reveal putative changes in the expression of tens or hundreds of genes, it is practically impossible to confirm all of the data. However, it is incumbent upon investigators to evaluate a reasonable number of genes. That said, confirmational studies may raise other issues. Although a microarray experiment might indicate an increase or decrease in the expression of a gene, an independent method might reveal a greater or a lesser change. Does such a result represent sufficient confirmation of the microarray findings, or does a quantitative difference raise new questions about the validity of the microarray data?

PRESENT AND FUTURE CHALLENGES

Hardware

Scientists in academia and industry are diligently addressing the technical problems of microarrays. The quality, reproducibility, comparability, sensitivity, and dynamic range of microarrays will improve. In 1965, Gordon Moore, who later cofounded Intel, observed that the number of transistors per semiconductor chip doubles every 18–24 months (36). Microarrays are on a similar trajectory. In 1998, an Affymetrix microarray contained fewer than 1,000 genes; by 2000, it contained 12,000. The ultimate aim is to represent all of the expressed sequences of the genome on a single chip. Toward this end, Affymetrix has recently released the Human Genome U133 GeneChip Set, comprising two microarrays containing almost 45,000 probe sets corresponding to more than 39,000 transcript variants representing more than 33,000 of the best characterized human genes (http://www.affymetrix.com/products/arrays/specific/hgu133.affx). However, until the day when all transcription units have been identified, microarrays will remain incomplete. Although this is acceptable, microarrays should be unbiased in their selection of genes.

Another factor driving the development of bigger chips is cost; as size and volume increase, prices will surely drop.

Software

We continue to "search for a body of mathematics that will serve as a natural language for gene expression information" (54). The role of this mathematics will be to

It is to be anticipated that, as these aims are realized, human judgement and expertise will be replaced by artificial intelligence.

Experimental Design

Careful contemplation of the ultimate research objective for a study will ensure that the appropriate type and number of treatment groups are incorporated into an experimental design. Every microarray study should include a sufficient number of independent experiments to allow statistical evaluation of claims of an increase or decrease in gene expression (30, 38). The number of microarrays and replicates needed to achieve statistical significance is dependent on the coefficient of variation. Reproducibility must be demonstrated, including rigorous evaluation of the run-to-run variability for each gene. This will permit appropriate adjustments to be made that will reduce the false discovery rate.

Another challenge that can be overcome by good experimental design is the need to distinguish among primary and secondary effects and subsequent events. An initial perturbation of a biological system will induce gene expression changes that will be followed by more alterations related to secondary, cellular changes, and subsequent modulations at the organismal level. All of these levels of cause-and-effect plasticity are of interest and could be dissected by incorporating a broad range of time points. Similarly, effects not directly related to an experimental perturbation can be eliminated by inclusion of appropriate control groups. For example, if studying gene expression changes as a consequence of drug interactions with a specific receptor, controls might include comparisons with groups using a receptor pathway inhibitor or a nonactive analog.

TOWARD AN UNDERSTANDING OF GENE FUNCTION

The microarray-based approach to the problem of gene function is to cluster genes according to their expression behavior under defined conditions and thereby to assign function. The hypothesis of this "guilt-by-association" approach is that clustered genes may be coregulated and therefore may be involved in similar functions. However, sequence and expression analysis alone is insufficient to fully inform us about gene function. To make sense of these data, the hypotheses that emerge from analysis of systemic expression information must be tested empirically. This will involve the integration of genomic knowledge with biochemistry, cell biology, genetics, structural biology, and proteomics. Ultimately, hypotheses must be tested within the physiological integrity of the whole organism. This will demand the development of a new, high-throughput systems biology coupled with rapid and efficient gene transfer techniques.

Acknowledgments

The Wellcome Trust is thanked for support. Dr. Mohamed Ghorbel and Greig Sharman (University of Bristol) are thanked for critical and constructive comments on the manuscript.

Address for reprint requests and other correspondence: D. Murphy, Univ. of Bristol Research Centre for Neuroendocrinology, Bristol Royal Infirmary, Marlborough St., Bristol BS2 8HW, UK (E-mail: d.murphy{at}bristol.ac.uk)

Received for publication August 22, 2002. Accepted for publication August 23, 2002.

REFERENCES

  1. Alter O, Brown PO, and Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA 97: 10101–10106, 2000.
  2. Altman RB. Whole-genome expression analysis: challenges beyond clustering. Curr Opin Struct Biol 11: 340–347, 2001.
  3. Banerjee N and Zhang MQ. Functional genomics as applied to mapping transcription regulatory networks. Curr Opin Microbiol 5: 313–317, 2002.
  4. Bonaventure P, Guo H, Tian B, Liu X, Bittner A, Roland B, Salunga R, Ma XJ, Kamme F, Meurers B, Bakker M, Jurzak M, Leysen JE, and Erlander MG. Nuclei and subnuclei gene expression profiling in mammalian brain. Brain Res 943: 38–47, 2002.
  5. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, and Vingron M. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29: 365–371, 2001.
  6. Burgess JK and Hazelton RH. New developments in the analysis of gene expression. Redox Rep 5: 63–73, 2000.
  7. Cao Y and Dulac C. Profiling brain transcription: neurons learn a lesson from yeast. Curr Opin Neurobiol 11: 615–620, 2001.
  8. Clarke PA, te Poele R, Wooster R, and Workman P. Gene expression microarray analysis in cancer biology, pharmacology, and drug development: progress and potential. Biochem Pharmacol 62: 1311–1336, 2001.
  9. Das M, Burge CB, Park E, Colinas J, and Pelletier J. Assessment of the total number of human transcription units. Genomics 77: 71–78, 2001.
  10. Draghici S. Statistical intelligence: effective analysis of high-density microarray data. Drug Discov Today 7: S55–S63, 2002.
  11. Duggan DJ, Bittner M, Chen Y, Meltzer P, and Trent JM. Expression profiling using cDNA microarrays. Nat Genet 21, Suppl: 10–14, 1999.
  12. Eberwine J, Kacharmina JE, Andrews C, Miyashiro K, McIntosh T, Becker K, Barrett T, Hinkle D, Dent G, and Marciano P. mRNA expression analysis of tissue sections and single cells. J Neurosci 21: 8310–8314, 2001.
  13. Eberwine J, Yeh H, Miyashiro K, Cao Y, Nair S, Finnell R, Zettel M, and Coleman P. Analysis of gene expression in single live neurons. Proc Natl Acad Sci USA 89: 3010–3014, 1992.
  14. Eberwine J. Single-cell molecular biology. Nat Neurosci 4, Suppl: 1155–1156, 2001.
  15. Eisen MB and Brown PO. DNA arrays for analysis of gene expression. Methods Enzymol 303: 179–205, 1999.
  16. Eisen MB, Spellman PT, Brown PO, and Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863–14868, 1998.
  17. Ewing B and Green P. Analysis of expressed sequence tags indicates 35,000 human genes. Nat Genet 25: 232–234, 2000.
  18. Fitzgerald DA and Guimbellot JS. Tailored Arrays. The Scientist 5: 26, 2001. [online] http://www.thescientist.com/yr2001/sep/profile1_010917.html
  19. Fodor SPA, Read JL, Pirrung MC, Stryer L, Lu AT, and Solas D. Light-directed, spatially addressable parallel chemical synthesis. Science 251: 767–773, 1991.
  20. Gershon D. Microarray technology: an array of opportunities. Nature 416: 885–891, 2002.
  21. Goodman N. Biological data becomes computer literate: new advances in bioinformatics. Curr Opin Biotechnol 13: 68–71, 2002.
  22. Greenberg SA. DNA microarray gene expression analysis technology and its application to neurological disorders. Neurology 57: 755–761, 2001.
  23. Hakak Y, Walker JR, Li C, Wong WH, Davis KL, Buxbaum JD, Haroutunian V, and Fienberg AA. Genome-wide expression analysis reveals dysregulation of myelination-related genes in chronic schizophrenia. Proc Natl Acad Sci USA 98: 4746–4751, 2001.
  24. Halgren RG, Fielden MR, Fong CJ, and Zacharewski TR. Assessment of clone identity and sequence fidelity for 1189 IMAGE cDNA clones. Nucleic Acids Res 29: 582–588, 2001.
  25. Heck DE, Roy A, and Laskin JD. Nucleic acid microarray technology for toxicology: promise and practicalities. Adv Exp Med Biol 500: 709–714, 2001.
  26. Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH, and Linsley PS. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 19: 342–347, 2001.
  27. King HC and Sinha AA. Gene expression profile analysis by DNA microarrays. JAMA 286: 2280–2288, 2001.
  28. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, 
Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, Szustakowki J, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, and Chen YJ. Initial sequencing and analysis of the human genome. Nature 409: 860–921, 2001.
  29. Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, Hwang SY, Brown PO, and Davis RW. Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci USA 94: 13057–13062, 1997.
  30. Lee ML, Kuo FC, Whitmore GA, and Sklar J. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA 97: 9834–9839, 2000.
  31. Li C and Wong WH. Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol 2: research0032.1–0032.11, 2001.
  32. Li C and Wong WH. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA 98: 31–36, 2001.
  33. Liang F, Holt I, Pertea G, Karamycheva S, Salzberg SL, and Quackenbush J. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat Genet 24: 239–240, 2000.
  34. Lipshutz RJ, Fodor SPA, Gingeras TR, and Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet 21, Suppl: 20–24, 1999.
  35. Miyashiro K, Dichter M, and Eberwine J. On the nature and distribution of mRNAs in hippocampal neurites: implications for neuronal functioning. Proc Natl Acad Sci USA 91: 10800–10804, 1994.
  36. Moore GE. Cramming more components onto integrated circuits. Electronics 38: 114–117, 1965.
  37. Nadon R and Shoemaker J. Statistical issues with microarrays: processing and analysis. Trends Genet 18: 265–271, 2002.
  38. Pan W, Lin J, and Le CT. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol 3: research0022, 2002.
  39. Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, and Fodor SPA. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci USA 91: 5022–5026, 1994.
  40. Pritchard CC, Hsu L, Delrow J, and Nelson PS. Project normal: defining normal variance in mouse gene expression. Proc Natl Acad Sci USA 98: 13266–13271, 2001.
  41. Raychaudhuri S, Stuart JM, and Altman RB. Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symposium on Biocomputing 5: 452–463, 2000.
  42. Roest Crollius H, Jaillon O, Bernot A, Dasilva C, Bouneau L, Fischer C, Fizames C, Wincker P, Brottier P, Quetier F, Saurin W, and Weissenbach J. Estimate of human gene number provided by genomewide analysis using Tetraodon nigroviridis DNA sequence. Nat Genet 25: 235–238, 2000.
  43. Saiki RK, Bugawan TL, Horn GT, Mullis KB, and Erlich HA. Analysis of enzymatically amplified beta-globin and HLA-DQ alpha DNA with allele-specific oligonucleotide probes. Nature 324: 163–166, 1986.
  44. Sandberg R, Yasuda R, Pankratz DG, Carter TA, Del Rio JA, Wodicka L, Mayford M, Lockhart DJ, and Barlow C. Regional and strain-specific gene expression mapping in the adult mouse brain. Proc Natl Acad Sci USA 97: 11038–11043, 2000.
  45. Schadt EE, Li C, Su C, and Wong WH. Analyzing high-density oligonucleotide gene expression array data. J Cell Biochem 80: 192–202, 2001.
  46. Schena M, Shalon D, Davis RW, and Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467–470, 1995.
  47. Shalon D, Smith SJ, and Brown PO. A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res 6: 639–645, 1996.
  48. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E, and Golub T. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96: 2907–2912, 1999.
  49. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AMA, Mao M, Hans L, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, and Friend SH. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530–536, 2002.
  50. Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD, and Eberwine JH. Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA 87: 1663–1667, 1990.
  51. Varela JC, Goldstein MH, Baker HV, and Schultz GS. Microarray analysis of gene expression patterns during healing of rat corneas after excimer laser photorefractive keratectomy. Invest Ophthalmol Vis Sci 43: 1772–1782, 2002.
  52. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, 
Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, and Zhu X. The sequence of the human genome. Science 291: 1304–1351, 2001.
  53. Wyrick JJ and Young RA. Deciphering gene expression regulatory networks. Curr Opin Genet Dev 12: 130–136, 2002.
  54. Young RA. Biomedical discovery with DNA arrays. Cell 102: 9–15, 2000.
  55. Young WS III, Iacangelo A, Luo XZ, King C, Duncan K, and Ginns EI. Transgenic expression of green fluorescent protein in mouse oxytocin neurones. J Neuroendocrinol 11: 935–939, 1999.
  56. Zhu Z, Pilpel Y, and Church GM. Computational identification of transcription factor binding sites via a transcription-factor-centric clustering (TFCC) algorithm. J Mol Biol 318: 71–81, 2002.


