Transcription of Microarray Analysis - The Basics
Slide 1/42: Microarray Analysis
The Basics
Thomas Girke
December 9, 2011

Slide 2/42
- Technology
- Challenges
- Data Analysis
- Data Depositories
- R and BioConductor
- Homework Assignment

Slide 3/42: Outline (section: Technology)

Slide 4/42: Microarray and Chip Technology
Definition
- Hybridization-based technique that allows simultaneous analysis of thousands of samples on a solid [...]
- [...] Profiling
- Gene copy number
- Resequencing
- Genotyping
- Single-nucleotide polymorphism
- DNA-protein interaction (e.g., ChIP-on-chip)
- Gene discovery (e.g., tiling arrays)
- Identification of new cell [...]
- [...] technologies
- Protein arrays
- Compound arrays

Slide 5/42: Why Microarrays?
- Simultaneous analysis of thousands of genes
- Discovery of gene functions
- Genome-wide network analysis
- Analysis of mutants and transgenics
- Identification of drug targets
- Causal understanding of diseases
- Clinical studies and field trials

Slide 6/42: Different Types of Microarrays
Single channel approaches
- Affymetrix gene chips
- Macroarrays
Multiple channel approaches
- Dual color (cDNA) microarrays
Specialty approaches
- Bead arrays: Lynx, Illumina, ..
- PCR-based profiling: CuraGen, ..

Slide 7/42: Dual Color Microarrays

Slide 8/42: Affymetrix DNA Chips

Slide 9/42: Outline (section: Challenges)

Slide 10/42: Profiling Chips Monitor Differences of mRNA Levels
- Efficient strategy for down-stream follow-up experiments important!
Slide 11/42: Strategies to Validate Array Hits
- Real-time PCR, Northern, [...] tests
- Knockout plants and/or activation tagged lines
- Protein profiling
- Metabolic profiling
- Other tests: in situ hybs, biochemical and physiological tests
- Integration with sequence, proteomics and metabolic databases

Slide 12/42: Sources of Variation in Transcriptional Profiling Experiments
- Every step in transcriptional profiling experiments can contribute to the inherent noise of array [...]
- [...] in biosamples, RNA quality and target labeling are normally the biggest noise-introducing steps in [...]
- [...] experimental design and initial calibration experiments can minimize those [...]

Slide 13/42: Experimental Design
Biological questions
- Which genes are expressed in a sample?
- Which genes are differentially expressed (DE) in a treatment, mutant, [...]?
- Which genes are co-regulated in a series of treatments?
Selection of best biological samples and reference
- Comparisons with minimum number of variables
- Sample selection: maximum number of expressed genes
- Alternative reference: pooled RNA of all time points (saves chips)
Develop validation and follow-up strategy for expected expression [...] real-time PCR and [...] analysis of transgenics or mutants
Choose type of experiment
- common reference, e.g., S1 x S1+T1, S1 x S1+T2
- paired references, e.g., S1 x S1+T1, S2 x S2+T1
- loop & pooling designs
- many other designs
At least three (two) biological replicates are essential
- Biological replicates:
  utilize independently collected biosamples
- Technical replicates: often utilize the same biosample or RNA pool

Slide 14/42: Outline (section: Data Analysis)

Slide 15/42: Basic Data Analysis Steps
- Image processing: transform feature and background pixels into intensity values
- Transformations
  - Removal of flagged values (optional)
  - Detection limit (optional)
  - Background subtraction
  - Taking logarithms
- Normalization
- Identify expressed genes (EGs) and differentially expressed genes (DEGs)
  - Which genes are expressed?
  - Which genes are differentially expressed?
- Cluster analysis (time series)
  - Which genes have similar expression profiles?
- Promoter analysis
- Integration with functional information: pathways, [...]

Slide 16/42: Image Analysis
- Overall slide quality
- Grid alignment (linkage between spots and feature IDs)
- Signal quantification: mean, median, threshold, background
- Manual spot flagging
- Export to text file
Image analysis software (selection)
- ScanAlyze
- TIGR SpotFinder

Slide 17/42: Background Correction
Filtering (optional)
- Intensities below detection limit
- Negative intensities
- Spatial quality issues
Background correction
- BG consists of non-specific hybridization and background fluorescence
- If BG is higher than signal:
  (1) remove values, (2) set signal to lowest measured intensity, (3) many other approaches
BG subtraction
- Local background
- Global background
- No background subtraction
Background subtraction can cause ratio inflation; therefore, background-corrected intensities below a threshold are often set to the threshold or a similar value.

Slide 18/42: Normalization
Normalization is the process of balancing the intensities of the channels to account for variations in labeling and hybridization efficiencies. To achieve this, various adjustment strategies are used to force the distribution of all ratios to have a median (mean) of 1 or the log-ratios to have a median (mean) of 0.

Slide 19/42: Log Transformation: Scatter Plots
Reasons for working with log-transformed intensities and ratios
(1) spreads features more evenly across the intensity range
(2) makes variability more constant across the intensity range
(3) results in a close to normal distribution of intensities and experimental errors

Slide 20/42: Log Transformation:
Histograms
- Distribution of log-transformed data is closer to being bell-shaped

Slide 21/42: Normalization If Large Fraction of Genes IS DE
Minimize normalization requirements (dynamic range limits)
- Pre-scanning: hybridize equal amounts of label
- During scanning: balance average intensities through laser power and PMT adjustments
Normalization if large fraction of genes is DE
- Spike-in controls
- Housekeeping controls
- Determine constant feature set

Slide 22/42: Normalization If Large Fraction of Genes IS NOT DE
Global Within-Array Normalization
- Multiply one channel by a normalization factor: Ch2 x mCh1/mCh2 (treats both channels differently)
- Linear regression fit of log2(Ch2) against log2(Ch1): adjust Ch1 with fitted values (treats both channels differently)
- Linear regression fit of log2(ratios) against avg log2(int):
  subtract fitted value from raw log ratios (treats both channels equally)
- Non-linear regression fit of log2(ratios) against avg log2(int)
  - Most commonly used: Loess (locally weighted polynomial) regression
  - joins local regressions with overlapping windows into a smooth curve
  - subtract fitted value on Loess regression from raw log ratios (treats both channels equally)

Slide 23/42: MA Plots

Slide 24/42: Normalization If Large Fraction of Genes IS NOT DE
Spatial Within-Array Normalization
- All of the above methods can be used to correct for spatial bias on the array.
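
The global within-array fits listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the lecture: the intensities are simulated, the function names (ma_values, global_median_normalize, linear_ma_normalize) are hypothetical, the constant shift is applied on the log scale so that the median log ratio becomes 0, and a straight-line fit from NumPy stands in for the Loess fit that the slides name as the most common choice.

```python
import numpy as np

def ma_values(ch1, ch2):
    """Return M (log2 ratio Ch2/Ch1) and A (average log2 intensity) per feature."""
    log_ch1 = np.log2(ch1)
    log_ch2 = np.log2(ch2)
    return log_ch2 - log_ch1, 0.5 * (log_ch1 + log_ch2)

def global_median_normalize(m):
    """Shift all log ratios by one constant so that their median becomes 0."""
    return m - np.median(m)

def linear_ma_normalize(m, a):
    """Fit M = slope * A + intercept and subtract the fitted value from each raw M."""
    slope, intercept = np.polyfit(a, m, deg=1)
    return m - (slope * a + intercept)

# Hypothetical two-channel intensities with an intensity-dependent bias in Ch2.
rng = np.random.default_rng(0)
ch1 = rng.lognormal(mean=7.0, sigma=1.0, size=500)
ch2 = 1.5 * ch1 ** 1.05 * rng.lognormal(mean=0.0, sigma=0.1, size=500)

m, a = ma_values(ch1, ch2)
m_global = global_median_normalize(m)
m_linear = linear_ma_normalize(m, a)

print("median M, raw / global:", np.median(m), np.median(m_global))
print("slope of M on A, raw / linear fit:",
      np.polyfit(a, m, 1)[0], np.polyfit(a, m_linear, 1)[0])
```

The constant shift removes only the overall offset between the channels, while the fit against A also removes the intensity-dependent trend. A Loess normalization would follow the same pattern, replacing the straight line with a locally weighted fit before subtracting the fitted values.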
- Examples: block or print-tip Loess, 2D Loess regression

Slide 25/42: Normalization If Large Fraction of Genes IS NOT DE
Between-Array Normalization
- To compare ratios between dual-color arrays or intensities between single-color arrays
- Scaling: log(rat) - mean log(rat) or log(int) - mean log(int)
  Result: mean = 0
- Centering (z-value): [rat - mean(rat)] / [STD] or [int - mean(int)] / [STD]
  Result: mean = 0, STD = 1
- Distribution normalization (apply to group of arrays!)
  (1) generate centered data, (2) sort each array by intensities, (3) calculate mean for sorted values across arrays, (4) replace sorted array intensities by corresponding mean values, (5) sort data back to original order
  Result: [...]
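
The three between-array adjustments above (scaling, centering, and the five-step distribution normalization) can be sketched with NumPy as follows. This is an illustrative sketch under assumptions: the input is a simulated features-by-arrays matrix of log intensities, the function names (scale, center, distribution_normalize) are hypothetical, and tied values receive no special treatment.

```python
import numpy as np

def scale(log_values):
    """Scaling: subtract each array's mean, so every column has mean 0."""
    return log_values - log_values.mean(axis=0)

def center(values):
    """Centering (z-value): every column gets mean 0 and STD 1."""
    return (values - values.mean(axis=0)) / values.std(axis=0)

def distribution_normalize(values):
    """Steps (2) to (5) for a features-by-arrays matrix (ties are not treated specially)."""
    order = np.argsort(values, axis=0)                         # (2) sort each array
    sorted_values = np.take_along_axis(values, order, axis=0)
    rank_means = sorted_values.mean(axis=1)                    # (3) mean per rank across arrays
    result = np.empty(values.shape, dtype=float)
    for j in range(values.shape[1]):
        # (4) replace the sorted intensities by the rank means and
        # (5) write them back at the original row positions
        result[order[:, j], j] = rank_means
    return result

# Hypothetical log intensities: 200 features on 3 arrays with different location and spread.
rng = np.random.default_rng(1)
log_int = rng.normal(loc=[8.0, 9.0, 10.0], scale=[1.0, 2.0, 0.5], size=(200, 3))

scaled = scale(log_int)
centered = center(log_int)                       # step (1): generate centered data
normalized = distribution_normalize(centered)

print("column means after scaling:", scaled.mean(axis=0).round(6))
print("column STDs after centering:", centered.std(axis=0).round(6))
# Largest difference between arrays at any rank after distribution normalization.
print("max spread per rank:", np.ptp(np.sort(normalized, axis=0), axis=1).max())
```

In this sketch the columns of the normalized matrix contain the same set of values in different row orders, which is why the function has to be applied to the whole group of arrays at once rather than to one array at a time.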