
Microarray Analysis - The Basics




Slide 1/42: Microarray Analysis, The Basics
Thomas Girke
December 9, 2011

Slide 2/42: Outline
- Technology
- Challenges
- Data Analysis
- Data Depositories
- R and BioConductor
- Homework Assignment

Slide 4/42: Microarray and Chip Technology
Definition
- Hybridization-based technique that allows simultaneous analysis of thousands of samples on a solid [...]
- [...] Profiling
- Gene copy number
- Resequencing
- Genotyping
- Single-nucleotide polymorphism
- DNA-protein interaction (e.g. ChIP-on-chip)
- Gene discovery (e.g. tiling arrays)
- Identification of new cell [...]
[...] technologies
- Protein arrays
- Compound arrays

Slide 5/42: Why Microarrays?
- Simultaneous analysis of thousands of genes
- Discovery of gene functions
- Genome-wide network analysis
- Analysis of mutants and transgenics
- Identification of drug targets
- Causal understanding of diseases
- Clinical studies and field trials

Slide 6/42: Different Types of Microarrays
Single channel approaches
- Affymetrix gene chips
- Macroarrays
Multiple channel approaches
- Dual color (cDNA) microarrays
Specialty approaches
- Bead arrays: Lynx, Illumina

Slide 6/42 (continued): Different Types of Microarrays
- PCR-based profiling: CuraGen, ..

Slide 7/42: Dual Color Microarrays [figure]

Slide 8/42: Affymetrix DNA Chips [figure]

Slide 10/42: Profiling Chips Monitor Differences of mRNA Levels
- An efficient strategy for down-stream follow-up experiments is important!

Slide 11/42: Strategies to Validate Array Hits
- Real-time PCR, Northern, [...] tests
- Knockout plants and/or activation tagged lines
- Protein profiling
- Metabolic profiling
- Other tests: in situ hybs, biochemical and physiological tests
- Integration with sequence, proteomics and metabolic databases

Slide 12/42: Sources of Variation in Transcriptional Profiling Experiments
- Every step in transcriptional profiling experiments can contribute to the inherent noise of array data.
- Variations in biosamples, RNA quality and target labeling are normally the biggest noise-introducing steps in [...]
- [...] experimental design and initial calibration experiments can minimize those [...]

Slide 13/42: Experimental Design
Biological questions:
- Which genes are expressed in a sample?

Slide 13/42 (continued): Experimental Design
Biological questions (continued):
- Which genes are differentially expressed (DE) in a treatment, mutant, [...]?
- Which genes are co-regulated in a series of treatments?
Selection of best biological samples and reference
- Comparisons with minimum number of variables
- Sample selection: maximum number of expressed genes
- Alternative reference: pooled RNA of all time points (saves chips)
Develop validation and follow-up strategy for expected expression [...]
- [...] real-time PCR and analysis of transgenics or mutants
Choose type of experiment
- common reference, e.g.: S1 x S1+T1, S1 x S1+T2
- paired references, e.g.: S1 x S1+T1, S2 x S2+T1
- loop & pooling designs
- many other designs
At least three (two) biological replicates are essential
- Biological replicates: utilize independently collected biosamples
- Technical replicates: utilize often the same biosample or RNA pool

Slide 15/42: Basic Data Analysis Steps
Image Processing

Slide 15/42 (continued): Basic Data Analysis Steps
- Transform feature and background pixels into intensity values
Transformations
- Removal of flagged values (optional)
- Detection limit (optional)
- Background subtraction
- Taking logarithms
Normalization
Identify EGs and DEGs
- Which genes are expressed?
- Which genes are differentially expressed?
Cluster Analysis (time series)
- Which genes have similar expression profiles?
Promoter analysis
Integration with functional information: pathways, [...]

Slide 16/42: Image Analysis
- Overall slide quality
- Grid alignment (linkage between spots and feature IDs)
- Signal quantification: mean, median, threshold, background
- Manual spot flagging
- Export to text file
Image analysis software (selection)
- ScanAlyze
- TIGR SpotFinder

Slide 17/42: Background Correction
Filtering (optional)
- Intensities below detection limit
- Negative intensities
- Spatial quality issues
Background correction
- BG consists of non-specific hybridization and background fluorescence
- If BG is higher than signal:

Slide 17/42 (continued): Background Correction
- (1) remove values, (2) set signal to lowest measured intensity, (3) many other approaches
BG subtraction
- Local background
- Global background
- No background subtraction
Background subtraction can cause ratio inflation, therefore background corrected intensities below threshold are often set to threshold or similar [...]

Slide 18/42: Normalization
Normalization is the process of balancing the intensities of the channels to account for variations in labeling and hybridization efficiencies. To achieve this, various adjustment strategies are used to force the distribution of all ratios to have a median (mean) of 1 or the log-ratios to have a median (mean) of 0.

Slide 19/42: Log Transformation: Scatter Plots
Reasons for working with log-transformed intensities and ratios
(1) spreads features more evenly across intensity range
(2) makes variability more constant across intensity range
(3) results in close to normal distribution of intensities and experimental errors

Slide 20/42: Log Transformation: Histograms
- Distribution of log transformed data is closer to being bell-shaped

Slide 21/42: Normalization If Large Fraction of Genes IS DE
Minimize normalization requirements (dynamic range limits)
- Pre-scanning: hybridize equal amounts of label
- During scanning:

Slide 21/42 (continued): Normalization If Large Fraction of Genes IS DE
- Balance average intensities through laser power and PMT adjustments
Normalization if large fraction of genes is DE
- Spike-in controls
- Housekeeping controls
- Determine constant feature set

Slide 22/42: Normalization If Large Fraction of Genes IS NOT DE
Global Within-Array Normalization
- Multiply one channel with normalization factor: Ch2 x mCh1/mCh2 (treats both channels differently)
- Linear regression fit of log2(Ch2) against log2(Ch1): adjust Ch1 with fitted values (treats both channels differently)
- Linear regression fit of log2(ratios) against avg log2(int): subtract fitted value from raw log ratios (treats both channels equally)
- Non-linear regression fit of log2(ratios) against avg log2(int)
  - Most commonly used: Loess (locally weighted polynomial) regression
  - Joins local regressions with overlapping windows to a smooth curve
  - Subtract fitted value on Loess regression from raw log ratios (treats both channels equally)

Slide 23/42: MA Plots [figure]

Slide 24/42: Normalization If Large Fraction of Genes IS NOT DE
Spatial Within-Array Normalization
- All of the above methods can be used to correct for spatial bias on the array.
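
As a rough illustration of the global within-array strategies listed for slide 22, the base R sketch below computes log ratios (M) and average log intensities (A) for two simulated channels, then compares median centering with a Loess fit of M against A. The simulated intensities, the dye-bias model, the Loess settings and all variable names are assumptions made for this sketch; none of them come from the slides.

```r
# Minimal sketch with simulated data (all values are hypothetical).
set.seed(1)
n <- 2000
true_int <- 2^runif(n, 6, 14)                  # underlying abundances
ch1 <- true_int * 2^rnorm(n, 0, 0.2)           # channel 1 intensities
# Channel 2 carries an intensity-dependent dye bias (assumed for illustration)
ch2 <- true_int * 2^(0.5 - 0.05 * log2(true_int) + rnorm(n, 0, 0.2))

M <- log2(ch2) - log2(ch1)                     # log ratios
A <- (log2(ch1) + log2(ch2)) / 2               # average log intensities

# Global normalization: force the median log ratio to 0
M_median <- M - median(M)

# Loess normalization: subtract the fitted curve from the raw log ratios
fit <- loess(M ~ A, span = 0.4, degree = 1)
M_loess <- M - fitted(fit)

# Remaining linear trend of M against A before and after normalization
trend <- function(m) unname(coef(lm(m ~ A))[2])
round(c(raw = trend(M), median = trend(M_median), loess = trend(M_loess)), 3)
```

Median centering only shifts the log ratios, so any intensity-dependent trend stays in place; the Loess fit is the variant that removes it, which is why it treats both channels equally across the intensity range.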

Slide 24/42 (continued): Spatial Within-Array Normalization
- Examples: Block or Print Tip Loess, 2D Loess Regression

Slide 25/42: Normalization If Large Fraction of Genes IS NOT DE
Between-Array Normalization
To compare ratios between dual-color arrays or intensities between single-color arrays
- Scaling: log(rat) - mean log(rat) or log(int) - mean log(int)
  Result: mean = 0
- Centering (z-value): [rat - mean(rat)] / [STD] or [int - mean(int)] / [STD]
  Result: mean = 0, STD = 1
- Distribution Normalization (apply to group of arrays!): (1) generate centered data, (2) sort each array by intensities, (3) calculate mean for sorted values across arrays, (4) replace sorted array intensities by corresponding mean values, (5) sort data back to original order
  Result: mean = 0, STD = 1, identical distribution between arrays

Slide 26/42: Box Plots for Between-Array Normalization Steps [figure]

Slide 27/42: Analysis Methods for Affymetrix Gene Chips

Method   BG Adjust             Normalization             MM Correct              Probeset Summary
MAS5     regional adjustment   scaling by constant       subtract idealized MM   Tukey biweight average
gcRMA    by GC content         quantile normalization    /                       robust fit of linear model
RMA      array background      quantile normalization    /                       robust fit of linear model
VSN      /                     variance stabilizing TF   /                       robust fit of linear model
dChip    /                     by invariant set          subtract mismatch       multiplicative model

Qin et al. (2006), BMC Bioinfo, 7:23
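
The between-array steps listed for slide 25 (scaling, centering, distribution normalization) can be sketched in base R roughly as follows. The simulated log intensities and the helper name `quantile_normalize` are hypothetical, and ties are not handled specially; this is an illustration of the five listed steps, not the implementation used by any particular package.

```r
# Minimal sketch with simulated log intensities (rows = features, columns = arrays).
set.seed(2)
x <- cbind(a1 = rnorm(1000, 8, 1.0),
           a2 = rnorm(1000, 9, 1.5),
           a3 = rnorm(1000, 7, 0.8))

# Scaling: subtract each array's mean -> mean = 0
scaled <- sweep(x, 2, colMeans(x), "-")

# Centering (z-value): additionally divide by each array's STD -> mean = 0, STD = 1
centered <- scale(x, center = TRUE, scale = TRUE)

# Distribution normalization, steps (2) to (5) of the list above.
# quantile_normalize is a hypothetical helper name; ties are not treated specially.
quantile_normalize <- function(m) {
  ord <- apply(m, 2, order)                 # (2) sort order of each array
  sorted <- apply(m, 2, sort)               # (2) sorted values per array
  ref <- rowMeans(sorted)                   # (3) mean of sorted values across arrays
  out <- m
  for (j in seq_len(ncol(m))) {
    out[ord[, j], j] <- ref                 # (4) + (5) mean values placed back in original order
  }
  out
}
qn <- quantile_normalize(centered)          # (1) centered data as input

# Box plots before and after (compare slide 26)
# boxplot(x); boxplot(qn)
```

After the last step every array holds the same set of values, only in a different feature order, so the box plots of all arrays coincide.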

Slide 27/42 (continued): Analysis Methods for Affymetrix Gene Chips
Method references
- MAS5: Affymetrix Documentation: MAS5
- PLIER: Affymetrix Documentation: PLIER, not included here
- gcRMA: Wu et al. (2004), JASA, 99, [...]
- RMA: Irizarry et al. (2003), Nuc Acids Res, 31, [...]
- VSN: Huber et al. (2002), Bioinformatics, 18, Suppl I & [...]
- dChip: Li & Wong (2001), PNAS, 98, [...]

Slide 28/42: Performance Comparison of Affy Methods
- Qin et al. (2006), BMC Bioinfo, 7:23: 24 RNA samples hybridized to chips and 47 genes tested by qRT-PCR, plot shows PCC for 6 summary contrasts of 6 [...]
- [...], gcRMA, and dChip (PM-MM) outperform the other methods. PLIER not included [...]

Slide 29/42: Analysis of Differentially Expressed Genes
Advantages of statistical test over fold change threshold for selecting DE genes
- Incorporates variation between measurements
- Estimate for error rate
- Detection of minor changes
- Ranking of DE genes
Approaches
- Parametric test: t-test
- Non-parametric tests: Wilcoxon sign-rank/rank-sum tests
- Bootstrap analysis (boot package)
- Significance Analysis of Microarrays (SAM)
- Linear Models of Microarrays (LIMMA)
- Rank Product
- ANOVA and MANOVA (R/maanova)
Multiplicity of testing: p-value adjustments
- Methods: fdr, bonferroni, [...]

Slide 31/42: Microarray Databases and Depositories
- NCBI GEO
- [...] @ EBI
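
To make the points of slide 29 concrete, the following base R sketch runs a per-gene t-test on simulated data and adjusts the p-values with the two methods named there (fdr, bonferroni). The data, the number of replicates, the 5% threshold and the choice of a Welch t-test are assumptions for illustration only; dedicated packages such as LIMMA or SAM use more refined statistics.

```r
# Minimal sketch with simulated log2 expression values (hypothetical data).
set.seed(3)
n_genes <- 1000
ctrl  <- matrix(rnorm(n_genes * 3, mean = 8, sd = 0.3), ncol = 3)
treat <- matrix(rnorm(n_genes * 3, mean = 8, sd = 0.3), ncol = 3)
treat[1:50, ] <- treat[1:50, ] + 2           # first 50 genes are truly DE (assumption)

# Log2 fold change per gene (what a fold change threshold would use)
lfc <- rowMeans(treat) - rowMeans(ctrl)

# Parametric test: Welch t-test per gene
pval <- sapply(seq_len(n_genes), function(i) t.test(treat[i, ], ctrl[i, ])$p.value)

# Multiplicity of testing: adjust p-values
p_fdr  <- p.adjust(pval, method = "fdr")
p_bonf <- p.adjust(pval, method = "bonferroni")

# Rank genes by adjusted p-value and count calls at a 5% threshold
ranking <- order(p_fdr)
c(fdr = sum(p_fdr < 0.05), bonferroni = sum(p_bonf < 0.05))
```

Unlike a fold change cutoff on `lfc`, the test takes the variation between replicates into account, and the adjusted p-values give an estimate of the error rate among the selected genes.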

Slide 31/42 (continued): Microarray Databases and Depositories
- Others

Slide 33/42: Why Using R and BioConductor for Array Analysis?
- Complete statistical package and programming language
- Useful for all bioscience areas
- Powerful graphics
- Access to fast growing number of analysis packages
- Is standard for data mining and biostatistical analysis
- Technical advantages: free, open-source, available for all OSs
Books & Documentation
- simpleR - Using R for Introductory Statistics (John Verzani, 2004)
- Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Gentleman et al., 2005)
- UCR Manual (Thomas Girke)

Slide 34/42: Installation
1. Install R binary for your operating system from: [...]
2. Install the required packages from BioConductor by executing the following commands in R:
   > source(" ")
   > biocLite()
   > biocLite(c("GOstats", "Ruuid", "graph", "GO", "Category", "plier", "affylmGUI", "limmaGUI", "simpleaffy", "ath1121501", "ath1121501cdf", "ath1121501probe", "biomaRt", "affycoretools"))

Slide 35/42: R Essentials
# General R command syntax
> object <- function(arguments)
# Execute an R script
> source(" ")
# Finding help
> ?

Slide 35/42 (continued): R Essentials
> ?function
# Load a library
> library(affy)
# Summary of all functions within a library
> library(help=affy)
# Load library manual (PDF file)
> openVignette()

Slide 37/42: Obtain Sample Data from GEO
- Retrieve the Arabidopsis light treatment series (GSE5617) from GEO with the following query:
  Arabidopsis[Organism] AND Atgenexpress[Title] AND light[Title]
- Download the following Cel files from this GSE5617 [...]

Slide 38/42: Define Replicates and Treatments
- Generate [...] file and save it in your working [...]
- [...] should contain the following content:

Name     FileName   Target
DSREP1   [...]      dark45m
DSREP2   [...]      dark45m
DSREP3   [...]      dark45m
PSREP1   [...]      red1mdark44m
PSREP2   [...]      red1mdark44m
PSREP3   [...]      red1mdark44m
BSREP1   [...]      blue45m
BSREP2   [...]      blue45m
BSREP3   [...]      blue45m

Homework Tasks
A.
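
The replicate and treatment table from slide 38 could be generated and checked in R along the following lines. This is only a sketch: the name of the file and the CEL file names did not survive in the transcript, so a temporary file and `NA` placeholders stand in for them here, and the tab-delimited layout is an assumption.

```r
# Minimal sketch: build the targets table, write it as tab-delimited text, read it back.
targets <- data.frame(
  Name     = c(paste0("DSREP", 1:3), paste0("PSREP", 1:3), paste0("BSREP", 1:3)),
  FileName = NA_character_,   # CEL file names are missing from the transcript; fill in your own
  Target   = rep(c("dark45m", "red1mdark44m", "blue45m"), each = 3),
  stringsAsFactors = FALSE
)

# The real file name is not given above; a temporary file stands in for it here.
targets_file <- tempfile(fileext = ".txt")
write.table(targets, targets_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and confirm that every treatment has three replicates
check <- read.delim(targets_file, stringsAsFactors = FALSE)
table(check$Target)
```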

