Introduction BIOINFORMATICS - Gerstein Lab

(c) Mark Gerstein, 1999, Yale

1. Gerstein, Yale

3. What is Bioinformatics?
(Molecular) Bio-informatics
One idea for a definition? Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and then applying "informatics" techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large scale.
Bioinformatics is MIS for Molecular Biology Information.

4. Biology: an Information Science
Central Dogma of Molecular Biology: DNA -> RNA -> Protein -> Phenotype -> DNA
Molecules: Sequence, Structure, Function
Processes: Mechanism, Specificity, Regulation
Central Paradigm for Bioinformatics: Genomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Phenotype
Large Amounts of Information; Standardized; Statistical
(Idea from D Brutlag, Stanford; graphics from S Strobel)
DNA: genetic material
RNA: information transfer (mRNA), protein synthesis (tRNA/mRNA), some catalytic activity
Protein: most cellular functions are performed or facilitated by proteins.
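The DNA -> RNA -> protein steps of the central dogma on slide 4 can be made concrete with a small sketch. It is illustrative only and rests on three assumptions: the input is the coding strand, the standard genetic code applies, and translation stops at the first stop codon. The names `transcribe`, `translate`, and `CODON_TABLE` are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional compact 64-letter string.
# Codons are enumerated in U, C, A, G order at each position (last position
# varies fastest); '*' marks a stop codon.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA (coding strand) -> mRNA: the same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons in frame 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # a stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # The first 33 bases of the raw DNA sequence on the next slide
    dna = "atggcaattaaaattggtatcaatggttttggt"
    print(translate(transcribe(dna)))  # MAIKIGINGFG
```

The sketch handles only clean A/C/G/T input. An ambiguous base such as N has no entry in the table and raises a `KeyError`.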

Protein roles: primary biocatalyst; cofactor transport/storage; mechanical motion/support; immune protection; control of growth/differentiation.

5. Biology Information - DNA
Raw DNA Sequence
Coding or Not? Parse into genes?
4 bases: AGCT
~1 K in a gene, ~2 M in genome

atggcaattaaaattggtatcaatggttttggtc
gtatcggccgtatcgtattccgtgcagcacaacaccgtga
tgacattgaagttgtaggtattaacgacttaatcgacgtt
gaatacatggcttatatgttgaaatatgattcaactcacg
gtcgtttcgacggcactgttgaagtgaaagatggtaactt
agtggttaatggtaaaactatccgtgtaactgcagaacgt
gatccagcaaacttaaactggggtgcaatcggtgttgata
tcgctgttgaagcgactggtttattcttaactgatgaaac
tgctcgtaaacatatcactgcaggcgcaaaaaaagttgta
ttaactggcccatctaaagatgcaacccctatgttcgttc
gtggtgtaaacttcaacgcatacgcaggtcaagatatcgt
ttctaacgcatcttgtacaacaaactgtttagctccttta
gcacgtgttgttcatgaaactttcggtatcaaagatggtt
taatgaccactgttcacgcaacgactgcaactcaaaaaac
tgtggatggtccatcagctaaagactggcgcggcggccgc
ggtgcatcacaaaacatcattccatcttcaacaggtgcag
cgaaagcagtaggtaaagtattacctgcattaaacggtaa
attaactggtatggctttccgtgttccaacgccaaacgta
tctgttgttgatttaacagttaatcttgaaaaaccagctt
cttatgatgcaatcaaacaagcaatcaaagatgcagcgga
aggtaaaacgttcaatggcgaattaaaaggcgtattaggt
tacactgaagatgctgttgtttctactgacttcaacggtt
gtgctttaacttctgtatttgatgcagacgctggtatcgc
attaactgattctttcgttaaattggtatc
caaaaatagggttaatatgaatctcgatctccattttgtt
catcgtattcaacaacaagccaaaactcgtacaaatatga
ccgcacttcgctataaagaacacggcttgtggcgagatat
ctcttggaaaaactttcaagagcaactcaatcaactttct
cgagcattgcttgctcacaatattgacgtacaagataaaa
tcgccatttttgcccataatatggaacgttgggttgttca
tgaaactttcggtatcaaagatggtttaatgaccactgtt
cacgcaacgactacaatcgttgacattgcgaccttacaaa
ttcgagcaatcacagtgcctatttacgcaaccaatacagc
ccagcaagcagaatttatcctaaatcacgccgatgtaaaa
attctcttcgtcggcgatcaagagcaatacgatcaaacat
tggaaattgctcatcattgtccaaaattacaaaaaattgt
agcaatgaaatccaccattcaattacaacaagatcctctt
tcttgcacttgg

6. Biology Information: Protein Sequence
20-letter alphabet: ACDEFGHIKLMNPQRSTVWY, but not BJOUXZ
Strings of ~300 aa in an average protein (in bacteria)
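The alphabets on slides 5 and 6 (4 DNA bases; 20 amino-acid letters, excluding B, J, O, U, X, Z) can be checked mechanically. This minimal sketch treats each alphabet strictly, with no ambiguity codes such as N or X. It optionally ignores the '-' gap character that appears in alignments. The helper name `invalid_letters` is hypothetical.

```python
DNA_ALPHABET = frozenset("ACGT")
PROTEIN_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def invalid_letters(seq: str, alphabet: frozenset, allow_gaps: bool = False) -> set:
    """Return the letters of seq (case-insensitive) that fall outside the alphabet.

    With allow_gaps=True, the alignment gap character '-' is ignored.
    """
    letters = set(seq.upper())
    if allow_gaps:
        letters.discard("-")
    return letters - alphabet


if __name__ == "__main__":
    print(invalid_letters("LNCIVAVSQNMGIGKNGDLPWPPL", PROTEIN_ALPHABET))  # set()
    print(invalid_letters("ISLIAALAVDRVIGMENAMPWN-LPADLAW",
                          PROTEIN_ALPHABET, allow_gaps=True))            # set()
    print(sorted(invalid_letters("MXBJ", PROTEIN_ALPHABET)))             # ['B', 'J', 'X']
```

An alphabet check alone cannot tell DNA from protein, because A, C, G, and T are also amino-acid letters (Ala, Cys, Gly, Thr). A string such as "ACGT" passes both checks.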

~200 aa in a domain
~200 K known protein sequences

d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQ-NLVIMGKKTWFSI
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ-NAVIMGKKTWFSI
d4dfra_ ISLIAALAVDRVIGMENAMPWN-LPADLAWFKRNTL--------NKPVIMGRHTWESI
d3dfr__ TAFLWAQDRDGLIGKDGHLPWH-LPDDLHYFRAQTV--------GKIMVVGRRTYESF

d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQ-NLVIMGKKTWFSI
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ-NAVIMGKKTWFSI
d4dfra_ ISLIAALAVDRVIGMENAMPW-NLPADLAWFKRNTLD--------KPVIMGRHTWESI
d3dfr__ TAFLWAQDRNGLIGKDGHLPW-HLPDDLHYFRAQTVG--------KIMVVGRRTYESF

d1dhfa_ VPEKNRPLKGRINLVLSRELKEPPQGAHFLSRSLDDALKLTEQPELANKVDMVWIVGGSSVYKEAMNHP
d8dfr__ VPEKNRPLKDRINIVLSRELKEAPKGAHYLSKSLDDALALLDSPELKSKVDMVWIVGGTAVYKAAMEKP
d4dfra_ ---G-RPLPGRKNIILS-SQPGTDDRV-TWVKSVDEAIAACGDVP------EIMVIGGGRVYEQFLPKA
d3dfr__ ---PKRPLPERTNVVLTHQEDYQAQGA-VVVHDVAAVFAYAKQHLDQ----ELVIAGGAQIFTAFKDDV

d1dhfa_ -PEKNRPLKGRINLVLSRELKEPPQGAHFLSRSLDDALKLTEQPELANKVDMVWIVGGSSVYKEAMNHP
d8dfr__ -PEKNRPLKDRINIVLSRELKEAPKGAHYLSKSLDDALALLDSPELKSKVDMVWIVGGTAVYKAAMEKP
d4dfra_ -P--KRPLPERTNVVLTHQEDYQAQGA-VVVHDVAAVFAYAKQHLD----QELVIAGGAQIFTAFKDDV

7. Biology Information

Macromolecular Structure: DNA/RNA/Protein (almost all protein)
(RNA adapted from D Soll web page; top-right protein from M Levitt web page)

8. Biology Information: Protein Structure Details
Statistics on number of XYZ triplets:
200 residues/domain -> 200 CA atoms, separated by ... Å
Avg. residue is Leu: 4 backbone atoms + 4 sidechain atoms, 150 Å³
=> ~1500 xyz triplets (= 8 x 200) per protein domain
10 K known domains, ~300 folds

ATOM 1 C ACE 0 1 GKY 67
ATOM 2 O ACE 0 1 GKY 68
ATOM 3 CH3 ACE 0 1 GKY 69
ATOM 4 N SER 1 1 GKY 70
ATOM 5 CA SER 1 1 GKY 71
ATOM 6 C SER 1 1 GKY 72
ATOM 7 O SER 1 1 GKY 73
ATOM 8 CB SER 1 1 GKY 74
ATOM 9 OG SER 1 1 GKY 75
ATOM 10 N ARG 2 1 GKY 76

ATOM 11 CA ARG 2 1 GKY 77
ATOM 12 C ARG 2 1 GKY 78
...
ATOM 1444 CB LYS 186 1 GKY1510
ATOM 1445 CG LYS 186 1 GKY1511
ATOM 1446 CD LYS 186 1 GKY1512
ATOM 1447 CE LYS 186 1 GKY1513
ATOM 1448 NZ LYS 186 1 GKY1514
ATOM 1449 OXT LYS 186 1 GKY1515
TER 1450 LYS 186 1 GKY1516

9. The World of Sequences
Bacteria, ~1600 genes [Science 269: 496]
Eukaryote, 13 Mb, ~6 K genes [Nature 387: 1]
Animal, ~100 Mb, ~20 K genes [Science 282: 1945]
Human, ~3 Gb, ~100 K genes [?]
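The ATOM records excerpted for slide 8 come from a fixed-column coordinate file, but the excerpt as transcribed here shows no coordinate columns. This sketch assumes the standard PDB column layout: serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, and x, y, z in 31-54. It uses made-up, hypothetical records. `format_atom`, `parse_atom`, and `xyz_triplets` are illustrative helpers, not a library API. Each parsed ATOM record yields one xyz triplet, which is the quantity slide 8 estimates per domain.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    xyz: Tuple[float, float, float]  # coordinates in Å


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a simplified ATOM line (coordinates only) in the fixed-column layout."""
    # Names shorter than 4 characters conventionally start in column 14.
    padded = f" {name:<3}" if len(name) < 4 else name[:4]
    return (f"ATOM  {serial:5d} {padded} {res_name:>3} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom(line: str) -> Atom:
    """Parse one ATOM line by column position (0-based slices of 1-based columns)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
    )


def xyz_triplets(lines: List[str]) -> List[Tuple[float, float, float]]:
    """One (x, y, z) triplet per ATOM record; TER and other records are skipped."""
    return [parse_atom(ln).xyz for ln in lines if ln.startswith("ATOM  ")]


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only
    records = [
        format_atom(1, "N", "SER", "A", 1, 11.104, 6.134, -6.504),
        format_atom(2, "CA", "SER", "A", 1, 11.639, 6.071, -5.147),
        "TER",
    ]
    print(len(xyz_triplets(records)))  # 2
    print(parse_atom(records[1]).name)  # CA
```

Applied to a real domain, the count should come out near slide 8's estimate. The arithmetic there is 8 atoms x 200 residues = 1600, which the slide quotes as ~1500.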

10. Biology Information: Whole Genomes
The Revolution Driving Everything
Fleischmann, Adams, White, O., Clayton, Kirkness, Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., McKenney, K., Sutton, G., Fitzhugh, W., Fields, C., Gocayne, J. D., Scott, J., Shirley, R., Liu, L. I., Glodek, A., Kelley, J. M., Weidman, J. F., Phillips, C. A., Spriggs, T., Hedblom, E., Cotton, M. D., Utterback, T. R., Hanna, M. C., Nguyen, D. T., Saudek, D. M., Brandon, R. C., Fine, L. D., Fritchman, J. L., Fuhrmann, J. L., Geoghagen, N. S. M., Gnehm, C. L., McDonald, L. A., Small, Fraser, Smith (1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd." Science 269: 496-512.
(Picture adapted from TIGR website)

Integrative Data
1995, HI (bacteria): ... Mb & 1600 genes done
1997, yeast: 13 Mb & ~6000 genes for yeast
1998, worm: ~100 Mb with 19 K genes
1999: >30 completed genomes!
2003, human: 3 Gb & 100 K ...

"... sequences now accumulate so quickly that, in less than a week, a single laboratory can produce more bits of data than Shakespeare managed in a lifetime, although the latter make better ..." (G A Petsko, Nature 401: 115-116 (1999))

11. Expression Datasets: the Transcriptosome
Also: SAGE; Samson and Church, Chips; Aebersold, Protein Expression
Young/Lander, Chips, Abs.

... array, Rel. Exp. over Timecourse
Snyder, Transposons, Protein ...

12. Data (courtesy of J Hager)
Yeast Expression Data in Academia: levels for all 6000 genes!
Can only sequence genome once but can do an infinite variety of these array experiments
At 10 time points, 6000 x 10 = 60K floats
Telling signal from background

13. Whole-Genome Experiments
Systematic Knockouts
Winzeler, E. A., Shoemaker, D. D., Astromoff, A., Liang, H., Anderson, K., Andre, B., Bangham, R., Benito, R., Boeke, J. D., Bussey, H., Chu, A. M., Connelly, C., Davis, K., Dietrich, F., Dow, S. W., El Bakkoury, M., Foury, F., Friend, S. H., Gentalen, E., Giaever, G., Hegemann, J. H., Jones, T., Laub, M., Liao, H., Davis, R. W., et al. (1999). Functional characterization of the genome by gene deletion and parallel ..., 901-6.

2-hybrids, linkage maps
Hua, S. B., Luo, Y., Qiu, M., Chan, E., Zhou, H. & Zhu, L. (1998). Construction of a modular yeast two-hybrid cDNA library from human EST clones for the human genome protein linkage ..., 143-52.
For yeast: 6000 x 6000 / 2 ~ 18 M interactions

14. Biology Information: Other Integrative Data
Information to understand genomes:
Metabolic Pathways (glycolysis), traditional biochemistry
Regulatory Networks
Whole Organisms: Phylogeny, traditional zoology
Environments, Habitats, ecology
The Literature (MEDLINE)
(Pathway drawing from P Karp's EcoCyc; phylogeny from S J Gould, Dinosaur in a Haystack)
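The data-size figures in the array-data bullets (6000 x 10 = 60K floats) and the two-hybrid estimate (6000 x 6000 / 2 ~ 18 M interactions) are simple arithmetic. The slide's n²/2 approximates the exact number of unordered pairs of distinct genes, n(n - 1)/2. This is a minimal sketch; the variable names are illustrative.

```python
import math

n_genes = 6000       # yeast genes, as on the slides
n_timepoints = 10    # array time points, as on the slides

# One expression value (a float) per gene per time point
expression_values = n_genes * n_timepoints
print(expression_values)  # 60000

# Candidate pairwise interactions among the genes
approx_pairs = n_genes * n_genes // 2   # the slide's estimate, ~18 M
exact_pairs = math.comb(n_genes, 2)     # unordered distinct pairs, n(n-1)/2
print(approx_pairs, exact_pairs)  # 18000000 17997000
```

The two counts differ by only n/2 = 3000, which is why n²/2 is an adequate estimate at this scale. `math.comb` requires Python 3.8 or later.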


Related search queries