Example: marketing

Boyer-Moore - Johns Hopkins University

boyer -MooreBen LangmeadYou are free to use these slides. If you do, please sign the guestbook ( ), or email me and tell me brie!y how you re using them. For original Keynote "les, email of Computer ScienceExact matching: slightly less na ve algorithmThere would have been a time for such a wordT:P:wordwordWe match w and o, then mismatch (r u)There would have been a time for such a wordT:P:wordwordword word wordskip!skip!.. since u doesn t occur in P, we can skip the next two alignmentsMismatched text character (u) doesn t occur in PBoyer-MooreUse knowledge gained from character comparisons to skip future alignments that de"nitely won t we mismatch, use knowledge of the mismatched text character to skip we match some characters, use knowledge of the matched characters to skip alignments in one direction, then try

Boyer-Moore: Good suffix rule Like with the bad character rule, the number of skips possible using the good suffix rule can be precalculated into a few tables (Gus"eld 2.2.4 and 2.2.5)

Tags:

  Romeo, Boyer, Boyer moore

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of Boyer-Moore - Johns Hopkins University

1 boyer -MooreBen LangmeadYou are free to use these slides. If you do, please sign the guestbook ( ), or email me and tell me brie!y how you re using them. For original Keynote "les, email of Computer ScienceExact matching: slightly less na ve algorithmThere would have been a time for such a wordT:P:wordwordWe match w and o, then mismatch (r u)There would have been a time for such a wordT:P:wordwordword word wordskip!skip!.. since u doesn t occur in P, we can skip the next two alignmentsMismatched text character (u) doesn t occur in PBoyer-MooreUse knowledge gained from character comparisons to skip future alignments that de"nitely won t we mismatch, use knowledge of the mismatched text character to skip we match some characters, use knowledge of the matched characters to skip alignments in one direction, then try character comparisons in opposite directionBoyer, RS and Moore, JS.

2 "A fast string searching algorithm." Communications of the ACM (1977): 762-772. Bad character rule Good suffix rule For longer skipsBoyer-Moore: Bad character ruleT:P:GCTTCTGCTACCTTTTGCGCGCGCGCGGAACC TTTTGCUpon mismatch, let b be the mismatched character in T. Skip alignments until (a) b matches its opposite in P, or (b) P moves past 1:T:P:GCTTCTGCTACCTTTTGCGCGCGCGCGGAACCTT TTGCStep 2:T:P:GCTTCTGCTACCTTTTGCGCGCGCGCGGAACCTT TTGCStep 3:(etc)Case (a)Case (b)bbBoyer-Moore: Bad character ruleT:P:GCTTCTGCTACCTTTTGCGCGCGCGCGGAACC TTTTGCStep 1:T:P:GCTTCTGCTACCTTTTGCGCGCGCGCGGAACCTT TTGCStep 2:T:P:GCTTCTGCTACCTTTTGCGCGCGCGCGGAACCTT TTGCStep 3:We skipped 8 alignmentsIn fact, there are 5 characters in T we never looked at Boyer-Moore : Bad character rule preprocessingAs soon as P is known, build a | |-by-n table.

3 Say b is the character in T that mismatched and i is the mismatch s offset into P. The number of skips is given by element in bth row and ith "eld gives space-efficient :P:GCTTCTGCTACCTTTTGCGCGCGCGCGGAACCTTTTG CB oyer-Moore: Good suffix ruleLet t be the substring of T that matched a suffix of P. Skip alignments until (a) t matches opposite characters in P, or (b) a pre"x of P matches a suffix of t, or (c) P moves past t, whichever happens "rstT:P:CGTGCCTACTTACTTACTTACTTACGCGAACT TACTTACStep 1:tT:P:CGTGCCTACTTACTTACTTACTTACGCGAACTT ACTTACStep 2:T:P:CGTGCCTACTTACTTACTTACTTACGCGAACTTA CTTACStep 3:Case (a)Case (b)tBoyer-Moore: Good suffix ruleLike with the bad character rule, the number of skips possible using the good suffix rule can be precalculated into a few tables (Gus"eld and )Rule on previous slide is the weak good suffix rule.

4 There is also a strong good suffix rule (Gus"eld )T:P:CTTGCCTACTTACTTACTCTTACTTACtCTTACTT ACCTTACTTACWeak:Strong:With the strong good suffix rule (and other minor modi"cations), Boyer-Moore is O(m) worst-case time. Gus"eld discusses mismatch! Boyer-Moore : Putting it togetherAfter each alignment, use bad character or good suffix rule, whichever skips moreT:P:GTTATAGCTGATCGCGGCGTAGCGGCGAAGTA GCGGCGStep 1:bc: 6, gs: 0T:P:GTTATAGCTGATCGCGGCGTAGCGGCGAAGTAGCG GCGStep 2:bc: 0, gs: 2T:P:GTTATAGCTGATCGCGGCGTAGCGGCGAAGTAGCG GCGStep 3:bc: 2, gs: 7T:P:GTTATAGCTGATCGCGGCGTAGCGGCGAAGTAGCG GCGStep 4:Bad character rule:Upon mismatch, let b be the mismatched character in T.

5 Skip alignments until (a) b matches its opposite in P, or (b) P moves past (a) of good suffix rulePart (b) of good suffix rulePart (a) of bad character ruleGood suffix rule:Let t be the substring of T that matched a suffix of P. Skip alignments until (a) t matches opposite characters in P, or (b) a pre"x of P matches a suffix of t, or (c) P moves past t, whichever happens " : Putting it togetherT:P:GTTATAGCTGATCGCGGCGTAGCGGCGA AGTAGCGGCGStep 1:T:P:GTTATAGCTGATCGCGGCGTAGCGGCGAAGTAGC GGCGStep 2:T:P:GTTATAGCTGATCGCGGCGTAGCGGCGAAGTAGC GGCGStep 3:T:P:GTTATAGCTGATCGCGGCGTAGCGGCGAAGTAGC GGCGStep 4:Up to now: 15 alignments skipped, 11 text characters never examinedBoyer-Moore: Worst and best casesBoyer-Moore (or a slight variant) is O(m) worst-case timeWhat s the best case?

6 Every character comparison is a mismatch, and bad character rule always slides P fully past the mismatchHow many character comparisons?!oor(m / n)Contrast with naive algorithmPerformance comparisonNa ve matchingNa ve matchingBoyer-MooreBoyer-Moore# character comparisonswall clock time# character comparisonswall clock timeP: tomorrow T: Shakespeare s complete worksP: 50 nt string from Alu repeat*T: Human reference (hg19) chromosome 1 Comparing simple Python implementations of na ve exact matching and Boyer-Moore exact matching:* GCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGG CCGAGGCGGG336 matches| T | = 249 M17 matches| T | = M5,906, s785, s307,013,905137 s32,495,11155 s


Related search queries