Example: confidence

Frequent Itemsets - Stanford University

Chapter 6 Frequent ItemsetsWe turn in this chapter to one of the major families of techniques forcharacter-izing data: the discovery of Frequent Itemsets . This problem is often viewed asthe discovery of association rules, although the latter is a more complex char-acterization of data, whose discovery depends fundamentally on the discoveryof Frequent begin, we introduce the market-basket model of data, whichis essen-tially a many-many relationship between two kinds of elements, called items and baskets, but with some assumptions about the shape of the data. Thefrequent- Itemsets problem is that of finding sets of items that appear in (arerelated to) many of the same problem of finding Frequent Itemsets differs from the similarity searchdiscussed in Chapter 3.

6.1. THEMARKET-BASKETMODEL 215 nothing. Among the singleton sets, obviously {cat} and {dog} are quite frequent. “Dog” appears in all but basket (5), so its support is …

Tags:

  Frequent, Frequent itemsets, Itemsets

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of Frequent Itemsets - Stanford University

1 Chapter 6 Frequent ItemsetsWe turn in this chapter to one of the major families of techniques forcharacter-izing data: the discovery of Frequent Itemsets . This problem is often viewed asthe discovery of association rules, although the latter is a more complex char-acterization of data, whose discovery depends fundamentally on the discoveryof Frequent begin, we introduce the market-basket model of data, whichis essen-tially a many-many relationship between two kinds of elements, called items and baskets, but with some assumptions about the shape of the data. Thefrequent- Itemsets problem is that of finding sets of items that appear in (arerelated to) many of the same problem of finding Frequent Itemsets differs from the similarity searchdiscussed in Chapter 3.

2 Here we are interested in the absolute number of basketsthat contain a particular set of items. In Chapter 3 we wanted itemsthat havea large fraction of their baskets in common, even if the absolute number ofbaskets is difference leads to a new class of algorithms for finding frequentitem-sets. We begin with the A-Priori Algorithm, which works by eliminating mostlarge sets as candidates by looking first at smaller sets and recognizing that alarge set cannot be Frequent unless all its subsets are. We then consider variousimprovements to the basic A-Priori idea, concentrating on very large data setsthat stress the available main , we consider approximate algorithms that work faster but are notguaranteed to find all Frequent Itemsets .

3 Also in this class of algorithms arethose that exploit parallelism, including the parallelism we can obtain througha MapReduce formulation. Finally, we discuss briefly how to find frequentitemsets in a data 6. Frequent The Market-Basket ModelThemarket-basketmodel of data is used to describe a common form of many-many relationship between two kinds of objects. On the one hand, we haveitems, and on the other we havebaskets, sometimes called transactions. Each basket consists of a set of items (anitemset), and usually we assume thatthe number of items in a basket is small much smaller than the total numberof items. The number of baskets is usually assumed to be very large,biggerthan what can fit in main memory.

4 The data is assumed to be represented in afile consisting of a sequence of baskets. In terms of the distributed file systemdescribed in Section , the baskets are the objects of the file, and each basketis of type set of items. Definition of Frequent ItemsetsIntuitively, a set of items that appears in many baskets is said to be Frequent . To be formal, we assume there is a numbers, called thesupport threshold. IfIis a set of items, thesupportforIis the number of baskets for whichIis asubset. We sayIisfrequentif its support issor :In Fig. are sets of words. Each set is a basket, and thewords are items. We took these sets by Googlingcat dogand taking snippetsfrom the highest-ranked pages.

5 Do not be concerned if a word appears twicein a basket, as baskets are sets, and in principle items can appear only , ignore {Cat, and, dog, bites}2.{Yahoo, news, claims, a, cat, mated, with, a, dog, and, produced,viable,offspring}3.{Cat, killer, likely, is, a, big, dog}4.{Professional, free, advice, on, dog, training, puppy, training}5.{Cat, and, kitten, training, and, behavior}6.{Dog, &, Cat, provides, dog, training, in, Eugene, Oregon}7.{ Dog, and, cat , is, a, slang, term, used, by, police, officers, for, a, male female, relationship}8.{Shop, for, your, show, dog, grooming, and, pet, supplies}Figure : Here are eight baskets, each consisting of items that are wordsSince the empty set is a subset of any set, the support for is 8.

6 However,we shall not generally concern ourselves with the empty set, since ittells THE MARKET-BASKET the singleton sets, obviously{cat}and{dog}are quite Frequent . Dog appears in all but basket (5), so its support is 7, while cat appears inall but (4) and (8), so its support is 6. The word and is also quite Frequent ;it appears in (1), (2), (5), (7), and (8), so its support is 5. The words a and training appear in three sets, while for and is appear in two each. Noother word appears more than that we set our threshold ats= 3. Then there are five frequentsingleton Itemsets :{dog},{cat},{and},{a}, and{training}.Now, let us look at the doubletons.

7 A doubleton cannot be frequentunlessboth items in the set are Frequent by themselves. Thus, there areonly tenpossible Frequent doubletons. Fig. is a table indicating which baskets containwhich , 62, 3, 71, 2, 7, 81, 2, 3, 6, 7cat5, 62, 3, 71, 2, 5, 7and52, 7anoneFigure : Occurrences of doubletonsFor example, we see from the table of Fig. that doubleton{dog, training}appears only in baskets (4) and (6). Therefore, its support is 2, and it is notfrequent. There are five Frequent doubletons ifs= 3; they are{dog, a} {dog, and} {dog, cat}{cat, a} {cat, and}Each appears at least three times; for instance,{dog, cat}appears five , let us see if there are Frequent triples.

8 In order to be a Frequent triple,each pair of elements in the set must be a Frequent doubleton. For example,{dog, a, and}cannot be a Frequent itemset, because if it were, then surely{a,and}would be Frequent , but it is not. The triple{dog, cat, and}might befrequent, because each of its doubleton subsets is Frequent . Unfortunately, thethree words appear together only in baskets (1) and (2), so it is not a frequenttriple. The triple{dog, cat, a}might be Frequent , since its doubletons are allfrequent. In fact, all three words do appear in baskets (2), (3), and (7), soit is a Frequent triple. No other triple of words is even a candidate forbeinga Frequent triple, since for no other triple of words are its three doubletonsubsets Frequent .

9 As there is only one Frequent triple, there canbe no frequentquadruples or larger sets. 216 CHAPTER 6. Frequent ITEMSETSOn-Line versus Brick-and-Mortar RetailingWe suggested in Section that an on-line retailer would use similaritymeasures for items to find pairs of items that, while they might not bebought by many customers, had a significant fraction of their customersin common. An on-line retailer could then advertise one item of the pairto the few customers who had bought the other item of the pair. Thismethodology makes no sense for a bricks-and-mortar retailer, because un-less lots of people buy an item, it cannot be cost effective to advertise asale on the item.

10 Thus, the techniques of Chapter 3 are not often usefulfor brick-and-mortar , the on-line retailer has little need for the analysis we dis-cuss in this chapter, since it is designed to search for Itemsets thatappearfrequently. If the on-line retailer was limited to Frequent Itemsets ,theywould miss all the opportunities that are present in the long tail toselect advertisements for each customer Applications of Frequent ItemsetsThe original application of the market-basket model was in the analysis of truemarket baskets. That is, supermarkets and chain stores recordthe contentsof every market basket (physical shopping cart) brought to theregister forcheckout.


Related search queries