
Syntax - Stata

cluster kmeans and kmedians: Kmeans and kmedians cluster analysis

Syntax

Kmeans cluster analysis

    cluster kmeans [varlist] [if] [in], k(#) [options]

Kmedians cluster analysis

    cluster kmedians [varlist] [if] [in], k(#) [options]

    option                Description
    ----------------------------------------------------------------------
    Main
      k(#)                perform cluster analysis resulting in # groups
      measure(measure)    similarity or dissimilarity measure; default is
                          L2 (Euclidean)
      name(clname)        name of resulting cluster analysis

    Options
      start(startoption)  obtain k initial group centers by using
                          startoption; see Options for details
      keepcenters         append the k final group means or medians to
                          the data

    Advanced
      generate(groupvar)  name of grouping variable
      iterate(#)          maximum number of iterations; default is
                          iterate(10000)
    ----------------------------------------------------------------------
    k(#) is required.

Menu

cluster kmeans
    Statistics > Multivariate analysis > Cluster analysis > Cluster data > Kmeans

cluster kmedians
    Statistics > Multivariate analysis > Cluster analysis > Cluster data > Kmedians

Description

cluster kmeans and cluster kmedians perform kmeans and kmedians partition cluster analysis, respectively.
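As a rough illustration of what partition clustering of this kind does, the following Python sketch (not Stata code, and not a description of Stata's internal implementation) alternates between assigning each observation to its nearest center and recomputing each center as the per-variable mean (kmeans) or median (kmedians) of its group. The names partition_cluster and nearest, the squared-Euclidean assignment rule, and the handling of empty groups are assumptions made for the example; only the max_iter default is chosen to mirror iterate(10000).

```python
from statistics import mean, median


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda g: sum((p - c) ** 2 for p, c in zip(point, centers[g])),
    )


def partition_cluster(points, centers, use_median=False, max_iter=10000):
    """Illustrative kmeans (default) or kmedians (use_median=True) loop.

    Assumption: an empty group keeps its previous center.
    """
    summarize = median if use_median else mean
    centers = [tuple(c) for c in centers]
    groups = None
    for _ in range(max_iter):
        new_groups = [nearest(pt, centers) for pt in points]
        if new_groups == groups:
            break  # assignments stopped changing
        groups = new_groups
        for g in range(len(centers)):
            members = [pt for pt, lab in zip(points, groups) if lab == g]
            if members:
                # Summarize each variable (column) separately.
                centers[g] = tuple(summarize(col) for col in zip(*members))
    return groups, centers


# Hypothetical data: two well-separated clumps of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups, centers = partition_cluster(points, [(0, 0), (10, 10)])
print(groups)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

With use_median=True the same loop returns group medians instead of means, which is the only difference between the two variants in this sketch.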


See [MV] cluster for a general discussion of cluster analysis and a description of the other cluster commands.

Options

Main

k(#) is required and indicates that # groups are to be formed by the cluster analysis.

measure(measure) specifies the similarity or dissimilarity measure. The default is measure(L2), Euclidean distance. This option is not case sensitive. See [MV] measure_option for detailed descriptions of the supported measures.

name(clname) specifies the name to attach to the resulting cluster analysis. If name() is not specified, Stata finds an available cluster name, displays it for your reference, and attaches the name to your cluster analysis.

Options

start(startoption) indicates how the k initial group centers are to be obtained. The available startoptions are

    krandom[(seed#)], the default, specifies that k unique observations be chosen at random, from among those to be clustered, as starting centers for the k groups. Optionally, a random-number seed may be specified to cause the command set seed seed# (see [R] set seed) to be applied before the k random observations are chosen.

    firstk[, exclude] specifies that the first k observations from among those to be clustered be used as the starting centers for the k groups. With the exclude option, these first k observations are not included among the observations to be clustered.

    lastk[, exclude] specifies that the last k observations from among those to be clustered be used as the starting centers for the k groups. With the exclude option, these last k observations are not included among the observations to be clustered.

    random[(seed#)] specifies that k random initial group centers be generated. The values are randomly chosen from a uniform distribution over the range of the data. Optionally, a random-number seed may be specified to cause the command set seed seed# (see [R] set seed) to be applied before the k group centers are generated.

    prandom[(seed#)] specifies that k partitions be formed randomly among the observations to be clustered. The group means or medians from the k groups defined by this partitioning are to be used as the starting group centers. Optionally, a random-number seed may be specified to cause the command set seed seed# (see [R] set seed) to be applied before the k partitions are chosen.

    everykth specifies that k partitions be formed by assigning observations 1, 1+k, 1+2k, ... to the first group; assigning observations 2, 2+k, 2+2k, ... to the second group; and so on, to form k groups. The group means or medians from these k groups are to be used as the starting group centers.

    segments specifies that k nearly equal partitions be formed from the data. Approximately the first N/k observations are assigned to the first group, the second N/k observations are assigned to the second group, and so on. The group means or medians from these k groups are to be used as the starting group centers.

    group(varname) provides an initial grouping variable, varname, that defines k groups among the observations to be clustered. The group means or medians from these k groups are to be used as the starting group centers.

keepcenters specifies that the group means or medians from the k groups that are produced be appended to the data.

Advanced

generate(groupvar) provides the name of the grouping variable to be created by cluster kmeans or cluster kmedians. By default, this will be the name specified in name().

iterate(#) specifies the maximum number of iterations to allow in the kmeans or kmedians clustering algorithm. The default is iterate(10000).

Remarks and examples

Two examples are presented, one using cluster kmeans with continuous data and the other using cluster kmeans and cluster kmedians with binary data. Both commands work similarly with the different types of data.

Example 1

You have measured the flexibility, speed, and strength of the 80 students in your physical education class.

You want to split the class into four groups, based on their physical attributes, so that they can receive the mix of flexibility, strength, and speed training that will best help them improve.

Here is a summary of the data and a matrix graph showing the data:

    . use [...]

    . summarize flex speed strength

        Variable       Obs    Mean    Std. Dev.    Min    Max
        flexibility     80    [...]
        speed           [...]
        strength        [...]

    . graph matrix flex speed strength

    [Figure: scatterplot matrix of flexibility, speed, and strength; each axis runs from 0 to 10]

As you expected, based on what you saw the first day of class, the data indicate a wide range of levels of performance for the students. The graph seems to indicate that there are some distinct groups, which leads you to believe that your plan will work well.

You decide to perform a cluster analysis to create four groups, one for each of your class assistants. You have had good experience with kmeans clustering in the past and generally like the behavior of the absolute-value distance. You do not really care what starting values are used in the cluster analysis, but you do want to be able to reproduce the same results if you ever decide to rerun your analysis.
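The absolute-value distance preferred here is the measure that cluster list reports as L1 in this example, whereas the default measure is L2 (Euclidean). As a small illustration in Python (not Stata code), the sketch below computes both for one observation and one group center. The function names and the numeric values are hypothetical and are not taken from the example data.

```python
import math


def l1_distance(a, b):
    """Absolute-value (L1) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (L2) distance: square root of summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical (flexibility, speed, strength) values for illustration only.
student = (2.0, 5.0, 7.0)
center = (5.0, 9.0, 7.0)

print(l1_distance(student, center))  # 7.0  (3 + 4 + 0)
print(l2_distance(student, center))  # 5.0  (sqrt(9 + 16 + 0))
```

Because L2 squares each difference before summing, one large gap on a single variable weighs more heavily under L2 than under L1.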

You decide to use the krandom() option to pick k of the observations at random as the initial group centers. You supply a random-number seed for reproducibility. You also add the keepcenters option so that the means of the four groups will be added to the bottom of your dataset.

    . cluster k flex speed strength, k(4) name(g4abs) s(kr(385617)) mea(abs) keepcen

    . cluster list g4abs
    g4abs  (type: partition,  method: kmeans,  dissimilarity: L1)
          vars: g4abs (group variable)
         other: cmd: cluster kmeans flex speed strength, k(4) name(g4abs) s(kr(385617)) mea(abs) keepcen
                varlist: flexibility speed strength
                k: 4
                start: krandom(385617)
                range: 0 .

    . table [...]

    . list flex speed strength in 81/L, abbrev(12)

        flexibility    speed    strength
        [...]

    . drop in 81/L
    (4 observations deleted)

    . tabstat flex speed strength, by(g4abs) stat(min mean max)

    Summary statistics: min, mean, max
      by categories of: g4abs

        g4abs    flexib~y    speed    strength
        [...]

After looking at the last 4 observations (which are the group means because you specified keepcenters), you decide that what you really wanted to see was the minimum and maximum values and the mean for the four groups. You remove the last 4 observations and then use the tabstat command to view the desired statistics.

Group 1, with 15 students, is already doing well in flexibility and speed but will need extra strength training. Group 2, with 20 students, needs to emphasize speed training but could use some improvement in the other categories as well. Group 3, the largest, with 35 students, has serious problems with both flexibility and speed, though they did well in the strength category. Group 4, the smallest, with 10 students, needs help with flexibility and strength.

Because you like looking at graphs, you decide to view the matrix graph again but with group numbers used as plotting symbols.

    . graph matrix flex speed strength, m(i) mlabel(g4abs) mlabpos(0)

    [Figure: scatterplot matrix of flexibility, speed, and strength, with each student plotted as the g4abs group number (1 to 4); each axis runs from 0 to 10]

The groups, as shown in the graph, do appear reasonably distinct. However, you had hoped to have groups that were about the same size. You are curious what clustering to three or five groups would produce. For no good reason, you decide to use the first k observations as initial group centers for clustering to three groups and random numbers within the range of the data for clustering to five groups.

    . cluster k flex speed strength, k(3) name(g3abs) start(firstk) measure(abs)

    . cluster k flex speed strength, k(5) name(g5abs) start(random(33576)) measure(abs)

    . table g3abs g4abs, col

                       g4abs
        g3abs     1     2     3     4   Total
        1                          10      10
        2              18    35            53
        3        15     2                  17

    . table g5abs g4abs, col

                       g4abs
        g5abs     1     2     3     4   Total
        1              20                  20
        2        15                        15
        3                           6       6
        4                           4       4
        5                    35            35

With three groups, the unequal-group-size problem gets worse. With five groups, the smallest group gets split. Four groups seem like the best option for this class. You will try to help the assistant assigned to group 3 in dealing with the larger group.

You might want to investigate the results of using different random seeds in the command used to generate the 4 groups earlier in this example. Because these data do not have clearly defined, well-separated clusters, there is a good chance that clusters based on different starting values will be different.

Example 2

You have just started a women's club.

