
Sampling - Department of Statistics



Sampling
by David A. Freedman
Department of Statistics
University of California
Berkeley, CA 94720

The basic idea in sampling is extrapolation from the part to the whole, from "the sample" to "the population." (The population is sometimes rather mysteriously called "the universe.") There is an immediate corollary: the sample must be chosen to fairly represent the population. Methods for choosing samples are called "designs." Good designs involve the use of probability methods, minimizing subjective judgment in the choice of units to survey. Samples drawn using probability methods are called "probability samples."

Bias is a serious problem in applied work; probability samples minimize bias. As it turns out, however, methods used to extrapolate from a probability sample to the population should take into account the method used to draw the sample; otherwise, bias may come in through the back door.

The ideas will be illustrated for sampling people or business records, but apply more broadly. There are sample surveys of buildings, farms, law cases, schools, trees, trade union locals, and many other populations.

SAMPLE DESIGN

Probability samples should be distinguished from "samples of convenience" (also called "grab samples"). A typical sample of convenience comprises the investigator's students in an introductory course. A "mall sample" consists of the people willing to be interviewed on certain days at certain shopping centers. This too is a convenience sample. The reason for the nomenclature is apparent, and so is the downside: the sample may not represent any definable population larger than itself.

To draw a probability sample, we begin by identifying the population of interest.

The next step is to create the "sampling frame," a list of units to be sampled. One easy design is "simple random sampling." For instance, to draw a simple random sample of 100 units, choose one unit at random from the frame; put this unit into the sample; choose another unit at random from the remaining ones in the frame; and so forth. Keep going until 100 units have been chosen. At each step along the way, all units in the pool have the same chance of being chosen.

Simple random sampling is often practical for a population of business records, even when that population is large. When it comes to people, especially when face-to-face interviews are to be conducted, simple random sampling is seldom feasible: where would we get the frame?
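The draw-one-unit-at-a-time procedure described above can be sketched in a few lines of Python. This is only an illustration: the frame of numbered business records and the function name `simple_random_sample` are hypothetical, not part of the original article.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame, one at a time, without replacement."""
    if n > len(frame):
        raise ValueError("sample size exceeds the size of the frame")
    rng = random.Random(seed)
    pool = list(frame)   # units still available to be chosen
    sample = []
    for _ in range(n):
        # Every unit remaining in the pool has the same chance of being chosen.
        index = rng.randrange(len(pool))
        sample.append(pool.pop(index))
    return sample

# Hypothetical frame: 5,000 business records identified by number.
frame = [f"record-{i:04d}" for i in range(5000)]
sample = simple_random_sample(frame, 100, seed=0)
print(len(sample), sample[:3])
```

In practice the standard-library call `random.sample(frame, 100)` draws the same kind of sample; the explicit loop is written out only to mirror the step-by-step description.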

More complex designs are therefore needed. If, for instance, we wanted to sample people in a city, we could list all the blocks in the city to create the frame, draw a simple random sample of blocks, and interview all people in housing units in the selected blocks. This is a "cluster sample," the cluster being the block.

Notice that the population has to be defined rather carefully: it consists of the people living in housing units in the city, at the time the sample is taken. There are many variations. For example, one person in each household can be interviewed to get information on the whole household. Or, a person can be chosen at random within the household. The age of the respondent can be restricted; and so forth.
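As a rough sketch of the block-based cluster design described above, the frame can be represented as a mapping from blocks to the people living on them. The city data, block identifiers, and the function name `cluster_sample` are all made up for illustration.

```python
import random

def cluster_sample(blocks, n_blocks, seed=None):
    """Draw a simple random sample of blocks, then take everyone on those blocks.

    blocks: dict mapping a block id to the list of people living on that block.
    """
    rng = random.Random(seed)
    # The sampling frame is the list of blocks, not the list of people.
    chosen_blocks = rng.sample(sorted(blocks), n_blocks)
    people = [person for block in chosen_blocks for person in blocks[block]]
    return chosen_blocks, people

# Hypothetical city with 50 blocks of varying size.
city = {
    f"block-{b}": [f"person-{b}-{i}" for i in range(3 + b % 4)]
    for b in range(50)
}
chosen, interviewed = cluster_sample(city, 5, seed=1)
print(chosen, len(interviewed))
```

Note that the number of people interviewed is not fixed in advance: it depends on which blocks happen to be drawn.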

If telephone interviews are to be conducted, random digit dialing often provides a reasonable approximation to simple random sampling for the population with telephones.

CLASSIFICATION OF ERRORS

Since the sample is only part of the whole, extrapolation inevitably leads to errors. These are of two kinds: sampling error ("random error") and non-sampling error ("systematic error"). The latter is often called "bias," without connoting any prejudice.

Sampling error results from the luck of the draw when choosing a sample: we get a few too many units of one kind, and not enough of another. The likely impact of sampling error is usually quantified using the "SE," or standard error. With probability samples, the SE can be estimated using (i) the sample design and (ii) the sample data.

As the "sample size" (the number of units in the sample) increases, the SE goes down, albeit rather slowly.

If the population is relatively homogeneous, the SE will be small: the degree of heterogeneity can usually be estimated from sample data, using the standard deviation or some analogous statistic. Cluster samples, especially with large clusters, tend to have large SEs, although such designs are often cost-effective.

Non-sampling error is often the more serious problem in practical work, but it is harder to quantify and receives less attention than sampling error. Non-sampling error cannot be controlled by making the sample bigger. Indeed, bigger samples are harder to manage. Increasing the size of the sample, which is beneficial from the perspective of sampling error, may be counter-productive from the perspective of non-sampling error.
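To make concrete the remarks that the SE falls slowly as the sample grows and is smaller for homogeneous populations, here is a minimal sketch. It assumes a simple random sample and uses the textbook approximation SE = SD / sqrt(n) for a sample mean, which the article does not state explicitly; it ignores cluster effects and the finite-population correction. The function names and numbers are illustrative only.

```python
import math
import statistics

def se_of_mean(sd, n):
    """Approximate SE of a sample mean under simple random sampling (assumed formula)."""
    return sd / math.sqrt(n)

def estimated_se(sample_values):
    """Estimate the SE from the sample itself, using the sample SD."""
    return se_of_mean(statistics.stdev(sample_values), len(sample_values))

# Quadrupling the sample size only halves the SE.
for n in (100, 400, 1600):
    print(n, se_of_mean(10.0, n))

# A more homogeneous population (smaller SD) gives a smaller SE at the same n.
print(se_of_mean(2.0, 100), se_of_mean(10.0, 100))

# Hypothetical sample data.
print(estimated_se([12.0, 15.0, 9.0, 14.0, 10.0]))
```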

Non-sampling error itself can be broken down into three main categories: (i) selection bias, (ii) non-response bias, and (iii) response bias. We discuss these in turn.

(i) "Selection bias" is a systematic tendency to exclude one kind of unit or another from the sample. With a convenience sample, selection bias is a major issue. With a well-designed probability sample, selection bias is minimal. That is the chief advantage of probability samples.

(ii) Generally, the people who hang up on you are different from the ones who are willing to be interviewed. This difference exemplifies non-response bias. Extrapolation from respondents to non-respondents is problematic, due to non-response bias.

If the response rate is high (most interviews are completed), non-response bias is minimal. If the response rate is low, non-response bias is a problem that needs to be considered. At the time of writing, U.S. government surveys that accept any respondent in the household have response rates over 95%. The best face-to-face research surveys in the U.S., interviewing a randomly-selected adult in a household, get response rates over 80%. The best telephone surveys get response rates approaching 60%. Many commercial surveys have much lower response rates, which is cause for concern.

(iii) Respondents can easily be led to shade the truth, by interviewer attitudes, the precise wording of questions, or even the juxtaposition of one question with another.

These are typical sources of response bias.

Sampling error is well-defined for probability samples. Can the concept be stretched to cover convenience samples? That is debatable (see below). Probability samples are expensive, but minimize selection bias, and provide a basis for estimating the likely impact of sampling error. Response bias and non-response bias affect probability samples as well as convenience samples.

TRADING NON-RESPONDENTS FOR RESPONDENTS

Many surveys have a planned sample size: if a non-respondent is encountered, a respondent is substituted. That may be helpful in controlling sampling error, but makes no contribution whatsoever to reducing bias. If the survey is going to extrapolate from respondents to non-respondents, it is imperative to know how many non-respondents were encountered.

HOW BIG SHOULD THE SAMPLE BE?

There is no definitive statistical answer to this familiar question. Bigger samples have less sampling error. On the other hand, smaller samples may be easier to manage, and have less non-sampling error. Bigger samples are more expensive than smaller ones: generally, resource constraints will determine the sample size. If a pilot study is done, it may be possible to judge the implications of sample size for accuracy of final estimates.

The size of the population is seldom a determining factor, provided the focus is on relative errors. For example, the percentage breakdown of the popular vote in a presidential election with 200 million potential voters can be estimated reasonably well by taking a sample of several thousand people.
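The point about population size can be checked numerically. The sketch below assumes a simple random sample drawn without replacement and uses the standard SE formula for a sample proportion with the finite-population correction; neither the formula nor the specific sample size of 2,500 comes from the article, and real election polls use more complex designs.

```python
import math

def se_of_proportion(p, n, population_size):
    """SE of a sample proportion under simple random sampling without replacement."""
    # Finite-population correction: close to 1 whenever n is a small fraction of N.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

n = 2500  # hypothetical sample size ("several thousand")
for population_size in (100_000, 1_000_000, 200_000_000):
    se = se_of_proportion(0.5, n, population_size)
    print(population_size, round(100 * se, 3), "percentage points")
```

The SE comes out at roughly one percentage point for every population size tried, which is why the sample size, rather than the population size, drives the accuracy of the estimated percentages.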

