Transcription of An Introduction to Secondary Data Analysis
Cambridge University Press, 978-0-521-87001-6 - Secondary Data Sources for Public Health: A Practical Guide, Sarah Boslaugh, Excerpt

1 An Introduction to Secondary Data Analysis

What Are Secondary Data?

In the fields of epidemiology and public health, the distinction between primary and secondary data depends on the relationship between the person or research team who collected a data set and the person who is analyzing it. This is an important concept because the same data set could be primary data in one analysis and secondary data in another. If the data set in question was collected by the researcher (or a team of which the researcher is a part) for the specific purpose or analysis under consideration, it is primary data. If it was collected by someone else for some other purpose, it is secondary data. Of course, there will always be cases in which this distinction is less clear, but it may be useful to conceptualize primary and secondary data by considering two extreme cases.

In the first, which is an example of primary data, a research team conceives of and develops a research project, collects data designed to address specific questions posed by the project, and performs and publishes their own analyses of the data they have collected. In this case, the people involved in analyzing the data have some involvement in, or at least familiarity with, the research design and data collection process, and the data were collected to answer the questions examined in the project.

In the second case, which is an example of secondary data, a researcher poses questions that are addressed through analysis of data from the Behavioral Risk Factor Surveillance System (BRFSS), a data set collected annually in the United States through cooperation of the Centers for Disease Control and Prevention and state health departments. In this case, the person performing the analysis did not participate in either the research design or the data collection process, and the data were not collected to answer that researcher's specific questions.

As an example of the same data set serving as both primary and secondary data, consider the increasingly common practice of one researcher performing an analysis of data collected by a research team with whom he or she has no connection. This type of analysis is facilitated by the ease of sharing data stored electronically and the concomitant creation of electronic data archives that allow access to secondary users; some of these archives are discussed in Chapter 7. Such analyses may serve a variety of purposes, such as addressing questions not considered in the original analysis or examining how a different analytic approach might change the conclusions reached from the first analysis. In either case, the same data set serves as primary data for the original research team and secondary data for the researcher performing the new analysis.

This book deals primarily with secondary data in the sense of data sets that can be obtained and analyzed in detail by the individual researcher. There is another type of secondary data, again not mutually exclusive with the first, namely statistical information about some geographic region or other entity. This type of information is often useful to researchers: when you place your research project in context by describing the racial makeup or median house value in the metropolitan area where you conduct your research, the data used to compute those statistics were probably secondary data. Often these statistics are computed on data collected by the federal government, and Chapter 7 discusses several websites that were created specifically to permit easy access to these types of statistics. In addition, many of the data sets described in this book are accessible through an online interface that allows the quick computation of basic statistics without requiring the user to download the data and use a statistical program to analyze them. The availability of such interfaces has been noted in the sections pertaining to each data set.

Most of the data sets discussed in this volume contain either data collected through surveys or censuses, such as the National Health Interview Survey and the Census, or administrative records such as the medical claims records submitted to the Medicare system. There are other types of secondary data, including diaries, video recordings, and transcripts of interviews and focus groups; some of these are included in sources discussed in Chapter 7. Data such as interview transcripts are often analyzed using qualitative methods rather than the quantitative techniques appropriate for most of the data sets discussed in this volume. Secondary analysis of qualitative data is a topic unto itself and is not discussed in this volume. The interested reader is referred to references such as James and Sorenson (2000) and Heaton (2004).
Advantages and Disadvantages of Secondary Data Analysis

The choice of primary or secondary data need not be an either/or question. Most researchers in epidemiology and public health will work with both types of data in the course of their careers, and many research projects incorporate both types of data. A more useful approach to this question is to focus on selecting data that are appropriate to the research question being studied and the resources available to the researcher; the latter include time, money, and personal expertise. In this spirit, we offer a summary of the major advantages and disadvantages of working with secondary, as opposed to primary, data.

The first major advantage of working with secondary data is economy: because someone else has already collected the data, the researcher does not have to devote resources to this phase of research. Even if the secondary data set must be purchased, the cost is almost certainly lower than the expense of salaries, transportation, and so forth that would be required to collect and process a similar data set from scratch. There is also a savings of time. Because the data are already collected, and frequently also cleaned and stored in electronic format, the researcher can spend the bulk of his or her time analyzing the data. There is also the influence of preference: secondary data analysis is an ideal focus for researchers who prefer to spend their working hours thinking of and testing hypotheses using existing data sets, rather than writing grants to finance the data collection process and supervising student interviewers and data entry.

A second major advantage of using secondary data is the breadth of data available. Few individual researchers would have the resources to collect data from a representative sample of adults in every state in the United States, let alone repeat this data collection process every year, but the federal government conducts numerous surveys on that scale. Data collected on a national basis are particularly important in epidemiology and public health, fields that focus primarily on the health of populations rather than of individuals. In addition, some of the data sets discussed in Chapters 2 through 7 collect data using a longitudinal design, and others are designed so that certain questions are included annually or at regular intervals, allowing researchers to examine changes in health status and health behaviors in the population over time.

A third advantage in using secondary data is that the data collection process is often informed by expertise and professionalism that may not be available to smaller research projects. For instance, many of the federal health surveys discussed in this volume use a complex sample design and system of weighting that allows the researcher to compute population-based estimates of health conditions and behaviors. Although a local data collection project could conceivably use similar techniques, more often a convenience sample, whose generalizability is questionable, is used instead. To take another example, data collection for many federal data sets is performed by staff members who specialize in that task and who may have years of experience working on a particular survey. This is in contrast to many smaller research projects, in which data are collected by students working at a part-time, temporary job.

A major disadvantage to using secondary data is inherent in its nature: because the data were not collected to answer your specific research questions, particular information that you would like to have may not have been collected.