Transcription of THE ELEMENTS OF QUANTITATIVE ANALYSIS KYLE …
1 CHAPTER 11 KYLE GORMAN AND DANIEL EZRA JOHNSON A sociolinguist who has gathered so much data that it has become difficult to make sense of the raw observations may turn to graphical presentation, and to descriptive statistics, techniques for distilling a collection of data into a few key numerical values, allowing the researcher to focus on specific, meaningful properties of the data set (see Johnson in press). However, a sociolinguist is rarely satisfied with a mere snapshot of linguistic behavior, and desires not just to describe, but also to evaluate hypotheses about the connections between linguistic behavior, speakers, … (…, Lucas, Bayley, & Valli 2001:43). A sociolinguist who suspects that women and men in a certain speech community differ in the rate at which they realize the final consonant of a word ending in <ing> with coronal [n] rather than velar [ŋ] would collect tokens of these words in the speech of women and men, … in the form of a descriptive statistic or an appropriate graph, could suggest that women differ from men in the rate at which they use these competing variants, these tech… however, … a single interview makes up only a tiny fraction of any speaker's lifetime of language, … where there are always more possible subjects to run or stimuli to present, … it is always possible that the sample differs quantitatively from the population, … but the women in a sample, for instance, may not be representa… usually an observed difference in the sample, does extend to the population is called the alternative hypothesis, whereas the opposing view that there is no real differ… if a sociolinguist is interested in the association between gender and speech rate, then the null hypothesis is that speech rate is constant across genders, … (…, a Z-score, t statistic, F statistic, or chi-square statistic), then compute the probability, henceforth the p-value, that a test statistic as large or larger would have occurred under the null hypothesis (…, no difference in the population).
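The recipe described above (compute a test statistic, then the probability of a statistic at least that extreme under the null hypothesis) can be sketched for the Z-score case. This is a minimal illustration, not the chapter's own procedure: the function name is hypothetical, the counts are invented, and the normal approximation is assumed to be adequate for samples of this size.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-tailed p) for H0: both groups share one underlying rate."""
    rate_a = hits_a / n_a
    rate_b = hits_b / n_b
    # Under the null hypothesis the best estimate of the shared rate is pooled.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-tailed tail area of the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts (NOT from any study): coronal tokens out of all <ing> tokens.
z, p = two_proportion_z_test(hits_a=60, n_a=200, hits_b=40, n_b=200)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value here would only license rejecting the null hypothesis of equal rates; it says nothing about how large the difference is.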
2 Although this threshold is arbitrary, a result where p < … Sciences, … p < … …lation. In the foregoing example, the alternative hypothesis only requires that there be some difference between groups, … as the label "significant" … generally with help from a computer, to calculate a test statistic and p-value from a set of data; … the contents of the sample are shaped by convenience factors, such as speakers … for instance, a researcher interested in stigmatized speech may unfortunately discover that low-prestige speakers are the least likely to agree to an interview with a stranger … the researcher may deploy proportional stratified sampling (…, Cedergren 1973); if the population consists of middle-class speakers, who account for 25 percent of the population, and working-class speakers, accounting for the remaining 75 percent, the researcher ensures that this 1:3 ratio of middle- to working-class speakers (and tokens) … (Bayley 2002:118). While it is in some sense impossible to include every predictor that might be relevant to the outcomes of interest, a statistical model is of little use for inferring a causal connection between predictors and outcomes if one or more important predictors have been omitted. For instance, … and finds that both are significant, … but when they are combined in the same regression model, only one of the two (e.g., phonological context) is significant; the other predictor (e.g., grammatical category) is said to have been suppressed (e.g., Tagliamonte & Temple 2005).
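The proportional stratified sampling mentioned earlier in this passage (a population that is 25 percent middle class and 75 percent working class) amounts to a simple allocation rule. The sketch below is one way to express it; the function name, the stratum labels, and the target sample size of 40 are assumptions made for illustration.

```python
def proportional_allocation(stratum_shares, total_n):
    """Split total_n speakers across strata in proportion to population share."""
    raw = {name: share * total_n for name, share in stratum_shares.items()}
    # Start from the whole-number part of each stratum's quota.
    allocation = {name: int(quota) for name, quota in raw.items()}
    # Hand out any remaining slots to the largest fractional remainders.
    leftover = total_n - sum(allocation.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

# Population shares taken from the text; the sample size is hypothetical.
shares = {"middle class": 0.25, "working class": 0.75}
sample_plan = proportional_allocation(shares, total_n=40)
print(sample_plan)
```

The same rule can be applied to tokens rather than speakers if the analysis is token-based.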
3 Such a situation could arise if the two predictors are correlated, for example, if certain grammatical categories tend to co-occur with certain phonological contexts (…), but … predictors stand in a causal relationship with the outcome (…, both phonological context and grammatical category increase rate of deletion), … orthogonal, that is, … linear (…, strongly nonorthogonal) … (2010) gives an example of a spurious sociolinguistic finding due to multicollinearity between measures of socioeconomic status, and demonstrates the method of residualization, … both in the field and the laboratory, to gather many data points from each speaker or subject, … it is necessary to distinguish between a gender effect in the population and the presence in the sample of a few speakers who just happen to be male and furthermore are "outliers" from the rest of the sample; … even after gender, age, and social status are taken into account (Guy 1980, 1991:5), speaker identity is a strong predictor of linguistic behavior, … etc.; every token from "Celeste S." also has the same value for the gender predictor ("female"), age (45), etc., … whether predictors or outcomes, on a continuous or integer scale, but converts these values to a few-valued (often binary) … (… to treat data that are naturally many-valued as a few-valued scale, it usually increases the chance of Type II error, the error of failing to reject the null hypothesis in the case when this null hypothesis is in fact false (Cohen 1983).)
4 If a researcher posits a sound change in progress in a speech community, then a 78-year-old speaker should be less advanced with respect to this change than a 60-year-old speaker, but if these two speakers are placed together into the "60 years of age and older" bin, …: binning usually requires the researcher to arbitrarily choose the number and location of the cutpoint(s) between bins, … "founder effect" of VARBRUL and its descendants, … it is incorrect to assume that VARBRUL's feature set delimits the set of possible sociolinguistic analyses, and the use of continuous predictors and/or outcomes in sociolinguistics dates back at least as far as Lennig's (1978) … and more specifically, … which a number of studies have found to be curvilinear, with interior social classes using the highest rates of a nonstandard variant of a stable linguistic variable (Labov 2001:3if.). In such cases, the appropriate response to this problem, though, is not ad hoc dichotomization, but rather for the researcher to explore the relationships observed in the data (…, by plotting the predictor and outcome), and choosing appropriate "transformations" … the exemplar theory of lenition (…, Bybee 2002) predicts a relationship between the logarithm of word frequency and the rate of lenition, … (2001:16-26) … (categorical …).
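The contrast drawn above between binning and transforming a predictor can be made concrete with a small sketch. The helper name and the word frequencies are hypothetical; only the two ages and the 60-year cutpoint come from the text.

```python
import math

def age_bin(age, cutpoint=60):
    """Dichotomize age at an arbitrarily chosen cutpoint (in years)."""
    return f"{cutpoint}+" if age >= cutpoint else f"under {cutpoint}"

# The two speakers from the text: distinct ages, identical bin, so the
# model can no longer tell them apart.
ages = [60, 78]
binned = [age_bin(a) for a in ages]
print(binned)

# A transformation keeps the predictor continuous instead. Here, invented
# word frequencies (occurrences per million words) are put on a log scale.
frequencies = [1, 10, 100, 1000]
log_frequencies = [math.log10(f) for f in frequencies]
print(log_frequencies)
```

After the log transformation, each tenfold increase in frequency corresponds to a step of one unit, which is the kind of relationship a regression can model directly.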
5 The following section considers methods for continuous outcomes, with a focus on acoustic measurements of vowels. The concluding section discusses some recent trends in the field of statistics of relevance to sociolinguists. METHODS FOR BINARY VARIABLES Interpreting Cross-Tabulations Many quantitative sociolinguistic studies compare two distinct, discrete, semantically equivalent variants in complementary distribution. … William Labov elicited tokens of the phrase "fourth floor" from employees in three Manhattan department stores for the purpose of studying the social stratification of post… (Labov 2006:chapter 4), first published in 1966, does not include any inferential statistics, the cross-tabulation of the data (…) lends itself to a simple stat… …'s pronounce post-vocalic r in 125 tokens, and do not in 211 tokens; r is present 37 percent of the time (= 125/336). …, the department store representing the upper class, has a 48 percent rate of r… effect is due to chance, these counts are used to compute a test statistic called Pearson's chi-… probability of a test statistic of this size or larger being obtained for a sample of this size simply by chance using the two-tailed chi-square distribution. The … representing this possibility is p = …E-16, … the null hypothesis that there are no differences in the r realization … different department stores, and the average rates of r presence …
6 … that post-vocalic r is realized more often by speakers from … Fisher's exact test The chi-square test is not very appropriate for small amounts of data since it is based on an approximation that is true under the obviously false assumption of an infinite sample; … we favor a related technique known as Fisher's exact test, which computes the "exact" (…) … the Fisher p-value is somewhat smaller than the Pearson chi-square p-value (…), … value is often difficult to compute by hand, but since it can be computed for huge data sets by a modern computer in the blink of an eye, it should always be used rather than the chi-… Labov feigned misunderstanding after the first "fourth floor," usually causing the speaker to repeat him- or herself, … Labov recorded whether each token comes from "fourth" or "floor." …; word and department store are significant predictors, … it is prefer… the p… which predicts binary outcome using one or more independent predictor(s), and which will be familiar to many readers as the model underlying VARBRUL, … the outcome is either r or zero; the predictors, all categorical, are word ("fourth" vs.
7 "floor"), repetition (…), and store (…). Modern regression software also allows the user to include what are generally called interaction effects, … [Table: (r) cross-tabulation, chi-square, and Fisher exact test; columns include p-value (chi-square) and p-value (Fisher exact); rows include "fourth" and "floor"; cell values not recoverable] In this case, an interaction between word and department store allows the researcher to probe whether, in addition to any differences between "fourth" and "floor" and the different department stores, there is any difference in the difference between "fourth" and "floor" … "fourth" versus "floor" at Saks different from "fourth" versus "floor" at …'s? There is no obvious reason to hypothesize such an interaction in this case, … which reports numbers in a form that will be familiar both to users of VARBRUL and other software packages (who may know log-odds as betas, coefficients, or estimates) … absence of r is treated as rule application, so an increase in the log-odds or the weights indicates fewer r … …variate tests mentioned earlier, … with the second repetition being more likely to contain an overt r than the first, … which taken together are nonsignificant, there is one suggestive trend: "fourth" has more r than "floor" …: how does one decide which predictors to include and which to omit?
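The two cross-tabulation tests discussed above, Pearson's chi-square and Fisher's exact test, can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the chapter's own computation: the function names are hypothetical, the closed-form p-value applies only to 2 degrees of freedom (three stores by two outcomes), and the second and third rows of the table are invented placeholders, so the printed numbers do not reproduce the published results.

```python
import math

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)

def chi_square_p_df2(stat):
    """Upper-tail probability of a chi-square variate with exactly 2 df."""
    return math.exp(-stat / 2)

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)
    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom
    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

stores = [
    [125, 211],  # r present / r absent: counts quoted in the text
    [60, 65],    # hypothetical placeholder row
    [30, 120],   # hypothetical placeholder row
]
stat, df = pearson_chi_square(stores)
if df == 2:
    print(f"chi-square = {stat:.2f}, p = {chi_square_p_df2(stat):.3g}")
print(f"Fisher exact p (rows 1 and 3) = {fisher_exact_2x2(125, 211, 30, 120):.3g}")
```

For tables with other degrees of freedom, a statistics library would normally supply the chi-square tail probability; the exponential shortcut is used here only to keep the sketch self-contained.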
8 A useful procedure, adapted from Gelman and Hill (…:69), … [Table: (r) fixed-effects logistic regression; columns: log-odds, weight, p-value; rows include (intercept), Klein's, "floor", "fourth", and store-by-word combinations; cell values not recoverable] … but the estimate (or factor weight) goes in the expected direction, … and the estimate goes in an unexpected direction, … but the estimate goes in an unexpected direction, … …tions in the sense that the same predictors are significant, … in the absence of multicollinearity, the p… and potentially reporting non-significant effects, contrasts with the use of automated stepwise model selection techniques, such as is found in VARBRUL, which may be familiar to many sociolinguists but which are the target of derision by many statisticians (…, Harrell 2001:56, 79f). … as they begin with a full model (containing all the predictors), but there is no compelling reason the researcher shouldn… it is beneficial, and if it does not, … …'s distribution according to department store, the grammar-internal effects of different phonological context ("fourth" vs. "floor"), and contrasts with respect to style (repetition).
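The regression table above reports each effect both as log-odds and as a VARBRUL-style weight. Assuming the usual convention in which a factor weight is the inverse logit of the corresponding log-odds coefficient, the two columns are interchangeable, as this sketch shows. The function names and the example coefficient are hypothetical.

```python
import math

def log_odds_to_weight(log_odds):
    """Inverse logit: map a log-odds coefficient onto the 0-1 weight scale."""
    return 1 / (1 + math.exp(-log_odds))

def weight_to_log_odds(weight):
    """Logit: map a weight strictly between 0 and 1 back to log-odds."""
    return math.log(weight / (1 - weight))

# A log-odds of 0 is a neutral effect, which corresponds to a weight of 0.5.
print(log_odds_to_weight(0.0))

# Hypothetical coefficient: positive log-odds map to weights above 0.5.
example = log_odds_to_weight(0.8)
print(round(example, 3), round(weight_to_log_odds(example), 3))
```

Because the mapping is monotonic, the direction of an effect reads the same way on either scale; only the neutral point differs (0 for log-odds, 0.5 for weights).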
9 Since there are no more than four tokens per speaker, and 264 speakers in the sample, there is no reason to believe that some speaker outlier is driving the trend: even if some speakers in this sample do differ drastically from the rest of the population in their usage of post-vocalic r, … it is generally understood that speakers may dif… (see above). As already mentioned, … and then perform inference over the coefficients of the individual models (…:chapter 12; Rousseau & Sankoff 1978; Guy 1980), but this does not allow us to constrain speakers from the same speech community to behave the same with respect to grammatical constraints on variation, despite our strong bias that speakers from the same community share these constraints (Guy 1980). Mixed-effects Regression Mixed-effects models (Pinheiro & Bates 2000) are a recent innovation in regression which allow for, in addition to the familiar stratum of fixed-effects predictors, … …effects model augments a standard regression with a random intercept, which is a predictor consisting of many levels (such as unique identifiers for the different speakers in the sample). During model fitting, the variance attributable to different levels of the random intercept is estimated, and each level of the random … grammatical effects, and so on, are controlled for, there is no effect of word identity on sociophonetic variables, but there are many reports of purely lexical effects in variation (…, Neu 1980:50).
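The random-intercept model described above can be written out for a binary outcome as follows. This is a sketch under assumptions not stated in the surviving text: the symbols are chosen here for illustration, and the random intercepts are taken to be normally distributed, which is the customary choice. Token i from speaker j has outcome y_ij, fixed-effects predictors x_ijk with coefficients beta_k, and a speaker-specific adjustment u_j whose variance is estimated during fitting.

```latex
\operatorname{logit} \Pr(y_{ij} = 1)
  = \beta_0 + \sum_{k} \beta_k x_{ijk} + u_j,
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2)
```

A second random intercept, for instance one per word, would add a further term of the same form with its own variance.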
10 However, words and grammatical category may be in a nesting relationship, … it returns reduced, and more accurate, significance levels (…, smaller p-values) com… …grants in the United Kingdom collected by Schleef, Clark, and Meyerhoff (2011); here, … (ing) … <ing> … the data also contains a third category, where the variable is realized with an oral velar stop (…, [ɪŋk]). Henceforth, this final variant is ignored, … a fixed-effects regression identifies three significant between-word predictors (preceding phonological segment, grammatical category, and lexical frequency) and three significant between-speaker predictors (gender, …), … while a higher degree of English proficiency results in a higher rate of the coronal variant, … …fect measures of the speaker's contact with first-language English, … gender, … (Labov 2001 …), Polish women (22 percent coronal tokens from 12 females) favor the stigmatized coronal variant more than men (9 percent coronal tokens from nine men). While this difference in rate is somewhat small in absolute terms, the fixed-effects model treats gender as significant (…). …word predictors are still found to be significant (the reported significance levels are now roughly p = …), … none of the three between-speaker predictors reaches significance, and one cannot reject the null hypothesis that they have no effect on (ing).