
Ramses F Sadek, PhD GRU Cancer Center - …



Transcription of Ramses F Sadek, PhD GRU Cancer Center - …

Basic Biostatistics for Clinical Research
Ramses F Sadek, PhD
GRU Cancer Center

Part One
1. Basic Concepts
2. Data & Their Presentation

1. Basic Concepts
• Statistics
• Biostatistics
• Populations and samples
• Statistics and parameters
• Statistical inferences
• Variables
• Random variables
• Simple random sample

Statistics
Statistics is a field of study concerned with:
1. the collection, organization, summarization and analysis of data;
2. the drawing of inferences about a population when only a part of the data is observed;
3. trying to interpret and communicate the results to others.

Biostatistics
Biostatistics can be defined as the application of the mathematical tools used in statistics to the fields of biological sciences and medicine. Biostatistics is a growing field with applications in many areas of biology, including epidemiology, medical sciences, health sciences, educational research and environmental sciences.

Variables
A variable is an object, characteristic or property that can have different values in different places, persons, or things.

A quantitative variable can be measured in some way. Examples: heart rate, height, weight, age, size of tumor, volume of a dose.
A qualitative (categorical) variable cannot be measured, but it can be sorted into categories. Examples: gender, race, drug name, disease status.

Populations and Samples
A population is the collection or set of all of the values that a variable may have.
A sample is a part of a population.
We use the data from the sample to make inferences about the population.
The sample mean is not the true mean, but it might be very close. Closeness depends on sample size.
[Figure: a sample drawn from the population of interest]

Sampling Approaches-1
• Convenience sampling: select the most accessible and available subjects in the target population. Inexpensive and less time-consuming, but the sample is nearly always non-representative of the target population.
• Random sampling (simple): select subjects at random from the target population. Everyone in the target population must be identified first. Frequently provides a representative sample.
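The two sampling approaches above can be contrasted in a few lines of Python. This is only a sketch: the sampling frame `population_ids`, the sample size of 20, and both helper function names are assumptions made for illustration and do not come from the slides.

```python
import random

# Hypothetical sampling frame: an ID for every subject in the target population.
population_ids = list(range(1, 501))

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct subjects; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def convenience_sample(frame, n):
    """Take the first n subjects on the list, i.e. the most accessible ones."""
    return frame[:n]

srs = simple_random_sample(population_ids, 20, seed=1)
conv = convenience_sample(population_ids, 20)

print("simple random sample:", sorted(srs))
print("convenience sample:  ", conv)
```

The random sample needs the full list of IDs before it can be drawn, which is the practical cost noted on the slide. The convenience sample only ever sees the front of the list.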

Sampling Approaches-2
• Systematic sampling: identify everyone in the target population, and select every x-th person as a subject.
• Stratified sampling: identify important sub-groups in your target population. Sample from these groups randomly or by convenience. Ensures that important sub-groups are included in the sample. May not be representative.
• More complex sampling

Sampling Error
• The discrepancy between the true population parameter and the sample statistic.
• Sampling error likely exists in most studies, but can be reduced by using larger sample sizes.
• Sampling error is approximately proportional to $1/\sqrt{n}$.
• Note that larger sample sizes also require time and expense to obtain, and that large sample sizes do not eliminate sampling error.

Parameters vs. Statistics
• A parameter is a population characteristic.
• A statistic is a sample characteristic.
• Example: we use the sample mean to estimate the true population mean. The sample mean is a statistic; the population mean is a parameter.

Descriptive & Inferential Statistics
Descriptive statistics deal with the enumeration, organization and graphical representation of data from a sample.
Inferential statistics deal with reaching conclusions from incomplete information, that is, generalizing from the specific sample. Inferential statistics use available information in a sample to draw inferences about the population from which the sample was selected.

Random Variables
A random variable is one that cannot be predicted in advance because it arises by chance. Observations or measurements are used to obtain the value of a random variable.
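Returning to the sampling error slide, the $1/\sqrt{n}$ behaviour can be checked with a small simulation. The population here is hypothetical (Normal with mean 50 and standard deviation 10), and the function name, number of replicates and seed are all assumptions chosen for illustration.

```python
import random
import statistics

POP_MEAN, POP_SD = 50, 10  # hypothetical population parameters

def sd_of_sample_means(n, replicates=1000, seed=0):
    """Draw many samples of size n and return the spread of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(replicates):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

# Observed spread of the sample mean versus the theoretical POP_SD / sqrt(n).
results = {n: sd_of_sample_means(n) for n in (25, 100, 400)}
for n, observed in results.items():
    theoretical = POP_SD / n ** 0.5
    print(f"n={n:>3}  observed={observed:.3f}  theoretical={theoretical:.3f}")
```

Quadrupling the sample size only halves the sampling error, which is why large samples reduce it but never remove it.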

A discrete random variable has gaps or interruptions in the values that it can have. The values may be whole numbers or have spaces between them.
A continuous random variable does not have gaps in the values it can assume. Its properties are like those of the real numbers.

2. Data and Their Presentation
• Data
• Data sources: records, surveys, experiments
• Types of data
• Categorical variables
• Frequency tables
• Numerical variables
• Categorization
• Bar charts
• Histograms
• Box plots
• Bar charts by another variable
• Histogram by another variable
• Box plots by another variable
• Scatter plots

Data
The raw material of statistics is data. We may define data as figures. Figures result from the process of counting or from taking a measurement.
Example:
- When a hospital administrator counts the number of patients (counting).
- When a nurse weighs a patient (measurement).

Sources of Data
Data are obtained from:
• Records
• Surveys
• Experiments

Data Sources: Records, Reports and Other Sources
Look for data to serve as the raw material for our investigation.
• Routinely kept medical records contain immense amounts of information on patients.
• Accounting records contain a wealth of data on the facility's business activities.
• The data needed to answer a question may already exist in the form of published reports, commercially available data banks, or the research literature: someone else has already asked the same question.

Data Sources: Surveys
A survey may be necessary if the data needed are about answering certain questions.
Example: if the administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic, then a survey may be conducted among patients to obtain this information.

Data Sources: Experiments
Frequently the data needed to answer a question are available only as the result of an experiment. For example:

If a nurse wishes to know which of several strategies is best for maximizing patient compliance, she might conduct an experiment in which the different strategies of motivating compliance are tried with different patients.
Clinical trials are the most obvious example.

Types of Data
Data are made up of a set of variables:
• Categorical variables
• Numerical variables

Categorical Variables
Any variable that is not numerical (values have no numerical meaning), e.g. gender, race, drug, disease status.
• Nominal variables: the data are unordered (e.g. RACE: 1=Caucasian, 2=Asian American, 3=African American, 4=others). A subset of these are binary or dichotomous variables, which have only two categories (e.g. GENDER: 1=male, 2=female).
• Ordinal variables: the data are ordered (e.g. AGE: 1=10-19 years, 2=20-29 years, 3=30-39 years; likelihood of participating in a vaccine trial; income: low, medium, high).

Categorical variables are summarized by:
• Frequency counts: how many are in each category
• Relative frequency or percent (a number from 0 to 100)
• Or proportion (a number from 0 to 1)

Gender of new HIV clinic patients, 2006-2007, Mbarara

Gender     n (%)
Male        415 (39)
Female      645 (61)
Total      1060 (100)

Numerical Variables (Quantitative)
Naturally measured as numbers for which meaningful arithmetic operations make sense (e.g. height, weight, age, salary, viral load, CD4 cell counts).
• Discrete variables: can be counted (e.g. number of children in household: 0, 1, 2, 3, etc.)

• Continuous variables: can take any value within a given range (e.g. weight, in grams)

Manipulation of Variables
• Continuous variables can be discretized (e.g. age can be rounded to whole numbers).
• Continuous or discrete variables can be categorized (e.g. age categories).
• Categorical variables can be re-categorized (e.g. lumping from 5 categories down to 2).

Categorization
Continuous variables can be categorized in meaningful ways. Choice of cut-off points:
• Even intervals (e.g. 5-year age intervals)
• Meaningful cut-points related to a health outcome or decision (e.g. CD4 count cut-points at 200, 350 and 500)
• Equal percentage of the data falling into each category (quartiles, centiles, ...)

Organizing Data and Presentation
Some common methods:
• Frequency table
• Frequency histogram
• Relative frequency histogram
• Frequency polygon
• Relative frequency polygon
• Bar chart
• Pie chart
• Box plot
• Scatter plot

Frequency Tables
CD4 cell counts (cells/mm³) of newly diagnosed HIV positives at Mulago Hospital, Kampala (N=268)

CD4 count    n (%)
< 50         40 ( )
50-200       72 ( )
201-350      58 ( )
> 350        98 ( )
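Categorizing a continuous variable and then summarizing it with counts and percentages can be sketched as follows. The cut-points are the ones used in the CD4 table above; the list `hypothetical_cd4` is invented for illustration, and the helper names `cd4_category` and `frequency_table` are assumptions rather than anything from the slides.

```python
from collections import Counter

def cd4_category(cd4):
    """Map a whole-number CD4 count to the categories used in the table above."""
    if cd4 < 50:
        return "<50"
    if cd4 <= 200:
        return "50-200"
    if cd4 <= 350:
        return "201-350"
    return ">350"

def frequency_table(labels, order):
    """Return (category, count, percent) rows in the requested order."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, counts[label], 100 * counts[label] / total) for label in order]

# Hypothetical CD4 counts, not data from the study.
hypothetical_cd4 = [12, 48, 75, 130, 199, 200, 201, 260, 350, 351, 420, 610]
order = ["<50", "50-200", "201-350", ">350"]
table = frequency_table([cd4_category(v) for v in hypothetical_cd4], order)
for label, n, pct in table:
    print(f"{label:<8} {n:>3} ({pct:.0f}%)")

# The same helper applied to the gender counts reported earlier (415 male, 645 female).
gender_table = frequency_table(["Male"] * 415 + ["Female"] * 645, ["Male", "Female"])
for label, n, pct in gender_table:
    print(f"{label:<8} {n:>4} ({pct:.0f}%)")
```

Dividing the percent by 100 gives the proportion, the third summary listed for categorical variables.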

Bar Charts
• General graph for categorical variables
• Graphical equivalent of a frequency table
• The x-axis does not have to be numerical
[Figure: bar chart, "Alcohol consumption in Mulago Hospital patients enrolling in VCT study"; categories include ">1 year ago" and "Within the past year"; y-axis: Proportion]

Histograms
• Bar chart for numerical data
• The number of bins and the bin width will make a difference in the appearance of this plot and may affect interpretation
[Figure: histogram, "CD4 among new HIV positives at Mulago"; x-axis: CD4 cell count; y-axis: Percent]

Histograms
This histogram has less detail but gives us the % of persons with CD4 <350 cells/mm³.
[Figure: histogram with wider bins, "CD4 among new HIV positives at Mulago"; x-axis: CD4 cell count; y-axis: Percent]

What Does This Graph Tell Us?
[Figure: histogram, "Days drank alcohol among current drinkers"; x-axis: Days; y-axis: freq]

Box Plots
• Middle line = median (50th percentile)
• Middle box = 25th to 75th percentiles (interquartile range, IQR)
• Bottom whisker: lowest data point at or above 25th percentile − 1.5 × IQR
• Top whisker: highest data point at or below 75th percentile + 1.5 × IQR
[Figure: box plot of days drank alcohol]

Box Plots
[Figure: box plot, "CD4 count among new HIV positives at Mulago"; axis: cd4count]

Box Plots By Another Variable
We can divide up our graphs by another variable.
What type of variable is gender?
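The quantities a box plot draws can be computed directly. The sketch below assumes the conventional whisker multiplier of 1.5 × IQR and uses Python's `statistics.quantiles` with the "inclusive" method; other software may use a slightly different quartile rule, so the numbers can differ a little. The data in `days_drank` are hypothetical.

```python
import statistics

def box_plot_summary(values, whisker_multiplier=1.5):
    """Return the quartiles, whisker ends and outliers a box plot would show."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_multiplier * iqr
    upper_fence = q3 + whisker_multiplier * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),  # lowest point at or above the lower fence
        "upper_whisker": max(inside),  # highest point at or below the upper fence
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Hypothetical "days drank alcohol" values with one extreme observation.
days_drank = [0, 1, 2, 2, 3, 4, 5, 6, 8, 30]
summary = box_plot_summary(days_drank)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Points outside the fences are drawn individually, which is how a box plot flags the 30-day value here without letting it stretch the whisker.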

[Figure: box plots of days drank alcohol for male and female; "Graphs by a1. sex"]

Histograms By Another Variable
[Figure: histograms by another variable; x-axis: Days consumed alcohol of prior 30; y-axis: freq]

Frequency Polygon
Use to identify the distribution of your data.
[Figure: frequency polygons for Female and Male; x-axis: Age in years; y-axis: Frequency]

Scatter Plots
[Figure: scatter plot, "CD4 cell count versus age"; x-axis: "a4. how old are you?"; y-axis: CD4 cell count]

Part Two
Numerical Variable Summaries and Measures

Measures of Central Tendency and Dispersion
Where is the center of the data?
• Median
• Mean
• Mode
How variable are the data?
• Range, interquartile range
• Variance, standard deviation, standard error
• Coefficient of variation

Measures of Central Tendency: Median
Median: the 50th percentile = the middle value.
• If n is odd: the median is the (n+1)/2-th observation (e.g. if n=31 then the median is the 16th highest observation).
• If n is even: the median is the average of the two middle observations (e.g. if n=30 then the median is the average of the 15th and 16th observations).
• Example: median progression-free survival in a cancer clinical trial, sample = 195

Measures of Central Tendency: Mode
Mode: the value (or range of values) that occurs most frequently.
• Sometimes there is more than one mode, e.g. a bi-modal distribution (both modes do not have to be the same height).
• The mode only makes sense when the values are discrete, rounded off, or binned.
[Figure: histogram; x-axis: Grades; y-axis: f]

Measures of Central Tendency: Mean
Mean: the arithmetic average.
• Means are sensitive to very large or small values (outliers), while the median is not.

• Mean CD4 cell count: […]
• Mean age: […]
• When data are highly skewed, the median is preferred. Both measures are close when data are not skewed.

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Interpreting the Formula
• $\sum$ is the symbol for the sum of the elements immediately to the right of the symbol.
• These elements are indexed (i.e. subscripted) with the letter i. (The index letter could be any letter, though i is commonly used.)
• The elements are lined up in a list; the first one in the list is denoted $x_1$, the second one $x_2$, the third one $x_3$ and the last one $x_n$.
• n is the number of elements in the list.

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$

Measures of Variation: Range, IQR
1. Range
• Minimum to maximum, or the difference (e.g. age range 15-58, or range = 43)
• CD4 cell count range: (0-1368)
2. Interquartile range (IQR)
• 25th and 75th percentiles (e.g. IQR for age: 23-36), or the difference (e.g. 13)
• Less sensitive to extreme values
• CD4 cell count IQR: (92-422)

Measures of Variation: Variance, SD, SE
• Sample variance: the amount of spread around the mean, calculated in a sample by $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
• Sample standard deviation (SD), or just s, is the square root of the variance: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
• The standard deviation has the same units as the mean.
• Standard error (SE) = $s/\sqrt{n}$
• s of CD4 cell count = […]
• s of age = […]

Mean and Standard Deviation

Data             Mean   SD
7 7 7 7 7 7      7      0
7 8 7 7 7 6      7      […]
3 2 7 8 13 9     7      […]

Measures of Variation: CV
Coefficient of variation
• For the same relative spread around a mean, the variance will be larger for a larger mean.
• Can be used to compare variability across measurements that are on different scales (e.g. IQ and head circumference).
• CV for CD4 cell count: […]
• CV for age: […]
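The summary measures on these slides can be written out as a short Python sketch. Two assumptions are made here: the 18 values on the "Mean and Standard Deviation" slide are read as three sets of six, which is consistent with each set having mean 7, and the CV is expressed as a percentage, 100 × s / mean. The function names are chosen for illustration only.

```python
import math

def mean(xs):
    """Arithmetic average: the sum of the values divided by n."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value if n is odd, average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    """Square root of the sample variance; same units as the mean."""
    return math.sqrt(sample_variance(xs))

def standard_error(xs):
    """SD divided by the square root of the sample size."""
    return sample_sd(xs) / math.sqrt(len(xs))

def coefficient_of_variation(xs):
    """SD relative to the mean, as a percentage (assumed convention)."""
    return 100 * sample_sd(xs) / mean(xs)

# Assumed grouping of the slide's 18 values into three data sets of six.
datasets = [
    [7, 7, 7, 7, 7, 7],
    [7, 8, 7, 7, 7, 6],
    [3, 2, 7, 8, 13, 9],
]
for xs in datasets:
    print(f"{xs}: mean={mean(xs):.1f} median={median(xs)} "
          f"SD={sample_sd(xs):.2f} SE={standard_error(xs):.2f} "
          f"CV={coefficient_of_variation(xs):.1f}%")
```

All three sets share the same mean, so the SD alone is what separates them. That is the point of showing them side by side.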

$CV = \frac{s}{\bar{x}} \times 100\%$

Empirical Rule
For a Normal distribution, approximately:
a) 68% of the measurements fall within one standard deviation of the mean
b) 95% of the measurements fall within two standard deviations of the mean
c) 99.7% of the measurements fall within three standard deviations of the mean

Suppose the reaction time of a particular drug has a Normal distribution with a mean of 10 minutes and a standard deviation of 2 minutes. Approximately:
a) 68% of the subjects taking the drug will have a reaction time between 8 and 12 minutes
b) 95% of the subjects taking the drug will have a reaction time between 6 and 14 minutes
c) 99.7% of the subjects taking the drug will have a reaction time between 4 and 16 minutes

Part Three
Biostatistics in Clinical Research
• Role of statistics
• Statistician role
  • In research
  • In development
• Study design
  • Controlled designs
  • Observational studies
    • Cohort
    • Case-control

Role of Statistics
In general, statistics is a collection of techniques for extracting information from data, and for ensuring that the data collected contain the desired information.
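The empirical rule percentages in the reaction-time example above can be checked against the exact Normal probabilities. This sketch uses only `math.erf` from the standard library; the helper name `normal_cdf` is an assumption, and the mean of 10 minutes and SD of 2 minutes are taken from the example.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a Normal distribution with the given mean and SD."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

mean_minutes, sd_minutes = 10, 2

# For k = 1, 2, 3: the interval mean ± k*SD and the probability of falling in it.
intervals = []
for k in (1, 2, 3):
    low = mean_minutes - k * sd_minutes
    high = mean_minutes + k * sd_minutes
    prob = normal_cdf(high, mean_minutes, sd_minutes) - normal_cdf(low, mean_minutes, sd_minutes)
    intervals.append((low, high, prob))
    print(f"within {k} SD: {low} to {high} minutes, probability {prob:.4f}")
```

The exact values are slightly different from the rounded 68%, 95% and 99.7%, which is why the rule is stated as approximate.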

