Transcription of Data Science Cheatsheet 2
Data Science Cheatsheet
Updated June 19, 2021

Distributions

Discrete
Binomial - $x$ successes in $n$ events, each with probability $p$ (and $q = 1 - p$): $\binom{n}{x} p^x q^{n-x}$, with $\mu = np$ and $\sigma^2 = npq$. If $n = 1$, this is a Bernoulli distribution.
Geometric - first success with probability $p$ on the $n$th trial: $q^{n-1} p$, with $\mu = 1/p$ and $\sigma^2 = \frac{1-p}{p^2}$.
Negative Binomial - number of failures before $r$ successes.
Hypergeometric - $x$ successes in $n$ draws, without replacement, from a population of size $N$ with $X$ items of that feature: $\frac{\binom{X}{x}\binom{N-X}{n-x}}{\binom{N}{n}}$, with $\mu = \frac{nX}{N}$.
Poisson - number of successes $x$ in a fixed time interval, where success occurs at an average rate $\lambda$: $\frac{\lambda^x e^{-\lambda}}{x!}$, with $\mu = \sigma^2 = \lambda$.

Continuous
Uniform - all values between $a$ and $b$ are equally likely: $\frac{1}{b-a}$, with $\mu = \frac{a+b}{2}$ and $\sigma^2 = \frac{(b-a)^2}{12}$, or $\frac{n^2-1}{12}$ if discrete.
Normal/Gaussian - $N(\mu, \sigma)$; Standard Normal $Z \sim N(0, 1)$.
Central Limit Theorem - the sample mean of i.i.d. data approaches a normal distribution as the sample size grows.
Empirical Rule - for normally distributed data, 68%, 95%, and 99.7% of values lie within one, two, and three standard deviations of the mean.
Normal Approximation - discrete distributions such as Binomial and Poisson can be approximated using z-scores when $np$, $nq$, and $\lambda$ are greater than 10.
Exponential - mem...
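As a quick numerical check of the distribution formulas above, the sketch below codes each probability mass function with Python's standard library and sums over the support to recover the stated means and variances. The helper names (`binomial_pmf`, `geometric_pmf`, `hypergeometric_pmf`, `poisson_pmf`, `mean_var`, `within_k_sd`) are hypothetical, and infinite supports are truncated at a point where the remaining probability is negligible.

```python
import math

# PMFs matching the cheatsheet formulas; q = 1 - p throughout.

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x successes in n independent trials): C(n, x) p^x q^(n-x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def geometric_pmf(n: int, p: float) -> float:
    """P(first success on trial n): q^(n-1) p, for n >= 1."""
    return (1 - p) ** (n - 1) * p

def hypergeometric_pmf(x: int, n: int, N: int, X: int) -> float:
    """P(x successes in n draws without replacement from N items, X of which have the feature)."""
    return math.comb(X, x) * math.comb(N - X, n - x) / math.comb(N, n)

def poisson_pmf(x: int, lam: float) -> float:
    """P(x events in an interval with average rate lam): lam^x e^(-lam) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def mean_var(pmf, support):
    """Mean and variance of a discrete distribution by direct summation over `support`."""
    probs = [(k, pmf(k)) for k in support]
    mu = sum(k * pr for k, pr in probs)
    var = sum((k - mu) ** 2 * pr for k, pr in probs)
    return mu, var

def within_k_sd(k: float) -> float:
    """P(|Z| < k) for Z ~ N(0, 1), computed with the error function."""
    return math.erf(k / math.sqrt(2))

if __name__ == "__main__":
    n, p = 10, 0.3
    print(mean_var(lambda x: binomial_pmf(x, n, p), range(n + 1)))  # ≈ (3.0, 2.1), i.e. np and npq
    print(mean_var(lambda k: 1 / 6, range(1, 7)))  # fair die: ≈ (3.5, 35/12)
    print([round(within_k_sd(k), 4) for k in (1, 2, 3)])  # [0.6827, 0.9545, 0.9973]
```

The fair-die line exercises the discrete uniform case with $n = 6$, where $\frac{n^2-1}{12} = \frac{35}{12}$, and the last line reproduces the 68/95/99.7 percentages of the Empirical Rule.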
Principal Component Analysis
Projects data onto orthogonal vectors that maximize variance. Remember, given an $n \times n$ matrix $A$, a nonzero vector $\vec{x}$, and a scalar $\lambda$, if $A\vec{x} = \lambda\vec{x}$, then $\vec{x}$ and $\lambda$ are an eigenvector and eigenvalue of $A$. In PCA, the eigenvectors are orthogonal, the data projected onto them are uncorrelated, and they represent the principal components.
1. Start with the covariance matrix of ...
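The PCA steps in this transcription are cut off after the first one. Below is a minimal NumPy sketch of the standard eigendecomposition approach, which may differ in detail from the original cheatsheet's steps. It centers the data, computes the covariance matrix, sorts that matrix's eigenvectors by eigenvalue, and projects onto the top $k$. The function name `pca` is hypothetical. `np.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

def pca(data: np.ndarray, k: int):
    """Project `data` (rows = samples, columns = features) onto its top-k principal components.

    Hypothetical helper: eigendecomposition of the sample covariance matrix.
    Returns (scores, eigenvalues, components), with components as columns.
    """
    centered = data - data.mean(axis=0)        # PCA works on mean-centered features
    cov = np.cov(centered, rowvar=False)       # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric input; eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]                # each column is a principal direction
    scores = centered @ components             # data expressed in the new basis
    return scores, eigvals[:k], components

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    # Synthetic data: two strongly correlated features plus one independent feature.
    data = np.column_stack([x, 2 * x + rng.normal(scale=0.5, size=500), rng.normal(size=500)])
    scores, variances, components = pca(data, k=2)
    print(variances)                               # variance captured by each component
    print(np.cov(scores, rowvar=False).round(6))   # off-diagonal entries ≈ 0
```

Each column of `components` satisfies $A\vec{x} = \lambda\vec{x}$ for the covariance matrix $A$. The covariance of `scores` is diagonal, which is the sense in which the principal components are uncorrelated.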