Transcription of INTRODUCTION TO MACHINE LEARNING
Measuring model performance or error

Is our model any good? That depends on the context of the task. Possible criteria:
- Accuracy
- Computation time
- Interpretability

There are 3 types of tasks, each with its own performance measures:
- Classification
- Regression
- Clustering

Classification: accuracy and error
For each instance the system is either right or wrong, so accuracy goes up exactly when error goes down:
Accuracy = correctly classified instances / total number of classified instances
Error = 1 - Accuracy

Example
Squares with 2 features: small/big and solid/dotted. Label: colored/not colored. This is a binary classification problem.
[Figure: true and predicted labels of 5 squares; 3 of the 5 are classified correctly, so accuracy = 3/5 = 60%]

Limits of accuracy
Suppose we classify a very rare heart disease by simply classifying every patient as negative (not sick). Out of 100 patients, we predict 99 correctly (not sick) and miss 1 (sick). Accuracy: 99%, yet you miss every positive case!
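A minimal Python sketch of the two definitions above and of the rare-disease pitfall. The function names are hypothetical, and labels are assumed to be encoded as 1 (positive, sick) and 0 (negative, not sick).

```python
def accuracy(truth, predicted):
    """Fraction of instances whose predicted label equals the true label."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return correct / len(truth)


def error(truth, predicted):
    """Error is the complement of accuracy."""
    return 1 - accuracy(truth, predicted)


# Rare heart disease: 1 sick patient (label 1) among 100 patients.
truth = [1] + [0] * 99
# A "classifier" that labels every patient as not sick (0).
predicted = [0] * 100

missed = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
print(accuracy(truth, predicted))  # 0.99
print(missed)                      # 1: the only sick patient is missed
```

The high accuracy hides the fact that the classifier never finds a positive case, which is what the confusion matrix makes visible.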
Confusion matrix
Rows and columns contain all available labels. Each cell contains the frequency (count) of instances that are classified in a certain way.

For a binary classifier (positive or negative, 1 or 0):

                Prediction
                P       N
Truth    p      TP      FN
         n      FP      TN

- True Positives (TP): prediction P, truth P
- True Negatives (TN): prediction N, truth N
- False Negatives (FN): prediction N, truth P
- False Positives (FP): prediction P, truth N

Ratios in the confusion matrix
- Accuracy: (TP+TN)/(TP+FP+FN+TN)
- Precision: TP/(TP+FP), the fraction of instances predicted positive that are truly positive
- Recall: TP/(TP+FN), the fraction of truly positive instances that are predicted positive
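Counting the four cells from label lists is a single tally over (truth, prediction) pairs. In this sketch, `confusion_matrix` is a hypothetical helper (not scikit-learn's function of the same name), positive is assumed to be encoded as 1 and negative as 0, and the 5 labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(truth, predicted):
    """Count the four cells of a binary confusion matrix (1 = P, 0 = N)."""
    counts = Counter(zip(truth, predicted))  # keys are (truth, prediction) pairs
    return {
        "TP": counts[(1, 1)],  # truth P, prediction P
        "FN": counts[(1, 0)],  # truth P, prediction N
        "FP": counts[(0, 1)],  # truth N, prediction P
        "TN": counts[(0, 0)],  # truth N, prediction N
    }


# Hypothetical labels for 5 instances.
truth = [1, 1, 0, 0, 0]
predicted = [1, 0, 1, 0, 0]
cm = confusion_matrix(truth, predicted)
print(cm)  # {'TP': 1, 'FN': 1, 'FP': 1, 'TN': 2}
print("precision:", cm["TP"] / (cm["TP"] + cm["FP"]))  # 0.5
print("recall:", cm["TP"] / (cm["TP"] + cm["FN"]))     # 0.5
```

A `Counter` returns 0 for pairs that never occur, so empty cells need no special handling.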
Back to the squares

                Prediction
                P       N
Truth    p      1       1
         n      1       2

[Figure: true and predicted labels of the 5 squares]
Accuracy: (TP+TN)/(TP+FP+FN+TN) = (1+2)/(1+1+1+2) = 60%
Precision: TP/(TP+FP) = 1/(1+1) = 50%
Recall: TP/(TP+FN) = 1/(1+1) = 50%

Rare heart disease

                Prediction
                P       N
Truth    p      0       1
         n      0       99

Accuracy: (TP+TN)/(TP+FP+FN+TN) = (0+99)/(0+0+1+99) = 99%
Recall: TP/(TP+FN) = 0/(0+1) = 0%
Precision: TP/(TP+FP) = 0/(0+0) is undefined, because there are no positive predictions.
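The same ratios can be computed directly from the counts in the two tables above. `ratios` is a hypothetical helper; returning `None` for a zero denominator is a design choice made here to represent "undefined", as for the precision of the rare-disease classifier.

```python
def ratios(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts.

    A ratio whose denominator is zero is undefined and returned as None.
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


print(ratios(tp=1, fp=1, fn=1, tn=2))   # squares example
# {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5}
print(ratios(tp=0, fp=0, fn=1, tn=99))  # rare heart disease
# {'accuracy': 0.99, 'precision': None, 'recall': 0.0}
```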
Regression

RMSE
Root Mean Squared Error (RMSE): roughly the mean distance between the observed values and the estimates on the regression line. For N observations y_i with estimates y_hat_i:
RMSE = sqrt( (1/N) * sum_{i=1}^{N} (y_i - y_hat_i)^2 )
[Figure: data points and fitted regression line, axes X1 and X2]

Clustering
No label information is available, so we need a distance metric between points.
The performance measure consists of 2 elements:
- Similarity within each cluster (should be high)
- Similarity between clusters (should be low)

[Figure: clustered points, axes X1 and X2]
Within-cluster similarity
- Within sum of squares (WSS)
- Diameter
- Minimize

Between-cluster similarity
- Between cluster sum of squares (BSS)
- Intercluster distance
- Maximize (large values mean the clusters are well separated)

Dunn's index
Dunn's index = minimal intercluster distance / maximal diameter
A higher Dunn's index indicates compact, well-separated clusters.

Let's practice!
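A pure-Python sketch of RMSE and of the clustering measures. The slides do not fix exact definitions, so this assumes Euclidean distance, the diameter as the largest distance between two points of the same cluster, and the intercluster distance as the smallest distance between points of different clusters (one common variant of Dunn's index). All function names and the two toy clusters are hypothetical.

```python
import math
from itertools import combinations, product


def rmse(y, y_hat):
    """Root mean squared error between observed values y and estimates y_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def wss(clusters):
    """Within sum of squares: squared distances of points to their own centroid."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(sq_dist(p, c) for p in pts)
    return total


def bss(clusters):
    """Between sum of squares: cluster size times the squared distance
    from each cluster centroid to the overall centroid."""
    overall = centroid([p for pts in clusters for p in pts])
    return sum(len(pts) * sq_dist(centroid(pts), overall) for pts in clusters)


def diameter(pts):
    """Largest distance between two points of the same cluster."""
    return max((math.dist(p, q) for p, q in combinations(pts, 2)), default=0.0)


def dunn_index(clusters):
    """Minimal intercluster distance divided by maximal diameter."""
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    )
    return min_between / max(diameter(pts) for pts in clusters)


print(rmse([1, 2, 3, 4], [2, 1, 4, 3]))  # 1.0
clusters = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]  # two hypothetical clusters
print(wss(clusters), bss(clusters), dunn_index(clusters))  # 1.0 16.0 4.0
```

`dunn_index` assumes at least one cluster has two or more distinct points; with only singleton clusters the maximal diameter is 0 and the index is undefined.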
Training set and test set

Machine learning vs. statistics
- Predictive power vs. descriptive power
- Supervised learning: the model must predict unseen observations
- Classical statistics: the model must fit the data, to explain or describe it

Predictive model
- Training is not done on the complete dataset, but on a training set
- A test set is used to evaluate the performance of the model
- The sets are disjoint: NO OVERLAP
- The model is tested on unseen observations -> generalization!

Split the dataset
N instances (= observations): X
K features: F
Class labels: Y

Instance   Features F_1 ... F_K         Label
x_1        x_{1,1}   ...  x_{1,K}       y_1
x_2        x_{2,1}   ...  x_{2,K}       y_2
...
x_r        x_{r,1}   ...  x_{r,K}       y_r
x_{r+1}    x_{r+1,1} ...  x_{r+1,K}     y_{r+1}
x_{r+2}    x_{r+2,1} ...  x_{r+2,K}     y_{r+2}
...
x_N        x_{N,1}   ...  x_{N,K}       y_N

Training set: instances x_1 ... x_r
Test set: instances x_{r+1} ... x_N
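A sketch of a shuffled, disjoint split, assuming the roughly 3/1 ratio discussed in this section (`train_fraction=0.75`). `split_dataset` is a hypothetical helper written for illustration; in practice scikit-learn's `sklearn.model_selection.train_test_split` serves the same purpose.

```python
import random


def split_dataset(X, y, train_fraction=0.75, seed=None):
    """Shuffle the instances, then cut them into disjoint training and test sets."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of instances")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)      # shuffle before splitting
    r = int(len(indices) * train_fraction)    # the first r shuffled instances train
    train_idx, test_idx = indices[:r], indices[r:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test


# Hypothetical data: 8 instances with K = 2 features each, binary labels.
X = [(i, i % 3) for i in range(8)]
y = [i % 2 for i in range(8)]
X_train, y_train, X_test, y_test = split_dataset(X, y, seed=42)
print(len(X_train), len(X_test))  # 6 2
```

Splitting shuffled indices, rather than X and y separately, keeps every instance paired with its own label.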
Use the training set (x_1 ... x_r with labels y_1 ... y_r) to train the model. Then use the model on the test set features (x_{r+1} ... x_N) to predict y, and compare these predictions with the real y.

When to use a training/test set?
- Supervised learning
- Not for unsupervised learning (clustering): the data is not labeled

Predictive power of the model
Training set -> train model -> test model on test set -> performance measure -> predictive power -> use model

How to split the sets?
- Which observations go where?
- The training set is larger than the test set, typically about 3/1
- This ratio is quite arbitrary
- Generally: more training data = better model
- But the test set should not be too small

Distribution of the sets
- Classification: classes must have similar distributions in both sets, to avoid a class not being available in one of the sets
- Classification & regression: shuffle the dataset before splitting

Effect of sampling
- Sampling can affect the performance measures
- Add robustness to these measures with cross-validation
- Idea: sample multiple times, with different separations

Cross-validation
Example: 4-fold cross-validation. The dataset is divided into 4 parts; each part serves as the test set once, while the other 3 parts form the training set. The results of the 4 runs are aggregated into a more robust measure.
[Figure: 4-fold cross-validation, with the test set moving across the dataset]

n-fold cross-validation
- Fold the test set over the dataset n times
- Each test set is 1/n of the size of the total dataset

Let's practice!
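A sketch of k-fold cross-validation including the aggregation step. `k_fold_indices`, `cross_validate` and the majority-class stand-in model are hypothetical helpers for illustration, and `k` is assumed to be at most the number of instances. These folds are not stratified, so class distributions can differ between folds; scikit-learn's `StratifiedKFold` addresses that.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=None):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    The n indices are shuffled once and dealt into k disjoint folds of
    (nearly) equal size; each fold is the test set exactly once.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(X, y, fit, predict, k=4, seed=None):
    """Mean test-set accuracy over the k folds (the aggregated measure)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical stand-in model: always predict the training set's majority class.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(12))
y = [0] * 9 + [1] * 3  # hypothetical labels
print(cross_validate(X, y, fit_majority, predict_majority, k=4, seed=0))
```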
Bias and variance

What have you learned?
- Accuracy and other performance measures
- Training and test set

Knitting it all together
- Effect of splitting the dataset (train/test) on accuracy
- Over- and underfitting

Introducing: BIAS and VARIANCE

Bias and variance
- Main goal of supervised learning: prediction
- Prediction error ~ reducible + irreducible error

Irreducible vs. reducible error
- Irreducible: noise; no model can remove it, so don't try to minimize it
- Reducible: error due to an unfit model; minimize it
- The reducible error is split into bias and variance

Bias
- Error due to bias: wrong assumptions
- The difference between the predictions and the truth, using models trained by a specific learning algorithm

Example
Quadratic data. Assumption: the data is linear, so use linear regression.
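To make the example concrete, here is a minimal sketch assuming noise-free quadratic data y = x^2 at five hypothetical points x = -2 ... 2, fit with ordinary least squares. `fit_line` is a hypothetical helper. Because there is no noise, all remaining error comes from the linear assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # quadratic data, no noise
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)                                        # 2.0 0.0
print(residuals)                                   # [2.0, -1.0, -2.0, -1.0, 2.0]
print(sum(r * r for r in residuals) / len(residuals))  # 2.8 (training MSE)
```

The residuals follow a systematic pattern (positive at both ends, negative in the middle). Sampling more points from the same curve would not remove it, which is the signature of error due to bias.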
Error due to bias is high: the linear assumption puts more restrictions on the model than the quadratic data allows.

Bias
- Related to the complexity of the model
- More restrictions lead to high bias

Variance
- Error due to variance: error due to the sampling of the training set
- A model with high variance fits the training set closely

Example
Quadratic data. Few restrictions: fit a polynomial perfectly through the training set. If you change the training set, the model will change completely. High variance: the model generalizes badly to the test set.

Bias-variance tradeoff
- Low variance comes with high bias
- Low bias comes with high variance

Overfitting
- Accuracy will depend on the dataset split (train/test)
- With high variance, accuracy will heavily depend on the split
- Overfitting = the model fits the training set a lot better than the test set
- Too specific

Underfitting
- Restricting your model too much
- High bias
- Too general
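The opposite extreme can be sketched with Lagrange interpolation: a polynomial of degree 5 passes exactly through 6 training points. This assumes quadratic data y = x^2 plus hypothetical noise values; the two "training sets" differ only in the sign of that noise. `lagrange_fit` and `mse` are hypothetical helpers.

```python
def lagrange_fit(xs, ys):
    """Return the polynomial of degree len(xs) - 1 through all (x, y) points."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return p


def mse(model, xs, ys):
    """Mean squared error of a model on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0, 1, 2, 3, 4, 5]
noise_a = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]  # hypothetical noise, sample A
noise_b = [-e for e in noise_a]              # hypothetical noise, sample B
train_a = [x ** 2 + e for x, e in zip(xs, noise_a)]
train_b = [x ** 2 + e for x, e in zip(xs, noise_b)]
model_a = lagrange_fit(xs, train_a)
model_b = lagrange_fit(xs, train_b)

print(mse(model_a, xs, train_a))  # 0.0: perfect fit on its own training set
print(mse(model_b, xs, train_b))  # 0.0
print(model_a(6), model_b(6))     # predictions at the unseen point x = 6
```

Both models have zero training error, yet at x = 6 they predict (up to floating-point rounding) 4.5 and 67.5, while the true value is 36. A small change in the training set changes the model completely: high variance.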
Example: spam or not?
Emails in the training set are described by 2 features: the number of capital letters and the number of exclamation marks.

Truth:
A lot of capital letters?
  no  -> no spam
  yes -> A lot of exclamation marks?
           no  -> no spam
           yes -> spam

Exception in the training set: an email with 50 capital letters and 30 exclamation marks is no spam.

Overfit (too specific!):
A lot of capital letters?
  no  -> no spam
  yes -> A lot of exclamation marks?
           no  -> no spam
           yes -> 50 capital letters?
                    no  -> spam
                    yes -> 30 exclamation marks?
                             no  -> spam
                             yes -> no spam

Underfit (too general!):
More than 10 capital letters?
  no  -> no spam
  yes -> spam

Let's practice!
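The three decision trees can be written as plain functions. The slides do not quantify "a lot of", so the threshold `A_LOT = 10` is an assumption, and the function names and test emails are hypothetical.

```python
A_LOT = 10  # hypothetical threshold for "a lot of" capitals / exclamation marks


def truth_rule(caps, excl):
    """Spam only if an email has a lot of capitals AND a lot of exclamation marks."""
    return "spam" if caps > A_LOT and excl > A_LOT else "no spam"


def overfit_rule(caps, excl):
    """Same tree, plus branches that memorize the single training exception."""
    if caps > A_LOT and excl > A_LOT:
        if caps == 50 and excl == 30:  # the exception: 50 capitals, 30 marks
            return "no spam"
        return "spam"
    return "no spam"


def underfit_rule(caps, excl):
    """Only looks at capital letters: more than 10 means spam."""
    return "spam" if caps > 10 else "no spam"


# Hypothetical emails as (capital letters, exclamation marks).
for email in [(50, 30), (51, 30), (40, 0), (5, 50)]:
    print(email, truth_rule(*email), overfit_rule(*email), underfit_rule(*email))
```

The overfit rule differs from the truth only for the exact memorized email (50, 30), while the underfit rule flags (40, 0) as spam because it ignores exclamation marks.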