Introduction to Machine Learning Final Exam

CS 189, Spring 2020. Introduction to Machine Learning: Final Exam.

The exam is open book, open notes, and open web. However, you may not consult or communicate with other people (besides your exam proctors). You will submit your answers to the multiple-choice questions through Gradescope via the assignment Final Exam Multiple Choice; please do not submit your multiple-choice answers on paper. By contrast, you will submit your answers to the written questions by writing them on paper by hand, scanning them, and submitting them through Gradescope via the assignment Final Exam Writeup. Please write your name at the top of each page of your written answers.




(You may do this before the exam.) You have 180 minutes to complete the exam (3:00 to 6:00 PM). (If you are in the DSP program and have an allowance of 150% or 200% time, that comes to 270 minutes or 360 minutes, respectively.) When the exam ends (6:00 PM), stop writing. You must submit your multiple-choice answers before 6:00 PM; multiple-choice submissions will be penalized at a rate of 5 points per minute after 6:00 PM. (The multiple-choice questions are worth 64 points total.) From 6:00 PM, you have 15 minutes to scan the written portion of your exam and turn it in to Gradescope via the assignment Final Exam Writeup.

Most of you will use your cellphone and a third-party scanning app. If you have a physical scanner, you may use that. Late written submissions will be penalized at a rate of 5 points per minute after 6:15 PM. Mark your answers to multiple-choice questions directly in Gradescope. Write your answers to written questions on blank paper, and clearly label all written questions and all subparts of each written question. Show your work in written questions. Following the exam, you must use Gradescope's page selection mechanism to mark which questions are on which pages of your exam (as you do for the homeworks). The total number of points is 150. There are 16 multiple choice questions worth 4 points each, and six written questions worth 86 points total.

For multiple answer questions, fill in the bubbles for ALL correct choices: there may be more than one correct choice, but there is always at least one correct choice. No partial credit on multiple answer questions: the set of all correct answers must be checked.

First name: ______  Last name: ______  SID: ______

Q1. [64 pts] Multiple Answer

Fill in the bubbles for ALL correct choices: there may be more than one correct choice, but there is always at least one. No partial credit: the set of all correct answers must be checked.

(1) [4 pts] Which of the following are true for the k-nearest neighbor (k-NN) algorithm?

A: k-NN can be used for both classification and regression.
B: As k increases, the bias usually increases.

C: The decision boundary looks smoother with smaller values of k.
D: As k increases, the variance usually increases.

(2) [4 pts] Let X be a matrix with singular value decomposition X = UΣV^T. Which of the following are true for all X?

A: rank(X) = rank(Σ).
B: If all the singular values are unique, then the SVD is unique.
C: The first column of V is an eigenvector of X^T X.
D: The singular values of X and the eigenvalues of X^T X are all nonnegative.

A is correct because the number of nonzero singular values equals the rank. B is incorrect because you could negate both factors (U → −U, V → −V) and obtain another valid SVD. C is correct because the eigendecomposition of X^T X is VΣ²V^T. D is correct because X^T X is positive semidefinite, so its eigenvalues can't be negative.
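The SVD facts cited in this solution are easy to verify numerically. Below is a minimal NumPy sketch; the random 5×3 matrix is my own stand-in for X and is not part of the exam.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # arbitrary example matrix (an assumption)

# Thin SVD: X = U diag(s) V^T; NumPy returns V^T as Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# A: rank(X) equals the number of nonzero singular values, i.e. rank(Sigma).
assert np.linalg.matrix_rank(X) == np.sum(s > 1e-10)

# C: the first column of V is an eigenvector of X^T X with eigenvalue sigma_1^2.
v1 = Vt[0]  # first row of V^T = first column of V
assert np.allclose(X.T @ X @ v1, (s[0] ** 2) * v1)

# D: the eigenvalues of X^T X are the squared singular values, all nonnegative.
eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
assert np.allclose(eigvals, s ** 2)

# B: the SVD is not unique even with distinct singular values:
# negating both U and V gives another valid factorization of the same X.
U2, Vt2 = -U, -Vt
assert np.allclose(U2 @ np.diag(s) @ Vt2, X)
```

Note that D's check also shows why the eigenvalues of X^T X generally differ from the singular values themselves: they are the singular values squared.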

(3) [4 pts] Lasso (with a fictitious dimension), random forests, and principal component analysis (PCA) all . . .

A: can be used for dimensionality reduction or feature subset selection.
B: compute linear transformations of the input features.
C: are supervised learning techniques.
D: are translation invariant: changing the origin of the coordinate system (i.e., translating all the training and test data together) does not change the predictions or the principal component directions.

Option B is incorrect because random forests don't compute linear transformations. Option C is incorrect because PCA is unsupervised.

(4) [4 pts] Suppose your training set for two-class classification in one dimension (d = 1; x_i ∈ R) contains three sample points: point x1 = 3 with label y1 = 1, point x2 = 1 with label y2 = 1, and point x3 = −1 with label y3 = −1.

What are the values of w and b given by a hard-margin SVM?

A: w = 1, b = 1
B: w = 0, b = 1
C: w = 1, b = 0
D: w = ∞, b = 0

(5) [4 pts] Use the same training set as part (4). What is the value of w and b given by logistic regression (with no regularization)?

A: w = 1, b = 1
B: w = 0, b = 1
C: w = 1, b = 0
D: w = ∞, b = 0

(6) [4 pts] Below are some choices you might make while training a neural network. Select all of the options that will generally make it more difficult for your network to achieve high accuracy on the test set.

A: Initializing the weights to all zeros
B: Normalizing the training data but leaving the test data unchanged
C: Using momentum
D: Reshuffling the training data at the beginning of each epoch

A) Initializing weights with zeros makes it impossible to learn.

B) Mean and standard deviation should be computed on the training set and then used to standardize the validation and test sets, so that the distributions are matched for each set. C) This describes momentum and will generally help training. D) This is best practice and will generally help training.

(7) [4 pts] To the left of each graph below is a number. Select the choices for which the number is the multiplicity of the eigenvalue zero in the Laplacian matrix of the graph. [The four graphs are not reproduced in this transcription.]

A: 1
B: 1
C: 2
D: 4

The multiplicity is equal to the number of connected components in the graph.

(8) [4 pts] Given the spectral graph clustering optimization problem

    Find y that minimizes y^T L y
    subject to y^T y = n and 1^T y = 0,

which of the following optimization problems produce a vector y that leads to the same sweep cut as the optimization problem above?

M is a diagonal mass matrix with different masses on the diagonal.

A: Minimize y^T L y subject to y^T y = 1 and 1^T y = 0
B: Minimize y^T L y subject to ∀i, y_i = 1 or y_i = −1, and 1^T y = 0
C: Minimize y^T L y / (y^T y) subject to 1^T y = 0
D: Minimize y^T L y subject to y^T M y = 1 and 1^T M y = 0

(9) [4 pts] Which of the following methods will cluster the data in panel (a) of the figure below into the two clusters (red circle and blue horizontal line) shown in panel (b)? Every dot in the circle and the line is a data point. In all the options that involve hierarchical clustering, the algorithm is run until we obtain two clusters. [Figure: (a) Unclustered; (b) Desired clustering.]

A: Hierarchical agglomerative clustering with Euclidean distance and complete linkage
B: Hierarchical agglomerative clustering with Euclidean distance and single linkage
C: Hierarchical agglomerative clustering with Euclidean distance and centroid linkage
D: k-means clustering with k = 2

Single linkage uses the minimum distance between two clusters as a metric for merging clusters.

Since the two clusters are densely packed with points and the minimum distance between the two clusters is greater than the within-cluster distances between points, single linkage doesn't link the circle to the line until the very end. The other three methods will all join some of the points at the left end of the line with the circle before they are joined with the right end of the line.

(10) [4 pts] Which of the following statement(s) about kernels are true?

A: The dimension of the lifted feature vectors Φ(x), whose inner products the kernel function computes, can be infinite.
B: For any desired lifting Φ(x), we can design a kernel function k(x, z) that will evaluate Φ(x)^T Φ(z) more quickly than explicitly computing Φ(x) and Φ(z).
C: The kernel trick, when it is applicable, speeds up a learning algorithm if the number of sample points is substantially less than the dimension of the (lifted) feature space.
D: If the raw feature vectors x, y are of dimension 2, then k(x, y) = x1² y1² + x2² y2² is a valid kernel.

A is correct; consider the Gaussian kernel from lecture.
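Option D's kernel can be checked directly: k(x, y) = x1²y1² + x2²y2² is the inner product of the lifted vectors Φ(x) = (x1², x2²). The lifting Φ and the random test points below are my own choices for this sketch, not part of the exam.

```python
import numpy as np

def k(x, y):
    # Candidate kernel from option D (d = 2 raw features).
    return x[0]**2 * y[0]**2 + x[1]**2 * y[1]**2

def phi(x):
    # An explicit lifting whose inner product reproduces k
    # (chosen for this check; the exam does not state it).
    return np.array([x[0]**2, x[1]**2])

rng = np.random.default_rng(1)
pts = rng.standard_normal((6, 2))  # arbitrary sample points

# k(x, y) = phi(x) . phi(y) for every pair, so k is a valid kernel.
for x in pts:
    for y in pts:
        assert np.isclose(k(x, y), phi(x) @ phi(y))

# Equivalently, the Gram matrix is positive semidefinite
# (it equals Phi Phi^T for the matrix Phi of lifted points).
G = np.array([[k(x, y) for y in pts] for x in pts])
assert np.min(np.linalg.eigvalsh(G)) > -1e-9
```

A Gaussian kernel, by contrast, cannot be checked this way by finite enumeration: its lifting is infinite-dimensional, which is what makes option A true.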

