Transcription of The GLMSELECT Procedure - SAS
SAS/STAT User's Guide: The GLMSELECT Procedure

This document is an individual chapter from SAS/STAT User's Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2013. SAS/STAT User's Guide. Cary, NC: SAS Institute Inc.

Copyright 2013, SAS Institute Inc., Cary, NC, USA. All rights reserved.
Chapter 47: The GLMSELECT Procedure

Contents
Overview: GLMSELECT Procedure
  Features
Getting Started: GLMSELECT Procedure
Syntax: GLMSELECT Procedure
  PROC GLMSELECT Statement
  BY Statement
  CLASS Statement
  CODE Statement
  EFFECT Statement
  FREQ Statement
  MODEL Statement
  MODELAVERAGE Statement (Experimental)
  OUTPUT Statement
  PARTITION Statement
  PERFORMANCE Statement
  SCORE Statement
  STORE Statement
  WEIGHT Statement
Details: GLMSELECT Procedure
  Model-Selection Methods
    Full Model Fitted (NONE)
    Forward Selection (FORWARD)
    Backward Elimination (BACKWARD)
    Stepwise Selection (STEPWISE)
    Least Angle Regression (LAR)
    Lasso Selection (LASSO)
    Adaptive LASSO Selection
    Elastic Net Selection (ELASTICNET)
  Model Selection Issues
  Criteria Used in Model Selection Methods
  CLASS Variable Parameterization and the SPLIT Option
  Macro Variables Containing Selected Models
  Using the STORE Statement
  Building the SSCP Matrix
  Model Averaging
  Using Validation and Test Data
  Cross Validation
  External Cross Validation
  Displayed Output
  ODS Table Names
  ODS Graphics
Examples: GLMSELECT Procedure
  Example: Modeling Baseball Salaries Using Performance Statistics
  Example: Using Validation and Cross Validation
  Example: Scatter Plot Smoothing by Selecting Spline Functions
  Example: Multimember Effects and the Design Matrix
  Example: Model Averaging
  Example: Elastic Net and External Cross Validation
References

Overview: GLMSELECT Procedure

The GLMSELECT procedure performs effect selection in the framework of general linear models. A variety of model selection methods are available, including the LASSO method of Tibshirani (1996) and the related LAR method of Efron et al. (2004).
The procedure offers extensive capabilities for customizing the selection with a wide variety of selection and stopping criteria, from traditional and computationally efficient significance-level-based criteria to more computationally intensive validation-based criteria. The procedure also provides graphical summaries of the selection process.

The GLMSELECT procedure compares most closely to the REG and GLM procedures. The REG procedure supports a variety of model-selection methods but does not support a CLASS statement. The GLM procedure supports a CLASS statement but does not include effect selection methods. The GLMSELECT procedure fills this gap. PROC GLMSELECT focuses on the standard independently and identically distributed general linear model for univariate responses and offers great flexibility for, and insight into, the model selection process. It provides results (displayed tables, output data sets, and macro variables) that make it easy to take the selected model and explore it in more detail in a subsequent procedure such as REG or GLM.

The main features of the GLMSELECT procedure are as follows:
Model Specification
- supports different parameterizations for classification effects
- supports any degree of interaction (crossed effects) and nested effects
- supports hierarchy among effects
- supports partitioning of data into training, validation, and testing roles
- supports constructed effects including spline and multimember effects

Selection Control
- provides multiple effect selection methods
- enables selection from a very large number of effects (tens of thousands)
- offers selection of individual levels of classification effects
- provides effect selection based on a variety of selection criteria
- provides stopping rules based on a variety of model evaluation criteria
- provides leave-one-out, k-fold cross validation, and k-fold external cross validation
- supports data resampling and model averaging

Display and Output
- produces graphical representation of selection process
- produces output data sets containing predicted values and residuals
- produces an output data set containing the design matrix
- produces macro variables containing selected models
- supports parallel processing of BY groups
- supports multiple SCORE statements

The GLMSELECT procedure supports the following effect selection methods.
For more information about these methods, see the section Model-Selection Methods.

- Forward selection starts with no effects in the model and adds effects.
- Backward elimination starts with all effects in the model and deletes effects.
- Stepwise regression is similar to forward selection except that effects already in the model do not necessarily stay there.
- Least angle regression (LAR) is similar to forward selection in that it starts with no effects in the model and adds effects. The parameter estimates at any step are shrunk when compared to the corresponding least squares estimates.
- LASSO adds and deletes parameters based on a version of ordinary least squares in which the sum of the absolute regression coefficients is constrained.
- Elastic net is an extension of LASSO that estimates parameters based on a version of ordinary least squares in which both the sum of the absolute regression coefficients and the sum of the squared regression coefficients are constrained.

PROC GLMSELECT also supports hybrid versions of the LAR and LASSO methods.
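The LASSO and elastic net descriptions above phrase the estimates as least squares subject to a constraint on the coefficients. The Python sketch below (not SAS code, and not necessarily how PROC GLMSELECT computes its solutions) illustrates the idea through the standard equivalent penalized form, in which a larger penalty weight corresponds to a tighter constraint, and solves it by cyclic coordinate descent with soft-thresholding. The function names `elastic_net` and `soft_threshold`, the penalty weights `lambda1` and `lambda2`, and the scaling of the objective are illustrative assumptions; they are not PROC GLMSELECT options. Setting `lambda2 = 0` gives the LASSO.

```python
def soft_threshold(value, threshold):
    """Return sign(value) * max(|value| - threshold, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lambda1, lambda2=0.0, max_iter=1000, tol=1e-12):
    """Minimize, by cyclic coordinate descent (an assumed algorithm choice),

        (1 / (2n)) * sum_i (y_i - sum_j X[i][j] * b_j) ** 2
            + lambda1 * sum_j |b_j| + (lambda2 / 2) * sum_j b_j ** 2

    lambda2 = 0 gives the LASSO. Assumes y and every column of X are
    centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta; beta starts at zero
    # (1/n) * sum of squares of each column
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: leave b_j at 0
            # (1/n) * inner product of column j with the partial residual excluding b_j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            # L1 term shrinks and can zero the coefficient; L2 term rescales it
            new_b = soft_threshold(rho, lambda1) / (col_sq[j] + lambda2)
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Two centered, orthogonal columns; least squares estimates are 3.0 and 0.5
    X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
    y = [3.5, -2.5, 2.5, -3.5]
    print(elastic_net(X, y, lambda1=0.25))              # [2.75, 0.25]
    print(elastic_net(X, y, lambda1=1.0))               # [2.0, 0.0]
    print(elastic_net(X, y, lambda1=1.0, lambda2=1.0))  # [1.0, 0.0]
```

With these orthogonal, centered columns, each LASSO coefficient is its least squares value shrunk toward zero by `lambda1` and set exactly to zero once its magnitude falls below `lambda1`, which is how the L1 constraint both shrinks estimates and drops effects; the elastic net's L2 term additionally divides each coefficient by `1 + lambda2` in this example.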
These hybrid versions use LAR and LASSO to select the model but then estimate the regression coefficients by ordinary weighted least squares.

The GLMSELECT procedure is intended primarily as a model selection procedure and does not include regression diagnostics or other postselection facilities such as hypothesis testing, testing of contrasts, and LS-means analyses. The intention is that you use PROC GLMSELECT to select a model or a set of candidate models. You can then investigate these models further by using them in existing procedures such as REG and GLM.

Getting Started: GLMSELECT Procedure

The following data set contains salary and performance information for Major League Baseball players who played at least one game in both the 1986 and 1987 seasons, excluding pitchers. The salaries (Sports Illustrated, April 20, 1987) are for the 1987 season, and the performance measures are from 1986 (Collier Books, The 1987 Baseball Encyclopedia Update).