/* Beal example: AIC

   Source: Beal, Dennis J. (2005), "SAS Code to Select the Best Multiple
   Linear Regression Model for Multivariate Data Using Information
   Criteria", Proceedings, Southeast SAS Users Group Conference.

   A multivariate data set with 10 independent variables and one dependent
   variable was simulated from a known "true" model that is a linear
   function of a subset of the independent variables (x2, x8 and x9, with
   no intercept term). The following SAS code simulates 10000 observations
   of the 10 independent X variables and the one dependent Y variable. The
   X variables come from normal, lognormal, exponential and uniform
   distributions with various means and variances. Variables X5, X6 and X9
   are correlated with other variables: X5 with X1, X6 with X2, and X9 with
   X2 and X8. */
data a;
   do i = 1 to 10000;
      x1  = 10 + 5*rannor(0);             * Normal(10, 25);
      x2  = exp(3*rannor(0));             * lognormal;
      x3  = 5 + 10*ranuni(0);             * Uniform(5, 15);
      x4  = 100 + 50*rannor(0);           * Normal(100, 2500);
      x5  = x1 + 3*rannor(0);             * Normal(10, 34), correlated with x1;
      x6  = 2*x2 + ranexp(0);             * lognormal plus exponential, correlated with x2;
      x7  = 0.5*exp(4*rannor(0));         * lognormal;
      x8  = 10 + 8*ranuni(0);             * Uniform(10, 18);
      x9  = x2 + x8 + 2*rannor(0);        * lognormal plus uniform plus normal, correlated with x2 and x8;
      x10 = 200 + 90*rannor(0);           * Normal(200, 8100);
      y   = 3*x2 - 4*x8 + 5*x9 + 3*rannor(0);   * true model with no intercept term;
      output;
   end;
run;

/* Fit every subset model, once with an intercept (est) and once without
   (est0), saving SSE and AIC for each subset. The two sets of models are
   then stacked, sorted by AIC, and the 8 models with the smallest AIC are
   printed. */
proc reg data=a outest=est;
   model y=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 / selection=adjrsq sse aic;
   output out=out p=p r=r;
run;
quit;

proc reg data=a outest=est0;
   model y=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 / noint selection=adjrsq sse aic;
   output out=out0 p=p r=r;
run;
quit;

data estout;
   set est est0;
run;

proc sort data=estout;
   by _aic_;
run;

proc print data=estout(obs=8);
run;

/* The following SAS code performs the forward selection method by
   specifying the option selection=forward. The model diagnostics are
   output into the data set est1.
*/
proc reg data=a outest=est1;
   model y=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 / slstay=0.15 slentry=0.15
         selection=forward ss2 sse aic;
   output out=out1 p=p r=r;
run;
quit;

/* The following SAS code performs the backward elimination method by
   specifying the option selection=backward. The model diagnostics are
   output into the data set est2, and the predicted values and residuals
   into out2 (so that the forward-selection output in out1 is not
   overwritten). */
proc reg data=a outest=est2;
   model y=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 / slstay=0.15 slentry=0.15
         selection=backward ss2 sse aic;
   output out=out2 p=p r=r;
run;
quit;

/* The following SAS code performs stepwise regression by specifying the
   option selection=stepwise. The model diagnostics are output into the
   data set est3. */
proc reg data=a outest=est3;
   model y=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 / slstay=0.15 slentry=0.15
         selection=stepwise ss2 sse aic;
   output out=out3 p=p r=r;
run;
quit;

/* The following SAS code calculates the RMSE for each possible subset
   model, sorts the models from smallest to largest RMSE and then prints
   the 10 best models. The choice of adjrsq in selection=adjrsq is not
   crucial, because the models are re-ranked by RMSE afterwards; rsquare
   and cp are other choices for the selection option. The model
   diagnostics are output into the data sets est4 (with intercept) and
   est5 (no intercept). */
proc reg data=a outest=est4;
   model y=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 / selection=adjrsq sse aic adjrsq;
   output out=out4 p=p r=r;
run;
quit;

proc reg data=a outest=est5;
   model y=x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 / noint selection=adjrsq sse aic adjrsq;
   output out=out5 p=p r=r;
run;
quit;

data both;
   set est4 est5;
run;

proc sort data=both;
   by _rmse_;
run;

proc print data=both(obs=10);
run;
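
/* Sanity check on the AIC ranking (a sketch added for illustration, not
   part of the original paper code). It recomputes the AIC that PROC REG
   stores in _AIC_, assuming the PROC REG definition
       AIC = n*ln(SSE/n) + 2p
   where p (_P_) counts every estimated parameter, including the intercept
   when there is one, and n = _EDF_ + _P_ is the number of observations.
   It also assumes the all-subsets OUTEST= data set carries the _P_, _EDF_
   and _IN_ columns. The data set aic_check and the variables n, aic_manual
   and aic_diff are hypothetical names introduced here. A near-zero
   aic_diff indicates that intercept and no-intercept models fitted to the
   same n observations are being ranked on a common AIC scale. */
data aic_check;
   set estout;
   n = _edf_ + _p_;                         * observations used in the fit;
   aic_manual = n*log(_sse_/n) + 2*_p_;     * AIC recomputed from SSE and p;
   aic_diff = aic_manual - _aic_;           * expected to be zero up to rounding;
run;

proc print data=aic_check(obs=8);
   var _in_ _p_ _sse_ _aic_ aic_manual aic_diff;
run;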
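
/* Sanity check on the RMSE ranking (a sketch added for illustration, not
   part of the original paper code). It confirms that the _RMSE_ column
   used to sort the data set both equals sqrt(SSE/error df), the square
   root of the model mean squared error. Because adjusted R-square is a
   decreasing function of that mean squared error within one PROC REG run,
   sorting est4 (or est5) alone by _RMSE_ gives the same order as sorting
   it by _ADJRSQ_ in descending order. The data set rmse_check and the
   variables rmse_manual and rmse_diff are hypothetical names. */
data rmse_check;
   set both;
   rmse_manual = sqrt(_sse_/_edf_);   * root mean squared error;
   rmse_diff = rmse_manual - _rmse_;  * expected to be zero up to rounding;
run;

proc print data=rmse_check(obs=10);
   var _in_ _p_ _edf_ _sse_ _rmse_ rmse_manual rmse_diff _adjrsq_;
run;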